Strategic Archive Extraction with Distill


This article is written for and published on SitePoint.

Perhaps you are building an application that depends on archives; for example, it constantly has to download archives and extract files from them. There are many libraries that can help you extract files from an archive, and a new player in town capable of doing this job is Distill.

With Distill, you can easily extract an archive into a specified directory. You can also give Distill multiple archives and let it pick the best one according to a strategy of your choosing. Let’s dive into the code to see what we can achieve with Distill.

If you want to follow along, you can check out the code in this GitHub repository.

Setup

Before you start using Distill, note that at the time of writing it only supports Unix-based systems. This is because Distill relies on command-line tools that are currently only available on Unix-based systems.

The supported formats section of Distill's documentation lists which commands need to be available on the command line.
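Because Distill shells out to external tools, it can be worth checking for them before extracting anything. The sketch below is not part of Distill: `findMissingTools()` is a hypothetical helper that searches the PATH for executables, and the tool names in the usage lines (`unzip`, `tar`, `gzip`) are assumptions for zip and tar/tar.gz archives. Take the authoritative list from the supported formats section.

```php
<?php

/**
 * Hypothetical helper (not part of Distill): returns the names of the given
 * command-line tools that cannot be found as executable files on the PATH.
 *
 * @param string[] $tools
 * @return string[]
 */
function findMissingTools(array $tools)
{
    $path = getenv('PATH');
    $directories = $path === false ? array() : explode(PATH_SEPARATOR, $path);
    $missing = array();

    foreach ($tools as $tool) {
        $found = false;
        foreach ($directories as $directory) {
            if ($directory === '') {
                continue; // ignore empty PATH entries
            }
            $candidate = rtrim($directory, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $tool;
            // is_file() excludes directories, which are also "executable"
            if (is_file($candidate) && is_executable($candidate)) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $missing[] = $tool;
        }
    }

    return $missing;
}

// Usage: warn early on Windows or when an assumed tool is missing.
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
    echo 'Distill currently needs a Unix-based system.' . PHP_EOL;
}
$missing = findMissingTools(array('unzip', 'tar', 'gzip'));
if (count($missing) > 0) {
    echo 'Missing tools: ' . implode(', ', $missing) . PHP_EOL;
}
```

Checking the PATH directly keeps the sketch free of shell calls; running `command -v` through the shell would work as well.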

To add Distill to your project, we assume you already have a project up and running with Composer. You can install Distill by running:

composer require "raulfraile/distill:0.6.*"

Usage

First and foremost, we need files archived in several different formats. If you have downloaded the above-mentioned repository, you will find three archives in the files directory.

We start by creating the extractor class. Create a file named Extractor.php in src/SitePoint/Extractor with the following content:

<?php

namespace SitePoint\Extractor;

use Distill\Distill;

/**
 * Class to extract archived files
 */
class Extractor
{
    /**
     * @var Distill
     */
    private $distiller;

    /**
     * Constructor
     */
    public function __construct()
    {
        $this->distiller = new Distill();
    }
}

Next, we create a method to extract all files from an archive. It needs an actual file and a directory to extract to. For now, the method does nothing special; you could later extend it, for example with checks that the file is valid.

The method should look something like this.

/**
 * Extract files into directory
 *
 * @param string $fromFile
 * @param string $toDirectory
 */
public function extract($fromFile, $toDirectory)
{
    $this->distiller->extract($fromFile, $toDirectory);
}

The $fromFile parameter can be a path (absolute or relative) or the URL where the file is located. The $toDirectory parameter can be any directory to extract to, absolute or relative. Distill does the rest for you.
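As one way to add the validity check mentioned above, here is a minimal sketch of a hypothetical `isUsableSource()` helper (not a Distill API). It accepts either a readable local file or a well-formed http/https URL. It does not download anything, check that a URL actually exists, or verify that the content is really an archive.

```php
<?php

/**
 * Hypothetical pre-check (not part of Distill): accepts a readable local file
 * or a syntactically valid http/https URL.
 *
 * @param string $fromFile
 * @return bool
 */
function isUsableSource($fromFile)
{
    // Anything that parses as a URL must use http or https
    if (filter_var($fromFile, FILTER_VALIDATE_URL) !== false) {
        $scheme = strtolower((string) parse_url($fromFile, PHP_URL_SCHEME));
        return $scheme === 'http' || $scheme === 'https';
    }

    // Otherwise treat it as a local path (absolute or relative)
    return is_file($fromFile) && is_readable($fromFile);
}
```

Inside `extract()`, you could call such a check first and throw an `\InvalidArgumentException` when it returns false, so a bad path fails before Distill is invoked.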

Extracting an archive is something many libraries can do. What makes Distill special is that you can pass it an array of files and let it choose the best one. To create this method, we first add some constants to the class.

/**
 * Minimum size strategy
 */
const MINIMUM_SIZE = 'Distill\Strategy\MinimumSize';

/**
 * Uncompression speed strategy
 */
const UNCOMPRESSION_SPEED = 'Distill\Strategy\UncompressionSpeed';

/**
 * Random strategy
 */
const RANDOM = 'Distill\Strategy\Random';

When you supply Distill with multiple archives, it selects the one that suits you best based on the chosen strategy. With the minimum size strategy, Distill checks which file is the smallest and uses that one. You would use this strategy when you want to save bandwidth, for example.

When speed matters, use the uncompression speed strategy. Distill then picks the file it can extract the quickest.

If you don’t care which file is used, the random strategy selects one at random.
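To make the minimum size and random strategies concrete, here is a plain-PHP sketch of the selection logic they describe. It is an illustration only, not Distill's implementation: `chooseSmallest()` and `chooseRandom()` are hypothetical names, and the size check only works for local files that `filesize()` can read. The uncompression speed strategy is left out because it depends on Distill's own knowledge of each format.

```php
<?php

/**
 * Illustration of a "minimum size" choice (not Distill's code):
 * returns the path of the smallest readable local file, or null.
 *
 * @param string[] $files
 * @return string|null
 */
function chooseSmallest(array $files)
{
    $smallest = null;
    $smallestSize = null;

    foreach ($files as $file) {
        if (!is_file($file)) {
            continue; // skip anything we cannot measure
        }
        $size = filesize($file);
        if ($smallestSize === null || $size < $smallestSize) {
            $smallest = $file;
            $smallestSize = $size;
        }
    }

    return $smallest;
}

/**
 * Illustration of a "random" choice (not Distill's code).
 *
 * @param string[] $files
 * @return string|null
 */
function chooseRandom(array $files)
{
    if (count($files) === 0) {
        return null;
    }

    return $files[array_rand($files)];
}
```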

Since we also want to extract the chosen file immediately, we can reuse the extract method we already created. Your method could look like this:

/**
 * Choose one of the files within the array and extract it to the given directory
 *
 * @param array  $fromFiles
 * @param string $toDirectory
 * @param string $preferredStrategy
 */
public function chooseAndExtract(array $fromFiles, $toDirectory, $preferredStrategy = self::MINIMUM_SIZE)
{
    $preferredFile = $this->distiller
        ->getChooser()
        ->setStrategy(new $preferredStrategy())
        ->setFiles($fromFiles)
        ->getPreferredFile();

    $this->extract($preferredFile, $toDirectory);
}
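The line `new $preferredStrategy()` works because PHP can instantiate a class whose name is stored in a string. Such a string is always treated as a fully qualified name, so it must include the namespace separators, as in 'Distill\Strategy\MinimumSize'. The self-contained sketch below demonstrates this with a stand-in class; the `Demo\Strategy` namespace and its `MinimumSize` class are hypothetical and unrelated to Distill. It also shows the `::class` constant (PHP 5.5+), which yields the same string without typing the backslashes by hand.

```php
<?php

namespace Demo\Strategy;

/**
 * Stand-in strategy class for this demonstration (hypothetical, not Distill's).
 */
class MinimumSize
{
    public function getName()
    {
        return 'minimum_size';
    }
}

// A class name held in a string must be fully qualified, with backslashes.
// Single quotes keep the backslashes from being read as escape sequences.
const MINIMUM_SIZE = 'Demo\Strategy\MinimumSize';

// ::class resolves the name at compile time and produces the same string.
const MINIMUM_SIZE_VIA_CLASS = MinimumSize::class;

$className = MINIMUM_SIZE;
$strategy = new $className(); // same pattern as new $preferredStrategy()

echo get_class($strategy) . PHP_EOL; // prints Demo\Strategy\MinimumSize
```

This is also why the strategy constants in the Extractor class must keep their backslashes: a string such as 'DistillStrategyMinimumSize' names no existing class, so instantiating it fails.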

Given the array of files, Distill automatically chooses the preferred one, which is then extracted to the directory you specified. If you have followed along, your class should now look like this:

<?php

namespace SitePoint\Extractor;

use Distill\Distill;

/**
 * Class to extract archived files
 */
class Extractor
{
    /**
     * Minimum size strategy
     */
    const MINIMUM_SIZE = 'Distill\Strategy\MinimumSize';

    /**
     * Uncompression speed strategy
     */
    const UNCOMPRESSION_SPEED = 'Distill\Strategy\UncompressionSpeed';

    /**
     * Random strategy
     */
    const RANDOM = 'Distill\Strategy\Random';

    /**
     * @var Distill
     */
    private $distiller;

    /**
     * Constructor
     */
    public function __construct()
    {
        $this->distiller = new Distill();
    }

    /**
     * Extract files into directory
     *
     * @param string $fromFile
     * @param string $toDirectory
     */
    public function extract($fromFile, $toDirectory)
    {
        $this->distiller->extract($fromFile, $toDirectory);
    }

    /**
     * Choose one of the files within the array and extract it to the given directory
     *
     * @param array  $fromFiles
     * @param string $toDirectory
     * @param string $preferredStrategy
     */
    public function chooseAndExtract(array $fromFiles, $toDirectory, $preferredStrategy = self::MINIMUM_SIZE)
    {
        $preferredFile = $this->distiller
            ->getChooser()
            ->setStrategy(new $preferredStrategy())
            ->setFiles($fromFiles)
            ->getPreferredFile();

        $this->extract($preferredFile, $toDirectory);
    }
}

Let’s check that the class works correctly. Create an index.php file in the root of the project with the following content:

<?php

require_once __DIR__ . '/vendor/autoload.php';

use SitePoint\Extractor\Extractor;

$files = array(
    'files/sitepoint.zip',
    'files/sitepoint.tar.gz',
    'files/sitepoint.tar'
);

$extractor = new Extractor();
// current() returns the first element of the array here: files/sitepoint.zip
$extractor->extract(current($files), 'files/extracted/simple');
// Let Distill pick one of the archives at random and extract it
$extractor->chooseAndExtract($files, 'files/extracted/advanced', Extractor::RANDOM);

If we run php index.php in a terminal, the SitePoint logo is extracted into both files/extracted/simple and files/extracted/advanced.
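To confirm what ended up on disk, you can list the contents of the two target directories. The sketch below uses PHP's SPL directory iterators; `listExtractedFiles()` is a hypothetical helper, not part of Distill or the example repository, and the directory names are the ones used in index.php.

```php
<?php

/**
 * Hypothetical helper: returns the paths of all regular files below
 * $directory, relative to it and sorted, or an empty array if the
 * directory does not exist.
 *
 * @param string $directory
 * @return string[]
 */
function listExtractedFiles($directory)
{
    if (!is_dir($directory)) {
        return array();
    }

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );

    $files = array();
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            // Path relative to $directory, e.g. "nested/logo.png"
            $files[] = $iterator->getSubPathname();
        }
    }
    sort($files, SORT_STRING);

    return $files;
}

foreach (array('files/extracted/simple', 'files/extracted/advanced') as $target) {
    echo $target . ': ' . implode(', ', listExtractedFiles($target)) . PHP_EOL;
}
```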

Conclusion

Distill is a very specific library and might appear to lack features compared with other archive manipulation tools. But within the niche it focuses on, it excels. If you are looking for a lightweight extractor that can help you save bandwidth and/or time, Distill might be the library you are looking for. Maybe you can even combine it with a compressor to build an excellent hybrid package for your app’s archive manipulation features?
