Throughout this series we’ve skirted around one important issue: the opaque nature of PHP’s phar format.
Many groups claim the term “open source” in some way or another. There’s open source the license, which dictates how and where code can be used and shared. Then there’s open source as in “you can see and read the literal source code”, which may or may not come with an open-source license. Many (of the better) commercial Magento extensions fall into the latter category: they’re distributed as plain text files, but the extension vendor still asserts copyright over the files.
While n98-magerun is open source both in its licensing and its literal source code, PHP’s phar format muddies the waters a bit. A phar is, by default, not readable source code.
Fortunately, it’s relatively easy to peek inside any phar and see the source files it was created from, which is what we’ll show you in today’s article.
What is a phar?
A phar is a PHP Archive file. If you’re familiar with the Java programming language, phar files are (were?) an attempt to bring Java’s jar concept into PHP. Or, as the official documentation puts it:
What is phar? Phar archives are best characterized as a convenient way to group several files into a single file. As such, a phar archive provides a way to distribute a complete PHP application in a single file and run it from that file without the need to extract it to disk. Additionally, phar archives can be executed by PHP as easily as any other file, both on the commandline and from a web server. Phar is kind of like a thumb drive for PHP applications.
That’s all nice conceptually, but what does it mean? If you open up the n98-magerun.phar file with a text editor, you’ll see something like this:
#!/usr/bin/env php
<?php
Phar::mapPhar('n98-magerun.phar');
$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();
__HALT_COMPILER(); ?>
[gobs of binary-ish looking data]
So, there’s some standard PHP near the top, but then it’s a jumble of mixed binary and ASCII data all the way down. What are we looking at?
Internally, a phar archive stores its files in one of three container formats: PHP’s own phar format, zip, or tar. In addition to that, the entire archive, as well as individual files in the archive, may be compressed with gzip or bzip2. That’s what all the partially muddled lines are: an archive of every file in the phar.
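To see where the readable PHP stops and the archived data starts, you can split the file at the __HALT_COMPILER(); call. What follows is only a sketch. The helper name splitAtHaltCompiler is made up for this example, the path is a placeholder, and it assumes the layout shown above, where the PHP code sits at the very top of the file.

```php
<?php
// Split raw phar bytes into the readable PHP at the top and everything after it.
// splitAtHaltCompiler is a hypothetical helper, not part of PHP or n98-magerun.
// Returns [stub, payload]; payload is '' when the halt call is not found.
function splitAtHaltCompiler(string $bytes): array
{
    $token = '__HALT_COMPILER();';
    $pos = strpos($bytes, $token);
    if ($pos === false) {
        return [$bytes, ''];
    }
    $end = $pos + strlen($token);
    return [substr($bytes, 0, $end), (string) substr($bytes, $end)];
}

$path = '/path/to/n98-magerun.phar'; // placeholder path
if (is_file($path)) {
    [$stub, $payload] = splitAtHaltCompiler(file_get_contents($path));
    echo $stub, "\n";
    printf("readable PHP: %d bytes, archived data: %d bytes\n", strlen($stub), strlen($payload));
}
```

The byte counts make the proportions obvious: a few lines of PHP, followed by a much larger block of archived files.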
So what about the stuff near the top?
#!/usr/bin/env php
<?php
Phar::mapPhar('n98-magerun.phar');
$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();
__HALT_COMPILER();
This is the archive’s stub file. Unlike Java and its jars, PHP has no history or concept of a class having a main function to run. If you want your phar to be an executable program (as opposed to a portable library), you need to use a stub file.
The phar stub file is a small piece of PHP code used to initialize your application. The n98-magerun stub file can be found in _cli_stub.php.
#File: _cli_stub.php
#!/usr/bin/env php
<?php
Phar::mapPhar('n98-magerun.phar');
$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();
__HALT_COMPILER();
As you can see, it’s identical to what ends up in the n98-magerun.phar file. The mechanics of phar creation are beyond the scope of this article, but if you want to get started with your own archives, take a look at the n98-magerun phing build.xml file, particularly the pharpackage task.
Opening up a phar
Having looked inside the n98-magerun.phar file, we now know that a phar is a tiny bit of PHP code (the stub), combined with a messy binary-ish blob. Depending on your particular security needs, trusting the code that’s in that binary-ish blob may or may not fly in your organization.
Fortunately, this isn’t unreadable executable machine code. As we mentioned earlier, a phar is just an archive of files, much like a zip or tar archive. As with those archive formats, the files from a phar can be easily extracted. The code to do so is built right into PHP. Just create a simple CLI script with the following content:
#File: unphar.php
<?php
$phar = new Phar('/path/to/n98-magerun.phar');
$phar->extractTo('/path/to/extract');
This instantiates a Phar object (PHP’s internal representation of a phar archive), and then uses its extractTo method to extract the files to a specific folder. If you run the above script, and then take a look at the folder you extracted to
$ ls -l /path/to/extract
-rw-rw-rw- 1 username staff 1085 Jun 11 14:27 MIT-LICENSE.txt
-rw-rw-rw- 1 username staff 3807 Jun 11 14:27 config.yaml
drwxr-xr-x 6 username staff 204 Jun 11 14:27 res
drwxr-xr-x 4 username staff 136 Jun 11 14:27 src
drwxr-xr-x 9 username staff 306 Jun 11 14:27 vendor
you’ll see a directory structure that contains all the files that were included in the phar.
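The extractTo method also takes two optional arguments: which entries to extract, and whether to overwrite files that already exist. Here’s a sketch that pulls out a single file for a quick look. The helper name extractOne is hypothetical, and both paths are placeholders.

```php
<?php
// Extract a single entry from a phar into $targetDir and return where it landed.
// extractOne is a hypothetical helper, not part of PHP or n98-magerun.
function extractOne(string $pharPath, string $entry, string $targetDir): string
{
    $phar = new Phar($pharPath);
    // Second argument: the entry (or an array of entries) to extract.
    // Third argument: true overwrites a copy left behind by an earlier run.
    $phar->extractTo($targetDir, $entry, true);
    return rtrim($targetDir, '/') . '/' . $entry;
}

$path = '/path/to/n98-magerun.phar'; // placeholder path
if (is_file($path)) {
    echo extractOne($path, 'MIT-LICENSE.txt', '/path/to/extract'), "\n";
}
```

Passing null as the second argument extracts everything, which is what the two-line script above does implicitly.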
Decompressing
There’s one last step you may need to take before you can start examining the contents of your phar archive. If you use cat to view the contents of a file, you’ll be in for a surprise.
$ cat /path/to/extract/config.yaml
????P?Ј ZL7?{?R???I?2f?>??<?y?-?3d?h?߂CT.?A~??+?T4v?ѐ44h?9?d?a0L!?# ?h?
...
For some reason, when PHP unarchives n98-magerun.phar, it fails to decompress the files. You can confirm this with OS X’s file command, which identifies a file’s type by examining its contents.
$ cd /path/to/extract/
$ file config.yaml
config.yaml: bzip2 compressed data, block size = 400k
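The archive itself can report the same thing from PHP, before anything is extracted. Each entry in a Phar object is a PharFileInfo, and its isCompressed method tells you whether that entry is stored with gzip or bzip2. This is a sketch with a made-up helper name (compressionLabel) and a placeholder path.

```php
<?php
// Name the compression applied to one archive entry.
// compressionLabel is a hypothetical helper, not part of PHP or n98-magerun.
function compressionLabel(PharFileInfo $entry): string
{
    if ($entry->isCompressed(Phar::BZ2)) {
        return 'bzip2';
    }
    if ($entry->isCompressed(Phar::GZ)) {
        return 'gzip';
    }
    return 'none';
}

$path = '/path/to/n98-magerun.phar'; // placeholder path
if (is_file($path)) {
    // Phar extends RecursiveDirectoryIterator, so it can be walked like a directory tree.
    foreach (new RecursiveIteratorIterator(new Phar($path)) as $entry) {
        echo compressionLabel($entry), "\t", $entry->getPathname(), "\n";
    }
}
```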
I’ve done (and outsourced) some detective work on why the extracted files stay compressed, but haven’t been able to get to the bottom of it. The Phar class has a decompressFiles method. My assumption is that this will decompress the files in the archive. However, calling it failed with the following error:
PHP Fatal error: Uncaught exception 'BadMethodCallException' with message
'unable to write contents of file
"vendor/fzaninotto/faker/src/Faker/Provider/sr_Latn_RS/Person.php" to new phar
If anyone’s successfully un-phared and decompressed these files strictly via PHP, please let me know. In the meantime, you’ll want to decompress each individual file manually before performing your code review. I did it by writing this quick PHP command line script:
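#File: unbzphar-file.php
<?php
namespace PulsestormCliUnbzphar;
function main($argv)
{
    // The first argument is the script name; the rest are files to decompress.
    $script = array_shift($argv);
    echo "starting loop\n";
    foreach($argv as $file)
    {
        echo "decompressing file $file with bunzip2 --stdout\n";
        // escapeshellarg() protects paths containing spaces or shell metacharacters.
        $contents = shell_exec('bunzip2 --stdout ' . escapeshellarg($file));
        // shell_exec() returns null on failure or empty output; skip the file
        // rather than overwrite it with nothing.
        if ($contents === null || $contents === false)
        {
            echo "skipping $file\n";
            continue;
        }
        file_put_contents($file, $contents);
    }
    echo "ending loop\n";
}
main($argv);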
and then using find and xargs to run it on every file in the extracted archive folder
$ find /path/to/extract -type f | xargs php /path/to/unbzphar-file.php
The unbzphar-file.php script runs each passed-in filename through bunzip2 --stdout (which decompresses it), and then immediately rewrites the file with the returned content.
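If you’d rather not shell out to bunzip2, PHP’s bz2 extension provides bzdecompress, which does the same job in-process. The sketch below assumes that extension is installed, uses a made-up helper name (decompressIfBzip2), and walks a placeholder directory itself instead of relying on find and xargs. Files that don’t start with the bzip2 signature are left alone.

```php
<?php
// Decompress $bytes if they look like bzip2 data; otherwise return them unchanged.
// decompressIfBzip2 is a hypothetical helper; bzdecompress() comes from the bz2 extension.
function decompressIfBzip2(string $bytes): string
{
    // bzip2 streams start with the signature bytes "BZh".
    if (strncmp($bytes, 'BZh', 3) !== 0) {
        return $bytes;
    }
    $result = bzdecompress($bytes);
    // On failure bzdecompress() returns an error number or false rather than a string.
    return is_string($result) ? $result : $bytes;
}

$root = '/path/to/extract'; // placeholder path
if (is_dir($root)) {
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile()) {
            $path = $file->getPathname();
            file_put_contents($path, decompressIfBzip2(file_get_contents($path)));
        }
    }
}
```

Checking the signature first makes the script safe to run twice, since already-decompressed files pass through untouched.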
The find /path/to/extract -type f command finds everything in a directory that’s a file (as opposed to a sub-directory).
Then, we pipe the output of this command through xargs. The xargs program reads an (almost) unlimited number of items from standard input, and then passes them as individual arguments to a specific unix command. (Items are split on whitespace, so paths containing spaces need find’s -print0 option paired with xargs -0.) In our case, that specific unix command is
php /path/to/unbzphar-file.php
and voilà! All our files are decompressed.
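If you only want to read a file or two, you may not need to extract anything. PHP’s phar:// stream wrapper can read an entry directly from the archive, and it’s meant to handle compressed entries on the fly. I haven’t confirmed that it sidesteps the problem above for this particular archive, so treat this as a sketch. The helper name readPharEntry is made up, and the path is a placeholder.

```php
<?php
// Read one entry straight out of an archive through the phar:// stream wrapper.
// readPharEntry is a hypothetical helper, not part of PHP or n98-magerun.
function readPharEntry(string $pharPath, string $entry): string
{
    $contents = file_get_contents('phar://' . $pharPath . '/' . ltrim($entry, '/'));
    if ($contents === false) {
        throw new RuntimeException("Could not read $entry from $pharPath");
    }
    return $contents;
}

$path = '/path/to/n98-magerun.phar'; // placeholder path
if (is_file($path)) {
    echo readPharEntry($path, 'config.yaml');
}
```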
You’re now ready to fully examine any phar’s contents, and ensure you know exactly what you’re getting into.