First, some business. It’s worth saying out loud that the Just Enough C for PHP Series is on hiatus. The idea for this series came from my starting a gig that involved a PHP extension. As I refreshed my C knowledge and learned the PHP Internals APIs, my hope was this series could become a record of best practices and a guidebook for other programmers.
Unfortunately, reality intervened.
Problem 1: It turns out jobs are a lot of work! Who knew? Finding time to write and research is hard, and I’m increasingly picky about where I spend my free time.
Problem 2: I’m, at best, an average C programmer, and C is an incredibly hostile environment to program in. Compiler differences, undefined language behavior, the cultural willingness to solve application problems in the compiler, small edge cases that can cause a crash or a privilege escalation: multi-platform C is a tough road to travel and can take decades of real-world experience to master. I can do a serviceable job contributing to a C-based project, but teaching it the way I’d like to? I’m still cutting myself on C’s sharp edges. I’m in no position to teach folks how to avoid those edges with any sort of authority or accuracy.
Problem 3: While I’ve met some individual C programmers who are great and helpful people (hello team!), the culture of C programmers, especially online, is an ugly place. Knowledge hoarding, “just figure it out, newb”, “why would you even want to do that you idiot”, “be kind to the computer, not the user”, and so on. I’ve got a pretty thick skin after so many years online, but getting stabbed is no fun, and surrounding yourself with an abusive culture will ultimately turn you into an abuser or the abused. Why submit yourself to that when there are other options?
Problem 4: PHP, especially the internals, is a bit of a mess. (open source, it’s our mess, etc.) I still believe PHP remains, and will remain, a great leveler for anyone to get involved in programming, development, and software engineering. But it starts to wear you down after a while. As so many folks in my cohort have moved on to other platforms and languages, it’s harder and harder to muster interest in this series.
I still plan on writing the occasional article about C or PHP’s internals — but it’s time to close the door on this series.
Today’s article? A brief survey of what PHP even is, and a case that there’s no such thing as PHP. Exploring this outline in greater depth was where the series was headed, and there are at least half a dozen starting points in here for anyone who wants to go deep on PHP’s internals.
What are We Compiling?
PHP is a pile of C code. Usually when you have a pile of C code, you’re building a specific executable program. For example, if you compile the pile of C code that makes up SQLite, you get a sqlite3 binary on your computer.
However, that’s not what PHP’s pile of C code is for. Instead, one important job PHP’s source does is provide functions like php_execute_script to other programs. From this point of view, PHP is ultimately a library for embedding a “PHP code runner” in other C programs or modules.
PHP’s source ships with seven such programs, called SAPIs. SAPI stands for Server Application Programming Interface. PHP’s infamous Apache module, mod_php? That’s a SAPI.
However, regardless of the acronym you slap on them, these are just C programs. C programs that have a PHP engine embedded in them. That mod_php “SAPI”? It’s just an Apache module, compiled (in part) with Apache’s extension tool, apxs. A module that registers a few Apache hooks that will look for PHP programs to run, and run them.
If you’re a little more modern and using the PHP-FPM SAPI? That’s just another C program, one that’s more stand-alone than the Apache module. The PHP-FPM program implements a FastCGI daemon that will start and stop other PHP processes. Even so, something else still needs to request that it run a program, and that something else is almost always a web server (nginx, Apache, etc.).
What about PHP’s command line tool? People smarter than me have described this as a “fake” server, and that’s true enough. However, it’s equally true to see it as just another C program, one that stubs out enough of the server information the engine expects to be there and obeys PHP’s MINIT and RINIT lifecycle implementation.
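If the MINIT and RINIT names are new to you, here is a toy sketch of the idea in plain C. It assumes the usual description of the lifecycle: module init runs once per process, request init runs once per request, and each has a matching shutdown step. None of these names come from PHP’s source. toy_engine, toy_host_serve, and friends are invented for illustration, and the real engine does far more at each step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for engine state; NOT a PHP internals struct. */
typedef struct {
    int module_inits;       /* MINIT-style calls: once per process  */
    int request_inits;      /* RINIT-style calls: once per request  */
    int scripts_run;        /* where a script would actually execute */
    int request_shutdowns;  /* RSHUTDOWN-style calls                 */
    int module_shutdowns;   /* MSHUTDOWN-style calls                 */
} toy_engine;

static void toy_minit(toy_engine *e)     { e->module_inits++; }
static void toy_rinit(toy_engine *e)     { e->request_inits++; }
static void toy_execute(toy_engine *e)   { e->scripts_run++; }
static void toy_rshutdown(toy_engine *e) { e->request_shutdowns++; }
static void toy_mshutdown(toy_engine *e) { e->module_shutdowns++; }

/* The host program (the "SAPI" role) decides when each phase happens. */
void toy_host_serve(toy_engine *e, int request_count)
{
    assert(e != NULL);

    toy_minit(e);                       /* process starts up        */
    for (int i = 0; i < request_count; i++) {
        toy_rinit(e);                   /* a request arrives        */
        toy_execute(e);                 /* the script runs          */
        toy_rshutdown(e);               /* per-request cleanup      */
    }
    toy_mshutdown(e);                   /* process shuts down       */
}
```

Roughly speaking, a command line style host would call toy_host_serve with a request count of one, while a long-running server style host would loop many times between a single startup and shutdown. That difference in who drives the loop is most of what separates one SAPI from another in this simplified picture.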
PHP also ships with C programs that implement PHP for traditional CGI, the LiteSpeed web server, a debugging program I don’t completely understand, and (at the risk of getting recursive) a SAPI that’s not a full C program, but one that’s meant to make it easier to embed PHP in C programs (presumably with less stubbing out of server information). You can see all these programs in the sapi folder of the PHP source.
Out of the gate this means there’s no one true PHP. There are seven different binaries (or seven different programs), some of which require another C program in order to get PHP to actually execute code. Ideally PHP programmers should be able to remain indifferent to the SAPI they’re using, but in practice there are subtle differences in both running and scaling PHP code that are SAPI dependent.
So which of these programs is PHP? All of them? Or does a singular thing we can point to as PHP not exist?
Core vs. Extensions
So that’s core PHP: the code that parses and executes PHP programs, allocates memory when you create a PHP variable, etc. But what about less internal stuff? Stuff like “parse an XML file”, “make a database query”, etc.
This is still C code — but it’s not part of core PHP. Instead, core PHP implements an extension system (also sometimes called a module system) for loading other people’s C code into the PHP memory space (for people who know better, apologies for the broad terminology).
As of this writing PHP ships with seventy-four individual extensions, and the PECL repository provides another 300 to 400. In addition to that, anyone who manages to piece together the requirements of the extension API can write their own extension.
These extensions create “internal” PHP functions users can call, like json_decode or simplexml_load_file, or internal classes like DOMDocument. Very often programmers use these extensions to provide a thin shim/glue layer into C libraries that already exist. This is one of the things people are talking about when they call PHP a glue language.
These extensions are compiled differently than the core of PHP. And just to keep things interesting? These extensions may be stand-alone shared object (.so) files which PHP will load, or they may also be compiled directly into the binary of the particular PHP SAPI you’re building. That is, sometimes the JSON functions may be a part of the mod_php, PHP-FPM, CLI, etc. binaries, and other times the JSON library may be a stand-alone json.so you’ll need to enable via a php.ini file.
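To make the “compiled in vs. stand-alone” distinction concrete, here is a toy model in plain C. It is only a sketch: the toy_build struct and toy_find_extension function are hypothetical names, not anything from PHP’s source, and the ini list simply stands in for the extensions a php.ini file asks PHP to load. The real engine does actual dynamic loading, which this skips entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where a given extension comes from in one particular build. */
typedef enum {
    TOY_EXT_MISSING,   /* not available in this build at all        */
    TOY_EXT_BUILTIN,   /* compiled directly into the SAPI binary    */
    TOY_EXT_SHARED     /* stand-alone .so, enabled via configuration */
} toy_ext_origin;

/* Hypothetical description of one vendor's build of PHP. */
typedef struct {
    const char *const *builtin;  /* names compiled into the binary  */
    size_t builtin_count;
    const char *const *ini;      /* names enabled via an ini file   */
    size_t ini_count;
} toy_build;

static int toy_list_has(const char *const *list, size_t count,
                        const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Report how (or whether) this build provides the named extension. */
toy_ext_origin toy_find_extension(const toy_build *build, const char *name)
{
    assert(build != NULL && name != NULL);

    if (toy_list_has(build->builtin, build->builtin_count, name)) {
        return TOY_EXT_BUILTIN;
    }
    if (toy_list_has(build->ini, build->ini_count, name)) {
        return TOY_EXT_SHARED;
    }
    return TOY_EXT_MISSING;
}
```

Two builds can both “have JSON” and still give different answers here: one lists json in its builtin names, the other only in its ini names. From PHP userland both look the same, but from the point of view of C code running inside the process they are different arrangements.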
For internals programming, this can lead to convoluted situations when you’re trying to think about the execution space of your C code. For example, if your PHP SAPI is mod_php, that means PHP’s code is running in the context of an Apache module, but may also be a PHP module. Process, memory, program, and library boundaries are hard enough to keep track of in C, and PHP’s system doesn’t make this any easier.
The idea of one true PHP continues to elude us.
Who Builds and Distributes PHP?
So far, this is all standard C software engineering gripes. As I’ve reacquainted myself with C-based projects, I’ve found most are temples of pragmatic decisions rarely revisited. Where the idea of PHP existing as one true thing really falls apart is when we consider distribution.
There’s no one true distribution of PHP. Anyone can build their own version of PHP, and there don’t seem to be strong standards around what this should look like.
This theoretical anyone might build the exact C programs (SAPIs) that ship with the PHP source, or they might tweak them for their own purposes. They might build every internal extension as a stand-alone .so, or might compile them all into the single SAPI they’re using. Usually it’s some combination of the two. No one seems to use the same combination.
Maybe you build PHP yourself — but chances are if you’re writing PHP code you’re using a distribution someone else put together. There’s a lot of someone elses.
Apple builds and distributes a version of PHP with their macOS (formerly OS X) operating system. Of course, the Apple-provided version doesn’t have all the extensions people are looking for, so people might turn to the liip builds (formerly entropy.ch builds), or perhaps the builds provided by the Homebrew project. If they’re less comfortable on the command line they might turn to a commercial all-in-one vendor like MAMP, which not only compiles PHP for you, but also streamlines configuring PHP to work with a web server.
Microsoft doesn’t ship a version of PHP with its OS, but the all-in-one market exists on the Redmond side as well, with projects like WAMP or XAMPP.
So that’s seven different vendors so far, each with its own idea of which extensions should be a part of PHP, and their own opinions about mod_php vs. PHP-FPM, and whether a command line should be built in.
Leaving commercial operating systems behind, we have our Linux vendors. Linux vendors usually manage a list of software via their package managers, and if you have a computer with Linux installed, chances are you can use apt, yum, and friends to install a version of PHP that was packaged by the Linux vendor (Debian, Ubuntu, Red Hat, etc.). Again, each individual vendor will have its own idea about how PHP should be built and what extensions should be compiled in vs. what should be stand-alone .so add-ons.
Oh, we forgot to mention! Another thing all these vendors need to do: If they don’t compile in an extension, they need to decide how you’re going to get that PHP extension on your machine. Should you use PHP’s extension manager, pecl? Or will your Linux distribution’s package manager include add-on packages that install individual extensions for you?
A second thing we forgot to mention about the packages: will they configure your web servers to point at the SAPIs for you? Or will you need to figure that out for yourself? Or maybe it’s a mix where they will configure mod_php but not PHP-FPM, or maybe just nginx for PHP-FPM, but not Apache. Or maybe this is the job of the web server packagers.
Also, those Linux vendors? They tend to be various levels of conservative with the versions of PHP they ship and promise to support. Install a version of Amazon Linux 1 via Amazon EC2 (and let’s just sidestep the whole topic of cloud vendors providing builds of open source systems that are tweaked from what the vendor provides, because madness) and you may find yourself running PHP 5.3.
This is where folks like Ondřej Surý enter the story. Ondřej maintains his own Debian and Red Hat repositories with a packaged version of modern PHP. Just some random (albeit smart and open-source-kind) human providing repositories that end up becoming de facto official repositories when all the bloggers write their tutorials. It’s also fun times when you’re using these repos and your Linux provider also has some version of PHP installed with a different set of packages.
Oh, there’s also Docker, the company. The PHP containers they maintain for both Debian and Alpine Linux feature custom built and packaged versions of PHP and some custom helper scripts (docker-php-ext-configure, docker-php-ext-install, and docker-php-ext-enable) for getting extensions into those containers. Another custom build with another set of trade-offs.
I’m leaving out other projects, both big and small. Even with this limited sample, it seems clear to me that there’s no such thing as PHP — there’s just a loose collection of norms, expectations, and software in various stages of neglect.
Wrap Up
Despite all this — for 20+ years everything has sort-of-worked. It’s either a testament to the power of open source or a testament to chaos theory. Or both.
But the next time you inherit some rescue project and you’re starting to believe that stressed-out-belittling-entrepreneur-boss who can’t understand why you need a day or two to figure out what’s going on — just ignore them and Do Your Job™ because you actually do know how to do it.
PHP sits on a complex desert of ever-shifting sands, and no two systems are the same. As Docker becomes increasingly important and the PHP core project sets out to blow up the world again with PHP 8, it’s only going to get more complex.