First, a quick note from your author. When I conceived this series, its primary goal was to give myself a refresher in C programming while writing a few tutorials that would try to solve my perceived problems with most C-based curriculums (i.e., my own 1990s-era programming classes).
As I got into it, a few things became apparent:
- I never really understood C (I suspected this going in)
- Undefined behavior in the language makes it hard to teach
- Common problems being solved in the compiler makes it hard to teach
- C is both a low level language (memory management, pointers, allocs, oh my!), and not a low level language (which is hard to teach)
This last point is what made me change tack in what this series is about. When I considered the best way to teach C arrays, I wanted to start by describing them as individual variables laid out sequentially in memory, and I wanted to linger on this point. Too many programming classes start by teaching C arrays as simple lists. While this is useful, it makes the inevitable trip back to arrays as pointers more complicated and confusing than it needs to be.
So — how variables are laid out in bytes of memory. It’s critical to understanding the language. However, as I’ve waded back into real world C programming I’m finding the number of true and simple things I can say about memory layout is limited. In part because in 2018 there’s not an obvious simple through-line but also — I’m just an average C programmer at best.
So all that has kept this series at a standstill. I’m also chomping at the bit to get to the interesting-to-me stuff in PHP Internals code, so I’m putting the C primer portion of the series on hold. I don’t know if this will make the rest of this series inaccessible to folks without C experience (I hope not), but I’ve been spending a good chunk of my days in internals code and being able to talk about that is a big part of how I work.
Before internals though, there’s one last C topic I’d like to cover — and that’s C Macros.
Macros
If pointers were the worst taught thing in my college programming courses, C macros were a close second. My college C++ courses may have covered them, but macros felt like a rediscovery when I started experimenting with C-based languages again in the middle of my career.
So what’s a macro?
Let’s consider a simple PHP program.
#File: hello.php
<?php
echo "Hello NAME","\n";
Pretty simple: run the program and it’ll print the string Hello NAME
$ php hello.php
Hello NAME
Now — let’s consider the following command line invocation
$ cat hello.php | sed 's/NAME/Alan/g'
<?php
echo "Hello Alan","\n";
The cat program outputs our text to Unix’s STDOUT stream (i.e., it prints it)
$ cat hello.php
<?php
echo "Hello NAME","\n";
Then, we use a Unix pipe to send the file’s contents to the sed command. The sed command is a stream editor: it lets us change the contents of a Unix stream (in our case, change the output before it’s printed to the screen). Our invocation of sed
sed 's/NAME/Alan/g'
says “replace every occurrence of the string NAME with the string Alan”. That’s why we end up with the output we do.
$ cat hello.php | sed 's/NAME/Alan/g'
<?php
echo "Hello Alan","\n";
So far, nothing too complicated. Even if you haven’t seen sed, string replacement is a pretty common technique.
Next, consider the following two commands run in quick succession.
$ cat hello.php | sed 's/NAME/Alan/g' > /tmp/hello.php
$ php /tmp/hello.php
Hello Alan
Above, we’ve done our cat/sed thing again, but then sent the output to a file. After that, we run the file as a program, and the program outputs my name.
Put another way, the above steps perform a find-and-replace on the string NAME, and then run the output of that find-and-replace as a program.
Finally, imagine a program that looked like this.
<?php
#define NAME Alan
echo "Hello NAME","\n";
Imagine if we could run this program, but before running it PHP would look for any #define lines, use them to run find-and-replaces on the source code, and only then run the program.
This is what C macros are. Before compiling your program, a compiler will run it through the C preprocessor. The preprocessor will look for any macros defined in your source file or in any files it includes, and then run a find-and-replace on the program’s source code. Consider the following program.
#File: main.c
#include <stdio.h>
#define NAME "Alan"
int main()
{
    printf(NAME);
    printf("\n");
}
If you compile and run this program, you get the following output
$ cc main.c
$ ./a.out
Alan
We haven’t declared a variable named NAME. Instead, the C preprocessor sees our #define NAME "Alan" statement, and creates a program that looks (roughly) like this
#include <stdio.h>
#define NAME "Alan"
int main()
{
    printf("Alan");
    printf("\n");
}
This is the program that’s sent to the compiler.
One of the worst things about my college programming curriculum was the instructors called this technique “defining constants”. In an attempt to cover up the complexity of there being a two-step compilation process (first the preprocessor, then the actual compiler) the curriculum designers of my courses decided to tell us a small white lie, but never correct it.
These little white lies may have been the right choice for using C as a generic teaching language in 1994, but they were problematic because we’d encounter code that looked like this
#include <stdio.h>
#define GREETING(ARG1) "Hello " ARG1 ", how is your day?"
int main()
{
    printf(GREETING("Alan"));
    printf("\n");
}
and be suddenly confused. The #define statement looks nothing like a constant. Where is this GREETING function? I mean, wtf is going on?
Now I know that C macros have two forms. The first is a simple key/value string replacement. The second form, used above, lets you define small string replacement routines that look like a function invocation. Run the above program, and you’ll get the following output
$ cc main.c
$ ./a.out
Hello Alan, how is your day?
The White Lie
Since I spent a bit of this tutorial complaining about the white lie my instructors told me, here’s my white lie.
“Instead, the C preprocessor sees our #define NAME "Alan" statement, and creates a program that looks (roughly) like this”
The first half of the statement is true. The second half (and the code sample that followed) is less true. You can see the actual program the preprocessor creates and sends to the compiler by using the compiler’s -E flag. Give that a try with the above program and you’ll see a ton of output similar to this
# 1 "main.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
// ...
// omg make it stop
// ...
int main()
{
    printf("Hello " "Alan" ", how is your day?");
    printf("\n");
}
Where’s all this coming from? Remember this bit of boiler-plate?
#include <stdio.h>
Well, it’s the preprocessor’s job to expand this out in order to include all of stdio.h’s contents in the program that’s eventually sent to the compiler.
If we remove #include <stdio.h> and run this super simple program through the compiler’s preprocessor
#File: main.c
int main()
{
    printf("Hello World\n");
}
we still see extra output
$ clang -E main.c
# 1 "main.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 341 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.c" 2
int main()
{
    printf("Hello World\n");
}
The lines starting with # in the preprocessor output are linemarkers: special annotations that tell the compiler which source file and line number the code that follows originally came from. This allows for things like error messages that point to the actual files and lines that contain the offending code, even after everything has been expanded into one big stream. If you’re curious, the gcc manual has more information on preprocessor output, and there’s a full manual for the preprocessor. We’ve only scratched the surface here.
Why Macros
Why so much talk about macros? Other than grinding an axe with my professors from programming class past, macros are an incredibly powerful feature of the C language. Most C code bases leverage them in interesting and unique ways, and they’re one of the reasons that learning any particular C code base can feel like learning a completely new language.
The source code of PHP uses macros all over the place. As we’ll learn over the next few weeks and months, understanding macros is a key part of programming in php-src, and not preparing you for that seems a little cruel.