- What is a Container and why Docker Makes that a Difficult Question to Answer
- Running Go Programs in Docker Scratch Containers
- Five Reasons Your PHP Application is Hard to Containerize for Production
- Accessing Docker Desktop’s Virtual Machine
I’ve been using containers in development environments for a number of years, usually via Docker Desktop. At this point I have an OK understanding of how to do things, but it’s bothered me that I wasn’t sure how Docker worked. I couldn’t reconcile random facts I knew about Linux containers with the metaphors that Docker presented. I couldn’t answer a question like: “if containers aren’t virtual machines, then why do all the Dockerfiles I’m inheriting from require me to list a Linux distribution?”
I spent a bit of time recently going deep on containers and I think I have the answers I needed — or if not answers then a set of metaphors that’s closer to the truth and helps me reason about containers better.
This is a messy thinking-out-loud piece — so don’t take anything you read here at face value. It almost certainly oversimplifies things and might even get a detail or two completely wrong. It represents my best understanding of containers right now.
What is a Container?
For our purposes, a container is a program running as a process in an isolated environment on a computer that’s using a Linux operating system.
What does isolated mean? A normal Linux process can see all the other running processes on a computer and see the entire file system of the computer. The only things keeping a program from reading files it shouldn’t are Unix permissions, good behavior, and a dash of luck.
A containerized process is different. A containerized process is shut off from the rest of the computer. It can’t see other processes on the computer (unless those processes have been explicitly started in its process namespace). The only files it can see are the ones you allow it to see.
The only part of a computer the containerized process shares with other processes is the kernel of the operating system. Whenever your program needs to make a Linux system call it will be calling into the same Linux kernel as the other programs on the computer. A container does not contain a separate instance of the Linux kernel.
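Here is a minimal sketch of how you might observe this from Python on a Linux machine. It assumes /proc is mounted, and the function name namespace_ids is my own invention rather than part of any library. If you run it on a Linux host and again inside an ordinary container on that host, most of the namespace identifiers should differ while the kernel release stays the same.

```python
import os
import platform


def namespace_ids(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    Each entry under /proc/<pid>/ns is a symlink whose target names the
    namespace the process belongs to. Returns {} when the directory is
    missing (non-Linux system, /proc not mounted, or no such process).
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return ids
    for name in names:
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries we are not allowed to read.
            continue
    return ids


if __name__ == "__main__":
    # The kernel release is reported by the one shared kernel.
    print("kernel release:", platform.release())
    for name, ident in namespace_ids().items():
        print(f"{name:20} {ident}")
```

Two processes that print the same identifier for a given entry share that namespace; a containerized process will usually show different identifiers for things like pid and mnt.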
How are Containers Implemented?
How a container is implemented is beyond the scope of this article. Speaking at a high level though, version 3.8 of the Linux kernel shipped with a feature called User Namespaces. This, when combined with a Linux feature called cgroups, allows end-users to use a Linux program like lxc to create containerized processes. The lxc program uses cgroups and namespaces to implement the isolation I’ve described above.
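To make the cgroups half a little more concrete, here is a rough sketch that reads which cgroups the current process belongs to. It relies on the /proc/<pid>/cgroup file, where each line has the form hierarchy-ID:controller-list:cgroup-path. The names CgroupEntry, parse_cgroup_file, and current_cgroups are made up for this example, and the sample text in the usage section is invented, not captured from a real system.

```python
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple  # empty for a cgroup v2 entry
    path: str


def parse_cgroup_file(text):
    """Parse the contents of a /proc/<pid>/cgroup file into entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields; the path itself may contain ':'.
        hierarchy, controllers, path = line.split(":", 2)
        names = tuple(c for c in controllers.split(",") if c)
        entries.append(CgroupEntry(int(hierarchy), names, path))
    return entries


def current_cgroups():
    """Return the entries for this process, or [] if /proc is unavailable."""
    try:
        with open("/proc/self/cgroup") as f:
            return parse_cgroup_file(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    # Invented sample input, for illustration only.
    sample = "0::/user.slice/example.scope\n"
    print(parse_cgroup_file(sample))
    print(current_cgroups())
```

A container runtime places the containerized process in its own cgroup, which is where limits on memory and CPU get attached.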
If you want to, you can go pretty deep down the rabbit hole of “how isolated are containers really” or “where exactly does the kernel end and the specific Linux distribution begin”, but those are topics for another day.
Putting aside the specific motives of all the people involved in implementing, shipping, and supporting containers, one way to think about why containers were implemented is that they’re a response to virtual machines.
Virtual machines let you run multiple copies of an operating system on a single computer. In the server world, one reason you use virtual machines is that they let you take a program that already exists (like a web server) and run multiple instances of it on the same computer in order to make full use of a computer’s memory and processing power.
In theory, you shouldn’t need virtual machines to do this — a perfect program would make perfect use of a system’s resources. However, there’s no such thing as a perfect program and, more importantly, the people who need to scale programs often have a different set of skills than the people who write programs. When faced with the question “should we rewrite web servers to take advantage of modern hardware or just virtualize that web server multiple times on the same machine”, virtualization almost always wins out.
Containers are an attempt to achieve the same thing without the need to virtualize the entire operating system. With containers, you can still isolate a program and give it access to part of a computer’s resources, but you only need one instance of the operating system to do so.
Containers are a nominal Linux systems programmer looking at how virtual machines are used, crinkling their nose in disgust at the wasted resources (multiple instances of the OS and a hypervisor), and then rolling up their sleeves and building something better.
What else is a Container?
With user namespaces and cgroups landed in the Linux kernel and lxc shipping in Linux distributions, that nominal Linux programmer wiped the keyboard grime off their hands and called it a day.
In a different universe, containerized processes were slowly adopted by users and became another tool for folks using Linux to have at their disposal when they were solving problems.
In this universe, we got something else. We got Docker.
The company behind Docker saw a business opportunity in this new feature of the Linux kernel. They weren’t quite sure what that opportunity was (and their pitch has changed over the years) but they knew that the tools for working with containerized processes were not great. They wanted to share their container technology with the world and, with any luck, be the indispensable company at the center of that technology.
Docker set out to make working with containers easier, and along the way containers transformed from being a way to run isolated processes in Linux into something much more amorphous.
There are, in my mind, four key features to understand if you want to get how Docker has broadly (if amorphously) redefined what a container is in the mind of your average programmer.
- Docker’s Build Language
- Docker Hub
- Docker Networks and Volumes
- Docker Desktop
Docker’s Build Language
We’ll start with Docker’s build language. You’ll remember we said a container’s process is isolated from the rest of the computer. It can only access the files you’ve allowed it to access. In a neutral world, these might be files on the computer or might be files in a stand-alone disk image.
Docker containers can only access files on a disk image. Docker has a build language that allows you to write scripts that tell Docker which files you want on your disk image. These scripts are usually put in files named Dockerfile. When you build a Dockerfile, the result is a disk image that contains the files you want your container to have access to, including the file you want to execute as your container process, plus some metadata. Part of that metadata points at the executable, which is how Docker knows which file to run in your container. This makes a Docker container image a stand-alone artifact that you can run, assuming you have Docker installed on your computer.
Docker’s build files also have an inheritance system. You don’t need to start from scratch every time you write a Dockerfile. Using the FROM keyword, Docker’s build language allows you to start your image with the files from another image. You might have a base Apache Dockerfile and image. Then you might write a second Dockerfile that inherits from this one and adds a few additional configuration files to your base Apache image.
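As a mental model only, the “files plus metadata, with inheritance” idea can be sketched in a few lines of Python. This is a toy: the Image class and build function are hypothetical names, the file paths and contents are placeholders, and real Docker images are built from layers in a way this sketch deliberately ignores.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    """Toy stand-in for a built image: a bag of files plus metadata."""
    files: dict = field(default_factory=dict)  # path -> contents
    cmd: tuple = ()                            # what to execute at start


def build(base: Optional[Image], add_files: dict,
          cmd: Optional[tuple] = None) -> Image:
    """Loosely mimic FROM base, then adding files, then an optional command."""
    files = dict(base.files) if base else {}   # start from the parent's files
    files.update(add_files)                    # new files shadow inherited ones
    inherited_cmd = base.cmd if base else ()
    return Image(files=files, cmd=cmd if cmd is not None else inherited_cmd)


# Placeholder paths and contents, for illustration only.
apache = build(
    None,
    {"/opt/apache/bin/httpd": "<binary>",
     "/opt/apache/conf/httpd.conf": "default config"},
    cmd=("/opt/apache/bin/httpd",),
)
site = build(apache, {"/opt/apache/conf/httpd.conf": "my config"})
```

The second image ends up with the base image’s executable and start command, plus its own configuration file, and the base image itself is left unchanged.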
This inheritance system inevitably leads to people wanting and needing to share Dockerfiles, which inevitably leads to network effects and the next pillar of Docker: Docker Hub.
Docker Hub
Docker Hub is Docker’s centralized repository for sharing Docker images. With Docker Hub, if you don’t want to write a Docker image for nginx from scratch, you don’t have to. There’s an image for it already. When you run
% docker run -d nginx
You’re telling Docker “go to Docker Hub, download the nginx image, and then run it”.
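The short name nginx works because Docker fills in defaults: the docker.io registry, the library namespace used for official images, and the latest tag. The function below is a simplified sketch of that expansion, not Docker’s actual parser. The name normalize_image is hypothetical, and digest references (the @sha256:... form) are not handled.

```python
def normalize_image(ref):
    """Expand a short image reference into registry/path:tag form.

    Simplified sketch: ignores digests and does no validation.
    """
    # A ':' that appears after the last '/' separates the name from the tag.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]
    else:
        name, tag = ref, "latest"

    # The first path component is treated as a registry host only if it
    # looks like one: it contains a '.' or a ':' or is exactly 'localhost'.
    parts = name.split("/", 1)
    first = parts[0]
    if len(parts) == 2 and ("." in first or ":" in first or first == "localhost"):
        registry, path = parts
    else:
        registry, path = "docker.io", name
        if "/" not in path:
            path = "library/" + path  # official images live under library/
    return f"{registry}/{path}:{tag}"


if __name__ == "__main__":
    for short in ("nginx", "alpine:3.19", "localhost:5000/app"):
        print(short, "->", normalize_image(short))
```

Seen this way, an image name carries its registry with it, and Docker Hub is simply the registry you get when you leave that part off.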
Docker Hub opens container use to a whole cohort of users who probably don’t understand what, exactly, Docker is doing. Instead of users asking their computer to run a program, they’re asking docker to run a program.
Docker’s centralized hub also gives Docker Inc. influence over how people will use the platform. Because Docker wanted a lot of users, they started populating Docker Hub with distributions of popular software for people to run directly, or for people to use in the FROM lines of a Dockerfile.
Because they wanted to support as much software as possible as quickly as possible, they built base Docker images for a number of popular Linux distributions. This leads to the very odd case of running programs that rely on all the “userland” files from Alpine Linux (configuration files, shared libraries, etc.) but that are actually running on a different Linux distribution with a different kernel.
It also leads to people thinking of Docker (and containers more generally) as a virtualization technology. It’s easy to think “Hey I’m on Ubuntu Linux but I can run Alpine Linux inside of it”, when in fact you’re running a program that uses files from Alpine Linux but with an Ubuntu kernel.
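One way to see the split is to ask two different questions from inside a container: which distribution do the files claim to be, and which kernel is actually answering system calls? The sketch below reads the distribution name from /etc/os-release (a plain file, so it comes from the image) and the kernel release from the running kernel. The function names are my own, and the file may not exist on every system, in which case the sketch reports “unknown”.

```python
import platform


def parse_os_release(text):
    """Parse KEY=value lines in the style of /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def userland_and_kernel(path="/etc/os-release"):
    """Return (distribution name from the file system, kernel release)."""
    try:
        with open(path) as f:
            distro = parse_os_release(f.read()).get("PRETTY_NAME", "unknown")
    except OSError:
        distro = "unknown"
    return distro, platform.release()


if __name__ == "__main__":
    distro, kernel = userland_and_kernel()
    print("userland files say:", distro)
    print("kernel says:      ", kernel)
```

Inside an Alpine-based container on an Ubuntu host, I would expect the first line to name Alpine while the second reports the host’s kernel release.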
Docker Volumes and Networking
Docker also gives its containers features for mounting volumes and for networking between containers and the operating system the containers are running on. I haven’t dug too deeply into these features beyond “hey mount this folder” or “hey let me in on this port (no wait reverse the ports)”. For our purposes, they’re another layer that makes containers present more like virtual machines than isolated Linux processes.
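On the “no wait reverse the ports” problem: the value passed to docker run’s -p flag puts the host side first and the container side second, optionally preceded by a host IP and followed by a protocol. Here is a small sketch that spells out that order. PortMapping and parse_publish are hypothetical names, and only the two most common forms are handled (no port ranges, no IPv6 addresses).

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_ip: str         # empty string when no address was given
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec):
    """Parse HOST:CONTAINER or IP:HOST:CONTAINER, with an optional /proto."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip = ""
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    # tcp is the protocol used when none is specified.
    return PortMapping(host_ip, int(host_port), int(container_port),
                       protocol or "tcp")


if __name__ == "__main__":
    # Host port 8080 forwards to port 80 inside the container.
    print(parse_publish("8080:80"))
```

Reading the mapping as “outside, then inside” is the mnemonic that finally stuck for me.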
Docker Desktop
The last big innovation Docker brings to the table is Docker Desktop. Containers are a Linux technology. They don’t run on Windows. They don’t run on macOS. They don’t run on the BSD Unixes. They rely on two features of the Linux kernel, cgroups and namespaces, that don’t exist in other operating systems. (Yes, there have been container-like features on other operating systems, and FreeBSD jails come to mind, but they’re a completely different technology. A different topic for a different day.)
This presents a problem for a company like Docker, since there are lots of potential Docker users who spend their days working on Windows and Mac machines. Docker Desktop tries to solve this “Linux Only” problem.
Docker Desktop gives Mac and Windows users a docker CLI command that appears to work the same as the one on a Linux system. You can build containers. You can run containers. You can see a list of running containers, and so on.
Behind the scenes, Docker Desktop is running a lightweight Linux virtual machine and then running the user’s container processes in that Linux virtual machine. Docker Desktop also throws in some networking magic so that network ports are forwarded correctly to the virtual machine and so that standard input, output, and error streams flow between the VM and your actual computer. If you’re the curious type it’s (apparently?) possible to get a shell into this virtual machine and poke around at what it’s doing.
Once again, while the underlying technology remains “Linux namespaces and cgroups”, the experience of using Docker Desktop feels more like a different sort of virtual machine. “I can run Ubuntu on my Mac without VirtualBox!”
So that’s where my understanding of containers sits these days. With the above working model, I’ve been a lot better at debugging container issues, answering questions, and participating in conversations with DevOps folks.
There’s still a lot I don’t understand. Docker’s image format remains a bit of a mystery to me: how much of it is proprietary, and where does the Open Container Initiative’s (OCI) format fit in? Also, Docker’s build system uses some sort of layering I don’t quite get yet. It’s also not clear if docker build builds everything from scratch or if it is relying on Docker Hub’s prebuilt images. Also, what does it take to run your own repository of Docker images? How are namespace conflicts solved when there are multiple repositories in play?
All questions for another day — there may have been a point in time when someone could know all of this before getting to work, but if there’s one truism in modern software engineering it’s that you’re charging ahead into a deep fog and responding the best you can in the moment.