Revisiting Using Docker

May 16, 2018 at 01:45 PM | categories: Docker, Mozilla

When Docker was taking off like wildfire in 2013, I was caught up in the excitement like everyone else. I remember knowing of the existence of LXC and container technologies in Linux at the time. But Docker seemed to be the first open source tool to actually make that technology usable (a terrific example of how user experience matters).

At Mozilla, Docker was adopted all around me and by me for various utilities. Taskcluster - Mozilla's task execution framework geared for running complex CI systems - adopted Docker as a mechanism to run processes in self-contained images. Various groups in Mozilla adopted Docker for running services in production. I adopted Docker for integration testing of complex systems.

Having seen various groups use Docker and having spent a lot of time in the trenches battling technical problems, my conclusion is Docker is unsuitable as a general purpose container runtime. Instead, Docker has its niche for hosting complex network services. Other uses of Docker should be highly scrutinized and potentially discouraged.

When Docker first hit the market, it was arguably the only game in town. Using Docker to achieve containerization was defensible because there weren't exactly many (any?) practical alternatives. So if you wanted to use containers, you used Docker.

Fast forward a few years. We now have the Open Container Initiative (OCI). There are specifications describing common container formats. So you can produce a container once and take it to any number of OCI-compatible container runtimes for execution. And in 2018, there are a ton of players in this space. runc, rkt, and gVisor are just some. So Docker is no longer the only viable tool for executing a container. If you are just getting started with the container space, you would be wise to research the available options and their pros and cons.

When you look at all the options for running containers in 2018, I think it is obvious that Docker - usable though it may be - is not ideal for a significant number of container use cases. If you divide use cases into a spectrum where one end is run a process in a sandbox and the other is run a complex system of orchestrated services in production, Docker appears to be focusing on the latter. Take it from Docker themselves:

Docker is the company driving the container movement and the only container platform provider to address every application across the hybrid cloud. Today's businesses are under pressure to digitally transform but are constrained by existing applications and infrastructure while rationalizing an increasingly diverse portfolio of clouds, datacenters and application architectures. Docker enables true independence between applications and infrastructure and developers and IT ops to unlock their potential and creates a model for better collaboration and innovation.

That description of Docker (the company) does a pretty good job of describing what Docker (the technology) has become: a constellation of software components providing the underbelly for managing complex applications in complex infrastructures. That's pretty far detached on the spectrum from run a process in a sandbox.

Just because Docker (the company) is focused on a complex space doesn't mean they are incapable of excelling at solving simple problems. However, I believe that in this particular case, the complexity of what Docker (the company) is focusing on has inhibited its products from adequately addressing simple problems.

Let's dive into some technical specifics.

At its most primitive, Docker is a glorified tool to run a process in a sandbox. On Linux, this is accomplished by using the clone(2) system call with specific flags, combined with various other techniques (filesystem remounting, capabilities, cgroups, chroot, seccomp, etc.) to sandbox the process from the main operating system environment and kernel. There are a host of tools living at this not-quite-containers level that make it easy to run a sandboxed process. The bubblewrap tool is one of them.

Strictly speaking, you don't need anything fancy to create a process sandbox: just an executable you want to invoke and an executable that makes a set of system calls (like bubblewrap) to run that executable.
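For illustration, here's a rough sketch of what that looks like with bubblewrap (the exact bind mounts depend on your distribution's filesystem layout):

    # Run /bin/ls inside a sandbox: the host's /usr is bind-mounted
    # read-only, fresh /proc, /dev, and /tmp are provided, and new PID
    # and network namespaces are created. No daemon, no image management.
    bwrap \
        --ro-bind /usr /usr \
        --symlink usr/bin /bin \
        --symlink usr/lib /lib \
        --symlink usr/lib64 /lib64 \
        --proc /proc \
        --dev /dev \
        --tmpfs /tmp \
        --unshare-pid \
        --unshare-net \
        /bin/ls /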

When you install Docker on a machine, it starts a daemon running as root. That daemon listens for HTTP requests on a network port and/or UNIX socket. When you run docker run from the command line, that command establishes a connection to the Docker daemon and sends any number of HTTP requests to instruct the daemon to take actions.
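To illustrate, the docker CLI is essentially just an HTTP client: you can issue the same kinds of requests yourself against the daemon's UNIX socket (a sketch using Docker Engine API endpoints; paths and versioning may vary by Docker release):

    # Ask the daemon for its version and for the list of running
    # containers, straight over the UNIX socket it listens on.
    curl --unix-socket /var/run/docker.sock http://localhost/version
    curl --unix-socket /var/run/docker.sock http://localhost/containers/json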

A daemon with a remote control protocol is useful. But it shouldn't be the only way to spawn containers with Docker. If all I want to do is spawn a temporary container that is destroyed afterwards, I should be able to do that from a local command without touching a network service. Something like bubblewrap. The daemon adds all kinds of complexity and overhead. Especially if I just want to run a simple, short-lived command.

Docker at this point is already pretty far detached from a tool like bubblewrap. And the disparity gets worse.

Docker adds another abstraction on top of basic process sandboxing in the form of storage / filesystem management. Docker insists that processes execute in a self-contained, chroot()'d filesystem environment and that these environments (Docker images) be managed by Docker itself. When Docker images are imported into Docker, Docker manages them using one of a handful of storage drivers. You can choose from devicemapper, overlayfs, zfs, btrfs, and aufs and employ various configurations of each. Docker images are composed of layers, with one layer stacked on top of the prior one. This allows you to have an immutable base layer (that can be shared across containers) while run-time file changes are isolated to a specific container instance.
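If you're curious which driver and which layers are in play on your system, Docker exposes this (a quick sketch; output formats vary by Docker version):

    # Which storage driver is the daemon using?
    docker info --format '{{.Driver}}'

    # Which layers make up an image?
    docker history debian:stretch
    docker inspect --format '{{json .RootFS.Layers}}' debian:stretch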

Docker's ability to manage storage is pretty cool. And I dare say that in the very beginning, Docker's killer feature was the ability to easily produce and exchange self-contained images across machines.

But this utility comes at a very steep price. Again, if our use case is run a process in a sandbox, do we really care about all this advanced storage functionality? Yes, if you are running hundreds of containers on a single system, a storage model built on copy-on-write is perhaps necessary for scaling. But for simple cases where you just want to run a single process or a small number of processes, it is extreme overkill and adds many more problems than it solves.

I cannot stress this enough: I have spent hours debugging and working around problems due to how filesystems/storage work in Docker.

When Docker was initially released, aufs was widely used. As I previously wrote, aufs has abysmal performance as you scale up the number of concurrent I/O operations. We shaved minutes off tasks in Firefox CI by ditching aufs for overlayfs.

But overlayfs is far from a panacea. File metadata-only updates are apparently very slow in overlayfs. We're talking ~100ms to call fchownat() or utimensat(). If you perform an rsync -a or chown -R on a directory with just a hundred files that were defined in a base image layer, you can have delays of seconds.
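An easy way to see this for yourself is to time a metadata-only operation against files that live in a read-only base layer (illustrative only; the numbers depend heavily on your kernel version and storage driver):

    # chown of files defined in the base image layer forces overlayfs
    # copy-up work for every file touched, even though no file data
    # changes. On a plain local filesystem the same operation is
    # nearly instantaneous.
    docker run --rm debian:stretch \
        sh -c 'time chown -R nobody:nogroup /usr/share/doc'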

The Docker storage drivers backed by real filesystems like zfs and btrfs are a bit better. But they have their quirks too. For example, creating image layers is very slow compared to overlayfs (where it is practically instantaneous). This matters when you are iterating on a Dockerfile, for example, and want to quickly test changes. Your edit-compile cycle grows frustratingly long very quickly.

And I could opine on a handful of other problems I've encountered over the years.

Having spent hours of my life debugging and working around issues with Docker's storage, my current attitude is enough of this complexity, just let me use a directory backed by the local filesystem, dammit.

For many use cases, you don't need the storage complexity that Docker forces upon you. Pointing Docker at a directory on a local filesystem to chroot into is good enough. I know the behavior and performance properties of common Linux filesystems. ext4 isn't going to start making fchownat() or utimensat() calls take ~100ms. It isn't going to complain when a hard link spans multiple layers in an image. Or slow down to a crawl when multiple threads are performing concurrent read I/O. There aren't going to be intrinsically complicated algorithms and caching to walk N image layers to find the most recent version of a file (or if there are, they will be so far down the stack in kernel land that I likely won't ever have to deal with them as a normal user). Docker images with their multiple layers add complexity and overhead. For many use cases, the pain they inflict outweighs the initial convenience they provide.
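Here is roughly what that looks like in practice - flattening an image into a plain directory and running a process in it with nothing fancier than chroot (a sketch; adjust image names and paths to taste):

    # Flatten an image into a plain directory on the local filesystem:
    # no layers, no storage driver, just files ext4 knows how to serve.
    cid=$(docker create debian:stretch)
    mkdir rootfs
    docker export "$cid" | tar -x -C rootfs
    docker rm "$cid"

    # The "container" is now just a directory.
    sudo chroot rootfs /bin/ls /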

Docker's optimized-for-complex-use-cases architecture shows its inefficiency in even simple benchmarks.

On my machine, docker run -i --rm debian:stretch /bin/ls / takes ~850ms. Almost a second to perform a directory listing (sometimes it does take over 1 second - I was being generous and quoting a quicker time). This command takes ~1ms when run in a local shell. So we're at 2.5-3 orders of magnitude of overhead. The time here does include the time to initially create the container and destroy it afterwards. We can isolate that overhead by starting a persistent container and running docker exec -i <cid> /bin/ls / to spawn a new process in an existing container. This takes ~85ms. So, ~2 orders of magnitude of overhead to spawn a process in a Docker container versus spawning it natively. I'm not sure what's adding so much overhead. Yes, there are HTTP requests under the hood. But HTTP to a local daemon shouldn't be that slow.
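For reference, the numbers above came from timing commands along these lines (your numbers will differ):

    # Full container create/run/destroy cycle.
    time docker run -i --rm debian:stretch /bin/ls /      # ~850ms here

    # Isolate process spawn overhead using a persistent container.
    cid=$(docker run -d -i debian:stretch /bin/sh)
    time docker exec -i "$cid" /bin/ls /                  # ~85ms here
    docker rm -f "$cid"

    # Baseline: the same command in a local shell.
    time /bin/ls /                                        # ~1ms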

If we docker export that image to the local filesystem and use runc spec to generate a runtime configuration so we can run it with runc, runc run takes ~85ms to run /bin/ls /. If we runc exec <cid> /bin/ls / to start a process in an existing container, that completes in ~10ms. runc appears to be executing these simple tasks ~10x faster than Docker.
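The runc workflow, reusing the flattened rootfs/ directory from earlier, looks roughly like this (a sketch; the generated config.json needs its args changed to the command you want to run):

    # Generate a default OCI runtime spec next to the rootfs.
    mkdir mycontainer
    mv rootfs mycontainer/
    cd mycontainer
    runc spec        # writes config.json; edit "args" to ["/bin/ls", "/"]

    # Run the container...
    sudo runc run mycontainer-id

    # ...or spawn a process inside an already-running container.
    sudo runc exec mycontainer-id /bin/ls /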

But to even get to that point, we had to make a filesystem available to spawn the container in. With Docker, you need to load an image into Docker. Using docker save to produce a 105,523,712 byte tar file, docker load -i image.tar takes ~1200ms to complete. tar xf image.tar takes ~65ms to extract that image to the local filesystem. Granted, Docker is computing the SHA-256 of the image as part of the import. But SHA-256 runs at ~250MB/s on my machine, and on that ~105MB input it should take ~400ms. Where is that extra ~750ms of overhead in Docker coming from?
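These measurements can be reproduced with something like the following (debian:stretch shown as a stand-in for the image; remove the image first so docker load actually re-imports the layers):

    # Produce a tar of the image, then time re-importing it versus
    # simply extracting it and hashing it.
    docker save -o image.tar debian:stretch
    docker rmi debian:stretch

    time docker load -i image.tar          # ~1200ms
    mkdir extracted
    time tar -xf image.tar -C extracted    # ~65ms
    time sha256sum image.tar               # ~400ms of that overhead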

The Docker image loading overhead is still present on large images. With a 4,336,605,184 byte image, docker load was taking ~32s and tar x was taking ~2s. Obviously the filesystem was buffering writes in the tar case. And the ~2s ignores the ~17s needed to obtain the SHA-256 of the entire input. But there's still a substantial disparity here. (I suspect a lot of it is overlayfs not being as optimal as ext4.)

Several years ago there weren't many good choices for tools to execute containers. But today, there are good tools readily available. And thanks to OCI standards, you can often swap in alternate container runtimes. Docker (the tool) has an architecture that is optimized for solving complex use cases (coincidentally, use cases that Docker the company makes money from). Because of this, my conclusion - drawn from using Docker for several years - is that Docker is unsuitable for many common use cases. If you care about low container startup/teardown overhead or low latency when interacting with containers (including spawning processes from outside of them), or if Docker's storage model interferes with understanding or performance of your workloads, I think Docker should be avoided. A simpler tool (such as runc or even bubblewrap) should be used instead.

Call me a curmudgeon, but having seen all the problems that Docker's complexity causes, I'd rather see my containers resemble a tarball that can easily be chroot()'d into. I will likely be revisiting projects that use Docker and replacing Docker with something lighter weight and architecturally simpler. As for the use of Docker in the more complex environments it seems to be designed for, I don't have a meaningful opinion as I've never really used it in that capacity. But given my negative experiences with Docker over the years, I am definitely biased against Docker and will lean towards simpler products, especially if their storage/filesystem management model is simpler. Docker introduced containers to the masses and they should be commended for that. But for my day-to-day use cases for containers, Docker is simply not the right tool for the job.

I'm not sure exactly what I'll replace Docker with for my simpler use cases. If you have experiences you'd like to share, sharing them in the comments will be greatly appreciated.