Deterministic and Minimal Docker Images
October 13, 2014 at 04:50 PM | categories: sysadmin, Docker, Mozilla
Docker is a really nifty tool. It vastly lowers the barrier to distributing and executing applications. It forces people to think about building server-side code as a collection of discrete applications and services. When it was released, I instantly realized its potential, including for uses it wasn't primarily intended for, such as applications in automated build and test environments.
Over the months, Docker's feature set has grown and many of its shortcomings have been addressed. It's more usable than ever. Most of my early complaints and concerns have been addressed or are actively being addressed.
But one supposedly solved part of Docker still bothers me: image creation.
One of the properties that gets people excited about Docker is the ability to ship execution environments around as data. Simply produce an image once, transfer it to a central server, pull it down from anywhere, and execute. That's pretty damn elegant. I dare say Docker has solved the image distribution problem. (Ignore for a minute that the implementation detail of how images map to filesystems still has a few quirks to work out. But they'll solve that.)
The ease with which Docker manages images is brilliant. I, like many, was overcome with joy and marvelled at how amazing it was. But as I started producing more and more images, my initial excitement turned to frustration.
The thing that bothers me most about images is that the de facto and recommended method for producing them neither gives deterministic results nor yields minimal images. I strongly believe that the current recommended and widely practiced approach is far from optimal and has too many drawbacks. Let me explain.
If you look at the Dockerfiles from the official Docker library (examples: Node, MySQL), you notice something in common: they tend to use apt-get update as one of their first steps. For those not familiar with Apt, that command will synchronize the package repository indexes with a remote server. In other words, depending on when you run the command, different versions of packages will be pulled down and the result of image creation will differ. The same thing happens when you clone a Git repository. Depending on when you run the command - when you create the image - you may get different output. If you create an image from scratch today, it could have a different version of say Python than it did the day before. This can be a big deal, especially if you are trying to use Docker to accurately reproduce environments.
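To make the problem concrete, here is a rough sketch of the difference between the common pattern and a pinned variant; the package name and versions are placeholders, not taken from any official image:

    # unpinned: you get whatever version the package index offers on the day you build
    apt-get update && apt-get install -y python

    # pinned: the same request every time, though the exact version must still be
    # present in the archive, so even this falls short of true determinism
    apt-get update && apt-get install -y python2.7=2.7.8-1

Pinning helps, but the archive itself moves underneath you, which is part of why I argue below for building from verified sources instead.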
This non-determinism of building Docker images really bothers me. It seems to run counter to Docker's goal of facilitating reliable environments for running applications. Sure, one person can produce an image once, upload it to a Docker Registry server, and have others pull it. But there are applications where independent production of the same base image is important.
One area is security. There are many people who are justifiably paranoid about running binaries produced by others, and pre-built Docker images set off all kinds of alarms. So these people would rather build an image from source, from a Dockerfile, than pull binaries. Except then they build the image from that Dockerfile and the application doesn't run because of an incompatibility with a new version of some random package whose version wasn't pinned. And they have probably lost numerous hours tracking down this obscure failure. How frustrating! Determinism and verifiability as part of Docker image creation help solve this problem.
Deterministic image building is also important for disaster recovery. What happens if your Docker Registry and all hosts with copies of its images go down? If you go to build the images from scratch again, what guarantee do you have that things will behave the same? Without determinism, you are taking a risk that things will be different and your images won't work as intended. That's scary. (Yes, Docker is no different here from existing tools that attempt to solve this problem.)
What if your open source product relies on a proprietary component that can't be legally distributed? So much for Docker image distribution. The best you can do is provide a base image and instructions for completing the process. But if that doesn't work deterministically, your users now have varying Docker images, again undermining Docker's goal of increasing consistency.
My other main concern about Docker images is that they tend to be large, both in size and in scope. Many Docker images use a full Linux install as their base. A lot of people start with a base Ubuntu or Debian install, say, apt-get install the required packages, do some extra configuration, and call it a day. Simple and straightforward, yes. But this practice makes me more than a bit uneasy.
One of the themes surrounding Docker is minimalism. Containers are lighter than VMs; just ship your containers around; deploy dozens or hundreds of containers simultaneously; compose your applications of many, smaller containers instead of larger, monolithic ones. I get it and am totally on board. So why are Docker images built on top of the bloaty excess of a full operating system (modulo the kernel)? Do I really need a package manager in my Docker image? Do I need a compiler or header files so I can e.g. build binary Python extensions? No, I don't, thank you.
As a security-minded person, I want my Docker images to consist of only the files they need, especially binary files. By leaving out non-critical elements from your image and your run-time environment, you are reducing the surface area to attack. If your application doesn't need a shell, don't include a shell and don't leave yourself potentially vulnerable to shellshock. I want the attacker who inevitably breaks out of my application into the outer container to get nothing, not something that looks like an operating system and has access to tools like curl and wget that could potentially be used to craft a more advanced attack (which might even be able to exploit a kernel vulnerability to break out of the container). Of course, you can and should pursue additional security protections in addition to attack surface reduction to secure your execution environment. Defense in depth. But that doesn't give Docker images a free pass on being bloated.
Another reason I want smaller containers is... because they are smaller. People tend to have relatively slow upload bandwidth. Pushing Docker images that can be hundreds of megabytes clogs my tubes. However, I'll gladly push 10, 20, or even 50 megabytes of only the necessary data. When you factor in that Docker image creation isn't deterministic, you also realize that different people are producing different versions of images from the same Dockerfiles and that you have to spend extra bandwidth transferring the different versions around. This bites me all the time when I'm creating new images and am experimenting with the creation steps. I tend to bypass the fake caching mechanism (fake because the output isn't deterministic) and this really results in data explosion.
I understand why Docker images are neither deterministic nor minimal: making them so is a hard problem. I think Docker was right to prioritize solving distribution (it opens up many new possibilities). But I really wish some effort could be put into making images deterministic (and thus verifiable) and more minimal. I think it would make Docker an even more appealing platform, especially for the security conscious. (As an aside, I would absolutely love if we could ship a verifiable Firefox build, for example.)
These are hard problems. But they are solvable. Here's how I would do it.
First, let's tackle deterministic image creation. Despite computers and software being ideally deterministic, building software tends not to be, so deterministic image creation is a hard problem. Even tools like Puppet and Chef, which claim to solve aspects of this problem, don't do a very good job with determinism. Read my post on The Importance of Time on Machine Provisioning for more on the topic.
But there are solutions. NixOS and the Nix package manager have the potential to be used as the basis of a deterministic image building platform. The high-level overview of Nix is that the inputs and contents of a package determine the package ID. If you know how Git or Mercurial derive their commit SHA-1s, it's pretty much the same concept. In theory, two people on different machines start with the same environment and bootstrap the exact same packages, all from source. Gitian is a similar solution, although I prefer Nix's content-based approach and how it goes about managing packages and environments. Nix feels so right as a base for deterministically building software. Anyway, yes, fully verifiable build environments are turtles all the way down (I recommend reading Tor's overview of the problem and their approach). However, Nix's approach addresses many of the turtles and silences most of the critics.
I would absolutely love if more and more Docker images were the result of a deterministic build process like Nix. Perhaps you could define the full set of packages (with versions) that would be used. Let's call this the package manifest. You would then PGP sign and distribute your manifest. You could then have Nix step through all the dependencies, compiling everything from source. If PGP verification fails, compilation output changes, or extra files are needed, the build aborts or issues a warning. I have a feeling the security-minded community would go crazy over this. I know I would.
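To give a flavor of it, here is a rough sketch of building a package from a pinned copy of nixpkgs so that two parties start from identical inputs (the revision is a placeholder you would agree upon and sign):

    # check out a specific, agreed-upon revision of the Nix package collection
    git clone https://github.com/NixOS/nixpkgs.git
    git -C nixpkgs checkout <agreed-upon-revision>

    # build a package (and its dependency graph) from those fixed inputs
    nix-build nixpkgs -A python27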
OK, so now you can use Nix to produce packages (and thus images) (more) deterministically. How do you make them minimal? Well, instead of just packaging the entire environment, I'd employ tools like makejail. The purpose of makejail is to create minimal chroot jail environments. These are very similar to Docker/LXC containers. In fact, you can often take a tarball of a chroot directory tree and convert it into a Docker container! With makejail, you define a configuration file specifying, among other things, which binaries to run inside the jail. makejail will trace the file I/O of those binaries and copy over the accessed files. The result is an execution environment that (hopefully) contains only what you need. Then, create an archive of that environment and feed it to docker import (or ADD it from a minimal Dockerfile) to create a minimal Docker image.
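A rough sketch of that pipeline, using hypothetical file and image names, might look like this:

    # trace the application and copy only the files it touches into a chroot
    # (makejail reads a small Python-syntax config describing what to run)
    makejail myapp-jail.py

    # turn the resulting directory tree into a Docker image
    tar -C /var/chroot/myapp -c . | docker import - myapp:minimal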
In summary, Nix provides you with a reliable and verifiable build environment. Tools like makejail pare down the produced packages into something minimal, which you then turn into your Docker image. Regular people can still pull binary images, but they are much smaller and more in tune with Docker's principles of minimalism. The paranoid among us can produce the same bits from source (after verifying the inputs look credible and waiting through a few hours of compiling). Or perhaps the individual files in the image could be signed and thus verified via some trust mechanism? And a company deploying Docker can have peace of mind that disaster scenarios involving Docker image loss won't mean the image is gone for good: just rebuild it, exactly as it was before.
You'll note that my proposed solution does not involve Dockerfiles as they exist today. I just don't think the Dockerfile's design of stackable layers of commands is the right model, at least for people who care about determinism and minimalism. You really want a recipe that knows how to create a set of relevant files and some metadata (what ports to expose, what command to run on container start, etc.) and turn that into your Docker image. I suppose you could accomplish all of this inside Dockerfiles. But that's a pretty radical departure from how Dockerfiles work today. I'm not sure the two approaches are compatible. Something to think about.
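After all, the metadata half of a Dockerfile is tiny; a hypothetical example of roughly all that would remain once the filesystem is produced elsewhere (the image name, port, and command are illustrative):

    FROM myapp-rootfs
    EXPOSE 8080
    CMD ["/usr/bin/myapp"]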
I'm pretty sure of what it would take to add deterministic and verifiable building of minimal and more secure Docker images. And, if someone solved this problem, it could be applicable outside of Docker (again, Docker images are essentially chroot environments plus metadata). As I was putting the finishing touches on this article, I discovered nix-docker. It looks very promising! I hope the Docker community latches on to these ideas and makes deterministic, verifiable, and minimal images the default, not the exception.
Mozilla Mercurial Statistics
September 30, 2014 at 01:17 PM | categories: Mercurial, Mozilla
I recently gained SSH access to Mozilla's Mercurial servers. This allows me to run some custom queries directly against the data. I was interested in some high-level numbers and thought I'd share the results.
hg.mozilla.org hosts a total of 3,445 repositories. Of these, there are 1,223 distinct root commits (i.e. distinct graphs). Altogether, there are 32,123,211 commits. Of those, there are 865,594 distinct commits (not double counting commits that appear in multiple repositories).
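(For the curious, a query along these lines produces the distinct count; the repository list path is illustrative.)

    # count commits that appear in at least one repository, without double counting
    for repo in $(cat /path/to/repo-list.txt); do
        hg -R "$repo" log -r 'all()' --template '{node}\n'
    done | sort -u | wc -l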
We have a high ratio of total commits to distinct commits (about 37:1). This means we have a lot of duplicated data on disk, which in turn means a lot of repos are clones/forks of existing ones. No big surprise there.
What is surprising to me is the low number of total distinct commits. I was expecting the number to run into the millions. (Firefox itself accounts for ~240,000 commits.) Perhaps a lot of the data is sitting in Git, Bitbucket, and GitHub. Sounds like a good data mining expedition...
On Monolithic Repositories
September 09, 2014 at 10:00 AM | categories: Git, Mercurial, Mozilla
When companies or organizations deploy version control, they have to make many choices. One of them is how many repositories to create. Your choices are essentially: a) a single, monolithic repository that holds everything; b) many separate, smaller repositories that hold all the individual parts; c) something in between.
The prevailing convention today (especially in the open source realm) is to create many separate and loosely coupled repositories, each repository mapping to a specific product or service. That does seem reasonable: if you were organizing files on your filesystem, you would group them by functionality or role (photos, music, documents, etc). And, version control tools are functionally filesystems. So it makes sense to draw repository boundaries at directory/role levels.
Further reinforcing the separate repository convention is the scaling behavior of our version control tools. Git, the popular tool in open source these days, doesn't scale well to very large repositories due to - among other things - not having narrow clones (fetching a subset of files). It scales well enough to the overwhelming majority of projects. But if you are a large organization generating lots of data (read: gigabytes of data over hundreds of thousands of files and commits) for version control, Git is unsuitable in its current form. Other tools (like Mercurial) don't currently fare that much better (although Mercurial has plans to tackle these scaling vectors).
Despite popular convention and even limitations in tools, companies like Google and Facebook opt to run large, monolithic repositories. Google runs Perforce. Facebook is on Mercurial, or at least is in the process of migrating to Mercurial.
Why do these companies run monolithic repositories? In Google's words:
We have a single large depot with almost all of Google's projects on it. This aids agile development and is much loved by our users, since it allows almost anyone to easily view almost any code, allows projects to share code, and allows engineers to move freely from project to project. Documentation and data is stored on the server as well as code.
So, monolithic repositories are all about moving fast and getting things done more efficiently. In other words, monolithic repositories increase developer productivity.
Furthermore, monolithic repositories are also more compatible with the ebb and flow of large organizations and large software projects. Components, features, products, and teams come and go, merge and split. The only constant is change. And if you are maintaining separate repositories that attempt to map to this ever-changing organizational topology, you are going to have a bad time. Either you'll be constantly copying, moving, merging, splitting, etc data and repositories. Or your repositories will be organized in a very non-logical and non-intuitive manner. That translates to overhead and lost productivity. I think that monolithic repositories handle the realities of large organizations much better. Big change or reorganization you want to reflect? You can make a single, atomic, history-preserving commit to move things around. I think that's much more manageable, especially when you consider the difficulty and annoyance of history-preserving changes across repositories.
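For instance, a reorganization that is painful to coordinate across many repositories is a single, history-preserving commit inside one (the paths here are made up):

    hg mv services/foo platform/foo
    hg commit -m "Move foo under platform/"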
Naysayers will decry monolithic repositories on principled and practical grounds.
The principled camp will say that separate repositories constitute a loosely coupled (dare I say service oriented) architecture that maps better to how software is consumed, assembled, and deployed and that erecting barriers in the form of separate repositories deliberately enforces this architecture. I agree. However, you can still maintain a loosely coupled architecture with monolithic repositories. The Subversion model of checking out a single tree from a larger repository proves this. Furthermore, I would say architecture decisions should be enforced by people (via code review, etc), not via version control repository topology. I believe this principled argument against monolithic repositories to be rather weak.
The principled camp living in the open source realm may also decry monolithic repositories as an affront to the spirit of open source. They would say that a monolithic repository creates unfairly strong ties to the organization that operates it and creates barriers to forking, etc. This may be true. But monolithic repositories don't intrinsically infringe on the basic software freedoms, organizations do. Therefore, I find this principled argument rather weak.
The practical camp will say that monolithic repositories just don't scale or aren't suitable for general audiences. These concerns are real.
Fully distributed version control systems (every commit on every machine) definitely don't scale past certain limits. Depending on your repository and user base, your scaling limits include disk space (repository data terabytes in size), bandwidth (repository data terabytes in size), filesystem (repository hundreds of thousands or millions of files), CPU and memory (operations on large repositories take too many system resources), and many heads/branches (tools like Git and Mercurial don't scale well to tens of thousands of heads/branches). These limitations with fully distributed version control are why distributed version control tools like Git and Mercurial support a partially-distributed mode that behaves more like your classical server-client model, like those employed by Subversion, Perforce, etc. Git supports shallow clone and sparse checkout. Mercurial supports shallow clone (via remotefilelog) and has planned support for narrow clone and sparse checkout by the end of 2015. Of course, you can avoid the scaling limitations of distributed version control by employing a non-distributed tool, such as Subversion. Many companies continue to reach this conclusion today. However, users adapted to the distributed workflow would likely be up in arms (they would probably use tools like hg-subversion or git-svn to maintain their workflows). So, while scaling of version control can be a real concern, there are solutions and workarounds. However, they do involve falling back to a partially-distributed model.
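For reference, on the Git side these partially-distributed modes boil down to commands like the following (the URL and path are placeholders):

    # shallow clone: fetch only recent history instead of every commit ever made
    git clone --depth 1 https://example.com/big-repo.git

    # sparse checkout: materialize only part of the working tree
    git config core.sparseCheckout true
    echo "some/subdirectory/" >> .git/info/sparse-checkout
    git read-tree -mu HEAD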
Another concern with monolithic repositories is user access control. You inevitably have code or data that is more sensitive and want to limit who can change or even access it. Separate repositories seem to facilitate a simpler model: per-repository access control. With monolithic repositories, you have to worry about per-directory/subtree permissions, an increased risk of data leaking, etc. This concern is more real with distributed version control, as distributed data and access control aren't naturally compatible. But these issues can be resolved. And if the tooling supports it, there is only a semantic difference between managing access control between repositories versus components of a single repository.
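Mercurial, for instance, ships an acl extension that can enforce per-path rules on push; a rough hgrc sketch (the path and group name are made up) shows the flavor of intra-repository access control I mean:

    [extensions]
    acl =

    [hooks]
    pretxnchangegroup.acl = python:hgext.acl.hook

    [acl]
    sources = serve

    [acl.deny]
    # members of the (hypothetical) contractors group may not touch this subtree
    secret/** = @contractors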
When it comes to conversations about repository hosting, I agree with Google and Facebook: I prefer monolithic repositories. When I am interacting with version control, I just want to get stuff done. I don't want to waste time dealing with multiple commands to manage multiple repositories. I don't want to waste time or expend cognitive load dealing with submodule, subrepository, or big files management. I don't want to waste time trying to find and reuse code, data, or documentation. I want everything at my fingertips, where it can be easily discovered, inspected, and used. Monolithic repositories facilitate these workflows more than separate repositories and make me more productive as a result.
Now, if only all the tools and processes we use and love would work with monolithic repositories...
Want to read more about monolithic repositories? I highly recommend Advantages of Monolithic Version Control by Dan Luu.
Reproducing Mozilla's Mercurial Server
September 05, 2014 at 02:50 PM | categories: Mercurial, Mozilla
One of my first tasks in my new role as a Developer Productivity Engineer is to help make Mozilla's Mercurial server better. Many of the awesome things we have planned rely on features in newer versions of Mercurial. It's therefore important for us to upgrade our Mercurial server to a modern version (we are currently running 2.5.4) and to keep it upgraded as time passes.
There are a few reasons why we haven't historically upgraded our Mercurial server. First, as anyone who has maintained high-availability systems will tell you, there is the attitude of "if it isn't broken, don't fix it." In other words, Mercurial 2.5.4 is working fine, so why mess with a good thing? This was all fine and dandy - until Mercurial started falling over in the last few weeks.
But the blocker towards upgrading that I want to talk about today is systems verification. There has been extreme caution around upgrading Mercurial at Mozilla because it is a critical piece of Mozilla's infrastructure and if the upgrade were to not go well, the outage would be disastrous for developer productivity and could even jeopardize an emergency Firefox release.
As much as I'd like to say that a modern version of Mercurial on the server would be a drop-in replacement (Mercurial has a great commitment to backwards compatibility and loose coupling between clients and servers, such that upgrading servers should not impact clients), there is always a risk that something will change. And that risk is compounded by the amount of custom code we have running on our server.
The way you protect against unexpected changes is testing. In the ideal world, you have a robust test suite that you run against a staging instance of a service to validate that any changes have no impact. In the absence of testing, you are left with fear, uncertainty, and doubt. FUD is an especially horrible philosophy when it comes to managing servers.
Unfortunately, we don't really have a great testing infrastructure for Mozilla's Mercurial server. And I want to change that.
Reproducing the Server Environment
When writing tests, it is important for the thing being tested to be as similar as possible to the real thing. This is why so many people have an aversion to mocking: every time you alter the test environment, you run the risk that those differences from reality will mask changes seen in the real environment.
So, it makes sense that a good first goal for creating a test suite against our Mercurial server should be to reproduce the production server and environment as closely as possible.
I'm currently working on a Vagrant environment that attempts to reproduce the official environment as closely as possible. It starts one virtual machine for the SSH/master server and a separate virtual machine for the hgweb/slave servers. The virtual machines boot CentOS. This is different from production, where we run RHEL, but they are similar enough (and can share the same packages) that the differences shouldn't matter too much, at least for now.
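The multi-machine portion of the Vagrantfile is simple; a stripped-down sketch (the box name and hostnames are illustrative, not the real configuration) looks roughly like this:

    # one VM for the SSH/master server, one for an hgweb/slave
    Vagrant.configure("2") do |config|
      config.vm.box = "centos-6.5"

      config.vm.define "hgssh" do |master|
        master.vm.hostname = "hgssh"
      end

      config.vm.define "hgweb1" do |web|
        web.vm.hostname = "hgweb1"
      end
    end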
Using Puppet
In production, Mozilla is using Puppet to manage the Mercurial servers. Unfortunately, the actual Puppet configs that Mozilla is running are behind a firewall, mainly for security reasons. This is potentially a huge setback for my reproducibility effort, as I'd like my virtual machines to use the exact same Puppet configs as what's used in production so the environments match as closely as possible. This would also save me a lot of work by not having to reinvent the wheel.
Fortunately, Ben Kero has extracted the Mercurial-relevant Puppet config files into a standalone repository. Apparently that repository gets rolled into the production Puppet configs periodically. So, my virtual machines and production can share the same Mercurial Puppet files. Nice!
It wasn't long after starting to use the standalone Puppet configs that I realized this would be a rabbit hole. This first manifested as the standalone Puppet code referencing things that only exist in the hidden Mozilla Puppet files. So the liberation was only partially successful. Sad panda.
So, I'm now in the process of creating a fake Mozilla Puppet environment that mimics the base Mozilla environment (from the closed repo) and am modifying the shared Puppet Mercurial code to work with both versions. This is a royal pain, but it needs to be done if we want to reproduce production and maintain peace of mind that test results reflect reality.
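Concretely, the idea is to layer the stub modules in front of the shared Mercurial ones when applying; something along these lines, where the module paths and class name are hypothetical:

    # apply the shared Mercurial Puppet code against stubbed-out Mozilla base modules
    puppet apply --modulepath=fake-mozilla-modules:puppet-mercurial/modules \
        -e 'include hgmo'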
Because reproducing runtime environments is important for reproducing and solving bugs and for testing, I call on the maintainers of Mozilla's closed Puppet repository to liberate it from behind its firewall. I'd like to see a public Puppet configuration tree available for all to use so that anyone anywhere can reproduce the state of a server or service operated by Mozilla to within reasonable approximation. Had this already been done, it would have saved me hours of work. As it stands, I'm reverse engineering systems and trying to cobble together an understanding of how the Mozilla Puppet configs work and which parts of them can safely be ignored to reproduce an approximate testing environment.
In that vein, I finally got access to Mozilla's internal Puppet repository. This took a few meetings and apparently generated a lot of backroom chatter - "developers don't normally get access, oh my!" All I wanted was to see how systems are configured so I can help improve them. Instead, getting access felt like pulling teeth. This feels like a major roadblock to productivity, reproducibility, and testing.
Facebook gives its developers access to most production machines and trusts them to not be stupid. I know we (Mozilla) like to hold ourselves to a high standard of security and privacy. But not giving developers access to the configurations for the systems their code runs on feels like a very silly policy. I hope Mozilla invests in opening up this important code and data, if not to the world, at least to its trusted employees.
Anyway, hopefully I'll soon have a Vagrant environment that allows people to build a standalone instance of Mozilla's Mercurial server. And once that's in place, I can start writing tests that basic services and workflows (including repository syncing) work as expected. Stay tuned.
New Job Role
September 05, 2014 at 11:30 AM | categories: Mozilla
As of today, I have a new role and title at Mozilla: Developer Productivity Engineer. I'll be reporting to Laura Thomson as a member of the Developer Services team.
I have an immediate goal to make our version control work better. This includes making Try scale and helping out with the deployment of ReviewBoard. After that, I'm not entirely sure. But Autoland and Firefox build system improvements have been discussed.
I'm really excited to be in this new role. If someone were to give me a clean slate and tell me to design my own job role, I think I'd answer with something very similar to the role I am now in. I am passionate about tools and enabling people to become more productive. I have little doubt I'll thrive in this new role.