Gregory Szorc's Digital Home

My Current Thoughts on System Administration

April 17, 2015 at 03:35 PM | categories: sysadmin, Mozilla

I attended PyCon last week. It's a great conference. You should attend. While I should write up a detailed trip report, I wanted to quickly share one of my takeaways.

Ansible was talked about a lot at PyCon. Sitting through a few presentations and talking with others helped me articulate why I've been drawn to Ansible (over say Puppet, Chef, Salt, etc) lately.

First, Ansible doesn't require a central server. Administration is done remotely Ansible establishes a SSH connection to a remote machine and does stuff. Having Ruby, Python, support libraries, etc installed on production systems just for system administration never really jived with me. I love Ansible's default hands off approach. (Yes, you can use a central server for Ansible, but that's not the default behavior. While tools like Puppet could be used without a central server, it felt like they were optimized for central server use and thus local mode felt awkward.)

Related to central servers, I never liked how that model consists of clients periodically polling for and applying updates. I like the idea of immutable server images and periodic updates work against this goal. The central model also has a major bazooka pointed at you: at any time, you are only one mistake away from completely hosing every machine doing continuous polling. e.g. if you accidentally update firewall configs and lock out central server and SSH connectivity, every machine will pick up these changes during periodic polling and by the time anyone realizes what's happened, your machines are all effectively bricked. (Yes, I've seen this happen.) I like having humans control exactly when my systems apply changes, thank you. I concede periodic updates and central control have some benefits.

Choosing not to use a central server by default means that hosts are modeled as a set of applied Ansible playbooks, not necessarily as a host with a set of Ansible playbooks attached. Although, Ansible does support both models. I can easily apply a playbook to a host in a one-off manner. This means I can have playbooks represent common, one-off tasks and I can easily run these tasks without having to muck around with the host to playbook configuration. More on this later.

I love the simplicity of Ansible's configuration. It is just YAML files. Not some Ruby-inspired DSL that takes hours to learn. With Ansible, I'm learning what modules are available and how they work, not complicated syntax. Yes, there is complexity in Ansible's configuration. But at least I'm not trying to figure out the file syntax as part of learning it.

Along that vein, I appreciate the readability of Ansible playbooks. They are simple, linear lists of tasks. Conceptually, I love the promise of full dependency graphs and concurrent execution. But I've spent hours debugging race conditions and cyclic dependencies in Puppet that I'm left unconvinced the complexity and power is worth it. I do wish Ansible could run faster by running things concurrently. But I think they made the right decision by following KISS.

I enjoy how Ansible playbooks are effectively high-level scripts. If I have a shell script or block of code, I can usually port it to Ansible pretty easily. One pass to do the conversion 1:1. Another pass to Ansibilize it. Simple.

I love how Ansible playbooks can be checked in to source control and live next to the code and applications they manage. I frequently see people maintain separate source control repositories for configuration management from the code it is managing. This always bothered me. When I write a service, I want the code for deploying and managing that service to live next to it in version control. That way, I get the configuration management and the code versioned in the same timeline. If I check out a release from 2 years ago, I should still be able to use its exact configuration management code. This becomes difficult to impossible when your organization is maintaining configuration management code in a separate repository where a central server is required to do deployments (see Puppet).

Before PyCon, I was having an internal monolog about adopting the policy that all changes to remote servers be implemented with Ansible playbooks. I'm pleased to report that a fellow contributor to the Mercurial project has adopted this workflow himself and he only has great things to say! So, starting today, I'm going to try to enforce that every change I make to a remote server is performed via Ansible and that the Ansible playbooks are checked into version control. The Ansible playbooks will become implicit documentation of every process involved with maintaining a server.

I've already applied this principle to deploying MozReview. Before, there was some internal Mozilla wiki documenting commands to execute in a terminal to deploy MozReview. I have replaced that documentation with a one-liner that invokes Ansible. And, the Ansible files are now in a public repository.

If you poke around that repository, you'll see that I have Ansible playbooks referencing Docker. I have Ansible provisioning Docker images used by the test and development environment. That same Ansible code is used to configure our production systems (or is at least in the process of being used in that way). Having dev, test, and prod using the same configuration management has been a pipe dream of mine and I finally achieved it! I attempted this before with Puppet but was unable to make it work just right. The flexibility that Ansible's design decisions have enabled has made this finally possible.

Ansible is my go to system management tool right now. And I still feel like I have a lot to learn about its hidden powers.

If you are still using Puppet, Chef, or other tools invented in previous generations, I urge you to check out Ansible. I think you'll be pleasantly surprised.

Deterministic and Minimal Docker Images

October 13, 2014 at 04:50 PM | categories: sysadmin, Docker, Mozilla

Docker is a really nifty tool. It vastly lowers the barrier to distributing and executing applications. It forces people to think about building server side code as a collection of discrete applications and services. When it was released, I instantly realized its potential, including for uses it wasn't primary intended for, such as applications in automated build and test environments.

Over the months, Docker's feature set has grown and many of its shortcomings have been addressed. It's more usable than ever. Most of my early complaints and concerns have been addressed or are actively being addressed.

But one supposedly solved part of Docker still bothers me: image creation.

One of the properties that gets people excited about Docker is the ability to ship execution environments around as data. Simply produce an image once, transfer it to a central server, pull it down from anywhere, and execute. That's pretty damn elegant. I dare say Docker has solved the image distribution problem. (Ignore for a minute that the implementation detail of how images map to filesystems still has a few quirks to work out. But they'll solve that.)

The ease at which Docker manages images is brilliant. I, like many, was overcome with joy and marvelled at how amazing it was. But as I started producing more and more images, my initial excitement turned to frustration.

The thing that bothers me most about images is that the de facto and recommended method for producing images is neither deterministic nor results in minimal images. I strongly believe that the current recommended and applied approach is far from optimal and has too many drawbacks. Let me explain.

If you look at the Dockerfiles from the official Docker library (examples: Node, MySQL), you notice something in common: they tend to use apt-get update as one of their first steps. For those not familiar with Apt, that command will synchronize the package repository indexes with a remote server. In other words, depending on when you run the command, different versions of packages will be pulled down and the result of image creation will differ. The same thing happens when you clone a Git repository. Depending on when you run the command - when you create the image - you may get different output. If you create an image from scratch today, it could have a different version of say Python than it did the day before. This can be a big deal, especially if you are trying to use Docker to accurately reproduce environments.

This non-determinism of building Docker images really bothers me. It seems to run counter to Docker's goal of facilitating reliable environments for running applications. Sure, one person can produce an image once, upload it to a Docker Registry server, and have others pull it. But there are applications where independent production of the same base image is important.

One area is the security arena. There are many people who are justifiably paranoid about running binaries produced by others and pre-built Docker images set off all kinds of alarms. So, these people would rather build an image from source, from a Dockerfile, than pull binaries. Except then they build the image from a Dockerfile and the application doesn't run because of an incompatibility with a new version of some random package whose version wasn't pinned. Of course, you probably lost numerous hours tracing down this obscure reason. How frustrating! Determinism and verifiability as part of Docker image creation help solve this problem.

Deterministic image building is also important for disaster recovery. What happens if your Docker Registry and all hosts with copies of its images go down? If you go to build the images from scratch again, what guarantee do you have that things will behave the same? Without determinism, you are taking a risk that things will be different and your images won't work as intended. That's scary. (Yes, Docker is no different here from existing tools that attempt to solve this problem.)

What if your open source product relies on a proprietary component that can't be legally distributed? So much for Docker image distribution. The best you can do is provide a base image and instructions for completing the process. But if that doesn't work deterministically, your users now have varying Docker images, again undermining Docker's goal of increasing consistency.

My other main concern about Docker images is that they tend to be large, both in size and in scope. Many Docker images use a full Linux install as their base. A lot of people start with a base e.g. Ubuntu or Debian install, apt-get install the required packages, do some extra configuration, and call it a day. Simple and straightforward, yes. But this practice makes me more than a bit uneasy.

One of the themes surrounding Docker is minimalism. Containers are lighter than VMs; just ship your containers around; deploy dozens or hundreds of containers simultaneously; compose your applications of many, smaller containers instead of larger, monolithic ones. I get it and am totally on board. So why are Docker images built on top of the bloaty excess of a full operating system (modulo the kernel)? Do I really need a package manager in my Docker image? Do I need a compiler or header files so I can e.g. build binary Python extensions? No, I don't, thank you.

As a security-minded person, I want my Docker images to consist of only the files they need, especially binary files. By leaving out non-critical elements from your image and your run-time environment, you are reducing the surface area to attack. If your application doesn't need a shell, don't include a shell and don't leave yourself potentially vulnerable to shellshock. I want the attacker who inevitably breaks out of my application into the outer container to get nothing, not something that looks like an operating system and has access to tools like curl and wget that could potentially be used to craft a more advanced attack (which might even be able to exploit a kernel vulnerability to break out of the container). Of course, you can and should pursue additional security protections in addition to attack surface reduction to secure your execution environment. Defense in depth. But that doesn't give Docker images a free pass on being bloated.

Another reason I want smaller containers is... because they are smaller. People tend to have relatively slow upload bandwidth. Pushing Docker images that can be hundreds of megabytes clogs my tubes. However, I'll gladly push 10, 20, or even 50 megabytes of only the necessary data. When you factor in that Docker image creation isn't deterministic, you also realize that different people are producing different versions of images from the same Dockerfiles and that you have to spend extra bandwidth transferring the different versions around. This bites me all the time when I'm creating new images and am experimenting with the creation steps. I tend to bypass the fake caching mechanism (fake because the output isn't deterministic) and this really results in data explosion.

I understand why Docker images are neither deterministic nor minimal: making them so is a hard problem. I think Docker was right to prioritize solving distribution (it opens up many new possibilities). But I really wish some effort could be put into making images deterministic (and thus verifiable) and more minimal. I think it would make Docker an even more appealing platform, especially for the security conscious. (As an aside, I would absolutely love if we could ship a verifiable Firefox build, for example.)

These are hard problems. But they are solvable. Here's how I would do it.

First, let's tackle deterministic image creation. Despite computers and software being ideally deterministic, building software tends not to be, so deterministic image creation is a hard problem. Even tools like Puppet and Chef which claim to solve aspects of this problem don't do a very good job with determinism. Read my post on The Importance of Time on Machine Provisioning for more on the topic. But there are solutions. NixOS and the Nix package manager have the potential to be used as the basis of a deterministic image building platform. The high-level overview of Nix is that the inputs and contents of a package determine the package ID. If you know how Git or Mercurial get their commit SHA-1's, it's pretty much the same concept. In theory, two people on different machines start with the same environment and bootstrap the exact same packages, all from source. Gitian is a similar solution. Although I prefer Nix's content-based approach and how it goes about managing packages and environments. Nix feels so right as a base for deterministically building software. Anyway, yes, fully verifiable build environments are turtles all the way down (I recommend reading Tor's overview of the problem and their approach. However, Nix's approach addresses many of the turtles and silences most of the critics. I would absolutely love if more and more Docker images were the result of a deterministic build process like Nix. Perhaps you could define the full set of packages (with versions) that would be used. Let's call this the package manifest. You would then PGP sign and distribute your manifest. You could then have Nix step through all the dependencies, compiling everything from source. If PGP verification fails, compilation output changes, or extra files are needed, the build aborts or issues a warning. I have a feeling the security-minded community would go crazy over this. I know I would.

OK, so now you can use Nix to produce packages (and thus images) (more) deterministically. How do you make them minimal? Well, instead of just packaging the entire environment, I'd employ tools like makejail. The purpose of makejail is to create minimal chroot jail environments. These are very similar to Docker/LXC containers. In fact, you can often take a tarball of a chroot directory tree and convert it into a Docker container! With makejail, you define a configuration file saying among other things what binaries to run inside the jail. makejail will trace file I/O of that binary and copy over accessed files. The result is an execution environment that (hopefully) contains only what you need. Then, create an archive of that environment and pipe it into docker build to create a minimal Docker image.

In summary, Nix provides you with a reliable and verifiable build environment. Tools like makejail pair down the produced packages into something minimal, which you then turn into your Docker image. Regular people can still pull binary images, but they are much smaller and more in tune with Docker's principles of minimalism. The paranoid among us can produce the same bits from source (after verifying the inputs look credible and waiting through a few hours of compiling). Or, perhaps the individual files in the image could be signed and thus verified via trust somehow? The company deploying Docker can have peace of mind that disaster scenarios resulting in Docker image loss should not result in total loss of the image (just rebuild it exactly as it was before).

You'll note that my proposed solution does not involve Dockerfiles as they exist today. I just don't think Dockerfile's design of stackable layers of commands is the right model, at least for people who care about determinism and minimalism. You really want a recipe that knows how to create a set of relevant files and some metadata like what ports to expose, what command to run on container start, etc and turn that into your Docker image. I suppose you could accomplish this all inside Dockerfiles. But that's a pretty radical departure from how Dockerfiles work today. I'm not sure the two solutions are compatible. Something to think about.

I'm pretty sure of what it would take to add deterministic and verifiable building of minimal and more secure Docker images. And, if someone solved this problem, it could be applicable outside of Docker (again, Docker images are essentially chroot environments plus metadata). As I was putting the finishing touches on this article, I discovered nix-docker. It looks very promising! I hope the Docker community latches on to these ideas and makes deterministic, verifiable, and minimal images the default, not the exception.

The Importance of Time on Automated Machine Configuration

June 24, 2013 at 09:00 PM | categories: sysadmin, Mozilla, Puppet

Usage of machine configuration management software like Puppet and Chef has taken off in recent years. And rightly so - these pieces of software make the lives of countless system administrators much better (in theory).

In their default (and common) configuration, these pieces of software do a terrific job of ensuring a machine is provisioned with today's configuration. However, for many server provisioning scenarios, we actually care about yesterday's configuration.

In this post, I will talk about the importance of time when configuring machines.

Describing the problem

If you've worked on any kind of server application, chances are you've had to deal with a rollback. Some new version of a package or web application is rolled out to production. However, due to unforeseen problems, it needed to be rolled back.

Or, perhaps you operate a farm of machines that continuously build or compile software from version control. It's desirable to be able to reproduce the output from a previous build (ideally bit identical).

In these scenarios, the wall time plays a crucial rule when dealing with a central, master configuration server (such as a Puppet master).

Since a client will always pull the latest revision of its configuration from the server, it's very easy to define your configurations such that the result of machine provisioning today is different from yesterday (or last week or last month).

For example, let's say you are running Puppet to manage a machine that sits in a continuous integration farm and recompiles a source tree over and over. In your Puppet manifest you have:

package {
    "gcc":
        ensure => latest
}

If you run Puppet today, you may pull down GCC 4.7 from the remote package repository because 4.7 is the latest version available. But if you run Puppet tomorrow, you may pull down GCC 4.8 because the package repository has been updated! If for some reason you need to rebuild one of today's builds tomorrow (perhaps you want to rebuild that revision plus a minor patch), they'll use different compiler versions (or any package for that matter) and the output may not be consistent - it may not even work at all! So much for repeatability.

File templates are another example. In Puppet, file templates are evaluated on the server and the results are sent to the client. So, the output of file template execution today might be different from the output tomorrow. If you needed to roll back your server to an old version, you may not be able to do that because the template on the server isn't backwards compatible! This can be worked around, sure (commonly by copying the template and branching differences), but over time these hacks accumulate in a giant pile of complexity.

The common issue here is that time has an impact on the outcome of machine configuration. I refer to this issue as time-dependent idempotency. In other words, does time play a role in the supposedly idempotent configuration process? If the output is consistent no matter when you run the configuration, it is time-independent and truly idempotent. If it varies depending on when configuration is performed, it is time-dependent and thus not truly idempotent.

Solving the problem

My attitude towards machine configuration and automation is that it should be as time independent as possible. If I need to revert to yesterday's state or want to reproduce something that happened months ago, I want strong guarantees that it will be similar, if not identical. Now, this is just my opinion. I've worked in environments where we had these strong guarantees. And having had this luxury, I abhore the alternative where so many pieces of configuration vary over time as the central configuration moves forward without the ability to turn back the clock. As always, your needs may be different and this post may not apply to you!

I said as possible a few times in the previous paragraph. While you could likely make all parts of your configuration time independent, it's not a good idea. In the real world, things change over time and making all configuration data static regardless of time will produce a broken or bad configuration.

User access is one such piece of configuration. Employees come and go. Passwords and SSH keys change. You don't want to revert user access to the way it was two months ago, restoring access to a disgruntled former employee or allowing access via a compromised password. Network configuration is another. Say the network topology changed and the firewall rules need updating. If you reverted the networking configuration, the machine likely wouldn't work on the network!

This highlights an important fact: if making your machine configuration time independent is a goal, you will need to bifurcate configuration by time dependency and solve for both. You'll need to identify every piece of configuration and ask do I put this in the bucket that is constant over time or the bucket that changes over time?

Machine configuration software can do a terrific job of ensuring an applied configuration is idempotent. The problem is it typically can't manage both time-dependent and time-independent attributes at the same time. Solving this requires a little brain power, but is achievable if there is will. In the next section, I'll describe how.

Technical implementation

Time-dependent machine configuration is a solved problem. Deploy Puppet master (or similar) and you are good to go.

Time-independent configuration is a bit more complicated.

As I mentioned above, the first step is to isolate all of the configuration you want to be time independent. Next, you need to ensure time dependency doesn't creep into that configuration. You need to identify things that can change over time and take measures to ensure those changes won't affect the configuration. I encourage you to employ the external system test: does this aspect of configuration depend on an external system or entity? If so how will I prevent changes in it over time from affecting us?

Package repositories are one such external system. New package versions are released all the time. Old packages are deleted. If your configuration says to install the latest package, there's no guarantee the package version won't change unless the package repository doesn't change. If you simply pin a package to a specific version, that version may disappear from the server. The solution: pin packages to specific versions and run your own package mirror that doesn't delete or modify existing packages.

Does your configuration fetch a file from a remote server or use a file as a template? Cache that file locally (in case it disappears) and put it under version control. Have the configuration reference the version control revision of that file. As long as the repository is accessible, the exact version of the file can be retrieved at any time without variation.

In my professional career, I've used two separate systems for managing time-independent configuration data. Both relied heavily on version control. Essentially, all the time-independent configuration data is collected into a single repository - an independent repository from all the time-dependent data (although that's technically an implementation detail). For Puppet, this would include all the manifests, modules, and files used directly by Puppet. When you want to activate a machine with a configuration, you simply say check out revision X of this repository and apply its configuration. Since revision X of the repository is constant over time, the set of configuration data being used to configure the machine is constant. And, if you've done things correctly, the output is idempotent over time.

In one of these systems, we actually had two versions of Puppet running on a machine. First, we had the daemon communicating with a central Puppet master. It was continually applying time-dependent configuration (user accounts, passwords, networking, etc). We supplemented this was a manually executed standalone Puppet instance. When you ran a script, it asked the Puppet master for its configuration. Part of that configuration was the revision of the time-independent Git repository containing the Puppet configuration files the client should use. It then pulled the Git repo, checked out the specified revision, merged Puppet master's settings for the node with that config (not the manifests, just some variables), then ran Puppet locally to apply the configuration. While a machine's configuration typically referenced a SHA-1 of a specific Git commit to use, we could use anything git checkout understood. We had some machines running master or other branches if we didn't care about time-independent idempotency for that machine at that time. What this all meant was that if you wanted to roll back a machine's configuration, you simply specified an earlier Git commit SHA-1 and then re-ran local Puppet.

We were largely satisfied with this model. We felt like we got the best of both worlds. And, since we were using the same technology (Puppet) for time-dependent and time-independent configuration, it was a pretty simple-to-understand system. A downside was there were two Puppet instances instead of one. With a little effort, someone could probably devise a way for the Puppet master to merge the two configuration trees. I'll leave that as an exercise for the reader. Perhaps someone has done this already! If you know of someone, please leave a comment!

Challenges

The solution I describe does not come without its challenges.

First, deciding whether a piece of configuration is time dependent or time independent can be quiet complicated. For example, should a package update for a critical security fix be time dependent or time independent? It depends! What's the risk of the machine not receiving that update? How often is that machine rolled back? Is that package important to the operation/role of that machine (if so, I'd lean more towards time independent).

Second, minimizing exposure to external entities is hard. While I recommend putting as much as possible under version control in a single repository and pinning versions everywhere when you interface with an external system, this isn't always feasible. It's probably a silly idea to have your 200 GB Apt repository under version control and distributed locally to every machine in your network. So, you end up introducing specialized one-off systems as necessary. For our package repository, we just ran an internal HTTP server that only allowed inserts (no deletes or mutates). If we were creative, we could have likely devised a way for the client to pass a revision with the request and have the server dynamically serve from that revision of an underlying repository. Although, that may not work for every server type due to limited control over client behavior.

Third, ensuring compatibility between the time-dependent configuration and time-independent configuration is hard. This is a consequence of separating those configurations. Will a time-independent configuration from a revision two years ago work with the time-dependent configuration of today? This issue can be mitigated by first having as much configuration as possible be time independent and second not relying on wide support windows. If it's good enough to only support compatibility for time-independent configurations less than a month old, then it's good enough! With this issue, I feel you are trading long-term future incompatibility for well-defined and understood behavior in the short to medium term. That's a trade-off I'm willing to make.

Conclusion

Many machine configuration management systems only care about idempotency today. However, with a little effort, it's possible to achieve consistent state over time. This requires a little extra effort and brain power, but it's certainly doable.

The next time you are programming your system configuration tool, I hope you take the time to consider the effects time will have and that you will take the necessary steps to ensure consistency over time (assuming you want that, of course).