Gregory Szorc's Digital Home

Mercurial Pushlog Is Now Robust Against Interrupts

December 30, 2014 at 12:25 PM | categories: Mozilla, Firefox

hg.mozilla.org - Mozilla's Mercurial server - has functionality called the pushlog which records who pushed what when. Essentially, it's a log of when a repository was changed. This is separate from the commit log because the commit log can be spoofed and the commit log doesn't record when commits were actually pushed.

Since its inception, the pushlog has suffered from data consistency issues. If you aborted the push at a certain time, data was not inserted in the pushlog. If you aborted the push at another time, data existed in the pushlog but not in the repository (the repository would get rolled back but the pushlog data wouldn't).

I'm pleased to announce that the pushlog is now robust against interruptions and its updates are consistent with what is recorded by Mercurial. The pushlog database commit/rollback is tied to Mercurial's own transaction API. What Mercurial does to the push transaction, the pushlog follows.

This former inconsistency has caused numerous problems over the years. When data was inconsistent, we often had to close trees until someone could SSH into the machines and manually run SQL to fix the problems. This also contributed to a culture of don't press ctrl+c during push: it could corrupt Mercurial. (Ctrl+c should be safe to press any time: if it isn't, there is a bug to be filed.)

Any time you remove a source of tree closures is a cause for celebration. Please join me in celebrating your new freedom to abort pushes without concern for data inconsistency.

In case you want to test things out, aborting pushes and (and rolling back the pushlog) should now result in something like:

pushing to ssh://hg.mozilla.org/mozilla-central
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
Trying to insert into pushlog.
Inserted into the pushlog db successfully.
^C
rolling back pushlog
transaction abort!
rollback completed

Using Docker to Build Firefox

May 19, 2013 at 01:45 PM | categories: Mozilla, Firefox

I have the privilege of having my desk located around a bunch of really intelligent people from the Mozilla Services team. They've been talking a lot about all the new technologies around server provisioning. One that interested me is Docker.

Docker is a pretty nifty piece of software. It's essentially a glorified wrapper around Linux Containers. But, calling it that is doing it an injustice.

Docker interests me because it allows simple environment isolation and repeatability. I can create a run-time environment once, package it up, then run it again on any other machine. Furthermore, everything that runs in that environment is isolated from the underlying host (much like a virtual machine). And best of all, everything is fast and simple.

For my initial experimentation with Docker, I decided to create an environment for building Firefox.

Building Firefox with Docker

To build Firefox with Docker, you'll first need to install Docker. That's pretty simple.

Then, it's just a matter of creating a new container with our build environment:

curl https://gist.github.com/indygreg/5608534/raw/30704c59364ce7a8c69a02ee7f1cfb23d1ffcb2c/Dockerfile | docker build

The output will look something like:

FROM ubuntu:12.10
MAINTAINER Gregory Szorc "gps@mozilla.com"
RUN apt-get update
===> d2f4faba3834
RUN dpkg-divert --local --rename --add /sbin/initctl && ln -s /bin/true /sbin/initctl
===> aff37cc837d8
RUN apt-get install -y autoconf2.13 build-essential unzip yasm zip
===> d0fc534feeee
RUN apt-get install -y libasound2-dev libcurl4-openssl-dev libdbus-1-dev libdbus-glib-1-dev libgtk2.0-dev libiw-dev libnotify-dev libxt-dev mesa-common-dev uuid-dev
===> 7c14cf7af304
RUN apt-get install -y binutils-gold
===> 772002841449
RUN apt-get install -y bash-completion curl emacs git man-db python-dev python-pip vim
===> 213b117b0ff2
RUN pip install mercurial
===> d3987051be44
RUN useradd -m firefox
===> ce05a44dc17e
Build finished. image id: ce05a44dc17e
ce05a44dc17e

As you can see, it is essentially bootstrapping an environment to build Firefox.

When this has completed, you can activate a shell in the container by taking the image id printed at the end and running it:

docker run -i -t ce05a44dc17e /bin/bash
# You should now be inside the container as root.
su - firefox
hg clone https://hg.mozilla.org/mozilla-central
cd mozilla-central
./mach build

If you want to package up this container for distribution, you just find its ID then export it to a tar archive:

docker ps -a
# Find ID of container you wish to export.
docker export 2f6e0edf64e8 > image.tar
# Distribute that file somewhere.
docker import - < image.tar

Simple, isn't it?

Future use at Mozilla

I think it would be rad if Release Engineering used Docker for managing their Linux builder configurations. Want to develop against the exact system configuration that Mozilla uses in its automation - you could do that. No need to worry about custom apt repositories, downloading custom toolchains, keeping everything isolated from the rest of your system, etc: Docker does that all automatically. Mozilla simply needs to publish Docker images on the Internet and anybody can come along and reproduce the official environment with minimal effort. Once we do that, there are few excuses for someone breaking Linux builds because of an environment discrepancy.

Release Engineering could also use Docker to manage isolation of environments between builds. For example, it could spin up a new container for each build or test job. It could even save images from the results of these jobs. Have a weird build failure like a segmentation fault in the compiler? Publish the Docker image and have someone take a look! No need to take the builder offline while someone SSH's into it. No need to worry about the probing changing state because you can always revert to the state at the time of the failure! And, builds would likely start faster. As it stands, our automation spends minutes managing packages before builds begin. This lag would largely be eliminated with Docker. If nothing else, executing automation jobs inside a container would allow us to extract accurate resource usage info (CPU, memory, I/O) since the Linux kernel effectively gives containers their own namespace independent of the global system's.

I might also explore publishing Docker images that construct an ideal development environment (since getting recommended tools in the hands of everybody is a hard problem).

Maybe I'll even consider hooking up build system glue to automatically run builds inside containers.

Lots of potential here.

Conclusion

I encourage Linux users to play around with Docker. It enables some new and exciting workflows and is a really powerful tool despite its simplicity. So far, the only major faults I have with it are that the docs say it should not be used in production (yet) and it only works on Linux.

Mozilla Build System Brain Dump

May 13, 2013 at 05:25 PM | categories: build system, Mozilla, Firefox, mach

I hold a lot of context in my head when it comes to the future of Mozilla's build system and the interaction with it. I wanted to perform a brain dump of sorts so people have an idea of where I'm coming from when I inevitably propose radical changes.

The sad state of build system interaction and the history of mach

I believe that Mozilla's build system has had a poor developer experience for as long as there has been a Mozilla build system. Getting started with Firefox development was a rite of passage. It required following (often out-of-date) directions on MDN. It required finding pages through MDN search or asking other people for info over IRC. It was the kind of process that turned away potential contributors because it was just too damn hard.

mach - while born out of my initial efforts to radically change the build system proper - morphed into a generic command dispatching framework by the time it landed in mozilla-central. It has one overarching purpose: provide a single gateway point for performing common developer tasks (such as building the tree and running tests). The concept was nothing new - individual developers had long coded up scripts and tools to streamline workflows. Some even published these for others to use. What set mach apart was a unified interface for these commands (the mach script in the top directory of a checkout) and that these productivity gains were in the tree and thus easily discoverable and usable by everybody without significant effort (just run mach help).

While mach doesn't yet satisfy everyone's needs, it's slowly growing new features and making developers' lives easier with every one. All of this is happening despite that there is not a single person tasked with working on mach full time. Until a few months ago, mach was largely my work. Recently, Matt Brubeck has been contributing a flurry of enhancements - thanks Matt! Ehsan Akhgari and Nicholas Alexander have contributed a few commands as well! There are also a few people with a single command to their name. This is fulfilling my original vision of facilitating developers to scratch their own itches by contributing mach commands.

I've noticed more people referencing mach in IRC channels. And, more people get angry when a mach command breaks or changes behavior. So, I consider the mach experiment a success. Is it perfect, no. If it's not good enough for you, please file a bug and/or code up a patch. If nothing else, please tell me: I love to know about everyone's subtle requirements so I can keep them in mind when refactoring the build system and hacking on mach.

The object directory is a black box

One of the ideas I'm trying to advance is that the object directory should be considered a black box for the majority of developers. In my ideal world, developers don't need to look inside the object directory. Instead, they interact with it through condoned and supported tools (like mach).

I say this for a few reasons. First, as the build config module owner I would like the ability to massively refactor the internals of the object directory without disrupting workflows. If people are interacting directly with the object directory, I get significant push back if things change. This inevitably holds back much-needed improvements and triggers resentment towards me, build peers, and the build system. Not a good situation. Whereas if people are indirectly interacting with the object directory, we simply need to maintain a consistent interface (like mach) and nobody should care if things change.

Second, I believe that the methods used when directly interacting with the object directory are often sub-par compared with going through a more intelligent tool and that productivity suffers as a result. For example, when you type make in inside the object directory you need to know to pass -j8, use make vs pymake, and that you also need to build toolkit/library, etc. Also, by invoking make directly, you bypass other handy features, such as automatic compiler warning aggregation (which only happens if you invoke the build system through mach). If you go through a tool like mach, you should automatically get the most ideal experience possible.

In order for this vision to be realized, we need massive improvements to tools like mach to cover the missing workflows that still require direct object directory interaction. We also need people to start using mach. I think increased mach usage comes after mach has established itself as obviously superior to the alternatives (I already believe it offers this for tasks like running tests).

I don't want to force mach upon people but...

Nobody likes when they are forced to change a process that has been familiar for years. Developers especially. I get it. That's why I've always attempted to position mach as an alternative to existing workflows. If you don't like mach, you can always fall back to the previous workflow. Or, you can improve mach (patches more than welcome!). Having gone down the please-use-this-tool-it's-better road before at other organizations, I strongly believe that the best method to incur adoption of a new tool is to gradually sway people through obvious superiority and praise (as opposed to a mandate to switch). I've been trying this approach with mach.

Lately, more and more people have been saying things like we should have the build infrastructure build through mach instead of client.mk and why do we need testsuite-targets.mk when we have mach commands. While I personally feel that client.mk and testsuite-targets.mk are antiquated as a developer-facing interface compared to mach, I'm reluctant to eliminate them because I don't like forcing change on others. That being said, there are compelling reasons to eliminate or at least refactor how they work.

Let's take testsuite-targets.mk as an example. This is the make file that provides the targets to run tests (like make xpcshell-test and make mochitest-browser-chrome). What's interesting about this file is that it's only used in local builds: our automation infrastructure does not use testsuite-targets.mk! Instead, mozharness and the old buildbot configs manually build up the command used to invoke the test harnesses. Initially, the mach commands for running tests simply invoked make targets defined in testsuite-targets.mk. Lately, we've been converting the mach commands to invoke the Python test runners directly. I'd argue that the logic for invoke the test runner only needs to live in one place in the tree. Furthermore as a build module peer, I have little desire to support multiple implementations. Especially considering how fragile they can be.

I think we're trending towards an outcome where mach (or the code behind mach commands) transitions into the authoratitive invocation method and legacy interfaces like client.mk and testsuite-targets.mk are reimplemented to either call mach commands or the same routine that powers them. Hopefully this will be completely transparent to developers.

The future of mozconfigs and environment configuration

mozconfig files are shell scripts used to define variables consumed by the build system. They are the only officially supported mechanism for configuring how the build system works.

I'd argue mozconfig files are a mediocre solution at best. First, there's the issue of mozconfig statements that don't actually do anything. I've seen no-op mozconfig content cargo culted into the in-tree mozconfigs (used for the builder configurations)! Oops. Second, doing things in mozconfig files is just awkward. Defining the object directory requires mk_add_options MOZ_OBJDIR=some-path. What's mk_add_options? If some-path is relative, what is it relative to? While certainly addressable, the documentation on how mozconfig files work is not terrific and fails to explain many pitfalls. Even with proper documentation, there's still the issue of the file format allowing no-op variable assignments to persist.

I'm very tempted to reinvent build configuration as something not mozconfigs. What exactly, I don't know. mach has support for ini-like configuration files. We could certainly have mach and the build system pull configs from the same file.

I'm not sure what's going to happen here. But deprecating mozconfig files as they are today is part of many of the options.

Handling multiple mozconfig files

A lot of developers only have a single mozconfig file (per source tree at least). For these developers, life is easy. You simply install your mozconfig in one of the default locations and it's automagically used when you use mach or client.mk. Easy peasy.

I'm not sure what the relative numbers are, but many developers maintain multiple mozconfig files per source tree. e.g. they'll have one mozconfig to build desktop Firefox and another one for Android. They may have debug variations of each.

Some developers even have a single mozconfig file but leverage the fact that mozconfig files are shell scripts and have their mozconfig dynamically do things depending on the current working directory, value of an environment variable, etc.

I've also seen wrapper scripts that glorify setting environment variables, changing directory, etc and invoke a command.

I've been thinking a lot about providing a common and well-supported solution for switching between active build configurations. Installing mach on $PATH goes a long way to facilitate this. If you are in an object directory, the mozconfig used when that object directory was created is automatically applied. Simple enough. However, I want people to start treating object directories as black boxes. So, I'd rather not see people have their shell inside the object directory.

Whenever I think about solutions, I keep arriving at a virtualenv-like solution. Developers would potentially need to activate a Mozilla build environment (similar to how Windows developers need to launch MozillaBuild). Inside this environment, the shell prompt would contain the name of the current build configuration. Users could switch between configurations using mach switch or some other magic command on the $PATH.

Truth be told, I'm skeptical if people would find this useful. I'm not sure it's that much better than exporting the MOZCONFIG environment variable to define the active config. This one requires more thought.

The integration between the build environment and Python

We use Python extensively in the build system and for common developer tasks. mach is written in Python. moz.build processing is implemented in Python. Most of the test harnesses are written in Python.

Doing practically anything in the tree requires a Python interpreter that knows about all the Python code in the tree and how to load it.

Currently, we have two very similar Python environments. One is a virtualenv created while running configure at the beginning of a build. The other is essentially a cheap knock-off that mach creates when it is launched.

At some point I'd like to consolidate these Python environments. From any Python process we should have a way to automatically bootstrap/activate into a well-defined Python environment. This certainly sounds like establishing a unified Python virtualenv used by both the build system and mach.

Unfortunately, things aren't straightforward. The virtualenv today is constructed in the object directory. How do we determine the current object directory? By loading the mozconfig file. How do we do that? Well, if you are mach, we use Python. And, how does mach know where to find the code to load the mozconfig file? You can see the dilemma here.

A related issue is that of portable build environments. Currently, a lot of our automation recreates the build system's virtualenv from its own configuration (not that from the source tree). This has and will continue to bite us. We'd really like to package up the virtualenv (or at least its config) with tests so there is no potential for discrepancy.

The inner workings of how we integrate with Python should be invisible to most developers. But, I figured I'd capture it here because it's an annoying problem. And, it's also related to an activated build environment. What if we required all developers to activate their shell with a Mozilla build environment (like we do on Windows)? Not only would this solve Python issues, but it would also facilitate simpler config switching (outlined above). Hmmm...

Direct interaction with the build system considered harmful

Ever since there was a build system developers have been typing make (or make.py) to build the tree. One of the goals of the transition to moz.build files is to facilitate building the tree with Tup. make will do nothing when you're not using Makefiles! Another goal of the moz.build transition is to start derecursifying the make build system such that we build things in parallel. It's likely we'll produce monolithic make files and then process all targets for a related class IDLs, C++ compilation, etc in one invocation of make. So, uh, what happens during a partial tree build? If a .cpp file from /dom/src/storage is being handled by a monolithic make file invoked by the Makefile at the top of the tree, how does a partial tree build pick that up? Does it build just that target or every target in the monolithic/non-recursive make file?

Unless the build peers go out of our way to install redundant targets in leaf Makefiles, directly invoking make from a subdirectory of the tree won't do what it's done for years.

As I said above, I'm sympathetic to forced changes in procedure, so it's likely we'll provide backwards-compatibile behavior. But, I'd prefer to not do it. I'd first prefer partial-tree builds are not necessary and a full tree build finishes quickly. But, we're not going to get there for a bit. As an alternative, I'll take people building through mach build. That way, we have an easily extensible interface on which to build partial tree logic. We saw this recently when dumbmake/smartmake landed. And, going through mach also reinforces my ideal that the object directory is a black box.

Semi-persistent state

Currently, most state as it pertains to a checkout or build is in the object directory. This is fine for artifacts from the build system. However, there is a whole class of state that arguably shouldn't be in the object directory. Specifically, it shouldn't be clobbered when you rebuild. This includes logs from previous builds, the warnings database, previously failing tests, etc. The list is only going to grow over time.

I'd like to establish a location for semi-persistant state related to the tree and builds. Perhaps we change the clobber logic to ignore a specific directory. Perhaps we start storing things in the user's home directory. Perhaps we could establish a second object directory named the state directory? How would this interact with build environments?

This will probably sit on the backburner until there is a compelling use case for it.

The battle against C++

Compiling C++ consumes the bulk of our build time. Anything we can do to speed up C++ compilation will work wonders for our build times.

I'm optimistic things like precompiled headers and compiling multiple .cpp files with a single process invocation will drastically decrease build times. However, no matter how much work we put in to make C++ compilation faster, we still have a giant issue: dependency hell.

As shown in my build system presentation a few months back, we have dozens of header files included by hundreds if not thousands of C++ files. If you change one file: you invalidate build dependencies and trigger a rebuild. This is why whenever files like mozilla-config.h change you are essentially confronted with a full rebuild. ccache may help if you are lucky. But, I fear that as long as headers proliferate the way they do, there is little the build system by itself can do.

My attitude towards this is to wait and see what we can get out of precompiled headers and the like. Maybe that makes it good enough. If not, I'll likely be making a lot of noise at Platform meetings requesting that C++ gurus brainstorm on a solution for reducing header proliferation.

Conclusion

Belive it or not, these are only some of the topics floating around in my head! But I've probably managed to bore everyone enough so I'll call it a day.

I'm always interested in opinions and ideas, especially if they are different from mine. I encourage you to leave a comment if you have something to say.

Bulk Analysis of Mozilla's Build and Test Data

April 01, 2013 at 01:12 PM | categories: Mozilla, Firefox

When you push code changes to Firefox and other similar Mozilla projects, a flood of automated jobs is triggered on Mozilla's infrastructure. It works like any other continuous integration system. First you build, then you run tests, etc. What sets it apart from other continuous integration systems is the size: Mozilla runs thousands of jobs per week and the combined output sums into the tens of gigabytes.

Most of the data from Mozilla's continuous integration is available on public servers, notably ftp.mozilla.org. This includes compiled binaries, logs, etc.

While there are tools that can sift through this mountain of data (like TBPL), they don't allow ad-hoc queries over the raw data. Furthermore, these tools are very function-specific and there are many data views they don't expose. This missing data has always bothered me because, well, there are cool and useful things I'd like to do with this data.

This itch has been bothering me for well over a year. The persistent burning sensation coupled with rain over the weekend caused me to scratch it.

The product of my weekend labor is a system facilitating bulk storage and analysis of Mozilla's build data. While it's currently very alpha, it's already showing promise for more throrough data analysis.

Essentially, the tool works by collecting the dumps of all the jobs executed on Mozilla's infrastructure. It can optionally supplement this with the raw logs from those jobs. Then, it combs through this data, extracts useful bits, and stores them. Once the initial fetching has completed, you simply need to re-"parse" the data set into useful data. And, since all data is stored locally, the performance of this is not bound by Internet bandwidth. In practice, this means that you can obtain a new metric faster than would have been required before. The downside is you will likely be storing gigabytes of raw data locally. But, disks are cheap. And, you have control over what gets pulled in, so you can limit it to what you need.

Please note the project is very alpha and is only currently serving my personal interests. However, I know there is talk about TBPL2 and what I have built could evolve into the data store for the next generation TBPL tool. Also, most of the work so far has centered on data import. There is tons of analysis code waiting to be written.

If you are interested in improving the tool, please file a GitHub pull request.

I hope to soon blog about useful information I've obtained through this tool.

Omnipresent mach

March 03, 2013 at 12:30 PM | categories: Mozilla, Firefox, mach

Matt Brubeck recently landed an awesome patch for mach in bug 840588: it allows mach to be used by any directory. I'm calling it omnipresent mach.

Essentially, Matt changed the mach driver (the script in the root directory of mozilla-central) so instead of having it look in hard-coded relative paths for all its code, it walks up the directory tree and looks for signs of the source tree or the object directory.

What this all means is that if you have the mach script installed in your $PATH and you just type mach in your shell from within any source directory or object directory, mach should just work. So, no more typing ./mach: just copy mach to ~/bin, /usr/local/bin or some other directory on your $PATH and you should just be able to type mach.

Unfortunately, there are bound to be bugs here. Since mach traditionally was only executed with the current working directory as the top source directory, some commands are not prepared to handle a variable current working directory. Some commands will likely get confused when it comes resolving relative paths, etc. If you find an issue, please report it! A temporary workaround is to just invoke mach from the top source directory like you've always been doing.

If you enjoy the feature, thank Matt: this was completely his idea and he saw it through from conception to implementation.

Mercurial Pushlog Is Now Robust Against Interrupts

Using Docker to Build Firefox

Building Firefox with Docker

Future use at Mozilla

Conclusion

Mozilla Build System Brain Dump

The sad state of build system interaction and the history of mach

The object directory is a black box

I don't want to force mach upon people but...

The future of mozconfigs and environment configuration

Handling multiple mozconfig files

The integration between the build environment and Python

Direct interaction with the build system considered harmful

Semi-persistent state

The battle against C++

Conclusion

Bulk Analysis of Mozilla's Build and Test Data

Omnipresent mach

Categories