Distributed Compiling and Firefox

October 31, 2013 at 11:35 AM | categories: Mozilla, build system

If you had infinite CPU cores available and the Firefox build system could distribute them all for concurrent compilation, Firefox clobber build times would likely be 3-5 minutes instead of ~15 minutes on modern machines. This is a massive win. It therefore should come as no surprise that distributed compiling is very interesting to us.

Up until recently, the benefits of distributed compiling in the Firefox build system couldn't be fully realized. This was because the build system was performing recursive make traversal and make only knew about a tiny subset of the tree's total C++ files at one time. For example, when visiting /layout/base it only knew about 35 of the close to 6000 files that get compiled as part of building Firefox. This meant there was a hard ceiling to the max concurrency the build system could achieve. This ceiling was often higher than the number of cores in an individual machine, so it wasn't a huge issue for single machine builds. But it did significantly limit the benefits of distributed compiling. This all changed recently.

As of a few weeks ago, the build system no longer encounters a low ceiling preventing distributed compilation from reaping massive benefits. If you have build with make -j128, make will spawn 128 compiler processes when processing the compile tier (which is where most compilation occurs). If your compiler is set to a distributed compiler, you will win.

So, what should you do about it?

I encourage people to set up distributed compilation networks to reap the benefits of distributed compilation. Here are some tools you should know about and some things to keep in mind.

distcc is the tried and proven tool for performing distributed compilation. It's heavily used and gets the job done. It even works on Windows and can perform remote processing, which is a huge win for our tree, where preprocessing can be computationally expensive because of excessive includes. But, it has a few significant drawbacks. Read the next paragraph.

I'm personally more excited about icecream. It has some very compelling advantages to distcc. It has a scheduler that can intelligently distribute load between machines. It uses network broadcast to discover the scheduler. So, you just start the client daemon and if there is a scheduler on the local network, it's all set up. Icecream transfers the compiler toolchain between nodes so you are guaranteed to have consistent output. (With distcc, output may not be idempotent if the nodes aren't homogenous since distcc relies on the system-local toolchain. If different versions are installed on different nodes, you are out of luck). Icecream also supports cross-compiling. In theory, you can have Linux machines building for OS X, 32-bit machines building for 64-bit, etc. This is all very difficult (if not impossible) to do with distcc. Unfortunately, icecream doesn't work on Windows and doesn't appear to support server-side preprocessing. Although, I imagine both could be made to work if someone put in the effort.

Distributed compilation is very network intensive. I haven't measured, but I suspect Wi-Fi bandwidth and latency constraints might make it prohibitive there. It certainly won't be good for Wi-Fi saturation! If you are in a Mozilla office, please do not attempt to perform distributed compilation over Wi-Fi! For the same reasons, distributed compilation will likely not benefit you if you are attempting to compile on network-distant nodes.

I have set up an icecream server in the Mozilla San Francisco office. If you install the icecream client daemon (iceccd) on your machine, it should just work. I'm not sure what broadcast nets are configured as, but I've successfully had machines on the 7th floor discover it automatically. I guarantee no SLA for this server. Ping me privately if you have difficulty connecting.

I've started very preliminary talks with Mozilla IT about setting up dedicated compiler farms in Mozilla offices. I'm not saying this is coming any time soon. I feel this will have a major impact on developer productivity and I wanted to get the ball rolling months in advance so nobody can claim this is a fire drill.

For distributed compilation to work well, the build system really needs to be aware of distributed compilation. For example, to yield the benefits of distributed compilation with make, you need to pass -j64 or some other large value for concurrency. However, this value would be universal for every task in the build. There are still thousands of processes that must run locally. Using -j64 on these local tasks could cause memory exhaustion, I/O saturation, excessive context switching, etc. But if you decrease the concurrency ceiling, you lose the benefits of distributed compilation! The build system thus needs to be taught when distributed compilation is available and what tasks can be made concurrent so it can intelligently adjust the -j concurrency limit at run-time. This is why we have a higher-level build wrapper tool: mach build. (This is another reason why people should be building through mach instead of invoking make directly.)

No matter what technical solution we employ, I would like the build system to automatically discover and use distributed compilation if it is available. If we need to hardcode Mozilla IP addresses or hostnames into the build system, I'm fine with that. I just don't want developers not achieving much-faster build times because they are ignorant. If you are in a physical location with distributed compilation support, you should get that automatically: fast builds should not be hard.

We can and should investigate distributed compilation as part of release automation. Icecream should mitigate the concerns about build reproducibility since the toolchain is transferred at build time.

I have had success getting Icecream to work with Linux builds. However, OS X is problematic. Specifically, Icecream is unable to create the build environment for distribution (likely modern OS X/Xcode compatibility issue). Details are in bug 927952.

Build peers have a lot on our plate this quarter and making distributed compilation work well is not in our official goals. I would love, love, love if someone could step up and be a hero to make distributed compilation work better with the build system. If you are interested, pop into #build on irc.mozilla.org.

In summary, there are massive developer productivity wins waiting to be realized through distributed compiling. There is nobody tasked to work on this officially. Although, I'd love it if there were. If you find yourself setting up ad-hoc networks in offices, I'd really like to see some kind of discovery in mach. If not, there will be people left behind and that really stinks for those individuals. If you do any work around distributed compiling, please have it tracked under bug 485559.