Many developers use MacBook Pros for day-to-day Firefox development. So, I thought it would be worthwhile to perform a comparison of Firefox build times for various models of MacBook Pros.
The numbers in this post are obtained from 3 generations of MacBook Pros:
A 2011 Sandy Bridge 4 core x 2.3 GHz with 8 GB RAM and an aftermarket SSD.
A 2012 Ivy Bridge retina with 4 core x 2.6 GHz, 16 GB RAM, and a factory SSD (or possibly flash storage).
A 2013 Haswell retina with 4 core x 2.6 GHz, 16 GB RAM, and flash storage.
All machines were running OS X 10.9 Mavericks and were using the Xcode 5.0.1 toolchain (Xcode 5 clang: Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)) to build.
The power settings prevented machine sleep and machines were plugged into A/C power during measuring. I did not use the machines while obtaining measurements.
The 2012 and 2013 machines were very vanilla OS installs. However, the 2011 machine was my primary work computer and may have had a few background services running and may have been slower due to normal wear and tear. The 2012 machine was a loaner machine from IT and has an unknown history.
All data was obtained from mozilla-central revision d4a27d8eda28.
The mozconfig used contained:
export MOZ_PSEUDO_DERECURSE=1 mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-firefox.noindex
Please note that the objdir name ends with .noindex to prevent Finder from indexing build files.
I performed all tests multiple times and used the fastest time. I used time command for obtaining measurements of wall, user, and system time.
The result of mach configure is as follows:
|Machine||Wall time||User time||System time|
Clobber build no ccache
mach build was performed after running mach configure. ccache was not enabled.
|Machine||Wall time||User time||System time||Total CPU time|
|2011||22:29 (1349)||145:35 (8735)||12:03 (723)||157:38 (9458)|
|2012||15:00 (900)||94:18 (5658)||8:14 (494)||102:32 (6152)|
|2013||11:13 (673)||69:55 (4195)||6:04 (364)||75:59 (4559)|
Clobber build with empty ccache
mach build was performed after running mach configure. ccache was enabled. The ccache ccache was cleared before running mach configure.
|Machine||Wall time||User time||System time||Total CPU time|
|2011||25:57 (1557)||161:30 (9690)||18:21 (1101)||179:51 (10791)|
|2012||16:58 (1018)||104:50 (6290)||12:32 (752)||117:22 (7042)|
|2013||12:59 (779)||79:51 (4791)||9:24 (564)||89:15 (5355)|
Clobber build with populated ccache
mach build was performed after running mach configure. ccache was enabled and the ccache was populated with the results of a prior build. In theory, all compiler invocations should be serviced by ccache entries.
This measure is a very crude way to measure how fast clobber builds would be if compiler invocations were nearly instantaneous.
|Machine||Wall time||User time||System time|
|2011||3:59 (239)||8:04 (484)||3:21 (201)(|
|2012||3:11 (191)||6:45 (405)||2:53 (173)|
|2013||2:31 (151)||5:22 (322)||2:12 (132)|
mach build was performed on a tree that was already built.
|Machine||Wall time||User time||System time|
|2011||1:58 (118)||2:25 (145)||0:41 (41)|
|2012||1:42 (102)||2:02 (122)||0:37 (37)|
|2013||1:20 (80)||1:39 (99)||0:28 (28)|
mach build binaries was performed on a fully built tree. This results in nothing being executed. It's a way to test the overhead of the binaries make target.
|Machine||Wall time||User time||System time|
binaries touch single .cpp
mach build binaries was performed on a fully built tree after touching the file netwerk/dns/nsHostResolver.cpp. ccache was enabled but cleared before running this test. This test simulates common C++ developer workflow of changing C++ and recompiling.
|Machine||Wall time||User time||System time|
The times of each build system tier were measured on the 2013 Haswell MacBook Pro. These timings were obtained out of curiosity to help isolate the impact of different parts of the build. ccache was not enabled for these tests.
|Action||Wall time||User time||System time||Total CPU time|
|compile clobber||9:01 (541)||64:58 (3898)||5:08 (308)||70:06 (4206)|
|libs clobber||1:34 (94)||2:15 (135)||0:39 (39)||2:54 (174)|
Observations and conclusions
The data speaks for itself: the 2013 Haswell MacBook Pro is significantly faster than its predecessors. It clocks in at 2x faster than the benchmarked 2011 Sandy Bridge model (keep in mind the 300 MHz base clock difference) and is ~34% faster than the 2012 Ivy Bridge (at similar clock speed). Personally, I was surprised by this. I was expecting speed improvements over Ivy Bridge, but not 34%.
It should go without saying: if you have the opportunity to upgrade to a new, Haswell-based machine: do it. If possible, purchase the upgrade to a 2.6 GHz CPU, as it contains ~13% more MHz than the base 2.3 GHz model: this will make a measurable difference in build times.
It's worth noting the increased efficiency of Haswell over its predecessors. The total CPU time required to build decreased from ~158 minutes to ~103 minutes to 76 minutes! That 76 minute number is worth highlighting because it means if we get 100% CPU saturation during builds, we'll be able to build the tree in under 10 wall time minutes!
I hadn't performed crude benchmarks of high-level build system actions since the MOZ_PSEUDO_DERECURSE work landed and I wanted to use the opportunity of this hardware comparison to grab some numbers.
The overhead of ccache continues to surprise me. On the 2013 machine, enabling ccache increased the wall time of a clobber build by 1:46 and added 13:16 of CPU time. This is an increase of 16% and 17%, respectively.
It's worth highlighting just how much time is spent compiling C/C++. In our artificial tier measuring results, our clobber build time was ~660 wall time seconds (11 minutes) and used ~4473s CPU time (74:33). Of this, 9:01 wall time and 70:06 CPU time was spent compiling C/C++. This represents ~82% wall time and ~94% CPU time! Please note this does not include linking. Anything we can do to decrease the CPU time used by the compiler will make the build faster.
I also found it interesting to note variances in obtained times. Even on my brand new 2013 Haswell MacBook Pro where I know there aren't many background processes running, wall times could vary significantly. I think I isolated it to CPU bursting and heat issues. If I wait a few minutes between CPU intensive tests, results are pretty consistent. But if I perform CPU intensive tests back-to-back, the run times often vary. The only other thing coming into play could be page caching or filesystem indexing. I accounted for the latter by disabling Finder on the object directory. And, I'd like to think that flash storage is fast enough to remove I/O latency from the equation. Who knows. At the end of the day, laptops aren't servers and OS X is a consumer OS, so I don't expect ultra consistency.
Finally, I want to restate just how fast Haswell is. If you have the opportunity to upgrade, do it.
If you had infinite CPU cores available and the Firefox build system could distribute them all for concurrent compilation, Firefox clobber build times would likely be 3-5 minutes instead of ~15 minutes on modern machines. This is a massive win. It therefore should come as no surprise that distributed compiling is very interesting to us.
Up until recently, the benefits of distributed compiling in the Firefox build system couldn't be fully realized. This was because the build system was performing recursive make traversal and make only knew about a tiny subset of the tree's total C++ files at one time. For example, when visiting /layout/base it only knew about 35 of the close to 6000 files that get compiled as part of building Firefox. This meant there was a hard ceiling to the max concurrency the build system could achieve. This ceiling was often higher than the number of cores in an individual machine, so it wasn't a huge issue for single machine builds. But it did significantly limit the benefits of distributed compiling. This all changed recently.
As of a few weeks ago, the build system no longer encounters a low ceiling preventing distributed compilation from reaping massive benefits. If you have build with make -j128, make will spawn 128 compiler processes when processing the compile tier (which is where most compilation occurs). If your compiler is set to a distributed compiler, you will win.
So, what should you do about it?
I encourage people to set up distributed compilation networks to reap the benefits of distributed compilation. Here are some tools you should know about and some things to keep in mind.
distcc is the tried and proven tool for performing distributed compilation. It's heavily used and gets the job done. It even works on Windows and can perform remote processing, which is a huge win for our tree, where preprocessing can be computationally expensive because of excessive includes. But, it has a few significant drawbacks. Read the next paragraph.
I'm personally more excited about icecream. It has some very compelling advantages to distcc. It has a scheduler that can intelligently distribute load between machines. It uses network broadcast to discover the scheduler. So, you just start the client daemon and if there is a scheduler on the local network, it's all set up. Icecream transfers the compiler toolchain between nodes so you are guaranteed to have consistent output. (With distcc, output may not be idempotent if the nodes aren't homogenous since distcc relies on the system-local toolchain. If different versions are installed on different nodes, you are out of luck). Icecream also supports cross-compiling. In theory, you can have Linux machines building for OS X, 32-bit machines building for 64-bit, etc. This is all very difficult (if not impossible) to do with distcc. Unfortunately, icecream doesn't work on Windows and doesn't appear to support server-side preprocessing. Although, I imagine both could be made to work if someone put in the effort.
Distributed compilation is very network intensive. I haven't measured, but I suspect Wi-Fi bandwidth and latency constraints might make it prohibitive there. It certainly won't be good for Wi-Fi saturation! If you are in a Mozilla office, please do not attempt to perform distributed compilation over Wi-Fi! For the same reasons, distributed compilation will likely not benefit you if you are attempting to compile on network-distant nodes.
I have set up an icecream server in the Mozilla San Francisco office. If you install the icecream client daemon (iceccd) on your machine, it should just work. I'm not sure what broadcast nets are configured as, but I've successfully had machines on the 7th floor discover it automatically. I guarantee no SLA for this server. Ping me privately if you have difficulty connecting.
I've started very preliminary talks with Mozilla IT about setting up dedicated compiler farms in Mozilla offices. I'm not saying this is coming any time soon. I feel this will have a major impact on developer productivity and I wanted to get the ball rolling months in advance so nobody can claim this is a fire drill.
For distributed compilation to work well, the build system really needs to be aware of distributed compilation. For example, to yield the benefits of distributed compilation with make, you need to pass -j64 or some other large value for concurrency. However, this value would be universal for every task in the build. There are still thousands of processes that must run locally. Using -j64 on these local tasks could cause memory exhaustion, I/O saturation, excessive context switching, etc. But if you decrease the concurrency ceiling, you lose the benefits of distributed compilation! The build system thus needs to be taught when distributed compilation is available and what tasks can be made concurrent so it can intelligently adjust the -j concurrency limit at run-time. This is why we have a higher-level build wrapper tool: mach build. (This is another reason why people should be building through mach instead of invoking make directly.)
No matter what technical solution we employ, I would like the build system to automatically discover and use distributed compilation if it is available. If we need to hardcode Mozilla IP addresses or hostnames into the build system, I'm fine with that. I just don't want developers not achieving much-faster build times because they are ignorant. If you are in a physical location with distributed compilation support, you should get that automatically: fast builds should not be hard.
We can and should investigate distributed compilation as part of release automation. Icecream should mitigate the concerns about build reproducibility since the toolchain is transferred at build time.
I have had success getting Icecream to work with Linux builds. However, OS X is problematic. Specifically, Icecream is unable to create the build environment for distribution (likely modern OS X/Xcode compatibility issue). Details are in bug 927952.
Build peers have a lot on our plate this quarter and making distributed compilation work well is not in our official goals. I would love, love, love if someone could step up and be a hero to make distributed compilation work better with the build system. If you are interested, pop into #build on irc.mozilla.org.
In summary, there are massive developer productivity wins waiting to be realized through distributed compiling. There is nobody tasked to work on this officially. Although, I'd love it if there were. If you find yourself setting up ad-hoc networks in offices, I'd really like to see some kind of discovery in mach. If not, there will be people left behind and that really stinks for those individuals. If you do any work around distributed compiling, please have it tracked under bug 485559.
As we look ahead to Q4 planning for the Firefox build system, I wanted to take the time to reflect on what was accomplished in Q3 and to simultaneously look forward to Q4 and beyond.
2013 Q3 Build System Improvements
There were notable improvements in the build system during the last quarter.
The issues our customers care most about is speed. Here is a list of accomplishments in that area:
MOZ_PSEUDO_DERECURSE work to change how make directory traversal works. This enabled the binaries make target, which can do no-op libxul-only builds in just a few seconds. Of all the changes that landed this quarter, this is the most impactful to local build times. This change also enables C++ compilation to scale out to as many cores as you have. Previously, the build system was starved in many parts of the tree when compiling C++. Mike Hommey is responsible for this work. I reviewed most of it.
WebIDL and IPDL bindings are now compiled in unified mode, reducing compile times and linker memory usage. Nathan Froyd wrote the code. I reviewed the patches.
XPIDL files are generated much more efficiently. This removed a few minutes of CPU core time from builds. I wrote these patches and Mike Hommey reviewed.
Increased reliance on install manifests to process file installs. They have drastically reduced the number of processes required to build by performing all actions inside Python processes as system calls and removing the clownshoes of having to delete parts of the object directory at the beginning of builds. When many mochitests were converted to manifests, no-op build times dropped by ~15% on my machine. Many people are responsible for this work. Mike Hommey wrote the original install code for packaging a few months ago. I built in manifest file support, support for symlinks, and made the code a bit more robust and faster. Mike Hommey reviewed these patches.
Many bugs and issues around dependency files on Windows have been discovered and fixed. These were a common source of clobbers. Mike Hommey found most of these, many during his work to make MOZ_PSEUDO_DERECURSE work.
The effort to reduce C++ include hell is resulting in significantly shorter incremental builds. While this effort is largely outside the build config module, it is worth mentioning. Ehsan Akhgari is leading this effort. He's been assisted by too many people to mention.
The build system now has different build modes favoring faster building vs release build options depending on the environment. Mike Hommey wrote most (all?) of the patches.
A number of other non-speed related improvements have been made:
The build system now monitors resource usage during builds and can graph the results. I wrote the code. Ted Mielczarek, Mike Hommey, and Mike Shal had reviews.
Support for test manifests has been integrated with the build system. This enabled some build speed wins and is paving the road for better testing UX, such as the automagical mach test command, which will run the appropriate test suite automatically. Multiple people were involved in the work to integrate test manifests with the build system. I wrote the patches. But Ted Mielczarek got primary review. Joel Maher, Jeff Hammel, and Ms2ger provided excellent assistance during the design and implementation phase. The work around mochitest manifests likely wouldn't have happened this quarter if all of us weren't attending an A*Team work week in August.
There are now in-tree build system docs. They are published automatically. Efforts have been made to purge MDN of cruft. I am responsible for writing the code and most of the docs. Benjamin Smedberg and Mike Shal performed code reviews.
Improvements have been made to object directory detection in mach. This was commonly a barrier to some users using mach. I am responsible for the code. Nearly every peer has reviewed patches.
We now require Python 2.7.3 to build, making our future Python 3 compatibility story much easier while eliminating a large class of Python 2.7.2 and below bugs that we constantly found ourselves working around.
mach bootstrap has grown many new features and should be more robust than ever. There are numerous contributors here, including many community members that have found and fixed bugs and have added support for additional distributions.
The boilerplate from Makefile.in has disappeared. Mike Hommey is to thank.
dumbmake integrated with mach. Resulted in friendlier build interface for a nice UX win. Code by Nick Alexander. I reviewed.
Many variables have been ported from Makefile.in to moz.build. We started Q3 with support for 47 variables and now support 73. We started with 1226 Makefile.in and 1517 moz.build and currently have 941 Makefile.in and 1568 moz.build. Many people contributed to this work. Worth mentioning are Joey Armstrong, Mike Shal, Joshua Cranmer, and Ms2ger.
Many build actions are moving to Python packages. This enabled pymake inlining (faster builds) and is paving the road towards no .pyc files in the source directory. (pyc files commonly are the source of clobber headaches and make it difficult to efficiently perform builds on read-only filesystems.) I wrote most of the patches and Mike Shal and Mike Hommey reviewed.
moz.build is now more strict about what it accepts. We check for missing files at config parse time rather than build time, causing errors to surface faster. Many people are responsible for this work. Mike Shal deserves kudos for work around C/C++ file validation.
mach has been added to the B2G repo. Jonathan Griffin and Andrew Halberstadt drove this.
Current status of the build bystem
Q3 was a very significant quarter for the build system. For the first time in years, we made fundamental changes to how the build system goes about building. The moz.build work to free our build config from the shackles of make files had enabled us to consume that data and do new and novel things with it. This has enabled improvements in build robustness and - most importantly - speed.
This is most evident with the MOZ_PSEUDO_DERECURSE work, which effectively replaces how make traverses directories. The work there has allowed Gecko developers focused on libxul to go from e.g. 50s no-op build times to less than 5s. Combined with optimized building of XPIDL, IPDL, and WebIDL files, processing of file installs via manifests, and C++ header dependency reduction, and a host of other changes, and we are finally turning a corner on build times! Much of this work wouldn't have been possible without moz.build files providing a whole world view of our build config.
The quarter wasn't all roses. Unfortunately, we also broke things. A lot. The total number of required clobbers this quarter grew slightly from 38 in Q2 to 43 in Q3. Many of these clobbers were regressions from supposed improvements to the build system. Too many of these regressions were Windows/pymake only and surely would have been found prior to landing if more build peers were actively building on Windows. There are various reasons we aren't. We should strive to fix them so more build development occurs on Windows and Windows users aren't unfairly punished.
The other class of avoidable clobbers mostly revolves around the theme that the build system is complicated, particularly when it comes to integration with release automation. Build automation has its build logic currently coded in Buildbot config files. This means it's all but impossible for build peers to test and reproduce that build environment and flow without time-intensive, stop-energy abundant excessive try pushes or loading out build slaves. The RelEng effort to extract this code from buildbot to mozharness can't come soon enough. See my overview on how automation works for more.
This quarter, the sheriffs have been filing bugs whenever a clobber is needed. This has surfaced clobber issues to build peers better and I have no doubt their constant pestering caused clobber issues to be resolved sooner. It's a terrific incentive for us to fix the build system.
I have mixed feelings on the personnel/contribution front in Q3. Kyle Huey no longer participates in active build system development or patch review. Ted Mielczarek is also starting to drift away from active coding and review. Although, he does constantly provide knowledge and historical context, so not all is lost. It is disappointing to see fantastic people and contributors no longer actively participating on the coding front. But, I understand the reasons behind it. Mozilla doesn't have a build team with a common manager and decree (a mistake if you ask me). Ted and Kyle are both insanely smart and talented and they work for teams that have other important goals. They've put in their time (and suffering). So I see why they've moved on.
On the plus side, Mike Hommey has been spending a lot more time on build work. He was involved in many of the improvements listed above. Due to review load and Mike's technical brilliance, I don't think many of our accomplishments would have happened without him. If there is one Mozillian who should be commended for build system work in Q3, it should be Mike Hommey.
Q3 also saw the addition of new build peers. Mike Shal is now a full build config module peer. Nick Alexander is now a peer of a submodule covering just the Fennec build system. Aside from his regular patch work, Mike Shal has been developing his review skills and responsibilities. Without him, we would likely be drowning in review requests and bug investigations due to the departures of Kyle and Ted. Nick is already doing what I'd hope he'd do when put in charge of the Fennec build system: looking at a proper build backend for Java (not make) and Eclipse project generation. (I still can't believe many of our Fennec developers code Java in vanilla text editors, not powerful IDEs. If there is one language that would miss IDEs the most, I'd think it would be Java. Anyway.)
There was a steady stream of contributions from people not in the build config module. Joshua Cranmer has been keeping up with moz.build conversions for comm-central. Nathan Froyd and Boris Zbarsky have helped with all kinds of IDL work. Trevor Saunders has helped keep things clean. Ms2ger has been eager to provide assistance through code and reviews. Various community contributors have helped with moz.build conversion patches and improvements to mach and the bootstrapper. Thank you to everyone who contributed last quarter!
Looking to the future
At the beginning of the quarter, I didn't think it would be possible to attain no-op build speeds with make as quickly as make binaries now does. But, Mike Hommey worked some magic and this is now possible. This was a game changer. The code he wrote can be applied to other build actions. And, our other solutions involving moz.build files to autogenerated make files seems to be working pretty well too. This raises some interesting questions with regards to priortization.
Long term, we know we want to move away from make. It is old and clumsy. It's easy to do things wrong. It doesn't scale to handle a single DAG as large as our build system. The latter is particularly important if we are to ever have a build system that doesn't require clobbers periodically.
Up to this point we've prioritized work on moz.build conversion, with the rationale being that it would more soon enable a clean break from make and thus we'd arrive at drastically faster builds sooner. The assumption in that argument was that drastically faster builds weren't attainable with make. Between the directory traversal overhaul and the release of GNU make 4.0 last week (which actually seems to work on Windows, making the pymake slowness a non-issue), the importance of breaking away from make now seems much less pressing.
While we would like to actively move off make, developments in the past few weeks seem to say that we can reassess priorities. I believe that we can drive down no-op builds with make to a time that satisfies many - let's say under 10s to be conservative. Using clever tricks optimizing for common developer workflows, we can probably get that under 5s everywhere, including Windows (people only caring about libxul can get 2.5s on mozilla-central today). This isn't the 250ms we could get with Tup. But it's much better than 45s. If we got there, I don't think many people would be complaining.
So, the big question for goals setting this quarter will be whether we want to focus on a new build backend (likely Tup) or whether we should continue with an emphasis on make. Now, a lot of the work involved applies to both make and any other build backend. But, I have little doubt it would be less overall work to support one build backend (make) than two. On the other hand, we know we want to support multiple build backends eventually. Why wait? In the balance are numerous other projects that have varying impact for developers and release automation. While important in their own right, it is difficult to balance them against build speed. While we could strive towards instantaneous builds, at some point we'll hit good enough and the diminishing returns that accompany them. There is already a small vocal faction advocating for Ninja support, even though it would only decrease no-op libxul build times from ~2.5s to 250ms. While a factor of 10x improvement, I think this is dangerously close to diminishing returns territory and our time investment would be better spent elsehwere. (Of course, once we can support building libxul with Ninja, we could easily get it for Tup. And, I believe Tup wins that tie.). Anyway, I'm sure it will be an interesting discussion!
Whatever the future holds, it was a good quarter for the build system and the future is looking brighter than ever. We have transitioned from a maintain-and-react mode (which I understand has largely been the norm since the dawn of Firefox) to a proactive and future-looking approach that will satisfy the needs of Firefox and its developers for the next ten years. All of this progress is even more impressive when you consider that we still react to an aweful lot of fire drills and unwanted maintenance!
The Firefox build system is improving. I'm as anxioux as you are to see various milestones in terms of build speed and other features. But it's hard work. Wish us luck. Please help out where you can.
I'd like to make an attempt at delivering regular status updates on the Gecko/Firefox build system and related topics. Here we go with the first instance. I'm sure I missed awesomeness. Ping me and I'll add it to the next update.
MozillaBuild Windows build environment updated
Kyle Huey released version 1.7 of our Windows build environment. It contains a newer version of Python and a modern version of Mercurial among other features.
I highly recommend every Windows developer update ASAP. Please note that you will likely encounter Python errors unless you clobber your build.
New submodule and peers
I used my power as module owner to create a submodule of the build config module whose scope is the (largely mechanical) transition of content from Makefile.in to moz.build files. I granted Joey Armstrong and Mike Shal peer status for this module. I would like to eventually see both elevated to build peers of the main build module.
The following progress has been made:
- Mike Shal has converted variables related to defining XPIDL files in bug 818246.
- Mike Shal converted MODULE in bug 844654.
- Mike Shal converted EXPORTS in bug 846634.
- Joey Armstrong converted xpcshell test manifests in bug 844655.
- Brian O'Keefe converted PROGRAM in bug 862986.
- Mike Shal is about to land conversion of CPPSRCS in bug 864774.
Non-recursive XPIDL generation
In bug 850380 I'm trying to land non-recursive building of XPIDL files. As part of this I'm trying to combine the generation of .xpt and .h for each input .idl file into a single process call because profiling revealed that parsing the IDL consumes most of the CPU time. This shaves a few dozen seconds off of build times.
I have encounterd multiple pymake bugs when developing this patch, which is the primary reason it hasn't landed yet.
I was looking at my build logs and noticed WebIDL generation was taking longer than I thought it should. I filed bug 861587 to investigate making it faster. While my initial profiling turned out to be wrong, Boris Zbarsky looked into things and discovered that the serialization and deserialization of the parser output was extremely slow. He is currently trying to land a refactor of how WebIDL bindings are handled. The early results look very promising.
I think the bug is a good example of the challenges we face improving the build system, as Boris can surely attest.
Test directory reorganization
Joel Maher is injecting sanity into the naming scheme of test directories in bug 852065.
Manifests for mochitests
Jeff Hammel, Joel Maher, Ted Mielczarek, and I are working out using manifests for mochitests (like xpcshell tests) in bug 852416.
Mach core is now a standalone package
Mach now categorizes commands in its help output.
Requiring Python 2.7.3
Now that the Windows build environment ships with Python 2.7.4, I've filed bug 870420 to require Python 2.7.3+ to build the tree. We already require Python 2.7.0+. I want to bump the point release because there are many small bug fixes in 2.7.3, especially around Python 3 compatibility.
This is currently blocked on RelEng rolling out 2.7.3 to all the builders.
Eliminating master xpcshell manifest
Now that xpcshell test manifests are defined in moz.build files, we theoretically don't need the master manifest. Joshua Cranmer is working on removing them in bug 869635.
Enabling GTests and dual linking libxul
Benoit Gerard and Mike Hommey are working in bug 844288 to dual link libxul so GTests can eventually be enabled and executed as part of our automation.
This will regress build times since we need to link libxul twice. But, giving C++ developers the ability to write unit tests with a real testing framework is worth it, in my opinion.
ICU was briefly enabled in bug 853301 but then backed out because it broke cross-compiling. It should be on track for enabling in Firefox 24.
Resource monitoring in mozbase
I gave mozbase a class to record system resource usage. I plan to eventually hook this up to the build system so the build system records how long it took to perform key events. This will give us better insight into slow and inefficient parts of the build and will help us track build system speed improvements over time.
Sorted lists in moz.build files
I'm working on requiring lists in moz.build be sorted. Work is happening in bug 863069.
This idea started as a suggestion on the dev-platform list. If anyone has more great ideas, don't hold them back!
Smartmake added to mach
Nicholas Alexander taught mach how to build intelligently by importing some of Josh Matthews' smartmake tool's functionality into the tree.
Source server fixed
Kyle Huey and Ted Mielczarek collaborated to fix the source server.
Auto clobber functionality
Auto clobber functionality was added to the tree. After flirting briefly with on-by-default, we changed it to opt-in. When you encounter it, it will tell you how to enable it.
Faster clobbers on automation
I was looking at build logs and identified we were inefficiently performing clobber.
Massimo Gervasini and Chris AtLee deployed changes to automation to make it more efficient. My measurements showed a Windows try build that took 15 fewer minutes to start - a huge improvement.
Upgrading to Mercurial 2.5.4
RelEng is tracking the global deployment of Mercurial 2.5.4. hg.mozilla.org is currently running 2.0.2 and automation is all over the map. The upgrade should make Mercurial operations faster and more robust across the board.
I'm considering adding code to mach or the build system that prompts the user when her Mercurial is out of date (since an out of date Mercurial can result in a sub-par user experience).
Nathan Froyd is leading an effort to parallelize reftest execution. If he pulls this off, it could shave hours off of the total automation load per checkin. Go Nathan!
Overhaul of MozillaBuild in the works
I am mentoring a pair of interns this summer. I'm still working out the final set of goals, but I'm keen to have one of them overhaul the MozillaBuild Windows development environment. Cross your fingers.
I hold a lot of context in my head when it comes to the future of Mozilla's build system and the interaction with it. I wanted to perform a brain dump of sorts so people have an idea of where I'm coming from when I inevitably propose radical changes.
The sad state of build system interaction and the history of mach
I believe that Mozilla's build system has had a poor developer experience for as long as there has been a Mozilla build system. Getting started with Firefox development was a rite of passage. It required following (often out-of-date) directions on MDN. It required finding pages through MDN search or asking other people for info over IRC. It was the kind of process that turned away potential contributors because it was just too damn hard.
mach - while born out of my initial efforts to radically change the build system proper - morphed into a generic command dispatching framework by the time it landed in mozilla-central. It has one overarching purpose: provide a single gateway point for performing common developer tasks (such as building the tree and running tests). The concept was nothing new - individual developers had long coded up scripts and tools to streamline workflows. Some even published these for others to use. What set mach apart was a unified interface for these commands (the mach script in the top directory of a checkout) and that these productivity gains were in the tree and thus easily discoverable and usable by everybody without significant effort (just run mach help).
While mach doesn't yet satisfy everyone's needs, it's slowly growing new features and making developers' lives easier with every one. All of this is happening despite that there is not a single person tasked with working on mach full time. Until a few months ago, mach was largely my work. Recently, Matt Brubeck has been contributing a flurry of enhancements - thanks Matt! Ehsan Akhgari and Nicholas Alexander have contributed a few commands as well! There are also a few people with a single command to their name. This is fulfilling my original vision of facilitating developers to scratch their own itches by contributing mach commands.
I've noticed more people referencing mach in IRC channels. And, more people get angry when a mach command breaks or changes behavior. So, I consider the mach experiment a success. Is it perfect, no. If it's not good enough for you, please file a bug and/or code up a patch. If nothing else, please tell me: I love to know about everyone's subtle requirements so I can keep them in mind when refactoring the build system and hacking on mach.
The object directory is a black box
One of the ideas I'm trying to advance is that the object directory should be considered a black box for the majority of developers. In my ideal world, developers don't need to look inside the object directory. Instead, they interact with it through condoned and supported tools (like mach).
I say this for a few reasons. First, as the build config module owner I would like the ability to massively refactor the internals of the object directory without disrupting workflows. If people are interacting directly with the object directory, I get significant push back if things change. This inevitably holds back much-needed improvements and triggers resentment towards me, build peers, and the build system. Not a good situation. Whereas if people are indirectly interacting with the object directory, we simply need to maintain a consistent interface (like mach) and nobody should care if things change.
Second, I believe that the methods used when directly interacting with the object directory are often sub-par compared with going through a more intelligent tool and that productivity suffers as a result. For example, when you type make in inside the object directory you need to know to pass -j8, use make vs pymake, and that you also need to build toolkit/library, etc. Also, by invoking make directly, you bypass other handy features, such as automatic compiler warning aggregation (which only happens if you invoke the build system through mach). If you go through a tool like mach, you should automatically get the most ideal experience possible.
In order for this vision to be realized, we need massive improvements to tools like mach to cover the missing workflows that still require direct object directory interaction. We also need people to start using mach. I think increased mach usage comes after mach has established itself as obviously superior to the alternatives (I already believe it offers this for tasks like running tests).
I don't want to force mach upon people but...
Nobody likes when they are forced to change a process that has been familiar for years. Developers especially. I get it. That's why I've always attempted to position mach as an alternative to existing workflows. If you don't like mach, you can always fall back to the previous workflow. Or, you can improve mach (patches more than welcome!). Having gone down the please-use-this-tool-it's-better road before at other organizations, I strongly believe that the best method to incur adoption of a new tool is to gradually sway people through obvious superiority and praise (as opposed to a mandate to switch). I've been trying this approach with mach.
Lately, more and more people have been saying things like we should have the build infrastructure build through mach instead of client.mk and why do we need testsuite-targets.mk when we have mach commands. While I personally feel that client.mk and testsuite-targets.mk are antiquated as a developer-facing interface compared to mach, I'm reluctant to eliminate them because I don't like forcing change on others. That being said, there are compelling reasons to eliminate or at least refactor how they work.
Let's take testsuite-targets.mk as an example. This is the make file that provides the targets to run tests (like make xpcshell-test and make mochitest-browser-chrome). What's interesting about this file is that it's only used in local builds: our automation infrastructure does not use testsuite-targets.mk! Instead, mozharness and the old buildbot configs manually build up the command used to invoke the test harnesses. Initially, the mach commands for running tests simply invoked make targets defined in testsuite-targets.mk. Lately, we've been converting the mach commands to invoke the Python test runners directly. I'd argue that the logic for invoke the test runner only needs to live in one place in the tree. Furthermore as a build module peer, I have little desire to support multiple implementations. Especially considering how fragile they can be.
I think we're trending towards an outcome where mach (or the code behind mach commands) transitions into the authoratitive invocation method and legacy interfaces like client.mk and testsuite-targets.mk are reimplemented to either call mach commands or the same routine that powers them. Hopefully this will be completely transparent to developers.
The future of mozconfigs and environment configuration
mozconfig files are shell scripts used to define variables consumed by the build system. They are the only officially supported mechanism for configuring how the build system works.
I'd argue mozconfig files are a mediocre solution at best. First, there's the issue of mozconfig statements that don't actually do anything. I've seen no-op mozconfig content cargo culted into the in-tree mozconfigs (used for the builder configurations)! Oops. Second, doing things in mozconfig files is just awkward. Defining the object directory requires mk_add_options MOZ_OBJDIR=some-path. What's mk_add_options? If some-path is relative, what is it relative to? While certainly addressable, the documentation on how mozconfig files work is not terrific and fails to explain many pitfalls. Even with proper documentation, there's still the issue of the file format allowing no-op variable assignments to persist.
I'm very tempted to reinvent build configuration as something not mozconfigs. What exactly, I don't know. mach has support for ini-like configuration files. We could certainly have mach and the build system pull configs from the same file.
I'm not sure what's going to happen here. But deprecating mozconfig files as they are today is part of many of the options.
Handling multiple mozconfig files
A lot of developers only have a single mozconfig file (per source tree at least). For these developers, life is easy. You simply install your mozconfig in one of the default locations and it's automagically used when you use mach or client.mk. Easy peasy.
I'm not sure what the relative numbers are, but many developers maintain multiple mozconfig files per source tree. e.g. they'll have one mozconfig to build desktop Firefox and another one for Android. They may have debug variations of each.
Some developers even have a single mozconfig file but leverage the fact that mozconfig files are shell scripts and have their mozconfig dynamically do things depending on the current working directory, value of an environment variable, etc.
I've also seen wrapper scripts that glorify setting environment variables, changing directory, etc and invoke a command.
I've been thinking a lot about providing a common and well-supported solution for switching between active build configurations. Installing mach on $PATH goes a long way to facilitate this. If you are in an object directory, the mozconfig used when that object directory was created is automatically applied. Simple enough. However, I want people to start treating object directories as black boxes. So, I'd rather not see people have their shell inside the object directory.
Whenever I think about solutions, I keep arriving at a virtualenv-like solution. Developers would potentially need to activate a Mozilla build environment (similar to how Windows developers need to launch MozillaBuild). Inside this environment, the shell prompt would contain the name of the current build configuration. Users could switch between configurations using mach switch or some other magic command on the $PATH.
Truth be told, I'm skeptical if people would find this useful. I'm not sure it's that much better than exporting the MOZCONFIG environment variable to define the active config. This one requires more thought.
The integration between the build environment and Python
We use Python extensively in the build system and for common developer tasks. mach is written in Python. moz.build processing is implemented in Python. Most of the test harnesses are written in Python.
Doing practically anything in the tree requires a Python interpreter that knows about all the Python code in the tree and how to load it.
Currently, we have two very similar Python environments. One is a virtualenv created while running configure at the beginning of a build. The other is essentially a cheap knock-off that mach creates when it is launched.
At some point I'd like to consolidate these Python environments. From any Python process we should have a way to automatically bootstrap/activate into a well-defined Python environment. This certainly sounds like establishing a unified Python virtualenv used by both the build system and mach.
Unfortunately, things aren't straightforward. The virtualenv today is constructed in the object directory. How do we determine the current object directory? By loading the mozconfig file. How do we do that? Well, if you are mach, we use Python. And, how does mach know where to find the code to load the mozconfig file? You can see the dilemma here.
A related issue is that of portable build environments. Currently, a lot of our automation recreates the build system's virtualenv from its own configuration (not that from the source tree). This has and will continue to bite us. We'd really like to package up the virtualenv (or at least its config) with tests so there is no potential for discrepancy.
The inner workings of how we integrate with Python should be invisible to most developers. But, I figured I'd capture it here because it's an annoying problem. And, it's also related to an activated build environment. What if we required all developers to activate their shell with a Mozilla build environment (like we do on Windows)? Not only would this solve Python issues, but it would also facilitate simpler config switching (outlined above). Hmmm...
Direct interaction with the build system considered harmful
Ever since there was a build system developers have been typing make (or make.py) to build the tree. One of the goals of the transition to moz.build files is to facilitate building the tree with Tup. make will do nothing when you're not using Makefiles! Another goal of the moz.build transition is to start derecursifying the make build system such that we build things in parallel. It's likely we'll produce monolithic make files and then process all targets for a related class IDLs, C++ compilation, etc in one invocation of make. So, uh, what happens during a partial tree build? If a .cpp file from /dom/src/storage is being handled by a monolithic make file invoked by the Makefile at the top of the tree, how does a partial tree build pick that up? Does it build just that target or every target in the monolithic/non-recursive make file?
Unless the build peers go out of our way to install redundant targets in leaf Makefiles, directly invoking make from a subdirectory of the tree won't do what it's done for years.
As I said above, I'm sympathetic to forced changes in procedure, so it's likely we'll provide backwards-compatibile behavior. But, I'd prefer to not do it. I'd first prefer partial-tree builds are not necessary and a full tree build finishes quickly. But, we're not going to get there for a bit. As an alternative, I'll take people building through mach build. That way, we have an easily extensible interface on which to build partial tree logic. We saw this recently when dumbmake/smartmake landed. And, going through mach also reinforces my ideal that the object directory is a black box.
Currently, most state as it pertains to a checkout or build is in the object directory. This is fine for artifacts from the build system. However, there is a whole class of state that arguably shouldn't be in the object directory. Specifically, it shouldn't be clobbered when you rebuild. This includes logs from previous builds, the warnings database, previously failing tests, etc. The list is only going to grow over time.
I'd like to establish a location for semi-persistant state related to the tree and builds. Perhaps we change the clobber logic to ignore a specific directory. Perhaps we start storing things in the user's home directory. Perhaps we could establish a second object directory named the state directory? How would this interact with build environments?
This will probably sit on the backburner until there is a compelling use case for it.
The battle against C++
Compiling C++ consumes the bulk of our build time. Anything we can do to speed up C++ compilation will work wonders for our build times.
I'm optimistic things like precompiled headers and compiling multiple .cpp files with a single process invocation will drastically decrease build times. However, no matter how much work we put in to make C++ compilation faster, we still have a giant issue: dependency hell.
As shown in my build system presentation a few months back, we have dozens of header files included by hundreds if not thousands of C++ files. If you change one file: you invalidate build dependencies and trigger a rebuild. This is why whenever files like mozilla-config.h change you are essentially confronted with a full rebuild. ccache may help if you are lucky. But, I fear that as long as headers proliferate the way they do, there is little the build system by itself can do.
My attitude towards this is to wait and see what we can get out of precompiled headers and the like. Maybe that makes it good enough. If not, I'll likely be making a lot of noise at Platform meetings requesting that C++ gurus brainstorm on a solution for reducing header proliferation.
Belive it or not, these are only some of the topics floating around in my head! But I've probably managed to bore everyone enough so I'll call it a day.
I'm always interested in opinions and ideas, especially if they are different from mine. I encourage you to leave a comment if you have something to say.
Next Page »