Build Firefox Faster with Build Splendid

August 15, 2012 at 02:30 PM | categories: Mozilla, Firefox, build system

Would you like to build Firefox faster? If so, do the following:

hg qimport http://people.mozilla.org/~gszorc/build-splendid.patch
hg qpush
rm .mozconfig* (you may want to create a backup first)
./mach build

This should just work on OS X, Linux, and other Unix-style systems. Windows support is currently broken, sorry.

mach can do much more than build. Run the following to see what:

./mach --help

Important Info

mach replaces client.mk. mach has its own configuration file. The first time you run mach, it will create the file mach.ini in the same directory as the mach script. This is your new mozconfig file.

The default mach.ini places the object directory into the directory objdir under the top source directory. It also builds an optimized binary without debug info.

Run the following to see which config settings you can add to mach.ini:

./mach settings-create
./mach settings-list

This may fail because I'm still working out the kinks with gettext. If it doesn't work, open python/mozbuild-bs/mozbuild/base.py and search for _register_settings. Open python/mozbuild-bs/mozbuild/locale/en-US/LC_MESSAGES/mozbuild.po for the help messages.

As a point of reference, my mach.ini looks like the following:

[build]
application = browser

configure_extra = --enable-dtrace --enable-tests

[compiler]
cc = /usr/local/llvm/bin/clang
cxx = /usr/local/llvm/bin/clang++

cflags = -fcolor-diagnostics
cxxflags = -fcolor-diagnostics

[paths]
source_directory = /Users/gps/src/mozilla-central-git
object_directory = /Users/gps/src/mozilla-central-git/objdir

I am on OS X and am using a locally-built version of LLVM/Clang, which I have installed to /usr/local/llvm.

You'll notice there are no options to configure make. The patch automatically selects optimal settings for your platform!

Known Issues and Caveats

This is alpha. It works in scenarios in which I have tested it, mainly building the browser application on OS X and Linux. There are many features missing and likely many bugs.

I have been using this as part of my day-to-day development for weeks. However, your mileage may vary.

As stated above, Windows support is lacking. It will appear to work, but things will blow up during building. Don't even try to use it on Windows.

There are likely many bugs. Please don't file Bugzilla bugs, as this isn't part of the build system just yet.

This patch takes over the build system. Do not attempt to use client.mk or run make directly with this patch applied.

If you encounter an issue, your methods of recourse are:

  1. Post a comment on this blog post
  2. Ping me on irc.mozilla.org. My nick is gps. Try the #buildfaster channel.
  3. Send an email to gps@mozilla.com

I am particularly interested in exceptions and build failures.

If you encounter an issue building with this, just reverse the patch and build like you have always done (don't forget to restore your mozconfig file).

If mach.ini does not support everything you were doing in your mozconfig, please send me a copy of your mozconfig so I can implement whatever you need.

Other Info

I will likely write a follow-up post detailing what's going on. If you are curious, the code lives in python/mozbuild-bs. The backend and frontend sub-packages are where the magic is at. Once the backend has been configured, check out hybridmake.mk and all of the splendid.mk files in the object directory.

I am particularly interested in the real-world impact of this patch on people's build times. In this early version of the patch, you likely won't see drastic speed increases. On my MacBook Pro with an SSD, I see end-to-end clobber build times decrease by over a minute. With a little more work, I should be able to shave another minute or two off of that.

I will try to keep the patch up-to-date as I improve the build system. Refresh early and often.


mozilla-central Build Times

July 29, 2012 at 01:20 PM | categories: Mozilla, build system

In my previous post, I explained how Mozilla's build system works. In this post, I want to give people a feel for where time is spent and to identify obvious areas that need improvement.

To my knowledge, nobody has provided such a comprehensive collection of measurements before. Thus, most of our thoughts on where build time goes have been largely lacking scientific rigor. I hope this post changes matters and injects some much-needed quantitative figures into the discussion.

Methodology

All the times reported in this post were obtained from my 2011 generation MacBook Pro. It has 8 GB of RAM and a magnetic hard drive. It is not an ultimate build rig by any stretch of the imagination. However, it's no slouch either. The machine has 1 physical Core i7 processor with 4 cores, each clocked at 2.3GHz. Each core has hyperthreading, so to the OS there appears to be 8 cores. For the remainder of this post, I'll simply state that my machine has 8 cores.

When I obtained measurements, I tried to limit the number of processes running elsewhere on the machine. I also performed multiple runs and reported the best timings. This means that the results are likely synthetic and highly influenced by the page cache. More on that later.

I configured make to use up to 8 parallel processes (adding -j8 to the make flags). I also used silent builds (the -s flag to make). Silent builds are important because terminal rendering can add many seconds of wall time to builds, especially on slow terminals (like Windows). I measured results with make output being written to the terminal. In hindsight, I wish I hadn't done this. Next time.

To obtain the times, I used the ubiquitous time utility. Wall times are the real times from time. CPU time is the sum of the user and sys times.

CPU utilization is the percentage of CPU cores busy during the wall time of execution. In other words, I divided the CPU time by 8 times the wall time (8 for the number of cores in my machine). 100% is impossible to obtain, as obviously the CPU on my machine is doing other things during measurement. But, I tried to limit the number of background processes running, so there shouldn't have been that much noise.
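
To make the arithmetic concrete, here is a minimal Python sketch of that calculation, fed with the configure timings reported below:

def cpu_utilization(real, user, sys, cores=8):
    # CPU time is user + sys; utilization is CPU time over (cores * wall time).
    cpu_time = user + sys
    return cpu_time, 100.0 * cpu_time / (cores * real)

# The real/user/sys values are the configure measurements from the next section.
cpu, util = cpu_utilization(25.927, 9.595, 8.729)
print('%.1fs of CPU time, %.1f%% utilization' % (cpu, util))  # 18.3s, 8.8%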

I built a debug version of Firefox (the browser app in mozilla-central) using r160922 of the Clang compiler (pulled and built the day I did this measurement). The revision of mozilla-central being built was 08428edb1e89. I also had --enable-tests, which adds a significant amount of extra work to builds.

Configure

time reported the following for running configure:

real 0m25.927s
user 0m9.595s
sys  0m8.729s

This is a one-time cost. Unless you are hacking on the build system or pull down a tree change that modified the build system, you typically don't need to worry about this.

Clobber Builds with Empty ccache

I built each tier separately with an empty ccache on a recently-configured object directory. This measures the optimal worst case time for building mozilla-central. In other words, we have nothing cached in the object directory, so the maximum amount of work needs to be performed. Since I measured multiple times and used the best results, this is what I mean by optimal.

The table below contains the measurements. I omitted CPU utilization calculation for small time values because I don't feel it is relevant.

Tier - Sub-tier       Wall Time (s)    CPU Time (s)    CPU Utilization
base export           0.714            0.774           N/A
base libs             5.081            8.620           21%
base tools            0.453            0.444           N/A
base (total)          6.248            9.838           19.7%
nspr                  9.309            8.404           11.3%
js export             1.030            1.877           22.8%
js libs               71.450           416.718         72.9%
js tools              0.324            0.246           N/A
js (total)            72.804           418.841         71.9%
platform export       40.487           141.704         43.7%
platform libs         1211             4896            50.5%
platform tools        70.416           90.917          16.1%
platform (total)      1312             5129            48.9%
app export            4.383            3.059           8.7%
app libs              18.727           18.976          12.7%
app tools (no-op)     0.519            0.968           N/A
app (total)           23.629           23.003          12.2%
Total                 1424 (23:44)     5589 (93:15)    49.1%

It's worth mentioning that linking libxul is part of the platform libs tier. libxul linking should be called out because it is unlike other parts of the build in that it is more I/O bound and can't use multiple cores. On my machine, libxul linking (not using gold) takes ~61s. During this time, only 1 CPU core is in use. The ~61s wall time corresponds to roughly 5% of platform libs' wall time. Yet, even if we subtract this ~61s from the effective CPU calculation, the percentage doesn't change much.

Clobber with Populated ccache

Using the ccache from a recently built tree to make C/C++ compilation faster, I measured how long it took each tier to build on a clobber build.

This measurement can be used to estimate the overhead of C/C++ compilation during builds. In theory, the difference in CPU times between this and the former measurement will be the amount of CPU time spent in the C/C++ compiler.

This will also isolate how much time we spend not in the C/C++ compiler. It will arguably be very difficult to make the C/C++ compiler itself faster (although things like reducing the abuse of templates can have a measurable impact). However, we do have control over many of the other things we do. If we find that CPU time spent outside the C/C++ compiler is large, we can look for pieces to optimize.

Tiers not containing compiled files are omitted from the data.

Tier - Sub-tier       Wall Time (s)     CPU Time (s)      ccache savings (s) (Time in Compiler)
base libs             1.075             1.525             7.095
base tools            1.957             0.933             1.522
nspr                  5.582             1.688             6.716
js libs               22.829            9.529             407.189
platform libs         431               328               4568
platform tools        14.498            25.744            65.173
app libs              10.193            15.809            3.167
Total                 487.134 (8:07)    383.229 (6:23)    5059 (84:19)

No-op Build

A no-op build is a build performed in an object directory that was just built. Nothing changed in the source repository nor object directory, so theoretically the build should do nothing. And, it should be fast.

In reality, our build system isn't smart and performs some redundant work. Part of this redundant work comes from the fact that one of the first things the main Makefile does before invoking the tiers is delete a large chunk of the dist/ directory and the entirety of the _tests/ directory from the object directory.

In these measurements, I bypassed the deletion of these directories. In other words, I measure what no-op builds are if we eliminate the clown shoes around blowing away large parts of the object directory.

Tier - Sub-tier       Wall Time (s)    CPU Time (s)
base export           0.524            0.537
base libs             0.625            0.599
base tools            0.447            0.437
nspr                  0.809            0.752
js export             0.334            0.324
js libs               0.375            0.361
platform export       10.904           13.136
platform libs         30.969           44.25
platform tools        8.213            10.737
app export            0.524            1.006
app libs              6.090            13.753
Total                 59.814           85.892

So, no-op builds use ~60s of wall time and only make use of 17.9% of available CPU resources.

No-op Build With Directory Removal Silliness

As mentioned above, before the tiers are iterated, the top-level Makefile blows away large parts of dist/ and the entirety of _tests/. What impact does this have?

In this section, I try to isolate how much time was thrown away by doing this.

First, we have to account for the deletion of these directories. On my test build, deleting 15,005 files in these directories took ~3 seconds.

The table below contains my results. This is a more accurate reading than the above on how long no-op builds take because this is actually what we do during normal builds. The time delta column contains the difference between this build and a build without the removal silliness. Positive times can be attributed to overhead associated with repopulating dist/ and _tests/.

Tier - Sub-tier       Wall Time (s)    Wall Time Delta (s)    CPU Time (s)    CPU Time Delta (s)
base export           0.544            Negligible             0.559           Negligible
base libs             0.616            Negligible             0.594           Negligible
base tools            0.447            Negligible             0.436           Negligible
nspr                  0.803            Negligible             0.743           Negligible
js export             0.338            Negligible             0.329           Negligible
js libs               0.378            Negligible             0.363           Negligible
platform export       13.140           2.236                  13.314          Negligible
platform libs         35.290           4.329                  43.059          -1.191 (normal variance?)
platform tools        8.819            0.606                  10.983          0.246
app export            0.525            Negligible             1.012           Negligible
app libs              8.876            2.786                  13.527          -0.226
Total                 69.776           9.962                  84.919          -0.973 (normal variance)

If a delta is listed as negligible, it was within 100ms of the original value and I figured this was either due to expected variance between runs or below our threshold for caring. In the case of the base, nspr, and js tiers, the delta was actually much smaller than 100ms, often less than 10ms.

It certainly appears that the penalty for deleting large parts of dist/ and the entirety of _tests/ is about 10 seconds.

The Overhead of Make

We've measured supposed no-op build times. As I stated above, our no-op builds actually aren't no-op builds. Even if we bypass the deletion of dist/ and _tests/ we always evaluate some make rules. Can we measure how much work it takes to just load the make files without actually doing anything? This would allow us to get a rough estimate of how much we are wasting by doing redundant work. It will also help us establish a baseline for make overhead.

Turns out we can! Make has a --dry-run argument which evaluates the make file but doesn't actually do anything. It simply prints what would have been done.

Using --dry-run, I timed the different tiers. The difference from a no-op build should roughly be the overhead associated with make itself. It is possible that --dry-run adds a little overhead because it prints the commands that would have been executed. (Previous timings were using -s, which suppresses this.)
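
If you want to reproduce this kind of comparison, here is a rough Python sketch of the procedure. The object directory and target below are placeholders, not the exact invocations behind the measurements in this post:

import resource
import subprocess
import time

def measure(args, cwd):
    # Return (wall seconds, CPU seconds) for a command, using the child
    # resource usage accumulated by this process.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.time()
    subprocess.check_call(args, cwd=cwd)
    wall = time.time() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu

objdir = '/path/to/objdir'   # placeholder
target = 'libs'              # placeholder target

noop = measure(['make', '-s', '-j8', target], objdir)
dry = measure(['make', '-s', '-j8', '--dry-run', target], objdir)

# The difference roughly isolates the work performed outside of make itself.
print('wall delta: %.3fs, CPU delta: %.3fs' % (noop[0] - dry[0], noop[1] - dry[1]))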

The delta times in the following table are the difference in times between the true no-op build from above (the one where we don't delete dist/ and _tests/) and the times measured here. It roughly isolates the amount of time spent outside of make, doing redundant work.

Tier - Sub-tier       Wall Time (s)    Wall Time Delta (s)    CPU Time (s)    CPU Time Delta (s)
base export           0.369            0.155                  0.365           0.172
base libs             0.441            0.184                  0.431           0.168
base tools            0.368            0.079                  0.364           0.073
nspr                  0.636            0.173                  0.591           0.161
js export             0.225            0.109                  0.225           0.099
js libs               0.278            0.097                  0.273           0.088
platform export       3.841            7.063                  6.108           7.028
platform libs         8.938            22.031                 14.723          29.527
platform tools        3.962            4.251                  6.185           4.552
app export            0.422            0.102                  0.865           0.141
app libs              0.536            5.554                  1.148           12.605
Total                 20.016           39.798                 31.278          54.614

Observations

The numbers say a lot. I'll get to key takeaways in a bit.

First, what the numbers don't show is the variance between runs. Subsequent runs are almost always significantly faster than the initial, even on no-op builds. I suspect this is due mostly to I/O wait. In the initial tier run, files are loaded into the page cache. Then, in subsequent runs, all I/O comes from physical memory rather than waiting on a magnetic hard drive.

Because of the suspected I/O related variance, I fear that the numbers I obtained are highly synthetic, at least for my machine. It is unlikely I'll ever see all these numbers in one mozilla-central build. Instead, it requires a specific sequence of events to obtain the best times possible. And, this sequence of events is not likely to correspond with real-world usage.

That being said, I think these numbers are important. If you remove I/O from the equation - say you have an SSD with near 0 service times or have enough memory so you don't need a hard drive - these numbers will tell what limits you are brushing up against. And, as computers get more powerful, I think more and more people will cross this threshold and will be more limited by the build system than the capabilities of their hardware. (A few months ago, I measured resource usage when compiling mozilla-central on Linux and concluded you need roughly 9GB of dedicated memory to compile and link mozilla-central without page cache eviction. In other words, if building on a machine with only 8GB of RAM, your hard drive will play a role.)

Anyway, to the numbers.

I think the most important number in the above tables is 49.1%. That is the effective CPU utilization during a clobber build. This means that during a build, on average half of the available CPU cores are unused. Now, I could be generous and bump this number to 50.7%. That's what the effective CPU utilization is if you remove the ~60s of libxul linking from the calculation.

The 49.1% has me reaching the following conclusions:

  1. I/O wait really matters.
  2. Our recursive use of make is incapable of executing more than 4 items at a time on average (assuming 8 cores).
  3. My test machine had more CPU wait than I thought.

I/O wait is easy to prove: compare times on an SSD or on a setup with near-zero I/O service times (e.g. a filled page cache with no eviction, like a machine with 16+ GB of memory that has built mozilla-central recently).

A derived time not captured in any table is 11:39. This is the total CPU time of a clobber build (93:15) divided by the number of cores (8). If we had 100% CPU utilization across all cores during builds, we should be able to build mozilla-central in 11:39. This is an ideal figure and won't be reached. As mentioned above, libxul linking takes ~60s itself! I think 13:00 is a more realistic optimal compilation time for a modern 8 core machine. This points out a startling number: we are wasting ~12 minutes of wall time due to not fully utilizing CPU cores during clobber builds.

Another important number is 5059 out of 5589, or 90.5%. That is the CPU time in a clobber build spent in the C/C++ compiler, as measured by the speedup of using ccache. It's unlikely we are going to make the C/C++ compiler go much faster (short of not compiling things). So, this is a big fat block of time we will never be able to optimize. On my machine compiling mozilla-central will always take at least ~10:30 wall time, just in compiling C/C++.

A clobber build with a saturated ccache took 487s wall time but only 383s CPU time. That's only about 10% total CPU utilization. And, this represents only 6.8% of total CPU time from the original clobber build. Although, it is 34.2% of total wall time.

The above means that everything not the C/C++ compiler is horribly inefficient. These are clown shoes of epic proportions. We're not even using 1 full core doing build actions outside of the C/C++ compiler!

Because we are inefficient when it comes to core usage, I think a key takeaway is that throwing more cores at the existing build system will have diminishing returns. Sure, some parts of the build system today could benefit from it (mainly js, layout, and dom, as they have Makefile's with large numbers of source files). But, most of the build system won't take advantage of many cores. If you want to throw money at a build machine, I think your first choice should be an SSD. If you can't do that, have as much memory as you can so most of your filesystem I/O is serviced by the page cache, not your disk drive.

In the final table, we isolated how much time make spends just to figure out what to do. That amounts to ~20 seconds wall time and ~31s CPU time. That leaves ~40s wall and ~55s CPU for non-make work during no-op builds. Translation: we are doing 40s of wall time work during no-op builds. Nothing changed. We are throwing 40s of wall time away because the build system isn't using proper dependencies and is doing redundant work.

I've long been a critic of us blowing away parts of dist/ and _tests/ at the top of builds. Well, after measuring it, I have mixed reactions. It only amounts to about ~10s of added time to builds. This doesn't seem like a lot in the grand scheme of things. However, this is ~10s on top of the ~60s it actually takes to iterate through the tiers. So, in terms of percentages for no-op builds, it is actually quite significant.

No-op builds with the existing build system take ~70s under ideal conditions. In order of time, the breakdown is roughly:

  • ~40s for doing redundant work in Makefiles
  • ~20s for make traversal and loading overhead
  • ~10s for repopulating deleted content from dist/ and _tests/

In other words, ~50s of ~70s no-op build times are spent doing work we have already done. This is almost purely clown shoes. Assuming we can't make make traversal and loading faster, the shortest possible no-op build time will be ~20s.

Splitting things up a bit more:

  • ~22s - platform libs make evaluation
  • ~20s - make file traversal and loading (readying for evaluation)
  • ~10s - repopulating deleted content from dist/ and _tests/
  • ~7s - platform export make evaluation
  • ~5.5s - app libs make evaluation
  • ~4s - platform tools

The ~20s for make file traversal and loading is interesting. I suspect (although I haven't yet measured) that a lot of this is due to the sheer size of rules.mk. As I measured on Friday, the overhead of rules.mk with pymake is significant. I hypothesized that it would have a similar impact on GNU make. I think a good amount of this ~20s is similar overhead. I need to isolate it, however. I am tempted to say that if we truly did no-op builds and made Makefile's load into make faster, we could attain no-op build times in the ~10s range. I think this is pretty damn good! Even ~20s isn't too bad. As surprising as it is for me to say it, recursive make is not (a significant) part of our no-op build problem.

Why is the Build System Slow?

People often ask the question above. As the data has told me, the answer, like the answer to many complicated problems, is nuanced.

If you are doing a clobber build on a fresh machine, the build system is slow because 1) compiling all the C/C++ takes a lot of time (84:19 of CPU time, actually) and 2) we don't make efficient use of all available cores when building. Half of the CPU horsepower during a fresh build is unharnessed.

If you are doing a no-op build, the build system is slow mainly because it is performing a lot of needless and redundant work. A significant contributor is the overhead of make, probably due to rules.mk being large.

If you are doing an incremental build, you will fall somewhere between either extreme. You will likely get nipped by both inefficient core usage as well as redundant work. Which one hurts the most depends on the scope of the incremental change.

If you are building on a machine with a magnetic hard drive (not an SSD), your builds are slow because you are waiting on I/O. You can combat this by putting 8+GB of memory in your system and doing your best to ensure that building mozilla-central can use as much of it as possible. I highly recommend 12GB, if not 16GB.

Follow-ups

The measurements reported in this post are only the tip of the iceberg. If I had infinite time, I would:

  • Measure other applications, not just browser/Firefox. I've heard that mobile/Fennec's build config is far from optimal, for example. I would love to quantify that.
  • Set up buildbot to record and post measurements so we have a dashboard of build times. We have some of this today, but the granularity isn't as fine as what I captured.
  • Record per-directory times.
  • Isolate time spent in different processes (DTrace could be used here).
  • Capture I/O numbers.
  • Correlate the impact of I/O service times on build times.
  • Isolate the overhead of ccache (mainly in terms of I/O).
  • Obtain numbers on other platforms and systems. Ensure results can be reproduced.

Next Steps

If we want to make our existing recursive make build backend faster, I recommend the following actions (in no particular order):

  1. Factor pieces out of rules.mk into separate .mk files and conditionally load based on presence of specific variables. In other words, finish what we have started. This definitely cuts down on the overhead with pymake (as measured on Friday) and likely makes GNU make faster as well.
  2. Don't blow away parts of dist/ and _tests/ at the top of builds. I know this introduces a problem where we could leave orphaned files in the object directory. We should solve this problem by having proper manifests for everything so we can detect and delete orphans. The cheap man's solution is to periodically clobber these directories.
  3. Don't perform unnecessary work during no-op builds. I suspect a lot of redundant work is due to rules in Makefile's, not the rules in rules.mk. As we eliminate rules from Makefile's, this problem should gradually go away since rules.mk is generally intelligent about these things.
  4. More parallelism. I'm not sure how we're going to solve this with recursive make short of using PARALLEL_DIRS more and/or consolidating Makefile's together.

Again, these steps apply to our current recursive make build backend.

Because the most significant losses are due to ungained parallelism, our focus should be on increasing parallelism. We can only do this so much with recursive make. It is clear now more than ever that recursive make needs to be replaced with something that can fully realize the potential of multiple CPU cores. That could be non-recursive make or a separate build backend altogether.

We will likely not have an official alternate build backend soon. Until then, there is no shortage of clown shoes that can be looked at.

The redundant work during no-op builds is definitely tempting to address, as I think that has significant impact for most developers. Eliminating the absurdly long no-op build times removes the need for hacks like smart-make and instills a culture of trusting the build system.

I suspect a lot of the redundant work during no-op builds is due to poorly implemented rules in individual Makefiles rather than silliness in rules.mk. Therefore, removing rules from Makefile's again seems to be one of the most important things we can do to make the build system faster. It also prepares us for implementing newer build backends, so it is a win-win!


Mozilla Build System Overview

July 29, 2012 at 01:15 PM | categories: Mozilla, build system

Mozilla's build system is a black box to many. This post attempts to shed some light onto how it works.

Configuration File

The first part of building is creating a configuration file. This defines what application to build (Firefox, Firefox OS, Fennec, etc.) as well as build options, like whether to create a release or debug build. This step isn't technically required, but most people do it.

Configuration files currently exist as mozconfig files. By default, most people create a .mozconfig file in the root directory of mozilla-central.

Interaction

All interaction with the build system is currently gated through the client.mk file in the root directory of mozilla-central. Although, I'm trying to land an alternative to (and eventual replacement for) client.mk called mach. You can read about it in previous posts on this blog.

When you run make -f client.mk, you are invoking the build system and telling it to do whatever it needs to do to build the tree.

Running Configure

The first thing client.mk does to a fresh tree is invoke configure. configure is a shell script in the root directory of the repository. It is generated from the checked-in configure.in file using the GNU autoconf utility. I won't go into detail on how autoconf works because I don't have a beard.

configure accomplishes some important tasks.

First, it validates that the build environment is sane. It performs some sanity testing on the directory tree then looks at the system and build configuration to make sure everything should work.

It identifies the active compiler, locations of common tools and utilities, and ensures everything works as needed. It figures out how to convert desired traits into system-specific options. e.g. the exact argument to pass to the compiler to enable warnings.

Once configure determines the environment is sane, it writes out what it learned.

Currently, configure takes what it has learned and invokes the allmakefiles.sh script in the root directory. This script prints out the set of Makefile's that will be used to build the tree for the current configuration. configure takes that list of filenames and then proceeds to generate those files.

Generation of Makefile's is rather simple. In the source tree are a bunch of .in files, typically Makefile.in. These contain special markers. configure takes the set of determined configuration variables and performs substitution of the variable markers in the .in files with them. The .in files with variables substituted are written out to the object directory. There are also some GYP files in the source tree. configure invokes a tool to convert these into Mozilla-style Makefile's.
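
The substitution itself is conceptually trivial. A stripped-down Python illustration of the idea (not the actual code configure runs):

import re

def substitute(template, config):
    # Replace @FOO@ markers with values from the build configuration,
    # leaving unknown markers untouched.
    return re.sub(r'@(\w+)@',
                  lambda m: config.get(m.group(1), m.group(0)),
                  template)

makefile_in = 'topsrcdir := @top_srcdir@\nMOZ_DEBUG := @MOZ_DEBUG@\n'
config = {'top_srcdir': '/src/mozilla-central', 'MOZ_DEBUG': '1'}
print(substitute(makefile_in, config))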

configure also invokes configure for other managed projects in mozilla-central, such as the SpiderMonkey source in js/src.

configure finishes by writing out other miscellaneous files in the object directory.

Running Make

The next step of the build is running make. client.mk simply points GNU make (or pymake) at the Makefile in the top-level directory of the object directory and essentially says evaluate.

Build System Tiers

The build system is broken up into different tiers. Each tier represents a major phase or product in the build system. Most builds have the following tiers:

  1. base - Builds global dependencies
  2. nspr - Builds NSPR
  3. js - Builds SpiderMonkey
  4. platform - Builds the Gecko platform
  5. app - Builds the configured application (e.g. Firefox, Fennec, Firefox OS)

Inside each tier are the distinct sub-tiers:

  1. export
  2. libs
  3. tools

A Makefile generally belongs to 1 main tier. Inside Makefile's or in other included .mk files (make files that are not typically called directly by make) are statements which define which directories belong to which tiers. See toolkit-tiers.mk for an example.

When the top-level Makefile is invoked, it iterates through every tier and every sub-tier within it. It starts at the first tier and evaluates the export target on every Makefile/directory defined in it. It then moves on to the libs target then finally the tools target. When it's done with the tools target, it moves on to the next tier and does the same iteration.

For example, we first start by evaluating the export target of the base tier. Then we evaluate base's libs and tools tiers. We then move on to nspr and do the same. And, we keep going. In other words, the build system makes 3 passes through each tier.

Tiers are composed of directory members. e.g. dom or layout. When make descends into a tier member directory, it looks for specially named variables that tell it what sub-directories are also part of this directory. The DIRS variable is the most common. But, we also use TEST_DIRS, PARALLEL_DIRS, TOOL_DIRS, and a few others. make will invoke make for all defined child directories and for the children of the children, and so on. This is what we mean by recursive make. make essentially recurses into directory trees, evaluating all the directories linearly.
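
Conceptually, the traversal looks something like the following Python sketch. It glosses over nearly everything make actually does (parsing, dependency checking, parallelism, the other *_DIRS variables), and the tier membership shown is an illustrative placeholder rather than the real definitions from the tree's .mk files:

import os
import re

TIERS = [
    ('base', ['config', 'build']),            # placeholder members
    ('nspr', ['nsprpub']),
    ('js', ['js/src']),
    ('platform', ['xpcom', 'dom', 'layout']),
    ('app', ['browser']),
]
SUB_TIERS = ['export', 'libs', 'tools']

DIRS_RE = re.compile(r'^DIRS\s*[+:]?=\s*(.*)$', re.M)

def descend(srcdir, path, target):
    # Evaluate the target in this directory, then recurse into its DIRS.
    print('make -C %s %s' % (path, target))
    try:
        content = open(os.path.join(srcdir, path, 'Makefile.in')).read()
    except IOError:
        return
    for match in DIRS_RE.finditer(content):
        for child in match.group(1).split():
            descend(srcdir, os.path.join(path, child), target)

def build(srcdir):
    for tier, members in TIERS:
        for target in SUB_TIERS:              # three passes through each tier
            for member in members:
                descend(srcdir, member, target)

build('/path/to/mozilla-central')             # placeholder path

Reading DIRS out of Makefile.in is itself a simplification: make works against the generated Makefiles in the object directory, and directory lists can come from several variables and conditionals.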

Getting back to the tiers, the sub-tiers export, libs, and tools can be thought of as pre-build, build, and post-build events. Although, this analogy is far from perfect.

export generally prepares the object directory for more comprehensive building. It copies C/C++ header files into a unified object directory, generates header files from IDL files, etc.

libs does most of the work. It compiles C/C++ code and performs much of the other heavy lifting, such as processing JAR manifests.

tools does a lot of miscellaneous work. If you have tests enabled, this is where tests are typically compiled and/or installed, for example.

Processing a Makefile

For each directory inside a tier, make evaluates the Makefile in that directory for the target/sub-tier specified.

The basic gist of Makefile execution is actually pretty simple.

Mozilla's Makefiles typically look like:

DEPTH := .
topsrcdir := @top_srcdir@
srcdir := @srcdir@
VPATH := @srcdir@

include $(DEPTH)/config/autoconf.mk

IDLSRCS := foo.idl bar.idl
CPPSRCS := hello.cpp world.cpp

include $(topsrcdir)/config/rules.mk

All the magic in Makefile processing happens in rules.mk. This make file simply looks for specially named variables (like IDLSRCS or CPPSRCS) and magically converts them into targets for make to evaluate.

In the above sample Makefile, the IDLSRCS variable will result in an implicit export target which copies IDLs into the object directory and compiles them to .h files. CPPSRCS will result in a libs target that results in each .cpp file being compiled into a .o file.

Of course, there is nothing stopping you from defining targets/rules in Makefile's themselves. This practice is actually quite widespread. Unfortunately, it is a bad practice, so you shouldn't do it. The preferred behavior is to define variables in a Makefile and have rules.mk magically provide the make targets/rules to do stuff with them. Bug 769378 tracks fixing this bad practice.

Conclusion

So, there you have it: a very brief overview of how Mozilla's build system works!

In my next post, I will shed some light onto how much time goes into different parts of the build system.


Makefile Execution Times

July 28, 2012 at 12:45 AM | categories: Mozilla, pymake, build system

In my course of hacking about with Mozilla's build system, I've been using pymake (a Python implementation of GNU make) to parse, examine, and manipulate make files. In doing so, I've learned some interesting things, dispelling myths in the process.

People often say that parsing make files is slow and that the sheer number of Makefile.in's in mozilla-central (Firefox's source tree) is leading to lots of overhead in make execution. This statement is only partially correct.

Parsing make files is actually pretty fast. Using pymake's parser API, I'm able to parse every Makefile.in in mozilla-central in under 5 seconds on my 2011 generation MacBook Pro using a single core. Not too shabby, especially considering that there are about 82,500 lines in all the Makefile.in's.
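
For reference, the measurement boiled down to something like the following sketch (parsefile() and its home in pymake.parser are recalled from memory, so treat the exact API as an assumption):

import os
import time
from pymake import parser   # assumes pymake is on the Python path

def parse_all(srcdir):
    # Parse every Makefile.in into a statement list and report how long it took.
    start = time.time()
    count = 0
    for root, dirs, files in os.walk(srcdir):
        if 'Makefile.in' in files:
            parser.parsefile(os.path.join(root, 'Makefile.in'))
            count += 1
    return count, time.time() - start

count, elapsed = parse_all('/path/to/mozilla-central')   # placeholder path
print('parsed %d Makefile.in files in %.1fs' % (count, elapsed))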

Evaluation of make files, however, is a completely different story. You see, parsing a string containing make file directives is only part of what needs to be done. Once you've parsed a make file into a statement list (essentially an AST), you need to load that into a data structure fit for evaluation. Because of the way make files are evaluated, you need to iterate through every parsed statement and evaluate it for side-effects. This occurs before you actually evaluate specific targets in the make file itself. As I found out, this process can be time-consuming.

For mozilla-central, the cost of loading the statement list into a data structure ready for target evaluation takes about 1 minute in aggregate. And, considering we effectively iterate through every Makefile in mozilla-central 3 times when building (once for each of the export, libs, and tools sub-tiers), you can multiply this figure by 3.

Put another way, parsing Makefile's is fast: loading them for target evaluation is slow.

Digging deeper, I uncovered the main source of the additional overhead: rules.mk.

Nearly every Makefile in mozilla-central has a pattern that looks like:

DEPTH = ../..
topsrcdir = @top_srcdir@
srcdir = @srcdir@
VPATH = @srcdir@

include $(DEPTH)/config/autoconf.mk

<LOCAL MAKE FILE DECLARATIONS>

include $(topsrcdir)/config/rules.mk

We have a header boilerplate, followed by a bunch of Makefile-specific variables definitions and rules. Finally, we include the rules.mk file. This is the make file that takes specially-named variables and converts them to rules (actions) for make to perform.

A typical Makefile.in is a few dozen lines or so. This often reduces to maybe a dozen parsed statements. By contrast, rules.mk is massive. It is currently 1770 lines and may include other make files, bringing the total to ~3000 lines.

Pymake has an LRU cache that caches the results of parsing make files. This means it only has to parse a single make file into a statement list once (assuming no cache eviction). rules.mk is frequently used, so it should have no eviction. Even if it were evicted, I've measured that parsing is pretty fast.

Unfortunately, the cache doesn't help with evaluation. For every Makefile in mozilla-central, pymake will need to evaluate rules.mk within the context of that specific Makefile. It's impossible to cache the results of a previous evaluation because the side-effects of rules.mk are determined by what is defined in the Makefile that includes it.

I performed an experiment where I stripped the include rules.mk statement from all parsed Makefile.in's. This essentially isolates the overhead of loading rules.mk. It turns out that all but ~2 seconds of evaluation time is spent in rules.mk. In other words, without rules.mk, the Makefile.in's are loaded and ready for evaluation in just a few seconds (over parsing time), not ~1 minute!

What does this all mean?

Is parsing make files slow? Technically no. Parsing itself is not slow. It is actually quite fast! Pymake even surprised me at how fast it can parse all the Makefile.in's in mozilla-central.

Loading parsed make file statements to be ready for evaluation is actually the bit that is slow - at least in the case of mozilla-central. Specifically, the loading of rules.mk is what constitutes the overwhelming majority of the time spent loading Makefile's.

That being said, parsing and loading go hand in hand. You almost never parse a make file without loading and evaluating it. So, if you consider parsing to include parsing and readying the make file for execution, there is some truth to the statement that parsing make files is slow. Someone splitting hairs may say differently.

Is there anything we can do? Good question.

I believe that build times of mozilla-central can be reduced by reducing the size of rules.mk. Obviously, the content of rules.mk is important, so we can't just delete content. But, we can be more intelligent about how it is loaded. For example, we can move pieces of rules.mk into separate .mk files and conditionally include these files based on the presence of specific variables. We already do this today, but only partially: there are still a number of bits of rules.mk that could be factored out into separate files. By conditionally loading make file content from rules.mk, we would be reducing the number of statements that need to be loaded before evaluating each Makefile. And, this should, in turn, make build times faster. Keep in mind that any savings will be multiplied by roughly 3 since we do 3 passes over Makefile's during a build.

To my knowledge, there aren't any bugs yet on file to do this. Given the measurements I've obtained, I encourage somebody to do this work. Even if it doesn't reduce build times, I think it will be a win because it will make the make rules easier to understand, since they will be contained in function-specific files rather than one monolithic file. At worst, we have better readability. At best, we have better readability and faster build times. Win!

Finally, I don't know what the impact on GNU make is. Presumably, GNU make evaluates make files faster than pymake (C is generally faster than python). Therefore, reducing the size of rules.mk should make GNU make faster. By how much, I have no clue.


Mozilla Build System Plan of Attack

July 25, 2012 at 11:30 PM | categories: Mozilla, build system

Since I published my brain dump post on improving Mozilla's build system for Gecko applications (including Firefox and Firefox OS), there has been some exciting progress.

It wasn't stated in that original post, but the context for that post was to propose a plan in preparation of a meeting between the core contributors to the build system at Mozilla. I'm pleased to report that the plan was generally well-received.

We pretty much all agreed that parts 1, 2, and 3 are all important and we should actively work towards them. Parts 4, 5, and 6 were a bit more contentious. There are some good parts and some bad parts. We're not ready to adopt them just quite yet. (Don't bother reading the original post to look up what these parts corresponded to - I'll cover that later.)

Since that post and meeting, there has also been additional discussion around more specifics. I am going to share with you now where we stand.

BuildSplendid

BuildSplendid is an umbrella term associated with projects/goals to make the developer experience (including building) better - more splendid if you will.

BuildSplendid started as the name of my personal Git branch for hacking on the build system. I'm encouraging others to adopt the term because, well, it is easier to refer to (people like project codenames). If it doesn't stick, that's fine by me - there are other terms that will.

BuildFaster

An important project inside BuildSplendid is BuildFaster.

BuildFaster focuses on the following goals:

  1. Making the existing build system faster, better, stronger (but not harder).
  2. Making changes to the build system to facilitate the future use of alternate build backends (like Tup or Ninja). Work to enable Visual Studio, Xcode, etc project generation also falls here.

The distinction between these goals can be murky. But, I'll try.

Falling squarely in #1 are:

  • Switching the buildbot infrastructure to use pymake on Windows

Falling in #2 are:

  • Making Makefile.in's data-centric
  • Supporting multiple build backends

Conflated between the two are:

  • Ensuring Makefile's no-op if nothing has changed
  • Optimizing existing make rules. This involves merging related functionality as well as eliminating clown shoes in existing rules.

The two goals of BuildFaster roughly map to the short-term and long-term strategies, respectively. There is consensus that recursive make (our existing build backend) does not scale and we will plateau in terms of performance no matter how optimal we make it. That doesn't mean we are giving up on it: there are things we can and should do so our existing recursive make backend builds faster.

In parallel, we will also work towards the longer-term solution of supporting alternate build backends. This includes non-recursive make as well as things like Tup, Ninja, and even Visual Studio and Xcode. (I consider non-recursive make to be a separate build backend because changing our existing Makefile.in's to support non-recursive execution effectively means rewriting the input files (Makefile.in's). At that point, you've invented a new build backend.)

For people who casually interact with the build system, these two goals will blend together. It is not important for most to know what bucket something falls under.

BuildFaster Action Items

BuildFaster action items are being tracked in Bugzilla using the [BuildFaster:*] whiteboard annotation.

There is no explicit tracking bug for the short-term goal (#1). Instead, we are relying on the whiteboard annotation.

We are tracking the longer-term goal of supporting alternate build backends at bug 774049.

The most important task to help us reach the goals is to make our Makefile.in's data centric. This means:

  • Makefile.in's must consist of only simple variable assignment
  • Makefile.in's must not rely on being evaluated to perform variable assignment.

Basically, our build config should be defined by static key-value pairs.

This translates to:

  • Move all rules out of Makefile.in's bug 769378
  • Remove use of $(shell) from Makefile.in's bug 769390
  • Remove filesystem functions from Makefile.in's bug 769407
  • And more, as we identify the need

While we have these tracking bugs on file, we still don't have bugs filed that track individual Makefile.in's that need to be updated. If you are a contributor, you can help by doing your part to file bugs for your Makefile.in's. If your Makefile.in violates the above rules, please file a bug in the Build Config component of the product it is under (typically Core). See the tree of the above bugs for examples.
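
If you want a quick first pass at whether a Makefile.in you own trips any of the rules above, something like the following Python sketch will flag the obvious offenders (it is a heuristic, not an official linting tool):

import re
import sys

# Heuristic checks for Makefile.in content that is not purely data-centric:
# rule definitions, $(shell ...), and filesystem functions like $(wildcard ...).
CHECKS = [
    (re.compile(r'^[^\s#][^=#]*:(?!=)', re.M), 'appears to define a rule'),
    (re.compile(r'\$\(shell\s'), 'uses $(shell ...)'),
    (re.compile(r'\$\(wildcard\s'), 'uses $(wildcard ...)'),
]

def violations(path):
    content = open(path).read()
    return [message for pattern, message in CHECKS if pattern.search(content)]

for path in sys.argv[1:]:
    for message in violations(path):
        print('%s: %s' % (path, message))

Point it at one or more Makefile.in paths; anything it reports is a candidate for one of the bugs above.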

Also part of BuildFaster (but not relevant to most) is the task of changing the build system to support multiple backends. Currently, pieces like configure assume the Makefile.in to Makefile conversion is always what is wanted. These parts will be worked on by core contributors to the build system and thus aren't of concern to most.

I will be the first to admit that a lot of the work to purify Makefile.in's to be data centric will look like a lot of busy work with little immediate gain. The real benefits to this work will manifest down the road. That being said, removing rules from Makefile.in's and implementing things as rules in rules.mk helps ensure that the implementation is proper (rules.mk is coded by make ninjas and thus probably does things right). This can lead to faster build times.

mozbuild

mozbuild is a Python package that provides an API to the build system.

What mozbuild will contain is still up in the air because it hasn't landed in mozilla-central yet. In the code I'm waiting on review to uplift to mozilla-central, mozbuild contains:

  • An API for invoking the build system backend (e.g. launching make). It basically reimplements client.mk because client.mk sucks and needs to be replaced.
  • An API for launching tests easily. This reimplements functionality in testsuite-targets.mk, but in a much cleaner way. Running a single test can now be done with a single Python function call. This may sound boring, but it is very useful. You just import a module and pass a filesystem path to a test file to a function and a test runs. Boom!
  • Module for extracting compiler warnings from build output and storing in a persisted database for post-build retrieval. Compiler warning tracking \o/
  • Module for converting build system output into structured logs. It records things like time spent in different directories, etc. We could use this for tracking build performance regressions. We just need an arewe*yet.com domain...
  • A replacement for .mozconfig's that sucks less (stronger validation, settings for not just build config, convenient Python API, etc).

And, upcoming features which I haven't yet tried to land in mozilla-central include:

  • API for extracting metadata from Makefile.in's and other frontend files. Want a Python class instance describing the IDLs defined in an individual Makefile.in or across the entire tree? mozbuild can provide that. This functionality will be used to configure alternate build backends.
  • Build backend API which allows for different build backends to be configured (e.g. recursive make, Tup, Ninja, etc). When we support multiple build backends, they'll live in mozbuild.

mozbuild can really be thought of as a clean backend to the build system and related functionality (like running tests). Everything in mozbuild could exist in make files or in .py files littered in build/, config/, etc. But, that would involve maintaining make files and/or not having a cohesive API. I wanted a clean slate that was free from the burdens of the existing world. mozbuild was born.

I concede that there will be non-clear lines of division between mozbuild and other Python packages and/or Mozilla modules. For example, is mozbuild the appropriate location to define an API for taking the existing build configuration and launching a Mochitest? I'm not sure. For now, I'm stuffing functionality inside mozbuild unless there is a clear reason for it to exist elsewhere. If we want to separate (because of module ownership issues, for example), we can do that.

My vision for mozbuild is for it to be the answer to the question how does the Mozilla build system work? You should be able to say, look at the code in python/mozbuild and you will have all the answers.

mozbuild Action Items

The single action item for mozbuild is getting it landed. I have code written that I think is good enough for an initial landing (with obvious shortcomings being addressed in follow-up bugs). It just needs some love from reviewers.

Landing mozbuild is tracked in bug 751795. I initially dropped a monolithic patch. I have since started splitting bits up into bite-sized patches to facilitate faster, smaller reviews. (See the blocking bugs.)

mach

mozbuild is just a Python package - an API. It has no frontend.

Enter mach.

mach is a command-line frontend to mozbuild and beyond.

Currently, mach provides some convenient shortcuts for performing common tasks. e.g. you can run a test by tab-completing to its filename using your shell. It also provides nifty output in supported terminals, including colorizing and a basic progress indicator during building.

You can think of mach as a replacement for client.mk and other make targets. But, mach's purpose doesn't end there. My vision for mach is for it to be the one-stop shop for all your mozilla-central interaction needs.

From a mozilla-central tree, you should be able to type ./mach and do whatever you need to do. This could be building Firefox, running tests, uploading a patch to Bugzilla, etc.

I'm capturing ideas for mach features in bug 774108.

mach Action Items

mach is in the same boat as mozbuild: it's waiting for reviewer love. If you are interested in reviewing it, please let me know.

Once mach lands, I will be looking to the larger community to improve it. I want people to go wild implementing features. I believe that mach will have a significant positive impact on driving contributions to Mozilla because it will make the process much more streamlined and less prone to error. I think it is a no-brainer to check in mach as soon as possible so these wins can be realized.

There is an open question of who will own mach in terms of module ownership. mach isn't really part of anything. Sure, it interacts with the build system, testing code, tools, etc. But, it isn't actually part of any of that. Maybe a new module will be created for it. I'm not familiar with how the modules system works and I would love to chat with someone about options here.

Project Interaction

How are all of these projects related?

BuildFaster is what everyone is working on today. It currently focuses on making the existing recursive make based build backend faster using whatever means necessary. BuildFaster could theoretically evolve to cover other build backends (like non-recursive make). Time will tell what falls under the BuildFaster banner.

mozbuild is immediately focused on providing the functionality to enable mach to land.

mach is about improving the overall developer experience when it comes to contributing to Firefox and other in-tree applications (like Firefox OS). It's related to the build system in that it provides a nice frontend to it. That's the only relationship. mach isn't part of the build system (at least not yet - it may eventually be used to perform actions on buildbot machines).

Down the road, mozbuild will gain lots of new features core to the build system. It will learn how to extract metadata from Makefile.in's which can be used by other build backends. It will define a build backend interface and the various build backends will be implemented in mozbuild. Aspects of the existing build system currently implemented in make files or in various Python files scattered across the tree will be added to mozbuild and exposed with a clean, reusable, and testable API.

Today, BuildFaster is the important project with all the attention. When mach lands, it will (hopefully) gather a lot of attention. But, it will be from a different group (the larger contributor community - not just build system people).

mozbuild today is only needed to support mach. But, once mozbuild lands and mach is on its own, mozbuild's purpose will shift to support BuildFaster and other activities under the BuildSplendid banner.

Conclusion

We have a lot of work ahead of us. Please look at the bugs linked above and help out in any way you can.

