Makefile Execution Times
July 28, 2012 at 12:45 AM | categories: Mozilla, pymake, build systemIn my course of hacking about with Mozilla's build system, I've been using pymake (a Python implementation of GNU make) to parse, examine, and manipulate make files. In doing so, I've learned some interesting things, dispelling myths in the process.
People often say that parsing make files is slow and that the sheer number of Makefile.in's in mozilla-central (Firefox's source tree) is leading to lots of overhead in make execution. This statement is only partially correct.
Parsing make files is actually pretty fast. Using pymake's parser API, I'm able to parse every Makefile.in in mozilla-central in under 5 seconds on my 2011 generation MacBook Pro using a single core. Not too shabby, especially considering that there are about 82,500 lines in all the Makefile.in's.
Evaluation of make files, however, is a completely different story. You see, parsing a string containing make file directives is only part of what needs to be done. Once you've parsed a make file into a statement list (essentially an AST), you need to load that into a data structure fit for evaluation. Because of the way make files are evaluated, you need to iterate through every parsed statement and evaluate it for side-effects. This occurs before you actually evaluate specific targets in the make file itself. As I found out, this process can be time-consuming.
For mozilla-central, the cost of loading the statement list into a data structure ready for target evaluation takes about 1 minute in aggregate. And, considering we effectively iterate through every Makefile in mozilla-central 3 times when building (once for every tier state of export, libs, and tools), you can multiply this figure by 3.
Put another way, parsing Makefile's is fast: loading them for target evaluation is slow.
Digging deeper, I uncovered the main source of the additional overhead: rules.mk.
Nearly every Makefile in mozilla-central has a pattern that looks like:
DEPTH = ../..
topsrcdir = @top_srcdir@
srcdir = @srcdir@
VPATH = @srcdir@
include $(DEPTH)/config/autoconf.mk
<LOCAL MAKE FILE DECLARATIONS>
include $(topsrcdir)/config/rules.mk
We have a header boilerplate, followed by a bunch of Makefile-specific variables definitions and rules. Finally, we include the rules.mk file. This is the make file that takes specially-named variables and converts them to rules (actions) for make to perform.
A typical Makefile.in is a few dozen lines or so. This often reduces to maybe a dozen parsed statements. By contrast, rules.mk is massive. It is currently 1770 lines and may include other make files, bringing the total to ~3000 lines.
Pymake has an LRU cache that caches the results of parsing make files. This means it only has to parse a single make file into a statement list once (assuming no cache eviction). rules.mk is frequently used, so it should have no eviction. Even if it were evicted, I've measured that parsing is pretty fast.
Unfortunately, the cache doesn't help with evaluation. For every Makefile in mozilla-central, pymake will need to evaluate rules.mk within the context of that specific Makefile. It's impossible to cache the results of a previous evaluation because the side-effects of rules.mk are determined by what is defined in the Makefile that includes it.
I performed an experiment where I stripped the include rules.mk statement from all parsed Makefile.in's. This essentially isolates the overhead of loading rules.mk. It turns out that all but ~2 seconds of evaluation time is spent in rules.mk. In other words, without rules.mk, the Makefile.in's are loaded and ready for evaluation in just a few seconds (over parsing time), not ~1 minute!
What does this all mean?
Is parsing make files slow? Technically no. Parsing itself is not slow. It is actually quite fast! Pymake even surprised me at how fast it can parse all the Makefile.in's in mozilla-central.
Loading parsed make file statements to be ready for evaluation is actually the bit that is slow - at least in the case of mozilla-central. Specifically, the loading of rules.mk is what constitutes the overwhelming majority of the time spent loading Makefile's.
That being said, parsing and loading go hand in hand. You almost never parse a make file without loading and evaluating it. So, if you consider parsing to include parsing and readying the make file for execution, there is some truth to the statement that parsing make files is slow. Someone splitting hairs may say differently.
Is there anything we can do? Good question.
I believe that build times of mozilla-central can be reduced by reducing the size of rules.mk. Obviously, the content of rules.mk is important, so we can't just delete content. But, we can be more intelligent about how it is loaded. For example, we can move pieces of rules.mk into separate .mk files and conditionally include these files based on the presence of specific variables. We already do this today, but only partially: there are still a number of bits of rules.mk that could be factored out into separate files. By conditionally loading make file content from rules.mk, we would be reducing the number of statements that need to be loaded before evaluating each Makefile. And, this should, in turn, make build times faster. Keep in mind that any savings will be multiplied by roughly 3 since we do 3 passes over Makefile's during a build.
To my knowledge, there aren't any bugs yet on file to do this. Given the measurements I've obtained, I encourage somebody to do this work. Even if it doesn't reduce build times, I think it will be a win since it will make the make rules easier to understand since they will be contained in function-specific files rather than one monolithic file. At worse, we have better readability. At best, we have better readability and faster build times. Win!
Finally, I don't know what the impact on GNU make is. Presumably, GNU make evaluates make files faster than pymake (C is generally faster than python). Therefore, reducing the size of rules.mk should make GNU make faster. By how much, I have no clue.
Improving Mozilla's Build System
June 25, 2012 at 10:15 AM | categories: make, Mozilla, pymake, build systemMakefiles. Love them or hate them, they are ubiquitous. For projects like Firefox (which has on the order of 1500 Makefiles), Makefiles aren't going away any time soon. This despite that newer - arguably better - build system (such as CMake and GYP) exist.
Many have decried the numerous ways in which GNU make and make files suck. I won't pile on here. Instead, please read What's Wrong with GNU make for a great overview.
Despite the many flaws with make, I would argue that the single most significant one is the slow speed often associated with it. People lauding make replacements (such as Ninja) almost always emphasize this one aspect above all others. (OK, Ninja isn't exactly a make replacement, but the execution aspects are comparable.)
Why is make slow? It is almost always due to the use of recursive make. This is make executing itself recursively, usually as it traverses a directory tree. This is how Mozilla's build system works. You start in one directory. Then, for each directory defined in the DIRS variable, make invokes make in that directory. In Mozilla's case, we amplify this by performing multiple passes over the directory tree - an export pass that installs header files, etc into a common directory, a libs pass that does most of the compilation, and a tools pass which performs other miscellaneous actions. Each of these is a recursive make iteration. As we say at Mozilla: clown shoes.
The deficiencies of recursive make and a workaround are called out in the classic paper Recursive Make Considered Harmful. Despite its age, it is still relevant - GNU make hasn't changed that much, after all. The thesis of the paper can be boiled down to the fact that make assembles a directed acyclic graph (DAG) and then executes it. Using recursive make, you are assembling N DAGs instead of of 1. Each DAG must be executed separately. This adds execution overhead. On top of that, since the DAGs are separate, you are either a) missing information from each DAG (it doesn't know about other DAGs) thus creating an incomplete and possibly error-prone dependency graph or b) duplicating information in multiple DAGs, creating more work for make in aggregate.
The solution for this is non-recursive make. That's just a fancy way of saying create 1 monolithic DAG and execute it once (as opposed to N separate DAGs with at least N invocations of make). Modern build systems like Ninja do this. While these build systems have other properties that contribute to faster execution times, in my opinion the single feature that has the greatest impact is consolidating the build dependencies into a single graph. This minimizes the aggregate amount of work the driver needs to perform and all but eliminates redundant operations.
Non-recursive make is typically achieved one of 2 ways:
- Create a single monolithic make file
- Concatenate all of your make files together (using the built-in include directive)
Either way, you produce a single DAG and all is well. More on this later.
Transitioning to a Single DAG
A big problem with non-recursive make is transitioning to it. For projects like Firefox with its 1500 Makefiles, each existing in its own little universe, this is a Herculean effort. There is a reason why many (including Mozilla) haven't done it: it's just too hard.
The problem of slow build times with Firefox is important to me because I suffer from it almost every day. So, I got to thinking of creative ways to solve this problem. How can we work towards a monolithic DAG? How can we do this incrementally so as to not cause too much disruption and so the effort isn't near impossible?
I've come up with a plan that I believe Mozilla should pursue. The plan consists of many different parts, each of which is described in a separate section below. Each part is presented in roughly the order it needs to be addressed in.
I want to emphasize that this is my personal opinion and what I'm about to propose is merely that: a proposal. For all I know, people will think I'm smoking crack and none of these ideas will be pursued.
Let's begin.
Part 1: No Rules
Make files consist of rules. These are, well, rules that tell make how to evaluate a target. A rule will say something like to turn hello.cpp into hello.o, call the C++ compiler with these arguments.
People naturally take advantage of the fact that the body of the rule (the recipe in make parlance) is often similar for many targets, so you make use of wildcard rules or by specifying multiple prerequisites and/or targets for a rule.
Wildcard rules are great. Their problem is that the recipe is often useful across many make files. The solution here is to put the rule definition (and thus the recipe) in a common file and use make's include directive to bring those rules into the current make file.
This is the general approach Mozilla takes. All of the generic rules are defined in the (oh-it-hurts-my-eyes-to-look-at-so-much-make-syntax) rules.mk file. Individual Makefiles in the tree simply define specifically-named variables such as CPPSRCS or XPIDLSRCS and then include rules.mk. When make executes, rules.mk transforms these variables into rules which are automagically inserted into the DAG.
From a user perspective, this is splendid. You simply define some variables and magic ensues. Instead of hairy looking make files, you literally have a file with simple variable assignments. If all goes to plan, you don't need to look at these rules definitions. Instead, you leave that up to professional dragon trainers, such as Ted Mielczarek, Joey Armstrong, and Kyle Huey.
A large problem is that many make files define custom rules - things not using the magic from rules.mk. What's worse is that these custom rules are often cargo culted around. They work, yes, but there is redundancy. The installation of mochitest files is a perfect example of this. Bug 370750) tracks establishing a rule for this absurd repetition of make logic in Mozilla's tree. Even in the case of mochitest rules where the recipe is simple (it's just calling $(INSTALL)), the lack of a unified rule hurts us. If we wanted to change the directory the mochitest files were installed to, this would require editing scores of files. Not cool.
Anyway, the first step towards improving the Mozilla build system (and any make file based build system really) is what I'm calling the no rules initiative.
We should strive for no explicit rules in non-shared make files (things outside of config/ and build/ in the Mozilla source tree - basically all the files named Makefile.in or Makefile). Instead, individual make files should define variables that cause rules from a shared make file (like rules.mk) to be applied.
In Mozilla's case, this will be a lot of manual effort and it will seem like there is no immediate gain. The benefits will be explained in subsequent sections.
Part 2: Eliminate Make File Content Not Related to Building
Mozilla's make files are riddled with content that isn't directly related to the action of building Mozilla applications. Not only does this contribute to the overhead of invoking make (these bits need to be evaluated when parsing, added to the DAG, etc, adding cost), but they also make the make files harder to read and thus edit.
testsuite-targets.mk is a good example of this. While it isn't included in every make file (just the top-level one), the code isn't related to building at all! Instead, it is essentially proxy code that maps make targets to commands. It is supposed to be convenient: type |make xpcshell-tests| and some tests run. OK, fine. I'll give you that. The main problem is this make file is just a glorified shell script. Make is great at constructing a dependency graph. As a DSL for shell scripts, I'd rather just write a script. Therefore, I would argue code like this belongs elsewhere - not in make files.
Anyway, a problem for Mozilla is that our make files are riddled with pieces of content that aren't directly related to building. And, build system maintainers pay the price. Every time you see some part of a make file that doesn't appear to be directly related to the act of building, your mind is like, "wut ist das?" (That's High German for WTF?) You start asking questions. What's it for. Who uses it? Is it really needed? What happens if it breaks? How will I know? These questions all need answered and that adds overhead to editing these files. Oftentimes these little one-off gems are undocumented and the questions are nearly impossible to answer easily. The wise course of action is usually preserve existing behavior. Nobody wants to melt someone else's precious snowflake, after all. Thus, cargo cult programming prevails.
The solution I propose is along the same vein as the no rules initiative - you can actually think of it as a specifically-tailored subset: limit the make files to what's necessary for building. If it doesn't relate to the act of building the application - if it is an island in the DAG - remove it.
But where do you move it to? Good question. We don't have an obvious location today. But, in bug 751795 I am creating what is essentially a frontend for the build system. That is where it should go. (I will publish more information about that project on this blog once it has landed.)
Part 3: Make File Style Conventions and Enforced Review Requirements
My first two points revolved around what is effectively a make file style convention. If we are serious about making that convention stick, we need to enforce it better.
This can be done in two ways.
First, we establish a review policy that any changes to make files must be signed off by a build system peer. Maybe we carve out an exception for simple cases, such as adding a file to a XPIDLSRCS variable. I don't know. We certainly don't allow people to add new make files or even targets without explicit sign-off. I think this is how it is supposed to be today. It isn't enforced very well if it is.
To aid in enforcing the style convention, we arm people with a tool that checks the style so they can fix things before it lands on a build peer's plate. This is actually pretty trivial to implement (I even have code that does this). The hard part is coming to consensus on the rules. If there is a will, I can make this happen easily.
Once you have the tool to check for convention/style violations, your checkin policy is amended to include: no make file changes can be checked in unless the file passes the style checker. This is sure to draw ire from some, I'm sure. But, the alternative is having a cat and mouse game between the build system maintainers making the build system suck less and other people undoing their work with new features, etc.
Anyway, these rules wouldn't be absolute - there could always be exceptions, of course. If we're serious about making the build system better, we need it to be more consistent and have less variance. I'm open to other suggestions to accomplish this.
Part 4: Extracting Data from Make Files
Once we have make files which consist of simple variable assignments (not rules), it is possible to take the values of these variables, and extract them from the make files to some higher-level system. For example, we could scour all the make files for the CPPSRCS variable and assemble a unified list of all C++ source files defined by that variable across the entire build system. Then, we could do some really interesting things.
What kind of things you ask? Well, one idea is you could write out a monolithic make file that explicitly listed every target from the extracted data. No recursive make necessary here! Another idea would be to produce Visual Studio project files.
Think I'm crazy and this is impossible? Feast your eyes. That make file was generated by parsing the existing Firefox make files, extracting statically-defined data, then transforming that data structure back into a single make file. I also have code somewhere that produces Visual Studio projects.
Now, all of this is just a proof-of-concept. It works just well enough for me to prove it is feasible. (The linked example is 7 months old and won't work on the current tree.) Although, at one time, I did have have a functioning transformation of essentially the make export stage. All include files and IDL generation was in a single make file. It ran in a fraction of the time as recursive make. A no-op execution of the monolithic make file took about 0.5 seconds and - gasp - actually did no work (unlike the no-op builds in the Mozilla tree today which actually run many commands, even though nothing changed).
My point is that by having your make files - your build system - be statically defined, you can extract and combine pieces of information. Then, with relative ease, you can transform it to something else. Instead of make files/build systems being a DSL for shell script execution, they are data structures. And, you can do a lot more with data structures (including generating shell scripts). Representing a build system as a unified, constant data structure is what I consider to be the holy grail of build systems. If you can do this, all you need to do is transform a data structure to a format some other build tool can read and you are set. If you do a good job designing your data structure, what you've essentially built is an intermediate language (IR) that can be compiled to different build tools (like make, Ninja, and Visual Studio).
(The previous paragraph is important. You may want to read it again.)
Getting back on point, our goal is to assemble aggregate data for Mozilla's build system so we can construct a monolithic DAG. We can work towards this by extracting variable values from our make files.
Unfortunately, there are technical problems with naive data extraction from make files. Fortunately, I have solutions.
The first problem is getting at individual variable values in make files. Simple parsing is ruled out, as make files can be more complex than you realize (all those pesky built-in functions). You don't want to burden people to add targets to print variable values and then execute make just to print variables. This would be violating steps 1 and 2 from above! So, you somehow need to inject yourself into a make execution context.
Fortunately, we have pymake. Pymake is a mostly feature complete implementation of GNU make written in Python. You should be able to swap out GNU make for pymake and things just work. If they don't, there is a bug in pymake.
pymake is useful because it provides an API for manipulating make and make files. Contrast this was GNU make, which is essentially a black box: make file goes in, process invocations come out. There's no really good way to inspect things. The closest is the -p argument, which dumps some metadata of the parsed file. But, this would require you to parse that output. As they say, now you have two problems.
Using pymake, it's possible to get at the raw parser output and to poke around the assembled data structures before the make file is evaluated. This opens up some interesting possibilities.
With pymake's API, you can query for the value of a variable, such as XPIDLSRCS. So, you just need to feed every make file into pymake, query for the value of a variable, then do cool things with the extracted data.
Not so fast.
The problem with simple extraction of variables from make files is that there is no guarantee of idempotence. You see, make files aren't static structures. You don't parse a make file into a read-only structure, build the DAG, and go. Oh no, because of how make files work, you have to constantly re-evaluate things, possibly modifying the DAG in the process.
When you obtain the value of a variable in make, you need to evaluate it. In make parlance, this is called expansion.
Expansion is easy when variables have simple string content. e.g.
CPPSRCS = foo.cpp bar.cpp
It gets more complicated when they have references to other variables:
FILES = foo.cpp bar.cpp
CPPSRCS = $(FILES) other.cpp
Here the expansion of CPPSRCS must descend into FILES and expand that. If FILES contained a reference, make would descend into that. And so on.
The problem of non-guaranteed idempotence is introduced when variable expansion interfaces with something that is non-deterministic, such as the file system. This almost always involve a call to one of make's built-in functions. For example:
CPPSRCS = $(wildcard *.cpp)
SOURCES = $(shell find . -type f -name '*.cpp')
Here, the expanded value of CPPSRCS depends on the state of the filesystem at the time of expansion. This is obviously not deterministic. Since you can't guarantee the results of that expansion, doing something with the evaluated value (such as generating a new make file) is dangerous.
It gets worse.
The expansion of a variable can occur multiple times during the execution of a make file due to deferred evaluation. When you use the = operator in make files, the variable isn't actually expanded until it is accessed (make just stores a reference to the string literal on the right side of the assignment operator). Furthermore, the expansion occurs every time the variable is accessed. Seriously.
The := operator, however, expands the variable once - at assignment time. Instead of storing a string reference, make evaluates that string immediately and assigns the result to the variable.
The distinction is important and can have significant implications. Use of = can lead to degraded performance via redundant work during multiple expansions (e.g. file system access or shell invocation). It can also cause the value of a variable to change during the course of execution from changes to systems not directly under the make file's control (such as the file system). For these reasons, I recommend to use the immediate assignment operator (:=) instead of the deferred assignment operator (=) unless you absolutely need deferred assignment. This is because immediate assignment approximates what most think assignment should be. Unfortunately, since = is the only assignment operator in the overwhelming majority of popular programming languages, most people don't even know that other assignment operators exist or that the deferred assignment operator comes with baggage. Now you do. I hope you use it properly.
Anyway, if a variable's value changes while a make file is executing, what is the appropriate value to extract for external uses? In my opinion, the safe answer is there is none: don't extract the value of that variable. If you do, you are asking for trouble.
It sounds like I just tore a giant hole in my idea to extract simple data from make files since I just said that the value may not be deterministic. Correct. But, the key word here is may. Again, pymake comes to the rescue.
Using pymake, I was able to implement a prover that guarantees whether a variable is deterministic and thus idempotent. Basically, it examines the statement list generated by pymake's parser. It traces the assignment of a variable through the course of a statement list. If the variable itself, any variable referenced inside of it, or any variable that impacts the assignment (such as assignment inside a condition block) is tainted by a non-deterministic evaluation (notably file system querying and shell calls), that variable is marked as non-deterministic. Using this prover, we can identify which variables are guaranteed to be idempotent as long as the make file doesn't change. If you want to learn more, see the code.
Now that we have a way of proving that a variable in a make file is deterministic as long as the source make file doesn't change, we can extract data from make files with confidence, without having to worry about side-effects during execution of the make file. Essentially, the only thing we need to monitor is the original make file and any make files included by it. As long as they don't change, our static analysis holds.
So, I've solved the technical problem of extracting data from make files. This allows us to emancipate build system data into a constant data structure. As mentioned above, this opens up a number of possibilities.
It's worth noting that we can combine the prover with the style enforcer from part 3 to help ensure Mozilla's make files are statically defined. Over time, the make files will not only look like simple variable assignments (from part 1), but will also become static data structures. This will enable the Mozilla build system to be represented as a single, constant data structure. This opens up the transformation possibilities described above. It also allows for more radical changes, such as replacing the make files with something else. More on that later.
Part 5: Rewriting Make Files
Extracting data from make files to produce new build system data (possibly a unified make file) introduces a new problem: redundancy and/or fragmentation. You now have the original definition sitting around in the original make file and an optimized version in a derived file. What happens when the original file (still containing the original values) is evaluated? Do you want the targets (now defined elsewhere from the extracted data) to still work when executed from the original file? (Probably not.)
There is also a concern for how to handle the partially converted build system scenario. In theory, porting the build system to the new world order would go something like this:
1) Identify a variable/feature you want to extract from the make files 2) Write routine to convert extracted data into an optimized builder (maybe it just prints out a new make file) 3) For make files where that variable is defined but couldn't be safely extracted (because it wasn't determinant), continue to execute the old, inherited rule from rules.mk (as we do today).
This acknowledges the reality that any particular transition of a variable to use an optimized build routine from extracted values will likely be difficult to do atomically. The flag day will be held up by stragglers. It would be much more convenient to accomplish the transition incrementally. Files that are ready see the benefit today, not only after everyone else is ready.
My solution to this problem is make file rewriting. For variables that are deterministic and have their functionality replaced by some other system, we simply strip out references to these variables from the make files. No variable. No inherited rule. No redundancy. No extra burden. Of course, we're not altering the original make file. Instead, we take the input make file (Makefile.in in the Mozilla tree), do static analysis, remove variables handled elsewhere, then write out a new make file.
Unfortunately, this invented a new problem: rewriting make files. As far as I can tell, nobody has done this before. At least not to the degree or robustness that I have (previous solutions are effectively sed + grep).
Using pymake, I was able to create an API sitting between the parser and evaluator which allows high-level manipulation of the make file data structure. Want to delete a variable? Go for it. Want to delete a target? Sure! It's really the missing API from pymake (or any make implementation for that matter). I stopped short of writing a fully-featured API (things like adding rules, etc). Someday I would love to fill that in because then you could create new/anonymous make files/DAGS using just API calls. You would thus have a generic API for creating and evaluating a DAG using make's semantics. This is potentially very useful. For example, the optimized make files generated from static data extracted from other make files I've been talking about could be produced with this API. In my opinion, this would be much cleaner than printing make directives by hand (which is how I'm doing it today).
This new make file manipulation API was relatively easy to write. The hard part was actually formatting the statement list back into a make file that was both valid and functionally equivalent to the original. (There's actually a lot of dark corners involving escaping, etc.) But, I eventually slayed this dragon. And, the best part is I can prove it is correct by feeding the output into the pymake parser and verifying the two statement lists are identical! Furthermore, after the first iteration (which doesn't preserve formatting of the original make file because a) it doesn't need to and b) it would be a lot of effort) I can even compare the generated string content of the make files as an added sanity check.
If you are interested, the code for this lives alongside the deterministic proving code linked to in the previous section. I would like to land it as part of pymake someday. We'll see how receptive the pymake maintainers are to my crazy ideas.
Anyway, we can now strip extracted variables handled by more efficient build mechanisms out of generated make files. That's cool. It gives us an incremental transition path for Mozilla's build system that bridges new and old.
But wait - there's more!
That deterministic prover I wrote for data extraction: I can actually use it for rewriting make files!
Make files support conditional directives such as ifeq, ifneq, ifdef, else, etc. For each conditional directive, I can query the deterministic prover and ask whether an expansion of that directive is deterministic. If it is, I evaluate the conditional directive and determine which branch to take. And, I may have invented the first make file optimizer!
I simply replace the condition block (and all the branches therein) with the statements constituting the branch that is guaranteed to be executed.
As a contrived example:
DO_FOO := 1
ifeq ($(DO_FOO), 1)
foo.o: foo.cpp
clang++ -o foo.o foo.cpp
endif
We can prove the ifeq branch will always be taken. So, using the deterministic prover, we can rewrite this as:
DO_FOO := 1
foo.o: foo.cpp
clang++ -o foo.o foo.cpp
Actually, we can now see that the value of DO_FOO is not referenced and is free from side-effects (like a shell call), and can eliminate it!
foo.o: foo.cpp
clang++ -o foo.o foo.cpp
Cool!
The practical implication of this is that it is shifting the burden of make file evaluation from once every time you run make to some time before. Put another way, we are reducing the amount of work make performs when evaluating a make file. As long as the original file and any files included by it don't change frequently, the cost of the static analysis and rewriting is repaid by simpler make files and the lower run-time cost they incur. This should translate to speedier make invocations. (Although, I have no data to prove this.)
Another nice side-effect is that the rewritten and reduced make file is easy to read. You know exactly what is going to happen without having to perform the evaluation of (constant) conditional statements in your head (which likely involves consulting other files, like autoconf.mk in the case of Mozilla's make files). My subjective feeling is that this makes generated make files much easier to understand.
So, not only does rewriting make files allow us to incrementally transition to a build system where work is offloaded to a more optimized execution model, but it also makes the evaluated make files smaller, easier to understand, and hopefully faster. As Charlie Sheen would say: winning!
Review Through Part 5
At this point, I've laid out effectively phase 1 of making Mozilla's build system suck much less. We now have make files that are:
- Reusing rule logic from rules.mk to do everything (specially-named variables cause magic to ensure). No cargo culted rules.
- Devoid of cruft from targets not related to building the application.
- Variables defined such that they are deterministic (essentially no shell and filesystem calls in make files)
- A review system in place to ensure we don't deviate from the above
- A higher-level tool to extract specific variables from make files which also produces an optimized, non-recursive make file to evaluate those targets. Ideally, this encompasses all rules inherited from rules.mk and the need for recursion and to evaluate individual make files is eliminated.
If we wanted, we could stop here. This is about the limit we can take the existing build system while maintaining some resemblence to the current architecture.
But, I'm not done.
Part 6: Transition Away from Make Files
If we've implemented the previous parts of this post, it is only natural for us to shift away from make files.
If all we have in our make files are variable assignments, why are we using make files to define a static document? There are much better formats for static data representation. YAML and JSON come to mind.
If we transition the build system definition to something actually declarative - not just something we shoehorn into being declarative (because that is how you make it fast and maintainable) - that cuts out a lot of the hacky-feeling middleware described above (static analysis, data extraction, and rewriting). It makes parsing simpler (no need for pymake). It also takes away a foot gun (non-deterministic make statements). Still not convinced? We could actually properly represent an array (make doesn't have arrays, just strings split on whitespace)! OK, just kidding - the array thing isn't that big of a deal.
Anyway, the important characteristic of the build system is achieving that holy grail where the entire thing can be represented as a single generic (and preferably constant) data structure. As long as you can arrive at that (a state that can be transformed into something else), it doesn't matter the road you take. But, some roads are certainly cleaner than others.
At the point where you are transitioning away from make files, I would argue the correct thing to do is convert the build definition to files understood by another build tool. Or, we should at least use a format that is easily loaded into another tool. If we don't, we've just invented a custom one-off build system. A project the size of Mozilla could probably justify it. But, I think the right thing to do is find an open source solution that fits the bill and run with that.
While I'm far from an expert on it, I'll throw out GYP as a candidate. GYP defines the build system in JSON files (which can actually be directly evaluated in a Python interpreter). GYP consumes all of the individual .gyp/JSON files and assembles an aggregate representation of the entire build system. You then feed that representation into a generator, which produces make files, Ninja files, or even Visual Studio or Xcode project files. Sound familiar? It's essentially my holy grail where you transform a data structure into something consumed by a specific tool. I approve.
Now, GYP is not perfect. No build system is perfect. I will say that GYP is good enough to build Chromium. And, if it can handle Chromium, surely it can handle Firefox.
Next Steps for Mozilla
I'm not sure what the next steps are. I've been trying to convince people for a while that a data-centric declarative build system is optimal and we should be working towards it. Specifically, we should be working towards a monolithic DAG, not separate DAGs in separate make files. I think people generally agree with me here. We obviously have a build system today that is very declarative (with variables turning into rules). And, I know people like Joey Armstrong understand why recursive make is so slow and are working on ways to switch away from it.
I think the main problem is people are overwhelmed by what a transition plan would look like. There is a lot of crap in our make files and fixing it all in one go is next to impossible. Nobody wants to do it this way. And, I think the sheer scope of the overall problem along with a constant stream of new features (Fennec make files, ARM support, Windows 8, etc) has relegated people to merely chipping away at the fringe.
Supplementing that, I think there has been doubt over some of what I've proposed, specifically around the scarier-sounding bits like performing static analysis and rewriting of make files. It scared me when I first realized it was possible, too. But, it's possible. It's real. I have code that does it. I'll admit the code is far from perfect. But, I've done enough that I've convinced my self it is possible. It just needs a little polish and everything I described above is a reality. We just need to commit to it. Let's do it.
Loose Ends
What About "Sand-boxed" Make or Concatenating Make Files?
I anticipate that the main competition for my proposal will involve the following approaches:
- "sand-boxed" make files (described in the aforementioned Recursive Make Considered Harmful paper)
- Concatenate all the make files together
The concatenating one is definitely a non-controversial alternative to my proposal of extracting data then creating derived build files (possibly a monolithic make file) from it. And, if we go with concatenating, I'll generally be happy. We'll have a mostly statically-defined build system and it will exist in one monolithic DAG. This has desirable attributes.
FWIW, the stalled bug 623617 has an amazing patch from Joey Armstrong which partially implements concatenated make files. If this lands, I will be extremely happy because it will make the build system much faster and thus make developers' lives less painful.
That being said, I do have a complaint against the concatenating approach: it is still defined in make files.
The problem isn't make itself: it's that at the point you have pulled off concatenating everything together, your build system is effectively static data declarations. Why not use something like GYP instead? That way, you get Ninja, Visual Studio, Xcode, Eclipse, or even make file generation for free. I think it would be silly to ignore this opportunity.
If we continue to define everything in make files, we'll have to rely on data extraction from the concatenated make file to create project files for Visual Studio, Xcode, etc. This is possible (as I've proved above). And, I will gladly lend my time to implementing solutions (I've already performed a lot of work to get Visual Studio project generation working in this manner, but I abandoned work because I reached the long tail that is one-off targets and redundancy - which would be eliminated if we accomplish parts 1 and 2). Furthermore, we may have to reinvent the wheel of generating these project files (believe me, generating Visual Studio projects is not fun). Or, we could find a way to extract the make file data into something like GYP and have it do the generation for us. But, if we are going to end up in GYP (or similar), why don't we just start there?
I will also throw out that if we are really intent on preserving our existing make rules, it would be possible to define the build system in GYP/other files and write a custom GYP/other generator which produced make files tailored for our existing make rules.
In summary, the crux of my argument against concatenated make files is that it is a lot of work and that for roughly the same effort you could produce something else which has almost identical properties but has more features (which will make the end-developer experience better).
You could say the additional complexity of my proposal is just to buy us Visual Studio, etc support. That is more or less accurate. I think I've put in more time than anyone to devise a solution that programmatically generates Visual Studio projects. (Manually tailored files out out of the question because the maintenance of keeping them in sync with build system changes would be too high and error prone.) From my perspective, if we want to provide Visual Studio, Xcode, etc files for Mozilla projects - if we want to provide developers access to amazing tools so they can be more productive - we need the build system defined in something not make so this is robust.
Eclipse Project Files
Jonathan Watt recently posted instructions on configuring Eclipse CDT to build Firefox. Awesome!
Some may say that debunks my theory that it is difficult/impossible to generate build project files from just make files (without the sorcery I describe above). Perhaps.
As cool as the Eclipse CDT integration is, the way it works is a giant hack. And, I don't mean to belittle Jonathan or his work here - it is very much appreciated and a large step in the right direction.
The Eclipse CDT integration works by scanning the output of make, looking for compiler invocations. It parses these out, converting the arguments to Eclipse project data. Not only do you have to perform a standard make to configure Eclipse, but it also has to run without any parallelization so as to not confuse the parser (interleaved output, etc). And, Eclipse doesn't do everything for you - you still have to manage part of your build from the command line.
So, Eclipse is really a hybrid solution - albeit a rather useful one and, importantly, one that seems to work. Yet, In my opinion, this is all a little fragile and less than ideal. I want the ability to build the whole thing from your build tool. This is possible with the solutions I've proposed above.
Improving GYP
Above, I recommended GYP as a build system which I think is a good example of my holy grail build system - one which represents everything as data and allows transformations to multiple build tools.
While I like the concept of GYP, there are parts of the implementation I don't like. Every build system inevitably needs to deal with conditionals. e.g. if on Windows, use this flag; if on Linux, use this one. GYP represents conditions as specially crafted property names in JSON objects. For example:
{
'conditions': [
['OS=="linux"', {
'targets': [
{
'target_name': 'linux_target'
},
],
}],
['OS=="win"', {
'targets': [
{
'target_name': 'windows_target',
},
],
}]
]
}
That special string as a key name really rubs me the wrong way. You are using declarative data to represent procedural flow. Yuck. It reminds me of a horror from a previous life - VXML. Don't ever try to write a computer program complete with branching and functions in XML: you will just hate yourself.
Anyway, despite the conflation of logic and data, it isn't too bad. As long as you limit the expressions that can be performed (or that actually are performed - I think I read they just feed it into eval), it's still readable. And, I'll happily sacrifice some purity to achieve the holy grail.
Anyway, I think my optimal build system would look a lot like GYP except that the data definition language would be a little more powerful. Knowing that you will inevitably need to tackle conditionals and other simple logic, I would give up and give my build system files access to a real programming language. But, it would need to be a special language. One that doesn't allow foot guns (so you can preserve determinism) and one that can't be abused to allow people to stray too far from a data-first model.
For this task, I would employ the Lua programming language. I would use Lua because it is designed to be an embeddable and easily sandboxed programming language. These attributes are perfect for what I'd need it to do.
Basically, files defining the build system would be Lua scripts. Some driver program would discover all these Lua files. It would create a new Lua context (an instance of the Lua interpreter) for each file. The context's global registry (namespace) would be populated with variables that defined the existing build configuration. IS_WINDOWS, HAVE_STRFTIME, etc - the kind of stuff exported by configure. The Lua script would get executed in that context. It would do stuff. It would either set specially-named global variables to certain values or it would call functions installed in the global registry like add_library, etc. Then, the driving program would collect everything the Lua script did and merge that into a data structure representing the build system, achieving the holy grail.
In the simple case, build files/Lua scripts would just look like simple variable assignments:
EXPORTED_INCLUDES = {"foo.h", "bar.h"]
LIBRARY_NAME = "hello"
If the build system needed to do something conditional, you would have a real programming language to fall back on:
if IS_WINDOWS then
SOURCE_FILES = "windows.cpp"
else
SOURCE_FILES = "other.cpp"
end
To my eye, this is cleaner than hacking JSON while still readable.
For people not familiar with Lua, new context instances are cheap, tiny, and have almost no features. Even the standard library must be explicitly added to a context! Contrast this with a batteries-included language like Python or JavaScript. So, while the build definition files would have a fully-featured programming language backing them, they would be severely crippled. I'm thinking I would load the string and table libraries from the standard library. Maybe I'd throw the math one in too. If I did, I'd disable the random function, because that's not deterministic!
Anyway, I think that would get me to a happy place. Maybe I'll get bored on a weekend and implement it! I'm thinking I'll have a Python program drive Lua through the ctypes module (if someone hasn't written a binding already - I'm sure they have). Python will collect data from the Lua scripts and then translate that to GYP data structures. If someone beats me to it, please name it Lancelot and leave a comment linking to the source.
Using Code for Other Projects
The code I wrote for pymake can be applied to any make file, not just Mozilla's. I hope to have some of it integrated with the official pymake repository some day. If this interests you, please drop me a line in the comments and I'll see what I can do.