Mozilla Build System Plan of Attack

July 25, 2012 at 11:30 PM | categories: Mozilla, build system

Since I published my brain dump post on improving Mozilla's build system for Gecko applications (including Firefox and Firefox OS), there has been some exciting progress.

It wasn't stated in that original post, but the context for that post was to propose a plan in preparation of a meeting between the core contributors to the build system at Mozilla. I'm pleased to report that the plan was generally well-received.

We pretty much all agreed that parts 1, 2, and 3 are all important and we should actively work towards them. Parts 4, 5, and 6 were a bit more contentious. There are some good parts and some bad parts. We're not ready to adopt them quite yet. (Don't bother reading the original post to look up what these parts corresponded to - I'll cover that later.)

Since that post and meeting, there has also been additional discussion around more specifics. I am going to share with you now where we stand.

BuildSplendid

BuildSplendid is an umbrella term associated with projects/goals to make the developer experience (including building) better - more splendid if you will.

BuildSplendid started as the name of my personal Git branch for hacking on the build system. I'm encouraging others to adopt the term because, well, it is easier to refer to (people like project codenames). If it doesn't stick, that's fine by me - there are other terms that will.

BuildFaster

An important project inside BuildSplendid is BuildFaster.

BuildFaster focuses on the following goals:

  1. Making the existing build system faster, better, stronger (but not harder).
  2. Making changes to the build system to facilitate the future use of alternate build backends (like Tup or Ninja). Work to enable Visual Studio, Xcode, etc project generation also falls here.

The distinction between these goals can be murky. But, I'll try.

Falling squarely in #1 are:

  • Switching the buildbot infrastructure to use pymake on Windows

Falling in #2 are:

  • Making Makefile.in's data-centric
  • Supporting multiple build backends

Conflated between the two are:

  • Ensuring Makefile's no-op if nothing has changed
  • Optimizing existing make rules. This involves merging related functionality as well as eliminating clown shoes in existing rules.

The two goals of BuildFaster roughly map to the short-term and long-term strategies, respectively. There is consensus that recursive make (our existing build backend) does not scale and we will plateau in terms of performance no matter how optimal we make it. That doesn't mean we are giving up on it: there are things we can and should do so our existing recursive make backend builds faster.

In parallel, we will also work towards the longer-term solution of supporting alternate build backends. This includes non-recursive make as well as things like Tup, Ninja, and even Visual Studio and Xcode. (I consider non-recursive make to be a separate build backend because changing our existing Makefile.in's to support non-recursive execution effectively means rewriting the input files (Makefile.in's). At that point, you've invented a new build backend.)

For people who casually interact with the build system, these two goals will blend together. It is not important for most to know what bucket something falls under.

BuildFaster Action Items

BuildFaster action items are being tracked in Bugzilla using the [BuildFaster:*] whiteboard annotation.

There is no explicit tracking bug for the short-term goal (#1). Instead, we are relying on the whiteboard annotation.

We are tracking the longer-term goal of supporting alternate build backends at bug 774049.

The most important task to help us reach the goals is to make our Makefile.in's data-centric. This means:

  • Makefile.in's must consist of only simple variable assignment.
  • Makefile.in's must not rely on being evaluated to perform variable assignment.

Basically, our build config should be defined by static key-value pairs.

This translates to:

  • Move all rules out of Makefile.in's bug 769378
  • Remove use of $(shell) from Makefile.in's bug 769390
  • Remove filesystem functions from Makefile.in's bug 769407
  • And more, as we identify the need
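
To make "data-centric" a little more concrete, here is a minimal Python sketch that flags lines in a Makefile.in that are not plain variable assignments. The function name, the regular expressions, and the short list of filesystem functions are assumptions made for illustration; this is not an existing tool in the tree and it does not attempt to handle every corner of make syntax.

```python
import re

# Simple assignments such as "CPPSRCS = foo.cpp" or "DIRS += bar".
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*(=|:=|\+=|\?=)")
# A rule header such as "libs:: foo.h": a target followed by ":" or "::",
# but not a ":=" assignment.
RULE = re.compile(r"^[^\s#=]+\s*::?(?!=)")
# GNU make functions that touch the filesystem (illustrative, not exhaustive).
FS_FUNCTIONS = ("wildcard", "realpath")


def find_violations(text):
    """Return (line_number, reason) pairs for lines that are not plain data."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if "$(shell" in stripped:
            violations.append((number, "uses $(shell)"))
        elif any("$(%s " % name in stripped for name in FS_FUNCTIONS):
            violations.append((number, "uses a filesystem function"))
        elif line.startswith("\t"):
            # Tab-indented lines are recipe bodies belonging to a rule.
            violations.append((number, "recipe line"))
        elif ASSIGNMENT.match(stripped):
            continue
        elif RULE.match(stripped):
            violations.append((number, "defines a rule"))
    return violations
```

A file that passes this kind of check is one that could, in principle, be read as static key-value pairs without evaluating it in make.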

While we have these tracking bugs on file, we still don't have bugs filed that track individual Makefile.in's that need to be updated. If you are a contributor, you can help by doing your part to file bugs for your Makefile.in's. If your Makefile.in violates the above rules, please file a bug in the Build Config component of the product it is under (typically Core). See the tree of the above bugs for examples.

Also part of BuildFaster (but not relevant to most) is the task of changing the build system to support multiple backends. Currently, pieces like configure assume the Makefile.in to Makefile conversion is always what is wanted. These parts will be worked on by core contributors to the build system and thus aren't of concern to most.

I will be the first to admit that a lot of the work to purify Makefile.in's to be data-centric will look like a lot of busy work with little immediate gain. The real benefits to this work will manifest down the road. That being said, removing rules from Makefile.in's and implementing things as rules in rules.mk helps ensure that the implementation is proper (rules.mk is coded by make ninjas and thus probably does things right). This can lead to faster build times.

mozbuild

mozbuild is a Python package that provides an API to the build system.

What mozbuild will contain is still up in the air because it hasn't landed in mozilla-central yet. In the code I'm waiting on review to uplift to mozilla-central, mozbuild contains:

  • An API for invoking the build system backend (e.g. launching make). It basically reimplements client.mk because client.mk sucks and needs to be replaced.
  • An API for launching tests easily. This reimplements functionality in testsuite-targets.mk, but in a much cleaner way. Running a single test can now be done with a single Python function call. This may sound boring, but it is very useful. You just import a module and pass a filesystem path to a test file to a function and a test runs. Boom!
  • Module for extracting compiler warnings from build output and storing in a persisted database for post-build retrieval. Compiler warning tracking \o/
  • Module for converting build system output into structured logs. It records things like time spent in different directories, etc. We could use this for tracking build performance regressions. We just need an arewe*yet.com domain...
  • A replacement for .mozconfig's that sucks less (stronger validation, settings for not just build config, convenient Python API, etc).
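
To give a feel for the compiler warning tracking mentioned above, here is a small Python sketch of the general technique. It is not the mozbuild code: the names CompilerWarning, extract_warnings, save_warnings, and load_warnings are hypothetical, the regular expression only understands GCC/Clang-style "path:line:column: warning: ..." lines, and a JSON file stands in for the persisted database.

```python
import json
import re
from collections import namedtuple

# Hypothetical record type for one warning.
CompilerWarning = namedtuple("CompilerWarning", "path line column message flag")

# Matches GCC/Clang-style lines, e.g.
#   foo.cpp:12:5: warning: unused variable 'x' [-Wunused-variable]
WARNING_RE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<column>\d+): warning: "
    r"(?P<message>.*?)(?: \[(?P<flag>-W[^\]]+)\])?$"
)


def extract_warnings(build_output):
    """Scan raw build output and return the warnings found in it."""
    warnings = []
    for text in build_output.splitlines():
        match = WARNING_RE.match(text.strip())
        if match:
            warnings.append(CompilerWarning(
                match.group("path"),
                int(match.group("line")),
                int(match.group("column")),
                match.group("message"),
                match.group("flag"),  # None when no [-W...] suffix is present
            ))
    return warnings


def save_warnings(warnings, path):
    """Persist warnings as JSON so they can be queried after the build."""
    with open(path, "w") as handle:
        json.dump([w._asdict() for w in warnings], handle, indent=2)


def load_warnings(path):
    """Load warnings previously written by save_warnings()."""
    with open(path) as handle:
        return [CompilerWarning(**entry) for entry in json.load(handle)]
```

The point of persisting the results is that the build log can scroll by while the warnings remain available for later inspection.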

And, upcoming features which I haven't yet tried to land in mozilla-central include:

  • API for extracting metadata from Makefile.in's and other frontend files. Want a Python class instance describing the IDLs defined in an individual Makefile.in or across the entire tree? mozbuild can provide that. This functionality will be used to configure alternate build backends.
  • Build backend API which allows for different build backends to be configured (e.g. recursive make, Tup, Ninja, etc). When we support multiple build backends, they'll live in mozbuild.
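
Since none of this has landed, the following Python sketch is only my guess at the shape such an API could take. DirectoryMetadata, read_metadata, BuildBackend, and IdlListingBackend are all hypothetical names, and the parser assumes the Makefile.in already consists of simple variable assignments.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryMetadata:
    """Hypothetical description of what one Makefile.in declares."""
    relative_dir: str
    variables: dict = field(default_factory=dict)

    @property
    def idl_files(self):
        return self.variables.get("XPIDLSRCS", [])


def read_metadata(relative_dir, makefile_text):
    """Parse simple assignments (=, :=, ?=, +=) into a DirectoryMetadata."""
    # Join backslash-continued lines into one logical line.
    joined = makefile_text.replace("\\\n", " ")
    variables = {}
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()
        if "+=" in line:
            name, value = line.split("+=", 1)
            variables.setdefault(name.strip(), []).extend(value.split())
        elif "=" in line:
            name, value = line.split("=", 1)
            variables[name.rstrip(":? \t").strip()] = value.split()
    return DirectoryMetadata(relative_dir, variables)


class BuildBackend:
    """Hypothetical interface: a backend consumes metadata per directory."""

    def consume(self, metadata):
        raise NotImplementedError


class IdlListingBackend(BuildBackend):
    """Toy backend that just collects every IDL file across the tree."""

    def __init__(self):
        self.idl_paths = []

    def consume(self, metadata):
        for name in metadata.idl_files:
            self.idl_paths.append("%s/%s" % (metadata.relative_dir, name))
```

A real backend would write out make, Tup, or Ninja files instead of a list, but the division of labor would be the same: one piece reads the frontend files, and interchangeable pieces decide what to generate from them.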

mozbuild can really be thought of as a clean backend to the build system and related functionality (like running tests). Everything in mozbuild could exist in make files or in .py files littered in build/, config/, etc. But, that would involve maintaining make files and/or not having a cohesive API. I wanted a clean slate that was free from the burdens of the existing world. mozbuild was born.

I concede that there will be non-clear lines of division between mozbuild and other Python packages and/or Mozilla modules. For example, is mozbuild the appropriate location to define an API for taking the existing build configuration and launching a Mochitest? I'm not sure. For now, I'm stuffing functionality inside mozbuild unless there is a clear reason for it to exist elsewhere. If we want to separate (because of module ownership issues, for example), we can do that.

My vision for mozbuild is for it to be the answer to the question "how does the Mozilla build system work?" You should be able to say, "look at the code in python/mozbuild and you will have all the answers."

mozbuild Action Items

The single action item for mozbuild is getting it landed. I have code written that I think is good enough for an initial landing (with obvious shortcomings being addressed in follow-up bugs). It just needs some love from reviewers.

Landing mozbuild is tracked in bug 751795. I initially dropped a monolithic patch. I have since started splitting bits up into bite-sized patches to facilitate faster, smaller reviews. (See the blocking bugs.)

mach

mozbuild is just a Python package - an API. It has no frontend.

Enter mach.

mach is a command-line frontend to mozbuild and beyond.

Currently, mach provides some convenient shortcuts for performing common tasks. e.g. you can run a test by tab-completing to its filename using your shell. It also provides nifty output in supported terminals, including colorizing and a basic progress indicator during building.

You can think of mach as a replacement for client.mk and other make targets. But, mach's purpose doesn't end there. My vision for mach is for it to be the one-stop shop for all your mozilla-central interaction needs.

From a mozilla-central tree, you should be able to type ./mach and do whatever you need to do. This could be building Firefox, running tests, uploading a patch to Bugzilla, etc.

I'm capturing ideas for mach features in bug 774108.

mach Action Items

mach is in the same boat as mozbuild: it's waiting for reviewer love. If you are interested in reviewing it, please let me know.

Once mach lands, I will be looking to the larger community to improve it. I want people to go wild implementing features. I believe that mach will have a significant positive impact on driving contributions to Mozilla because it will make the process much more streamlined and less prone to error. I think it is a no-brainer to check in mach as soon as possible so these wins can be realized.

There is an open question of who will own mach in terms of module ownership. mach isn't really part of anything. Sure, it interacts with the build system, testing code, tools, etc. But, it isn't actually part of any of that. Maybe a new module will be created for it. I'm not familiar with how the modules system works and I would love to chat with someone about options here.

Project Interaction

How are all of these projects related?

BuildFaster is what everyone is working on today. It currently focuses on making the existing recursive make based build backend faster using whatever means necessary. BuildFaster could theoretically evolve to cover other build backends (like non-recursive make). Time will tell what falls under the BuildFaster banner.

mozbuild is immediately focused on providing the functionality to enable mach to land.

mach is about improving the overall developer experience when it comes to contributing to Firefox and other in-tree applications (like Firefox OS). It's related to the build system in that it provides a nice frontend to it. That's the only relationship. mach isn't part of the build system (at least not yet - it may eventually be used to perform actions on buildbot machines).

Down the road, mozbuild will gain lots of new features core to the build system. It will learn how to extract metadata from Makefile.in's which can be used by other build backends. It will define a build backend interface and the various build backends will be implemented in mozbuild. Aspects of the existing build system currently implemented in make files or in various Python files scattered across the tree will be added to mozbuild and exposed with a clean, reusable, and testable API.

Today, BuildFaster is the important project with all the attention. When mach lands, it will (hopefully) gather a lot of attention. But, it will be from a different group (the larger contributor community - not just build system people).

mozbuild today is only needed to support mach. But, once mozbuild lands and mach is on its own, mozbuild's purpose will shift to support BuildFaster and other activities under the BuildSplendid banner.

Conclusion

We have a lot of work ahead of us. Please look at the bugs linked above and help out in any way you can.


One Year at Mozilla

July 18, 2012 at 12:00 AM | categories: Mozilla, Firefox, Sync

It is hard to believe that my first day as a full-time employee of Mozilla was one year ago today! But, here I am. And, I'd like to think the past year is worth reflecting on.

So, what have I been doing for a year? Good question!

Accomplishments

  • First patch and commit to Firefox: bug 673209. Yes, my first patch was to configure.in. It scared me too.
  • Number of non-merge commits: 133
  • Number of merge commits: 49
  • Number of reviews: 31
  • Favorite commit message: 88d02a07d390 Fix bug 725478 because we run a buggy Python released almost 5 years ago; r=tigerblood (it was a bustage follow-up).
  • Biggest mistake: buildbotcustom change to packaging turned every tree red. All of them. Including release. I think I have philor on record saying it is the worst burnage he's ever seen. Mission accomplished!

Major User-Facing Features:

  • Wrote add-on sync for Firefox (with help from many others, especially Dave Townsend and Blair McBride).
  • Principal reviewer for Apps in the Cloud (apps sync) in Firefox. (Are reviewers allowed to take credit?)

Honestly, when I see that list, I think that's all? It doesn't feel like a lot, especially since I work on a major user-facing feature (Firefox Sync). I do contribute to more. But, if you ask me what the most significant impact was, I don't think I can call anything else out as major.

There are certainly a lot of minor things:

  • Actually managed to understand how Firefox Sync works, including the crypto model. I even documented it! I don't think I knew any of that when I started (I basically knew that Firefox Sync was using strong client-side encryption. What that meant, I had no clue.)
  • Filed initial bugs (including some patches) to compile mozilla-central on Windows 8 and MSVC 2011. By the time people started working on Metro development, mozilla-build, configure, and the like already supported building.
  • Configured Android Sync Jenkins Builder. I'm told the Android Sync team loves it! You gotta love all the built-in dashboards. I really wish we had this for mozilla-central.
  • Made xpcshell test harness write xUnit files. Machine readable output of test results, baby!
  • Implemented send tab to device API in Firefox Sync. Too bad we don't have UX to use it :(
  • Implemented testing-only JavaScript modules. So, instead of the hack that is [head] files in tests, you can Cu.import("resource://testing-common/foo.js");
  • Made test package generation quieter. This made build logs about 30% smaller. This means you can get at build results faster.
  • Hacked around with JavaScript code coverage using the JS Debugger API. Failed in the process. But, I learned a lot about the JS Debugger API, so I consider it a win.
  • Rewrote Firefox Sync's record reconciling algorithm. The old one was full of subtle data loss corner cases. The new algorithm is much more robust (we think - even with 3 reviewers there was still some head scratching).
  • Emancipated lots of generic JavaScript code from the Sync tree into a services-common package. This anticipated Apps in the Cloud and notifications' requirement to use this functionality and allowed those projects to get going quicker. At this point, we're effectively running a mini Toolkit. We really need to port some of this upstream.
  • Helped design the next version of the HTTP service to be used by Sync.
  • Implemented a standalone JavaScript implementation of the above. The production server used by Mozilla runs on Python (running the Python server in Mozilla's test environment would be too difficult). The server was also implemented using just the spec. This allowed us to clarify many parts of the spec. The Python functional tests can also be executed against the JS server. This gives us confidence that functionality is equivalent and tests hitting the test/JS server will behave the same as if they are hitting the production Python server.
  • Implemented a standalone JavaScript client for the above service. Previously, all the logic for this was scattered over about a dozen files and was notoriously difficult to audit and update. I also think it is a decent example of good code. Clean. Highly documented. No hacks.
  • Reviewed a skeleton for the notifications service, which will eventually power browser notifications in Firefox.
  • Build system patches to better support Clang. Thankfully, Rafael Espíndola has been our Clang champion as of late and is now ensuring the bleeding edge of Clang does not break the tree. Thanks, Rafael! (I actually think he is in the process of switching our OS X builds to use Clang as I type this!)
  • Worked with security and crypto people to devise the security model behind the next version of Firefox Sync. (Brian Warner and Ben Adida have been doing most of the crypto work. I'm mostly on the sidelines making sure they design a system that can easily interop with Sync.)
  • Helped devise the next version of Sync's server-side storage format. This will make the next version of Sync faster and able to hold more data at lower server cost.
  • Gave lots of love to documentation at https://docs.services.mozilla.com/ (especially the Sync docs). It's almost at the point where others can implement Sync clients without having to ask one of the fewer than 10 people on the planet who actually know.
  • Contributed many small patches to the build system. Mostly general cleanup so far. Although, I have much bigger plans in the works.
  • Many miscellaneous small things. (I get distracted easily.)

Well, that list doesn't seem too shabby. But, a lot of it is smaller ticket items. I don't think there's anything there worth writing home about. Whatever. The future is looking bright for Firefox Sync (Persona integration will make Sync usable by millions more) and new sync backends are coming (including search engine sync). So, I'm fine with not having a longer list of big ticket contributions after only a year.

On Ramping up at Mozilla

I will be the first to admit that I had a difficult time getting into the groove at Mozilla and my first months (dare I say my first half year) were unproductive by my personal standards.

I can't say I wasn't warned. My manager (Mike Connor) told me multiple times that it would happen. I was skeptical, instead rationalizing that my previous track record of learning quickly would hold. He was right. I was wrong. I got fat from the humble pie.

There are a few reasons for this. For starters, I was starting a new job. It's almost impossible to achieve 100% productivity on your first day. Second, I was working with tools I knew little about, including JavaScript and especially including the flavor of JavaScript used inside Firefox. (Short version: the JavaScript within Firefox is awesome in that it implements bleeding-edge features of the language. Unfortunately, the JavaScript inside Firefox/Gecko is contaminated by this blight called XPCOM. It makes things ugly and not very JavaScript-y. XPCOM also makes the learning curve that much harder because now you have to learn multiple technologies at the same time.) It was daunting.

Not helping matters was the fact that Firefox Sync is complicated. Syncing data is in and of itself a difficult problem. Throw in remote servers, an HTTP protocol, encryption, and interaction with systems inside Firefox that are themselves complicated, and you have a hard problem. My first months were literally spent being a thorn in Philipp von Wieter^H^H^H^H^H^H philikon's side, barraging him with an endless stream of questions. I am forever in beer debt to him because of this. When Philipp left the team to work on Boot 2 Gecko and the rest of the Firefox Sync team was retasked to work on Android Sync shortly thereafter, I was on my own little island to manage Firefox Sync. I kind of felt like Tom Hanks' character in Cast Away.

If I have one piece of advice for people starting at Mozilla it's this: be prepared to be humbled by your own ignorance. There is a lot to learn. It's not easy. Don't feel bad when you struggle. The payoff is worth it.

On Life at Mozilla

Despite the hurdles I initially faced ramping up at Mozilla, life at Mozilla is great. This mostly stems from the people, of course.

If you are just looking for technical excellence, I think Mozilla has one of the highest concentrations of any company in the world. Sure, larger companies will have more amazing individuals. But, the number per capita at Mozilla is just staggering. I don't know how many times I've met or talked with someone only to find out later they are considered to be one of the best in his or her respective field. Reading Mozilla's phonebook is like looking at a Who's Who list. For someone like me who loves being a sponge for knowledge, Mozilla is an environment in which I can thrive. Just thinking back at everything I've learned in the past year makes my mind asplode.

On the personal front, the personalities of Mozillians are also top notch. People are generally helpful and supportive. (They need to be for an open source project to thrive.) People recognize good ideas when they hear them and there are comparatively few political battles to be won when enacting change. People themselves are generally interesting and passionate about the world and the work they do. If you are living inside the Mozilla bubble, you may not realize how lucky you have it. I could give specific examples, but I'd be writing all night. Just take my word for it.

If you need something to whet your appetite, just check out the zaniness that is Mozilla Memes. I don't expect you to understand many of the posts unless you are a Mozillian or follower of Reddit and know what internet memes are. But, if you are either, it pretty much sums up a large part of the culture for you. Sometimes I feel like I'm living in one giant, happy meme.

One of the aspects I love most about working at Mozilla is I finally feel that my career interests are aligned with an organization I philosophically agree with. Just read the Mozilla Manifesto. What's not to like?

This is one of the primary factors that convinced me to join Mozilla. After Microsoft acquired the startup where I had my first post-college job (Tellme Networks), I could never hold my head high in Silicon Valley among my friends in the tech sector. Normal people and those outside of Silicon Valley were like, "Microsoft, cool!" But, something just didn't feel right about saying I worked for them. I felt like I was working for the Empire while I really wanted to be working for the Rebel Alliance. I felt like I had to atone for my time at Microsoft. I felt like I needed to cleanse my soul. Mozilla was an obvious answer.

(I don't mean to disparage Microsoft. I actually think the culture has changed since the days when their behavior earned them the reputation that Silicon Valley still holds them accountable for. Still, I would not work for them in Silicon Valley. Anyway, I'm not here to talk about the past.)

Mozilla is an organization I'm proud to work for. I exercise that pride by frequently wearing my awesome Firefox hoodie. Nearly every time I do, random people come up to me and say how they love Firefox and/or what Mozilla does for the world. Every time they do, it brings a smile to my face. This constantly reinforces what I know to be true: that I'm working for a great organization.

Future at Mozilla

I'm already looking forward to the next year at Mozilla. It is already shaping up to be much, much more productive than my first.

On the roadmap, all of my hacking about with the build system is about to pay dividends. Ever since my first day at Mozilla I have been frustrated with the build system and the developer experience one must go through to contribute to Firefox. After many months of casual (mostly evenings and weekends) experimentation, my work is about to pay off.

I have successfully formulated a plan of attack and helped convince others this is what we need to do. We have since committed to the fundamental components of that plan and are tracking its progress. (I don't mean to take sole or even primary responsibility for this as the credit resides with a number of people. But, I would like to think that the dozens of times I championed aspects of this plan in IRC and in hallway chats before I was the first person to articulate it in a post helped lay the groundwork for the eventual acceptance of this project.) Once we see progress on this project, great things will come from it. I promise.

My work towards making the build system faster had an unintended consequence: the creation of a new tool that serves as a frontend to the build system. One day, I took a step backwards and realized that the potential for such a tool is much greater than simply interacting with the build system. So, I extracted that work from my build system hacking and polished it up a bit. It is now one review away from landing. When it does, thousands of Firefox developers will have a much better experience when developing Firefox. And, my hope is for many more features to follow to make it even more awesome, especially for first-time contributors. I believe this is important to help advance Mozilla's Mission.

Improving the developer experience of Firefox is exciting and it will likely make a lot of people really happy. But, it's neither the most exciting nor most important project I'll contribute to in the upcoming year. The most exciting and important project for me will be refactoring Firefox Sync to make it faster, more robust, sync more data, and, most importantly, usable by more people.

Firefox Sync stands out from similar products in that it keeps your data safe. Really safe. I blogged about this previously. But, I intentionally kept the tone of that post neutral and factual. The truth is that the security model of Firefox Sync makes it look like nearly all other products aren't even trying. I take immense pride in working on a data-sharing feature that makes users' lives better without undermining security. Firefox Sync stands in rare company in this regard.

Unfortunately, in our zeal for the best security possible, we designed a product that isn't usable by the majority of people because it is too complicated to set up and is prone to losing your data. In the end, this doesn't really serve the overall Firefox user base.

We've been hard at work devising the next version of Firefox Sync, which will retain the optimum security and privacy settings of the existing product while extending usability at nearly-comparable security and offering data recovery to the vast majority of our users. This is huge.

Yeah, I'm pretty damn excited about my next year at Mozilla.


Improving Mozilla's Build System

June 25, 2012 at 10:15 AM | categories: make, Mozilla, pymake, build system

Makefiles. Love them or hate them, they are ubiquitous. For projects like Firefox (which has on the order of 1500 Makefiles), Makefiles aren't going away any time soon. This is despite the fact that newer - arguably better - build systems (such as CMake and GYP) exist.

Many have decried the numerous ways in which GNU make and make files suck. I won't pile on here. Instead, please read What's Wrong with GNU make for a great overview.

Despite the many flaws with make, I would argue that the single most significant one is the slow speed often associated with it. People lauding make replacements (such as Ninja) almost always emphasize this one aspect above all others. (OK, Ninja isn't exactly a make replacement, but the execution aspects are comparable.)

Why is make slow? It is almost always due to the use of recursive make. This is make executing itself recursively, usually as it traverses a directory tree. This is how Mozilla's build system works. You start in one directory. Then, for each directory defined in the DIRS variable, make invokes make in that directory. In Mozilla's case, we amplify this by performing multiple passes over the directory tree - an export pass that installs header files, etc into a common directory, a libs pass that does most of the compilation, and a tools pass which performs other miscellaneous actions. Each of these is a recursive make iteration. As we say at Mozilla: clown shoes.

The deficiencies of recursive make and a workaround are called out in the classic paper Recursive Make Considered Harmful. Despite its age, it is still relevant - GNU make hasn't changed that much, after all. The thesis of the paper can be boiled down to the fact that make assembles a directed acyclic graph (DAG) and then executes it. Using recursive make, you are assembling N DAGs instead of 1. Each DAG must be executed separately. This adds execution overhead. On top of that, since the DAGs are separate, you are either a) missing information from each DAG (it doesn't know about other DAGs) thus creating an incomplete and possibly error-prone dependency graph or b) duplicating information in multiple DAGs, creating more work for make in aggregate.

The solution for this is non-recursive make. That's just a fancy way of saying create 1 monolithic DAG and execute it once (as opposed to N separate DAGs with at least N invocations of make). Modern build systems like Ninja do this. While these build systems have other properties that contribute to faster execution times, in my opinion the single feature that has the greatest impact is consolidating the build dependencies into a single graph. This minimizes the aggregate amount of work the driver needs to perform and all but eliminates redundant operations.
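
The difference between N DAGs and 1 DAG can be sketched in a few lines of Python. This is only an illustration of the idea, not how make works internally: merge_graphs and build_order are hypothetical helpers, the file names are made up, and graphlib requires Python 3.9 or newer. Each graph maps a target to the set of things it depends on.

```python
from graphlib import TopologicalSorter


def merge_graphs(graphs):
    """Combine per-directory dependency graphs into one graph."""
    merged = {}
    for graph in graphs:
        for target, prerequisites in graph.items():
            merged.setdefault(target, set()).update(prerequisites)
    return merged


def build_order(graph):
    """Return one valid order in which every prerequisite comes first."""
    return list(TopologicalSorter(graph).static_order())


# Directory "a" knows how its header gets installed into dist/include.
graph_a = {
    "dist/include/a.h": {"a/a.h"},
    "a/a.o": {"a/a.cpp", "dist/include/a.h"},
}
# Directory "b" uses that header, but on its own it has no idea where
# dist/include/a.h comes from. That is the missing information problem.
graph_b = {
    "b/b.o": {"b/b.cpp", "dist/include/a.h"},
}

single_dag = merge_graphs([graph_a, graph_b])
order = build_order(single_dag)
```

With the merged graph, the ordering between directories falls out of the dependencies themselves, so there is no need for separate passes over the tree just to get headers installed before anything compiles.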

Non-recursive make is typically achieved one of 2 ways:

  1. Create a single monolithic make file
  2. Concatenate all of your make files together (using the built-in include directive)

Either way, you produce a single DAG and all is well. More on this later.

Transitioning to a Single DAG

A big problem with non-recursive make is transitioning to it. For projects like Firefox with its 1500 Makefiles, each existing in its own little universe, this is a Herculean effort. There is a reason why many (including Mozilla) haven't done it: it's just too hard.

The problem of slow build times with Firefox is important to me because I suffer from it almost every day. So, I got to thinking of creative ways to solve this problem. How can we work towards a monolithic DAG? How can we do this incrementally so as to not cause too much disruption and so the effort isn't near impossible?

I've come up with a plan that I believe Mozilla should pursue. The plan consists of many different parts, each of which is described in a separate section below. Each part is presented in roughly the order it needs to be addressed in.

I want to emphasize that this is my personal opinion and what I'm about to propose is merely that: a proposal. For all I know, people will think I'm smoking crack and none of these ideas will be pursued.

Let's begin.

Part 1: No Rules

Make files consist of rules. These are, well, rules that tell make how to evaluate a target. A rule will say something like "to turn hello.cpp into hello.o, call the C++ compiler with these arguments."

People naturally take advantage of the fact that the body of the rule (the recipe in make parlance) is often similar for many targets, so they use wildcard rules or specify multiple prerequisites and/or targets for a rule.

Wildcard rules are great. Their problem is that the recipe is often useful across many make files. The solution here is to put the rule definition (and thus the recipe) in a common file and use make's include directive to bring those rules into the current make file.

This is the general approach Mozilla takes. All of the generic rules are defined in the (oh-it-hurts-my-eyes-to-look-at-so-much-make-syntax) rules.mk file. Individual Makefiles in the tree simply define specifically-named variables such as CPPSRCS or XPIDLSRCS and then include rules.mk. When make executes, rules.mk transforms these variables into rules which are automagically inserted into the DAG.
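The variables-become-rules idea can be sketched in a few lines of Python. This is a toy, not how rules.mk actually works (the real thing is make syntax and far more involved); the rules_from_variables function and the recipe string are assumptions made up for illustration.

```python
def rules_from_variables(variables):
    """Turn declarative variables into (target, prerequisite, recipe) rules.

    Only CPPSRCS is handled here; a real shared rules file knows about
    many more variables.
    """
    rules = []
    for source in variables.get("CPPSRCS", []):
        target = source.rsplit(".", 1)[0] + ".o"
        recipe = f"$(CXX) -c -o {target} {source}"
        rules.append((target, source, recipe))
    return rules


# The "Makefile" is nothing but data; the shared logic supplies the rules.
print(rules_from_variables({"CPPSRCS": ["foo.cpp", "bar.cpp"]}))
```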

From a user perspective, this is splendid. You simply define some variables and magic ensues. Instead of hairy looking make files, you literally have a file with simple variable assignments. If all goes to plan, you don't need to look at these rules definitions. Instead, you leave that up to professional dragon trainers, such as Ted Mielczarek, Joey Armstrong, and Kyle Huey.

A large problem is that many make files define custom rules - things not using the magic from rules.mk. What's worse is that these custom rules are often cargo culted around. They work, yes, but there is redundancy. The installation of mochitest files is a perfect example of this. Bug 370750 tracks establishing a rule for this absurd repetition of make logic in Mozilla's tree. Even in the case of mochitest rules where the recipe is simple (it's just calling $(INSTALL)), the lack of a unified rule hurts us. If we wanted to change the directory the mochitest files were installed to, this would require editing scores of files. Not cool.

Anyway, the first step towards improving the Mozilla build system (and any make file based build system really) is what I'm calling the no rules initiative.

We should strive for no explicit rules in non-shared make files (things outside of config/ and build/ in the Mozilla source tree - basically all the files named Makefile.in or Makefile). Instead, individual make files should define variables that cause rules from a shared make file (like rules.mk) to be applied.

In Mozilla's case, this will be a lot of manual effort and it will seem like there is no immediate gain. The benefits will be explained in subsequent sections.

Part 2: Eliminate Make File Content Not Related to Building

Mozilla's make files are riddled with content that isn't directly related to the action of building Mozilla applications. Not only does this contribute to the overhead of invoking make (these bits need to be evaluated when parsing, added to the DAG, etc, adding cost), but they also make the make files harder to read and thus edit.

testsuite-targets.mk is a good example of this. While it isn't included in every make file (just the top-level one), the code isn't related to building at all! Instead, it is essentially proxy code that maps make targets to commands. It is supposed to be convenient: type |make xpcshell-tests| and some tests run. OK, fine. I'll give you that. The main problem is this make file is just a glorified shell script. Make is great at constructing a dependency graph. As a DSL for shell scripts, I'd rather just write a script. Therefore, I would argue code like this belongs elsewhere - not in make files.

Anyway, a problem for Mozilla is that our make files are riddled with pieces of content that aren't directly related to building. And, build system maintainers pay the price. Every time you see some part of a make file that doesn't appear to be directly related to the act of building, your mind is like, "wut ist das?" (That's High German for WTF?) You start asking questions. What's it for? Who uses it? Is it really needed? What happens if it breaks? How will I know? These questions all need to be answered and that adds overhead to editing these files. Oftentimes these little one-off gems are undocumented and the questions are nearly impossible to answer easily. The wise course of action is usually to preserve existing behavior. Nobody wants to melt someone else's precious snowflake, after all. Thus, cargo cult programming prevails.

The solution I propose is along the same vein as the no rules initiative - you can actually think of it as a specifically-tailored subset: limit the make files to what's necessary for building. If it doesn't relate to the act of building the application - if it is an island in the DAG - remove it.

But where do you move it to? Good question. We don't have an obvious location today. But, in bug 751795 I am creating what is essentially a frontend for the build system. That is where it should go. (I will publish more information about that project on this blog once it has landed.)

Part 3: Make File Style Conventions and Enforced Review Requirements

My first two points revolved around what is effectively a make file style convention. If we are serious about making that convention stick, we need to enforce it better.

This can be done in two ways.

First, we establish a review policy that any changes to make files must be signed off by a build system peer. Maybe we carve out an exception for simple cases, such as adding a file to an XPIDLSRCS variable. I don't know. We certainly don't allow people to add new make files or even targets without explicit sign-off. I think this is how it is supposed to be today. If it is, it isn't enforced very well.

To aid in enforcing the style convention, we arm people with a tool that checks the style so they can fix things before it lands on a build peer's plate. This is actually pretty trivial to implement (I even have code that does this). The hard part is coming to consensus on the rules. If there is a will, I can make this happen easily.
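For a flavor of what such a checker could look like, here is a deliberately naive Python sketch that flags explicit rules in a make file. It is not the code mentioned above: the regular expressions and the find_explicit_rules name are assumptions, and a line-based heuristic like this will misfire on corner cases that a real parser (such as pymake's) would handle.

```python
import re

# Heuristic: a variable assignment such as "FOO := bar" or "FOO += bar".
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?[\w.-]+\s*(?::=|\+=|\?=|=)")
# Heuristic: an unindented, non-comment line with a ':' that is not part of ':='.
RULE = re.compile(r"^[^\s#][^=]*:(?!=)")


def find_explicit_rules(makefile_text):
    """Return (line number, line) pairs that look like explicit rules."""
    violations = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        if ASSIGNMENT.match(line):
            continue
        if RULE.match(line):
            violations.append((number, line.strip()))
    return violations


sample = (
    "CPPSRCS := foo.cpp\n"
    "include $(topsrcdir)/config/rules.mk\n"
    "libs:: $(FILES)\n"
    "\t$(INSTALL) $(FILES) $(DEST)\n"
)
print(find_explicit_rules(sample))  # [(3, 'libs:: $(FILES)')]
```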

Once you have the tool to check for convention/style violations, your checkin policy is amended to include: no make file changes can be checked in unless the file passes the style checker. This is sure to draw ire from some. But, the alternative is having a cat and mouse game between the build system maintainers making the build system suck less and other people undoing their work with new features, etc.

Anyway, these rules wouldn't be absolute - there could always be exceptions, of course. If we're serious about making the build system better, we need it to be more consistent and have less variance. I'm open to other suggestions to accomplish this.

Part 4: Extracting Data from Make Files

Once we have make files which consist of simple variable assignments (not rules), it is possible to take the values of these variables, and extract them from the make files to some higher-level system. For example, we could scour all the make files for the CPPSRCS variable and assemble a unified list of all C++ source files defined by that variable across the entire build system. Then, we could do some really interesting things.

What kind of things you ask? Well, one idea is you could write out a monolithic make file that explicitly listed every target from the extracted data. No recursive make necessary here! Another idea would be to produce Visual Studio project files.

Think I'm crazy and this is impossible? Feast your eyes. That make file was generated by parsing the existing Firefox make files, extracting statically-defined data, then transforming that data structure back into a single make file. I also have code somewhere that produces Visual Studio projects.

Now, all of this is just a proof-of-concept. It works just well enough for me to prove it is feasible. (The linked example is 7 months old and won't work on the current tree.) Although, at one time, I did have a functioning transformation of essentially the make export stage. All include files and IDL generation was in a single make file. It ran in a fraction of the time of recursive make. A no-op execution of the monolithic make file took about 0.5 seconds and - gasp - actually did no work (unlike the no-op builds in the Mozilla tree today which actually run many commands, even though nothing changed).

My point is that by having your make files - your build system - be statically defined, you can extract and combine pieces of information. Then, with relative ease, you can transform it to something else. Instead of make files/build systems being a DSL for shell script execution, they are data structures. And, you can do a lot more with data structures (including generating shell scripts). Representing a build system as a unified, constant data structure is what I consider to be the holy grail of build systems. If you can do this, all you need to do is transform a data structure to a format some other build tool can read and you are set. If you do a good job designing your data structure, what you've essentially built is an intermediate representation (IR) that can be compiled to different build tools (like make, Ninja, and Visual Studio).

(The previous paragraph is important. You may want to read it again.)

Getting back on point, our goal is to assemble aggregate data for Mozilla's build system so we can construct a monolithic DAG. We can work towards this by extracting variable values from our make files.
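As a rough illustration of the aggregation step, assume the variable values have somehow already been pulled out of each make file into a plain dictionary (how to do that safely is the subject of the rest of this part). The directory names and the collect_sources helper below are hypothetical.

```python
import posixpath


def collect_sources(makefiles, variable="CPPSRCS"):
    """Build one tree-wide list of source paths from per-directory values.

    `makefiles` maps a directory to the variables extracted from its make
    file; values are whitespace-separated strings, as in make.
    """
    sources = []
    for directory in sorted(makefiles):
        for name in makefiles[directory].get(variable, "").split():
            sources.append(posixpath.join(directory, name))
    return sources


extracted = {
    "xpcom/base": {"CPPSRCS": "nsFoo.cpp nsBar.cpp"},
    "dom/src": {"CPPSRCS": "Baz.cpp"},
}
print(collect_sources(extracted))
```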

Unfortunately, there are technical problems with naive data extraction from make files. Fortunately, I have solutions.

The first problem is getting at individual variable values in make files. Simple parsing is ruled out, as make files can be more complex than you realize (all those pesky built-in functions). You don't want to burden people with adding targets to print variable values and then executing make just to print variables. This would be violating parts 1 and 2 from above! So, you somehow need to inject yourself into a make execution context.

Fortunately, we have pymake. Pymake is a mostly feature complete implementation of GNU make written in Python. You should be able to swap out GNU make for pymake and things just work. If they don't, there is a bug in pymake.

pymake is useful because it provides an API for manipulating make and make files. Contrast this with GNU make, which is essentially a black box: make file goes in, process invocations come out. There's no really good way to inspect things. The closest is the -p argument, which dumps some metadata of the parsed file. But, this would require you to parse that output. As they say, now you have two problems.

Using pymake, it's possible to get at the raw parser output and to poke around the assembled data structures before the make file is evaluated. This opens up some interesting possibilities.

With pymake's API, you can query for the value of a variable, such as XPIDLSRCS. So, you just need to feed every make file into pymake, query for the value of a variable, then do cool things with the extracted data.

Not so fast.

The problem with simple extraction of variables from make files is that there is no guarantee of idempotence. You see, make files aren't static structures. You don't parse a make file into a read-only structure, build the DAG, and go. Oh no, because of how make files work, you have to constantly re-evaluate things, possibly modifying the DAG in the process.

When you obtain the value of a variable in make, you need to evaluate it. In make parlance, this is called expansion.

Expansion is easy when variables have simple string content. e.g.

CPPSRCS = foo.cpp bar.cpp

It gets more complicated when they have references to other variables:

FILES = foo.cpp bar.cpp
CPPSRCS = $(FILES) other.cpp

Here the expansion of CPPSRCS must descend into FILES and expand that. If FILES contained a reference, make would descend into that. And so on.
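The descent can be modeled in a few lines of Python. This sketch only understands plain $(NAME) references (no functions, no substitution references) and has no protection against a variable that refers to itself, which make itself reports as an error. The expand helper is an illustration, not pymake's API.

```python
import re

# Matches a plain variable reference such as $(FILES).
REFERENCE = re.compile(r"\$\((\w+)\)")


def expand(name, variables):
    """Recursively expand $(NAME) references inside a variable's value."""
    value = variables.get(name, "")  # undefined variables expand to nothing
    return REFERENCE.sub(lambda match: expand(match.group(1), variables), value)


variables = {
    "FILES": "foo.cpp bar.cpp",
    "CPPSRCS": "$(FILES) other.cpp",
}
print(expand("CPPSRCS", variables))  # foo.cpp bar.cpp other.cpp
```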

The problem of non-guaranteed idempotence is introduced when variable expansion interfaces with something that is non-deterministic, such as the file system. This almost always involves a call to one of make's built-in functions. For example:

CPPSRCS = $(wildcard *.cpp)
SOURCES = $(shell find . -type f -name '*.cpp')

Here, the expanded value of CPPSRCS depends on the state of the filesystem at the time of expansion. This is obviously not deterministic. Since you can't guarantee the results of that expansion, doing something with the evaluated value (such as generating a new make file) is dangerous.

It gets worse.

The expansion of a variable can occur multiple times during the execution of a make file due to deferred evaluation. When you use the = operator in make files, the variable isn't actually expanded until it is accessed (make just stores a reference to the string literal on the right side of the assignment operator). Furthermore, the expansion occurs every time the variable is accessed. Seriously.

The := operator, however, expands the variable once - at assignment time. Instead of storing a string reference, make evaluates that string immediately and assigns the result to the variable.

The distinction is important and can have significant implications. Use of = can lead to degraded performance via redundant work during multiple expansions (e.g. file system access or shell invocation). It can also cause the value of a variable to change during the course of execution from changes to systems not directly under the make file's control (such as the file system). For these reasons, I recommend using the immediate assignment operator (:=) instead of the deferred assignment operator (=) unless you absolutely need deferred assignment. This is because immediate assignment approximates what most think assignment should be. Unfortunately, since = is the only assignment operator in the overwhelming majority of popular programming languages, most people don't even know that other assignment operators exist or that the deferred assignment operator comes with baggage. Now you do. I hope you use it properly.
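If the difference still feels abstract, this small Python model may help. It is an analogy, not make: the Variables class is made up, and a list stands in for the file system. The deferred variable re-runs its right-hand side on every access, while the immediate one captured its value once.

```python
class Variables:
    """Toy model of make's two assignment flavors."""

    def __init__(self):
        self._values = {}

    def set_deferred(self, name, compute):
        # Like '=': store the unevaluated right-hand side.
        self._values[name] = compute

    def set_immediate(self, name, compute):
        # Like ':=': evaluate right now and store only the result.
        result = compute()
        self._values[name] = lambda: result

    def get(self, name):
        return self._values[name]()


files_on_disk = ["a.cpp"]  # stand-in for the file system


def list_sources():
    # Stand-in for something like $(wildcard *.cpp).
    return " ".join(sorted(files_on_disk))


v = Variables()
v.set_deferred("DEFERRED", list_sources)
v.set_immediate("IMMEDIATE", list_sources)

files_on_disk.append("b.cpp")  # the "file system" changes mid-run

print(v.get("DEFERRED"))   # a.cpp b.cpp
print(v.get("IMMEDIATE"))  # a.cpp
```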

Anyway, if a variable's value changes while a make file is executing, what is the appropriate value to extract for external uses? In my opinion, the safe answer is there is none: don't extract the value of that variable. If you do, you are asking for trouble.

It sounds like I just tore a giant hole in my idea to extract simple data from make files since I just said that the value may not be deterministic. Correct. But, the key word here is may. Again, pymake comes to the rescue.

Using pymake, I was able to implement a prover that determines whether a variable is deterministic and thus idempotent. Basically, it examines the statement list generated by pymake's parser. It traces the assignment of a variable through the course of a statement list. If the variable itself, any variable referenced inside of it, or any variable that impacts the assignment (such as assignment inside a condition block) is tainted by a non-deterministic evaluation (notably file system querying and shell calls), that variable is marked as non-deterministic. Using this prover, we can identify which variables are guaranteed to be idempotent as long as the make file doesn't change. If you want to learn more, see the code.
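The taint idea can be shown with a much simpler model than the real prover. The sketch below works on a flat dictionary of unexpanded variable values rather than pymake's statement list, ignores conditionals entirely, and treats undefined variables as deterministic (which is not strictly true, since they could come from the environment). The is_deterministic function and its regular expressions are assumptions for illustration only.

```python
import re

# Matches a plain variable reference such as $(FILES).
REFERENCE = re.compile(r"\$\((\w+)\)")
# Calls that consult the file system or a shell taint the value.
NONDETERMINISTIC_CALL = re.compile(r"\$\((?:wildcard|shell)\s")


def is_deterministic(name, variables, _path=frozenset()):
    """Return True if expanding `name` never touches wildcard or shell."""
    if name in _path:
        return False  # self-referential definition: refuse to vouch for it
    value = variables.get(name, "")
    if NONDETERMINISTIC_CALL.search(value):
        return False
    path = _path | {name}
    # A variable is only as trustworthy as everything it references.
    return all(
        is_deterministic(reference, variables, path)
        for reference in REFERENCE.findall(value)
    )


variables = {
    "FILES": "$(wildcard *.cpp)",
    "CPPSRCS": "$(FILES) other.cpp",
    "XPIDLSRCS": "nsIFoo.idl",
}
print(is_deterministic("XPIDLSRCS", variables))  # True
print(is_deterministic("CPPSRCS", variables))    # False
```

CPPSRCS looks harmless on its own line, but it inherits the taint from FILES. That transitive property is the important part.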

Now that we have a way of proving that a variable in a make file is deterministic as long as the source make file doesn't change, we can extract data from make files with confidence, without having to worry about side-effects during execution of the make file. Essentially, the only thing we need to monitor is the original make file and any make files included by it. As long as they don't change, our static analysis holds.

So, I've solved the technical problem of extracting data from make files. This allows us to emancipate build system data into a constant data structure. As mentioned above, this opens up a number of possibilities.

It's worth noting that we can combine the prover with the style enforcer from part 3 to help ensure Mozilla's make files are statically defined. Over time, the make files will not only look like simple variable assignments (from part 1), but will also become static data structures. This will enable the Mozilla build system to be represented as a single, constant data structure. This opens up the transformation possibilities described above. It also allows for more radical changes, such as replacing the make files with something else. More on that later.

Part 5: Rewriting Make Files

Extracting data from make files to produce new build system data (possibly a unified make file) introduces a new problem: redundancy and/or fragmentation. You now have the original definition sitting around in the original make file and an optimized version in a derived file. What happens when the original file (still containing the original values) is evaluated? Do you want the targets (now defined elsewhere from the extracted data) to still work when executed from the original file? (Probably not.)

There is also a concern for how to handle the partially converted build system scenario. In theory, porting the build system to the new world order would go something like this:

  1. Identify a variable/feature you want to extract from the make files
  2. Write a routine to convert extracted data into an optimized builder (maybe it just prints out a new make file)
  3. For make files where that variable is defined but couldn't be safely extracted (because it wasn't deterministic), continue to execute the old, inherited rule from rules.mk (as we do today).

This acknowledges the reality that any particular transition of a variable to use an optimized build routine from extracted values will likely be difficult to do atomically. The flag day will be held up by stragglers. It would be much more convenient to accomplish the transition incrementally. Files that are ready see the benefit today, not only after everyone else is ready.

My solution to this problem is make file rewriting. For variables that are deterministic and have their functionality replaced by some other system, we simply strip out references to these variables from the make files. No variable. No inherited rule. No redundancy. No extra burden. Of course, we're not altering the original make file. Instead, we take the input make file (Makefile.in in the Mozilla tree), do static analysis, remove variables handled elsewhere, then write out a new make file.

Unfortunately, this invented a new problem: rewriting make files. As far as I can tell, nobody has done this before. At least not to the degree or robustness that I have (previous solutions are effectively sed + grep).

Using pymake, I was able to create an API sitting between the parser and evaluator which allows high-level manipulation of the make file data structure. Want to delete a variable? Go for it. Want to delete a target? Sure! It's really the missing API from pymake (or any make implementation for that matter). I stopped short of writing a fully-featured API (things like adding rules, etc). Someday I would love to fill that in because then you could create new/anonymous make files/DAGs using just API calls. You would thus have a generic API for creating and evaluating a DAG using make's semantics. This is potentially very useful. For example, the optimized make files generated from static data extracted from other make files I've been talking about could be produced with this API. In my opinion, this would be much cleaner than printing make directives by hand (which is how I'm doing it today).

This new make file manipulation API was relatively easy to write. The hard part was actually formatting the statement list back into a make file that was both valid and functionally equivalent to the original. (There are actually a lot of dark corners involving escaping, etc.) But, I eventually slayed this dragon. And, the best part is I can prove it is correct by feeding the output into the pymake parser and verifying the two statement lists are identical! Furthermore, after the first iteration (which doesn't preserve formatting of the original make file because a) it doesn't need to and b) it would be a lot of effort) I can even compare the generated string content of the make files as an added sanity check.

If you are interested, the code for this lives alongside the deterministic proving code linked to in the previous section. I would like to land it as part of pymake someday. We'll see how receptive the pymake maintainers are to my crazy ideas.

Anyway, we can now strip extracted variables handled by more efficient build mechanisms out of generated make files. That's cool. It gives us an incremental transition path for Mozilla's build system that bridges new and old.

But wait - there's more!

That deterministic prover I wrote for data extraction: I can actually use it for rewriting make files!

Make files support conditional directives such as ifeq, ifneq, ifdef, else, etc. For each conditional directive, I can query the deterministic prover and ask whether an expansion of that directive is deterministic. If it is, I evaluate the conditional directive and determine which branch to take. And, I may have invented the first make file optimizer!

I simply replace the condition block (and all the branches therein) with the statements constituting the branch that is guaranteed to be executed.

As a contrived example:

DO_FOO := 1

ifeq ($(DO_FOO), 1)
foo.o: foo.cpp
    clang++ -o foo.o foo.cpp
endif

We can prove the ifeq branch will always be taken. So, using the deterministic prover, we can rewrite this as:

DO_FOO := 1

foo.o: foo.cpp
    clang++ -o foo.o foo.cpp

Actually, we can now see that the value of DO_FOO is not referenced and is free from side-effects (like a shell call), and can eliminate it!

foo.o: foo.cpp
    clang++ -o foo.o foo.cpp

Cool!
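Here is roughly what that folding step looks like on a toy statement list. The tuple format below is invented for this sketch and is much simpler than pymake's real statement objects; `known` stands in for the set of variables the prover has vouched for, along with their values. Eliminating the now-unused DO_FOO assignment would be a separate pass and is not shown.

```python
def fold_conditionals(statements, known):
    """Replace ifeq blocks whose outcome is provable with the taken branch.

    Statements are tuples: ("assign", name, value),
    ("rule", target, prerequisite, recipe), or
    ("ifeq", name, expected, [body statements]).
    """
    folded = []
    for statement in statements:
        if statement[0] != "ifeq":
            folded.append(statement)
            continue
        _, name, expected, body = statement
        body = fold_conditionals(body, known)
        if name not in known:
            # Can't prove anything: keep the conditional as-is.
            folded.append(("ifeq", name, expected, body))
        elif known[name] == expected:
            # Always taken: hoist the body out of the block.
            folded.extend(body)
        # Otherwise the branch can never be taken, so it is dropped.
    return folded


rule = ("rule", "foo.o", "foo.cpp", "clang++ -o foo.o foo.cpp")
original = [
    ("assign", "DO_FOO", "1"),
    ("ifeq", "DO_FOO", "1", [rule]),
]
print(fold_conditionals(original, known={"DO_FOO": "1"}))
```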

The practical implication of this is that it is shifting the burden of make file evaluation from once every time you run make to some time before. Put another way, we are reducing the amount of work make performs when evaluating a make file. As long as the original file and any files included by it don't change frequently, the cost of the static analysis and rewriting is repaid by simpler make files and the lower run-time cost they incur. This should translate to speedier make invocations. (Although, I have no data to prove this.)

Another nice side-effect is that the rewritten and reduced make file is easy to read. You know exactly what is going to happen without having to perform the evaluation of (constant) conditional statements in your head (which likely involves consulting other files, like autoconf.mk in the case of Mozilla's make files). My subjective feeling is that this makes generated make files much easier to understand.

So, not only does rewriting make files allow us to incrementally transition to a build system where work is offloaded to a more optimized execution model, but it also makes the evaluated make files smaller, easier to understand, and hopefully faster. As Charlie Sheen would say: winning!

Review Through Part 5

At this point, I've laid out effectively phase 1 of making Mozilla's build system suck much less. We now have make files that are:

  • Reusing rule logic from rules.mk to do everything (specially-named variables cause magic to ensue). No cargo culted rules.
  • Devoid of cruft from targets not related to building the application.
  • Variables defined such that they are deterministic (essentially no shell and filesystem calls in make files)
  • A review system in place to ensure we don't deviate from the above
  • A higher-level tool to extract specific variables from make files which also produces an optimized, non-recursive make file to evaluate those targets. Ideally, this encompasses all rules inherited from rules.mk and the need for recursion and to evaluate individual make files is eliminated.

If we wanted, we could stop here. This is about as far as we can take the existing build system while maintaining some resemblance to the current architecture.

But, I'm not done.

Part 6: Transition Away from Make Files

If we've implemented the previous parts of this post, it is only natural for us to shift away from make files.

If all we have in our make files are variable assignments, why are we using make files to define a static document? There are much better formats for static data representation. YAML and JSON come to mind.

If we transition the build system definition to something actually declarative - not just something we shoehorn into being declarative (because that is how you make it fast and maintainable) - that cuts out a lot of the hacky-feeling middleware described above (static analysis, data extraction, and rewriting). It makes parsing simpler (no need for pymake). It also takes away a foot gun (non-deterministic make statements). Still not convinced? We could actually properly represent an array (make doesn't have arrays, just strings split on whitespace)! OK, just kidding - the array thing isn't that big of a deal.

Anyway, the important characteristic of the build system is achieving that holy grail where the entire thing can be represented as a single generic (and preferably constant) data structure. As long as you can arrive at that (a state that can be transformed into something else), it doesn't matter the road you take. But, some roads are certainly cleaner than others.

At the point where you are transitioning away from make files, I would argue the correct thing to do is convert the build definition to files understood by another build tool. Or, we should at least use a format that is easily loaded into another tool. If we don't, we've just invented a custom one-off build system. A project the size of Mozilla could probably justify it. But, I think the right thing to do is find an open source solution that fits the bill and run with that.

While I'm far from an expert on it, I'll throw out GYP as a candidate. GYP defines the build system in JSON files (which can actually be directly evaluated in a Python interpreter). GYP consumes all of the individual .gyp/JSON files and assembles an aggregate representation of the entire build system. You then feed that representation into a generator, which produces make files, Ninja files, or even Visual Studio or Xcode project files. Sound familiar? It's essentially my holy grail where you transform a data structure into something consumed by a specific tool. I approve.
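The data-structure-in, build-files-out idea fits in a few lines of Python. To be clear, this is not GYP's API or file format: the BUILD dictionary, to_make, and to_ninja are all made up to show one description feeding two generators.

```python
# A hypothetical, tool-agnostic description of what to build.
BUILD = {"compiler": "c++", "sources": ["foo.cpp", "bar.cpp"]}


def object_for(source):
    return source.rsplit(".", 1)[0] + ".o"


def to_make(build):
    """Emit one explicit make rule per source file."""
    lines = []
    for source in build["sources"]:
        obj = object_for(source)
        lines.append(f"{obj}: {source}")
        lines.append(f"\t{build['compiler']} -c {source} -o {obj}")
    return "\n".join(lines) + "\n"


def to_ninja(build):
    """Emit a Ninja rule plus one build statement per source file."""
    lines = ["rule cxx", f"  command = {build['compiler']} -c $in -o $out"]
    for source in build["sources"]:
        lines.append(f"build {object_for(source)}: cxx {source}")
    return "\n".join(lines) + "\n"


print(to_make(BUILD))
print(to_ninja(BUILD))
```

Adding a Visual Studio or Xcode backend would mean writing another function of the same shape; the description itself would not change.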

Now, GYP is not perfect. No build system is perfect. I will say that GYP is good enough to build Chromium. And, if it can handle Chromium, surely it can handle Firefox.

Next Steps for Mozilla

I'm not sure what the next steps are. I've been trying to convince people for a while that a data-centric declarative build system is optimal and we should be working towards it. Specifically, we should be working towards a monolithic DAG, not separate DAGs in separate make files. I think people generally agree with me here. We obviously have a build system today that is very declarative (with variables turning into rules). And, I know people like Joey Armstrong understand why recursive make is so slow and are working on ways to switch away from it.

I think the main problem is people are overwhelmed by what a transition plan would look like. There is a lot of crap in our make files and fixing it all in one go is next to impossible. Nobody wants to do it this way. And, I think the sheer scope of the overall problem along with a constant stream of new features (Fennec make files, ARM support, Windows 8, etc) has relegated people to merely chipping away at the fringe.

Supplementing that, I think there has been doubt over some of what I've proposed, specifically around the scarier-sounding bits like performing static analysis and rewriting of make files. It scared me when I first realized it was possible, too. But, it's possible. It's real. I have code that does it. I'll admit the code is far from perfect. But, I've done enough that I've convinced myself it is possible. It just needs a little polish and everything I described above is a reality. We just need to commit to it. Let's do it.

Loose Ends

What About "Sand-boxed" Make or Concatenating Make Files?

I anticipate that the main competition for my proposal will involve the following approaches:

  1. "sand-boxed" make files (described in the aforementioned Recursive Make Considered Harmful paper)
  2. Concatenate all the make files together

The concatenating one is definitely a non-controversial alternative to my proposal of extracting data then creating derived build files (possibly a monolithic make file) from it. And, if we go with concatenating, I'll generally be happy. We'll have a mostly statically-defined build system and it will exist in one monolithic DAG. This has desirable attributes.

FWIW, the stalled bug 623617 has an amazing patch from Joey Armstrong which partially implements concatenated make files. If this lands, I will be extremely happy because it will make the build system much faster and thus make developers' lives less painful.

That being said, I do have a complaint against the concatenating approach: it is still defined in make files.

The problem isn't make itself: it's that at the point you have pulled off concatenating everything together, your build system is effectively static data declarations. Why not use something like GYP instead? That way, you get Ninja, Visual Studio, Xcode, Eclipse, or even make file generation for free. I think it would be silly to ignore this opportunity.

If we continue to define everything in make files, we'll have to rely on data extraction from the concatenated make file to create project files for Visual Studio, Xcode, etc. This is possible (as I've proved above). And, I will gladly lend my time to implementing solutions (I've already performed a lot of work to get Visual Studio project generation working in this manner, but I abandoned work because I reached the long tail that is one-off targets and redundancy - which would be eliminated if we accomplish parts 1 and 2). Furthermore, we may have to reinvent the wheel of generating these project files (believe me, generating Visual Studio projects is not fun). Or, we could find a way to extract the make file data into something like GYP and have it do the generation for us. But, if we are going to end up in GYP (or similar), why don't we just start there?

I will also throw out that if we are really intent on preserving our existing make rules, it would be possible to define the build system in GYP/other files and write a custom GYP/other generator which produced make files tailored for our existing make rules.

In summary, the crux of my argument against concatenated make files is that it is a lot of work and that for roughly the same effort you could produce something else which has almost identical properties but has more features (which will make the end-developer experience better).

You could say the additional complexity of my proposal is just to buy us Visual Studio, etc support. That is more or less accurate. I think I've put in more time than anyone to devise a solution that programmatically generates Visual Studio projects. (Manually tailored files are out of the question because the maintenance of keeping them in sync with build system changes would be too high and error prone.) From my perspective, if we want to provide Visual Studio, Xcode, etc files for Mozilla projects - if we want to provide developers access to amazing tools so they can be more productive - we need the build system defined in something not make so this is robust.

Eclipse Project Files

Jonathan Watt recently posted instructions on configuring Eclipse CDT to build Firefox. Awesome!

Some may say that debunks my theory that it is difficult/impossible to generate build project files from just make files (without the sorcery I describe above). Perhaps.

As cool as the Eclipse CDT integration is, the way it works is a giant hack. And, I don't mean to belittle Jonathan or his work here - it is very much appreciated and a large step in the right direction.

The Eclipse CDT integration works by scanning the output of make, looking for compiler invocations. It parses these out, converting the arguments to Eclipse project data. Not only do you have to perform a standard make to configure Eclipse, but it also has to run without any parallelization so as to not confuse the parser (interleaved output, etc). And, Eclipse doesn't do everything for you - you still have to manage part of your build from the command line.
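To give a feel for what scanning build output involves, here is a small Python sketch of the general technique. It is not how Eclipse CDT is implemented; the compiler list, the compiler_settings helper, and the sample log are assumptions. It also hints at the fragility: anything that changes how commands are printed breaks it.

```python
import shlex

# Hypothetical list of program names to treat as compiler invocations.
COMPILERS = {"gcc", "g++", "cc", "c++", "clang", "clang++"}


def compiler_settings(make_output):
    """Map each compiled source file to the -I/-D flags it was built with."""
    settings = {}
    for line in make_output.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a command we can parse
        if not words or words[0] not in COMPILERS:
            continue
        flags = [w for w in words[1:] if w.startswith(("-I", "-D"))]
        for word in words[1:]:
            if word.endswith((".c", ".cpp")):
                settings[word] = flags
    return settings


log = (
    "make[1]: Entering directory 'xpcom'\n"
    "g++ -DDEBUG -I../dist/include -c foo.cpp -o foo.o\n"
)
print(compiler_settings(log))
```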

So, Eclipse is really a hybrid solution - albeit a rather useful one and, importantly, one that seems to work. Yet, in my opinion, this is all a little fragile and less than ideal. I want the ability to build the whole thing from your build tool. This is possible with the solutions I've proposed above.

Improving GYP

Above, I recommended GYP as a build system which I think is a good example of my holy grail build system - one which represents everything as data and allows transformations to multiple build tools.

While I like the concept of GYP, there are parts of the implementation I don't like. Every build system inevitably needs to deal with conditionals, e.g. if on Windows, use this flag; if on Linux, use this one. GYP represents conditions as specially crafted property names in JSON objects. For example:

{
  'conditions': [
    ['OS=="linux"', {
      'targets': [
        {
          'target_name': 'linux_target'
        },
      ],
    }],
    ['OS=="win"', {
      'targets': [
        {
          'target_name': 'windows_target',
        },
      ],
    }]
  ]
}

That special string as a key name really rubs me the wrong way. You are using declarative data to represent procedural flow. Yuck. It reminds me of a horror from a previous life - VXML. Don't ever try to write a computer program complete with branching and functions in XML: you will just hate yourself.

Anyway, despite the conflation of logic and data, it isn't too bad. As long as you limit the expressions that can be performed (or that actually are performed - I think I read they just feed it into eval), it's still readable. And, I'll happily sacrifice some purity to achieve the holy grail.
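For illustration, here is a minimal Python sketch of how condition strings like the ones above could be resolved against a set of variables. It is not GYP's actual code; `resolve_conditions` and its merge rules are assumptions. Passing an empty `__builtins__` to `eval` limits what an expression can reach, but it is not a real sandbox.

```python
SPEC = {
    "conditions": [
        ['OS=="linux"', {"targets": [{"target_name": "linux_target"}]}],
        ['OS=="win"', {"targets": [{"target_name": "windows_target"}]}],
    ]
}


def resolve_conditions(spec, variables):
    """Return a copy of spec with every matching 'conditions' branch merged in."""
    resolved = {k: v for k, v in spec.items() if k != "conditions"}
    for condition in spec.get("conditions", []):
        expression, branch = condition[0], condition[1]
        # Evaluate the condition string with only the given variables visible.
        if eval(expression, {"__builtins__": {}}, dict(variables)):
            for key, value in resolve_conditions(branch, variables).items():
                if isinstance(value, list):
                    # Assumed merge rule: lists from branches are appended.
                    resolved[key] = resolved.get(key, []) + value
                else:
                    resolved[key] = value
    return resolved


if __name__ == "__main__":
    print(resolve_conditions(SPEC, {"OS": "win"}))
```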

Anyway, I think my optimal build system would look a lot like GYP except that the data definition language would be a little more powerful. Knowing that you will inevitably need to tackle conditionals and other simple logic, I would give up and give my build system files access to a real programming language. But, it would need to be a special language. One that doesn't allow foot guns (so you can preserve determinism) and one that can't be abused to allow people to stray too far from a data-first model.

For this task, I would employ the Lua programming language. I would use Lua because it is designed to be an embeddable and easily sandboxed programming language. These attributes are perfect for what I'd need it to do.

Basically, files defining the build system would be Lua scripts. Some driver program would discover all these Lua files. It would create a new Lua context (an instance of the Lua interpreter) for each file. The context's global registry (namespace) would be populated with variables that defined the existing build configuration. IS_WINDOWS, HAVE_STRFTIME, etc - the kind of stuff exported by configure. The Lua script would get executed in that context. It would do stuff. It would either set specially-named global variables to certain values or it would call functions installed in the global registry like add_library, etc. Then, the driving program would collect everything the Lua script did and merge that into a data structure representing the build system, achieving the holy grail.

In the simple case, build files/Lua scripts would just look like simple variable assignments:

EXPORTED_INCLUDES = {"foo.h", "bar.h"}
LIBRARY_NAME = "hello"

If the build system needed to do something conditional, you would have a real programming language to fall back on:

if IS_WINDOWS then
    SOURCE_FILES = "windows.cpp"
else
    SOURCE_FILES = "other.cpp"
end

To my eye, this is cleaner than hacking JSON while still readable.

For people not familiar with Lua, new context instances are cheap, tiny, and have almost no features. Even the standard library must be explicitly added to a context! Contrast this with a batteries-included language like Python or JavaScript. So, while the build definition files would have a fully-featured programming language backing them, they would be severely crippled. I'm thinking I would load the string and table libraries from the standard library. Maybe I'd throw the math one in too. If I did, I'd disable the random function, because that's not deterministic!
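The driver flow described above can be sketched without a Lua binding. The following Python stand-in runs a build definition in a fresh namespace, exposes configure results and an `add_library` function, and collects the specially-named globals afterwards. Everything here is an assumption for illustration: the build file is written in Python syntax rather than Lua, `run_build_file` is hypothetical, and Python's `exec` with empty builtins is far weaker than a real Lua sandbox.

```python
# Stand-in build definition, using Python syntax instead of Lua.
BUILD_FILE = '''
LIBRARY_NAME = "hello"
EXPORTED_INCLUDES = ["foo.h", "bar.h"]
if IS_WINDOWS:
    SOURCE_FILES = ["windows.cpp"]
else:
    SOURCE_FILES = ["other.cpp"]
add_library("hello_static")
'''

# Globals the driver looks for once the script has run.
COLLECTED_NAMES = ("LIBRARY_NAME", "EXPORTED_INCLUDES", "SOURCE_FILES")


def run_build_file(source, config):
    """Execute one build definition in a fresh namespace and collect results."""
    libraries = []
    namespace = {"__builtins__": {}}  # no standard library, no imports
    namespace.update(config)  # e.g. IS_WINDOWS, as exported by configure
    namespace["add_library"] = libraries.append  # function installed by driver
    exec(source, namespace)
    result = {name: namespace[name] for name in COLLECTED_NAMES if name in namespace}
    result["libraries"] = libraries
    return result


if __name__ == "__main__":
    print(run_build_file(BUILD_FILE, {"IS_WINDOWS": True}))
```

Each file gets its own namespace, so one build definition cannot leak state into another, and the driver only ever sees plain data.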

Anyway, I think that would get me to a happy place. Maybe I'll get bored on a weekend and implement it! I'm thinking I'll have a Python program drive Lua through the ctypes module (if someone hasn't written a binding already - I'm sure they have). Python will collect data from the Lua scripts and then translate that to GYP data structures. If someone beats me to it, please name it Lancelot and leave a comment linking to the source.

Using Code for Other Projects

The code I wrote for pymake can be applied to any make file, not just Mozilla's. I hope to have some of it integrated with the official pymake repository some day. If this interests you, please drop me a line in the comments and I'll see what I can do.


Finding Oldest Firefox Code

June 18, 2012 at 10:10 AM | categories: Mozilla

On Twitter the other night, Justin Dolske posed a question:

Weekend challenge: what is the oldest line of code still shipping
in Firefox (tiebreaker: largest contiguous chunk)?

Good question and good challenge!

Technical Approach

To solve this problem, I decided my first task would be to produce a database holding line-by-line metadata for files currently in the Firefox repository. This sounded like a difficult problem at first, especially considering the Mercurial repository doesn't contain CVS history and this would be needed to identify code older than Mercurial that is still active.

Fortunately, there exists a Git repository with full history of mozilla-central, including CVS and Mercurial! Armed with a clone of this repository, I wrote a quick shell one-liner to ascertain the history of every line in the repository:

for f in `git ls-tree -r --name-only HEAD`; do \
  echo "BEGIN_RECORD $f"; \
  git blame -l -t -M -C -n -w -p $f; \
  echo "END_RECORD $f"; \
done

The git ls-tree command prints the names of every file in the current tree. This is basically doing find . -type f except for files under version control by Git. git blame attempts to ascertain the history of each line in a file. It is worth pointing out arguments -M and -C. These attempt to find moves/copies of the line from within the same commit. If these are omitted, simple refactoring such as renaming a file or reordering code within a file would result in an improper commit attribution. Basically, Git would associate the line with the commit that changed it. With these flags, Git attempts to complete the chain and find the true origin of the line (to some degree).

Now, something I thought was really cool is git blame's porcelain output format (-p). Not only does it allow for relatively simple machine readability of the output (yay), but it also compacts the output so adjacent lines sharing the same history metadata share the same metadata/header block. In other words, it solves Dolske's largest contiguous chunk tiebreaker for free! Thanks, Git!

I should also say that git blame isn't perfect for attributing code. But I think it is good enough to solve a Twitter challenge.

I piped the output of the above command into a file so I could have the original data available to process. After all, the blame output for a given commit never changes, so it makes no sense to not save it. After running for a while, I noticed things were running slower than I'd like. I think it took about 2 hours to obtain info for ~5000 files. No good. I played around a bit and realized the -M and -C flags were slowing things down. This is expected. But, I really wanted this data for a comprehensive data analysis.

I re-discovered GNU Parallel and modified my one-liner to use all the cores:

git ls-tree -r --name-only HEAD | \
parallel 'echo "BEGIN_RECORD {}"; git blame -l -t -M -C -n -w -p {}; echo "END_RECORD {}"'

This made things run substantially faster since I was now running on all 8 cores, not just 1. With GNU Parallel, this simple kind of parallelism is almost too easy. Now, I say substantially faster, but overall execution is still slow. How slow? Well, on my Sandy Bridge MacBook Pro:

real  525m49.149s
user  3592m15.862s
sys   201m4.482s

8:45 wall time and nearly 60 hours of CPU time. Yeah, I'm surprised my laptop didn't explode too! The output file was 1,099,071,448 bytes uncompressed and 155,354,423 bzipped.
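For anyone who wants to check the arithmetic, here is a small Python sketch that converts the timings above. The "effective cores" figure is simply CPU time divided by wall time, which is an interpretation added here rather than something measured directly.

```python
def to_seconds(minutes, seconds):
    """Convert a 'time' style NNNmSS.SSSs reading to seconds."""
    return minutes * 60 + seconds


real = to_seconds(525, 49.149)
user = to_seconds(3592, 15.862)
sys_time = to_seconds(201, 4.482)

wall_hours = real / 3600  # about 8.76 hours, i.e. roughly 8:45
user_hours = user / 3600  # about 59.9 hours
effective_cores = (user + sys_time) / real  # about 7.2 of the 8 cores kept busy
compression_ratio = 1099071448 / 155354423  # about 7.1

if __name__ == "__main__":
    print(round(wall_hours, 2), round(user_hours, 1))
    print(round(effective_cores, 1), round(compression_ratio, 1))
```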

While Git was obtaining data, I set about writing a consumer and data processor. I'm sure the wheel of parsing this porcelain output format has been invented before. But, I hadn't coded any Perl in a while and figured this was as good of an excuse as any!

The Perl script to do the parsing and data analysis is available at https://gist.github.com/2945604. The core parsing function simply calls a supplied callback whenever a new block of code from the same commit/file is encountered.

I implemented a function that records a mapping of commit times to blocks. Finally, I wrote a simple function to write the results.
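The shape of that parser can be sketched in a few lines of Python. This is not a port of the Perl script in the Gist; the function names, the callback signature, the choice of `committer-time` as the "commit time", and the sample data are all assumptions. It relies on the porcelain format starting each group with a four-field header line, printing a commit's metadata only the first time the commit is seen, and prefixing source lines with a tab.

```python
import re
from collections import defaultdict

# "<sha> <orig-line> <final-line> [<lines-in-group>]"
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")


def parse_blame(lines, on_block):
    """Call on_block(path, sha, start, count, meta) once per blame group."""
    commits = {}  # sha -> metadata; porcelain prints it only on first sight
    path = None
    pending = None

    def flush():
        nonlocal pending
        if pending:
            on_block(*pending)
        pending = None

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\t"):
            continue  # the source text itself; not needed here
        if line.startswith("BEGIN_RECORD "):
            path = line[len("BEGIN_RECORD "):]
        elif line.startswith("END_RECORD "):
            flush()
        else:
            match = HEADER.match(line)
            if match is None:
                if pending:  # "key value" metadata line for the current group
                    key, _, value = line.partition(" ")
                    pending[4][key] = value
            elif match.group(4) is not None:  # four fields start a new group
                flush()
                meta = commits.setdefault(match.group(1), {})
                pending = (path, match.group(1), int(match.group(3)),
                           int(match.group(4)), meta)
    flush()


def blocks_by_commit_time(lines):
    """Map committer time (Unix seconds) to the blocks attributed to it."""
    by_time = defaultdict(list)

    def on_block(path, sha, start, count, meta):
        by_time[int(meta["committer-time"])].append((path, start, count))

    parse_blame(lines, on_block)
    return by_time


# Made-up sample in the BEGIN_RECORD/END_RECORD framing used above.
OLD, NEW = "1" * 40, "2" * 40
SAMPLE = (
    "BEGIN_RECORD a.c\n"
    f"{OLD} 1 1 2\nauthor A\ncommitter-time 890000000\nfilename a.c\n"
    f"\tline one\n{OLD} 2 2\n\tline two\n"
    f"{NEW} 5 3 1\nauthor B\ncommitter-time 1300000000\nfilename a.c\n"
    "\tline three\nEND_RECORD a.c\n"
)

if __name__ == "__main__":
    by_time = blocks_by_commit_time(SAMPLE.splitlines())
    oldest = min(by_time)
    print(oldest, by_time[oldest])
```

Taking the minimum key of the resulting mapping gives the oldest blocks, and the line count stored with each block answers the contiguous-chunk tiebreaker.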

Results

What did 60 hours of CPU time tell us? Well, the oldest recorded line dates from 1998-03-28. This is actually the Free the Lizard commit - the first commit of open source Gecko to CVS. From this commit (Git commit 781c48087175615674 for those playing at home), a number of lines linger, including code for mkdepend and nsinstall.

But, Dolske's question was about shipping code. Well, as far as I can tell, the honor of oldest shipping code in the tree is shared by the following:

  • js/jsd/jsd1640.rc (pretty much the whole file)
  • js/jsd/jsd3240.rc (ditto)
  • js/jsd/jsd_atom.c:47-73
  • js/src/jsfriendapi.cpp:388-400
  • js/src/yarr/YarrParser.h:295-523
  • media/libjpeg/jdarith.c:21-336
  • media/libjpeg/jctrans.c:20-178
  • media/libjpeg (misc files all have large blocks)
  • xpcom/components/nsComponentManager.cpp:897-901
  • gfx/src/nsTransform2D.h (a couple 3-4 line chunks)
  • toolkit/library/nsDllMain.cpp:22-31
  • widget/windows/nsUXThemeData.cpp:213-257
  • widget/windows/nsWindowGfx.cpp:756-815
  • xpcom/ds/nsUnicharBuffer.cpp:14-33
  • xpcom/glue/nsCRTGlue.cpp:128-174

There are also a few small chunks of 1 or 2 lines in a couple dozen other files from that initial commit.

Further Analysis

If anyone is interested in performing additional analysis, you can just take my Gist and install your own onBlock and output formatting functions! Of course, you'll need a data set. My code is written so it will work with any Git repository. If you want to analyze Firefox, it will take hours of CPU time to extract the data from Git. To save some time, you can obtain a copy of the raw data from commit 0b961fb702a9576cb456809410209adbbb956bc8.

There is certainly no shortage of interesting analysis that can be performed over this data set. Some that come to mind are a scoreboard for most lines of code in the current tree (friendly competition, anyone?) and a breakdown of active lines by the period they were introduced.

I'll leave more advanced analysis as an exercise for the reader.


Smaller Firefox Build Logs

May 23, 2012 at 08:50 AM | categories: Mozilla, Firefox

The other day I looked at a full Firefox build log from TBPL and noticed that ~84,000 of the ~170,000 lines in the log I looked at were output from archive processes. We were printing thousands of lines showing the files that were being added and extracted from the archives that contain test files!

I thought this was wasteful, so I filed bug 757397 and coded up a patch. Ted agreed that these lines were rather worthless and the patch has landed in mozilla-inbound.

The result of the patch is that build logs are about half as big in terms of lines. And, it appears at least 500 KB is shaved off the compressed log files as well.

The real world impact is you should be able to load build logs from the server faster because they are smaller.

If you were parsing this data before and are impacted by this, please leave a comment on the aforementioned bug and we'll go from there.

