I hold a lot of context in my head when it comes to the future of Mozilla's build system and the interaction with it. I wanted to perform a brain dump of sorts so people have an idea of where I'm coming from when I inevitably propose radical changes.
The sad state of build system interaction and the history of mach
I believe that Mozilla's build system has had a poor developer experience for as long as there has been a Mozilla build system. Getting started with Firefox development was a rite of passage. It required following (often out-of-date) directions on MDN. It required finding pages through MDN search or asking other people for info over IRC. It was the kind of process that turned away potential contributors because it was just too damn hard.
mach - while born out of my initial efforts to radically change the build system proper - morphed into a generic command dispatching framework by the time it landed in mozilla-central. It has one overarching purpose: provide a single gateway point for performing common developer tasks (such as building the tree and running tests). The concept was nothing new - individual developers had long coded up scripts and tools to streamline workflows. Some even published these for others to use. What set mach apart was a unified interface for these commands (the mach script in the top directory of a checkout) and that these productivity gains were in the tree and thus easily discoverable and usable by everybody without significant effort (just run mach help).
While mach doesn't yet satisfy everyone's needs, it's slowly growing new features and making developers' lives easier with every one. All of this is happening despite that there is not a single person tasked with working on mach full time. Until a few months ago, mach was largely my work. Recently, Matt Brubeck has been contributing a flurry of enhancements - thanks Matt! Ehsan Akhgari and Nicholas Alexander have contributed a few commands as well! There are also a few people with a single command to their name. This is fulfilling my original vision of facilitating developers to scratch their own itches by contributing mach commands.
I've noticed more people referencing mach in IRC channels. And, more people get angry when a mach command breaks or changes behavior. So, I consider the mach experiment a success. Is it perfect, no. If it's not good enough for you, please file a bug and/or code up a patch. If nothing else, please tell me: I love to know about everyone's subtle requirements so I can keep them in mind when refactoring the build system and hacking on mach.
The object directory is a black box
One of the ideas I'm trying to advance is that the object directory should be considered a black box for the majority of developers. In my ideal world, developers don't need to look inside the object directory. Instead, they interact with it through condoned and supported tools (like mach).
I say this for a few reasons. First, as the build config module owner I would like the ability to massively refactor the internals of the object directory without disrupting workflows. If people are interacting directly with the object directory, I get significant push back if things change. This inevitably holds back much-needed improvements and triggers resentment towards me, build peers, and the build system. Not a good situation. Whereas if people are indirectly interacting with the object directory, we simply need to maintain a consistent interface (like mach) and nobody should care if things change.
Second, I believe that the methods used when directly interacting with the object directory are often sub-par compared with going through a more intelligent tool and that productivity suffers as a result. For example, when you type make in inside the object directory you need to know to pass -j8, use make vs pymake, and that you also need to build toolkit/library, etc. Also, by invoking make directly, you bypass other handy features, such as automatic compiler warning aggregation (which only happens if you invoke the build system through mach). If you go through a tool like mach, you should automatically get the most ideal experience possible.
In order for this vision to be realized, we need massive improvements to tools like mach to cover the missing workflows that still require direct object directory interaction. We also need people to start using mach. I think increased mach usage comes after mach has established itself as obviously superior to the alternatives (I already believe it offers this for tasks like running tests).
I don't want to force mach upon people but...
Nobody likes when they are forced to change a process that has been familiar for years. Developers especially. I get it. That's why I've always attempted to position mach as an alternative to existing workflows. If you don't like mach, you can always fall back to the previous workflow. Or, you can improve mach (patches more than welcome!). Having gone down the please-use-this-tool-it's-better road before at other organizations, I strongly believe that the best method to incur adoption of a new tool is to gradually sway people through obvious superiority and praise (as opposed to a mandate to switch). I've been trying this approach with mach.
Lately, more and more people have been saying things like we should have the build infrastructure build through mach instead of client.mk and why do we need testsuite-targets.mk when we have mach commands. While I personally feel that client.mk and testsuite-targets.mk are antiquated as a developer-facing interface compared to mach, I'm reluctant to eliminate them because I don't like forcing change on others. That being said, there are compelling reasons to eliminate or at least refactor how they work.
Let's take testsuite-targets.mk as an example. This is the make file that provides the targets to run tests (like make xpcshell-test and make mochitest-browser-chrome). What's interesting about this file is that it's only used in local builds: our automation infrastructure does not use testsuite-targets.mk! Instead, mozharness and the old buildbot configs manually build up the command used to invoke the test harnesses. Initially, the mach commands for running tests simply invoked make targets defined in testsuite-targets.mk. Lately, we've been converting the mach commands to invoke the Python test runners directly. I'd argue that the logic for invoke the test runner only needs to live in one place in the tree. Furthermore as a build module peer, I have little desire to support multiple implementations. Especially considering how fragile they can be.
I think we're trending towards an outcome where mach (or the code behind mach commands) transitions into the authoratitive invocation method and legacy interfaces like client.mk and testsuite-targets.mk are reimplemented to either call mach commands or the same routine that powers them. Hopefully this will be completely transparent to developers.
The future of mozconfigs and environment configuration
mozconfig files are shell scripts used to define variables consumed by the build system. They are the only officially supported mechanism for configuring how the build system works.
I'd argue mozconfig files are a mediocre solution at best. First, there's the issue of mozconfig statements that don't actually do anything. I've seen no-op mozconfig content cargo culted into the in-tree mozconfigs (used for the builder configurations)! Oops. Second, doing things in mozconfig files is just awkward. Defining the object directory requires mk_add_options MOZ_OBJDIR=some-path. What's mk_add_options? If some-path is relative, what is it relative to? While certainly addressable, the documentation on how mozconfig files work is not terrific and fails to explain many pitfalls. Even with proper documentation, there's still the issue of the file format allowing no-op variable assignments to persist.
I'm very tempted to reinvent build configuration as something not mozconfigs. What exactly, I don't know. mach has support for ini-like configuration files. We could certainly have mach and the build system pull configs from the same file.
I'm not sure what's going to happen here. But deprecating mozconfig files as they are today is part of many of the options.
Handling multiple mozconfig files
A lot of developers only have a single mozconfig file (per source tree at least). For these developers, life is easy. You simply install your mozconfig in one of the default locations and it's automagically used when you use mach or client.mk. Easy peasy.
I'm not sure what the relative numbers are, but many developers maintain multiple mozconfig files per source tree. e.g. they'll have one mozconfig to build desktop Firefox and another one for Android. They may have debug variations of each.
Some developers even have a single mozconfig file but leverage the fact that mozconfig files are shell scripts and have their mozconfig dynamically do things depending on the current working directory, value of an environment variable, etc.
I've also seen wrapper scripts that glorify setting environment variables, changing directory, etc and invoke a command.
I've been thinking a lot about providing a common and well-supported solution for switching between active build configurations. Installing mach on $PATH goes a long way to facilitate this. If you are in an object directory, the mozconfig used when that object directory was created is automatically applied. Simple enough. However, I want people to start treating object directories as black boxes. So, I'd rather not see people have their shell inside the object directory.
Whenever I think about solutions, I keep arriving at a virtualenv-like solution. Developers would potentially need to activate a Mozilla build environment (similar to how Windows developers need to launch MozillaBuild). Inside this environment, the shell prompt would contain the name of the current build configuration. Users could switch between configurations using mach switch or some other magic command on the $PATH.
Truth be told, I'm skeptical if people would find this useful. I'm not sure it's that much better than exporting the MOZCONFIG environment variable to define the active config. This one requires more thought.
The integration between the build environment and Python
We use Python extensively in the build system and for common developer tasks. mach is written in Python. moz.build processing is implemented in Python. Most of the test harnesses are written in Python.
Doing practically anything in the tree requires a Python interpreter that knows about all the Python code in the tree and how to load it.
Currently, we have two very similar Python environments. One is a virtualenv created while running configure at the beginning of a build. The other is essentially a cheap knock-off that mach creates when it is launched.
At some point I'd like to consolidate these Python environments. From any Python process we should have a way to automatically bootstrap/activate into a well-defined Python environment. This certainly sounds like establishing a unified Python virtualenv used by both the build system and mach.
Unfortunately, things aren't straightforward. The virtualenv today is constructed in the object directory. How do we determine the current object directory? By loading the mozconfig file. How do we do that? Well, if you are mach, we use Python. And, how does mach know where to find the code to load the mozconfig file? You can see the dilemma here.
A related issue is that of portable build environments. Currently, a lot of our automation recreates the build system's virtualenv from its own configuration (not that from the source tree). This has and will continue to bite us. We'd really like to package up the virtualenv (or at least its config) with tests so there is no potential for discrepancy.
The inner workings of how we integrate with Python should be invisible to most developers. But, I figured I'd capture it here because it's an annoying problem. And, it's also related to an activated build environment. What if we required all developers to activate their shell with a Mozilla build environment (like we do on Windows)? Not only would this solve Python issues, but it would also facilitate simpler config switching (outlined above). Hmmm...
Direct interaction with the build system considered harmful
Ever since there was a build system developers have been typing make (or make.py) to build the tree. One of the goals of the transition to moz.build files is to facilitate building the tree with Tup. make will do nothing when you're not using Makefiles! Another goal of the moz.build transition is to start derecursifying the make build system such that we build things in parallel. It's likely we'll produce monolithic make files and then process all targets for a related class IDLs, C++ compilation, etc in one invocation of make. So, uh, what happens during a partial tree build? If a .cpp file from /dom/src/storage is being handled by a monolithic make file invoked by the Makefile at the top of the tree, how does a partial tree build pick that up? Does it build just that target or every target in the monolithic/non-recursive make file?
Unless the build peers go out of our way to install redundant targets in leaf Makefiles, directly invoking make from a subdirectory of the tree won't do what it's done for years.
As I said above, I'm sympathetic to forced changes in procedure, so it's likely we'll provide backwards-compatibile behavior. But, I'd prefer to not do it. I'd first prefer partial-tree builds are not necessary and a full tree build finishes quickly. But, we're not going to get there for a bit. As an alternative, I'll take people building through mach build. That way, we have an easily extensible interface on which to build partial tree logic. We saw this recently when dumbmake/smartmake landed. And, going through mach also reinforces my ideal that the object directory is a black box.
Currently, most state as it pertains to a checkout or build is in the object directory. This is fine for artifacts from the build system. However, there is a whole class of state that arguably shouldn't be in the object directory. Specifically, it shouldn't be clobbered when you rebuild. This includes logs from previous builds, the warnings database, previously failing tests, etc. The list is only going to grow over time.
I'd like to establish a location for semi-persistant state related to the tree and builds. Perhaps we change the clobber logic to ignore a specific directory. Perhaps we start storing things in the user's home directory. Perhaps we could establish a second object directory named the state directory? How would this interact with build environments?
This will probably sit on the backburner until there is a compelling use case for it.
The battle against C++
Compiling C++ consumes the bulk of our build time. Anything we can do to speed up C++ compilation will work wonders for our build times.
I'm optimistic things like precompiled headers and compiling multiple .cpp files with a single process invocation will drastically decrease build times. However, no matter how much work we put in to make C++ compilation faster, we still have a giant issue: dependency hell.
As shown in my build system presentation a few months back, we have dozens of header files included by hundreds if not thousands of C++ files. If you change one file: you invalidate build dependencies and trigger a rebuild. This is why whenever files like mozilla-config.h change you are essentially confronted with a full rebuild. ccache may help if you are lucky. But, I fear that as long as headers proliferate the way they do, there is little the build system by itself can do.
My attitude towards this is to wait and see what we can get out of precompiled headers and the like. Maybe that makes it good enough. If not, I'll likely be making a lot of noise at Platform meetings requesting that C++ gurus brainstorm on a solution for reducing header proliferation.
Belive it or not, these are only some of the topics floating around in my head! But I've probably managed to bore everyone enough so I'll call it a day.
I'm always interested in opinions and ideas, especially if they are different from mine. I encourage you to leave a comment if you have something to say.
Matt Brubeck recently landed an awesome patch for mach in bug 840588: it allows mach to be used by any directory. I'm calling it omnipresent mach.
Essentially, Matt changed the mach driver (the script in the root directory of mozilla-central) so instead of having it look in hard-coded relative paths for all its code, it walks up the directory tree and looks for signs of the source tree or the object directory.
What this all means is that if you have the mach script installed in your $PATH and you just type mach in your shell from within any source directory or object directory, mach should just work. So, no more typing ./mach: just copy mach to ~/bin, /usr/local/bin or some other directory on your $PATH and you should just be able to type mach.
Unfortunately, there are bound to be bugs here. Since mach traditionally was only executed with the current working directory as the top source directory, some commands are not prepared to handle a variable current working directory. Some commands will likely get confused when it comes resolving relative paths, etc. If you find an issue, please report it! A temporary workaround is to just invoke mach from the top source directory like you've always been doing.
If you enjoy the feature, thank Matt: this was completely his idea and he saw it through from conception to implementation.