Bulk Analysis of Mozilla's Build and Test Data

April 01, 2013 at 01:12 PM | categories: Mozilla, Firefox

When you push code changes to Firefox and other similar Mozilla projects, a flood of automated jobs is triggered on Mozilla's infrastructure. It works like any other continuous integration system. First you build, then you run tests, etc. What sets it apart from other continuous integration systems is the scale: Mozilla runs thousands of jobs per week and their combined output adds up to tens of gigabytes.

Most of the data from Mozilla's continuous integration is available on public servers, notably ftp.mozilla.org. This includes compiled binaries, logs, etc.

While there are tools that can sift through this mountain of data (like TBPL), they don't allow ad-hoc queries over the raw data. Furthermore, these tools are very function-specific and there are many data views they don't expose. This missing data has always bothered me because, well, there are cool and useful things I'd like to do with this data.

This itch has been bothering me for well over a year. The persistent burning sensation coupled with rain over the weekend caused me to scratch it.

The product of my weekend labor is a system facilitating bulk storage and analysis of Mozilla's build data. While it's currently very alpha, it's already showing promise for more thorough data analysis.

Essentially, the tool works by collecting the dumps of all the jobs executed on Mozilla's infrastructure. It can optionally supplement this with the raw logs from those jobs. Then it combs through this data, extracts useful bits, and stores them. Once the initial fetching has completed, obtaining a new metric is simply a matter of re-parsing the local data set. And, since all data is stored locally, performance is not bound by Internet bandwidth. In practice, this means you can compute new metrics much faster than if you had to fetch the raw data each time. The downside is you will likely be storing gigabytes of raw data locally. But disks are cheap. And you have control over what gets pulled in, so you can limit it to what you need.
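In pseudocode, the two phases look roughly like this (the function names and storage layout are hypothetical illustrations, not the tool's actual API):

import os
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

def fetch(job_urls, store_dir):
    # Phase 1: download raw job data once; keep it locally forever.
    for url in job_urls:
        path = os.path.join(store_dir, url.replace('/', '_'))
        if not os.path.exists(path):
            urlretrieve(url, path)

def extract(store_dir, parse_fn):
    # Phase 2: derive new metrics by re-parsing local data only.
    # No network access is required once the data is fetched.
    for name in os.listdir(store_dir):
        with open(os.path.join(store_dir, name)) as f:
            yield parse_fn(f.read())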

Please note the project is very alpha and currently serves only my personal interests. However, I know there is talk about TBPL2, and what I have built could evolve into the data store for the next-generation TBPL tool. Also, most of the work so far has centered on data import. There is tons of analysis code waiting to be written.

If you are interested in improving the tool, please file a GitHub pull request.

I hope to soon blog about useful information I've obtained through this tool.


Omnipresent mach

March 03, 2013 at 12:30 PM | categories: Mozilla, Firefox, mach

Matt Brubeck recently landed an awesome patch for mach in bug 840588: it allows mach to be used from any directory. I'm calling it omnipresent mach.

Essentially, Matt changed the mach driver (the script in the root directory of mozilla-central) so instead of having it look in hard-coded relative paths for all its code, it walks up the directory tree and looks for signs of the source tree or the object directory.

What this all means is that if you have the mach script in your $PATH, you can type mach in your shell from within any source directory or object directory and it should just work. No more typing ./mach: copy mach to ~/bin, /usr/local/bin, or some other directory on your $PATH and simply type mach.
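The upward walk is simple enough to sketch in a few lines of Python. This is illustrative only; the marker files the real driver checks for are different:

import os

def find_tree(cwd):
    # Walk up the directory tree looking for something that
    # identifies a source tree or an object directory.
    path = os.path.abspath(cwd)
    while True:
        # Hypothetical markers, not the actual files mach checks.
        if os.path.exists(os.path.join(path, 'mach_bootstrap.py')):
            return ('source', path)
        if os.path.exists(os.path.join(path, 'config.status')):
            return ('object', path)
        parent = os.path.dirname(path)
        if parent == path:
            return None  # Reached the filesystem root; give up.
        path = parent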

Unfortunately, there are bound to be bugs here. Since mach traditionally was only executed with the current working directory as the top source directory, some commands are not prepared to handle a variable current working directory. Some commands will likely get confused when it comes to resolving relative paths, etc. If you find an issue, please report it! A temporary workaround is to invoke mach from the top source directory like you've always been doing.

If you enjoy the feature, thank Matt: this was completely his idea and he saw it through from conception to implementation.


moz.build Files and the Firefox Build System

February 28, 2013 at 07:45 PM | categories: Mozilla, Firefox, build system

The next time you update mozilla-central you may notice some significant changes with the build system. That's because this morning we finally landed the start of a massive overhaul of the build system! There are many end goals to this effort. The important ones for most will be faster build times and a build system that is friendlier to make changes to.

Introducing moz.build Files

If you look in the tree, you'll notice that nearly every directory now has a moz.build file.

moz.build files are what we are using to define the build system. Think of each one as a descriptor for how to build its own part of the tree. An individual moz.build file will contain the C++ sources to compile, the headers to export, the tests to run, etc. Eventually. Currently, they are limited to directory traversal information.
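For example, a traversal-only moz.build might look something like this (the directory names are made up):

# moz.build
DIRS += ['base', 'src']
PARALLEL_DIRS += ['tests']

Both variables list subdirectories containing further moz.build files; PARALLEL_DIRS roughly marks those that can be processed in parallel.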

moz.build files essentially add a level of indirection between the build system definition and how the tree is actually built. Before, the metadata we are now capturing in moz.build files (or plan to capture) lived in Makefile.in files. We performed simple variable substitution on these Makefile.in files to produce Makefile files in the object directory, which GNU Make (or Pymake on Windows) then used to build the tree.
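The substitution is the familiar autoconf style: occurrences of @FOO@ in a Makefile.in are replaced with configure-derived values. A minimal illustration of the idea:

import re

def substitute(makefile_in, values):
    # Replace @FOO@ tokens with configure-derived values.
    return re.sub(r'@(\w+)@', lambda m: values[m.group(1)], makefile_in)

print(substitute('DEPTH = @DEPTH@\n', {'DEPTH': '../..'}))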

As I outlined in Improving Mozilla's Build System, Makefile.in files are suboptimal for a number of reasons. The important bit is that they essentially tie us to the use of make (recursive or otherwise). We are very interested in supporting modern build systems such as Tup (the theory being that they will build the tree faster).

Enter moz.build files. Storing our build configuration in moz.build files allows us to decouple the definition of the build system from the tool used to build it.

How moz.build Files Work

At the tail end of configure, the build system invokes the config.status script in the object directory. The role of config.status is to combine the information collected during configure with the build configuration obtained from moz.build files and take the necessary actions to ensure the build backend (make) can build the tree.

Before today, config.status essentially iterated over the source tree and converted Makefile.in files to Makefile in the object directory. Things are now slightly more complicated with moz.build files.

When config.status runs, it starts with the root moz.build file from the source tree and feeds it into a Python interpreter. It then looks for special variables like DIRS and PARALLEL_DIRS to determine which directories contain additional moz.build files, and descends into all the referenced directories, reading their moz.build files. While this is happening, we convert the results of each moz.build file's execution into backend.mk files that make knows how to build. config.status also performs the Makefile.in to Makefile conversion like it always has. When the whole process has finished, the object directory contains a bunch of Makefile and backend.mk files. make runs like it always has. The only real difference is that some variables come from the moz.build-derived backend.mk files instead of Makefile.
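In sketch form, the traversal is just a recursive read of moz.build files. This is an illustration of the concept, not the actual mozbuild code:

import os

def read_moz_build(path):
    # Pre-seed the traversal variables, then execute the file
    # with no builtins available.
    variables = {'DIRS': [], 'PARALLEL_DIRS': []}
    with open(path) as f:
        exec(f.read(), {'__builtins__': None}, variables)
    return variables

def traverse(srcdir, relpath=''):
    variables = read_moz_build(os.path.join(srcdir, relpath, 'moz.build'))
    # The backend converts each result into a backend.mk file.
    yield relpath, variables
    for child in variables['DIRS'] + variables['PARALLEL_DIRS']:
        for result in traverse(srcdir, os.path.join(relpath, child)):
            yield result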

This is just a brief overview, of course. If you want to know more, see the code in /python/mozbuild/mozbuild/frontend and /python/mozbuild/mozbuild/backend.

Build System Future

With the introduction of moz.build files, the intent is to eventually completely eliminate Makefile.in and have all the build definition live in moz.build files.

Doing this all at once would have been next to impossible. So, we decided to eliminate Makefile.in gradually. The first step is what landed today: essentially moving DIRS variables out of Makefile.in and into moz.build files. Next, we will be eliminating empty Makefile.in (bug 844635) and will be moving more parts of the build definition from Makefile.in to moz.build files. The next part to move over will likely be IDLs (bug 818246). After that, it may be exported files (EXPORTS in Makefile.in parlance). And repeat until we have no more Makefile.in in the tree.

Each migration of build definition data to moz.build files will likely occur in two phases:

  1. A largely mechanical move of the variables from Makefile.in to moz.build.
  2. Better build backend integration resulting from the move.

In phase #1, we will essentially cut and paste variable assignments to moz.build files. make will read the same variables it does today and perform the same actions. The only difference is the values in these variables will be defined in moz.build files.

In phase #2, we will leverage the fact that our build definition now has an API. We will change our existing make backend to be more efficient. For example, we should soon be able to compile IDLs and copy exported headers without make traversing through the directory tree at build time. We will be able to do this because the moz.build traversal at pre-build time sees data from all moz.build files and with this complete world view is able to produce more optimal make files than what you would get if you recursed into multiple directories. In short: it will make the build faster.

Once we have a sufficient portion of the build definition moved to moz.build files we will be able to experiment with new build backends (like Tup), look into automatic Visual Studio project generation, and more easily experiment with different ways of building (such as precompiled headers, fewer compiler process invocations, etc). These should all contribute to faster build times.

Frequently Asked Questions

What impact will I see from this change?

If you never touched Makefile.in files in the tree, you should not notice anything different about how the tree builds or how the build system works. You should have nothing to fear.

The most obvious changes to the source tree are:

  1. There is a moz.build file in almost every directory now.
  2. The variables related to directory traversal (those containing DIRS in their name) are now defined in moz.build files instead of Makefile.in.
  3. If your Makefile.in contains a variable that has been moved to moz.build files, make will spew an error when processing that file and the build will fail.

Will this change how I build?

It shouldn't. You should build the tree just like you always have. Most of you won't notice any differences.

If you see something weird, speak up in #build or file a bug if you are really confident it is wrong.

What are the risks to this change?

The migration of variables from Makefile.in to moz.build files is largely mechanical, and significant portions are performed manually. This can be a mind-numbing and tedious process. Not helping things is the fact that Splinter's review interface for these kinds of patches is hard to read.

This all means that there is a non-trivial risk for transcription errors. All it takes is an inverted conditional block and all of a sudden large parts of the tree are no longer built, for example.

We have established bug 846825 to investigate any oddities from the initial transfer. Developers are encouraged to help with the effort. Please spot check that your directories are still being built, tests run, etc. Pay special attention to changes made in the last 4 months, as those parts of the Makefile.in files were more likely to have bit rotted during the conversion and are thus more prone to data loss.

Parts of the tree not enabled in standard configurations are more prone to breakage due to less testing. i.e. build configurations not captured by TBPL have a higher chance of breaking.

Will this make the tree build faster?

Not yet. But eventually it will. This initial landing paves the groundwork to making the tree build faster (see the Future section above).

I see a lot of empty moz.build files!

Yes. Sorry about that. The good news is they shouldn't be empty for long. As things move from Makefile.in to moz.build we'll see fewer and fewer empty moz.build files. We'll also see fewer and fewer Makefile.in files once we start deleting empty Makefile.in.

If you want to know why we must have empty files, it's mainly for validation. If we allowed moz.build files to be optional, how would you detect a typo in a directory name? You could check whether the directory exists, but what if it exists and simply isn't supposed to have a moz.build file?

You bitrotted my patches!

Yes. I'm sorry. The transition period to moz.build files could be a little messy. There will be lots of changes to Makefile.in and moz.build files and lots of chances for bit rot. Uplifts could be especially nasty. (Although I don't believe many uplifts involve significant changes to the build configuration.)

This all means there is a strong incentive for us to complete the transition as quickly as possible.

Can I help with the transition to moz.build files?

Yes!

The transition is largely mechanical (at least phase #1). If you are interested in moving a variable or set of variables, hop in #build on IRC and speak up!

You said moz.build files are actually Python files?!

Yes they are! However, they are executed in a very tightly controlled sandbox. You can't import modules, open files, etc. UPPERCASE variable names are reserved and only a few functions are exposed. If you attempt to assign to an unknown UPPERCASE variable or assign an invalid value, an error will occur. This is already much better than Makefiles because we can now detect errors earlier in the build process (rather than 15 minutes into a build).
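Conceptually, the sandbox can be modeled as a dict subclass whose __setitem__ rejects unknown names and invalid values. The following is a sketch of the concept, not the actual mozbuild implementation:

class Sandbox(dict):
    # An illustrative subset of the allowed UPPERCASE variables.
    ALLOWED = ('DIRS', 'PARALLEL_DIRS', 'TEST_DIRS')

    def __setitem__(self, name, value):
        if name.isupper() and name not in self.ALLOWED:
            raise KeyError('Unknown variable: %s' % name)
        if name in self.ALLOWED and not isinstance(value, list):
            raise ValueError('%s must be a list' % name)
        dict.__setitem__(self, name, value)

def execute_moz_build(path):
    sandbox = Sandbox(DIRS=[], PARALLEL_DIRS=[], TEST_DIRS=[])
    with open(path) as f:
        # Empty builtins: no open(), no __import__(), etc.
        exec(f.read(), {'__builtins__': None}, sandbox)
    return sandbox

Because exec routes name assignments through the mapping, a typo like DRIS = ['foo'] raises an error immediately instead of silently defining a meaningless variable.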

What variables and functions are available in moz.build files?

If you run |./mach mozbuild-reference| you will see a print-out of all the variables, functions, and symbols that are exposed to the Python sandbox that moz.build files execute in. There are even tests that will fail the build if the sandbox contains symbols not in this output!

The output should be valid reStructuredText (in case you want to convert it to HTML for reading in your browser).

What if a moz.build file contains an error?

The build will break.

A lot of work has gone into making the output of moz.build errors human friendly and actionable. If you do something wrong, it won't just complain: it will tell you how to fix it!

Besides build times, how else will this improve the build system?

There are several ways!

As mentioned above, moz.build files are more strict about what data is allowed to be defined. If you assign to an UPPERCASE variable, that variable must be known to the sandbox or else the assignment will error. This means that if you assign to an UPPERCASE variable, you know it has a side-effect. No more cargo culting of old, meaningless variables!

Changing the behavior of moz.build files (adding new variables or functions, changing how makefile generation works, etc) requires changes to the code in /python/mozbuild. This code belongs squarely to the build module and requires appropriate review. A problem with Makefiles is that they have lots of foot guns by default and it's easy for self-inflicted wounds to land in the tree without explicit build peer review. This problem largely goes away with moz.build files because the sandbox takes away all of make's foot guns.

The output of a moz.build execution is essentially a static data structure. It's easy to validate these structures for conformance. If we discover bad practices in our build definition, we can centrally add tests for them and enforce best practices.

We will also see user experience wins by moving data to moz.build files. Take mochitests as an example. We currently have several flavors (plain, browser, chrome, etc). Sometimes you cannot distinguish the flavor by the filename alone. With moz.build files, it will be easier to answer questions like "what mochitest flavor is this file?" mach could hook into this so you can type |mach mochitest path/to/file.html| instead of |mach mochitest-plain path/to/file.html|. Even better, you should just be able to type |mach path/to/file.html| and mach will know from the build definition that the file is a plain mochitest and assume you want to run it. There are dozens of small development workflow wins to be gained here!

If I change a moz.build file, what happens?

If you change a moz.build file, then make should detect that it has changed and it will update the dynamically generated backend.mk file and reinvoke the requested build action. This should all happen automatically (just like Makefile.in to Makefile conversion works automatically).

My build seems to pause for a few seconds before starting!

A change to any moz.build file will cause a full traversal of the entire moz.build tree. On modern machines, this should only take 1-3 seconds. If your source tree is not in the page cache (and you need to load moz.build files from disk) or if you are on older hardware, this could be a little slower.

This is an unfortunate side-effect of having a whole world view of the build definition. The build delay incurred by these full scans should eventually be cancelled out by build backend optimizations resulting from having this whole world view, however.

The good news is this full scan should only occur if a moz.build file changes. And, if you are performing make recursion, it should only happen once (not in every directory). If you notice multiple moz.build scanning-related pauses during builds, please file a bug in Core :: Build Config!

Finally, we are performing the reads on a single thread currently. We can throw more cores at the task if someone codes up a patch.

What happened to allmakefiles.sh?

It has been sacked. allmakefiles.sh was an optimization to perform all the Makefile.in to Makefile conversion in one go. The directory traversal performed by moz.build reading effectively replaces the role of allmakefiles.sh. Not only that, but the moz.build build definition is always up to date! allmakefiles.sh was typically out of sync with reality and was a burden to maintain.

Did we just invent our own build system?

Kinda. We invented a generic Python sandboxing infrastructure. Then we hooked up code to populate it with variables from our build system and told it how to perform file traversal by reading specific variables set during file execution. Then we hooked up code for taking the evaluated results of all those sandboxes and converting them into make files.

Conceptually, what we invented is like GYP but with a different config file format. We have dabbled with the idea of converting the parsed build definition into GYP classes and then leveraging GYP to produce Makefiles, Ninja files, Visual Studio projects, etc. This would be an interesting experiment!

If you want to call it a build system, call it a build system. However, it is currently tightly coupled to Mozilla's needs, so you can't just use it anywhere. The concept might be worth exploring, however.

Is there anything else you'd like to share?

I think we set the record for most parts in a bug: 61. Although they are numbered 1-17, 19-20. Part 18 has 30+ sub-parts using letters from the English and Greek alphabets for identifiers. Part 61 uses the infinity symbol as its number. See the pushlog.

Finally, I'd like to thank everyone who helped with this effort. The bug itself was only 6 months old and had active development off and on for a lot of it. Ted Mielczarek and Mike Hommey provided invaluable feedback on the core build system patches. A number of module owners stepped in to lend eyes to the mechanical conversion of their files. Last but not least, Ms2ger provided invaluable review aid on many of the patches. The work was so good that we all agreed that an Ms2ger f+ was good enough for a build peer rs! If reviewing the patches wasn't enough, Ms2ger also oversaw the tree closure and merging of the landing. I don't know how I will repay this debt.

Any more questions?

If you have more questions, drop in #build on irc.mozilla.org and ask away.


Thoughts on Logging - Part 1 - Structured Logging

December 06, 2012 at 09:55 PM | categories: Mozilla, logging

Foreword

For reasons I can't explain, I have a keen interest in computer logging systems. Perhaps it stems from my frustration that such a seemingly trivial task can often be quite complicated. Perhaps it's from the scaling challenges inherent with inevitably needing to analyze terabytes of logged data. Perhaps it's from my experience designing highly-available distributed server applications and needing to use logging for real-time monitoring and decision making. Whatever the root causes, I've always wanted to create a brain dump of my thoughts on logging. So, here we go!

Introduction

Logging. Practically every computer program does it in some form or another. The standard libraries for many programming languages provide a logging facility. But, have you ever stopped to think about the side effects of logging or why you log the way you do?

In this first post, I will briefly define what I consider logging and will talk about something called structured logging.

The Definition of Logging

For the scope of these posts, I will define logging to be:

An emitted stream of distinct messages from a running application

A message can be pretty much anything. Messages are often strings, but they could be binary data. The transport mechanism is undefined, although most logging systems typically write to a file (eventually).

I should also call out the difference between reading/interpreting logs and transporting them. When I say reading a log, I'm referring to a process where an entity (either human or machine) reads the individual message content and infers something from it. By contrast, a transporter treats messages as a black box and just shuffles bits around. Transporters also typically send or store logging messages somewhere.

It's worth explicitly noting that this definition encompasses both traditional print() based logging (writing a line to a file or output) as well as things like transaction logs or recording events as new rows in a database.

The Multiple Use Cases of Logging

Before I get to some of the interesting aspects of logging, let's identify some of the use cases of logging (in no particular order):

  1. Recording application errors (e.g. exceptions and stack traces)
  2. Low-level information to aid in debugging or human observation
  3. Application monitoring (includes metrics and alerting)
  4. Business analytics (use the logged data to help make decisions)
  5. Security auditing

Now that we have the basics out of the way, let's move on to the crux of this post.

Logging for Humans or Machines

I would argue that every log message is intended to be interpreted by either a human or a machine.

If a log message is only going to be used to help a human debug a system or casually see what's going on, it's intended for humans. If a log message is going to be used for machine analysis later (e.g. monitoring, alerting, business analytics, security auditing, etc), it's intended for machines.

This is an important distinction. If it isn't clear, please re-read the prior paragraphs. Now might also be a good time for you to audit the logs in your life and assign human or machine labels to them.

The Problem

The problem with logging as most of us know it is that it is optimized for human consumption, even if machines are the intended consumer.

At the heart of most logging systems we log a string, a char *, an array of characters that can easily be read by humans. For example, a simple web application might produce log files containing lines like:

2012:11:24T17:32:23.3435 INFO gps@mozilla.com successfully logged in

Human readability is a great feature to have. However, it comes at a price: more difficult machine consumption.

By optimizing for human consumption, we've introduced a new problem: log parsing.

In the above example log message, what the program has done is combine a few distinct fields of data (an event, "logged in"; the username; and the time) into a single text string. This is great for humans glancing at a log file. But that decision now forces downstream machine consumers to parse/decode that text back into its original, distinct fields.

I call this style of logging destructured or unstructured logging. We start with distinct fields of data and then destructure them into free-form or semi-structured text. By doing so, we lose part of the original data structure and create the need to reconstruct it (via parsing).

This seems silly to me.

Why are we formatting distinct fields of data into unified strings only to have them parsed back into their original components again? If lossless and simple round-trip encoding/decoding is your goal, there are much better ways than free-form text! Going to text is easy (printf() or string concatenation is how it is commonly performed). Getting back to the original representation is the real problem.

Every persisted log message destined for machines will need to be decoded to some degree to facilitate analysis. However, by using destructured text, nearly every program and logging system forces this decoding problem onto its users by choosing an encoding that is difficult to decode.

Every time you are analyzing a log file containing destructured text you need to solve the parsing/decoding problem. Every time you write an awk, grep, sed, or perl script, or craft regular expressions to scrape logs, you are solving this problem. How much time do you spend decoding/parsing logs versus actually doing something useful with the data? From my personal experience, a lot. And parsing destructured text is a very fragile solution. What happens when the string changes, even by just a little? Does that throw off your decoder/parser?

Decoding log messages is a hard problem. Furthermore, it's a problem I'd rather not have.

Structured Logging to the Rescue

What if I told you there's a better way to log?

One of my early professional mentors taught me that one of the best ways to solve a problem is to change the requirements so that the problem no longer exists or becomes easier to solve. (If that didn't immediately click with you, you might want to read it again because I feel it's an invaluable skill and one that many engineers don't employ). Applying that lesson here, how do we eliminate the problem of parsing destructured textual data back into its original fields? Simple: we don't emit destructured text! Instead, we emit data that is easily machine readable, where each individual field is preserved. This is called structured logging.

The primary benefit of structured logging is that you eliminate the parsing problem (or at least you change it to be a solved problem by using a common or well-defined encoding format). No tedious searching for data within free-form text and extracting it. Instead, the fields from the event-producing application are seamlessly preserved and loaded into downstream machine consumers.

A Closer Look

What does structured logging look like? Well, it depends. There is no standard for structured logging. When I first started writing this post in June 2011, a Google search for structured logging didn't yield much. Today, things aren't that much better. Although, there has been significant activity in the last 15 months. rsyslog has launched project Lumberjack, which has structured logging as a pillar. There is even someone sitting 10 ft from me at Mozilla's SF office doing structured logging for server applications (metlog).

All structured logging systems are different. But, the concepts are mostly the same.

From the perspective of a log-producing application, a function call to produce an unstructured message may look something like:

logger.info(username + " successfully logged in")

Whereas a call to produce that same message in structured form may look like:

logger.info("successful_login", who=username)

You'll immediately notice that the API retains the original pieces of data. In this example, we just have the event name (successful_login) and who (some variable value).

This will look different in every programming language, of course. And, depending on the structured logging mechanism, the common metadata produced with each message (e.g. time and log level) may differ. But the general idea is you preserve the rich data fields as they cross over into the logging facility rather than casting them to a string. Behind the scenes, the structured logger retains the fields (the original structure) and encodes them into something machine readable.

Put another way, structured loggers log objects, not strings.

How objects are encoded again depends on the structured logger implementation. JSON is a well-understood encoding format. An example JSON encoding of our example message might look like:

[1354317157.034534, "INFO", "successful_login", {"who": "gps@mozilla.com"}]

That's an array of current time, log level, event name/message type, and a mapping of the data fields.
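As a rough sketch (the class is mine, not any particular library's API), a producer emitting that exact format could look like:

import json
import sys
import time

class StructuredLogger(object):
    # Toy structured logger: one JSON array per message, one per line.
    def __init__(self, stream=sys.stdout):
        self.stream = stream

    def _log(self, level, action, fields):
        record = [time.time(), level, action, fields]
        self.stream.write(json.dumps(record) + '\n')

    def info(self, action, **fields):
        self._log('INFO', action, fields)

logger = StructuredLogger()
logger.info('successful_login', who='gps@mozilla.com')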

Or, you could opt for a more efficient encoding format, such as protocol buffers (which is binary).

The actual encoding theoretically doesn't really matter as long as it can easily be decoded back into its original data. A good way to identify this trait is whether you need to implement a decoder/parser more than once per programming language or if you need to apply different decoding mechanisms based on the content of a message. If you do, you probably don't have a generic enough encoding format.

Going back to implementation details, what about a consumer? In JavaScript, our JSON structured logger from above might look something like:

for (let line of log) {
  let [time, level, action, fields] = JSON.parse(line);

  if (action == "successful_login") {
    let who = fields.who;

    // Do something interesting.
  } else if (action == "XXX") {
    // ...
  }
}

Compare this with a destructured parser:

for (let line of log) {
  if (line.endsWith("successfully logged in")) {
    // Find username.
    let spaceIndex = line.indexOf(" ");
    let username = line.substr(0, spaceIndex);

    // Do something interesting.
  } else if (line.indexOf(...)) {
    //
  }
}

This example admittedly isn't very complex. But, you can already see some differences. Notably, it takes 1 line to completely parse any message using the structured logger. The unstructured text parser requires multiple lines of code per message type. One of those scales better than the other.

Before I conclude, it's worth mentioning something about event names/enumerations. You may notice I gave event name a dedicated field in my example. This is intentional. In a subsequent post, I will talk about the importance of event names/enumerations and schemas. Until then, just assume that having a unique event name is an important aspect of a well-designed structured logging system. Hence its inclusion in my examples above.

Conclusion

I think the vast majority of application logging is done suboptimally because it forces a hard problem of decoding/parsing onto consumers. Given that many logs are intended for machines, I think this is a problem that shouldn't need to exist and the software industry should rethink its current approach to logging by offering first-class structured logging facilities and subsequently encouraging their use.

If application logging retained the data behind messages, the problem of parsing/decoding log messages would largely vanish and we would be left with the real problem: actually doing something useful with logged data. That is the point of logging after all, isn't it?

Now, this may sound like I'm waving my hands a bit. You are right: structured logging by itself is no silver bullet. There are many ways to do it suboptimally and to use it when you shouldn't. In subsequent posts in this series, I will dive deeper into how to properly design a (structured) logging system. I will also investigate some of the performance implications associated with different logging mechanisms. Stay tuned.


Changes to How mach Loads mozconfigs

December 05, 2012 at 10:30 AM | categories: Mozilla, Firefox

If you use mach to build Firefox, there is a chance you may run into some errors from a change that landed on mozilla-central Wednesday.

I reimplemented the mozconfig finding and loading logic in Python, with a thin shell wrapper that lets us more easily inspect the side-effects of execution. The overall goal is to speed up mozconfig loading (fewer shell invocations) as well as to have a strong API for interacting with mozconfigs so we can do things like better validation. As a side effect, we now have test coverage of mozconfig behavior!

Anyway, because the logic for mozconfig finding and loading is going through a completely new mechanism, you may notice some weird behavior. Until a few minutes ago, $topsrcdir wasn't defined in the execution environment. This is now fixed. I think most complaints stemmed from this.

Another significant change is that mozconfig shell scripts are now executed in a shell that has set -e. This means that failing commands in your mozconfig will now abort execution. Before, execution would have silently continued unless you checked the exit code in your mozconfig. I think this is a change for the better, but it may break your mozconfigs.
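To give a flavor of the approach (a simplification, not the actual mach code), you can source a mozconfig under set -e and diff the shell environment before and after:

import subprocess

def mozconfig_side_effects(path):
    # Print the environment, source the mozconfig, print it again.
    # Under set -e, a failing command aborts the script, so
    # check_output() raises CalledProcessError.
    script = 'set -e; env; echo __DELIM__; . "%s"; echo __DELIM__; env' % path
    out = subprocess.check_output(['sh', '-c', script]).decode()
    before, _, after = out.split('__DELIM__\n')

    def parse(block):
        # Naive parsing; breaks on multi-line values, fine for a sketch.
        return dict(l.split('=', 1) for l in block.splitlines() if '=' in l)

    old, new = parse(before), parse(after)
    return dict((k, v) for k, v in new.items() if old.get(k) != v)

The returned mapping contains the variables the mozconfig defined or changed.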

If you encounter an issue, please follow the instructions in mach's output to file a bug. Please attach or link to your mozconfig.

