Thoughts on Logging - Part 1 - Structured Logging

December 06, 2012 at 09:55 PM | categories: Mozilla, logging

Foreword

For reasons I can't explain, I have a keen interest in computer logging systems. Perhaps it stems from my frustration that such a seemingly trivial task can often be quite complicated. Perhaps it's from the scaling challenges inherent in inevitably needing to analyze terabytes of logged data. Perhaps it's from my experience designing highly-available distributed server applications and needing to use logging for real-time monitoring and decision making. Whatever the root causes, I've always wanted to create a brain dump of my thoughts on logging. So, here we go!

Introduction

Logging. Practically every computer program does it in some form or another. The standard libraries for many programming languages provide a logging facility. But, have you ever stopped to think about the side effects of logging or why you log the way you do?

In this first post, I will briefly define what I consider logging and will talk about something called structured logging.

The Definition of Logging

For the scope of these posts, I will define logging to be:

An emitted stream of distinct messages from a running application

A message can be pretty much anything. Messages are often strings, but they could be binary data. The transport mechanism is undefined, although most logging systems typically write to a file (eventually).

I should also call out the difference between reading/interpreting logs and transporting them. When I say reading a log, I'm referring to a process where an entity (either human or machine) reads the individual message content and infers something from it. By contrast, a transporter treats messages as a black box and just shuffles bits around. Transporters also typically send or store logging messages somewhere.

It's worth explicitly noting that this definition encompasses both traditional print()-based logging (writing a line to a file or output) as well as things like transaction logs or recording events as new rows in a database.

The Multiple Use Cases of Logging

Before I get to some of the interesting aspects of logging, let's identify some of the use cases of logging (in no particular order):

  1. Recording application errors (e.g. exceptions and stack traces)
  2. Low-level information to aid in debugging or human observation
  3. Application monitoring (includes metrics and alerting)
  4. Business analytics (use the logged data to help make decisions)
  5. Security auditing

Now that we have the basics out of the way, let's move on to the crux of this post.

Logging for Humans or Machines

I would argue that every log message is intended to be interpreted by either a human or a machine.

If a log message is only going to be used to help a human debug a system or casually see what's going on, it's intended for humans. If a log message is going to be used for machine analysis later (e.g. monitoring, alerting, business analytics, security auditing, etc), it's intended for machines.

This is an important distinction. If it isn't clear, please re-read the prior paragraphs. Now might also be a good time for you to audit the logs in your life and assign human or machine labels to them.

The Problem

The problem with logging as most of us know it is that it is optimized for human consumption, even if machines are the intended consumer.

At the heart of most logging systems we log a string, a char *, an array of characters that can easily be read by humans. For example, a simple web application might produce log files containing lines like:

2012:11:24T17:32:23.3435 INFO gps@mozilla.com successfully logged in

Human readability is a great feature to have. However, it comes at a price: more difficult machine consumption.

By optimizing for human consumption, we've introduced a new problem: log parsing.

In the above example log message, the program has combined a few distinct fields of data (an event - logged in, the username, and the time) into a single text string. This is great for humans glancing at a log file. But that decision now forces downstream machine consumers to parse/decode that text back into its original, distinct fields.

I call this style of logging destructured or unstructured logging. We start with distinct fields of data and then destructure them into free-form or semi-structured text. By doing so, we lose part of the original data structure and force consumers to reconstruct it later (via parsing).

This seems silly to me.

Why are we formatting distinct fields of data into unified strings only to have them parsed back into their original components again? If lossless and simple round-trip encoding/decoding is your goal, there are much better ways than free-form text! Going to text is easy (printf() or string concatenation is how it is commonly done). Getting back to the original representation is the real problem.

Every persisted log message destined for machines will need to be decoded to some degree to facilitate analysis. However, by using destructured text, nearly every program and logging system forces this decoding problem onto its users by choosing an encoding that is difficult to decode.

Every time you are analyzing a log file containing destructured text, you need to solve the parsing/decoding problem. Every time you write an awk, grep, sed, or perl script, or use regular expressions to scrape logs, you are solving this problem. How much time do you spend decoding/parsing logs versus actually doing something useful with the data? From my personal experience, a lot. And parsing destructured text is a very fragile solution. What happens when the string changes, even by just a little? Does that throw off your decoder/parser?
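
To make the fragility concrete, here is a minimal sketch in JavaScript of what scraping the example line above with a regular expression might look like. The parseLoginLine function and its pattern are hypothetical, purely for illustration; notice how tightly the regex is coupled to the exact message text.

// Hypothetical regex-based scraper for the example line above.
// The pattern must mirror the message text exactly; reword the message
// or reorder a field and it silently stops matching.
const LOGIN_RE = /^(\S+) INFO (\S+) successfully logged in$/;

function parseLoginLine(line) {
  let match = LOGIN_RE.exec(line);
  if (!match) {
    return null; // Different event or changed wording: no data recovered.
  }

  return {time: match[1], who: match[2]};
}

parseLoginLine("2012:11:24T17:32:23.3435 INFO gps@mozilla.com successfully logged in");
// => {time: "2012:11:24T17:32:23.3435", who: "gps@mozilla.com"}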

Decoding log messages is a hard problem. Furthermore, it's a problem I'd rather not have.

Structured Logging to the Rescue

What if I told you there's a better way to log?

One of my early professional mentors taught me that one of the best ways to solve a problem is to change the requirements so that the problem no longer exists or becomes easier to solve. (If that didn't immediately click with you, you might want to read it again because I feel it's an invaluable skill and one that many engineers don't employ). Applying that lesson here, how do we eliminate the problem of parsing destructured textual data back into its original fields? Simple: we don't emit destructured text! Instead, we emit data that is easily machine readable, where each individual field is preserved. This is called structured logging.

The primary benefit of structured logging is that you eliminate the parsing problem (or at least you change it to be a solved problem by using a common or well-defined encoding format). No tedious searching for data within free-form text and extracting it. Instead, the fields from the event-producing application are seamlessly preserved and loaded into downstream machine consumers.

A Closer Look

What does structured logging look like? Well, it depends. There is no standard for structured logging. When I first started writing this post in June 2011, a Google search for structured logging didn't yield much. Today, things aren't that much better, although there has been significant activity in the last 15 months. rsyslog has launched project Lumberjack, which has structured logging as a pillar. There is even someone sitting 10 ft from me at Mozilla's SF office doing structured logging for server applications (metlog).

All structured logging systems are different. But, the concepts are mostly the same.

From the perspective of a log-producing application, a function call to produce an unstructured message may look something like:

logger.info(username + " successfully logged in")

Whereas a call to produce that same message in structured form may look like:

logger.info("successful_login", who=username)

You'll immediately notice that the API retains the original pieces of data. In this example, we just have the event name (successful_login) and who (some variable value).

This will look different in every programming language, of course. And, depending on the structured logging mechanism, the common metadata produced with each message (e.g. time and log level) may differ. But the general idea is you preserve the rich data fields as they cross over into the logging facility rather than casting them to a string. Behind the scenes, the structured logger retains the fields (the original structure) and encodes them into something machine readable.

Put another way, structured loggers log objects, not strings.

How objects are encoded again depends on the structured logger implementation. JSON is a well-understood encoding format. An example JSON encoding of our example message might look like:

[1354317157.034534, "INFO", "successful_login", {"who": "gps@mozilla.com"}]

That's an array of current time, log level, event name/message type, and a mapping of the data fields.
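
For illustration, here is a minimal sketch in JavaScript of what the emitting side could look like with that encoding. StructuredLogger and its write callback are hypothetical names for this post, not any particular library's API:

// A hypothetical structured logger that encodes each message as one
// JSON line of the form [time, level, action, fields].
function StructuredLogger(write) {
  this.write = write; // Called with one encoded line per message.
}

StructuredLogger.prototype.info = function (action, fields) {
  let record = [Date.now() / 1000, "INFO", action, fields || {}];
  this.write(JSON.stringify(record) + "\n");
};

let lines = [];
let logger = new StructuredLogger(function (line) { lines.push(line); });
logger.info("successful_login", {who: "gps@mozilla.com"});
// lines[0] is now something like:
// [1354317157.034,"INFO","successful_login",{"who":"gps@mozilla.com"}]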

Or, you could opt for a more efficient encoding format, such as protocol buffers (which is binary).

The actual encoding theoretically doesn't really matter as long as it can easily be decoded back into its original data. A good way to identify this trait is whether you need to implement a decoder/parser more than once per programming language or if you need to apply different decoding mechanisms based on the content of a message. If you do, you probably don't have a generic enough encoding format.

Going back to implementation details, what about a consumer? In JavaScript, our JSON structured logger from above might look something like:

for (let line of log) {
  let [time, level, action, fields] = JSON.parse(line)

  if (action == "successful_login") {
    let who = fields.who;

    // Do something interesting.
  } else if (action == "XXX") {
    // ...
  }
}

Compare this with a destructured parser:

for (let line of log) {
  if (line.endsWith("successfully logged in")) {
    // Find the username. The line looks like:
    // "<time> INFO <username> successfully logged in"
    let start = line.indexOf(" INFO ") + " INFO ".length;
    let end = line.indexOf(" ", start);
    let username = line.substring(start, end);

    // Do something interesting.
  } else if (line.indexOf(...) != -1) {
    //
  }
}

This example admittedly isn't very complex. But, you can already see some differences. Notably, it takes 1 line to completely parse any message using the structured logger. The unstructured text parser requires multiple lines of code per message type. One of those scales better than the other.

Before I conclude, it's worth mentioning something about event names/enumerations. You may notice I gave event name a dedicated field in my example. This is intentional. In a subsequent post, I will talk about the importance of event names/enumerations and schemas. Until then, just assume that having a unique event name is an important aspect of a well-designed structured logging system. Hence its inclusion in my examples above.

Conclusion

I think the vast majority of application logging is done suboptimally because it forces a hard problem of decoding/parsing onto consumers. Given that many logs are intended for machines, I think this is a problem that shouldn't need to exist and the software industry should rethink its current approach to logging by offering first-class structured logging facilities and subsequently encouraging their use.

If application logging retained the data behind messages, the problem of parsing/decoding log messages would largely vanish and we would be left with the real problem: actually doing something useful with logged data. That is the point of logging after all, isn't it?

Now, this may sound like I'm waving my hands a bit. You are right: structured logging by itself is no silver bullet. There are many ways to do it suboptimally and to use it when you shouldn't. In subsequent posts to this series, I will dive deeper into how to properly design a (structured) logging system. I will also investigate some of the performance implications associated with different logging mechanisms. Stay tuned.


Changes to How mach Loads mozconfigs

December 05, 2012 at 10:30 AM | categories: Mozilla, Firefox

If you use mach to build Firefox, there is a chance you may run into some errors from a change that landed on mozilla-central Wednesday.

I reimplemented the mozconfig finding and loading logic in Python, with a nice shell wrapper that lets us more easily inspect the side-effects of execution. The overall goal is to speed up mozconfig loading (fewer shell invocations) as well as to have a strong API for interacting with mozconfigs so we can do things like better validation. As a side effect, we now have test coverage of mozconfig behavior!

Anyway, because the logic for mozconfig finding and loading is going through a completely new mechanism, you may notice some weird behavior. Until a few minutes ago, $topsrcdir wasn't defined in the execution environment. This is now fixed. I think most complaints stemmed from this.

Another significant change is that mozconfig shell scripts are now executed in a shell that has set -e. This means that failing commands in your mozconfig will now abort execution. Before, execution would have silently continued unless you checked the exit code in your mozconfig. I think this is a change for the better, but it may break your mozconfigs.

If you encounter an issue, please follow the instructions in mach's output to file a bug. Please attach or link to your mozconfig.


Firefox Build System Presentation

November 30, 2012 at 02:00 PM | categories: Mozilla, Firefox, build system

In case you missed it, I gave a presentation on the state of Firefox's build system yesterday.

You can watch it and view the slides online.

If you build Firefox from source regularly, you should definitely at least skim through the slide deck.

I'm not an HTML expert, so my apologies for the bad UI on the interactive slides. You may need to press enter to select items in dropdown menus. Also, the interactive slides are a bit resource intensive. If the slide deck is really slow, turn off those elements. I've also only tested the slides in Firefox 19 and 20. My apologies if they don't work everywhere.


Mach Has Landed

September 26, 2012 at 05:30 PM | categories: Mozilla, Firefox

Hacking on Firefox is hard. Compiling the source tree for the first time is a daunting experience. If you manage to do that, you still need to figure out how to submit patches, write and run tests, etc. There is no shortage of points where people can get confused, make mistakes, or just give up because the barrier is too high.

I have strong convictions about making the overall developer experience better (developers are users too, after all). The easier you make the developer experience, the greater the chances of retaining developers. And retaining developers means more contributions: more features and fewer bugs. This translates to a better state for the Mozilla Project. This makes the world a better place.

Since practically my first day working at Mozilla, I've been experimenting with ways to make contributing to Firefox easier by improving the build system or interaction with it.

With a lot of help, I've finally succeeded in landing something into the Firefox source tree that I think will ultimately lead to a much better developer experience.

It's called mach (German for do) and if you pull the latest version of mozilla-central (Firefox's main source code repository), you can run mach today.

Running Mach

You can run Mach by simply executing it from the root directory in the repository:

$ ./mach

Ideally, I shouldn't have to tell you anything else: mach's output should provide you with all the instructions you need to use it. If it doesn't, that is a bug and it should be fixed.

Realistically, mach is still in early development and its user experience still leaves a lot to be desired.

Because technical usage docs belong in a medium that is easily discoverable and where the community can improve on them (not a post on a personal blog), you can find additional usage information in the mach article on MDN. The mach README holds more technical information for people wanting to poke at the inner workings.

Mach does require Python 2.7. The build system will likely soon require Python 2.7 as well. So, if you don't have Python 2.7, you should upgrade now before you lose the ability to build the tree. Conveniently, the tree now has a bootstrap script which covers the installation of Python. So, Python 2.7 should just be a simple command away.

Features

Why would you use mach? Good question! Compared to the existing out-of-the-box experience, mach adds:

  • Ability to run xpcshell and mochitest tests from the source directory. This means you can tab-complete test filenames in your shell and it just works.
  • Parsing of compiler warnings (currently supports Clang 3.1 and MSVC formats) into a unified warnings database (actually a JSON file). After builds you can run ./mach warnings-list or ./mach warnings-summary to get a concise list without having to look at build logs.
  • A single command-line interface where you can easily discover new functionality. Just run ./mach help (try doing that with make!).

Naysayers will say this is a paltry list. They are correct. I have bigger plans. But, you have to start somewhere.

Goals and Future Ideas

The overall goal of mach is to improve the developer experience of developing Firefox and other Gecko applications. It does this by providing a convenient, central command in the root directory of the repository that acts as both an oracle to discover new commands and functionality (./mach help) as well as a proxy to execute them. You don't need to worry about environment variables, working directories, or finding some random script hidden in the bowels of the source tree. You just run a single command and the world is brought to you. No build documentation. No outdated wikis. No copying commands into your shell. No having to scour random blogs for useful tools. You just clone the repository, run a command, see what you can do, and get stuff done. Your shell-literate grandmother could do it.

Mach should be your springboard to in-tree developer tools and increased productivity. You shouldn't need anything except a copy of the source tree and mach.

Like Git and Mercurial, mach is powered by the concept of sub-commands/actions. So, one simply needs to register a new sub-command with the mach driver. This involves writing a Python class. Once you do that, people can discover and use that functionality. All batteries are included with a copy of mozilla-central.

As stated above, the current set of commands is rather small. But, the sky is the limit. Here are some of my ideas:

  • Ability to upload, download, and apply patches from Bugzilla (Burak Yiğit Kaya, Jeff Hammel, Clint Talbert and I have already talked about this -- progress tracked in bug 774141).
  • Automatically configure Mercurial with optimal settings (ensure user info is set, proper lines of diff context, enable mqext, etc). Tracked bug 794580.
  • Submit Try builds. The trychooser Mercurial extension could easily live as a mach subcommand! Tracked in bug 774137.
  • Identify Bugzilla components and/or reviewers from files touched by patch. It's all in the history of the touched files and the history of the old bugs referenced by those commits! Bug 774145.
  • Interaction with the self-serve build API. That thing on TBPL to control individual builds - we could make a CLI interface for it. (Steve Fink and/or Jeff Hammel already have code for this - it just needs to be integrated). Bug 774147.

If you have an idea for a feature, please file a bug. Please note there are many features on file already. However, some obvious ones such as integration with missing test suites have yet to be filed (at least at the time I wrote this post).

If you wrote an awesome developer tool and would like others to use it (without having to rely on people discovering it by reading a corner of the Internet), add it to mach! Use mach as a wedge to get more exposure and users. File a bug. I will happily r+ patches that add useful developer functionality to the tree.

What this Means / Longer Term Future

While there is no timetable, mach will eventually replace client.mk. client.mk, like mach, is essentially a CLI driver for the build system. Having the driver implemented in Python rather than make has many compelling advantages. I could probably write a whole post on it, but I'll spare the world from my ranting.

Anyway, this all means that you may want to start re-training your muscle memory now. Stop typing make and start typing mach. If you need to type make because mach does not provide a feature, this is a missing feature from mach. File a bug and request a new mach feature!

I want to condition people to stop typing make, especially in the object directory. There are drastic changes to the build system in the works (bug 784841 is the tip of the iceberg). These changes will require the build system to be treated as a black box. So, things like invoking make from the object directory will eventually break. You'll need to go through an intelligent tool to invoke the build system. Mach will be that tool.

Thanks

I would like to single out the following individuals for helping to land mach:

  • Jeff Hammel for doing the bulk of the reviewing. He shares my vision for mach and how it will make the overall developer experience much more pleasant and how this will translate to better things for The Project. In my mind, Jeff deserves as much credit for landing mach as I do.
  • Mike Hommey and Ms2ger for review help. Mike Hommey helped identify a lot of issues with build system integration. Ms2ger provided lots of general feedback on Python issues and API design.
  • Mike Connor (my manager) for allowing me to work on this. It's not related to my job description in any way so he could say I shouldn't be spending time on this. But, he realizes the positive impact this can have and has been supportive of it.

I hope you find mach useful!


Bootstrap Your System to Build Firefox

September 18, 2012 at 05:00 PM | categories: Mozilla, Firefox

If you've looked at the build instructions for Firefox lately, you may have noticed something new: support for system bootstrapping!

Now checked in to mozilla-central is a framework for ensuring your system is capable of building mozilla-central and Firefox. You just need to download and run a single Python script and it performs magic.

Kudos go out to a community contributor, kmm (name wasn't revealed), for doing the legwork of tracking down and verifying that things worked on all the Linux distros. Richard Newman and Jeff Hammel also helped with code reviews. Just hours after it landed, Landry Breuil contributed support for OpenBSD!

Currently, bootstrapping works for the following:

  • Ubuntu Linux
  • Mint
  • CentOS 6.x
  • Fedora
  • OS X 10.6, 10.7, and 10.8
  • OpenBSD

If you want to add support for an additional OS, please file a Core : Build Config bug. Likewise, if you encounter an issue, please file a bug so others don't get tripped up by it!

Bootstrap support is definitely in its infancy. It still needs features like better prompting and opportunities for user choice (e.g. support MacPorts on OS X - currently it only works with Homebrew). But, I think it is much better than what existed previously, especially on OS X.

I consider this bootstrapping component just one piece in a larger mission to make developing and building Firefox (and other Gecko applications) easier. This should (hopefully) lead to more development involvement. The next component to land will likely be mach. It's (finally) been getting some review love (thanks Jeff Hammel!), so I'm optimistic it will land in the next few weeks.

