Improving the Mozilla Build System Experience

May 07, 2012 at 04:45 PM | categories: Mozilla, Firefox

tl;dr User experience matters and developers are people too. I have proposed a tool to help developers interact with the Firefox build system and source tree.

I don't think I have to make my case when I state that Mozilla's build system end-user experience is lacking. There are lots of hurdles to overcome:

  • Determine where to obtain the source code.
  • Install a source control system (possibly).
  • Wait a long time for the large source repository to download.
  • Figure out how to launch the build process (unlike many other build systems, it isn't as simple as configure or make - although it is close).
  • Determine which dependencies need to be installed and install them (this can also take a long time).
  • Create a configuration file (mozconfig).
  • Build the tree (another long process).

If you want to contribute patches, there are additional steps:

  • Configure Mercurial with your personal info.
  • Configure Mercurial to generate patches in proper format.
  • Create a Bugzilla account (made simpler through Persona!).
  • Figure out the proper Bugzilla product/component (even I still struggle with this) so you can file a bug.
  • Figure out how to attach a patch to a bug and request review (it isn't intuitive if you've never used Bugzilla before).
  • Figure out who should review the patch.
  • Learn how tests work so you can:
      • Write new tests.
      • Run existing tests to verify your changes.
  • Obtain commit access (so at least you can push to Try).
  • Learn how to push to Try.
  • Learn about TBPL.
  • Discover and use some of the amazing tools to help you (MXR, trychooser, mqext, etc).

Granted, not all of these are required. But, they will be for returning contributors. My point is that there are lots of steps here. And, every one of them represents a point where someone could get frustrated and bail -- a point where Mozilla loses a potential contributor.

Ever since I started at Mozilla, I've been thinking of ways this could be done better. While the Developer Guide on MDN has improved drastically in the last year, there are still many ways the process could be improved and streamlined.

In bug 751795, I've put forward the groundwork of a tool to make the developer experience more user friendly. Yes, this is a vague goal, so let me go into further detail.

What I've submitted in the bug is essentially a framework for performing common actions related to the build system and source tree. These actions are defined as methods in Python code. Hooking it all together is a command-line interface which is launched via a short script in the root directory called mach (mach is German for do). Since actions speak louder than words, here's an example:

$ ./mach

usage: mach command [command arguments]

This program is your main control point for the Mozilla source tree.

To perform an action, specify it as the first argument. Here are some common
actions:

  mach build         Build the source tree.
  mach help          Show full help.
  mach xpcshell-test Run xpcshell test(s).

To see more help for a specific action, run:

  mach <command> --help

e.g. mach build --help

And, going into a sub-command:

$ ./mach xpcshell-test --help

usage: mach xpcshell-test [-h] [--debug] [TEST]

positional arguments:
  TEST         Test to run. Can be specified as a single JS file, an
               xpcshell.ini manifest file, a directory, or omitted. If
               omitted, the entire xpcshell suite is executed.

optional arguments:
  -h, --help   show this help message and exit
  --debug, -d  Run test in debugger.

Now, I've focused effort at this stage on performing actions after the initial build environment is configured. The reason is that this is low-hanging fruit that easily allows me to create a proof of concept. But, I have many more ideas that I'd eventually like to see implemented.

One of my grand ideas is to have some kind of setup wizard guide you through the first time you use mach. It can start by asking the basics: "Which application do you want to build?" "Release or Debug?" "Clang or GCC?" "Should I install Clang for you?" It could also be more intelligent about installing dependencies. "I see you are using Ubuntu and are missing required packages X and Y. Would you like me to install them?" And, why stop at a command-line interface? There's no reason a graphical frontend (perhaps Tcl/Tk) couldn't be implemented!

The setup wizard could even encompass configuring your source control system for proper patch generation by ensuring your tree-local .hg/hgrc or .git/config files have the proper settings. We could even ask you for Bugzilla credentials so you could interact with Bugzilla directly from the command-line.
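To make the wizard idea concrete, here is a minimal sketch of one such step: ensuring a tree-local hgrc carries a [ui] username. The helper name and behavior are hypothetical illustrations, not actual mach code:

```python
import configparser


def ensure_hg_username(hgrc_path, username):
    """Hypothetical wizard step: make sure the Mercurial config at
    hgrc_path has a [ui] username set, writing one only if absent."""
    config = configparser.ConfigParser()
    config.read(hgrc_path)  # silently tolerates a missing file

    if not config.has_section("ui"):
        config.add_section("ui")

    if not config.has_option("ui", "username"):
        config.set("ui", "username", username)
        with open(hgrc_path, "w") as fh:
            config.write(fh)
```

A real implementation would also handle .git/config and prompt interactively, but the pattern is the same: read, check, fill in only what's missing.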

Once we have all of the basic configs in place, it's just a matter of hooking up the plumbing. Want to submit a patch for review? We could provide a command for that:

./mach submit-patch

"refactor-foo" is currently on top of your patch queue.

Submit "refactor-foo"?
y/n: y

Enter bug number for patch or leave empty for no existing bug.
Bug number:

OK. A new bug for this patch will be created.

Please enter a one-line summary of the patch:
Summary: Refactor foo subsystem

Is the patch for (r)eview, (f)eedback, or (n)either?
r/f/n: r

I've identified Gregory Szorc (:gps) as a potential reviewer for
this code. If you'd like someone else, please enter their IRC
nickname or e-mail address. Otherwise, press ENTER.
Reviewer:

I'm ready to submit your patch. Press ENTER to continue or CTRL+C to
abort.

Bug 700000 submitted! You can track it at
https://bugzilla.mozilla.org/show_bug.cgi?id=700000

The framework is extremely flexible and extensible for a few reasons. First, it encourages all of the core actions to be implemented as Python modules/methods. Once you have things defined as API calls (not shell scripts), the environment feels like a cohesive library rather than a loose collection of shell scripts. Shell scripts have a place, don't get me wrong. But, they are hard to debug and test (not to mention performance penalties on Windows). Writing code as reusable libraries with shell scripts only being the frontend is a more robust approach to software design.

Second, the command-line driver is implemented as a collection of sub-commands. This is similar to how version control systems like Git, Mercurial, and Subversion work. This makes discovery of features extremely easy: just list the supported commands! Contrast this to our current build system, where the answer is to consult a wiki (with likely out-of-date and fragmented information) or gasp try to read the makefiles in the tree.
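The dispatch pattern is easy to sketch with argparse sub-parsers. The handlers below are toy stand-ins (the real mach implementation is more elaborate); the command names are borrowed from the example output earlier:

```python
import argparse


def build(args):
    # Stand-in for the real build action.
    return "building tree"


def xpcshell_test(args):
    # Stand-in for the real test runner.
    return "running %s" % (args.test or "entire xpcshell suite")


def main(argv):
    parser = argparse.ArgumentParser(prog="mach")
    subparsers = parser.add_subparsers(dest="command")

    # Each sub-command maps to a plain Python function, so listing
    # the registered sub-parsers doubles as feature discovery.
    subparsers.add_parser(
        "build", help="Build the source tree.").set_defaults(func=build)

    test = subparsers.add_parser(
        "xpcshell-test", help="Run xpcshell test(s).")
    test.add_argument("test", metavar="TEST", nargs="?",
                      help="Test to run; omitted runs the whole suite.")
    test.add_argument("--debug", "-d", action="store_true",
                      help="Run test in debugger.")
    test.set_defaults(func=xpcshell_test)

    args = parser.parse_args(argv)
    return args.func(args)
```

Because the handlers are ordinary functions, they can be imported and called directly from other Python code, which is exactly the library-first design argued for above.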

My immediate goal for bug 751795 is to get a minimal framework checked in to the tree with a core design people are content with. Once that is done, I'm hoping other people will come along and implement additional features and commands. Specifically, I'd like to see some of the awesome tools like mqext integrated such that their power can be harnessed without requiring people to first discover they exist and then install and configure them. I think it is silly for these obvious productivity wins to go unused by people ignorant of their existence. If they are valuable, let's ship them as part of a batteries-included environment.

In the long run, I think there are many more uses for this framework. For starters, it gives us a rallying point around which to organize all of the Python support/tools code in the tree. Currently, we have things spread all over the place. Quite frankly, it is a mess. I'd like to have a unified site-packages tree with all our Python so things are easier to locate and thus improve.

If nothing else, the tool provides a framework for logging and formatting activities in a unified way. There are separate log streams: one for humans, one for machines. Under the hood, both use the same logging infrastructure. When messages are logged, the human stream is formatted as simple sentences (complete with terminal encodings and colorization). The machine-destined log stream is newline-delimited JSON containing the fields that were logged. This allows analysis of output without having to parse strings. This is how all log analysis should be done. But, that's for another post. Anyway, what this all means is that the output for humans can be more readable. Colors, progress bars: we can do that now.
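The dual-stream idea can be sketched in a few lines. This is an illustration of the pattern, not the actual mach logging code:

```python
import json


def log_event(human_logger, machine_stream, action, **fields):
    """Log one event to both streams: a newline-delimited JSON record
    for machines and a readable sentence for humans."""
    # Machine stream: one JSON object per line, fields preserved
    # verbatim so downstream analysis never has to parse prose.
    machine_stream.write(json.dumps(dict(action=action, **fields)) + "\n")

    # Human stream: a simple formatted sentence via the stdlib logger,
    # which is where colorization and progress display would hook in.
    human_logger.info("%s: %s", action, ", ".join(
        "%s=%s" % (k, v) for k, v in sorted(fields.items())))
```

Because both streams are fed from the same call site, the human and machine views can never drift out of sync.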

Over time, I imagine some may want to move logic out of configure and makefiles and into this tool (because Python is easier to maintain and debug, IMO). I would love to see that too. But, I want to stress that this isn't a focus right now. I believe this framework should be supplemental in the beginning and the official build system should not rely on it. Maybe that changes in the future. Time will tell.

Anyway, this project is currently just my solo effort. This isn't captured on a roadmap or anyone's quarterly goals. There is no project page listing planned features. If you are interested in helping, drop me a line and/or join in on the bug. Hopefully the core framework will land soon. Once it does, I'm hoping for an explosion of new, user-friendly features/commands to make the overall Firefox development experience smoother.


Comparing the Security and Privacy of Browser Syncing

April 08, 2012 at 09:00 PM | categories: Mozilla, security, browsers, Firefox, internet

Many popular web browsers offer built-in synchronization of browser data (history, bookmarks, passwords, etc) across devices. In this post, I examine the data security and privacy aspects of some of them.

Chrome

Chrome and Chromium have comprehensive support for browser sync.

When you sign in to Chrome (using your Google Account credentials), Chrome prompts you to set up sync. By default, all data types are uploaded to Google's servers.

The default behavior is for Chrome to encrypt your passwords before uploading them to the server. All of your remaining data (history, bookmarks, etc) is uploaded to Google unencrypted. This means anyone with access to Google's servers has full access to your history, etc.

Access to the uploaded data is governed by the Google Chrome Privacy Notice. This policy (pulled on April 3, 2012) states that the sync data is governed by the unified Google Privacy Policy. This policy states (as pulled on April 4, 2012):

We use the information we collect from all of our services to
provide, maintain, protect and improve them, to develop new ones,
and to protect Google and our users. We also use this information
to offer you tailored content – like giving you more relevant
search results and ads.

In other words, you are granting Google the ability to use your synced data.

An advanced settings dialog as part of the sync setup allows users to opt in to local encryption of all data - not just passwords - simply by clicking a checkbox. This same dialog also allows users to choose an alternate passphrase (not your Google Account password) for encrypting data.

For encrypted data, Chrome uses an encryption scheme called Nigori. An Overview and protocol details are available from the author's website.

This encryption scheme takes the user-supplied passphrase and uses PBKDF2 to derive keys. It first derives a 64 bit salt key, Suser, using 1001 iterations of PBKDF2 with SHA-1, with the username as the salt. Then, it performs 3 more PBKDF2 derivations over the original passphrase using the newly-derived salt key, producing three 128 bit keys: Kuser, Kenc, and Khmac. For these, the PBKDF2 iteration counts are 1002, 1003, and 1004, respectively. Kuser and Kenc use AES as the PBKDF2 algorithm; Khmac uses SHA-1. Kuser is used to authenticate the client with the server. Kenc and Khmac are used to encrypt and sign data, respectively. Data is encrypted with AES-128 in CBC mode with a 16 byte IV. (It is worth noting that Chrome does not use a cryptographically-secure random number generator for the IV. I don't believe this amounts to anything more than a mild embarrassment in this case.)
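The derivation structure can be illustrated with Python's stdlib. One caveat: the real scheme uses an AES-based PBKDF2 variant for Kuser and Kenc, and hashlib only offers HMAC-based PRFs, so SHA-1 stands in for all three derivations here purely to show the salt/iteration layout:

```python
import hashlib


def derive_nigori_keys(username, passphrase):
    """Sketch of the Nigori key-derivation layout described above.
    SHA-1 is used throughout as a stand-in for the AES-based PBKDF2
    that Kuser and Kenc actually use."""
    pw = passphrase.encode("utf-8")

    # 64 bit salt key: 1001 iterations, username as the salt.
    s_user = hashlib.pbkdf2_hmac(
        "sha1", pw, username.encode("utf-8"), 1001, dklen=8)

    # Three 128 bit keys from the original passphrase, salted with
    # the derived salt key; iteration counts 1002, 1003, 1004.
    k_user = hashlib.pbkdf2_hmac("sha1", pw, s_user, 1002, dklen=16)
    k_enc = hashlib.pbkdf2_hmac("sha1", pw, s_user, 1003, dklen=16)
    k_hmac = hashlib.pbkdf2_hmac("sha1", pw, s_user, 1004, dklen=16)
    return k_user, k_enc, k_hmac
```

The distinct iteration counts are what keep the three keys independent even though they come from the same passphrase and salt.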

When someone wishes to sync to a new Chrome instance, she simply enters her Google Account username and password (or custom sync passphrase) and data is downloaded from Google's servers and applied. The pre-PBKDF2 passphrase is all that is needed. The new Chrome instance remembers the passphrase and syncing is automatic from that point on.

Opera

Opera supports syncing via Opera Link. Opera Link supports syncing bookmarks, history, passwords, search engine plugins, and other data types.

Opera is not open source and I have not been able to find technical details on how Opera Link is implemented. The two sources I found are a blog post and the Guide to Using Opera Link.

From those two documents, we know that Opera locally encrypts passwords. However, it is unclear whether other data is also encrypted locally. I can interpret the blog post to go either way. (If someone knows, please leave a comment with some kind of proof and I'll update this post.)

The blog post gives a high-level overview of how encryption works. A lone comment is the only source of technical details:

for encryption we use AES-128, and we use a random salt that is
part of each "blob" (one blob is a single field in each password
entry)

As commenters in that post have pointed out, that is still very short on technical details.

What I think is going on is that when you initially set up Opera Link, it generates a full-entropy 128 bit key from a random number generator. Uploaded data is encrypted with this key using AES-128 with a randomly-generated IV (or salt using terms from the blog post). The ciphertext and the IV are uploaded to Opera's servers. There may be HMAC or some other form of message verification involved, but I could find no evidence of that.

Since Opera Link is tied to your Opera Account password, I'm guessing that Opera uses PBKDF2 to derive a key from the password. It then uses this key to symmetrically encrypt the randomly-generated encryption key. It then uploads the encrypted encryption key to Opera's servers.

When someone wishes to sync with a new Opera instance, she simply enters her Opera Account credentials on the new Opera and Opera Link is set up automatically. This is a one-time set-up process.

Data uploaded with Opera Link is governed by an Opera Link Privacy Policy. This policy states (pulled on April 4, 2012):

Opera will never disclose, share, or distribute an individual’s
Linked data to any third party except where required by law or
regulation, or in cases where you have chosen to grant access to your
data to an Opera or third party application or service using Opera
Link API. Opera restricts internal access to this information
exclusively to those who need it for the operation of the Link
service.

Safari

Safari supports syncing via iCloud. Its offerings appear to currently be limited to bookmarks, possibly because iCloud is a relatively new offering from Apple.

Configuration of iCloud is something that typically happens outside of Safari at the OS level. And, iCloud is deeply tied to your Apple ID. Users typically sign up for an Apple ID then enable iCloud support for a Safari feature (currently just bookmarks). During Apple ID setup, iCloud asks you some security questions. To connect a new device, you simply sign in to Apple ID, enable iCloud, and things just work.

Technical details of iCloud's security model are hard to come by. What we do appear to know is that everything except email and notes is encrypted on Apple's servers. However, the current theory is that this encryption only occurs after the data hits Apple's servers or that Apple has the encryption key and can read your data without your knowledge.

Data uploaded to iCloud is governed by the iCloud Terms and Conditions. This policy states (pulled on April 7, 2012):

You further consent and agree that Apple may collect, use, transmit,
process and maintain information related to your Account, and any
devices or computers registered thereunder, for purposes of providing
the Service, and any features therein, to you. Information collected
by Apple when you use the Service may also include technical or
diagnostic information related to your use that may be used by Apple
to support, improve and enhance Apple’s products and services.

If data is readable by Apple, this policy grants Apple the right to use it.

I'm not going to speculate about the technical details of Apple's encryption model because I couldn't find any non-speculative sources to base it on. If you want to read the speculation of others, see Ars Technica posts 1, 2, and 3 and Matthew Green's response.

Internet Explorer

Internet Explorer supports syncing of favorites via Windows Live Mesh.

This was discovered after this post was originally written, which is why there are no additional details.

Firefox

Firefox has built-in support for syncing browser data via Firefox Sync. It doesn't sync as many data types as Chrome, but the basics (history, bookmarks, passwords, add-ons) are all there.

When you initially create a Firefox Sync account, you are asked to create a Mozilla Services account by entering an e-mail address and password. Once this process is done, Firefox uploads data to the sync server in the background.

By default, all data is encrypted locally before being uploaded to the server. There is no option to disable client-side encryption.

Data uploaded to the server is governed by the Firefox Sync Privacy Policy. The summary (pulled on April 4, 2012) is quite clear:

* Your data is only used to provide the Firefox Sync service.
* Firefox Sync on your computer encrypts your data before sending
  it to us so the data isn’t sitting around on our servers in a
  usable form.
* We don’t sell your data or use ad networks on the Firefox Sync
  webpages or service.

While Mozilla provides a default server for Firefox Sync, the server is open source (see their documentation) and anybody can run a server and point their clients at it.

When a new account is created, Firefox creates a full-entropy 128 bit key via random number generation. It then derives two 256 bit keys through SHA-256 HMAC-based HKDF (RFC 5869). This key pair effectively constitutes a root encryption and signing key.
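The expand step of RFC 5869 HKDF is simple enough to sketch in stdlib Python. This is a minimal illustration of the derivation described above, not Sync's actual code, and the info strings are made up for the example:

```python
import hashlib
import hmac
import os


def hkdf_expand_sha256(key, info, length):
    """Minimal HKDF-Expand (RFC 5869) using HMAC-SHA256:
    T(n) = HMAC(key, T(n-1) | info | n), output truncated to length."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(
            key, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Derive a 256 bit encryption key and a 256 bit HMAC key from a
# 128 bit full-entropy root key.
root_key = os.urandom(16)
enc_key = hkdf_expand_sha256(root_key, b"encryption", 32)
mac_key = hkdf_expand_sha256(root_key, b"hmac", 32)
```

Because HKDF is deterministic, any client holding the same 128 bit root key re-derives the identical key pair, which is what makes pairing by transferring just the root key possible.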

Firefox then generates a completely new pair of full-entropy 256 bit keys via random number generation. This key pair is used to encrypt and sign all data uploaded to the server. This second key pair is called a collection key.

Firefox encrypts your synced data with AES-256 in CBC mode, using a 16 byte randomly-generated IV (unique for each record) and the collection key's symmetric encryption key. An HMAC of the ciphertext is then computed with the HMAC key. The ciphertext, HMAC, and IV are uploaded to the server.

The collection key is encrypted and signed with the root key pair and uploaded to the server as well. The root keys remain on the client and are never transmitted to the server.

Technical details of the full crypto model are available.

The e-mail and password for the Mozilla Services account are used to authenticate the HTTPS channel with the server using HTTP Basic Auth.

When you wish to connect another Firefox instance to your Firefox Sync account, the root 128 bit key must be transferred to the new device. Firefox supports manually entering the 128 bit key as a 26 character value. More commonly, Password Authenticated Key Exchange by Juggling (J-PAKE) is used. One device displays 12 characters and establishes a channel with a central brokering server. The same 12 characters are entered on the pairing device. The two devices establish a cryptographically secure channel between them and proceed to exchange the Mozilla Services account credentials, server information, and the 128 bit root key. While the J-PAKE server is hosted by Mozilla, the channel is secured between both endpoints, so the server operator can't read the root key as it passes through it.

The new client then derives the root key pair via HKDF, downloads, verifies, and decrypts the collection key from the server, then uses that key pair for all subsequent encryption and verification operations.

Once a client has been paired, it holds on to the root key indefinitely and a user doesn't need to take any subsequent action for syncing to occur.

LastPass

LastPass isn't a browser, but a password manager that can be integrated with all the popular browsers. I thought it would be interesting to throw it into the comparison, especially since LastPass is perceived to have an excellent security model.

Technical details of LastPass's security model are available in the LastPass User Manual. The remaining details are found on a help desk answer.

LastPass encrypts all of your data locally before uploading it to the LastPass servers. It does this by making use of a master password.

Data uploaded to LastPass's servers is governed by a Privacy Statement. The summary that best reflects it (as pulled on April 4, 2012) is:

We don't allow you to send LastPass critically important information
like your usernames, passwords, account notes, and LastPass master
password; instead your LastPass master password is used locally to
encrypt the important data that's sent to us so that no one,
including LastPass employees ever can access it.

LastPass performs N iterations (default 500) of PBKDF2 using SHA-256 over your master password to produce a 256 bit encryption key. It then performs one additional iteration to produce a login key. Data is encrypted locally using AES-256 with the encryption key derived from your master password. Encrypted data is uploaded to LastPass's servers. Your master password is never transmitted to LastPass. Instead, the login key is used to authenticate communications.
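A stdlib sketch of that split between encryption key and login key follows. The choice of salt and the exact construction of the login key are assumptions on my part; only the iteration counts and hash come from the description above:

```python
import hashlib


def lastpass_keys(master_password, username, iterations=500):
    """Sketch of the LastPass-style derivation described above.
    The salts used here are assumptions, not LastPass's documented
    values."""
    pw = master_password.encode("utf-8")

    # 256 bit encryption key: N iterations of PBKDF2-SHA256 over the
    # master password (username assumed as the salt).
    enc_key = hashlib.pbkdf2_hmac(
        "sha256", pw, username.encode("utf-8"), iterations, dklen=32)

    # Login key: one additional iteration over the derived key, so
    # the master password itself never has to leave the client.
    login_key = hashlib.pbkdf2_hmac("sha256", enc_key, pw, 1, dklen=32)
    return enc_key, login_key
```

The one-way extra iteration is the important property: the server can verify the login key without ever learning the encryption key or the master password.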

The LastPass web interface downloads encrypted blobs and decrypts them locally using the PBKDF2-derived encryption key.

To set up a new LastPass client, you download LastPass and present your username and master password. Typically, the master password needs to be presented every time you initially access your LastPass data (e.g. the first time you need to find a password after starting your browser).

Assessment

The following summarizes the security aspects of each product's sync feature:

Chrome
  Encryption defaults:              Passwords encrypted; everything else stored in cleartext
  Can encrypt everything?           Yes
  Encryption entropy source:        User-supplied passphrase
  Server knows decryption key?      Yes by default (Google Account password); no with a custom passphrase
  Server-side recovery difficulty:  No effort for unencrypted data; 1001 PBKDF2-SHA1 + 1003 PBKDF2-AES iterations for encrypted data

Opera
  Encryption defaults:              Passwords encrypted; everything else unknown
  Can encrypt everything?           Unknown
  Encryption entropy source:        User-supplied passphrase
  Server knows decryption key?      Yes; can't change
  Server-side recovery difficulty:  Unknown

Safari
  Encryption defaults:              On remote disks only?
  Can encrypt everything?           No
  Encryption entropy source:        Unknown; user-supplied password?
  Server knows decryption key?      Yes (probably)
  Server-side recovery difficulty:  No effort for Apple (apparently)

Firefox
  Encryption defaults:              Everything
  Can encrypt everything?           Yes (default)
  Encryption entropy source:        128 bit randomly generated key
  Server knows decryption key?      No
  Server-side recovery difficulty:  128 bit key + HKDF into AES-256

LastPass
  Encryption defaults:              Everything (only syncs passwords and notes)
  Can encrypt everything?           Yes (default)
  Encryption entropy source:        User-supplied passphrase
  Server knows decryption key?      No
  Server-side recovery difficulty:  Variable PBKDF2-SHA256 iterations (default 500)

So much about Safari is unknown, so it will be ignored.

Firefox and LastPass (and possibly Opera) are the only products that encrypt all data by default. Chrome is the only product known not to encrypt all data by default.

Firefox and LastPass are the only products that don't send the entropy source to the server by default. Chrome uses the Google Account password by default and this is sent to Google when logging in to various services. Opera sends the password to Opera when logging in to your Opera Account. Google allows you to change the entropy source to a custom passphrase so Google doesn't receive the entropy source. Opera does not.

Sending the entropy source to the server is an important security consideration because it means you are giving the key to your data to someone else. Even if your data is encrypted locally, someone with the key can decrypt it. Services that send the entropy source to the server are subject to man-in-the-middle attacks and could be subverted by malicious or legal actions occurring on the server side (e.g. the service operator could be compelled through a subpoena to capture your entropy source and use it to decrypt your stored data, possibly without your knowledge).

Firefox is the only product whose encryption key source is full-entropy. All other products rely on taking a user-supplied passphrase and using "key-stretching" via PBKDF2 to increase the cost of a brute-force search.

PBKDF2-derived encryption keys are common in the examined products. It is worth noting that PBKDF2 can be susceptible to dictionary and brute-force attacks because assumptions can be made about the input passphrase, such as its entropy and length. Systems often enforce rules on the source passphrase (e.g. between 5 and 15 characters and contains only letters and numbers). When cracking keys, you normally iterate through every possible permutation until you find one that works. When you can make assumptions about the input, you can eliminate a large number of these permutations. The products that use PBKDF2 are theoretically susceptible to this weakened brute-force search.

Since Firefox does not rely on PBKDF2, it is the only examined product not theoretically susceptible to a weakened brute-force search. Instead, an attacker would have to churn through every permutation of a 128 bit root key, which would take billions of computer-years. (See Brute-force attack on Wikipedia for more.)
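Some rough arithmetic makes the gap concrete. The numbers below are illustrative, not a rigorous cost model:

```python
# Search space for an 8-character passphrase drawn from letters and
# digits (62 symbols), versus a full-entropy 128 bit key.
passphrase_space = 62 ** 8      # roughly 2.2e14 candidates
full_entropy_space = 2 ** 128   # roughly 3.4e38 candidates

# Assume a generous billion guesses per second on one machine.
guesses_per_year = 10 ** 9 * 60 * 60 * 24 * 365

passphrase_years = passphrase_space / guesses_per_year  # a fraction of a year
key_years = full_entropy_space / guesses_per_year       # ~1e22 computer-years
```

Even before factoring in composition rules that shrink the passphrase space further, the passphrase falls in days while the random key remains far beyond any conceivable brute-force effort.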

Firefox's additional security comes at the price of more complex device setup. Firefox users need to physically have a copy of the 128 bit root key or physical access to 2 devices when pairing. All other products rely on a passphrase which the user can carry around seamlessly in her head. In addition, if the Firefox root encryption key is lost, it is more likely that your data is not recoverable because the key is not in your head.

Conclusion

Considering just the security and privacy aspects, I can only recommend two of the examined products: Firefox Sync and LastPass. I am recommending them because they encrypt all data locally by default and they do not send the encryption key source to the server. Of these two, Firefox Sync is more secure for reasons outlined above.

I can't recommend Safari because details about iCloud's encryption strategy are unknown. Furthermore, it appears Apple can recover your (possibly) encrypted data without your knowledge.

I can't recommend Opera because your encryption key source (your Opera Account password) is sent to Opera's servers. Furthermore, not enough technical details of Opera Link are available to vet it.

I can't recommend Chrome (at least in its default configuration) because it doesn't encrypt all data locally (only passwords) and you periodically send the encryption key source (your Google Account password) to Google's servers when using other Google services. If you enable encryption of all data and use a custom passphrase, Chrome's security model is essentially identical to LastPass's and thus can be recommended.

Disclaimer: I am currently employed by Mozilla and work on Firefox Sync. That being said, I believe this post has been objective and not subject to my bias towards Firefox and/or Firefox Sync. If you feel differently, please leave a comment and I will adjust the post as necessary.

Edit 2012-04-16 Note that IE supports Bookmark sync via Windows Live Mesh (thanks to Nick Richards for pointing it out in the comments). Also removed an incorrect sentence from the Chrome section which incorrectly stated that the PBKDF2 iteration count was part of the hash in each iteration.

