Most of Mozilla gathered in Orlando in December for an all hands meeting. If you attended any of the plenary sessions, you probably heard people like David Bryant and Lawrence Mandel make references to improving the Firefox build system and related tools. Well, the cat is out of the bag: Mozilla will be investing heavily in the Firefox build system and related tooling in 2016!
In the past 4+ years, the Firefox build system has mostly been held together and incrementally improved by a loose coalition of people who cared. We had a period in 2013 where a number of people were making significant updates (this is when moz.build files happened). But for the past 1.5+ years, there hasn't really been a coordinated effort to improve the build system - just a lot of one-off tasks and (much-appreciated) individual heroics. This is changing.
Improving the build system is a high priority for Mozilla in 2016. And investment has already begun. In Orlando, we had a marathon 3 hour meeting planning work for Q1. At least 8 people have committed to projects in Q1.
The focus of work is split between immediate short-term wins and longer-term investments. We also decided to prioritize the Firefox and Fennec developer workflows (over Gecko/Platform) as well as the development experience on Windows. This is because these areas have been under-loved and therefore have more potential for impact.
Here are the projects we're focusing on in Q1:
- Turnkey artifact based builds for Firefox and Fennec (download pre-built binaries so you don't have to spend 20 minutes compiling C++)
- Running tests from the source directory (so you don't have to copy tens of thousands of files to the object directory)
- Speed up configure / prototype a replacement
- Telemetry for mach and the build system
- NSPR, NSS, and (maybe) ICU build system rewrites
- mach build faster improvements
- Improvements to build rules used for building binaries (enables non-make build backends)
- mach command for analyzing C++ dependencies
- Deploy automated testing for mach bootstrap on TaskCluster
Work has already started on a number of these projects. I'm optimistic 2016 will be a watershed year for the Firefox build system and the improvements will have a drastic positive impact on developer productivity.
This impacts you because pushes to hg.mozilla.org should now be significantly faster. For example, pushes to mozilla-inbound that used to take 15s now take 2s. Pushes to Try that used to take 45s now take 10s. (Yes, the old replication system really added a lot of overhead.) Pushes to hg.mozilla.org are still not as fast as they could be due to us running the service on 5 year old hardware (we plan to buy new servers this year) and due to the use of NFS on the server. However, I believe push latency is now reasonable for every repo except Try.
The new replication system opens the door to a number of future improvements. We'd like to stand up mirrors in multiple data centers - perhaps even offices - so clients have the fastest connectivity and so we have a better disaster recovery story. The new replication system facilitates this.
The new replication log is effectively a unified pushlog - something people have wanted for years. While there is not yet a public API for it, one could potentially be exposed, perhaps indirectly via Pulse.
It is now trivial for us to stand up new consumers of the replication log that react to repository events merely milliseconds after they occur. This should eventually result in downstream systems like build automation and conversion to Git repos starting and thus completing faster.
Finally, the new replication system has been running unofficially for a few weeks, so you likely won't notice anything different today (other than removal of some printed messages when you push). What changed today is the new system is enabled by default and we have no plans to continue supporting or operating the legacy system. Good riddance.
There have been a handful of updates to Mozilla's client-side Mercurial tools in the past week and all users are encouraged to pull down the latest commit of mozilla-central and run mach mercurial-setup to ensure their configuration is up to date.
Improvements to mach mercurial-setup include:
- ~ are now used in paths when appropriate
- Mercurial 3.5.2 is now the oldest version you can run without the wizard complaining
- The clone bundles feature is enabled when running Mercurial 3.6
- hg wip is available for configuration
- hgwatchman (make hg status significantly faster via background filesystem watching) is now available on OS X
- x509 host key fingerprints are no longer pinned if your Python and Mercurial version is modern (this pinning exists in Mercurial because old versions of Python don't verify x509 certificates properly)
- 3rd party Mercurial extensions are pulled with extensions disabled (to prevent issues with old, incompatible extensions crashing the hg pull invocation)
The firefoxtree extension has also been updated. It now uses the new namespaces feature of Mercurial so all Firefox labels/names/refs are exposed in a firefoxtree namespace instead of as tags. As part of this change, you will lose old tags created by this extension and will need to re-pull repos to recreate them as namespaced labels. hg log output will now look like the following:
changeset: 332223:0babaa3edcf9 fxtree: central parent: 332188:40038a66525f parent: 332222:c6fc9d77e86f user: Carsten "Tomcat" Book <email@example.com> date: Wed Dec 16 12:01:46 2015 +0100 summary: merge mozilla-inbound to mozilla-central a=merge
(Note the fxtree line.)
The firefoxtree extension also now works with hg share. i.e. if you hg share from a Firefox repository and hg pull from the source repo or any shared repo, the labels will be updated in every repo. This only works on newly-created shares. If you want to enable this on an existing share, see this comment.
A number of changes have been made to hg.mozilla.org over the past few days:
Bookmarks are now replicated from master to mirrors properly (before, you could have seen foo@default bookmarks appearing). This means bookmarks now properly work on hg.mozilla.org! (bug 1139056).
Universally upgraded to Mercurial 3.5.2. Previously we were running 3.5.1 on the SSH master server and 3.4.1 on the HTTP endpoints. Some HTML templates received minor changes. (bug 1200769).
Pushes from clients running Mercurial older than 3.5 will now see an advisory message encouraging them to upgrade. (bug 1221827).
Author/user fields are now validated to be a RFC 822-like value (e.g. "Joe Smith <firstname.lastname@example.org>"). (bug 787620).
We can now mark individual repositories as read-only. (bug 1183282).
We can now mark all repositories read-only (useful during maintenance events). (bug 1183282).
Pushlog commit list view only shows first line of commit message. (bug 666750).
Please report any issues in their respective bugs. Or if it is an emergency, #vcs on irc.mozilla.org.
Mercurial 3.6 (scheduled for release on or shortly after November 1) contains a number of improvements to cloning. In this post, I will describe a new feature to help server operators reduce load (while enabling clients to clone faster) and some performance work to make clone operations faster on the client.
Cloning repositories can incur a lot of load on servers. For mozilla-central (the main Firefox repository), clones require the server to spend 4+ minutes CPU time and send ~1,230 MB over the network. Multiply by thousands of clients from build and test automation and developers, and you could quickly finding yourself running out of CPU cores or network bandwidth. Scaling Mercurial servers (like many services) can therefore be challenging. (It's worth noting that Git is in the same boat for reasons technically similar to Mercurial's.)
Mozilla previously implemented a Mercurial extension to seed clones from pre-generated bundle files so the Mercurial servers themselves don't have to work very hard for an individual clone. (That linked post goes into the technical reasons why cloning is expensive). We now offload cloning of frequently cloned repositories on hg.mozilla.org to Amazon S3 and a CDN and are diverting 1+ TB/day and countless hours of CPU work away from the hg.mozilla.org servers themselves.
The positive impact from seeding clones from pre-generated, externally-hosted bundles has been immense. Load on hg.mozilla.org dropped off a cliff. Clone times on clients became a lot faster (mainly because they aren't waiting for a server to dynamically generate/stream bits). But there was a problem with this approach: it required the cooperation of clients to install an extension in order for clone load to be offloaded. It didn't just work.
I'm pleased to announce that the ability to seed clones from server-advertised pre-generated bundles is now a core feature in Mercurial 3.6! Server operators can install the clonebundles extension (it is distributed with Mercurial) to advertise the location of pre-generated, externally-hosted bundle files. Compatible clients will automatically clone from the server-advertised URLs instead of creating potentially excessive load on the Mercurial server. The implementation is almost identical to what Mozilla has deployed with great success. If you operate a Mercurial server that needs to serve larger repositories (100+ MB) and/or is under high load, you should be jumping with joy at the existence of this feature, as it should make scaling problems attached to cloning mostly go away.
Documentation for server operators is currently in the extension and can be accessed at the aforementioned URL or with hg help -e clonebundles. It does require a bit of setup work. But if you are at the scale where you could benefit from the feature, the results will almost certainly be worth it.
One caveat is that the feature is currently behind an experimental flag on the client. This means that it doesn't just work yet. This is because we want to reserve the right to change some behaviors without worrying about backwards compatibility. However, I'm pretty confident the server parts won't change significantly. Or if they do, I'm pretty committed to providing an easy transition path since I'll need one for hg.mozilla.org. So, I'm giving server operators a tentative green light to deploy this extension. I can't guarantee there won't be a few bumps transitioning to a future release. But it shouldn't be a break-the-world type of problem. It is my intent to remove the experimental flag and have the feature enabled by default in Mercurial 3.7. At that point, server operators just need clients to run a modern Mercurial release and they can count on drastically reduced load from cloning.
To help with adoption and testing of the clone bundles feature, servers advertising bundles will inform compatible clients of the existence of the feature when they clone:
$ hg clone https://hg.mozilla.org/mozilla-central requesting all changes remote: this server supports the experimental "clone bundles" feature that should enable faster and more reliable cloning remote: help test it by setting the "experimental.clonebundles" config flag to "true" adding changesets adding manifests adding file changes ...
And if you have the feature enabled, you'll see something like:
$ hg clone https://hg.mozilla.org/mozilla-central applying clone bundle from https://hg.cdn.mozilla.net/mozilla-central/daa7d98525e859d32a3b3e97101e129a897192a1.gzip.hg adding changesets adding manifests adding file changes added 265986 changesets with 1501210 changes to 223996 files finished applying clone bundle searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files
This new clone bundles feature is deployed on hg.mozilla.org. Users of Mercurial 3.6 can start using it today by cloning from one of the repositories with bundles enabled. (If you have previously installed the bundleclone extension, please be sure your version-control-tools repository is up to date, as the extension was recently changed to better interact with the official feature.)
And that's the clone bundles feature. I hope you are as excited about it as I am!
Mercurial 3.6 also contains numerous performance improvements that make cloning faster, regardless of whether you are using clone bundles! For starters:
- Caching just-added entries made changelog writing 25% faster.
- Reusing file handles when adding revlog entries drastically reduced the number of file opens, closes, and writes.
- Avoiding excessive file flushing when adding revlog entries drastically reduced system call count.
These performance enhancements will make all operations that write new repository data faster. But it will be felt most on clone and pull operations on the client and push operations on the server.
One of the most impressive performance optimizations was to a Python class that converts a generator of raw data chunks to something that resembles a file object so it can be read() from. Refactoring read() to avoid collections.deque operations and an extra string slice and allocation made unbundle operations 15-20% faster. Since this function can handle hundreds of megabytes or even gigabytes of data across hundreds of thousands of calls, small improvements like this can make a huge difference! This patch was a stark reminder that function calls, collection mutations, string slicing, and object allocation all can have a significant cost in a higher-level, garbage collected language like Python.
The end result of all this performance optimization on applying a mozilla-central gzip bundle on Linux on an i7-6700K:
- 35-40s wall time faster (~245s to ~205s) (~84% of original)
- write(2) calls reduced from 1,372,411 to 679,045 (~49% of original)
- close(2) calls reduced from 405,147 to 235,039 (~58% of original)
- total system calls reduced from 5,120,893 to 2,938,479 (~57% of original)
And the same operation on Windows 10 on the same machine:
- ~300s wall time faster (933s to 633s) (~68% of original)
You may have noticed the discrepancy between Linux and Windows wall times, where Windows is 2-4x slower than Linux. What gives? The reason is closing file handles that have been appended to is slow on Windows. For more, read my recent blog post.
Mercurial writes ~226,000 files during a clone of mozilla-central (excluding the working copy). Assuming 2ms per file close operation, that comes out to ~450s just for file close operations! (All operations are on the same thread.) The current wall time difference between clone times on Windows and Linux is ~428s. So it's fair to say that waiting on file closes accounts for most of this.
Along the same vein, the aforementioned performance work reduced total number of file close operations during a mozilla-central clone by ~165,000. Again assuming 2ms per file close, that comes to ~330s, which is in the same ballpark as the ~300s wall time decrease we see on Windows in Mercurial 3.6. Writing - and therefore closing - hundreds of thousands of files handles is slower on Windows and accounts for most of the performance difference on that platform.
Empowered by this knowledge, I wrote some patches to move file closing to a background thread on Windows. The results were promising (minutes saved when writing 100,000+ files). Unfortunately, I didn't have time to finish these patches for Mercurial 3.6. Hopefully they'll make it into 3.7. I also have some mad scientist ideas for alternate storage mechanisms that don't rely on hundreds of thousands of files. This should enable clones to run at 100+ MB/s on all platforms - basically as fast as your network and system I/O can keep up (yes, Python and Windows are capable of this throughput). Stay tuned.
And that's a summary of the cloning improvements in Mercurial 3.6!
Mercurial 3.6 is currently in release candidate. Please help test it by downloading the RC at https://www.mercurial-scm.org/. Mercurial 3.6 final is due for release on or shortly after November 1. There is a large gathering of Mercurial contributors in London this weekend. So if a bug is reported, I can pretty much guarantee a lot of eyeballs will see it and there's a good chance it will be acted upon.
« Previous Page -- Next Page »