Mercurial 3.6 (scheduled for release on or shortly after November 1) contains a number of improvements to cloning. In this post, I will describe a new feature to help server operators reduce load (while enabling clients to clone faster) and some performance work to make clone operations faster on the client.
Cloning repositories can incur a lot of load on servers. For mozilla-central (the main Firefox repository), clones require the server to spend 4+ minutes CPU time and send ~1,230 MB over the network. Multiply by thousands of clients from build and test automation and developers, and you could quickly finding yourself running out of CPU cores or network bandwidth. Scaling Mercurial servers (like many services) can therefore be challenging. (It's worth noting that Git is in the same boat for reasons technically similar to Mercurial's.)
Mozilla previously implemented a Mercurial extension to seed clones from pre-generated bundle files so the Mercurial servers themselves don't have to work very hard for an individual clone. (That linked post goes into the technical reasons why cloning is expensive). We now offload cloning of frequently cloned repositories on hg.mozilla.org to Amazon S3 and a CDN and are diverting 1+ TB/day and countless hours of CPU work away from the hg.mozilla.org servers themselves.
The positive impact from seeding clones from pre-generated, externally-hosted bundles has been immense. Load on hg.mozilla.org dropped off a cliff. Clone times on clients became a lot faster (mainly because they aren't waiting for a server to dynamically generate/stream bits). But there was a problem with this approach: it required the cooperation of clients to install an extension in order for clone load to be offloaded. It didn't just work.
I'm pleased to announce that the ability to seed clones from server-advertised pre-generated bundles is now a core feature in Mercurial 3.6! Server operators can install the clonebundles extension (it is distributed with Mercurial) to advertise the location of pre-generated, externally-hosted bundle files. Compatible clients will automatically clone from the server-advertised URLs instead of creating potentially excessive load on the Mercurial server. The implementation is almost identical to what Mozilla has deployed with great success. If you operate a Mercurial server that needs to serve larger repositories (100+ MB) and/or is under high load, you should be jumping with joy at the existence of this feature, as it should make scaling problems attached to cloning mostly go away.
Documentation for server operators is currently in the extension and can be accessed at the aforementioned URL or with hg help -e clonebundles. It does require a bit of setup work. But if you are at the scale where you could benefit from the feature, the results will almost certainly be worth it.
One caveat is that the feature is currently behind an experimental flag on the client. This means that it doesn't just work yet. This is because we want to reserve the right to change some behaviors without worrying about backwards compatibility. However, I'm pretty confident the server parts won't change significantly. Or if they do, I'm pretty committed to providing an easy transition path since I'll need one for hg.mozilla.org. So, I'm giving server operators a tentative green light to deploy this extension. I can't guarantee there won't be a few bumps transitioning to a future release. But it shouldn't be a break-the-world type of problem. It is my intent to remove the experimental flag and have the feature enabled by default in Mercurial 3.7. At that point, server operators just need clients to run a modern Mercurial release and they can count on drastically reduced load from cloning.
To help with adoption and testing of the clone bundles feature, servers advertising bundles will inform compatible clients of the existence of the feature when they clone:
$ hg clone https://hg.mozilla.org/mozilla-central requesting all changes remote: this server supports the experimental "clone bundles" feature that should enable faster and more reliable cloning remote: help test it by setting the "experimental.clonebundles" config flag to "true" adding changesets adding manifests adding file changes ...
And if you have the feature enabled, you'll see something like:
$ hg clone https://hg.mozilla.org/mozilla-central applying clone bundle from https://hg.cdn.mozilla.net/mozilla-central/daa7d98525e859d32a3b3e97101e129a897192a1.gzip.hg adding changesets adding manifests adding file changes added 265986 changesets with 1501210 changes to 223996 files finished applying clone bundle searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files
This new clone bundles feature is deployed on hg.mozilla.org. Users of Mercurial 3.6 can start using it today by cloning from one of the repositories with bundles enabled. (If you have previously installed the bundleclone extension, please be sure your version-control-tools repository is up to date, as the extension was recently changed to better interact with the official feature.)
And that's the clone bundles feature. I hope you are as excited about it as I am!
Mercurial 3.6 also contains numerous performance improvements that make cloning faster, regardless of whether you are using clone bundles! For starters:
- Caching just-added entries made changelog writing 25% faster.
- Reusing file handles when adding revlog entries drastically reduced the number of file opens, closes, and writes.
- Avoiding excessive file flushing when adding revlog entries drastically reduced system call count.
These performance enhancements will make all operations that write new repository data faster. But it will be felt most on clone and pull operations on the client and push operations on the server.
One of the most impressive performance optimizations was to a Python class that converts a generator of raw data chunks to something that resembles a file object so it can be read() from. Refactoring read() to avoid collections.deque operations and an extra string slice and allocation made unbundle operations 15-20% faster. Since this function can handle hundreds of megabytes or even gigabytes of data across hundreds of thousands of calls, small improvements like this can make a huge difference! This patch was a stark reminder that function calls, collection mutations, string slicing, and object allocation all can have a significant cost in a higher-level, garbage collected language like Python.
The end result of all this performance optimization on applying a mozilla-central gzip bundle on Linux on an i7-6700K:
- 35-40s wall time faster (~245s to ~205s) (~84% of original)
- write(2) calls reduced from 1,372,411 to 679,045 (~49% of original)
- close(2) calls reduced from 405,147 to 235,039 (~58% of original)
- total system calls reduced from 5,120,893 to 2,938,479 (~57% of original)
And the same operation on Windows 10 on the same machine:
- ~300s wall time faster (933s to 633s) (~68% of original)
You may have noticed the discrepancy between Linux and Windows wall times, where Windows is 2-4x slower than Linux. What gives? The reason is closing file handles that have been appended to is slow on Windows. For more, read my recent blog post.
Mercurial writes ~226,000 files during a clone of mozilla-central (excluding the working copy). Assuming 2ms per file close operation, that comes out to ~450s just for file close operations! (All operations are on the same thread.) The current wall time difference between clone times on Windows and Linux is ~428s. So it's fair to say that waiting on file closes accounts for most of this.
Along the same vein, the aforementioned performance work reduced total number of file close operations during a mozilla-central clone by ~165,000. Again assuming 2ms per file close, that comes to ~330s, which is in the same ballpark as the ~300s wall time decrease we see on Windows in Mercurial 3.6. Writing - and therefore closing - hundreds of thousands of files handles is slower on Windows and accounts for most of the performance difference on that platform.
Empowered by this knowledge, I wrote some patches to move file closing to a background thread on Windows. The results were promising (minutes saved when writing 100,000+ files). Unfortunately, I didn't have time to finish these patches for Mercurial 3.6. Hopefully they'll make it into 3.7. I also have some mad scientist ideas for alternate storage mechanisms that don't rely on hundreds of thousands of files. This should enable clones to run at 100+ MB/s on all platforms - basically as fast as your network and system I/O can keep up (yes, Python and Windows are capable of this throughput). Stay tuned.
And that's a summary of the cloning improvements in Mercurial 3.6!
Mercurial 3.6 is currently in release candidate. Please help test it by downloading the RC at https://www.mercurial-scm.org/. Mercurial 3.6 final is due for release on or shortly after November 1. There is a large gathering of Mercurial contributors in London this weekend. So if a bug is reported, I can pretty much guarantee a lot of eyeballs will see it and there's a good chance it will be acted upon.
For the past few months, Mozilla has been serving Mercurial clones from Amazon S3. We upload snapshots (called bundles) of large and/or high-traffic repositories to S3. We have a custom Mercurial extension on the client and server that knows how to exchange the URLs for these snapshots and to transparently use them to bootstrap a clone. The end result is drastically reduced Mercurial server load and faster clone times. The benefits are seriously ridiculous when you operate version control at scale.
Amazon CloudFront is a CDN. You can easily configure it up to be backed by an S3 bucket. So we did.
https://hg.cdn.mozilla.net/ is Mozilla's CDN for hosting Mercurial data. Currently it's just bundles to be used for cloning.
As of today, if you install the bundleclone Mercurial extension and hg clone a repository on hg.mozilla.org such as mozilla-central (hg clone https://hg.mozilla.org/mozilla-central), the CDN URLs will be preferred by default. (Previously we preferred S3 URLs that hit servers in Oregon, USA.)
This should result in clone time reductions for Mozillians not close to Oregon, USA, as the CloudFront CDN has servers all across the globe and your Mercurial clone should be bootstrapped from the closest and hopefully therefore fastest server to you.
Unfortunately, you do need the the aforementioned bundleclone extension installed for this to work. But, this should only be temporary: I've proposed integrating this feature into the core of Mercurial so if a client talks to a server advertising pre-generated bundles the clone offload just works. I already have tentative buy-in from one Mercurial maintainer. So hopefully I can land this feature in Mercurial 3.6, which will be released November 1. After that, I imagine some high-traffic Mercurial servers (such as Bitbucket) will be very keen to deploy this so CPU load on their servers is drastically reduced.
I added a feature to Mercurial 3.4 that exposes JSON from Mercurial's various web APIs. Unfortunately, due to the presence of legacy code on hg.mozilla.org providing similar functionality, we weren't able to deploy this feature to hg.mozilla.org when we deployed Mercurial 3.4 several weeks ago.
I'm pleased to announce that as of today, JSON is now exposed from hg.mozilla.org!
To access JSON output, simply add json- to the command name in URLs. e.g. instead of https://hg.mozilla.org/mozilla-central/rev/de7aa6b08234 use https://hg.mozilla.org/mozilla-central/json-rev/de7aa6b08234. The full list of web commands, URL patterns, and their parameters are documented in the hgweb help topic.
Not all web commands support JSON output yet. Not all web commands expose all data available to them. If there is data you need but isn't exposed, please file a bug and I'll see what I can do.
Thanks go to Steven MacLeod for reviewing the rather large series it took to make this happen.
Sometime last week we enabled a new API on hg.mozilla.org: json-mozbuildinfo. This endpoint will return JSON describing moz.build-derived metadata about the files that changed in a commit.
We plan to eventually leverage this API to do cool things like have MozReview automatically file bugs in the appropriate component and assign appropriate reviewers given the set of changed files in a commit.
The API is currently only available on mozilla-central. And, we have very conservative resource limits in place. So large commits may cause it to error out. As such, the API is considered experimental. Also, performance is not as optimal as it could be. You have to start somewhere.
I'd like to thank Guillaume Destuynder (kang) for his help with the security side of things. When I started on this project, I didn't think I'd be writing C code for spawning secure processes, but here we are. In the not so distant future, I'll likely be adding seccomp(2) into the mix, which will make the execution environment as or more secure than the Firefox content process sandbox, depending on how it is implemented. The rabbit holes we find ourselves in to implement proper security...
Just a few minutes ago, hg.mozilla.org reached an important milestone: deployments are now performed via Ansible from our open source version-control-tools repository instead of via Puppet from Mozilla's private sysadmins repository. This is important for a few reasons.
First, the code behind the operation of hg.mozilla.org is now open source and available for the public to see and change. I strive for my work at Mozilla to be open by default. With hg.mozilla.org's private Puppet repository, people weren't able to see what was going on under the covers. Nor were they empowered to change anything. This may come as a shock, but even I don't have commit privileges to the internal Puppet repository that was previously powering hg.mozilla.org! I did have read access. But any change I wanted to make involved me proxying it through one of two people. It was tedious, made me feel uncomfortable for having to nag people to do my work, and slowed everyone down. We no longer have this problem, thankfully.
Second, having the Ansible code in version-control-tools enables us to use the same operational configuration in production as we do in our Docker test environment. I can now spin up a cluster of Docker containers that behave very similarly to the production servers (which aren't running Docker). This enables us to write end-to-end tests of complex systems running across multiple Docker containers and have relatively high confidence that our production and testing environments behave very similarly. In other words, I can test complex interactions between multiple systems all from my local machine - even from a plane! For example, we can and do test that SSH connections to a simulated production environment running in Docker behave as expected, complete with an OpenSSH server speaking to an OpenLDAP server for SSH public key lookup. While we still have many tests to write, we had no such tests a year ago and every production deployment was a cross-your-fingers type moment. Having comprehensive tests gives us confidence to move fast and not break things.
One year ago, hg.mozilla.org's infrastructure was opaque, didn't have automated tests, and was deployed too seldomly. There was the often correct perception that changing this critical-to-Mozilla service was difficult and slow. Today, things couldn't be more different. The hg.mozilla.org infrastructure is open, we have tests, and we can and do deploy multiple times per day without forward notice and without breaking things. I love this brave new world of open infrastructure and moving fast.
Next Page »