For the past few months, Mozilla has been serving Mercurial clones from Amazon S3. We upload snapshots (called bundles) of large and/or high-traffic repositories to S3. We have a custom Mercurial extension on the client and server that knows how to exchange the URLs for these snapshots and to transparently use them to bootstrap a clone. The end result is drastically reduced Mercurial server load and faster clone times. The benefits are seriously ridiculous when you operate version control at scale.
Amazon CloudFront is a CDN. You can easily configure it to be backed by an S3 bucket. So we did.
https://hg.cdn.mozilla.net/ is Mozilla's CDN for hosting Mercurial data. Currently it's just bundles to be used for cloning.
As of today, if you install the bundleclone Mercurial extension and hg clone a repository on hg.mozilla.org such as mozilla-central (hg clone https://hg.mozilla.org/mozilla-central), the CDN URLs will be preferred by default. (Previously we preferred S3 URLs that hit servers in Oregon, USA.)
This should result in clone time reductions for Mozillians not close to Oregon, USA, as the CloudFront CDN has servers all across the globe and your Mercurial clone should be bootstrapped from the server closest to you, which is hopefully also the fastest.
Unfortunately, you do need the aforementioned bundleclone extension installed for this to work. But this should only be temporary: I've proposed integrating this feature into the core of Mercurial so if a client talks to a server advertising pre-generated bundles the clone offload just works. I already have tentative buy-in from one Mercurial maintainer. So hopefully I can land this feature in Mercurial 3.6, which will be released November 1. After that, I imagine some high-traffic Mercurial servers (such as Bitbucket) will be very keen to deploy this so CPU load on their servers is drastically reduced.
I added a feature to Mercurial 3.4 that exposes JSON from Mercurial's various web APIs. Unfortunately, due to the presence of legacy code on hg.mozilla.org providing similar functionality, we weren't able to deploy this feature to hg.mozilla.org when we deployed Mercurial 3.4 several weeks ago.
I'm pleased to announce that as of today, JSON is now exposed from hg.mozilla.org!
To access JSON output, simply add json- to the command name in URLs. For example, instead of https://hg.mozilla.org/mozilla-central/rev/de7aa6b08234 use https://hg.mozilla.org/mozilla-central/json-rev/de7aa6b08234. The full list of web commands, URL patterns, and their parameters is documented in the hgweb help topic.
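The URL rewrite described above is mechanical, so it is easy to script. Here is a minimal sketch in Python. The helper names json_url and fetch_json are my own for illustration and are not part of Mercurial or hg.mozilla.org, and the shape of the returned JSON is not assumed here.

```python
import json
from urllib.request import urlopen


def json_url(repo_url, command, *args):
    """Build the JSON variant of an hgweb URL by prefixing the command with json-."""
    parts = [repo_url.rstrip("/"), "json-" + command]
    parts.extend(args)
    return "/".join(parts)


def fetch_json(repo_url, command, *args):
    """Fetch and decode the JSON document for a web command.

    This performs a network request. The structure of the decoded
    object depends on the web command and is not assumed here.
    """
    with urlopen(json_url(repo_url, command, *args)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling json_url("https://hg.mozilla.org/mozilla-central", "rev", "de7aa6b08234") yields the json-rev URL shown above, and fetch_json with the same arguments would download and decode it.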
Not all web commands support JSON output yet. Not all web commands expose all data available to them. If there is data you need that isn't exposed, please file a bug and I'll see what I can do.
Thanks go to Steven MacLeod for reviewing the rather large series it took to make this happen.
Sometime last week we enabled a new API on hg.mozilla.org: json-mozbuildinfo. This endpoint will return JSON describing moz.build-derived metadata about the files that changed in a commit.
We plan to eventually leverage this API to do cool things like have MozReview automatically file bugs in the appropriate component and assign appropriate reviewers given the set of changed files in a commit.
The API is currently only available on mozilla-central. And, we have very conservative resource limits in place. So large commits may cause it to error out. As such, the API is considered experimental. Also, performance is not as optimal as it could be. You have to start somewhere.
I'd like to thank Guillaume Destuynder (kang) for his help with the security side of things. When I started on this project, I didn't think I'd be writing C code for spawning secure processes, but here we are. In the not so distant future, I'll likely be adding seccomp(2) into the mix, which will make the execution environment as secure as, or more secure than, the Firefox content process sandbox, depending on how it is implemented. The rabbit holes we find ourselves in to implement proper security...
Just a few minutes ago, hg.mozilla.org reached an important milestone: deployments are now performed via Ansible from our open source version-control-tools repository instead of via Puppet from Mozilla's private sysadmins repository. This is important for a few reasons.
First, the code behind the operation of hg.mozilla.org is now open source and available for the public to see and change. I strive for my work at Mozilla to be open by default. With hg.mozilla.org's private Puppet repository, people weren't able to see what was going on under the covers. Nor were they empowered to change anything. This may come as a shock, but even I don't have commit privileges to the internal Puppet repository that was previously powering hg.mozilla.org! I did have read access. But any change I wanted to make involved me proxying it through one of two people. It was tedious, made me feel uncomfortable for having to nag people to do my work, and slowed everyone down. We no longer have this problem, thankfully.
Second, having the Ansible code in version-control-tools enables us to use the same operational configuration in production as we do in our Docker test environment. I can now spin up a cluster of Docker containers that behave very similarly to the production servers (which aren't running Docker). This enables us to write end-to-end tests of complex systems running across multiple Docker containers and have relatively high confidence that our production and testing environments behave very similarly. In other words, I can test complex interactions between multiple systems all from my local machine - even from a plane! For example, we can and do test that SSH connections to a simulated production environment running in Docker behave as expected, complete with an OpenSSH server speaking to an OpenLDAP server for SSH public key lookup. While we still have many tests to write, we had no such tests a year ago and every production deployment was a cross-your-fingers type moment. Having comprehensive tests gives us confidence to move fast and not break things.
One year ago, hg.mozilla.org's infrastructure was opaque, didn't have automated tests, and was deployed too seldom. There was the often correct perception that changing this critical-to-Mozilla service was difficult and slow. Today, things couldn't be more different. The hg.mozilla.org infrastructure is open, we have tests, and we can and do deploy multiple times per day without advance notice and without breaking things. I love this brave new world of open infrastructure and moving fast.
Mercurial 3.5 was released today (following Mercurial's time-based schedule of releasing a new version every 3 months).
There were roughly 1000 commits between 3.4 and 3.5, making this a busy version. Although, 1000 commits per release has become the new norm, as development on Mercurial has accelerated in the past few years.
In my mind, the major highlight of Mercurial 3.5 is that the new bundle2 wire protocol for transferring data during hg push and hg pull is now enabled by default on the client. Previously, it was enabled by default only on the server. hg.mozilla.org is running Mercurial 3.4, so clients that upgrade to 3.5 today will be speaking to it using the new wire protocol.
The bundle2 wire protocol succeeds the existing protocol (which has been in place for years) and corrects many of its deficiencies. Before bundle2, pull and push operations were not atomic because Mercurial was performing a separate API call for each piece of data. It would start by transferring changeset data and then have subsequent transfers of metadata like bookmarks and phases. As you can imagine, there were race conditions and scenarios where pushes could be incomplete (not atomic). bundle2 transfers all this data in one large chunk, so there are much stronger guarantees for data consistency and for atomic operations.
Another benefit of bundle2 is it is a fully extensible data exchange format. Peers can add additional parts to the payload. For extensions that wish to transfer additional metadata (like Mozilla's pushlog data), they can simply add this directly into the data stream without requiring additional requests over the wire protocol. This translates to fewer network round trips and faster push and pull operations.
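To make the "one payload, many parts" idea concrete, here is a toy model in Python. It is emphatically not Mercurial's bundle2 format or API: ToyRepo, HANDLERS, and apply_bundle are hypothetical names invented for this sketch. It only illustrates two properties discussed above: every part in a bundle is applied or none is, and an extension can register a handler for its own part type (pushlog here) without another round trip.

```python
import copy


class ToyRepo:
    """Stand-in for repository state. Not a real Mercurial object."""

    def __init__(self):
        self.changesets = []
        self.bookmarks = {}
        self.pushlog = []


# Handlers for the part types that this toy "core" understands.
HANDLERS = {
    "changesets": lambda repo, payload: repo.changesets.extend(payload),
    "bookmarks": lambda repo, payload: repo.bookmarks.update(payload),
}


def apply_bundle(repo, parts, handlers=HANDLERS):
    """Apply every (part_type, payload) pair in parts, or none of them."""
    staged = copy.deepcopy(repo)  # work on a copy so a failure leaves repo untouched
    for part_type, payload in parts:
        if part_type not in handlers:
            raise ValueError("unsupported part: %s" % part_type)
        handlers[part_type](staged, payload)
    # Every part succeeded: publish the staged state in one step.
    repo.__dict__.update(vars(staged))
```

An extension would add its own part type with something like dict(HANDLERS, pushlog=lambda repo, payload: repo.pushlog.append(payload)) and pass that as handlers. A bundle containing a part nobody understands raises before anything is published, so the repository never ends up with changesets but no bookmarks.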
The progress extension has been merged into Mercurial's core and is enabled by default. It is now safe to remove the extensions.progress config option from your hgrc.
Mercurial 3.5 also (finally) drops support for Python 2.4 and 2.5. Hopefully nobody reading this is still running these ancient and unsupported versions of Python. This is a win for Mercurial developers, as we were constantly having to work around deficiencies with these old Python releases. There were dozens of commits removing hacks and workarounds for Python 2.4 and 2.5. Dropping 2.4 and 2.5 also means Python 3 porting can begin in earnest. However, this isn't a high priority for anyone, so don't hold your breath.
There were a number of performance improvements in 3.5:
- operations involving obsolescence markers are faster (for users of changeset evolution)
- various revsets were optimized
- parts of phases calculation are now performed in C. The not public() revset should be much faster.
- hg status and things walking the filesystem are faster (Mozillians should be using hgwatchman to make hg status insanely fast)
A ui.allowemptycommit config option was introduced to control whether empty commits are allowed. Mozillians manually creating trychooser commits may run into problems creating empty commits without this option (a better solution is to use mach push-to-try).
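If you do script empty commits, the option can be set for a single invocation through Mercurial's global --config flag. The following is a small sketch in Python. The function names are mine, and the commit message in the usage note is only an example of trychooser syntax.

```python
import subprocess


def empty_commit_argv(message):
    """Return the hg command line for a commit that is allowed to be empty."""
    return [
        "hg",
        "--config", "ui.allowemptycommit=true",  # enable the option for this run only
        "commit",
        "-m", message,
    ]


def create_empty_commit(message, cwd="."):
    """Run the commit in the repository at cwd. Requires hg on PATH."""
    subprocess.check_call(empty_commit_argv(message), cwd=cwd)
```

For example, create_empty_commit("try: -b do -p all -u all -t none") would create an empty commit in the current repository.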
Work is progressing on per-directory manifests. Currently, Mercurial stores the mapping of files to content in a giant list called the manifest. For repositories with tens or hundreds of thousands of files, decoding and reading large manifests is very CPU intensive. Work is being done to enable Mercurial to split manifests by directory. So instead of a single manifest, there are several. This is a prerequisite to narrow clone, which is the ability to clone history for a subset of files (like how Subversion works). This work will eventually enable repositories with millions of files to exist without significant performance loss. It will also allow monolithic repositories to exist without the common critique that they are too unwieldy to use because they are so large.
hgignore files now have an include: and subinclude: syntax that can be used to include other files containing ignore rules. This feature is useful for a number of reasons. First, it makes sense for ignore rules to live in the directory hierarchy next to paths they impact. Second, for people working with monolithic repositories, it means you can export a sub-directory of your monorepo (to e.g. a Git repository) and its ignore rules - being defined in local directories - can still work. (I'm pretty sure Facebook is using this approach to make its syncing of directories/projects from its Mercurial monorepo to GitHub easier to manage.)
Significant work has been done on the template parser. If you have written custom templates, you may find that Mercurial 3.5 is more strict about parsing certain syntax.
Revsets with many chained or operators no longer result in stack exhaustion. Before, programmatically generated revsets like 1 or 2 or 3 or 4 or 5 or 6... would likely fail.
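For reference, this is the kind of generated revset I mean. A minimal Python sketch, with any_of being a name I made up for illustration:

```python
def any_of(revisions):
    """Join revision identifiers into a single revset using the or operator."""
    revisions = [str(rev) for rev in revisions]
    if not revisions:
        raise ValueError("need at least one revision")
    return " or ".join(revisions)
```

The resulting string can be passed to any command that accepts a revset, such as hg log -r. With thousands of revisions the expression gets very long, which is exactly the case that used to exhaust the stack.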
Interactions with servers over SSH should now display server output in real time. Before, server output was buffered and only displayed at the end of the operation. (You may not see this on hg.mozilla.org until the server is upgraded to 3.5, which is planned for early September.)
There are now static analysis checks in place to ensure that Mercurial config options have corresponding documentation in hg help config. As a result, a lot of formerly undocumented options are now documented.
I contributed various improvements. These include:
- auto sharing repository data during clone
- clone and pull performance improvements
- hg help scripting
The Mercurial for Mozillians Installing Mercurial article provides a Mozilla tailored yet generally applicable guide for installing or upgrading Mercurial to 3.5. As always, conservative software users may want to wait until September 1 for the 3.5.1 point release to fix any issues or regressions from 3.5.