Serving Mercurial Clones from a CDN

September 01, 2015 at 03:00 PM | categories: Mercurial, Mozilla

For the past few months, Mozilla has been serving Mercurial clones from Amazon S3. We upload snapshots (called bundles) of large and/or high-traffic repositories to S3. We have a custom Mercurial extension on the client and server that knows how to exchange the URLs for these snapshots and to transparently use them to bootstrap a clone. The end result is drastically reduced Mercurial server load and faster clone times. The benefits are seriously ridiculous when you operate version control at scale.
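To make the bootstrap idea concrete, here is a rough sketch of the manual equivalent of what the extension automates: seed an empty repository from an already-downloaded bundle, then pull only what is newer from the server. The helper name and file paths are hypothetical, and the extension's real internals may well differ. The hg commands themselves (init, unbundle, pull, update) are standard.

```python
def bootstrap_commands(bundle_path, server_url, dest):
    """Return the hg commands that seed a clone from a pre-generated bundle.

    Hypothetical helper for illustration only: it builds the argument
    lists and does not execute anything.
    """
    return [
        ["hg", "init", dest],                         # create an empty repository
        ["hg", "-R", dest, "unbundle", bundle_path],  # apply the snapshot locally
        ["hg", "-R", dest, "pull", server_url],       # fetch changesets newer than the snapshot
        ["hg", "-R", dest, "update"],                 # check out a working copy
    ]


if __name__ == "__main__":
    commands = bootstrap_commands(
        "mozilla-central.hg",  # assumed: a bundle file already fetched from S3 or the CDN
        "https://hg.mozilla.org/mozilla-central",
        "mozilla-central",
    )
    for argv in commands:
        print(" ".join(argv))
```

The point of the split is that the expensive part (the bulk of history) comes from static file hosting, and the Mercurial server only has to serve the small delta in the pull step.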

Amazon CloudFront is a CDN. You can easily configure it to be backed by an S3 bucket. So we did.

https://hg.cdn.mozilla.net/ is Mozilla's CDN for hosting Mercurial data. Currently it's just bundles to be used for cloning.

As of today, if you install the bundleclone Mercurial extension and hg clone a repository on hg.mozilla.org such as mozilla-central (hg clone https://hg.mozilla.org/mozilla-central), the CDN URLs will be preferred by default. (Previously we preferred S3 URLs that hit servers in Oregon, USA.)

This should reduce clone times for Mozillians who aren't close to Oregon, USA: the CloudFront CDN has servers all across the globe, so your Mercurial clone should be bootstrapped from the server closest to you, which is hopefully also the fastest.
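As an illustration of what "preferring" CDN URLs could look like on the client, here is a minimal sketch. It assumes the server advertises one bundle per line as a URL followed by key=value attributes, and that the client holds an ordered list of preferred attribute values. The manifest layout, the attribute names (cdn, ec2region), the example URLs, and both function names are assumptions made for this example, not a description of the extension's actual format.

```python
def parse_manifest(text):
    """Parse lines of the (assumed) form: URL key=value key=value ..."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url, attrs = fields[0], {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            attrs[key] = value
        entries.append((url, attrs))
    return entries


def pick_bundle(entries, prefers):
    """Return the URL of the entry that best matches the ordered preferences.

    `prefers` is a list of (attribute, value) pairs, most important first.
    Ties keep manifest order, so the server's ordering acts as the default.
    """
    if not entries:
        return None

    def rank(entry):
        _, attrs = entry
        # False sorts before True, so a match on an earlier preference wins.
        return tuple(attrs.get(key) != value for key, value in prefers)

    return min(entries, key=rank)[0]


# Hypothetical manifest; the hosts and attributes are placeholders.
MANIFEST = """\
https://s3.example.invalid/mozilla-central.hg ec2region=us-west-2
https://cdn.example.invalid/mozilla-central.hg cdn=true
"""

if __name__ == "__main__":
    entries = parse_manifest(MANIFEST)
    print(pick_bundle(entries, [("cdn", "true")]))
```

With a scheme like this, switching the default from S3 to the CDN is only a change to the preference list: a client running inside a particular EC2 region could still put that region first and keep fetching from the nearby S3 bucket.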

Unfortunately, you do need the aforementioned bundleclone extension installed for this to work. But this should only be temporary: I've proposed integrating this feature into the core of Mercurial so that if a client talks to a server advertising pre-generated bundles, the clone offload just works. I already have tentative buy-in from one Mercurial maintainer, so hopefully I can land this feature in Mercurial 3.6, which will be released November 1. After that, I imagine some high-traffic Mercurial servers (such as Bitbucket) will be very keen to deploy this so that CPU load on their servers is drastically reduced.