JSON APIs on hg.mozilla.org

August 18, 2015 at 04:00 PM | categories: Mercurial, Mozilla

I added a feature to Mercurial 3.4 that exposes JSON from Mercurial's various web APIs. Unfortunately, due to the presence of legacy code on hg.mozilla.org providing similar functionality, we weren't able to deploy this feature to hg.mozilla.org when we deployed Mercurial 3.4 several weeks ago.

I'm pleased to announce that as of today, JSON is now exposed from hg.mozilla.org!

To access JSON output, add json- to the command name in URLs. For example, instead of https://hg.mozilla.org/mozilla-central/rev/de7aa6b08234 use https://hg.mozilla.org/mozilla-central/json-rev/de7aa6b08234. The full list of web commands, URL patterns, and their parameters is documented in the hgweb help topic.
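
As a rough sketch of the URL rewrite described above, the following Python helper prefixes the web command with json-. The function name to_json_url and its repo argument are my own inventions for illustration; hgweb itself provides nothing like this on the client side.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url, repo):
    """Return the JSON variant of an hgweb URL.

    `repo` is the repository path on the server (e.g. "mozilla-central").
    The path segment right after it is treated as the web command.
    This helper is hypothetical; it only rewrites the URL string.
    """
    parts = urlsplit(url)
    prefix = "/" + repo.strip("/") + "/"
    if not parts.path.startswith(prefix):
        raise ValueError("URL does not point inside repository: " + repo)

    # Split "rev/de7aa6b08234" into the command and everything after it.
    command, sep, tail = parts.path[len(prefix):].partition("/")
    if not command:
        raise ValueError("URL has no web command")
    if not command.startswith("json-"):
        command = "json-" + command

    return urlunsplit(parts._replace(path=prefix + command + sep + tail))
```

Calling to_json_url("https://hg.mozilla.org/mozilla-central/rev/de7aa6b08234", "mozilla-central") yields the json-rev URL shown above. Fetching and decoding the response is left to whatever HTTP client you prefer.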

Not all web commands support JSON output yet, and those that do don't necessarily expose all the data available to them. If there is data you need that isn't exposed, please file a bug and I'll see what I can do.

Thanks go to Steven MacLeod for reviewing the rather large series it took to make this happen.


moz.build metadata on hg.mozilla.org

August 04, 2015 at 07:55 PM | categories: Mercurial, Mozilla

Sometime last week we enabled a new API on hg.mozilla.org: json-mozbuildinfo. This endpoint will return JSON describing moz.build-derived metadata about the files that changed in a commit.

Example. Docs.

We plan to eventually leverage this API to do cool things like have MozReview automatically file bugs in the appropriate component and assign appropriate reviewers given the set of changed files in a commit.

The API is currently only available on mozilla-central. And, we have very conservative resource limits in place. So large commits may cause it to error out. As such, the API is considered experimental. Also, performance is not as optimal as it could be. You have to start somewhere.

I'd like to thank Guillaume Destuynder (kang) for his help with the security side of things. When I started on this project, I didn't think I'd be writing C code for spawning secure processes, but here we are. In the not so distant future, I'll likely be adding seccomp(2) into the mix, which will make the execution environment as or more secure than the Firefox content process sandbox, depending on how it is implemented. The rabbit holes we find ourselves in to implement proper security...


hg.mozilla.org Operational Workings Now Open Sourced

August 04, 2015 at 02:30 PM | categories: Mercurial, Mozilla

Just a few minutes ago, hg.mozilla.org reached an important milestone: deployments are now performed via Ansible from our open source version-control-tools repository instead of via Puppet from Mozilla's private sysadmins repository. This is important for a few reasons.

First, the code behind the operation of hg.mozilla.org is now open source and available for the public to see and change. I strive for my work at Mozilla to be open by default. With hg.mozilla.org's private Puppet repository, people weren't able to see what was going on under the covers. Nor were they empowered to change anything. This may come as a shock, but even I don't have commit privileges to the internal Puppet repository that was previously powering hg.mozilla.org! I did have read access. But any change I wanted to make involved me proxying it through one of two people. It was tedious, made me feel uncomfortable for having to nag people to do my work, and slowed everyone down. We no longer have this problem, thankfully.

Second, having the Ansible code in version-control-tools enables us to use the same operational configuration in production as we do in our Docker test environment. I can now spin up a cluster of Docker containers that behave very similarly to the production servers (which aren't running Docker). This enables us to write end-to-end tests of complex systems running across multiple Docker containers and have relatively high confidence that our production and testing environments behave very similarly. In other words, I can test complex interactions between multiple systems all from my local machine - even from a plane! For example, we can and do test that SSH connections to a simulated production environment running in Docker behave as expected, complete with an OpenSSH server speaking to an OpenLDAP server for SSH public key lookup. While we still have many tests to write, we had no such tests a year ago and every production deployment was a cross-your-fingers type moment. Having comprehensive tests gives us confidence to move fast and not break things.

One year ago, hg.mozilla.org's infrastructure was opaque, didn't have automated tests, and was deployed too seldom. There was the often correct perception that changing this critical-to-Mozilla service was difficult and slow. Today, things couldn't be more different. The hg.mozilla.org infrastructure is open, we have tests, and we can and do deploy multiple times per day without advance notice and without breaking things. I love this brave new world of open infrastructure and moving fast.


Mercurial 3.5 Released

July 31, 2015 at 01:15 PM | categories: Mercurial, Mozilla

Mercurial 3.5 was released today (following Mercurial's time-based schedule of releasing a new version every 3 months).

There were roughly 1000 commits between 3.4 and 3.5, making this a busy version. That said, 1000 commits per release has become the new norm, as development on Mercurial has accelerated in the past few years.

In my mind, the major highlight of Mercurial 3.5 is that the new bundle2 wire protocol for transferring data during hg push and hg pull is now enabled by default on the client. Previously, it was enabled by default only on the server. hg.mozilla.org is running Mercurial 3.4, so clients that upgrade to 3.5 today will be speaking to it using the new wire protocol.

The bundle2 wire protocol succeeds the existing protocol (which has been in place for years) and corrects many of its deficiencies. Before bundle2, pull and push operations were not atomic because Mercurial was performing a separate API call for each piece of data. It would start by transferring changeset data and then have subsequent transfers of metadata like bookmarks and phases. As you can imagine, there were race conditions and scenarios where pushes could be incomplete (not atomic). bundle2 transfers all this data in one large chunk, so there are much stronger guarantees for data consistency and for atomic operations.

Another benefit of bundle2 is it is a fully extensible data exchange format. Peers can add additional parts to the payload. For extensions that wish to transfer additional metadata (like Mozilla's pushlog data), they can simply add this directly into the data stream without requiring additional requests over the wire protocol. This translates to fewer network round trips and faster push and pull operations.
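
To make the two ideas above concrete (everything travels in one payload that is applied all-or-nothing, and extensions can add their own parts), here is a toy model in Python. It is not Mercurial's bundle2 format or API: the part names, the handler functions, and apply_bundle are all hypothetical, and real bundle2 has richer rules for parts a peer does not understand.

```python
import copy


def apply_changesets(state, payload):
    state["changesets"].extend(payload)


def apply_bookmarks(state, payload):
    state["bookmarks"].update(payload)


def apply_pushlog(state, payload):
    # A part contributed by a (hypothetical) extension.
    state.setdefault("pushlog", []).append(payload)


def apply_bundle(state, parts, handlers):
    """Apply every part to a copy of `state` and return the new state.

    If any part fails, the exception propagates and the original
    `state` is left untouched, so the operation is all-or-nothing.
    """
    working = copy.deepcopy(state)
    for name, payload in parts:
        if name not in handlers:
            raise KeyError("no handler for part: " + name)
        handlers[name](working, payload)
    return working


# Handlers a peer knows about out of the box, and the same set
# extended with the extra part.
core_handlers = {"changesets": apply_changesets, "bookmarks": apply_bookmarks}
extended_handlers = dict(core_handlers, pushlog=apply_pushlog)
```

The point of the sketch is only the shape: one payload, many typed parts, a registry that extensions can add to, and a single commit point at the end instead of one API call per kind of data.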

The progress extension has been merged into Mercurial's core and is enabled by default. It is now safe to remove the extensions.progress config option from your hgrc.

Mercurial 3.5 also (finally) drops support for Python 2.4 and 2.5. Hopefully nobody reading this is still running these ancient and unsupported versions of Python. This is a win for Mercurial developers, as we were constantly having to work around deficiencies with these old Python releases. There were dozens of commits removing hacks and workarounds for Python 2.4 and 2.5. Dropping 2.4 and 2.5 also means Python 3 porting can begin in earnest. However, this isn't a high priority for anyone, so don't hold your breath.

There were a number of performance improvements in 3.5:

  • operations involving obsolescence markers are faster (for users of changeset evolution)
  • various revsets were optimized
  • parts of phases calculation are now performed in C. The not public() revset should be much faster.
  • hg status and things walking the filesystem are faster (Mozillians should be using hgwatchman to make hg status insanely fast)

A ui.allowemptycommit config option was introduced to control whether empty commits are allowed. Mozillians manually creating trychooser commits may run into problems creating empty commits without this option (a better solution is to use mach push-to-try).
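
If you script empty commits, one way to opt in for a single invocation is Mercurial's global --config flag. The sketch below only builds the argument list; the helper name empty_commit_command is made up for this example, and it assumes hg is on your PATH when you actually run the result.

```python
def empty_commit_command(message):
    """Build an `hg commit` invocation that permits an empty commit.

    --config sets ui.allowemptycommit for this one command only,
    leaving the user's hgrc untouched.
    """
    return [
        "hg",
        "--config", "ui.allowemptycommit=true",
        "commit",
        "-m", message,
    ]


# Example use from inside a repository (not executed here):
#   import subprocess
#   subprocess.run(empty_commit_command("example empty commit"), check=True)
```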

Work is progressing on per-directory manifests. Currently, Mercurial stores the mapping of files to content in a giant list called the manifest. For repositories with tens or hundreds of thousands of files, decoding and reading large manifests is very CPU intensive. Work is being done to enable Mercurial to split manifests by directory. So instead of a single manifest, there are several. This is a prerequisite to narrow clone, which is the ability to clone history for a subset of files (like how Subversion works). This work will eventually enable repositories with millions of files to exist without significant performance loss. It will also allow monolithic repositories to exist without the common critique that they are too unwieldy to use because they are so large.

hgignore files now have an include: and subinclude: syntax that can be used to include other files containing ignore rules. This feature is useful for a number of reasons. First, it makes sense for ignore rules to live in the directory hierarchy next to paths they impact. Second, for people working with monolithic repositories, it means you can export a sub-directory of your monorepo (to e.g. a Git repository) and its ignore rules - being defined in local directories - can still work. (I'm pretty sure Facebook is using this approach to make its syncing of directories/projects from its Mercurial monorepo to GitHub easier to manage.)

Significant work has been done on the template parser. If you have written custom templates, you may find that Mercurial 3.5 is more strict about parsing certain syntax.

Revsets with many chained "or" operators no longer result in stack exhaustion. Before, programmatically generated revsets like 1 or 2 or 3 or 4 or 5 or 6... would likely fail.
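
For illustration, this is the kind of programmatic generation that tends to produce such revsets. The helper name any_of is hypothetical; it simply joins revision identifiers with the revset "or" operator.

```python
def any_of(revs):
    """Build a revset matching any of the given revisions."""
    revs = [str(r) for r in revs]
    if not revs:
        raise ValueError("need at least one revision")
    return " or ".join(revs)


# Example use (not executed here):
#   import subprocess
#   subprocess.run(["hg", "log", "-r", any_of(range(1, 7))], check=True)
```

With a few thousand revisions in the input, the resulting expression is the sort of long chain that older versions would likely choke on.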

Interactions with servers over SSH should now display server output in real time. Before, server output was buffered and only displayed at the end of the operation. (You may not see this on hg.mozilla.org until the server is upgraded to 3.5, which is planned for early September.)

There are now static analysis checks in place to ensure that Mercurial config options have corresponding documentation in hg help config. As a result, a lot of formerly undocumented options are now documented.

I contributed various improvements. These include:

  • auto sharing repository data during clone
  • clone and pull performance improvements
  • hg help scripting

There were tons of other changes, of course. See the official release notes and the upgrade notes for more.

The Mercurial for Mozillians Installing Mercurial article provides a Mozilla tailored yet generally applicable guide for installing or upgrading Mercurial to 3.5. As always, conservative software users may want to wait until September 1 for the 3.5.1 point release to fix any issues or regressions from 3.5.


My Contributions to Mercurial 3.5

July 31, 2015 at 10:55 AM | categories: Mercurial, Mozilla

Mercurial 3.5 was released today. I contributed some small improvements to this version that I thought I'd share with the world.

The feature I'm most proud of adding to Mercurial 3.5 is what I'm referring to as auto share. The existing hg share extension/command enables multiple checkouts of a repository to share the same backing repository store. Essentially the .hg/store directory is a symlink to a shared directory. This feature has existed in Mercurial for years and is essentially identical to the git worktree feature just recently added in Git 2.5.

My addition to the share extension is the ability for Mercurial to automatically perform an hg clone + hg share in the same operation. If the share.pool config option is defined, hg clone will automatically clone or pull the repository data somewhere inside the directory pointed to by share.pool, then create a new working copy from that shared location. But here's the magic: Mercurial can automatically deduce that different remotes are the same logical repository (by looking at the root changeset) and automatically have them share storage. So if you first hg clone the canonical repository and later hg clone a fork, Mercurial will pull down the changesets unique to the fork into the previously created shared directory and perform a checkout from that. Contrast this with performing a full clone of the fork. If you are cloning multiple repositories that are logically derived from the same original one, this can result in a significant reduction of disk space and network usage. I wrote this feature with automated consumers in mind, particularly continuous integration systems. However, there is also a mode more suitable for humans where repositories are pooled not by their root changeset but by their URL. For more info, see hg help -e share.
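
The two pooling modes can be pictured with a small Python sketch. This is not the share extension's code: the function pool_directory, its naming argument, and the use of a SHA-1 of the URL as a directory name are assumptions made purely to show why a canonical repository and its fork land in the same store under one mode but not the other.

```python
import hashlib
import os


def pool_directory(pool, root_node, url, naming="identity"):
    """Pick the shared store directory for a clone (illustrative only).

    naming="identity": key on the root changeset, so every remote
    derived from the same original repository shares one store.
    naming="remote": key on the URL, so each remote gets its own store.
    """
    if naming == "identity":
        name = root_node
    elif naming == "remote":
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    else:
        raise ValueError("unknown naming mode: " + naming)
    return os.path.join(pool, name)


# Placeholder values, not real repositories or changesets.
ROOT = "a" * 40
CANONICAL = "https://hg.example.org/canonical"
FORK = "https://hg.example.org/users/someone/fork"
```

Under the identity mode, pool_directory("/pool", ROOT, CANONICAL) and pool_directory("/pool", ROOT, FORK) are the same path, which is what lets the second clone pull only the fork's unique changesets. Under the remote mode the two paths differ.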

For Mercurial 3.4, I contributed changes that refactored how Mercurial's tags cache works. This cache was a source of performance problems at Mozilla's scale for many years. Since upgrading to Mercurial 3.4, Mozilla has not encountered any significant performance problems with the cache on either client or server as far as I know.

Building on this work, Mercurial 3.5 supports transferring tags cache entries from server to client when clients clone/pull. Before, clients would have to recompute tags cache entries for pulled changesets. On repositories that are very large in terms of number of files (over 50,000) or heads (hundreds or more), this could take several dozen seconds or even minutes. This would manifest as a delay either during or after initial clone. In Mercurial 3.5 - assuming both client and server support the new bundle2 wire protocol - the cache entries are transferred from server to client and no extra computation needs to occur. The client does pay a very small price for transferring this additional data over the wire, but the payoff is almost always worth it. For large repositories, this feature means clones are usable sooner.

A few weeks ago, a coworker told me that connections to a Mercurial server were timing out mid clone. We investigated and discovered a potential for a long CPU-intensive pause during clones where Mercurial would not touch the network. On this person's under-powered EC2 instance, the pause was so long that the server's inactivity timeout was triggered and it dropped the client's TCP connection. I refactored Mercurial's cloning code so there is no longer a pause. There should be no overall change in clone time, but there is no longer a perceivable delay between applying changesets and manifests where the network could remain idle. This investigation also revealed some potential follow-up work for Mercurial to be a bit smarter about how it interacts with networks.

Finally, I contributed hg help scripting to Mercurial's help database. This help topic covers how to use Mercurial from scripting and other automated environments. It reflects knowledge I've learned from seeing Mercurial used in automation at Mozilla.
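
As a small taste of the kind of advice that applies to automation, here is a sketch of preparing an environment for invoking hg from a script. HGPLAIN is an environment variable Mercurial documents for suppressing user and locale customizations of output; the helper name scripting_env is my own, and the commented invocation assumes hg is installed.

```python
import os


def scripting_env(base=None):
    """Return a copy of the environment with HGPLAIN set.

    Setting HGPLAIN asks Mercurial to ignore settings such as aliases
    and localized output that would make command output less
    predictable for a script to parse.
    """
    env = dict(os.environ if base is None else base)
    env["HGPLAIN"] = "1"
    return env


# Example use (not executed here):
#   import subprocess
#   out = subprocess.run(
#       ["hg", "log", "-r", ".", "-T", "{node}\n"],
#       env=scripting_env(), check=True, capture_output=True, text=True,
#   ).stdout
```

The help topic itself remains the authoritative reference.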

Of course, there are plenty of other changes in Mercurial 3.5. Stay tuned for another blog post.

