Track Firefox Repositories with Local-Only Mercurial Tags

June 30, 2014 at 10:25 AM | categories: Mercurial, Mozilla

After reading my recent Please Stop Using MQ post, a number of people asked me about my development workflow. While I still owe a full answer to that question, one of the cornerstones is a unified Mercurial repository. Instead of having separate clones for mozilla-central, mozilla-inbound, aurora, beta, etc, I have a single clone with the changesets from all the repositories.

I feel having a unified repository has made me more productive. I no longer have to waste time shuffling changesets between local clones. That shuffling introduced all kinds of extra cognitive load and manual processes that slowed me down. I highly encourage others to adopt unified repositories.

Because the various Firefox repositories don't have unique branches or bookmarks tracking the various heads, aggregating multiple repositories introduces a client-side problem of identifying which head is which. If you merely do a hg pull, you'll get a bunch of anonymous heads.

To solve this problem, you'll need to employ minor client-side magic. Previously, I recommended my heavyweight mozext extension.

Today, I'm proud to announce a new, lighter extension: firefoxtree. When pulling from a known Firefox repository, this extension will add a local-only Mercurial tag for that repository. For example, when pulling mozilla-central, the central tag will be created.

Local-only tags are a Mercurial feature only available to extensions. Local-only tags are effectively an overlay of tags that don't get transferred as part of push and pull operations. They behave like normal tags: you can hg up to them and reference them anywhere changeset identifiers are used. They are also read only: if you update to a tag and then commit, the tag will not move forward (contrast with branches or bookmarks).

Example Usage

Clone https://hg.mozilla.org/hgcustom/version-control-tools and add the following to your .hg/hgrc:

[extensions]
firefoxtree = /path/to/version-control-tools/hgext/firefoxtree

To use, simply pull from a Firefox repo:

$ hg pull https://hg.mozilla.org/mozilla-central

$ hg up central

# Do your development work.

# Time to land.
$ hg pull https://hg.mozilla.org/integration/mozilla-inbound
$ hg rebase -d inbound
$ hg out -r . https://hg.mozilla.org/integration/mozilla-inbound
$ hg push -r . https://hg.mozilla.org/integration/mozilla-inbound

Please note that hg push tries to push all local changes on all heads to a remote by default. When operating a unified repo, you'll need to use the -r argument to hg push and hg out to limit what changesets are considered. I most frequently use -r . to limit changes to the current checked out changeset.
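Typing the full URLs every time gets tedious. One option is to give the repositories short names through Mercurial's [paths] config section. The sketch below appends two aliases to an hgrc file. The alias names mc and mi and the HGRC_FILE variable are placeholders chosen for this example, not something firefoxtree defines, and the script writes to a scratch file unless you point HGRC_FILE at your clone's .hg/hgrc.

```shell
#!/bin/sh
# Where to write the aliases. Defaults to a scratch file so the sketch is
# safe to try; set HGRC_FILE=/path/to/unified-repo/.hg/hgrc to use it for real.
HGRC_FILE="${HGRC_FILE:-$(mktemp)}"

# Append a [paths] section. The names "mc" and "mi" are hypothetical.
cat >> "$HGRC_FILE" <<'EOF'
[paths]
mc = https://hg.mozilla.org/mozilla-central
mi = https://hg.mozilla.org/integration/mozilla-inbound
EOF

echo "aliases written to $HGRC_FILE"
```

With aliases like these in place, the landing commands above would shorten to hg pull mi, hg out -r . mi, and hg push -r . mi. This assumes firefoxtree still recognizes the repository once the alias resolves to its URL, so check that the inbound tag appears after pulling this way.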

Also note that this extension conflicts with my mozext extension. I hope to update mozext to make it behave in the same manner. (It was easier to write a new extension than to update mozext.)


Please Stop Using MQ

June 23, 2014 at 11:50 AM | categories: Mercurial, Mozilla

Are you a Mercurial user?

Do you use the mq extension? If so, please don't.

Why, you ask? Good question. The short version is that mq is a solution spawned from version control techniques that were popular over a decade ago. We have better tools now. mq is technologically obsolete.

But if you want to know the full answer, keep reading.

A history of Mercurial and mq

The mq extension is just that: an extension to Mercurial's core capabilities. (Mercurial consists of a core/basic feature set supplemented by built-in and 3rd party extensions that provide more advanced or lesser used, niche features.) Modern versions of Mercurial (if you aren't running version 3.0+, please upgrade ASAP) are significantly different from the Mercurial that necessitated the existence and usage of the mq extension.

In the early days of Mercurial (Mercurial was announced in April 2005), your choices for branching were not spectacular. Your option in the core of Mercurial was to use Mercurial branches. Mercurial branches are very heavy beasts. Branches are permanent. If you change the branch a changeset (Mercurial's term for a commit) is associated with, the SHA-1 of that changeset changes. The takeaway is Mercurial branches aren't very user friendly. That's why modern versions of Mercurial print a warning when you create one:

$ hg branch my-new-feature
marked working directory as branch my-new-feature
(branches are permanent and global, did you want a bookmark?)

Branches were and still are a relatively poor user experience inside Mercurial, especially when compared to Git branches (comparing Git and Mercurial branches is an apples to oranges comparison: Mercurial branches are more synonymous with CVS or Subversion branches). In defense of branches, they are good for some roles, such as tracking release branches. (Mozilla's aurora, beta, release, etc Firefox repositories should arguably be modeled as branches instead of separate repositories.)

Further complicating the usability of branches in early versions of Mercurial was the lack of commands to rewrite history. Concepts such as rebasing or reordering patches were not always available in Mercurial (or if they were there were significant limitations). Today, this seems like an obvious shortcoming of a version control system. But keep in mind the state of version control around 2005-2007. CVS and Subversion were the big players. Perforce, SourceSafe, TFS, etc were popular in corporate settings. While I'm sure there were version control systems at the time that supported rewriting history, my recollection is that rewriting history did not become a thing until Git rose in popularity (which I think was around 2008 or 2009). At that time, the concept of mutating past commits was alien, if not absurd. Why would you want to lose data on what you've already done? Isn't that the antithesis of a version control system?! Git - and many of its concepts, including history rewriting - were perceived as radical. It doesn't matter if they borrowed the ideas from other tools: Git's newfound popularity made them common and a new necessity. If you went from Git to Subversion (or even Mercurial), the superiority of Git's flexibility was obvious (although the UI was not so great, but people can - and do - still cope).

It was during this renaissance of modern version control tools in the late 2000's that mq became popular. But its popularity was not because of the strength of mq: it was because of its necessity.

mq was added to Mercurial in February 2006 and released as part of Mercurial 0.8.1 in April 2006. As far as I can tell, at the time of mq's release, mq was the only tolerable way to perform history rewriting in Mercurial (hg qref and hg qpush --move are a very crude method of history rewriting). Now, you could leverage multiple Mercurial commands to give the illusion of history rewriting, but it was very cumbersome. No sane person wanted to deal with that. So mq was used.

The introduction of the transplant extension in the Mercurial 0.9.2 release in December 2006 added another history rewriting facility to Mercurial. The transplant command effectively copies changesets between branches or heads.

The Mercurial 0.9.5 release in September 2007 introduced the record extension. The record command provides interactive and incremental commit support, similar to git add -i. While record isn't strictly history rewriting, it provides a very useful functionality for helping to produce separate, smaller commits from a larger ongoing change.

The release of Mercurial 1.1 in December 2008 was - on paper at least - a milestone for Mercurial. This release added the rebase extension. This introduced the rebase command, which moves changesets (as opposed to transplant, which merely copies them). Rebasing allowed you to pull remote changes and then easily move your local changesets on top of the new changes, among other things.

An even bigger feature added in Mercurial 1.1 was the bookmarks extension. Bookmarks are Mercurial's lightweight branches. Instead of permanent association with a changeset, a bookmark is a movable label attached to a changeset. They are very similar to Git branches.

The initial bookmarks feature was far from robust. You couldn't push bookmarks and share them with others until Mercurial 1.6 in June 2010. It's worth noting that bookmarks became part of the Mercurial core (as opposed to existing in an extension that must be enabled) in Mercurial 1.8, released in March 2011.

Bookmarks, record, and rebase provided a decent framework for history editing in Mercurial. But there were still some rough edges, notably around making it easier to edit history: mq was still easier to use for doing tasks like reordering patches.

It wasn't until Mercurial 2.2 in May 2012 that Mercurial gained the ability to easily reorder patches using something other than mq. It came via the histedit extension, which provides the histedit command. This command provides a mechanism to interactively edit history. It is very similar to git rebase --interactive.

With the introduction of the histedit extension, you could (finally) perform most of the more advanced repository interaction workflows that mq allowed without using mq. Continue reading the next section to learn why not using mq is important.

For Mozillians reading, it's worth noting that the conversion of the Firefox source repository from CVS to Mercurial occurred in March 2007. At that time, Mercurial had mq and transplant for performing history rewriting. mq was thus the most convenient method for rewriting history and managing individual patches (a process Firefox development largely maintains to this day). Thus, mq effectively became a de facto requirement for developing Firefox patches. Despite mq being arguably unnecessary since Mercurial 2.2's release in May 2012, mq is still widely used within Mozilla. My perception is that most developers continue to use mq. Furthermore, even new developers are still picking it up instead of utilizing Mercurial's other commands for managing changesets. FWIW, I believe many are so turned off by mq or don't want to learn Mercurial or mq that they do their day-to-day development in Git and only involve Mercurial when doing the final push to the canonical Firefox Mercurial repository.

Now that we've learned why mq is popular, let's talk about why it shouldn't be used any more.

The hack that is mq

In terms of architecture, mq is a giant hack: a wart on top of Mercurial. The goal of every version control system is to track changes over time. mq actively works against that.

At the core of every Mercurial repository is a store that contains the repository data. The changesets in the repository are logically represented as a directed acyclic graph (DAG). When you run hg commit, hg pull, hg import, or any command that introduces new changesets, new data is added to the DAG and written to the store.

One of the most important properties of the store is that it is supposed to be append-only: once data is added, it is never removed. This property allows Mercurial to fulfill the contract of version control systems, which is to keep track of data without losing anything.

This append-only property is important because it means Mercurial can know about all of the history for all of time. If you need to perform a merge, Mercurial knows exactly what came before and thus the most effective way to perform that merge. (It's worth noting that merging can be quite complicated. All this extra data helps do the merge correctly the first time, which means you can spend your time writing code instead of resolving merge conflicts.)

The "mq is a hack" statement derives from how mq interacts with Mercurial's store. Your mq patch queue is effectively an overlay over Mercurial's built-in store that is in an ever-changing state of partial application. When you hg qpush to apply a patch from the patch queue, you effectively do an hg import to import a Mercurial changeset file into the core Mercurial store. That's mostly fine. But when you hg qpop (unapply a patch from mq), you are effectively asking Mercurial to delete a changeset and all data associated with it from the core Mercurial store. In other words, qpop breaks the ideal append-only property of the Mercurial store: qpop forces Mercurial to delete data and lose track of history. This makes Mercurial very sad.

Because you are throwing away repository data, merges become much harder. This means you spend more of your time dealing with resolving conflicts. To further complicate that, the mq code paths for performing merges and conflict resolution aren't as robust as those used by, say, hg rebase and hg histedit. So if you use mq, you are pretty much guaranteed a poorer conflict resolution process. This means you spend more of your time wrestling with tools instead of, you know, actually doing something productive.

Deleting repository data by unapplying patches via mq also has negative performance implications. Given a large enough repository (such as Firefox's - it currently has ~200,000 changesets), these issues can severely degrade performance. An extreme example of this is Mozilla's Try repository. The runaway process/CPU issues leading to outages are due to a known Mercurial issue dealing with inefficient handling of stores that aren't append-only. When you use mq, you risk running into the same issues, although the performance implication should hopefully be an order of magnitude or two less pronounced than what Mozilla's Try repository experiences.

In theory, mq could compensate for deletion of data from Mercurial's core store by retaining that data itself. But it doesn't. This leads to yet more reasons why you shouldn't use mq.

mq throws away perfectly fine history data. When you hg qrefresh, your current patch is overwritten with the new one. The old version is discarded. This actively works against the goal of a version control system to record history. Have you ever started down a path and realized a few commits later that you need to throw it away and backtrack to a few commits back? I do that all the time. If you qrefresh, tough luck: you've lost your history. If only the history store was append-only.

(In fairness to mq, the patch queue maintained by mq is itself a Mercurial repository and can be committed to, preventing history loss. The mqext extension even provides a mechanism for automatically committing the patch queue repo during qrefresh and other mutation events. But as we will see, even with this, mq is not perfect.)

mq also fails to adequately update patches when they are moved around. Little known fact: mq patches have their parent changeset encoded in them. If you run hg qpush --exact, Mercurial will apply the patch against the parent changeset it was actually created against, as opposed to the current work directory changeset. In theory, you should never encounter conflicts when applying patches this way. Because Mercurial's changeset SHA-1s are dependent on content (like Git), if 23d165967da3 is the child of d5741ada659d, then there should be absolutely no way that 23d165967da3 should fail to apply directly to d5741ada659d. The problem is that mq fails to update the parent changeset when you push patches on top of a new parent! e.g.

# Let's create a root commit.
$ echo 'foo' > foo
$ hg commit -A -m 'initial commit'
adding foo
$ hg log
changeset:   0:23d165967da3
user:        Gregory Szorc <gps@mozilla.com>
date:        Sun Jun 22 13:48:05 2014 -0700
summary:     initial commit

Notice the changeset of that commit: 23d165967da3.

Now let's create a new mq patch.

$ echo 'patch 1' > foo
$ hg qnew -m 'make mq patch 1' p1

# Creating an mq patch creates a file in .hg/patches.
$ cat .hg/patches/p1
# HG changeset patch
# Parent 23d165967da39e5846976eb6ed967f1058827ffa
# User Gregory Szorc <gps@mozilla.com>
make p1

diff --git a/foo b/foo
--- a/foo
+++ b/foo
@@ -1,1 +1,1 @@
-foo
+patch 1

Notice how the parent is listed as the changeset in Mercurial's core store.

Now let's pop that patch, create a new commit, and push the old mq patch.

$ hg qpop
popping p1
patch queue now empty

$ echo 'bar' > bar
$ hg commit -A -m 'second commit'
adding bar

$ hg log
changeset:   1:d5741ada659d
tag:         tip
user:        Gregory Szorc <gps@mozilla.com>
date:        Sun Jun 22 13:48:37 2014 -0700
summary:     second commit

changeset:   0:23d165967da3
user:        Gregory Szorc <gps@mozilla.com>
date:        Sun Jun 22 13:48:05 2014 -0700
summary:     initial commit

$ hg qpush
applying p1
now at: p1

$ hg log --graph
@ changeset:   2:64851294d038
| tag:         p1
| tag:         qbase
| tag:         qtip
| tag:         tip
| user:        Gregory Szorc <gps@mozilla.com>
| date:        Sun Jun 22 14:27:55 2014 -0700
| summary:     make mq patch 1
|
o changeset:   1:d5741ada659d
| tag:         qparent
| user:        Gregory Szorc <gps@mozilla.com>
| date:        Sun Jun 22 13:48:37 2014 -0700
| summary:     second commit
|
o changeset:   0:23d165967da3
  user:        Gregory Szorc <gps@mozilla.com>
  date:        Sun Jun 22 13:48:05 2014 -0700
  summary:     initial commit

We see our mq patch is now applied on top of the second commit, d5741ada659d, because it was pushed while the working directory was sitting at d5741ada659d.

But let's see what mq says:

$ cat .hg/patches/p1
# HG changeset patch
# Parent 23d165967da39e5846976eb6ed967f1058827ffa
# User Gregory Szorc <gps@mozilla.com>
make p1

diff --git a/foo b/foo
--- a/foo
+++ b/foo
@@ -1,1 +1,1 @@
-foo
+patch 1

mq still says 23d165967da3 is the parent, even though we changed the parent! mq is lying to us! Now, we can rectify the situation by running hg qrefresh. That will cause mq to regenerate the patch file, which will pick up the new parent. But who does that? Unless you need to qrefresh to resolve conflicts or other changes, nobody. This means that if you distribute the patch - say you are uploading it for review - the parent changeset may not be accurate and anyone applying that patch may apply it to the wrong changeset and get unexpected results. That's no good.
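If you want to spot a stale header before sharing a patch, one rough check is to read the # Parent line out of the patch file and compare it with the changeset the patch is actually sitting on. The sketch below runs against a scratch copy of the header shown above rather than a real queue. The function name recorded_parent is made up for this example, and the applied parent is hard-coded from the log output above; in a real repository I'd expect something like hg log -r qparent --template '{node}\n' to supply it, but treat that as an assumption to verify.

```shell
#!/bin/sh
# Print the parent changeset recorded in an mq patch file's header.
# (recorded_parent is a hypothetical helper, not part of mq.)
recorded_parent() {
    sed -n 's/^# Parent  *//p' "$1" | head -n 1
}

# Scratch copy of the patch header shown above.
patch_file=$(mktemp)
cat > "$patch_file" <<'EOF'
# HG changeset patch
# Parent 23d165967da39e5846976eb6ed967f1058827ffa
# User Gregory Szorc <gps@mozilla.com>
make p1
EOF

recorded=$(recorded_parent "$patch_file")

# Short hash of the changeset the patch is applied on, copied from the
# hg log --graph output above (the changeset tagged qparent).
actual=d5741ada659d

# The header is accurate only if the recorded hash starts with the
# short hash of the real parent.
case "$recorded" in
    "$actual"*) verdict="header matches the applied parent" ;;
    *)          verdict="header is stale" ;;
esac

echo "$verdict: recorded $recorded, applied on $actual"
```

Running hg qrefresh and repeating the check should make the two values agree, since qrefresh regenerates the patch file.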

For what it's worth, this behavior of not auto refreshing patches is by design. We don't necessarily want application to be a mutating operation, especially since mq repositories aren't automatically versioned. This is partially because mq was effectively designed to be quilt built into Mercurial. And quilt is a tool that was invented before the modern version control tools era. mq's behavior is influenced more from emulating quilt than from the desire to facilitate a sane patch/stack/feature based workflow. If you ever wanted to know why mq works the way it does and not like something more modern, that's why.

Another huge reason to not use mq is that its concepts and workflows are alien to modern and well-understood practices. While I'm a huge fan of Mercurial and prefer it over Git (read my thoughts on the topic), I don't pretend that Git isn't the most widely used version control software right now (at least in the open source world). It got that way despite its horrible UI shortcomings. I argue its rise was because people enjoy workflows such as lightweight branches and fast, often-simple merging. (GitHub and its fairly good web UI certainly helped the cause.) People today know Git. They know lightweight branches. I think they largely understand the concept of repositories with multiple heads and grok how a distributed version control system means you can commit locally without affecting a remote (server). They grok how you can push local commits to a remote (sometimes in the form of a pull request). mq doesn't work this way - the way that people have come to expect from Git. There are cognitive biases that lead us to believe that because mq is different (from Git) that it is inferior. Why should someone apply and unapply individual patches on a temporary queue (or stack depending on how you think about it) instead of working with branches? If someone groks Git and multiple heads/branches/bookmarks, why should they go through all the trouble of learning mq? The fact that mq doesn't work as well as non-mq mechanisms is icing on the figurative cake.

If the reasons I've listed are not enough, know this: mq users will not be able to fully utilize the game-changing Changeset Evolution feature of Mercurial. Remember in the late 2000's when people thought Git was crazy for allowing you to mutate history - the antithesis of version control because it threw away data? Changeset Evolution and obsolescence markers - the mechanism used to enable Changeset Evolution - are Mercurial's answer to that. They enable Mercurial to retain metadata about previous changesets and how they evolved into newer changesets. That criticism about history rewriting deleting history: Changeset Evolution makes it largely go away. A light version of the history of past commits is preserved and propagated during push and pull operations.

Changeset Evolution changes the game much like Git changed the landscape of version control in the late 2000's. So much of our notions of how modern version control works are based on the current behavior of tools. For example, any Git user will tell you, never git push --force. Bad things will happen. Why do they say that? They say that because the user experience of distributed history rewriting in Git can be horrible and error prone. And it is like this because Git's implementation of history rewriting burns the old books of history once they are translated to the new world order. (OK, to be fair to Git it does have a reflog that can kinda sorta be used to recover in disaster scenarios. But it's far from robust and there are numerous caveats, notably that the reflog is not part of data distributed with push/fetch and therefore susceptible to single/few points of failure.) Mercurial's obsolescence markers, by contrast, leave footnotes in our history books and transfer these annotations as part of the distributed repository data. Changeset Evolution uses these footnotes to reconstruct the pages of history, even if they differ from our own perspective. For example, you can force push rewritten history and other clients can recover from that automagically. Want to do complex history rewriting and force push? Go right ahead: Mercurial and Changeset Evolution will enable you to work without imposing as many (practical) restrictions on your workflow as other tools. Good tools are flexible and impose fewer restrictions. This is why Mercurial and Changeset Evolution have the potential to change the world of version control much like Git changed things half a decade ago.

Are you still not convinced mq is a bad idea? Know this: a Mercurial core developer recently proposed marking mq as deprecated. That's right: there's very little love for mq within the Mercurial Project. Maintainers generally think what I've stated in this post: mq is yesterday's solution to yesterday's problem and it should be cast aside.

If you are a mq user, I hope you are now convinced that mq is a bad idea and you should transition away from it as soon as possible. In the next section, I'll talk about moving off mq.

Moving away from mq

I completely abandoned mq in December 2013 and am never using it again. In hindsight, I should have made the transition much sooner.

I could write an entire post about my workflow. Since this post is already a bit long, I'm going to describe it with mostly command line examples.

I use bookmarks for managing my feature branches. I use one bookmark per logical feature. This workflow is very similar to using Git branches.

# Create a new bookmark for the new feature I'm developing.
$ hg bookmark gps/my-new-feature

# make changes to files
(gps/my-new-feature) $ hg commit
# make more changes
(gps/my-new-feature) $ hg commit
# keep making small changes
(gps/my-new-feature) $ hg commit

# I decide I want to reorder or merge a few commits. I look at the
# DAG to see which changeset to rewrite.
(gps/my-new-feature) $ hg log --graph

(gps/my-new-feature) $ hg histedit 5e907ed10e22
# (this opens an editor where I can say what changes to make)

# Let's start a new feature.
(gps/my-new-feature) $ hg up central
(leaving bookmark gps/my-new-feature)
$ hg bookmark gps/feature2
# Make some changes
(gps/feature2) $ hg commit

# I got review on my original feature. Let's push it.
(gps/feature2) $ hg up gps/my-new-feature
(activating bookmark gps/my-new-feature)

(gps/my-new-feature) $ hg pull gecko
pulling from http://hg.stage.mozaws.net/gecko
searching for changes
adding changesets
adding manifests
adding file changes
added 118 changesets with 543 changes to 328 files (+1 heads)
OBSEXC: pull obsolescence markers
OBSEXC: looking for common markers in 215590 nodes
OBSEXC: no unknown remote markers
OBSEXC: DONE
updating bookmark aurora
updating bookmark b2g28
updating bookmark b2ginbound
updating bookmark beta
updating bookmark central
updating bookmark esr24
updating bookmark fx-team
updating bookmark inbound
updating bookmark release
(run 'hg heads .' to see heads, 'hg merge' to merge)

(gps/my-new-feature) $ hg rebase -d fx-team

# Verify I'm about to push what I want to push.
(gps/my-new-feature) $ hg out -r . fx-team

# And push the changes.
(gps/my-new-feature) $ hg push -r . fx-team

# And delete the bookmark that I don't need any more.
(gps/my-new-feature) $ hg bookmark -d gps/my-new-feature

That's how bookmarks work. If you upgrade to Mercurial 3.0 or newer, Mercurial will print messages when you enter and leave bookmarks. This is a great UI improvement!
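The session above leans on hg rebase and hg histedit. Both ship with Mercurial, but they are extensions, so they have to be switched on before the commands exist (bookmarks, by contrast, are in core). A minimal sketch of enabling them follows. It writes to a scratch file by default, and HGRC_FILE is a variable name made up for this example; point it at ~/.hgrc to apply the settings for real.

```shell
#!/bin/sh
# Where to write the settings. Defaults to a scratch file so the sketch is
# safe to try; set HGRC_FILE="$HOME/.hgrc" to enable the extensions for real.
HGRC_FILE="${HGRC_FILE:-$(mktemp)}"

# Append an [extensions] section listing the bundled extensions to enable.
cat >> "$HGRC_FILE" <<'EOF'
[extensions]
# An empty value enables an extension that is bundled with Mercurial.
rebase =
histedit =
EOF

echo "extensions written to $HGRC_FILE"
```

The record and shelve extensions mentioned in this post can be enabled the same way, by adding record = and shelve = lines to the same section.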

I also have the prompt extension installed and configured so my shell prompt contains the active bookmark. This helps me keep track of what head I'm committing to.

I'm also a user of the experimental evolve extension. This is the extension that is implementing the Changeset Evolution feature. Eventually the functionality will be merged into core Mercurial. One such workflow with evolve is as follows.

# Create a bookmark for a new feature
$ hg bookmark gps/my-new-feature

# Make some changes.
(gps/my-new-feature) $ hg commit -m 'first commit'

# Make some more changes.
(gps/my-new-feature) $ hg commit -m 'second commit'

# Submit code for review. (Exact commands excluded.)

# Wait for review comments. There is a nit on the first commit.
$ hg previous
[204142] first commit

# Make changes
$ hg amend
1 new unstable changesets

# (amend is provided by evolve. It is equivalent to hg commit --amend)

# evolve is the command that looks at the history rewriting markers
# and knows how to rebase, etc other changesets in the face of
# rewriting.
$ hg evolve
move:[204143] second commit
atop:[204144] first commit
merging foo
warning: conflicts during merge.
merging foo incomplete! (edit conflicts, then use 'hg resolve --mark')
evolve failed!
fix conflict and run "hg evolve --continue"
abort: unresolved merge conflicts (see hg help resolve)

# Manually address conflict markers in file "foo"
$ hg resolve -m foo
no more unresolved files

$ hg evolve --continue
grafting revision 204143

# And I'm ready to push.

If you really love the concept of mq and patch queues and absolutely must do development that way (as opposed to bookmarks/branches), then you should consider using the shelve extension instead of mq. While shelve is still a bit hacky in that it violates the append-only goal of the Mercurial store, it does so in a way that integrates much better than mq. For example, when you unshelve, the changesets will only apply to the parent changeset they were last attached to. You will get no merge conflicts during unshelve. If you want to rebase, you will need to use the rebase command, which will go through Mercurial's more robust merging code paths, leaving conflict markers instead of .rej files (assuming you are using the internal merge tool). Shelve also groups multiple changesets together. Contrast with mq's default behavior of a flat list of patches (my mq queue grew to over 100 patches with patches belonging to dozens of bugs). Shelve is all around a better mq and is a good compromise between the stack/queue based development workflow of mq with the more modern development workflow of bookmarks. If nothing else, shelve integrates much better into the Mercurial core. As a quantitative measurement of this, mq weighs in at 3461 lines. Shelve, by comparison, is a svelte 704 lines. Considering they do nearly the same thing (shelve offloads functionality like folding and reordering to core Mercurial commands that didn't exist at the time of mq's inception), I trust shelve with its reduced surface area to be more robust and bug free.

But as good as shelve is, it's still not as integrated as bookmarks or branches. If you can, try to stick to the Mercurial core features and avoid shelve. But whatever you do, try to avoid mq!

Conclusion

I never used Mercurial before becoming an employee of Mozilla and a contributor to Firefox. I was a mq user for my first two plus years of Mercurial use. I blindly followed the Mozilla documentation and advice of my fellow Mozillians to use mq. It wasn't until I truly invested in learning Mercurial (as opposed to learning merely the command line interface - the minimum required for me to contribute to Firefox) that I realized how wrong the common Mozilla mindset about Mercurial and mq was. I was incorrectly associating mq with "The Mercurial Way" and thought Mercurial was inherently lacking. I knew about branches and bookmarks, but didn't realize I could use them as an alternative to mq.

Only after learning Mercurial's internals, contributing patches to the Mercurial core, and reading the changelog of the Mercurial project itself did I realize the truth: mq's popularity is a result of historical necessity that no longer exists; and, continued usage of mq represents a general failure in user education about mq's drawbacks and Mercurial's modern features. In other words, the more informed I became, the more I learned what a bad idea mq is and why it should be avoided.

At Mozilla, we happened to switch to Mercurial at a time (2007) when mq was the only reasonable solution available to Mercurial. The procedures and documentation developed in 2007 have persisted to 2014, largely ignoring advancements in Mercurial over that time. To be fair to my fellow Mozillians and Mercurial users everywhere, Mercurial wasn't truly ready to jettison mq until the histedit extension came into existence a few years ago. But that was years ago. We all should have had time to switch, but we haven't. Or if we have, we've switched to Git. (And I can't blame anyone for switching to Git - both its mind share and the network effect of GitHub are very compelling reasons.)

I hope you now know why mq is a bad idea. Please join me in realizing that mq was a solution for yesterday's problems. mq is a legacy tool that loses data and makes your life harder. It's time to let go of the past and embrace the future. Please join me in shaving the mq neck beard that has been growing since CVS was a popular version control system.

Addendum on Code Review

While I've wanted to write this post for a while, the trigger for me finally writing it is my work on integrating Mercurial with the Review Board code review software. The Bugzilla Team at Mozilla is working on a project to integrate Review Board into Bugzilla for code review, supplementing the antiquated-by-comparison Splinter code review feature. Instead of uploading patches to Bugzilla - how Mozilla and Firefox development has done it for over a decade - you will push changesets to a Mercurial repository and pushed changesets will automatically turn into code reviews on Review Board and get cross-posted to referenced Bugzilla bugs.

This is all part of a larger goal to make it easier for patches to be submitted and landed consistently and centrally. This should lead to various automation and tooling improvements, such as bots conducting aspects of code review and automatically landing reviewed patches.

These things are very difficult to do in a Bugzilla-only and patch-based world because it is difficult to first aggregate patches and second process them in a unified manner. Patches come in many different flavors. You have Mercurial style diffs. You have plain patches, and patches with Git or Mercurial style annotation. You have different lines of context in the patch. You have different handling of binary content. You have mq lying about the parent changeset. All these contribute to a massive impedance mismatch of sorts that makes it extremely difficult to roll out any kind of automation. Any tools built on top must first solve a very difficult data normalization and distribution problem. These are problems that should not exist and drastically raise the barrier to forward progress.

Contrast this with pushing to Mercurial (or Git) repositories. The repository is the data, not the patch file. Clients are able to extract variations of that data however they see fit. Commit data is centralized, distributable, and unified. There are dozens of existing tools and services that already know how to speak with repositories. Our data problem not only goes away, but many solutions have already been invented for us.

Back to Review Board.

I've written an extension that pushes review-specific metadata to the Mercurial server. This effectively allows Mercurial to speak code review and interact with the code review server. For Mercurial clients that are writing obsolescence markers, we are able to track the changesets undergoing review against their logical successors. For example, if you pull and rebase, this will rewrite changeset X to changeset Y (because the SHA-1 changes). Mercurial knows that a) changeset X was under review and b) Y is the new version of X. So, when you push Y for code review, the review associated with X is updated automagically. You could accomplish this with heuristic-based matching or user prompting, but these are error prone (you wouldn't believe the number of corner cases) or require excessive user time, especially when complex history rewriting is involved. Mercurial's obsolescence markers allow the matching to just work in most scenarios and for people to spend more time writing and reviewing code as opposed to telling the code review tool what and how to review.
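
The X-to-Y matching described above can be illustrated with a small sketch. This is not the extension's code: the data model (a plain map from a rewritten changeset to its single successor), the function names, and the review ID are all hypothetical, and real obsolescence markers also cover cases such as splits and prunes that are ignored here.

```javascript
// Hypothetical data model: each marker maps a rewritten changeset
// (the precursor) to the one changeset that replaced it (the successor).
function latestSuccessor(markers, node) {
  const seen = new Set();
  let current = node;
  // Follow the chain X -> Y -> Z until no further rewrite is recorded.
  while (markers.has(current) && !seen.has(current)) {
    seen.add(current);
    current = markers.get(current);
  }
  return current;
}

// Find the existing review (if any) that a pushed changeset should update.
function findReviewForPush(reviews, markers, pushedNode) {
  for (const [reviewedNode, reviewId] of reviews) {
    if (latestSuccessor(markers, reviewedNode) === pushedNode) {
      return reviewId;
    }
  }
  return null; // No precursor is under review, so this would be a new review.
}

const markers = new Map([["X", "Y"]]); // a rebase rewrote X into Y
const reviews = new Map([["X", 42]]);  // X is under review as (made-up) review 42

console.log(findReviewForPush(reviews, markers, "Y")); // 42
```

Because the link from X to Y is recorded when the rewrite happens, no guessing from commit messages or diffs is needed at push time.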

As I was implementing this extension, I realized that mq users (most Mercurial users at Mozilla I reckon) would be unable to reap the benefits of all this magic because, well, they are using mq. I care passionately about developer productivity. Good tools are the tools that don't require constant massaging. I want everyone to get the full benefits of Mozilla's new code review system when it launches. And the only way that happens is if people stop using mq. And the only way that happens is if people learn why mq is a bad idea. So if nothing else has convinced you to stop using mq yet, maybe the lure of an amazing and automagical code review and collaboration experience will sway you.


Using Mercurial for Status Reports

April 01, 2014 at 12:30 PM | categories: Mercurial, Mozilla

Mercurial has a pair of amazing features called Revision Sets and Templates. Combined, they allow you to query Mercurial like a database and to generate custom reports from obtained data.

As I've demonstrated, you can write Mercurial extensions to provide custom revision set queries and template functions and keywords. My mozext extension aggregates Mozilla's pushlog data into a local SQLite database and makes this data available to revision sets and templates.

My hack of the day is to use revision sets and templates to create a weekly status report:

hg log -r 'public() and me() and firstpushdate("-7")' \
--template '* {ifeq(reviewer, "gps", "Review: ", "Landing: ")}{firstline(desc)}\n'

When I run this, I get the output:

* Review: Bug 957241 - Don't package the full sdk when we don't need it. r=gps
* Review: Bug 987146 - Represent SQL queries more efficiently. r=gps.
* Review: Bug 987984 - VirtualenvManager.call_setup() should use self.python_path instead of sys.executable, r=gps
* Landing: Bug 987398 - Part 1: Run mochitests from manifests with mach; r=ahal
* Landing: Bug 987398 - Part 2: Handle install-to-subdir in TestResolver; r=ahal
* Landing: Bug 987414 - Pass multiple test arguments to mach testing commands; r=ahal
* Review: Bug 988141 - Clean up config/recurse.mk after bug 969164. r=gps
* Landing: Bug 973992 - Support experiments add-ons; r=Unfocused
* Review: Bug 927672 - Force pymake to fall back to mozmake when run on build slaves. r=gps
* Review: Bug 989147 - Use new sccache for Linux and Android builds. r=gps
* Review: Bug 989147 - Add missing part of the patch from rebase conflict. r=gps
* Landing: Bug 975000 - Disable updating and compatibility checking for Experiments; r=Unfocused
* Landing: Bug 985084 - Experiment add-ons should be disabled by default; r=Unfocused
* Landing: Backed out changeset 4834a3833639 and c580afddd1cb (bug 985084 and bug 97500)
* Landing: Bug 975000 - Disable updating and compatibility checking for Experiments; r=Unfocused
* Landing: Bug 985084 - Experiment add-ons should be disabled by default; r=Unfocused
* Landing: Bug 989137 - Part 1: Uninstall unknown experiments; r=Unfocused
* Landing: Bug 989137 - Part 2: Don't use a global logger; r=gfritzsche
* Landing: Bug 989137 - Part 3: Log.jsm API to get a Logger that prefixes messages; r=bsmedberg
* Landing: Bug 989137 - Part 4: Use a prefixing logger for Experiments logging; r=gfritzsche
* Landing: Bug 989137 - Part 5: Prefix each log message with the instance of the object; r=gfritzsche
* Review: Bug 988849 - Add mach target for jit tests; r=gps
* Landing: Bug 989137 - Part 6: Create experiment XPIs during the build; r=bsmedberg
* Landing: Bug 989137 - Part 7: Remove unncessary content from test experiments; r=Unfocused
* Landing: Bug 985084 - Part 2: Properly report userDisabled in the API; r=Unfocused

I can then copy and paste this output directly into the status tool to capture all my weekly code contributions! The command takes a few seconds to run and saves me a few minutes of typing.

For the curious, let's break that Mercurial command down.

  • public() selects all public changesets. These are changesets in the repository that have been pushed to a publishing repository. In other words, patches that landed in Firefox.
  • me() is a custom revset from my mozext extension that parses the commit message and selects changesets that I authored or reviewed.
  • firstpushdate("-7") is a custom revset from my mozext extension. It selects changesets that were first pushed in the last 7 days (using pushlog data stored in a local SQLite database).

The template piece should be easy to read. I have a simple branch testing whether the changeset is a review or not, then output a label followed by the first line of the commit message.
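
For readers who don't know the template syntax, here is roughly the same logic written as plain JavaScript. It is only an analogy: the changeset objects and function names are invented for illustration, and the reviewer is assumed to have already been extracted from the commit message.

```javascript
// Mirrors firstline(desc): keep only the first line of the commit message.
function firstLine(desc) {
  return desc.split("\n")[0];
}

// Mirrors {ifeq(reviewer, "gps", "Review: ", "Landing: ")}{firstline(desc)}.
// `changeset` is a hypothetical object, not a Mercurial data structure.
function statusLine(changeset, me) {
  const label = changeset.reviewer === me ? "Review: " : "Landing: ";
  return "* " + label + firstLine(changeset.desc);
}

const reviewed = {
  reviewer: "gps",
  desc: "Bug 988849 - Add mach target for jit tests; r=gps",
};
const landed = {
  reviewer: "ahal",
  // The body text after the blank line is made up for this example.
  desc: "Bug 987414 - Pass multiple test arguments to mach testing commands; r=ahal\n\n(made-up body text)",
};

console.log(statusLine(reviewed, "gps"));
// * Review: Bug 988849 - Add mach target for jit tests; r=gps
console.log(statusLine(landed, "gps"));
// * Landing: Bug 987414 - Pass multiple test arguments to mach testing commands; r=ahal
```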

I have this command saved under the [alias] section of my ~/.hgrc file so I can just type hg statusreport.

While there is room to improve the tool (stripping r= lines from commit messages for example), I think it's a pretty cool hack and shows how Mercurial can grow to solve problems you don't think your version control system knows how to solve.
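
As a sketch of the r= stripping idea, the following JavaScript post-processes a summary line. The regular expression is an assumption based only on the annotation shapes visible in the sample output ("; r=ahal", ". r=gps", ", r=gps", and a trailing period); other commit message styles would need more cases.

```javascript
// Assumed shapes: optional separator, then r=<names>, optionally followed
// by a period, at the very end of the summary line.
const REVIEWER_SUFFIX = /[\s;,.]*\br=[\w,]+\.?$/;

function stripReviewer(summary) {
  return summary.replace(REVIEWER_SUFFIX, "");
}

console.log(stripReviewer("Bug 987414 - Pass multiple test arguments to mach testing commands; r=ahal"));
// Bug 987414 - Pass multiple test arguments to mach testing commands
console.log(stripReviewer("Bug 987146 - Represent SQL queries more efficiently. r=gps."));
// Bug 987146 - Represent SQL queries more efficiently
```

Summaries without a trailing annotation, such as the backout line in the report, pass through unchanged.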


How Promises and Tasks are Improving Tests

March 30, 2014 at 02:15 PM | categories: Mozilla, JavaScript

I was a very early adopter of promises and Tasks in Firefox's JavaScript code base. To me, promises on their own are ok. The ability to chain promises together and tack one error handler on the end sure beats the Pyramid of Doom and having to pass errors into callbacks everywhere. But what really lured me in were tasks: using generators (then a feature only available in SpiderMonkey) to represent async code flow as nice, easy-to-read procedural flow that nearly every programmer can relate to. It made code much easier to read and grok. I've been using tasks ever since.

When I started writing new APIs that returned promises instead of using callbacks, I found myself writing a lot of tests consuming promises and using tasks. So, I added an add_task API to our xpcshell test harness to make writing task-based unit tests involve less boilerplate. That API is now used heavily for new xpcshell tests.

While I initially added add_task() to cut down on the boilerplate for writing tests, I only recently realized it has another benefit: it's helped cut down on hung tests!

Before, with callback-based APIs, we'd code tests like so:

add_test(function () {
  do_something(function onThatThing(result) {
    Assert.ok(result.success);
    run_next_test();
  });
});

Or another pattern:

add_test(function () {
  do_something(function onThatThing(result) {
    // The next line throws an Error by accident!
    result.foo();
    run_next_test();
  });
});

In the first example, the test will hang if the callback never gets called. The test harness driver will eventually terminate the test (after a multi-second delay with no output). Not good.

In the second example, we are still susceptible to the callback not being called. But we have a different problem: an untrapped Error is thrown from a callback! This results in the same behavior: run_next_test() (the function that says to advance to the next test) won't execute and the test will hang until it times out.

A more proper way to write this test is:

add_test(function () {
  do_something(function onThatThing(result) {
    try {
      result.foo();
    } catch (ex) {
      do_report_unexpected_exception(ex);
    }

    run_next_test();
  });
});

In reality, few people surround all their callbacks with try..catch blocks because, well, it's a lot of typing and people don't always think it's necessary (the test passes most of the time, doesn't it?).

What promises and task-based tests are doing is enabling us to write more robust tests without all of the extra work. Here is how you would use task-based tests:

add_task(function* () {
  let result = yield do_something();
  // The next line throws an Error by accident!
  result.foo();
});

Here, the Error thrown by the test function is thrown within the context of an executing Task. It is caught by the Task and converted into a rejected promise. The test harness sees that failure immediately and no timeout occurs! This can cut down on overhead when writing tests, especially if you are trying to debug a hang.
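
To make the mechanism concrete, here is a minimal generator-driven runner written in plain JavaScript. It is a simplified stand-in, not the Task implementation or the xpcshell harness: runTask and doSomething are hypothetical names, and the real code handles more cases.

```javascript
// Drive a generator: each yielded promise is waited on and its value is sent
// back into the generator. Any exception thrown by the generator body is
// caught here and becomes a rejection of the returned promise.
function runTask(generatorFunction) {
  return new Promise((resolve, reject) => {
    const iterator = generatorFunction();

    function step(advance) {
      let result;
      try {
        result = advance();
      } catch (ex) {
        reject(ex); // The "accidental" Error ends up here.
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (value) => step(() => iterator.next(value)),
        (error) => step(() => iterator.throw(error))
      );
    }

    step(() => iterator.next());
  });
}

// Hypothetical promise-returning API standing in for do_something().
function doSomething() {
  return Promise.resolve({ success: true });
}

const outcome = runTask(function* () {
  const result = yield doSomething();
  result.foo(); // Throws a TypeError: result.foo is not a function.
});

outcome.then(
  () => console.log("test passed"),
  (ex) => console.log("test failed immediately: " + ex.name)
);
// test failed immediately: TypeError
```

The caller only ever sees a promise that resolves or rejects, so a harness built this way has a single place to detect failure instead of relying on every callback to report its own errors.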

Furthermore, the test is 4 lines versus 10. Less typing means you have more time to write additional tests or you can focus on writing other patches.

Finally, the task-based test functions are easier to understand. That 4 line, procedural test is much easier to grok than its callback-based counterpart.

And before I conclude, I should mention that we can do more with promises. For example, bug 976205 is making uncaught promise errors turn into test failures! There is also an awesome patch in bug 867742 to introduce a unified JavaScript test harness API for defining JavaScript tests in our tree (currently the APIs for xpcshell tests and mochitests are different, leading to cognitive dissonance and lower productivity). If you want to be a hero to the Firefox developer community, help finish that patch.

Given that so much Firefox feature development time (at Mozilla) is spent writing and debugging tests, I encourage everyone to consider promises and tasks for their next feature. Doing so can cut down on development time and help you complete projects faster.


New Repository for Mozilla Version Control Tools

February 05, 2014 at 07:15 PM | categories: Git, Mercurial, Mozilla

Version control systems can be highly useful tools.

At Mozilla, we've made numerous improvements and customizations to our version control tools. We have custom hooks that run on the server. We have a custom skin for Mercurial's web interface. Mozillians have written a handful of Mercurial extensions to aid with common developer tasks, such as pushing to try, interacting with Bugzilla, making mq more useful, and more.

These have all come into existence in an organic manner, one after the other. Individuals have seen an itch and scratched it. Good for them. Good for Mozilla.

Unfortunately, the collection of amassed tools has become quite large. They have become difficult to discover and keep up to date. The consistency in quality and style between the tools varies. Each tool has separate processes for updating and changing.

I contacted the maintainers of the popular version control tools at Mozilla with a simple proposal: let's maintain all our tools under one repo. This would allow us to increase cohesion, share code, maintain a high quality bar, share best practices, etc. There were no major objections, so we now have a unified repository containing our version control tools!

Currently, we only have a few Mercurial extensions in there. A goal is to accumulate as much of the existing Mercurial infrastructure into that repository as possible. Client code. Server code. All of the code. I want developers to be able to install the same hooks on their clients as what's running on the server: why should your local repo let you commit something that the server will reject? I want developers to be able to reasonably reproduce Mozilla's canonical version control server configuration locally. That way, you can test things locally with a high confidence that your changes will work the same way on production. This allows deployments to move faster and with less friction.

The immediate emphasis will be on moving extensions into this repo and deprecating the old homes on user repositories. Over time, we'll move into consolidating server code and getting hg.mozilla.org and git.mozilla.org to use this repository. But that's a lower priority: the most important goal right now is to make it easier and friendlier for people to run productivity-enhancing tools.

So, if you see your Mercurial extensions alerting you that they've been moved to a new repository, now you know what's going on.

