Using Mercurial to query Mozilla metadata

November 08, 2013 at 09:42 AM | categories: Mercurial, Mozilla

I have updated my Mercurial extension tailored for Gecko/Firefox development with features that support rich querying of metadata specific to Mozilla/Gecko development!

The extension now comes with a bag full of revision set selectors and template keywords. You can use them to query and format Mozilla-central metadata from the repository.

Revision set selectors

You can now select changesets referencing a specific bug number:

hg log -r 'bug(931383)'

Or that were reviewed by a specific person:

hg log -r 'reviewer(gps)'

Or that were reviewed or not reviewed:

hg log -r 'reviewed()'
hg log -r 'not reviewed()'

You can now select changesets that are present in a specific tree:

hg log -r 'tree(central)'

I've also introduced support for querying changesets you influenced:

hg log -r 'me()'

(This finds changesets you authored or reviewed.)

You can select changesets that initially landed on a specific tree:

hg log -r 'firstpushtree(central)'

You can select changesets marked as DONTBUILD:

hg log -r 'dontbuild()'

You can select changesets that don't reference a bug:

hg log -r 'nobug()'

You can select changesets that were push heads for a tree:

hg log -r 'pushhead(central)'

(This would form the basis of a push-aware bisection tool - an excellent idea for a future feature in this extension.)
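
As a rough sketch of what push-aware bisection could look like (this is not part of the extension), a binary search over the push heads of a tree only ever tests revisions that were the head of a push. The names bisect_push_heads and is_good are hypothetical. The sketch assumes the heads are ordered oldest to newest, the first one is known good, the last one is known bad, and the failure persists once introduced.

```python
from typing import Callable, Sequence


def bisect_push_heads(push_heads: Sequence[str],
                      is_good: Callable[[str], bool]) -> str:
    """Return the first bad push head.

    Assumes push_heads is ordered oldest to newest, push_heads[0] is good,
    push_heads[-1] is bad, and every head after the first bad one is bad.
    is_good is a caller-supplied (hypothetical) check, e.g. "did the
    build and tests pass for this revision?".
    """
    if len(push_heads) < 2:
        raise ValueError("need at least one good and one bad push head")

    good, bad = 0, len(push_heads) - 1
    # Invariant: push_heads[good] is good and push_heads[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(push_heads[mid]):
            good = mid
        else:
            bad = mid
    return push_heads[bad]
```

The list of heads could be produced with something like hg log -r 'pushhead(central)' --template '{node}\n' and fed in line by line.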

You can combine these revset selector functions with other revset selectors to do some pretty powerful things.

To select all changesets on inbound but not central:

hg log -r 'tree(inbound) - tree(central)'

To find all your contributions on beta but not release:

hg log -r 'me() & (tree(beta) - tree(release))'

To find all changesets referencing a specific bug that have landed in Aurora:

hg log -r 'bug(931383) and tree(aurora)'

To find all changesets marked DONTBUILD that landed directly on central:

hg log -r 'dontbuild() and firstpushtree(central)'

To find all non-merge changesets that don't reference a bug:

hg log -r 'not merge() and nobug()'

Neato!

Template keywords

You can also now print some Mozilla information when using templates.

To print the main bug of a changeset, use:

{bug}

To retrieve all referenced bugs:

{bugs} {join(bugs, ', ')}

To print the reviewers:

{reviewer} {join(reviewers, ', ')}

To print the first version in which a changeset appeared on a specific channel:

{firstrelease} {firstbeta} {firstaurora} {firstnightly}

To print the estimated first Aurora and Nightly date for a changeset, use:

{auroradate} {nightlydate}

(Getting the exact first Aurora and Nightly dates requires consulting 3rd party services, which we don't currently do. I'd like to eventually integrate these into the extension. For now, it just estimates dates from the pushlog data.)

You can also print who first pushed a changeset and which tree it was pushed to:

{firstpushuser} {firstpushtree}

You can also print the TBPL URL with the results of the first push:

{firstpushtbpl}

Here is an example that prints channel versions and dates for each changeset:

hg log --template '{rev} Nightly: {firstnightly} {nightlydate}; Aurora {firstaurora} {auroradate}; Beta: {firstbeta}; Release: {firstrelease}\n'

Putting it all together

Of course, you can combine selectors and templates to create some mighty powerful queries.

To look at your impact on Mozilla, do something like:

hg log --template '{rev} Bug {bug}; Release {firstrelease}\n' -r 'me()'

You can easily formulate a status report for your activity in the past week:

hg log --template '{firstline(desc)}\n' -r 'firstpushdate(-7) and me()'

You can also query Mercurial to see where changesets have been landing in the past 30 days:

hg log --template '{firstpushtree}\n' -r 'firstpushdate(-30)' | sort | uniq -c

You can see who has been reviewing lots of patches lately:

hg log --template '{join(reviewers, "\n")}\n' -r 'firstpushdate(-30)' | sort | uniq -c | sort -n

(smaug currently has the top score, edging out my 116 reviews with 137.)

If you want to reuse templates (instead of having to type them on the command line), you can save them as style files. Search the Internets to learn how to use them. You can even change your default style so the default output from hg log contains everything you'd ever want to know about a changeset!

Keeping it running

Many of the queries rely on data derived from multiple repositories and pushlog data that is external to the repository.

To get best results, you'll need to be running a monolithic/unified Mercurial repository. You can either assemble one locally with this extension by periodically pulling from the separate repos:

hg pull releases
hg pull integration

Or, you can pull from my personal unified repo.

You will also need to ensure the pushlog data is current. If you pull directly from the official repos, this will happen automatically. To be sure, run:

hg pushlogsync

Finally, you can force a repopulation of cached bug data by running:

hg buginfo --reset

Over time, I want all this to automagically work. Stay tuned.

Comments and future improvements

I implemented this feature to save myself from having to go trawling through Bugzilla and repository history to answer questions and to obtain metrics. I can now answer many questions via simple Mercurial one-liners.

Custom revision set selectors and template keywords are a pretty nifty feature of Mercurial. They demonstrate how you can extend Mercurial to be aware of more than just tracking commits and files. As I've said before and will continue to say, the extensibility of Mercurial is really its killer feature, especially for organizations with well-defined processes (like Mozilla). The kind of extensibility I achieved with this extension with custom queries and formatting functions is just not possible with Git (at least not with the reference C implementation that the overwhelming majority of Git users use).

There are numerous improvements that can be made to the extension. Obviously more revision set selectors and template keywords can be added. The parsing routine to extract bugs and reviewers isn't the most robust in the world. I copied some existing Mozilla code. It does well at detecting string patterns but doesn't cope well with extracting lists.
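
To give a feel for what this kind of parsing involves, here is a minimal sketch that pulls bug numbers and reviewers out of a commit message with regular expressions. It is an illustration only, not the code the extension uses. The function names are hypothetical, and the message conventions it assumes ("Bug 931383", "r=gps,smaug", "sr=roc") are a simplification of what appears in real commit messages.

```python
import re

# Matches "Bug 931383" or "bug 931383" anywhere in the message.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)

# Matches "r=gps", "r=gps,smaug", or "sr=roc". The captured group is the
# comma-separated list of names (no spaces allowed inside the list).
REVIEWER_RE = re.compile(r"\bs?r=([\w\-]+(?:,[\w\-]+)*)")


def parse_bugs(message):
    """Return referenced bug numbers in order of appearance, no duplicates."""
    bugs = []
    for match in BUG_RE.finditer(message):
        bug = int(match.group(1))
        if bug not in bugs:
            bugs.append(bug)
    return bugs


def parse_reviewers(message):
    """Return reviewer names in order of appearance, no duplicates."""
    reviewers = []
    for match in REVIEWER_RE.finditer(message):
        for name in match.group(1).split(","):
            if name not in reviewers:
                reviewers.append(name)
    return reviewers
```

Treating the first entry returned by parse_bugs as the main bug of a changeset would be one simple policy, though that is an assumption of this sketch.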

I'd also love to better integrate Mercurial with automation results so you can expose a greenpush() selector and do things like hg up -r 'last(tree(inbound) and greenpush())' (which of course could be exposed as lastgreen(inbound)). Wouldn't that be cool? (This would be possible if we had better APIs for querying individual push results.) It would also be possible to have the Mercurial server expose this data as repository data so clients pull it automatically. That would prevent clients from all needing to query the same 3rd party services. Just a crazy thought.

Speed can be an issue. Calculating the release information ({firstnightly}, etc.) is currently slower than I'd like. This is mostly due to me using inefficient algorithms and not caching things where I should. Speed issues should be fixed in due time.

Please let me know if you run into any problems or have suggestions for improvements. If you want to implement your own revision set selectors or template keywords, it's easier than you think! I will happily accept patches. Keep in mind that Mercurial can integrate with 3rd party services. So if you want to supplement repository data with data from a HTTP+JSON web service, that's very doable. The sky is the limit.
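
As a hedged sketch of the HTTP+JSON idea, the snippet below turns a JSON document describing pushes into a lookup table from changeset to the push that first introduced it. The payload shape (push IDs as keys, each with "user", "date" and "changesets" fields) is an assumption made for illustration, and PushInfo and index_pushes are hypothetical names, not part of the extension.

```python
import json
from typing import Dict, NamedTuple


class PushInfo(NamedTuple):
    push_id: int
    user: str
    date: int  # seconds since the Unix epoch


def index_pushes(payload: str) -> Dict[str, PushInfo]:
    """Map each changeset hash to the earliest push that contains it.

    Assumes payload is a JSON object keyed by push ID, where each value
    has "user", "date" and "changesets" entries.
    """
    pushes = json.loads(payload)
    index: Dict[str, PushInfo] = {}
    # Walk pushes in ascending push ID so the first push wins.
    for push_id in sorted(pushes, key=int):
        push = pushes[push_id]
        info = PushInfo(int(push_id), push["user"], push["date"])
        for changeset in push["changesets"]:
            index.setdefault(changeset, info)
    return index
```

The payload string could come from any web service, for example fetched with urllib.request.urlopen from the standard library. A template keyword or revset selector could then consult the resulting table.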