Gregory Szorc's Digital Home

Importance of Hosting Your Version Control Server

November 13, 2013 at 09:25 AM | categories: Git, Mercurial, Mozilla

The subject of where to host version control repositories comes up a lot at Mozilla. It takes many forms:

We should move the Firefox repository to GitHub
I should be allowed to commit to GitHub
I want the canonical repository to be hosted by Bitbucket

When Firefox development is concerned, Release Engineering puts its foot down and insists the canonical repository be hosted by Mozilla, under a Mozilla hostname. When that's not possible, they set up a mirror on Mozilla infrastructure.

I think a recent issue with the Jenkins project demonstrates why hosting your own version control server is important. The gist is someone force pushed to a bunch of repos hosted on GitHub. They needed to involve GitHub support to recover from the issue. While it appears they largely recovered (and GitHub support deserves kudos - I don't want to take away from their excellence), this problem would have been avoided or the response time significantly decreased if the Jenkins people had direct control over the Git server: they could have either installed a custom hook that would have prevented the pushes or used the reflog to see the last pushed revision and force push back to it. GitHub doesn't have a mechanism for defining pre-* hooks, doesn't allow defining custom hooks (a security and performance issue for them), and doesn't expose the reflog data.

Until repository hosting services expose full repository data (such as reflogs) and allow you to define custom hooks, accidents like these will happen and the recovery time will be longer than if you hosted the repo yourself.
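To make the "custom hook" idea concrete, here is a minimal sketch of a Git pre-receive hook written in Python. It is only an illustration under stated assumptions, not what Jenkins or Mozilla actually run. Git feeds a pre-receive hook one line per updated ref on stdin in the form "old-sha new-sha refname", and git merge-base --is-ancestor reports whether an update is a fast-forward. The function names (rejected_updates, git_is_ancestor, is_null) are made up for this example. On a server you control, the built-in receive.denyNonFastForwards and receive.denyDeletes settings cover the simple cases without any hook at all.

```python
#!/usr/bin/env python3
"""Sketch of a pre-receive hook that rejects force pushes and ref deletions."""
import subprocess
import sys


def is_null(sha):
    """An all-zero object ID means "this ref does not exist"."""
    return set(sha) == {"0"}


def git_is_ancestor(old, new):
    """True if old is an ancestor of new, i.e. the update is a fast-forward."""
    result = subprocess.run(["git", "merge-base", "--is-ancestor", old, new])
    return result.returncode == 0


def rejected_updates(lines, is_ancestor=git_is_ancestor):
    """Yield (ref, reason) for every update that rewrites or deletes history.

    is_ancestor is injectable so the logic can be exercised without a repo.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore malformed input
        old, new, ref = parts
        if is_null(old):
            continue  # a brand new ref cannot clobber anything
        if is_null(new):
            yield ref, "deletion"
        elif not is_ancestor(old, new):
            yield ref, "non-fast-forward"


def main():
    problems = list(rejected_updates(sys.stdin))
    for ref, reason in problems:
        print("rejected %s: %s" % (ref, reason), file=sys.stderr)
    # A non-zero exit status makes Git refuse the whole push.
    return 1 if problems else 0

# When installed as hooks/pre-receive, finish the script with:
#     sys.exit(main())
```

The point is less the specific policy than the fact that you can only install something like this when you control the server.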

It's possible repository hosting services like GitHub and Bitbucket will expose these features or provide a means to quickly recover. If so, kudos to them. But larger, more advanced projects will likely employ custom hooks, and considering custom hooks are a massive security and performance issue for any hosted service provider, I'm not going to hold my breath waiting for this particular feature to be rolled out. This is unfortunate, as it seemingly makes projects choose between low risk/low convenience and GitHub's vibrant developer community.

Mercurial 2.8 released

November 08, 2013 at 02:30 PM | categories: Mercurial, Mozilla

Mercurial 2.8 has been released.

The changes aren't as sexy as previous releases. But there are a handful of bug fixes that seem useful to pull in. People may also find the new shelve extension useful.

I encourage Mozillians to keep their Mercurial up to date. I once went around the San Francisco office and stood behind people as they upgraded to a modern Mercurial. For the next few weeks I was hearing a lot of "OMG Mercurial is so much better now." Don't handicap yourself by running an older, buggy Mercurial.

If you don't yet feel comfortable running 2.8, 2.7 should be safe.

Using Mercurial to query Mozilla metadata

November 08, 2013 at 09:42 AM | categories: Mercurial, Mozilla

I have updated my Mercurial extension tailored for Gecko/Firefox development with features that support rich querying of Mozilla/Gecko-development specific metadata!

The extension now comes with a bag full of revision set selectors and template keywords. You can use them to query and format mozilla-central metadata from the repository.

Revision set selectors

You can now select changesets referencing a specific bug number:

hg log -r 'bug(931383)'

Or that were reviewed by a specific person:

hg log -r 'reviewer(gps)'

Or were reviewed or not reviewed:

hg log -r 'reviewed()'
hg log -r 'not reviewed()'

You can now select changesets that are present in a specific tree:

hg log -r 'tree(central)'

I've also introduced support to query changesets you influenced:

hg log -r 'me()'

(This finds changesets you authored or reviewed.)

You can select changesets that initially landed on a specific tree:

hg log -r 'firstpushtree(central)'

You can select changesets marked as DONTBUILD:

hg log -r 'dontbuild()'

You can select changesets that don't reference a bug:

hg log -r 'nobug()'

You can select changesets that were push heads for a tree:

hg log -r 'pushhead(central)'

(This would form the basis of a push-aware bisection tool - an excellent idea for a future feature in this extension.)

You can combine these revset selector functions with other revset selectors to do some pretty powerful things.

To select all changesets on inbound but not central:

hg log -r 'tree(inbound) - tree(central)'

To find all your contributions on beta but not release:

hg log -r 'me() & (tree(beta) - tree(release))'

To find all changesets referencing a specific bug that have landed in Aurora:

hg log -r 'bug(931383) and tree(aurora)'

To find all changesets marked DONTBUILD that landed directly on central:

hg log -r 'dontbuild() and firstpushtree(central)'

To find all non-merge changesets that don't reference a bug:

hg log -r 'not merge() and nobug()'

Neato!

Template keywords

You can also now print some Mozilla information when using templates.

To print the main bug of a changeset, use:

{bug}

To retrieve all referenced bugs:

{bugs} {join(bugs, ', ')}

To print the reviewers:

{reviewer} {join(reviewers, ', ')}

To print the first version a changeset appeared in a specific channel:

{firstrelease} {firstbeta} {firstaurora} {firstnightly}

To print the estimated first Aurora and Nightly date for a changeset, use:

{auroradate} {nightlydate}

(Getting the exact first Aurora and Nightly dates requires consulting 3rd party services, which we don't currently do. I'd like to eventually integrate these into the extension. For now, it just estimates dates from the pushlog data.)

You can also print who first pushed a changeset and to which tree:

{firstpushuser} {firstpushtree}

You can also print the TBPL URL with the results of the first push:

{firstpushtbpl}

Here is an example that prints channel versions and dates for each changeset:

hg log --template '{rev} Nightly: {firstnightly} {nightlydate}; Aurora: {firstaurora} {auroradate}; Beta: {firstbeta}; Release: {firstrelease}\n'

Putting it all together

Of course, you can combine selectors and templates to create some mighty powerful queries.

To look at your impact on Mozilla, do something like:

hg log --template '{rev} Bug {bug}; Release {firstrelease}\n' -r 'me()'

You can easily formulate a status report for your activity in the past week:

hg log --template '{firstline(desc)}\n' -r 'firstpushdate(-7) and me()'

You can also query Mercurial to see where changesets have been landing in the past 30 days:

hg log --template '{firstpushtree}\n' -r 'firstpushdate(-30)' | sort | uniq -c

You can see who has been reviewing lots of patches lately:

hg log --template '{join(reviewers, "\n")}\n' -r 'firstpushdate(-30)' | sort | uniq -c | sort -n

(smaug currently has the top score, edging out my 116 reviews with 137.)
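If you want to post-process tallies like these beyond what sort and uniq offer, the counting step is easy to reproduce in Python. This is just a sketch of the same idea: tally is a made-up helper name, and it expects one reviewer name per line, as produced by the hg log command above.

```python
from collections import Counter


def tally(lines):
    """Count non-empty lines, returning (name, count) pairs, smallest count first.

    Mirrors `sort | uniq -c | sort -n`; ties are broken alphabetically by name.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

You could feed it sys.stdin in place of the shell pipeline, then filter or group the result however you like.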

If you want to reuse templates (instead of having to type them on the command line), you can save them as style files. Search the Internets to learn how to use them. You can even change your default style so the default output from hg log contains everything you'd ever want to know about a changeset!

Keeping it running

Many of the queries rely on data derived from multiple repositories and pushlog data that is external to the repository.

To get best results, you'll need to be running a monolithic/unified Mercurial repository. You can either assemble one locally with this extension by periodically pulling from the separate repos:

hg pull releases
hg pull integration

Or, you can pull from my personal unified repo.

You will also need to ensure the pushlog data is current. If you pull directly from the official repos, this will happen automatically. To be sure, run:

hg pushlogsync

Finally, you can force a repopulation of cached bug data by running:

hg buginfo --reset

Over time, I want all this to automagically work. Stay tuned.

Comments and future improvements

I implemented this feature to save myself from having to go trawling through Bugzilla and repository history to answer questions and to obtain metrics. I can now answer many questions via simple Mercurial one-liners.

Custom revision set selectors and template keywords are a pretty nifty feature of Mercurial. They demonstrate how you can extend Mercurial to be aware of more than just tracking commits and files. As I've said before and will continue to say, the extensibility of Mercurial is really its killer feature, especially for organizations with well-defined processes (like Mozilla). The kind of extensibility I achieved with this extension with custom queries and formatting functions is just not possible with Git (at least not with the reference C implementation that the overwhelming majority of Git users use).

There are numerous improvements that can be made to the extension. Obviously more revision set selectors and template keywords can be added. The parsing routine to extract bugs and reviewers isn't the most robust in the world. I copied some existing Mozilla code. It does well at detecting string patterns but doesn't cope well with extracting lists.
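To give a flavor of what that parsing involves, here is a small sketch in Python. It is not the code the extension uses. It assumes commit messages follow the common Mozilla convention of "Bug NNNNNN - description; r=name,name", and the names BUG_RE, REVIEW_RE, parse_bugs and parse_reviewers are invented for this example. Real messages are messier than this handles.

```python
import re

# "Bug 931383", "bug 900000" and so on. Case-insensitive.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)

# "r=gps", "r=gps,smaug", "sr=bz". The group captures the whole
# comma-separated list so it can be split afterwards.
REVIEW_RE = re.compile(r"\b(?:r|sr)=([\w.,\-]+)")


def parse_bugs(message):
    """Return referenced bug numbers in order of appearance, without duplicates."""
    bugs = []
    for match in BUG_RE.finditer(message):
        number = int(match.group(1))
        if number not in bugs:
            bugs.append(number)
    return bugs


def parse_reviewers(message):
    """Return reviewer names in order of appearance, without duplicates."""
    reviewers = []
    for match in REVIEW_RE.finditer(message):
        for name in match.group(1).split(","):
            name = name.strip(".")  # drop sentence-ending punctuation
            if name and name not in reviewers:
                reviewers.append(name)
    return reviewers
```

Under these assumptions, the first entry returned by parse_bugs would play the role of the main bug, and the full list corresponds to all referenced bugs.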

I'd also love to better integrate Mercurial with automation results so you can expose a greenpush() selector and do things like hg up -r 'last(tree(inbound) and greenpush())' (which of course could be exposed as lastgreen(inbound)). Wouldn't that be cool? (This would be possible if we had better APIs for querying individual push results.) It would also be possible to have the Mercurial server expose this data as repository data so clients pull it automatically. That would prevent clients from all needing to query the same 3rd party services. Just a crazy thought.

Speed can be an issue. Calculating the release information ({firstnightly} etc) is currently slower than I'd like. This is mostly due to me using inefficient algorithms and not caching things where I should. Speed issues should be fixed in due time.

Please let me know if you run into any problems or have suggestions for improvements. If you want to implement your own revision set selectors or template keywords, it's easier than you think! I will happily accept patches. Keep in mind that Mercurial can integrate with 3rd party services. So if you want to supplement repository data with data from a HTTP+JSON web service, that's very doable. The sky is the limit.

Alternate Mercurial Server for Firefox Development

October 17, 2013 at 07:30 AM | categories: Mercurial, Mozilla

I have long opined about the sad state of Mercurial at Mozilla. The short version is Mozilla has failed to use Mercurial optimally, at least for Firefox development. It's easy to see why so many Mozillians are quick to discredit Mercurial when compared to Git!

I have a history of attempting to address the deficiencies. Up to this point, I've been able to make things better through local tooling. But, for my next set of tricks, I reached an impasse with the Mercurial server at hg.mozilla.org. So, I stood up my own Mercurial server at hg.gregoryszorc.com/!

This server is running Mercurial 2.7 and has a few nice features the official Mercurial server at hg.mozilla.org does not.

The repositories

http://hg.gregoryszorc.com/gecko is a read-only unified Mercurial repository containing the commits for the major Firefox/Gecko repositories. If you look at its bookmarks, you'll see something special: the heads of all the separate Mercurial repos it is aggregating are being stored as bookmarks! (Bookmarks are effectively Git branches.) The tip of mozilla-central is at the bookmark central/default. The tip of Beta is at beta/default. You get the idea. Once you clone this repo, you can easily switch between project branches by running e.g. hg up central/default. When you pull the repo, you get changesets for all repos by connecting to one server, not several (this reduces load on Mozilla's servers and is faster for clients).

This repository shares the same changesets/SHA-1's as the official repositories. It just has everything under one roof. You can work out of this repository and push to the official repositories. Although, you may want to use the pushtree command from my custom extension to make your life easier (hg push with no arguments will attempt to push all changesets, which you definitely don't want when pushing to e.g. mozilla-central).

http://hg.gregoryszorc.com/gecko-collab is an offshoot of the gecko repo that you can push to. Changesets from the gecko repo are pulled into it automatically.

What makes the gecko-collab repository special is that it has obsolescence enabled. That is the core Mercurial feature enabling changeset evolution. More on that feature and why it is amazing in a future blog post. Stay tuned.

Cloning

If you would like to clone one of these unified repos, please do my paltry EC2 server a favor and bootstrap your clone from an existing clone. e.g. if you have a copy of mozilla-central sitting around but don't want my repo's changesets to pollute it, do the following:

hg clone mozilla-central gecko
cd gecko
hg pull http://hg.gregoryszorc.com/gecko

Or, if you are OK with your clone accumulating the extra changesets from all the project branches, just run:

hg pull http://hg.gregoryszorc.com/gecko

Don't forget to update the [paths] section in your .hg/hgrc file to point to hg.gregoryszorc.com! e.g.

[paths]
gecko = http://hg.gregoryszorc.com/gecko
collab = http://hg.gregoryszorc.com/gecko-collab

Setting up push support and SSH keys

If you would like to push to the gecko-collab repository, you'll need to give me your SSH public key. But don't give me your key - give an automated process your key!

Head on over to http://phabricator.gregoryszorc.com/ and log in (look for the Persona button). Once you've logged in, go to your settings by clicking the wrench icon in the top right. Then look for SSH Public Keys to add your key(s). If you can't find it, just go to http://phabricator.gregoryszorc.com/settings/panel/ssh/.

Once your SSH public key is added, it will take up to a minute for it to be added to my system. It's all automatic. You don't need to wait for any manual action.

To connect to my server over SSH, you'll need to log in as the hgssh user. e.g. in your hgrc file, add:

[paths]
gecko = ssh://hgssh@hg.gregoryszorc.com/gecko
collab = ssh://hgssh@hg.gregoryszorc.com/gecko-collab

Then, you should be able to pull and push over SSH!

Other Notes

This server is running on an EC2 instance that isn't as powerful as I'd like. Expect some operations to be slower than desired.

I don't guarantee an SLA for this service. It could go down at any moment. However, Mercurial being a distributed version control system, there should be little to no data loss assuming people pull frequently. I know I have a backup on all my machines now.

I'm running this server for two main reasons.

First, I want to demonstrate the utility of a unified Mercurial server for Firefox development in hopes we can run one officially. I've been running a unified repo locally for a few months and I have little doubt I'm more productive because of it. I want others to realize the awesomeness.

Second, I needed a server that supported changeset evolution so I could play around with it. I asked the powers that be to enable it on hg.mozilla.org and didn't get a response that met my timeline. So, I figured setting up my own server was easier.

Please let me know if you have any questions or issues with this server. I'd also love to hear whether people like the unified repo approach!

Mercurial setup wizard for Firefox development

July 29, 2013 at 05:45 PM | categories: Mercurial, Mozilla

I'm a big fan of tools that encourage and/or enforce the following of best practices and that help people become more productive.

One of the tools that Firefox developers interact with nearly daily is Mercurial. As I've observed from coworkers and from community contributors, many don't have Mercurial configured for optimal development. For first-time contributors, this can manifest in patch rejection - an experience that can be embarrassing and demotivating. This is frustrating to me because most issues are easily identifiable and correctable. And, when addressed, everyone wins.

Anyway, I'm pleased to announce that there is now a configuration wizard in the Firefox source tree to help with configuring Mercurial. To run it, just type:

./mach mercurial-setup

Currently, it's aimed at first-time contributors. So, it's missing things that more seasoned developers rely on. But you need to start somewhere, right?

Currently, the tool isn't advertised anywhere other than mach help. Please run it and report issues in bug 794580 or file a new report. Once things have baked in, I'd like to add some kind of notification/tips system to mach where it will encourage you to do things like automatically run mach mercurial-setup. Until then, I recommend trying to remember to run mach mercurial-setup every few weeks to ensure your Mercurial environment is up to date and properly configured.

I'd like to thank Nick Alexander for sharing my enthusiasm for helping contributors and for taking the time to review this work.
