Aggregating Version Control Info at Mozilla

January 21, 2014 at 10:50 AM | categories: Git, Mercurial, Mozilla, Python

Over the winter break, I set out on an ambitious project to create a service to help developers and others manage the flurry of patches going into Firefox. While the project is far from complete, I'm ready to unleash the first part of the project upon the world.

If you point your browser at moztree.gregoryszorc.com, you'll hopefully see some documentation about what I've built. Source code is available and free, of course. Patches very welcome.

Essentially, I built a centralized indexing service for version control repositories with Mozilla's extra metadata thrown in. I tell it what repositories to mirror, and it clones everything, fetches data such as the pushlog and Git SHA-1 mappings, and stores everything in a central database. It then exposes this aggregated data through world-readable web services.

Currently, I have the service indexing the popular project branches for Firefox (central, aurora, beta, release, esr, b2g, inbound, fx-team, try, etc). You can view the full list via the web service. As a bonus, I'm also serving these repositories via hg.gregoryszorc.com. My server appears to be significantly faster than hg.mozilla.org. If you want to use it for your daily needs, go for it. I make no SLA guarantees, however.

I'm also using this service as an opportunity to experiment with alternate forms of Mercurial hosting. I have mirrors of mozilla-central and the try repository with generaldelta and lz4 compression enabled. I may blog about what those are eventually. The teaser is that they can make Mercurial perform a lot faster under some conditions. I'm also using ZFS under the hood to manage repositories. Each repository is a ZFS filesystem. This means I can create repository copies on the server (user repositories, anyone?) at almost no cost. Contrast this with the traditional method of full clones, which takes lots of time, memory, CPU, and storage.

Anyway, some things you can do with the existing web service (there's a rough client sketch after the list):

  • Obtain metadata about Mercurial changesets. Example.
  • Look up metadata about Git commits. Example.
  • Obtain a SPORE descriptor describing the web service endpoints. This allows you to auto-generate clients from descriptors. Yay!
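
For a taste of consuming the service, here's a minimal Python 2 sketch. The URL shape is my assumption, not the service's documented API; fetch the SPORE descriptor for the actual endpoint definitions.

    import json
    import urllib2

    # Placeholder changeset; substitute a real 40-character node.
    node = '0' * 40
    url = ('https://moztree.gregoryszorc.com/changeset/mozilla-central/%s'
           % node)
    info = json.load(urllib2.urlopen(url))
    print info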

Obviously, that's not a lot. But adding new endpoints is relatively straightforward. See the source. It's literally as easy as defining a URL mapping and writing a database query.
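
Purely as an illustration of that shape (the actual source has its own routing and schema; the framework, table, and column names here are hypothetical), an endpoint boils down to something like:

    import sqlite3

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/changeset/<repo>/<node>')
    def changeset(repo, node):
        # One query against the aggregated index; the schema is made up.
        db = sqlite3.connect('vcsindex.db')
        try:
            row = db.execute(
                'SELECT author, date, description FROM changesets '
                'WHERE repo = ? AND node = ?', (repo, node)).fetchone()
        finally:
            db.close()
        if row is None:
            return jsonify({'error': 'unknown changeset'}), 404
        return jsonify(author=row[0], date=row[1], description=row[2])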

The performance is also not the best. I just haven't put in the effort to tune things yet. All of the querying hits the database, not Mercurial. So, making things faster should merely be a matter of database and hosting optimization. Patches welcome!

Some ideas that I haven't had time to implement yet:

  • Return changesets in a specific repository
  • Return recently pushed changesets
  • Return pushes for a given user
  • Return commits for a given author
  • Return commits referencing a given bug
  • Obtain TBPL URLs for pushes containing a given changeset
  • Integrate Bugzilla metadata

Once those are in place, I foresee this service powering a number of dashboards. Patches welcome.

Again, this service is only the tip of what's possible. There's a lot that could be built on this service. I have ideas. Others have ideas.

The project includes a Vagrantfile and Puppet manifests for provisioning the server. It's a one-liner to get a development environment up and running. It should be really easy to contribute to this project. Patches welcome.

Things Mozilla Could Do with Mercurial

January 17, 2014 at 03:00 PM | categories: Mercurial, Mozilla

As I've written before, Mercurial is a highly extensible version control system. You can do things with Mercurial you can't do in other version control systems.

In this post, I'll outline some of the cool things Mozilla could do with Mercurial. But first, I want to outline some features of Mercurial that many don't know exist.

pushkey and listkeys commands

The Mercurial wire protocol (how two Mercurial peer repositories talk to each other over a network) contains two very useful commands: pushkey and listkeys. These commands allow the storage and listing of arbitrary key-value pair metadata in the repository.

This generic storage mechanism is how Mercurial stores and synchronizes bookmarks and phases information, for example.

By implementing a Mercurial extension, you can have Mercurial store key-value data for any arbitrary data namespace. You can then write a simple extension that synchronizes this data as part of the push and pull operations.
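
To make this concrete, here is a minimal sketch of an extension registering its own namespace. The namespace name and the flat-file storage are mine, not from a real extension.

    # reviewkeys.py - store key-value pairs under a custom pushkey
    # namespace. Storage details are deliberately simplistic.
    from mercurial import pushkey

    def pushreviewkey(repo, key, old, new):
        # repo.opener writes files inside the repository's .hg directory.
        fh = repo.opener('reviewkeys', 'ab')
        try:
            fh.write('%s %s\n' % (key, new))
        finally:
            fh.close()
        return True

    def listreviewkeys(repo):
        keys = {}
        try:
            for line in repo.opener('reviewkeys'):
                key, value = line.rstrip('\n').split(' ', 1)
                keys[key] = value
        except IOError:
            pass
        return keys

    pushkey.register('reviewkeys', pushreviewkey, listreviewkeys)

With the extension installed, a peer can enumerate the namespace via the listkeys wire protocol command, just like bookmarks and phases.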

Extending the wire protocol

For cases where you want to transmit arbitrary data to/from Mercurial servers and where the pushkey framework isn't robust enough, it's possible to implement custom commands in the Mercurial wire protocol.

A server installs an extension making the commands available. A client installs an extension knowing how to use the commands. Arbitrary data is transferred or custom actions are performed.

When it comes to custom commands, the sky is really the limit. You could do pretty much anything, from transferring extra data types (this is how the largefiles extension works) to writing commands that interact with remote agents.
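
As a sketch, the server half might look like this. The command name, argument, and storage location are all hypothetical.

    # Server-side extension exposing a custom wire protocol command.
    from mercurial import wireproto

    def automationresults(repo, proto, node):
        # Return whatever opaque blob the server keeps for this node.
        try:
            return repo.opener.read('automation/%s' % node)
        except IOError:
            return ''

    def extsetup(ui):
        # The string declares the argument names the command accepts.
        wireproto.commands['automationresults'] = (automationresults, 'node')

The matching client extension would invoke the command through its peer object and decode the payload however the two sides agreed.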

Custom revision set queries and templating

Mercurial offers a rich framework for querying repository data and for formatting data. The querying feature is called revision sets (revsets) and the formatting feature is called templates. If you are unfamiliar with them, I encourage you to run hg help revsets and hg help templates right now to discover the awesomeness.

As I've demonstrated, you can do some very nifty things with custom revision sets and templating!
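
For example, a custom revset predicate is just a Python function registered with the revset module. The "reviewed" metadata store below is hypothetical.

    # Extension adding a `reviewed()` revset predicate.
    from mercurial import revset

    def reviewedrevset(repo, subset, x):
        """``reviewed()`` - changesets recorded as reviewed."""
        revset.getargs(x, 0, 0, 'reviewed takes no arguments')
        reviewed = set()  # hypothetical: load reviewed node hexes
        return [r for r in subset if repo[r].hex() in reviewed]

    def extsetup(ui):
        revset.symbols['reviewed'] = reviewedrevset

Users could then run queries like hg log -r 'not reviewed() and not public()' and combine the predicate with any other revset.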

The possibilities

Now that you know some ways Mercurial can be extended, let's talk about some cool use cases at Mozilla. I want to be clear that I'm not advocating we do these things, just that they are possible and maybe they are a little cool.

Storing pushlog data

Mozilla records information about who pushed what changesets where and when in what's called the pushlog. The pushlog data is currently stored in a SQLite database inside the repository on the server. The data is made available via an HTTP+JSON API.

We could go a step further and make the pushlog data available via listkeys so Mercurial clients could download pushlog data with the same channel used to pull core repository data. (Currently, we have to open a new TCP connection and talk to the HTTP+JSON API.) This would make fetching of pushlog data faster, especially for clients on slow connections.
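
A sketch of what that could look like, assuming the SQLite schema the pushlog already uses (pushlog2.db inside the repository's .hg directory):

    # Extension exposing pushlog data as a read-only pushkey namespace.
    import sqlite3

    from mercurial import pushkey

    def listpushlog(repo):
        keys = {}
        db = sqlite3.connect(repo.join('pushlog2.db'))
        try:
            q = ('SELECT c.node, p.id, p.user, p.date FROM changesets c '
                 'JOIN pushlog p ON c.pushid = p.id')
            for node, pushid, user, when in db.execute(q):
                keys[node] = '%d %s %d' % (pushid, user, when)
        finally:
            db.close()
        return keys

    def rejectpush(repo, key, old, new):
        # The pushlog is written by a server-side hook, not via pushkey.
        return False

    pushkey.register('pushlog', rejectpush, listpushlog)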

I concede this is an iterative improvement and adds little value beyond what we currently have. But if I were designing pushlog storage from scratch, this is how I'd do it.

Storing a changeset's automation results

The pushkey framework could be used to mark specific changesets as passing automation. When release automation or a sheriff determines that a changeset/push is green, they could issue an authenticated pushkey command to the Mercurial server stating such. Clients could then easily obtain a list of all changesets that are green.

Why stop there? We could also record automation failures in Mercurial as well. Depending on how complex this gets, we may outgrow pushkey and require a separate command. But that's all doable.

Anyway, clients could download automation results for a specific changeset as part of the repository data. The same extension that pulls down that data could also monkeypatch the bisection algorithm used by hg bisect to automatically skip over changesets that didn't pass automation. You'll never bisect a backed out changeset again!
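
A sketch of that monkeypatching, with loadfailednodes() standing in for however the extension reads the synchronized automation metadata:

    from mercurial import extensions, hbisect

    def loadfailednodes(changelog):
        return []  # hypothetical lookup of nodes that failed automation

    def wrappedbisect(orig, changelog, state):
        # state maps 'good', 'bad', and 'skip' to lists of nodes; seed
        # the skip list before the real bisection algorithm runs.
        for node in loadfailednodes(changelog):
            if node not in state['skip']:
                state['skip'].append(node)
        return orig(changelog, state)

    def extsetup(ui):
        extensions.wrapfunction(hbisect, 'bisect', wrappedbisect)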

If this automation data were stored on the Try repository, the autoland tool would just need to query the Mercurial repo to see which changesets are candidates for merging into mainline - there would be no need for a separate database and web service!

Marking a changeset as reviewed

Currently, Mozilla's review procedure is very patch and Bugzilla centric. But it doesn't have to be that way. (I argue it shouldn't be that way.)

Imagine a world where code review is initiated by pushing changesets to a special server, kind of like how Try magically turns pushes into automation jobs.

In this world, reviews could be initiated by issuing a pushkey or custom command to the server. This could even initiate server-side static analysis that would hold off publishing the review unless static analysis checks passed!

A granted review could be recorded by having the reviewer issue a pushkey command marking the changeset as reviewed. The channel to the Mercurial server is authenticated via SSH, so the user behind the current SSH key is the reviewer. The Mercurial server could store this username as part of the repository data. The autoland tool could then pull down the reviewer data and only consider changesets that have an appropriate reviewer.
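
The server half of that could be as small as the following pushkey handler, registered in a "reviews" namespace exactly like the earlier pushkey example. Using the USER environment variable to identify the SSH user is an assumption about the server's setup.

    import os

    def pushreview(repo, key, old, new):
        # Over SSH, the authenticated user is the reviewer of record.
        reviewer = os.environ.get('USER', 'unknown')
        fh = repo.opener('reviews', 'ab')
        try:
            fh.write('%s reviewed-by %s\n' % (key, reviewer))
        finally:
            fh.close()
        return True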

It might also be possible to integrate crypto magic into this workflow so reviewers could digitally sign a changeset as reviewed. This could help with the verification of the Firefox source code that Brendan Eich recently outlined.

Like the automation data above, no separate database would be required: all data would be part of the repository. All you need to build is a Mercurial extension.

Encouraging best practices

Mozillians have written a handful of useful Mercurial extensions to help people become more productive. We have also noticed that many developers are still (unknowingly?) running old, slow, and buggy Mercurial releases. We want people to have the best experience possible. How do we do that?

One idea is to install an extension on the server that strongly recommends or even requires that users follow best practices (a minimum Mercurial version, installed extensions, etc.).

I have developed a proof-of-concept that does just this.
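
As a simplified illustration (client-side rather than server-side, with an arbitrary version threshold and naive version parsing):

    from mercurial import util

    MINIMUM = (2, 8)

    def uisetup(ui):
        try:
            version = tuple(int(p) for p in util.version().split('.')[:2])
        except ValueError:
            return  # development builds have odd version strings
        if version < MINIMUM:
            ui.warn('you are running an old Mercurial (%s); please '
                    'upgrade to %d.%d or newer\n'
                    % ((util.version(),) + MINIMUM))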

Rich querying of metadata

When you start putting more metadata into Mercurial (or at least write Mercurial extensions to aggregate this metadata), all kinds of interesting query opportunities open up. Using revsets and templates, you can do an awful lot to use Mercurial as a database of sorts to extract useful reports.

I dare say reports like John O'Duinn's Monthly Infrastructure Load posts could be completely derived from Mercurial. I've demonstrated this ability previously. That's only the tip of the iceberg.

Summary

We could enable a lot of new and useful scenarios by extending Mercurial. We could accomplish this without introducing new services and tools into our already complicated infrastructure and workflows.

The possibilities I've suggested are by no means exhaustive. I encourage others to dream up new and interesting ideas. Who knows, maybe some of them may actually happen.

mach now lives in mozilla-central

January 09, 2014 at 10:55 AM | categories: Mozilla, mach

mach -- the generic command line interface framework that is behind the mach tool used to build Firefox -- now has its canonical home in mozilla-central, the canonical repository for Firefox. The previous home has been updated to reflect the change.

mach will continue to be released on PyPI and installable via pip install mach.

I made the change because keeping multiple repositories in sync wasn't something I wanted to spend time doing. Furthermore, Mozillians have been contributing a steady stream of improvements to the mach core recently and it makes sense to leverage Mozilla's familiar infrastructure for patch contribution.

This decision may be revisited in the future. Time will tell.

Why do Projects Support old Python Releases?

January 08, 2014 at 05:00 PM | categories: Python, Mozilla

I see a number of open source projects supporting old versions of Python. Mercurial supports 2.4, for example. I have to ask: why do projects continue to support old Python releases?

Consider:

  • Python 2.4 was last released on December 19, 2008 and there will be no more releases of Python 2.4.
  • Python 2.5 was last released on May 26, 2011 and there will be no more releases of Python 2.5.
  • Python 2.6 was last released on October 29, 2013 and there will be no more releases of Python 2.6.
  • Everything before Python 2.7 is end-of-lifed.
  • Python 2.7 continues to see periodic releases, but mostly for bug fixes.
  • Practically all of the work on CPython is happening in the 3.3 and 3.4 branches. Other implementations continue to support 2.7.
  • Python 2.7 has been available since July 2010.
  • Python 2.7 provides some very compelling language features over earlier releases that developers want to use.
  • It's much easier to write Python that runs on both 2 and 3 when 2.7 is the only 2.x release considered.
  • Python 2.7 can be installed in userland relatively easily (see projects like pyenv).

Given these facts, I'm not sure why projects insist on supporting old and end-of-lifed Python releases.

I think maintainers of Python projects should seriously consider dropping support for Python 2.6 and below. Are there really that many people on systems that don't have Python 2.7 easily available? Why are we Python developers inflicting so much pain on ourselves to support antiquated Python releases?

As a data point, I successfully transitioned Firefox's build system from requiring Python 2.5+ to 2.7.3+ and it was relatively pain free. Sure, a few people complained. But as far as I know, not very many new developers are coming in and complaining about the requirement. If we can do it with a few thousand developers, I'm guessing your project can as well.

Update 2014-01-09 16:05:00 PST: This post is being discussed on Slashdot. A lot of the comments talk about Python 3. Python 3 is its own set of considerations. The intended focus of this post is strictly about dropping support for Python 2.6 and below. Python 3 is related in that porting Python 2.x to Python 3 is much easier the higher the Python 2.x version. This especially holds true when you want to write Python that works simultaneously in both 2.x and 3.x.

On Multiple Patches in Bugs

January 07, 2014 at 04:40 PM | categories: Mozilla

There is a common practice at Mozilla for developing patches with multiple parts. Nothing wrong with that. In fact, I think it's a best practice:

  • Smaller, self-contained patches are much easier to grok and review than larger patches.
  • Smaller patches can land as soon as they are reviewed. Larger patches tend to linger and get bit rotted.
  • Smaller patches contribute to a culture of being fast and nimble, not slow and lethargic. This helps with developer confidence, community contributions, etc.

There are some downsides to multiple, smaller patches:

  • The bigger picture is harder to understand until all parts of a logical patch series are shared. (This can be alleviated through commit messages or reviewer notes documenting future intentions. And of course reviewers can delay review until they are comfortable.)
  • There is more overhead to maintain the patches (rebasing, etc). IMO the solutions provided by Mercurial and Git are sufficient.
  • The process overhead for dealing with multiple patches and/or bugs can be non-trivial. (I would like to think good tooling coupled with continuous revisiting of policy decisions is sufficient to counteract this.)

Anyway, the prevailing practice at Mozilla seems to be that multiple patches related to the same logical change are attached to the same bug. I would like to challenge the effectiveness of this practice.

Given:

  • An individual commit to Firefox should be standalone and should not rely on future commits to unbust it (i.e. bisects to any commit should be safe).
  • Bugzilla has no good mechanism to isolate review comments from multiple attachments on the same bug, making deciphering simultaneous reviews on multiple attachments difficult and frustrating. This leads to review comments inevitably falling through the cracks and the quality of code suffering.
  • Reiterating the last point because it's important.

I therefore argue that attaching multiple patches for review to a single Bugzilla bug is not a best practice and it should be avoided if possible. If that means filing separate bugs for each patch, so be it. That process can be automated. Tools like bzexport already do it. Alternatively (and even better IMO), we ditch Bugzilla's code review interface (Splinter) and integrate something like ReviewBoard instead. We limit Bugzilla to tracking, high-level discussion, and metadata aggregation. Code review happens elsewhere, without all the clutter and chaos that Bugzilla brings to the table.

Thoughts?
