Repository-Centric Development

July 24, 2014 at 08:23 PM | categories: Git, Mercurial, Mozilla | View Comments

I was editing a wiki page yesterday and I think I coined a new term which I'd like to enter the common nomenclature: repository-centric development. The term refers to development/version control workflows that place repositories - not patches - first.

When collaborating on version controlled code with modern tools like Git and Mercurial, you essentially have two choices on how to share version control data: patches or repositories.

Patches have been around since the dawn of version control. Everyone knows how they work: your version control system has a copy of the canonical data and it can export a view of a specific change into what's called a patch. A patch is essentially a diff with extra metadata.

When distributed version control systems came along, they brought with them an alternative to patch-centric development: repository-centric development. You could still exchange patches if you wanted, but distributed version control allowed you to pull changes directly from multiple repositories. You weren't limited to a single master server (that's what the distributed in distributed version control means). You also didn't have to go through an intermediate transport such as email to exchange patches: you communicate directly with a peer repository instance.

Repository-centric development eliminates the middle man required for patch exchange: instead of exchanging derived data, you exchange the actual data, speaking the repository's native language.

One advantage of repository-centric development is it eliminates the problem of patch non-uniformity. Patches come in many different flavors. You have plain diffs. You have diffs with metadata. You have Git style metadata. You have Mercurial style metadata. You can produce patches with various lines of context in the diff. There are different methods for handling binary content. There are different ways to express file adds, removals, and renames. It's all a hot mess. Any system that consumes patches needs to deal with the non-uniformity. Do you think this isn't a problem in the real world? Think again. If you are involved with an open source project that collects patches via email or by uploading patches to a bug tracker, have you ever seen someone accidentally upload a patch in the wrong format? That's patch non-uniformity. New contributors to Firefox do this all the time. I also see it in the Mercurial project. With repository-centric development, patches never enter the picture, so patch non-uniformity is a non-issue. (Don't confuse the superficial formatting of patches with the content, such as an incorrect commit message format.)

Another advantage of repository-centric development is it makes the act of exchanging data easier. Just have two repositories talk to each other. This used to be difficult, but hosting services like GitHub and Bitbucket make this easy. Contrast with patches, which require hooking your version control tool up to wherever those patches are located. The Linux Kernel, like so many other projects, uses email for contributing changes. So now Git, Mercurial, etc all fulfill Zawinski's law. This means your version control tool is talking to your inbox to send and receive code. Firefox development uses Bugzilla to hold patches as attachments. So now your version control tool needs to talk to your issue tracker. (Not the worst idea in the world I will concede.) While, yes, the tools around using email or uploading patches to issue trackers or whatever else you are using to exchange patches exist and can work pretty well, the grim reality is that these tools are all reinventing the wheel of repository exchange and are solving a problem that has already been solved by git push, git fetch, hg pull, hg push, etc. Personally, I would rather hg push to a remote and have tools like issue trackers and mailing lists pull directly from repositories. At least that way they have a direct line into the source of truth and are guaranteed a consistent output format.

Another area where direct exchange is huge is multi-patch commits (branches in Git parlance) or where commit data is fragmented. When pushing patches to email, you need to insert metadata saying which patch comes after which. Then the email import tool needs to reassemble things in the proper order (remember that the typical convention is one email per patch and email can be delivered out of order). Not the most difficult problem in the world to solve. But seriously, it's been solved already by git fetch and hg pull! Things are worse for Bugzilla. There is no bullet-proof way to order patches there. The convention at Mozilla is to add Part N strings to commit messages and have the Bugzilla import tool do a sort (I assume it does that). But what if you have a logical commit series spread across multiple bugs? How do you reassemble everything into a linear series of commits? You don't, sadly. Just today I wanted to apply a somewhat complicated series of patches to the Firefox build system I was asked to review so I could jump into a debugger and see what was going on so I could conduct a more thorough review. There were 4 or 5 patches spread over 3 or 4 bugs. Bugzilla and its patch-centric workflow prevented me from importing the patches. Fortunately, this patch series was pushed to Mozilla's Try server, so I could pull from there. But I haven't always been so fortunate. This limitation means developers have to make sacrifices such as writing fewer, larger patches (this makes code review harder) or involving unrelated parties in the same bug and/or review. In other words, deficient tools are imposing limited workflows. No bueno.

It is a fair criticism to say that not everyone can host a server or that permissions and authorization are hard. Although I think concerns about impact are overblown. If you are a small project, just create a GitHub or Bitbucket account. If you are a larger project, realize that people time is one of your largest expenses and invest in tools like proper and efficient repository hosting (often this can be GitHub) to reduce this waste and keep your developers happier and more efficient.

One of the clearest examples of repository-centric development is GitHub. There are no patches in GitHub. Instead, you git push and git fetch. Want to apply someone else's work? Just add a remote and git fetch! Contrast with first locating patches, hooking up Git to consume them (this part was always confusing to me - do you need to retroactively have them sent to your email inbox so you can import them from there), and finally actually importing them. Just give me a URL to a repository already. But the benefits of repository-centric development with GitHub don't stop at pushing and pulling. GitHub has built code review functionality into pushes. They call these pull requests. While I have significant issues with GitHub's implemention of pull requests (I need to blog about those some day), I can't deny the utility of the repository-centric workflow and all the benefits around it. Once you switch to GitHub and its repository-centric workflow, you more clearly see how lacking patch-centric development is and quickly lose your desire to go back to the 1990's state-of-the-art methods for software development.

I hope you now know what repository-centric development is and will join me in championing it over patch-based development.

Mozillians reading this will be very happy to learn that work is under way to shift Firefox's development workflow to a more repository-centric world. Stay tuned.

Read and Post Comments

New Repository for Mozilla Version Control Tools

February 05, 2014 at 07:15 PM | categories: Git, Mercurial, Mozilla | View Comments

Version control systems can be highly useful tools.

At Mozilla, we've made numerous improvements and customizations to our version control tools. We have custom hooks that run on the server. We have a custom skin for Mercurial's web interface. Mozillians have written a handful of Mercurial extensions to aid with common developer tasks, such as pushing to try, interacting with Bugzilla, making mq more useful, and more.

These have all come into existence in an organic manner, one after the other. Individuals have seen an itch and scratched it. Good for them. Good for Mozilla.

Unfortunately, the collection of amassed tools has become quite large. They have become difficult to discover and keep up to date. The consistency in quality and style between the tools varies. Each tool has separate processes for updating and changing.

I contacted the maintainers of the popular version control tools at Mozilla with a simple proposal: let's maintain all our tools under one repo. This would allow us to increase cohesion, share code, maintain a high quality bar, share best practices, etc. There were no major objections, so we now have a unified repository containing our version control tools!

Currently, we only have a few Mercurial extensions in there. A goal is to accumulate as much of the existing Mercurial infrastructure into that repository as possible. Client code. Server code. All of the code. I want developers to be able to install the same hooks on their clients as what's running on the server: why should your local repo let you commit something that the server will reject? I want developers to be able to reasonably reproduce Mozilla's canonical version control server configuration locally. That way, you can test things locally with a high confidence that your changes will work the same way on production. This allows deployments to move faster and with less friction.

The immediate emphasis will be on moving extensions into this repo and deprecating the old homes on user repositories. Over time, we'll move into consolidating server code and getting hg.mozilla.org and git.mozilla.org to use this repository. But that's a lower priority: the most important goal right now is to make it easier and friendlier for people to run productivity-enhancing tools.

So, if you see your Mercurial extensions alerting you that they've been moved to a new repository, now you know what's going on.

Read and Post Comments

Aggregating Version Control Info at Mozilla

January 21, 2014 at 10:50 AM | categories: Git, Mercurial, Mozilla, Python | View Comments

Over the winter break, I set out on an ambitious project to create a service to help developers and others manage the flury of patches going into Firefox. While the project is far from complete, I'm ready to unleash the first part of the project upon the world.

If you point your browsers to moztree.gregoryszorc.com, you'll hopefully see some documentation about what I've built. Source code is available and free, of course. Patches very welcome.

Essentially, I built a centralized indexing service for version control repositories with Mozilla's extra metadata thrown in. I tell it what repositories to mirror, and it clones everything, fetches data such as the pushlog and Git SHA-1 mappings, and stores everything in a central database. It then exposes this aggregated data through world-readable web services.

Currently, I have the service indexing the popular project branches for Firefox (central, aurora, beta, release, esr, b2g, inbound, fx-team, try, etc). You can view the full list via the web service. As a bonus, I'm also serving these repositories via hg.gregoryszorc.com. My server appears to be significantly faster than hg.mozilla.org. If you want to use it for your daily needs, go for it. I make no SLA guarantees, however.

I'm also using this service as an opportunity to experiment with alternate forms of Mercurial hosting. I have mirrors of mozilla-central and the try repository with generaldelta and lz4 compression enabled. I may blog about what those are eventually. The teaser is that they can make Mercurial perform a lot faster under some conditions. I'm also using ZFS under the hood to manage repositories. Each repository is a ZFS filesystem. This means I can create repository copies on the server (user repositories anyone?) at a nearly free cost. Contrast this to the traditional method of full clones, which take lots of time, memory, CPU, and storage.

Anyway, some things you can do with the existing web service:

  • Obtain metadata about Mercurial changesets. Example.
  • Look up metadata about Git commits. Example.
  • Obtain a SPORE descriptor describing the web service endpoints. This allows you to auto-generate clients from descriptors. Yay!

Obviously, that's not a lot. But adding new endpoints is relatively straightforward. See the source. It's literally as easy as defining a URL mapping and writing a database query.

The performance is also not the best. I just haven't put in the effort to tune things yet. All of the querying hits the database, not Mercurial. So, making things faster should merely be a matter of database and hosting optimization. Patches welcome!

Some ideas that I haven't had time to implement yet:

  • Return changests in a specific repository
  • Return recently pushed changesets
  • Return pushes for a given user
  • Return commits for a given author
  • Return commits referencing a given bug
  • Obtain TBPL URLs for pushes with changeset
  • Integrate bugzilla metadata

Once those are in place, I foresee this service powering a number of dashboards. Patches welcome.

Again, this service is only the tip of what's possible. There's a lot that could be built on this service. I have ideas. Others have ideas.

The project includes a Vagrant file and Puppet manifests for provisioning the server. It's a one-liner to get a development environment up and running. It should be really easy to contribute to this project. Patches welcome.

Read and Post Comments

Importance of Hosting Your Version Control Server

November 13, 2013 at 09:25 AM | categories: Git, Mercurial, Mozilla | View Comments

The subject of where to host version control repositories comes up a lot at Mozilla. It takes many forms:

  • We should move the Firefox repository to GitHub
  • I should be allowed to commit to GitHub
  • I want the canonical repository to be hosted by Bitbucket

When Firefox development is concerned, Release Engineerings puts down their foot and insists the canonical repository be hosted by Mozilla, under a Mozilla hostname. When that's not possible, they set up a mirror on Mozilla infrastructure.

I think a recent issue with the Jenkins project demonstrates why hosting your own version control server is important. The gist is someone force pushed to a bunch of repos hosted on GitHub. They needed to involve GitHub support to recover from the issue. While it appears they largely recovered (and GitHub support deserves kudos - I don't want to take away from their excellence), this problem would have been avoided or the response time significantly decreased if the Jenkins people had direct control over the Git server: they either could have installed a custom hook that would have prevented the pushes or had access to the reflog so they could have easily seen the last pushed revision and easily forced pushed back to it. GitHub doesn't have a mechanism for defining pre-* hooks, doesn't allow defining custom hooks (a security and performance issue for them), and doesn't expose the reflog data.

Until repository hosting services expose full repository data (such as reflogs) and allow you to define custom hooks, accidents like these will happen and the recovery time will be longer than if you hosted the repo yourself.

It's possible repository hosting services like GitHub and Bitbucket will expose these features or provide a means to quickly recover. If so, kudos to them. But larger, more advanced projects will likely employ custom hooks and considering custom hooks are a massive security and performance issue for any hosted service provider, I'm not going to hold my breath this particular feature is rolled out any time soon. This is unfortunate, as it makes projects seemingly choose between low risk/low convenience and GitHub's vibrant developer community.

Read and Post Comments

Thoughts on Mercurial (and Git)

May 12, 2013 at 12:00 PM | categories: Mozilla, Mercurial, Git | View Comments

My first experience with Mercurial (Firefox development) was very unpleasant. Coming from Git, I thought Mercurial was slow and perhaps even more awkward to use than Git. I frequently encountered repository corruption that required me to reclone. I thought the concept of a patch queue was silly compared to Git branches. It was all extremely frustrating and I dare say a hinderance to my productivity. It didn't help that I was surrounded by a bunch of people who had previous experience with Git and opined about every minute difference.

Two years later and I'm on much better terms with Mercurial. I initially thought it might be Stockholm Syndrome, but after reflection I can point at specific changes and enlightenments that have reshaped my opinions.

Newer versions of Mercurial are much better

I first started using Mercurial in the 1.8 days and thought it was horrible. However, modern releases are much, much better. I've noticed a steady improvement in the quality and speed of Mercurial in the last few years.

If you aren't running 2.5 or later (Mercurial 2.6 was released earlier this month), you should take the time to upgrade today. When you upgrade, you should of course read the changelog and upgrade notes so you can make the most of the new features.

Proper configuration is key

For my workflow, the default configuration of Mercurial out of the box is... far from optimal. There are a number of basic changes that need to be made to satisfy my expectations for a version control tool.

I used to think this was a shortcoming with Mercurial: why not ship a powerful and useful environment out of the box? But, after talking to a Mercurial core contributor, this is mostly by design. Apparently a principle of the Mercurial project is that the CLI tool (hg) should be simple by default and should minimize foot guns. They view actions like rebasing and patch queues as advanced and thus don't have them enabled by default. Seasoned developers may scoff at this. But, I see where Mercurial is coming from. I only need to refer everyone to her first experience with Git as an example of what happens when you don't aim for simplicity. (I've never met a Git user who didn't think it overly complicated at first.)

Anyway, to get the most out of Mercurial, it is essential to configure it to your liking, much like you install plugins or extensions in your code editor.

Every person running Mercurial should go to http://mercurial.selenic.com/wiki/UsingExtensions and take the time to find extensions that will make your life better. You should also run hg help hgrc to view all the configuration options. There is a mountain of productivity wins waiting to be realized.

For reference, my ~/.hgrc. Worth noting are some of the built-in externsions I've enabled:

  • color - Colorize terminal output. Clear UX win.
  • histedit - Provides git rebase --interactive behavior.
  • pager - Feed command output into a pager (like less). Clear UX win.
  • progress - Draw progress bars on long-running operations. Clear UX win.
  • rebase - Ability to easily rebase patches on top of other heads. This is a basic feature of patch management.
  • transplant - Easily move patches between repositories, branches, etc.

If I were on Linux, I'd also use the inotify extension, which installs filesystem watchers so operations like hg status are instantaneous.

In addition to the built-in extensions, there are a number of 3rd party extensions that improve my Mozilla workflow:

  • mqext - Automatically commit to your patch queue when you qref, etc. This is a lifesaver. If that's not enough, it suggests reviewers for your patch, suggests a bug component, and let's you find bugs touching the files you are touching.
  • trychooser - Easily push changes to Mozilla's Try infrastructure.
  • qimportbz - Easily import patches from Bugzilla.
  • bzexport - Easily export patches to Bugzilla.

I'm amazed more developers don't use these simple productivity wins. Could it be that people simply don't realize they are available?

Mozilla has a bug tracking easier configuration of the user's Mercurial environment. My hope is one day people simply run a single command and get a Mozilla-optimized Mercurial environment that just works. Along the same vein, if your extensions are out of date, it prompts you to update them. This is one of the benefits of a unified developer tool like mach: you can put these checks in one place and everyone can reap the benefits easily.

Mercurial is extensible

The major differentiator from almost every other version control system (especially Git) is the ease and degree to which Mercurial can be extended and contorted. If you take anything away from this post it should be that Mercurial is a flexible and agile tool.

If you want to change the behavior of a built-in command, you can write an extension that monkeypatches that command. If you want to write a new command, you can of course do that easily. You can have extensions interact with one another - all natively. You can even override the wire protocol to provide new capabilities to extend how peers communicate with one another. You can leverage this to transfer additional metadata or data types. This has nearly infinite potential. If that's not enough, it's possible to create a new branching/development primitive through just an extension alone! If you want to invent Git-style branches with Mercurial, you could do that! It may require client and server support, but it's possible.

Mercurial accomplishes this by being written (mostly) in Python (as opposed to C) and by having a clear API on which extensions can be built. Writing extensions in Python is huge. You can easily drop into the debugger to learn the API and your write-test loop is much smaller.

By contrast, most other version control systems (including Git) require you to parse output of commands (this is the UNIX piping principle). Mercurial supports this too, but the native Python API is so much more powerful. Instead of parsing output, you can just read the raw values from a Python data structure. Yes please.

Since I imagine a lot of people at Mozilla will be reading this, here are some ways Mozilla could leverage the extensibility of Mercurial:

  • Command to create try pushes (it exists - see above).
  • Record who pushed what when (we have this - it's called the pushlog).
  • Command to land patches. If inbound1 is closed, automatically rebase on inbound2. etc. This could even be monkeypatched into hg push so pushes to inbound are automatically intercepted and magic ensues.
  • Record the automation success/fail status against individual revisions and integrate with commands (e.g. only pull up to the most recent stable changeset).
  • Command to create a review request for a patch or patch queue.
  • Command to assist with reviews. Perhaps a reviewer wants to make minor changes. Mercurial could download and apply the patch(es), wait for your changes, then reupload to Bugzilla (or the review tool) automatically.
  • Annotating commits or pushes with automation info (which jobs to run, etc).
  • Find Bugzilla component for patch (it exists - see above).
  • Expose custom protocol for configuring automation settings for a repository or a head. e.g. clients (with access) could reconfigure PGO scheduling, coalescing, etc without having to involve RelEng - useful for twigs and lesser used repositories.
  • So much more.

Essentially, Mercurial itself could become the CLI tool code development centers around. Whether that is a good idea is up for debate. But, it can. And that says a lot about the flexibility of Mercurial.

Future potential of Mercurial

When you consider the previous three points, you arrive at a new one: Mercurial has a ton of future potential. The fact that extensions can evolve vanilla Mercurial into something that resembles Mercurial in name only is a testament to this.

When I sat down with a Mercurial core contributor, they reinforced this. To them, Mercurial is a core library with a limited set of user-facing commands forming the stable API. Since core features (like storage) are internal APIs (not public commands - like Git), this means they aren't bound to backwards compatibility and can refactor internals as needed and evolve over time without breaking the world. That is a terrific luxury.

An example of this future potential is changeset evolution. If you don't know what that is, you should because it's awesome. One of the things they figured out is how to propagate rebasing between clones!

Comparing to Git

Two years ago I would have said I would never opt to use Mercurial over Git. I cannot say that today.

I do believe Git still has the advantage over Mercurial in a few areas:

  • Branch management. Mercurial branches are a non-starter for light-weigh work. Mercurial bookmarks are kinda-sorta like Git branches, but not quite. I really like aspects of Git branches. Hopefully changeset evolution will cover the remaining gaps and more.
  • Patch conflict management. Git seems to do a better job of resolving patch conflicts. But, I think this is mostly due to Mercurial's patch queue extension not using the same merge code as built-in commands (this is a fixable problem).
  • Developer mind share and GitHub. The GitHub ecosystem makes up for many of Git's shortcomings. Bitbucket isn't the same.

However, I believe Mercurial has the upper hand for:

  • Command line friendliness. Git's command line syntax is notoriously awful and the concepts can be difficult to master.
  • Extensibility. It's so easy to program custom workflows and commands with Mercurial. If you want to hack your version control system, Mercurial wins hands down. Where Mercurial embraces extensibility, I couldn't even find a page listing all the useful Git extensions!
  • Open source culture. Every time I've popped into the Mercurial IRC channel I've had a good experience. I get a response quickly and without snark. Git by contrast, well, let's just say I'd rather be affiliated with the Mercurial crowd.
  • Future potential. Git is a content addressable key-value store with a version control system bolted on top. Mercurial is designed to be a version control system. Furthermore, Mercurial's code base is much easier to hack on than Git's. While Git has largely maintained feature parity in the last few years, Mercurial has grown new features. I see Mercurial evolving faster than Git and in ways Git cannot.

It's worth calling out the major detractors for each.

I think Git's major weakness is its lack of extensibility and inability to evolve (at least currently). Git will need to grow a better extensibility model with better abstractions to compete with Mercurial on new features. Or, the Git community will need to be receptive to experimental features living in the core. All of this will require some major API breakage. Unfortunately, I see little evidence this will occur. I'm unable to find a vision document for the future of Git, a branch with major new features, or interesting threads on the mailing list. I tried to ask in their IRC channel and got crickets.

I think Mercurial's greatest weakness is lack of developer mindshare. Git and GitHub are where it's at. This is huge, especially for projects wanting collaboration.

Of all those points, I want to stress the extensibility and future potential of Mercurial. If hacking your tools to maximize potential and awesomeness is your game, Mercurial wins. End of debate. However, if you don't want to harness these advantages, then I think Git and Mercurial are mostly on equal footing. But given the rate of development in the Mercurial project and relative stagnation of Git (I can't name a major new Git feature in years), I wouldn't be surprised if Mercurial's feature set obviously overtakes Git's in the next year or two. Mind share will of course take longer and will likely depend on what hosting sites like GitHub and Bitbucket do (I wouldn't be surprised if GitHub rebranded as CodeHub or something some day). Time will tell.

Extending case study

I have removed the case study that appeared in the original article because as Mike Hommey observed in the comments, it wasn't a totally accurate comparison. I don't believe the case study significantly added much to the post, so I likely won't write a new one.

Conclusion

From where I started with Mercurial, I never thought I'd say this. But here it goes: I like Mercurial.

I started warming up when it became faster and more robust in recent versions in the last few years. When I learned about its flexibility and the fundamentals of the project and thus its future potential, I became a true fan.

It's easy to not like Mercurial if you are a new user coming from Git and are forced to use a new tool. But, once you take the time to properly configure it and appreciate it for what it is and what it can be, Mercurial is easy to like.

I think Mercurial and Git are both fine version control systems. I would happily use either one for a new project. If the social aspects of development (including encouraging new contributors) were important to me, I would likely select Git and GitHub. But, if I wanted something just for me or I was a large project looking for a system that scales and is flexible or was looking to the future, I'd go with Mercurial.

Mercurial is a rising star in the version control world. It's getting faster and better and enabling others to more easily innovate through powerful extensions. The future is bright for this tool.

Read and Post Comments

Next Page ยป