Gregory Szorc's Digital Home

Modern Mercurial Documentation for Mozillians

January 15, 2015 at 02:45 PM | categories: Mercurial, Mozilla

Mozilla's Mercurial documentation has historically been pretty bad. The documentation on MDN (which I refuse to link to) is horribly disjointed and contains a lot of outdated recommendations. I've made attempts to burn some of it to the ground, but it is just too overwhelming.

I've been casually creating my own Mercurial documentation tailored for Mozillians. It's called Mercurial for Mozillians.

It started as a way to document extensions inside the version-control-tools repository. But, it has since evolved to cover other topics, like how to install Mercurial, how to develop using bookmarks, and how to interact with a unified Firefox repository. The documentation is nowhere near complete. But it already has some very useful content beyond what MDN offers.

I'm not crazy about the idea of having generic Mercurial documentation on a Mozilla domain (this should be part of the official Mercurial documentation). Nor am I crazy about moving content off MDN. I'm sure content will move to its appropriate location later. Until then, enjoy some curated Mercurial documentation!

If you would like to contribute to Mercurial for Mozillians, read the docs.

Major bzexport Updates

January 13, 2015 at 03:55 PM | categories: Mercurial, Mozilla

The bzexport Mercurial extension - an extension that enables you to easily create new Bugzilla bugs and upload patches to Bugzilla for review - just received some major updates.

First, we now have automated test coverage of bzexport! This is built on top of the version control test harness I previously blogged about. As part of the tests, we start Docker containers that run the same code that's running on bugzilla.mozilla.org, so interactions with Bugzilla are properly tested. This is much, much better than mocking HTTP requests and responses because if Bugzilla changes, our tests will detect it. Yay continuous integration.

Second, bzexport now uses Bugzilla's REST API instead of the legacy bzAPI endpoint for all but one HTTP request. This should make BMO maintainers very happy.

Third and finally, bzexport now uses shared code for obtaining Bugzilla credentials. The behavior is documented, of course. It is not backwards compatible. If you were using some old configuration values, you will now see warnings when running bzexport. These warnings are actionable, so I shouldn't need to describe them here.

Please obtain the new code by pulling the version-control-tools repository. Or, if you have a Firefox clone, run mach mercurial-setup.

If you find any regressions, file a bug in the Developer Services :: Mercurial: bzexport component and have it depend on bug 1033394.

Thanks go out to Steve Fink, Ed Morley, and Ted Mielczarek for looking at the code.

Utilizing GitHub for Firefox Development

January 12, 2015 at 11:00 AM | categories: Mozilla

Recent posts on my blog have talked about the difficulty of submitting changes to Firefox and the rise of GitHub. If you haven't read them yet, I encourage you to stop reading this post and read them now.

As I was looking at the list of process debt contributing to Firefox, one thought kept creeping into my mind: how many of these items go away if we utilize GitHub?

As I mentioned in these two posts, GitHub's popularity has essentially commoditized many items on this list, especially the parts around source control and submitting patches for consideration (just fork and open a pull request). It seems that everyone these days is on GitHub, and asking people to use GitHub to send changes to Firefox would almost certainly be well received by contributors and even Mozilla staff.

Here's what I think: Mozilla should utilize GitHub for Firefox development.

The verb in that sentence is important: I purposefully said utilize and not something like switch to. Framing the question as whether or not to switch to GitHub for Firefox development is a false dilemma, a logical fallacy. So is the question about switching to Git. As I explain later, there is a spectrum of options available, and switching and not switching sit at the extremes. Utilize doesn't preclude a binary switch/don't switch outcome, but it does keep an array of options on the table for consideration.

So, how should Mozilla utilize GitHub for Firefox development?

I think that insisting people establish Bugzilla accounts and upload patches to Bugzilla/bugs is an antiquated practice in desperate need of an overhaul. I think that if someone has written code, they should be able to essentially throw it over a wall to initiate the change process. They should be able to do this in a manner that incurs little to no process debt. We, Mozilla, should be able to take only code and integrate it into Firefox, assuming a trusted person - a module owner or peer - agrees and grants review. GitHub pull requests would facilitate a less-involved code contribution mechanism.

Another benefit of GitHub is that the web interface goes further than just code submission: it also has facilities for editing files. It's possible to edit a file in someone else's repository and create a pull request directly from the web interface! My post on process debt began by comparing editing a wiki with the current Firefox change process. GitHub's web-based editing essentially reduces the gap to cosmetic differences. GitHub's ease of contributing purely via the browser would open the door to more contributions of less-involved changes (sometimes referred to as good first bugs).

To state it explicitly, I support the use of GitHub pull requests for submitting changes to Firefox.

Now, there are some concerns and challenges about doing this. These include:

- Fragmentation of code review and tracking could be problematic for Mozilla staff and other highly-active individuals.
- GitHub can lose some parts of code review after rebasing and force pushes. Edit: Comments below indicate this is no longer a problem. Great!
- You can only assign 1 reviewer per pull request.
- GitHub sends an email/notification per review comment. This can be extremely annoying for some mail clients.
- GitHub doesn't have a mechanism for dealing with security bugs.
- Data sovereignty concerns (all data hosted on GitHub and subject to their data retention and access policies). Their API has query limits, which can limit machine use somewhat.
- GitHub's model favors merges over rebases. Merges have a number of downsides, especially for large projects, and we strongly prefer to maintain our mostly-linear Firefox repository history.
- GitHub's model favors appending commits rather than rewriting commits. (This is due to Git badness when you force push.) Mozilla favors a world where the final commit is what's reviewed and landed.
- Git != Mercurial. Firefox is canonically stored in Mercurial. There is some impedance mismatch here. But nothing tools can't overcome.
- The Merge Pull Request button is almost completely useless for Firefox's existing and future workflows. This partially invalidates other niceness the pure GitHub pull request workflow buys you.
- Everything is lumped into a single bucket. We lose component-level subscriptions, making following harder.
- Following the entire Firefox project on GitHub would produce an overwhelming fire hose of data.
- We don't control GitHub and our options for extending it to extract even more process optimization are limited to what their APIs support and what they choose to implement.
- We are at the whim of GitHub should they ever change a feature or API.
- See also Servo's list of challenges.

Some of these issues can be overcome by tools and automation (which I would happily build in my capacity as a Developer Productivity Engineer at Mozilla). Others are more fundamental and seemingly would require buy-in and/or support from very senior Mozillians.

If Mozilla were to go forward utilizing GitHub pull requests for Firefox, I think it should be done incrementally rather than going all-in and attempting the entire GitHub workflow from the start. This would, however, mean diverging from GitHub's well-known practices, which would add process debt on top of the GitHub baseline. I don't like that. But I think it is a step in the right direction. Partial reduction in process debt is better than no reduction.

What do I mean by incrementally start accepting pull requests? Well, I don't think code review should initially be conducted on GitHub. When you look at the above list of concerns, many of them are around code review and interacting with pull requests. I think there's too much badness and risk there for me to be comfortable moving things to GitHub and giving GitHub exclusive domain over this important data, at least initially.

But if code review isn't conducted on GitHub, what's the value of a pull request? A pull request would be a well-defined and well-understood mechanism for importing data into Mozilla's systems. For example, submitting a pull request would automatically result in the creation of a review request on MozReview or even a bug/attachment/review on Bugzilla. This would allow people to send code to Mozilla easily while simultaneously allowing Mozillians to use familiar tools and processes without the aforementioned concerns with GitHub. That appears to be win-win.

Once we have a simple mechanism in place for turning pull requests into MozReview review requests, we can start playing around with syncing code review activity between Mozilla and GitHub so review activity on either system results in cross-posting. There is precedent for this today. GerritHub has bi-directional syncing of code review activity between GitHub and Gerrit. Facebook also does something similar, syncing data between their internal Phabricator instance and GitHub. Mozilla to GitHub sync would not be difficult: we control all those systems and I'm pretty confident in our ability to make a GitHub API call when a MozReview review request is updated (we already make Bugzilla API calls, so we know this works). GitHub to Mozilla is a bit more difficult. But others have done it, and I'm confident we can too.
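One way to picture bi-directional syncing is a minimal in-memory sketch, assuming each system can record where a mirrored comment was originally written. `ReviewSystem`, `Comment`, and `sync` are hypothetical names for illustration only; this does not use the real GitHub, MozReview, or Bugzilla APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Comment:
    origin: str      # name of the (hypothetical) system where the comment was written
    local_id: int    # id assigned by the origin system
    text: str

    @property
    def key(self):
        # (origin, local_id) identifies a comment across both systems
        return (self.origin, self.local_id)


@dataclass
class ReviewSystem:
    name: str
    comments: list = field(default_factory=list)
    _next_id: int = 1

    def post(self, text):
        """Record a comment written natively in this system."""
        comment = Comment(self.name, self._next_id, text)
        self._next_id += 1
        self.comments.append(comment)
        return comment


def sync(a, b):
    """Mirror comments in both directions, skipping ones the other side already has."""
    keys_a = {c.key for c in a.comments}
    keys_b = {c.key for c in b.comments}
    # Compute both lists before mutating so mirrored comments are not re-copied.
    missing_in_b = [c for c in a.comments if c.key not in keys_b]
    missing_in_a = [c for c in b.comments if c.key not in keys_a]
    b.comments.extend(missing_in_b)
    a.comments.extend(missing_in_a)


github = ReviewSystem("github")
mozreview = ReviewSystem("mozreview")
github.post("Please add a test for this.")
mozreview.post("r+ with nits")
sync(github, mozreview)
sync(github, mozreview)  # a second pass copies nothing new
```

Keying each comment on where it originated is one way to keep a mirrored comment from bouncing back as a new one. A real integration would also have to deal with edits, deletions, and API failures, which this sketch ignores.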

I see bi-directional syncing of GitHub pull request / code review data between GitHub and Mozilla as achievable and relatively free from controversy. I think we should experiment with this sometime in 2015, probably in Q2, once MozReview is in better condition to host GitHub pull requests. That said, supporting Git in MozReview is on my Q1 goals list, so maybe I'll sneak this into Q1. Time will tell.

At this time, I believe using GitHub for the ingestion of proposed Firefox commits into existing Mozilla systems should be the limit of Firefox's GitHub presence, at least as far as day-to-day development goes. If other groups want to use GitHub more actively and they find a way to make that work while placating everyone who cares, power to them. But I think moving the pendulum any further toward GitHub - including things like making GitHub the exclusive location for code review data, utilizing GitHub Issues, and making Git[Hub] the canonical Firefox repository - remains a difficult and controversial proposition. I believe each of these steps to be medium to high cost and risk with low to medium reward. I believe it would be wise to defer these questions until we have data about the value of GitHub pull requests for Firefox development.

To summarize, I propose using GitHub pull requests as an alternate, supported front end to the code contribution pathway. We would eliminate a lot of process debt for non-Mozillians by supporting a known process. Mozillians on the review and code submission side of the process shouldn't have to worry about change because, well, it shouldn't matter if a commit came from GitHub or elsewhere: it will all appear mostly the same. I'm not saying that we will never expand our utilization of GitHub for Firefox development beyond this scope. But I am saying that I don't think it would be prudent to do so today.

And that's how and why I think Mozilla should utilize GitHub for Firefox development.

Addendum

While I'm here, it's important to note that GitHub does not and will likely never solve many items from our list of Firefox contribution process debt. GitHub is not a build system nor a tool for running and analyzing code and tests. We still have many, many deficiencies and usability concerns here. We have historically under-invested in this area and utilizing GitHub in any capacity won't address these other issues. In addition, Firefox is an order of magnitude larger and more complex than the vast majority of projects on GitHub. We will always be burdened with the cost of our success - of coping with and maintaining the additional complexity associated with that scale. Firefox is at least the 0.1%. There's a good chance GitHub and/or many of the amazing services associated with it (like Travis-CI) won't scale to our needs. I'd love to be proved wrong here, but the reality is that supporting a marginal use case like Firefox likely isn't at the top of the list of goals for GitHub and related organizations unless it is in their business interest (read: financial interest) to do so. One can hope that as these companies try to capture more of the enterprise market via offerings such as GitHub Enterprise, they invest in the features and scalability that large projects and organizations like Mozilla and Firefox need.

Code First and the Rise of the DVCS and GitHub

January 10, 2015 at 12:35 PM | categories: Git, Mozilla

The ascendancy of GitHub has very little to do with its namesake tool, Git.

What GitHub did that was so radical for its time and the strategy that GitHub continues to execute so well on today is the approach of putting code first and enabling change to be a frictionless process.

In case you weren't around for the pre-GitHub days or don't remember, they were not pleasant. Tools around code management were a far cry from where they are today (I still argue the tools are pretty bad, but that's for another post). Centralized version control systems were prevalent (CVS and Subversion in open source, Perforce, ClearCase, Team Foundation Server, and others in the corporate world). Tools for looking at and querying code had horrible, ugly interfaces and came out of a previous era of web design and browser capabilities. It felt like a chore to do anything, including committing code. Yes, the world had awesome services like SourceForge, but they weren't the same as GitHub is today.

Before I get to my central thesis, I want to highlight some supporting reasons for GitHub's success. There were two developments in the second half of the 2000s that contributed to the success of GitHub: the rise of the distributed version control system (DVCS) and the rise of the modern web.

While distributed version control systems like Sun WorkShop TeamWare and BitKeeper existed earlier, it wasn't until the second half of the 2000s that DVCS systems took off. You can argue part of the reason for this was open source: my recollection is there wasn't a well-known DVCS available as free software before 2005. Speaking of 2005, it was a big year for DVCS projects: Git, Mercurial, and Bazaar all had initial releases. Suddenly, there were old-but-new ideas on how to do source control being exposed to new and willing-to-experiment audiences. DVCS were a critical leap from traditional version control because they (theoretically) impose fewer process and workflow limitations on users. With traditional version control, you needed to be online to commit, meaning you were managing patches, not commits, in your local development workflow. There were some forms of branching and merging, but they were a far cry from what is available today and were often too complex for mere mortals to use. As more and more people were exposed to distributed version control, they welcomed its less-restrictive and more powerful workflows. They realized that source control tools don't have to be so limiting. Distributed version control also promised all kinds of revamped workflows that could be harnessed. There were potential wins all around.

Around the same time that open source DVCS systems were emerging, web browsers were evolving from applications that rendered static pages into platforms for running web applications. Web sites using JavaScript to dynamically manipulate web page content (DHTML as it was known back then) were starting to hit their stride. I believe it was Gmail that turned the most heads as to the full power of the modern web experience, with its novel-for-its-time extreme reliance on XMLHttpRequest for dynamically changing page content. People were realizing that powerful, desktop-like applications could be built for the web and could run everywhere.

GitHub launched in April 2008 standing on the shoulders of both the emerging interest in the Git content tracking tool and the capabilities of modern browsers.

I wasn't an early user of GitHub. My recollection is that GitHub was mostly a Rubyist's playground back then. I wasn't a Ruby programmer, so I had little reason to use GitHub in the early stages. But people did start using GitHub. And in the spirit of Ruby (on Rails), GitHub moved fast, or at least was projecting the notion that they were. While other services built on top of DVCS tools - like Bitbucket - did exist back then, GitHub seemed to have momentum associated with it. (Look at the archives for GitHub's and Bitbucket's respective blogs. GitHub has hundreds of blog entries; Bitbucket numbers in the dozens.) Developers everywhere up until this point had all been dealing with sub-optimal tools and workflows. Some of us realized it. Others hadn't. Many of those who did saw GitHub as a beacon of hope: we have all these new ideas and new potentials with distributed version control and here is a service under active development trying to figure out how to exploit that. Oh, and it's free for open source. Sign me up!

GitHub did capitalize on a market opportunity. They also capitalized on the value of marketing and the perception that they were moving fast and providing features that people - especially in open source - wanted. This captured the early adopters market. But I think what really set GitHub apart and led to the success they are enjoying today is their code first approach and their desire to make contribution easy, and even fun and sociable.

As developers, our job is to solve problems. We often do that by writing and changing code. And this often involves working as part of a team, or collaborating. To collaborate, we need tools. You eventually need some processes. And as I recently blogged, this can lead to process debt and inefficiencies associated with them.

Before GitHub, the process debt for contributing to other projects was high. You often had to subscribe to mailing lists in order to submit patches as emails. Or, you had to create an account on someone's bug tracker or code review tool before you could send patches. Then you had to figure out how to use these tools and any organization or project-specific extensions and workflows attached to them. It was quite involved and a lot could go wrong. Many projects and organizations (like Mozilla) still practice this traditional methodology. Furthermore (and as I've written before), these traditional, single patch/commit-based tools often aren't effective at ensuring the desired output of high quality software.

Before GitHub reduced process debt by commoditizing knowledge through market dominance, they took another approach: emphasizing code-first development.

GitHub is all about the code. You load a project page and you see code. You may think a README with basic project information would be the first thing on a project page. But it isn't. Code, like data, is king.

Collaboration and contribution on GitHub revolve around the pull request. It's a way of saying, hey, I made a change, will you take it? There's nothing too novel in the concept of the pull request: it's fundamentally no different than sending emails with patches to a mailing list. But what is so special is GitHub's execution. Gone are the days of configuring and using one-off tools and processes. Instead, we have the friendly confines of a clean and modern web experience. While GitHub is built upon the Git tool, you don't even need to use Git (a tool lampooned for its horrible usability and approachability) to contribute on GitHub! Instead, you can do everything from your browser. That warrants repeating: you don't need to leave your browser to contribute on GitHub. GitHub has essentially reduced process debt to edit-a-text-document territory, and pretty much anybody who has used a computer can do that. This has enabled GitHub to dabble in non-code territory, such as its GitHub and Government initiative to foster community involvement in government. (GitHub is really a platform for easily seeing and changing any content or data. But, please, let me continue using code as a stand-in, since I want to focus on the developer audience.)

GitHub took an overly-complicated and fragmented world of varying contribution processes and made the new world revolve around code and a unified and simple process for change - the pull request.

Yes, there are other reasons for GitHub's success. You can make strong arguments that GitHub has capitalized on the social and psychological aspects of coding and human desire for success and happiness. I agree.

You can also argue GitHub succeeded because of Git. That statement is more or less technically accurate, but I don't think it is a sound argument. Git may have been the most feature complete open source DVCS at the time GitHub came into existence. But that doesn't mean there is something special about Git that no other DVCS has that makes GitHub popular. Had another tool been more feature complete or had the backing of a project as large as Linux at the time of GitHub's launch, we could very well be looking at a successful service built on something that isn't Git. Git had early market advantage and I argue its popularity today - a lot of it via GitHub - is largely a result of its early advantages over competing tools. And I would go so far as to say that when you consider the poor usability of Git and the pain that its users go through when first learning it, more accurate statements would be that GitHub succeeded in spite of Git and Git owes much of its success to GitHub.

When I look back at the rise of GitHub, I see a service that has succeeded by putting people first by allowing them to capitalize on more productive workflows and processes. They've done this by emphasizing code, not process, as the means for change. Organizations and projects should take note.

Firefox Contribution Process Debt

January 09, 2015 at 04:45 PM | categories: Mozilla

As I was playing around with source-derived documentation, I grasped the reality that any attempt to move documentation out of MDN's wiki and into something derived from source code (despite the argued technical and quality advantages) would likely be met with fierce opposition because the change process for Firefox is much more involved than edit a wiki.

This observation casts light on something important: the very act of contributing any change to Firefox is too damn hard.

I've always believed this statement to be true. I even like to think I'm one of the few people that has consistently tried to do something about it (inventing mach, overhauling the build system, bootstrap scripting, MozReview, etc). But I, like many of the seasoned Firefox developers, often lose sight of this grim reality. (I think it's fair to say that new contributors often complain about the development experience and as we grow accustomed to it, the complaint volume and intensity wanes).

But when you have the simplicity of editing a wiki page on MDN juxtaposed against the Firefox change contribution process of today, the inefficiency of the Firefox process is clearly seen.

The amount of knowledge required to change Firefox is obscenely absurd. Sure, some complex components will always be difficult to change. But I'm talking about any component: the base set of knowledge required to contribute any change to Firefox is vast. This is before we get into any domain-specific knowledge inside Firefox. I have always believed and will continue to believe that this is a grave institutional issue for Mozilla. It should go without saying that I believe this is an issue worth addressing. After all, any piece of knowledge required for contribution is essentially an obstacle to completion. Elimination of required knowledge lowers the barrier to contribution. This, in turn, allows increased contribution via more and faster change. This advances the quality and relevance of Firefox, which enables Mozilla to advance its Mission.

Seasoned contributors have probably internalized most of the knowledge required to contribute to Firefox. Here is a partial list to remind everyone of the sheer scope of it:

Before you do anything
- Am I able to contribute?
- Do I meet the minimum requirements (hardware, internet access, etc)?
- Do I need any accounts?
Source control
- What is source control?
- How do I install Mercurial/Git?
- How do I use Mercurial/Git?
- Where can I get the Firefox source code?
- How do I *optimally* acquire the Firefox source code?
- Are there any recommended configuration settings or extensions?
Building Firefox
- Do I even need to build Firefox?
- How do I build Firefox?
- What prerequisites need to be installed?
- How do I install prerequisites?
- How do I configure the Firefox build to be appropriate for my needs?
- What are mozconfigs?
- How do I get Firefox to build faster?
- What do I do after a build?
Changing code
- Is there IDE support?
- Where can I find macros and aliases to make things easier?
Testing
- How do I run tests?
- Which tests are relevant for a given change?
- What are all these different test types?
- How do I debug tests?
Try and Automation
- What is Try?
- How do I get an account?
- What is vouching and different levels of access?
- What is SSH?
- How do I configure SSH?
- When will my tests run?
- What is Tree Herder?
- What do all these letters and numbers mean?
- What are all these colors?
- What's an *intermittent failure*?
- How do I know if something is an *intermittent failure*?
- What amount of *intermittent failure* is acceptable?
- What do these logs mean?
- What's buildbot?
- What's mozharness?
Sending patches to Mozilla
- Do I need to sign a CLA?
- Where do I send patches?
- Do I need to get an account on Bugzilla?
- Do I need to file a bug?
- What component should I file a bug in?
- What format should patches be sent in?
- How should I format commit messages?
- How do I upload patches to Bugzilla?
- How does code review work?
  - What's the modules system?
  - What modules does my change map to?
  - Who are the possible reviewers?
  - How do I ask someone for review?
  - When can I expect review?
  - What does r+ vs r- vs f+ vs f- vs cancelling review all mean?
  - How do I submit changes to my initial patch?
  - What do I do after review?
Landing patches
- What repository should a patch land on?
- How do you rebase?
- What's a tree closure?
- What do I do after pushing?
- How do I know the result of the landing?

Holy #$%@, that's a lot of knowledge. Not only is this list incomplete, it also doesn't encompass the domain-specific knowledge around the content being changed.

Every item on this list represents a point where a potential contributor could throw up their arms out of despair and walk away, giving their time and talents to another project. Every item on this list that takes 10 minutes instead of 5 could be the tipping point. For common actions, things that take 5 seconds instead of 1 could be the difference maker. This list thus represents reasons that people do not contribute to Firefox or contribute ineffectively (in the case of common contributors, like paid Mozilla staff).

I view items on this list as process debt. Process debt is a term I'm coining (perhaps someone has beaten me to it - I'm writing this on a plane without Internet access) that is a sibling of technical debt. Process debt is overhead and inefficiency associated with processes. The border between process debt and technical debt in computers is the code itself (although that border may sometimes not be very well-defined, as code and process are oftentimes similar, such as most interactions with version control or code review tools).

When I see this list of process debt, I'm inspired by the opportunity to streamline the processes and bask in the efficiency gains. But I am also simultaneously overwhelmed by the vast scope of things that need improving. When I think about the amount of energy that will need to be exerted to fight the OMG change crowd, the list becomes depressing. But discussing institutional resistance to change, the stop energy associated with it, and Mozilla's historical record of failing to invest in fixing process (and technical) debt is for another post.

When looking at the above list, I can think of the following general ways to make it more approachable:

- Remove an item completely. If it isn't on the list, there is nothing to know and no overhead. The best way to solve a problem is to make it not exist.
- Automate an item and make its existence largely transparent. If an item is invisible, does it exist in the mind of a contributor? (This is also known as solving the problem by adding a layer of indirection.)
- Change an item so that it is identical to another, more familiar process. If you use a well-defined process, there is no new knowledge that must be learned and the cost of on-boarding someone already familiar with that knowledge is practically zero.

When you start staring at this list of Firefox contribution process debt, you start thinking about priorities, groupings, and strategies. You look around at what others are doing to see if you can borrow good ideas.

I've done a lot of thinking on the subject and have some ideas and recommendations. Stay tuned for some additional posts on the topic.
