Please run mach mercurial-setup

July 25, 2014 at 10:00 AM | categories: Mercurial, Mozilla

Hey there, Firefox developer! Do you use Mercurial? Please take the time right now to run mach mercurial-setup from your Firefox clone.

It's been updated to ensure you are running a modern Mercurial version. More awesomely, it has support for a couple of new extensions to make you more productive. I think you'll like what you see.

mach mercurial-setup doesn't change your hgrc without confirmation. So it is safe to run to see what's available. You should consider running it periodically, say once a week or so. I wouldn't be surprised if we add a notification to mach to remind you to do this.


Repository-Centric Development

July 24, 2014 at 08:23 PM | categories: Git, Mercurial, Mozilla

I was editing a wiki page yesterday and I think I coined a new term which I'd like to see enter the common nomenclature: repository-centric development. The term refers to development/version control workflows that place repositories - not patches - first.

When collaborating on version controlled code with modern tools like Git and Mercurial, you essentially have two choices on how to share version control data: patches or repositories.

Patches have been around since the dawn of version control. Everyone knows how they work: your version control system has a copy of the canonical data and it can export a view of a specific change into what's called a patch. A patch is essentially a diff with extra metadata.

When distributed version control systems came along, they brought with them an alternative to patch-centric development: repository-centric development. You could still exchange patches if you wanted, but distributed version control allowed you to pull changes directly from multiple repositories. You weren't limited to a single master server (that's what the distributed in distributed version control means). You also didn't have to go through an intermediate transport such as email to exchange patches: you communicate directly with a peer repository instance.

Repository-centric development eliminates the middle man required for patch exchange: instead of exchanging derived data, you exchange the actual data, speaking the repository's native language.

One advantage of repository-centric development is that it eliminates the problem of patch non-uniformity. Patches come in many different flavors. You have plain diffs. You have diffs with metadata. You have Git-style metadata. You have Mercurial-style metadata. You can produce patches with varying amounts of context in the diff. There are different methods for handling binary content. There are different ways to express file adds, removals, and renames. It's all a hot mess. Any system that consumes patches needs to deal with the non-uniformity. Do you think this isn't a problem in the real world? Think again. If you are involved with an open source project that collects patches via email or by uploading patches to a bug tracker, have you ever seen someone accidentally upload a patch in the wrong format? That's patch non-uniformity. New contributors to Firefox do this all the time. I also see it in the Mercurial project. With repository-centric development, patches never enter the picture, so patch non-uniformity is a non-issue. (Don't confuse the superficial formatting of patches with the content, such as an incorrect commit message format.)

Another advantage of repository-centric development is that it makes the act of exchanging data easier: just have two repositories talk to each other. This used to be difficult, but hosting services like GitHub and Bitbucket make it easy. Contrast with patches, which require hooking your version control tool up to wherever those patches are located. The Linux kernel, like so many other projects, uses email for contributing changes. So now Git, Mercurial, etc. all fulfill Zawinski's law. This means your version control tool is talking to your inbox to send and receive code. Firefox development uses Bugzilla to hold patches as attachments. So now your version control tool needs to talk to your issue tracker. (Not the worst idea in the world, I will concede.) Yes, the tools for exchanging patches over email, issue trackers, or whatever else you are using exist and can work pretty well. But the grim reality is that these tools are all reinventing the wheel of repository exchange and are solving a problem that has already been solved by git push, git fetch, hg pull, hg push, etc. Personally, I would rather hg push to a remote and have tools like issue trackers and mailing lists pull directly from repositories. At least that way they have a direct line into the source of truth and are guaranteed a consistent output format.

Another area where direct exchange is huge is multi-patch commits (branches in Git parlance) or anywhere commit data is fragmented. When sending patches via email, you need to insert metadata saying which patch comes after which. Then the email import tool needs to reassemble things in the proper order (remember that the typical convention is one email per patch and email can be delivered out of order). Not the most difficult problem in the world to solve. But seriously, it's been solved already by git fetch and hg pull!

Things are worse for Bugzilla. There is no bullet-proof way to order patches there. The convention at Mozilla is to add Part N strings to commit messages and have the Bugzilla import tool sort on them (I assume it does that). But what if you have a logical commit series spread across multiple bugs? How do you reassemble everything into a linear series of commits? You don't, sadly. Just today I wanted to apply a somewhat complicated series of patches to the Firefox build system that I had been asked to review, so I could jump into a debugger, see what was going on, and conduct a more thorough review. There were 4 or 5 patches spread over 3 or 4 bugs. Bugzilla and its patch-centric workflow prevented me from importing the patches. Fortunately, this patch series had been pushed to Mozilla's Try server, so I could pull from there. But I haven't always been so fortunate. This limitation means developers have to make sacrifices such as writing fewer, larger patches (which makes code review harder) or involving unrelated parties in the same bug and/or review. In other words, deficient tools are imposing limited workflows. No bueno.

It is a fair criticism to say that not everyone can host a server or that permissions and authorization are hard, although I think concerns about the impact are overblown. If you are a small project, just create a GitHub or Bitbucket account. If you are a larger project, realize that people's time is one of your largest expenses and invest in proper and efficient repository hosting (often this can be GitHub) to reduce this waste and keep your developers happier and more efficient.

One of the clearest examples of repository-centric development is GitHub. There are no patches in GitHub. Instead, you git push and git fetch. Want to apply someone else's work? Just add a remote and git fetch! Contrast with first locating patches, hooking up Git to consume them (this part was always confusing to me - do you need to retroactively have them sent to your email inbox so you can import them from there?), and finally actually importing them. Just give me a URL to a repository already. But the benefits of repository-centric development with GitHub don't stop at pushing and pulling. GitHub has built code review functionality into pushes. They call these pull requests. While I have significant issues with GitHub's implementation of pull requests (I need to blog about those some day), I can't deny the utility of the repository-centric workflow and all the benefits around it. Once you switch to GitHub and its repository-centric workflow, you more clearly see how lacking patch-centric development is and quickly lose your desire to go back to the 1990s state-of-the-art methods for software development.
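
To make the contrast concrete, here is roughly what consuming someone else's work looks like in the repository-centric model (the remote name and URL here are hypothetical):

$ git remote add alice https://github.com/alice/project.git
$ git fetch alice

That's it. No locating patch files, no import step, no format guessing.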

I hope you now know what repository-centric development is and will join me in championing it over patch-based development.

Mozillians reading this will be very happy to learn that work is under way to shift Firefox's development workflow to a more repository-centric world. Stay tuned.


Updates to firefoxtree Mercurial extension

July 16, 2014 at 07:55 PM | categories: Mercurial, Mozilla

My Please Stop Using MQ post has been generating a lot of interest in bookmark-based workflows at Mozilla. To make adoption easier, I quickly authored an extension to add remote refs of Firefox repositories to Mercurial.

There was still a bit of confusion and some gripes about workflows, so I thought it would be best to update the extension to make things more pleasant.

Automatic tree names

People wanted the ability to easily pull from and aggregate the various Firefox trees without adding extra configuration to an hgrc file.

With firefoxtree, you can now hg pull central or hg pull inbound or hg pull aurora and it just works.
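
For example, pulling the latest mozilla-central changes and updating to its head looks something like this (a sketch; firefoxtree exposes tree names as labels, so they should also work as update targets):

$ hg pull central
$ hg up central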

Pushing with aliases doesn't yet work. It is slightly harder to do in the Mercurial API. I have a solution, but I'm validating some code paths to ensure it is safe. This feature will likely appear soon.

The fxheads command

Once people adopted unified repositories with heads from multiple repositories, they asked how they could quickly identify the heads of the pulled Firefox repositories.

firefoxtree now provides an hg fxheads command that prints concise output of the commits constituting the heads of the Firefox repos, e.g.:

$ hg fxheads
224969:0ec0b9ac39f0 aurora (sort of) bug 898554 - raise expected hazard count for b2g to 4 until they are fixed, a=bustage+hazbuild-only
224290:6befadcaa685 beta Tagging /src/mdauto/build/mozilla-beta 1772e55568e4 with FIREFOX_RELEASE_31_BASE a=release CLOSED TREE
224848:8e8f3ba64655 central Merge inbound to m-c a=merge
225035:ec7f2245280c fx-team fx-team/default Merge m-c to fx-team
224877:63c52b7ddc28 inbound Bug 1039197 - Always build js engine with zlib. r=luke
225044:1560f67f4f93 release release/default tip Automated checkin: version bump for firefox 31.0 release. DONTBUILD CLOSED TREE a=release

Please note that the output is based upon local-only knowledge: you'll need to pull to ensure data is current.

Reject pushing multiple heads

People were complaining that bookmark-based workflows resulted in Mercurial trying to push multiple heads to a remote. This complaint stems from the fact that Mercurial's default push behavior is to find all commits missing from the remote and push them. This behavior is extremely frustrating for Firefox development because the Firefox repos only have a single head and pushing multiple heads will only result in a server hook rejecting the push (after wasting a lot of time transferring that commit data).

firefoxtree will now refuse to push multiple heads to a known Firefox repo before any commit data is sent. In other words, we fail fast so your time is saved.

firefoxtree also changes the default behavior of hg push when pushing to a Firefox repo. If no -r argument is specified, hg push to a Firefox repo will automatically remap to hg push -r .. In other words, we attempt to push the working copy's commit by default. This change establishes a sensible default and likely working behavior when typing just hg push.
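
In transcript form, the remap amounts to this (a sketch of the behavior described above):

$ hg push          # inside a Firefox clone, behaves like: hg push -r .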

I am a bit on the fence about changing the default behavior of hg push. On one hand, it makes total sense. On the other, silently changing the default behavior of a built-in command is a little dangerous. I can easily see this backfiring when people interact with non-Firefox repos. I encourage people to get in the habit of typing hg push -r . because that's what you should be doing.

Installing firefoxtree

Within the next 48 hours, mach mercurial-setup should prompt to install firefoxtree. Until then, clone https://hg.mozilla.org/hgcustom/version-control-tools and ensure your ~/.hgrc file has the following:

[extensions]
firefoxtree = /path/to/version-control-tools/hgext/firefoxtree

You likely already have a copy of version-control-tools in ~/.mozbuild/version-control-tools.

It is completely safe to install firefoxtree globally: the extension will only modify behavior of repositories that are clones of Firefox repositories.


Python Packaging Do's and Don'ts

July 15, 2014 at 05:20 PM | categories: Python, Mozilla

Are you someone who casually interacts with Python but doesn't know its inner workings? Then this post is for you. Read on to learn why some things are the way they are and how to avoid making some common mistakes.

Always use virtualenvs

It is an easy trap to view virtualenvs as an obstacle, a distraction from accomplishing something. People see me adding virtualenvs to build instructions and they say, I don't use virtualenvs, they aren't necessary, why are you doing that?

A virtualenv is effectively an overlay on top of your system Python install. Creating a virtualenv can be thought of as copying your system Python environment into a local location. When you modify virtualenvs, you are modifying an isolated container. Modifying virtualenvs has no impact on your system Python.

A goal of a virtualenv is to isolate your system/global Python install from unwanted changes. When you accidentally make a change to a virtualenv, you can just delete the virtualenv and start over from scratch. When you accidentally make a change to your system Python, it can be much, much harder to recover from that.

Another goal of virtualenvs is to allow different versions of packages to exist. Say you are working on two different projects and each requires a specific version of Django. With virtualenvs, you install one version in one virtualenv and a different version in another virtualenv. Things happily coexist because the virtualenvs are independent. Contrast with trying to manage both versions of Django in your system Python installation. Trust me, it's not fun.
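
As a sketch (the package versions here are illustrative):

$ virtualenv project-a
$ project-a/bin/pip install Django==1.5
$ virtualenv project-b
$ project-b/bin/pip install Django==1.6

Each project now has its own Django, and neither touches your system Python.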

Casual Python users may not encounter scenarios where virtualenvs make their lives better... until they do, at which point they realize their system Python install is beyond saving. People who eat, breathe, and die Python run into these scenarios all the time. We've learned how bad life without virtualenvs can be and so we use them everywhere.

Use of virtualenvs is a best practice. Not using virtualenvs will result in something unexpected happening. It's only a matter of time.

Please use virtualenvs.

Never use sudo

Do you use sudo to install a Python package? You are doing it wrong.

If you need to use sudo to install a Python package, that almost certainly means you are installing a Python package to your system/global Python install. And this means you are modifying your system Python instead of isolating it and keeping it pristine.

Instead of using sudo to install packages, create a virtualenv and install things into the virtualenv. There should never be permissions issues with virtualenvs - the user that creates a virtualenv has full control over it.

Never modify the system Python environment

On some systems, such as OS X with Homebrew, you don't need sudo to install Python packages because the user has write access to the Python directory (/usr/local in Homebrew).

For the reasons given above, don't muck around with the system Python environment. Instead, use a virtualenv.

Beware of the package manager

Your system's package manager (apt, yum, etc) is likely using root and/or installing Python packages into the system Python.

For the reasons given above, this is bad. Use a virtualenv if possible, and try not to use the system package manager for installing Python packages.

Use pip for installing packages

Python packaging has historically been a mess. There are a handful of tools and APIs for installing Python packages. As a casual Python user, you only need to know of one of them: pip.

If someone says install a package, you should be thinking create a virtualenv, activate a virtualenv, pip install <package>. You should never run pip install outside of a virtualenv. (The exception is to install virtualenv and pip itself, which you almost certainly want in your system/global Python.)
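
In transcript form, that habit looks like this (requests here is just an example package):

$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install requests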

By default, running pip install will install packages from PyPI, the Python Package Index. It's Python's official package repository.

There are a lot of old and outdated tutorials online about Python packaging. Beware of bad content. For example, if you see documentation that says use easy_install, you should be thinking, easy_install is a legacy package installer that has largely been replaced by pip, I should use pip instead. When in doubt, consult the Python packaging user guide and do what it recommends.

Don't trust the Python in your package manager

The more Python programming you do, the more you learn not to trust the Python provided by your system / package manager.

Linux distributions such as Ubuntu that stay close to the forward edge of versions are better than others. But I've run into enough problems with OS- or package manager-maintained Pythons (especially on OS X) that I've learned to distrust them.

I use pyenv for installing and managing Python distributions from source. pyenv also installs virtualenv and pip for me, packages that I believe should be in all Python installs by default. As a more experienced Python programmer, I find pyenv just works.
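
For reference, a typical pyenv session looks something like this (the version number is illustrative, and this assumes pyenv's shims are on your PATH):

$ pyenv install 2.7.8
$ pyenv global 2.7.8
$ python --version
Python 2.7.8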

If you are just a beginner with Python, it is probably safe to ignore this section. Just know that as soon as something weird happens, start suspecting your default Python install, especially if you are on OS X. If you suspect trouble, use something like pyenv to enforce a buffer so the system can have its Python and you can have yours.

Recovering from the past

Now that you know the preferred way to interact with Python, you are probably thinking oh crap, I've been wrong all these years - how do I fix it?

The goal is to get a Python install somewhere that is as pristine as possible. You have two approaches here: cleaning your existing Python or creating a new Python install.

To clean your existing Python, you'll want to purge it of pretty much all packages not installed by the core Python distribution. The exceptions are virtualenv, pip, and setuptools - you almost certainly want those installed globally. On Homebrew, you can uninstall everything related to Python and blow away your Python directory, typically /usr/local/lib/python*. Then, brew install python. On Linux distros, this is a bit harder, especially since most Linux distros rely on Python for OS features and thus may have installed extra packages. You could try a similar approach on Linux, but I don't think it's worth it.
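
On Homebrew, that cleanup looks roughly like this (destructive - review what pip freeze reports before deleting anything):

$ pip freeze                # see what you've accumulated over the years
$ brew uninstall python
$ rm -rf /usr/local/lib/python*
$ brew install python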

Cleaning your system Python and attempting to keep it pure are ongoing tasks that are very difficult to keep up with. All it takes is one dependency to get pulled in that trashes your system Python. Therefore, I shy away from this approach.

Instead, I install and run Python from my user directory. I use pyenv. I've also heard great things about Miniconda. With either solution, you get a Python in your home directory that starts clean and pure. Even better, it is completely independent from your system Python. So if your package manager does something funky, there is a buffer. And, if things go wrong with your userland Python install, you can always nuke it without fear of breaking something in system land. This seems to be the best of both worlds.

Please note that installing packages in the system Python shouldn't be harmful. When you create virtualenvs, you can - and should - tell virtualenv not to use the system site-packages (i.e. don't use non-core packages from the system installation). This is the default behavior in virtualenv, and it should provide an adequate buffer. But in my experience, things still manage to bleed through. My userland Python install is extra safety. If something goes wrong, I can only blame myself.
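
For the record, isolation is the default; opting back in to system packages requires an explicit flag (which you should rarely need):

$ virtualenv venv                           # isolated from system site-packages
$ virtualenv --system-site-packages venv    # opts back in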

Conclusion

Python's long and complicated history of package management makes it very easy for you to shoot yourself in the foot. The long list of outdated tutorials on the Internet makes this a near certainty for casual Python users. Using the guidelines in this post, you can adhere to best practices that will cut down on surprises and rage and keep your Python running smoothly.


Update Bugzilla Automatically on Push

June 30, 2014 at 11:15 PM | categories: Mercurial, Mozilla

Do you manually create Bugzilla comments when you push changes to a Firefox source repository? Yeah, I do too.

That's always annoyed me.

It is screaming to be automated.

So I automated it.

You can too. From a Firefox source checkout:

$ ./mach mercurial-setup

That should clone the version-control-tools repository into ~/.mozbuild/version-control-tools.

Then, add the following to your ~/.hgrc file:

[extensions]
bzpost = ~/.mozbuild/version-control-tools/hgext/bzpost

[bugzilla]
username = me@example.com
password = password

Now, when you hg push to a Firefox repository, the commit URLs will get posted to referenced bugs automatically.

Please note that pushing to release trees such as mozilla-central is not yet supported. In due time.

Please let me know if you run into any issues.

Estimated Cost Savings

Assuming the following:

  • It costs Mozilla $200,000 per year per full-time engineer working on Firefox (a general rule of thumb for non-senior positions is that your true employee cost is 2x your base salary).
  • Each full-time engineer works 40 hours per week for 46 weeks out of the year.
  • It takes 15 seconds to manually update Bugzilla for each push.
  • There are 20,000 pushes requiring Bugzilla attention per year.

We arrive at the following:

  • Cost per employee per hour worked: $200,000 / (40 × 46 = 1,840 hours) ≈ $108.70
  • Total person-time to manually update Bugzilla: 20,000 pushes × 15 seconds = 300,000 seconds ≈ 83 hours
  • Total cost to manually update Bugzilla after push: ~83 hours × $108.70/hour ≈ $9,058.

I was intentionally conservative with all the inputs except time worked (I think many of us work more than 40-hour weeks). My estimates also don't take into account the lost productivity associated with getting mentally derailed by interacting with Bugzilla. With this in mind, I could very easily justify a total cost at least 2x-3x higher.

It took me maybe 3 hours to crank this out. I could spend another few weeks on it full time and Mozilla would still save money (assuming 100% adoption).

I encourage people to run their own cost calculations on other tasks that can be automated. Inefficiencies multiplied by millions of dollars (your collective employee cost) result in large piles of money. Not having tools (even simple ones like this) is equivalent to setting loads of cash on fire.

