A bunch of us were in Toronto last week hacking on MozReview.
One of the cool things we did was deploy a bot for performing Python static analysis. If you submit some .py files to MozReview, the bot should leave a review. If it finds violations (it uses flake8 internally), it will open an issue for each violation. It also leaves a comment that should hopefully give enough detail on how to fix the problem.
While we haven't done much in the way of performance optimizations, the bot typically submits results less than 10 seconds after the review is posted! So, a human should never be reviewing Python that the bot hasn't seen. This means you can stop thinking about style nits and start thinking about what the code does.
This bot should be considered an alpha feature. The code for the bot isn't even checked in yet. We're running the bot against production to get a feel for how it behaves. If things don't go well, we'll turn it off until the problems are fixed.
I'd also like to eventually make it easier to locally run the same static analysis we run in MozReview. Addressing problems locally before pushing is a no-brainer since it avoids needless context switching from other people and is thus better for productivity. This will come in time.
Report issues in #mozreview or in the Developer Services :: MozReview Bugzilla component.
Are you someone who casually interacts with Python but don't know the inner workings of Python? Then this post is for you. Read on to learn why some things are the way they are and how to avoid making some common mistakes.
Always use Virtualenvs
It is an easy trap to view virtualenvs as an obstacle, a distraction towards accomplishing something. People see me adding virtualenvs to build instructions and they say I don't use virtualenvs, they aren't necessary, why are you doing that?
A virtualenv is effectively an overlay on top of your system Python install. Creating a virtualenv can be thought of as copying your system Python environment into a local location. When you modify virtualenvs, you are modifying an isolated container. Modifying virtualenvs has no impact on your system Python.
A goal of a virtualenv is to isolate your system/global Python install from unwanted changes. When you accidentally make a change to a virtualenv, you can just delete the virtualenv and start over from scratch. When you accidentally make a change to your system Python, it can be much, much harder to recover from that.
Another goal of virtualenvs is to allow different versions of packages to exist. Say you are working on two different projects and each requires a specific version of Django. With virtualenvs, you install one version in one virtualenv and a different version in another virtualenv. Things happily coexist because the virtualenvs are independent. Contrast with trying to manage both versions of Django in your system Python installation. Trust me, it's not fun.
Casual Python users may not encounter scenarios where virtualenvs make their lives better... until they do, at which point they realize their system Python install is beyond saving. People who eat, breath, and die Python run into these scenarios all the time. We've learned how bad life without virtualenvs can be and so we use them everywhere.
Use of virtualenvs is a best practice. Not using virtualenvs will result in something unexpected happening. It's only a matter of time.
Please use virtualenvs.
Never use sudo
Do you use sudo to install a Python package? You are doing it wrong.
If you need to use sudo to install a Python package, that almost certainly means you are installing a Python package to your system/global Python install. And this means you are modifying your system Python instead of isolating it and keeping it pristine.
Instead of using sudo to install packages, create a virtualenv and install things into the virtualenv. There should never be permissions issues with virtualenvs - the user that creates a virtualenv has full realm over it.
Never modify the system Python environment
On some systems, such as OS X with Homebrew, you don't need sudo to install Python packages because the user has write access to the Python directory (/usr/local in Homebrew).
For the reasons given above, don't muck around with the system Python environment. Instead, use a virtualenv.
Beware of the package manager
Your system's package manager (apt, yum, etc) is likely using root and/or installing Python packages into the system Python.
For the reasons given above, this is bad. Try to use a virtualenv, if possible. Try to not use the system package manager for installing Python packages.
Use pip for installing packages
Python packaging has historically been a mess. There are a handful of tools and APIs for installing Python packages. As a casual Python user, you only need to know of one of them: pip.
If someone says install a package, you should be thinking create a
virtualenv, activate a virtualenv,
pip install <package>. You
should never run
pip install outside of a virtualenv. (The exception
is to install virtualenv and pip itself, which you almost certainly want
in your system/global Python.)
Running pip install
There are a lot of old and outdated tutorials online about Python packaging. Beware of bad content. For example, if you see documentation that says use easy_install, you should be thinking, easy_install is a legacy package installer that has largely been replaced by pip, I should use pip instead. When in doubt, consult the Python packaging user guide and do what it recommends.
Don't trust the Python in your package manager
The more Python programming you do, the more you learn to not trust the Python package provided by your system / package manager.
Linux distributions such as Ubuntu that sit on the forward edge of versions are better than others. But I've run into enough problems with the OS or package manager maintained Python (especially on OS X), that I've learned to distrust them.
I use pyenv for installing and managing Python distributions from source. pyenv also installs virtualenv and pip for me, packages that I believe should be in all Python installs by default. As a more experienced Python programmer, I find pyenv just works.
If you are just a beginner with Python, it is probably safe to ignore this section. Just know that as soon as something weird happens, start suspecting your default Python install, especially if you are on OS X. If you suspect trouble, use something like pyenv to enforce a buffer so the system can have its Python and you can have yours.
Recovering from the past
Now that you know the preferred way to interact with Python, you are probably thinking oh crap, I've been wrong all these years - how do I fix it?
The goal is to get a Python install somewhere that is as pristine as possible. You have two approaches here: cleaning your existing Python or creating a new Python install.
To clean your existing Python, you'll want to purge it of pretty much all packages not installed by the core Python distribution. The exception is virtualenv, pip, and setuptools - you almost certainly want those installed globally. On Homebrew, you can uninstall everything related to Python and blow away your Python directory, typically /usr/local/lib/python*. Then, brew install python. On Linux distros, this is a bit harder, especially since most Linux distros rely on Python for OS features and thus they may have installed extra packages. You could try a similar approach on Linux, but I don't think it's worth it.
Cleaning your system Python and attempting to keep it pure are ongoing tasks that are very difficult to keep up with. All it takes is one dependency to get pulled in that trashes your system Python. Therefore, I shy away from this approach.
Instead, I install and run Python from my user directory. I use pyenv. I've also heard great things about Miniconda. With either solution, you get a Python in your home directory that starts clean and pure. Even better, it is completely independent from your system Python. So if your package manager does something funky, there is a buffer. And, if things go wrong with your userland Python install, you can always nuke it without fear of breaking something in system land. This seems to be the best of both worlds.
Please note that installing packages in the system Python shouldn't be evil. When you create virtualenvs, you can - and should - tell virtualenv to not use the system site-packages (i.e. don't use non-core packages from the system installation). This is the default behavior in virtualenv. It should provide an adequate buffer. But from my experience, things still manage to bleed through. My userland Python install is extra safety. If something wrong happens, I can only blame myself.
Python's long and complicated history of package management makes it very easy for you to shoot yourself in the foot. The long list of outdated tutorials on The Internet make this a near certainty for casual Python users. Using the guidelines in this post, you can adhere to best practices that will cut down on surprises and rage and keep your Python running smoothly.
Over the winter break, I set out on an ambitious project to create a service to help developers and others manage the flury of patches going into Firefox. While the project is far from complete, I'm ready to unleash the first part of the project upon the world.
Essentially, I built a centralized indexing service for version control repositories with Mozilla's extra metadata thrown in. I tell it what repositories to mirror, and it clones everything, fetches data such as the pushlog and Git SHA-1 mappings, and stores everything in a central database. It then exposes this aggregated data through world-readable web services.
Currently, I have the service indexing the popular project branches for Firefox (central, aurora, beta, release, esr, b2g, inbound, fx-team, try, etc). You can view the full list via the web service. As a bonus, I'm also serving these repositories via hg.gregoryszorc.com. My server appears to be significantly faster than hg.mozilla.org. If you want to use it for your daily needs, go for it. I make no SLA guarantees, however.
I'm also using this service as an opportunity to experiment with alternate forms of Mercurial hosting. I have mirrors of mozilla-central and the try repository with generaldelta and lz4 compression enabled. I may blog about what those are eventually. The teaser is that they can make Mercurial perform a lot faster under some conditions. I'm also using ZFS under the hood to manage repositories. Each repository is a ZFS filesystem. This means I can create repository copies on the server (user repositories anyone?) at a nearly free cost. Contrast this to the traditional method of full clones, which take lots of time, memory, CPU, and storage.
Anyway, some things you can do with the existing web service:
- Obtain metadata about Mercurial changesets. Example.
- Look up metadata about Git commits. Example.
- Obtain a SPORE descriptor describing the web service endpoints. This allows you to auto-generate clients from descriptors. Yay!
Obviously, that's not a lot. But adding new endpoints is relatively straightforward. See the source. It's literally as easy as defining a URL mapping and writing a database query.
The performance is also not the best. I just haven't put in the effort to tune things yet. All of the querying hits the database, not Mercurial. So, making things faster should merely be a matter of database and hosting optimization. Patches welcome!
Some ideas that I haven't had time to implement yet:
- Return changests in a specific repository
- Return recently pushed changesets
- Return pushes for a given user
- Return commits for a given author
- Return commits referencing a given bug
- Obtain TBPL URLs for pushes with changeset
- Integrate bugzilla metadata
Once those are in place, I foresee this service powering a number of dashboards. Patches welcome.
Again, this service is only the tip of what's possible. There's a lot that could be built on this service. I have ideas. Others have ideas.
The project includes a Vagrant file and Puppet manifests for provisioning the server. It's a one-liner to get a development environment up and running. It should be really easy to contribute to this project. Patches welcome.
I see a number of open source projects supporting old versions of Python. Mercurial supports 2.4, for example. I have to ask: why do projects continue to support old Python releases?
- Python 2.4 was last released on December 19, 2008 and there will be no more releases of Python 2.4.
- Python 2.5 was last released on May 26, 2011 and there will be no more releases of Python 2.5.
- Python 2.6 was last released on October 29, 2013 and there will be no more releases of Python 2.6.
- Everything before Python 2.7 is end-of-lifed
- Python 2.7 continues to see periodic releases, but mostly for bug fixes.
- Practically all of the work on CPython is happening in the 3.3 and 3.4 branches. Other implementations continue to support 2.7.
- Python 2.7 has been available since July 2010
- Python 2.7 provides some very compelling language features over earlier releases that developers want to use
- It's much easier to write dual compatible 2/3 Python when 2.7 is the only 2.x release considered.
- Python 2.7 can be installed in userland relatively easily (see projects like pyenv).
Given these facts, I'm not sure why projects insist on supporting old and end-of-lifed Python releases.
I think maintainers of Python projects should seriously consider dropping support for Python 2.6 and below. Are there really that many people on systems that don't have Python 2.7 easily available? Why are we Python developers inflicting so much pain on ourselves to support antiquated Python releases?
As a data point, I successfully transitioned Firefox's build system from requiring Python 2.5+ to 2.7.3+ and it was relatively pain free. Sure, a few people complained. But as far as I know, not very many new developers are coming in and complaining about the requirement. If we can do it with a few thousand developers, I'm guessing your project can as well.
Update 2014-01-09 16:05:00 PST: This post is being discussed on Slashdot. A lot of the comments talk about Python 3. Python 3 is its own set of considerations. The intended focus of this post is strictly about dropping support for Python 2.6 and below. Python 3 is related in that porting Python 2.x to Python 3 is much easier the higher the Python 2.x version. This especially holds true when you want to write Python that works simultaneously in both 2.x and 3.x.
I have a number of Python projects and tools that interact with various Mozilla services. I had authored clients for all these services as standalone Python modules so they could be reused across projects.
$ pip install mozautomation
Currently included in the Python package are:
- A client for treestatus.mozilla.org
- Module for extracting cookies from Firefox profiles (useful for programmatically getting Bugzilla auth credentials).
- A client for reading and interpretting the JSON dumps of automation jobs
- An interface to a SQLite database to manage associations between Mercurial changesets, bugs, and pushes.
- Rudimentary parsing of commit messages to extract bugs and reviewers.
- A client to obtain information about Firefox releases via the releases API
- A module defining common Firefox source repositories, aliases, logical groups (e.g. twigs and integration trees), and APIs for fetching pushlog data.
- A client for the self serve API
Documentation and testing is currently sparse. Things aren't up to my regular high quality standard. But something is better than nothing.
If you are interested in contributing, drop me a line or send pull requests my way!
Next Page »