Python Packaging Do's and Don'ts

July 15, 2014 at 05:20 PM | categories: Python, Mozilla

Are you someone who casually interacts with Python but doesn't know its inner workings? Then this post is for you. Read on to learn why some things are the way they are and how to avoid some common mistakes.

Always use Virtualenvs

It is an easy trap to view virtualenvs as an obstacle, a distraction from accomplishing something. People see me adding virtualenvs to build instructions and they say "I don't use virtualenvs, they aren't necessary, why are you doing that?"

A virtualenv is effectively an overlay on top of your system Python install. Creating a virtualenv can be thought of as copying your system Python environment into a local location. When you modify virtualenvs, you are modifying an isolated container. Modifying virtualenvs has no impact on your system Python.

A goal of a virtualenv is to isolate your system/global Python install from unwanted changes. When you accidentally make a change to a virtualenv, you can just delete the virtualenv and start over from scratch. When you accidentally make a change to your system Python, it can be much, much harder to recover from that.
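To make the "delete it and start over" point concrete, here is a minimal sketch. It assumes Python 3.3 or newer, where the standard library's venv module gives you isolation similar to the virtualenv tool. The helper names create_env and destroy_env are made up for illustration.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create an isolated environment at env_dir and return its path."""
    # with_pip=False keeps this sketch fast and offline; pass True to
    # bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


def destroy_env(env_dir: Path) -> None:
    """Throw the environment away. The system Python is not touched."""
    shutil.rmtree(env_dir)


if __name__ == "__main__":
    scratch = Path(tempfile.mkdtemp())
    env = create_env(scratch / "demo-env")
    print("created:", (env / "pyvenv.cfg").exists())
    destroy_env(env)
    print("still there:", env.exists())
    shutil.rmtree(scratch)
```

Recovering from a broken environment is just a directory removal followed by a fresh create. There is no equivalent one-liner for a damaged system Python.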

Another goal of virtualenvs is to allow different versions of packages to exist. Say you are working on two different projects and each requires a specific version of Django. With virtualenvs, you install one version in one virtualenv and a different version in another virtualenv. Things happily coexist because the virtualenvs are independent. Contrast with trying to manage both versions of Django in your system Python installation. Trust me, it's not fun.

Casual Python users may not encounter scenarios where virtualenvs make their lives better... until they do, at which point they realize their system Python install is beyond saving. People who eat, breathe, and die Python run into these scenarios all the time. We've learned how bad life without virtualenvs can be and so we use them everywhere.

Use of virtualenvs is a best practice. Not using virtualenvs will result in something unexpected happening. It's only a matter of time.

Please use virtualenvs.

Never use sudo

Do you use sudo to install a Python package? You are doing it wrong.

If you need to use sudo to install a Python package, that almost certainly means you are installing a Python package to your system/global Python install. And this means you are modifying your system Python instead of isolating it and keeping it pristine.

Instead of using sudo to install packages, create a virtualenv and install things into the virtualenv. There should never be permissions issues with virtualenvs - the user that creates a virtualenv has full control over it.
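If you want a script to refuse to install anything when it is run the wrong way, a guard along these lines could work. This is a sketch, not an official check: the function names are hypothetical, and the detection relies on sys.base_prefix (set by venv and recent virtualenv releases) and on sys.real_prefix (set by older virtualenv releases).

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment while
    # sys.base_prefix keeps pointing at the underlying Python install.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def running_as_root() -> bool:
    """True when the effective user is root (what sudo gives you)."""
    # os.geteuid only exists on Unix-like systems.
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0


def safe_to_install() -> bool:
    """Follow the rule of thumb above: inside a virtualenv, and no sudo."""
    return in_virtualenv() and not running_as_root()


if __name__ == "__main__":
    print("in virtualenv:", in_virtualenv())
    print("running as root:", running_as_root())
    print("safe to install:", safe_to_install())
```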

Never modify the system Python environment

On some systems, such as OS X with Homebrew, you don't need sudo to install Python packages because the user has write access to the Python directory (/usr/local in Homebrew).

For the reasons given above, don't muck around with the system Python environment. Instead, use a virtualenv.

Beware of the package manager

Your system's package manager (apt, yum, etc.) is likely using root and/or installing Python packages into the system Python.

For the reasons given above, this is bad. Try to use a virtualenv, if possible. Try not to use the system package manager for installing Python packages.

Use pip for installing packages

Python packaging has historically been a mess. There are a handful of tools and APIs for installing Python packages. As a casual Python user, you only need to know of one of them: pip.

If someone says "install a package," you should be thinking "create a virtualenv, activate the virtualenv, pip install <package>." You should never run pip install outside of a virtualenv. (The exception is to install virtualenv and pip itself, which you almost certainly want in your system/global Python.)
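The same workflow can be scripted. The sketch below assumes the virtualenv already exists and has pip in it. The function names are hypothetical. Running the environment's own interpreter with -m pip installs into that environment, which has the same effect as activating it first.

```python
import os
import subprocess
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter that lives inside the environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, package: str) -> list:
    """Build the command that installs package into env_dir only."""
    return [str(env_python(env_dir)), "-m", "pip", "install", package]


def pip_install(env_dir: Path, package: str) -> None:
    """Run the install. Needs network access and pip in the environment."""
    subprocess.check_call(pip_install_command(env_dir, package))


if __name__ == "__main__":
    # "my-env" and "requests" are placeholders for this illustration.
    print(pip_install_command(Path("my-env"), "requests"))
```

Because the command names the environment's interpreter explicitly, there is no way for it to land in the system Python by accident.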

Running pip install will, by default, install packages from PyPI, the Python Package Index. It's Python's official package repository.

There are a lot of old and outdated tutorials online about Python packaging. Beware of bad content. For example, if you see documentation that says "use easy_install," you should be thinking "easy_install is a legacy package installer that has largely been replaced by pip, I should use pip instead." When in doubt, consult the Python Packaging User Guide and do what it recommends.

Don't trust the Python in your package manager

The more Python programming you do, the more you learn to not trust the Python package provided by your system / package manager.

Linux distributions such as Ubuntu that sit on the forward edge of versions are better than others. But I've run into enough problems with the OS- or package-manager-maintained Python (especially on OS X) that I've learned to distrust them.

I use pyenv for installing and managing Python distributions from source. pyenv also installs virtualenv and pip for me, packages that I believe should be in all Python installs by default. As a more experienced Python programmer, I find pyenv just works.

If you are just a beginner with Python, it is probably safe to ignore this section. Just know that as soon as something weird happens, start suspecting your default Python install, especially if you are on OS X. If you suspect trouble, use something like pyenv to enforce a buffer so the system can have its Python and you can have yours.
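When something weird does happen, the first thing worth checking is which Python you are actually running. A small sketch using only the standard library follows. The function name describe_interpreter is made up for this example.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the facts that tell you which Python install is in use."""
    return {
        # Full path of the running interpreter binary.
        "executable": sys.executable,
        "version": platform.python_version(),
        # Where this interpreter looks for its libraries and packages.
        "prefix": sys.prefix,
        # Differs from prefix when running inside a virtualenv.
        "base_prefix": getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key + ":", value)
```

If the executable lives under a system directory such as /usr/bin when you expected it to be in your home directory, you are not running the Python you think you are.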

Recovering from the past

Now that you know the preferred way to interact with Python, you are probably thinking "oh crap, I've been wrong all these years - how do I fix it?"

The goal is to get a Python install somewhere that is as pristine as possible. You have two approaches here: cleaning your existing Python or creating a new Python install.

To clean your existing Python, you'll want to purge it of pretty much all packages not installed by the core Python distribution. The exceptions are virtualenv, pip, and setuptools - you almost certainly want those installed globally. On Homebrew, you can uninstall everything related to Python and blow away your Python directory, typically /usr/local/lib/python*. Then, brew install python. On Linux distros, this is a bit harder, especially since most Linux distros rely on Python for OS features and thus they may have installed extra packages. You could try a similar approach on Linux, but I don't think it's worth it.

Cleaning your system Python and attempting to keep it pure are ongoing tasks that are very difficult to keep up with. All it takes is one dependency to get pulled in that trashes your system Python. Therefore, I shy away from this approach.

Instead, I install and run Python from my user directory. I use pyenv. I've also heard great things about Miniconda. With either solution, you get a Python in your home directory that starts clean and pure. Even better, it is completely independent from your system Python. So if your package manager does something funky, there is a buffer. And, if things go wrong with your userland Python install, you can always nuke it without fear of breaking something in system land. This seems to be the best of both worlds.

Please note that installing packages in the system Python shouldn't be evil. When you create virtualenvs, you can - and should - tell virtualenv to not use the system site-packages (i.e. don't use non-core packages from the system installation). This is the default behavior in virtualenv. It should provide an adequate buffer. But in my experience, things still manage to bleed through. My userland Python install is extra safety. If something goes wrong, I can only blame myself.
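You can check how an existing environment was configured. The sketch below assumes an environment created by the standard library's venv module (or a virtualenv release that writes the same pyvenv.cfg file), which records the setting under the include-system-site-packages key. The function name is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def uses_system_site_packages(env_dir: Path) -> bool:
    """Report whether the environment at env_dir can see system packages."""
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    # Key not present: this sketch assumes the environment is isolated.
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        env_dir = Path(scratch) / "isolated-env"
        # system_site_packages=False is the default; it is spelled out
        # here to make the intent obvious.
        venv.create(env_dir, system_site_packages=False, with_pip=False)
        print("sees system packages:", uses_system_site_packages(env_dir))
```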

Conclusion

Python's long and complicated history of package management makes it very easy for you to shoot yourself in the foot. The long list of outdated tutorials on the Internet makes this a near certainty for casual Python users. Using the guidelines in this post, you can adhere to best practices that will cut down on surprises and rage and keep your Python running smoothly.