PyOxidizer 0.7

April 09, 2020 at 09:00 PM | categories: Python, PyOxidizer

I am very pleased to announce the 0.7 release of PyOxidizer, a modern Python application packaging tool.

There are a host of notable new features in this release. You can read all about them in the project history.

I want to use this blog post to call out the more meaningful ones.

I started PyOxidizer as a science experiment of sorts: I set out to prove the hypothesis that it was possible to produce high performance single file executables embedding Python and all of its resources (Python modules, non-module resource files, compiled extensions, etc). PyOxidizer has achieved this on Windows, Linux, and macOS since its very earliest releases. Hypothesis confirmed!

In order to actually achieve single file executables, you have to fundamentally change aspects of Python's behavior. Some of these changes invalidate deeply rooted assumptions about how Python works, such as the existence of __file__ in modules. As you can imagine, these broken assumptions translated to numerous compatibility issues and PyOxidizer didn't work with many popular Python packages.
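To make the __file__ assumption concrete, here is a minimal sketch. It is an illustration only: `package_dir`, `safe_package_dir`, and the `in_memory_demo` module are hypothetical names, and a bare `types.ModuleType` object merely stands in for a module imported from memory.

```python
import json
import os
import types


def package_dir(module):
    """Fragile: assumes the module was imported from a file on disk."""
    return os.path.dirname(module.__file__)


def safe_package_dir(module):
    """Tolerates modules that have no __file__ attribute."""
    path = getattr(module, "__file__", None)
    return os.path.dirname(path) if path else None


# A module object with no backing file (hypothetical stand-in for a
# module that was loaded from memory rather than from the filesystem).
in_memory = types.ModuleType("in_memory_demo")

try:
    package_dir(in_memory)
    fragile_failed = False
except AttributeError:
    # This is the kind of failure packages hit when __file__ is absent.
    fragile_failed = True

print(fragile_failed, safe_package_dir(in_memory), safe_package_dir(json))
```

Code that locates data files through `__file__` breaks in the first case. Loader-aware APIs such as `importlib.resources` are the usual way to avoid depending on it.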

With the science experiment phase of PyOxidizer out of the way, I have been making a concerted effort to broaden the user base of PyOxidizer. While single file executables can be an amazing property, they aren't critical for many use cases and the issues they were causing were preventing people from exploring PyOxidizer.

This brings us to what I think are the major new features in PyOxidizer 0.7.

Better Support for Loading Extension Modules

Earlier versions of PyOxidizer insisted that you compile Python (C) extension modules from source and statically link them into a produced binary. This requirement prevented the use of pre-built extension modules (commonly found in Python binary wheels available on PyPI) with PyOxidizer, forcing people to compile them locally. While this just worked for many extension modules, it frequently failed on complex ones and on Windows.

PyOxidizer now supports loading compiled extension modules from standalone files (typically .so or .pyd files, which are actually shared libraries). There are still some sharp edges and known deficiencies. But in many cases, if you tell PyOxidizer to run pip install and package the result, pre-built wheels can be installed and PyOxidizer will pick up the standalone files.

On Windows, PyOxidizer even supports embedding the shared library data into the produced .exe and loading the .pyd/DLL directly from memory.

Loading Resources from the Filesystem

Binaries built with PyOxidizer contain a blob holding an index of available Python resources along with their data.

Earlier versions of PyOxidizer only allowed you to define resources as in-memory. If the resource was defined in this blob, it was imported from memory. Otherwise it wasn't known to PyOxidizer. You could still install files next to the produced binary and tell PyOxidizer to enable Python's default filesystem-based importer. But PyOxidizer didn't explicitly know about these files on the filesystem.

In PyOxidizer 0.7, the blob index of Python resources is able to express different locations for each resource. Currently, a resource can have its data made available in-memory or filesystem-relative. in-memory works as before: the raw data is embedded in the binary and loaded from memory (using 0-copy). filesystem-relative encodes a filesystem path to the resource. During packaging, PyOxidizer will place the resource next to the executable (using a typical Python file layout scheme) and store the relative path to that resource in the resources index.
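The two locations can be pictured with a small data model. This is a sketch under stated assumptions, not PyOxidizer's actual index format: the `Resource` class, its field names, and the `load` helper are all hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Resource:
    """Hypothetical index entry; not PyOxidizer's actual format."""
    name: str
    in_memory: Optional[bytes] = None      # data embedded in the binary
    relative_path: Optional[str] = None    # path relative to the executable


def load(resource: Resource, exe_dir: Path) -> bytes:
    """Return the data from whichever location the index recorded."""
    if resource.in_memory is not None:
        return resource.in_memory
    if resource.relative_path is not None:
        return (exe_dir / resource.relative_path).read_bytes()
    raise LookupError(f"no data location recorded for {resource.name}")


# A temporary directory plays the role of the directory holding the executable.
with tempfile.TemporaryDirectory() as tmp:
    exe_dir = Path(tmp)
    (exe_dir / "lib").mkdir()
    (exe_dir / "lib" / "b.py").write_bytes(b"Y = 2\n")
    index = {
        "a": Resource("a", in_memory=b"X = 1\n"),
        "b": Resource("b", relative_path="lib/b.py"),
    }
    loaded = {name: load(entry, exe_dir) for name, entry in index.items()}

print(loaded)
```

The point of the sketch is that the caller asks for a resource by name and the index entry alone decides where the bytes come from.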

The filesystem-relative resource indexing feature has a few implications for PyOxidizer.

First, it is more standard. When PyOxidizer loads a Python module from the filesystem, it sets __file__, __path__, etc and the module semantics should behave as if the file were imported by Python's standard importer. This means that if a package is having issues with in-memory importing, you can simply fall back to filesystem-relative to get standard Python behavior and everything should just work.

Second, PyOxidizer's filesystem resource loading is faster than Python's! When Python's standard importer goes to import a module, it needs to stat() various paths to first locate the file. It then performs some sanity checking and other minor actions before actually importing the module. All of this has overhead. Since the goal of PyOxidizer is to produce standalone applications and applications should be immutable, PyOxidizer can avoid most of this overhead. PyOxidizer simply tries to open() and read() the relative path baked into the resource index at build time. If that works, the resource is loaded. Else there is a failure. The code path in PyOxidizer to locate a Python resource is effectively a lookup in a Rust HashMap<&str, T>.
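The lookup-instead-of-search idea can be sketched in pure Python with a meta path finder backed by a dict. This is an analogy only: PyOxidizer's importer is written in Rust, and `IndexedFinder`, the index contents, and the `hello_indexed` module are hypothetical. The sketch covers only how a module is located; the standard source loader it hands off to still does its usual work.

```python
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class IndexedFinder(importlib.abc.MetaPathFinder):
    """Resolves modules from a prebuilt name -> relative path index."""

    def __init__(self, base_dir, index):
        self.base_dir = Path(base_dir)
        self.index = index  # a dict stands in for the Rust HashMap

    def find_spec(self, fullname, path=None, target=None):
        relative = self.index.get(fullname)
        if relative is None:
            return None  # unknown name: let the other finders try
        # No directory probing: the path was fixed when the index was built.
        return importlib.util.spec_from_file_location(
            fullname, str(self.base_dir / relative)
        )


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "hello_indexed.py").write_text("GREETING = 'hi'\n")
    finder = IndexedFinder(tmp, {"hello_indexed": "hello_indexed.py"})
    sys.meta_path.insert(0, finder)
    try:
        module = importlib.import_module("hello_indexed")
        greeting = module.GREETING
        module_file = module.__file__  # set, as with a normal file import
    finally:
        sys.meta_path.remove(finder)
        sys.modules.pop("hello_indexed", None)

print(greeting, module_file)
```

Because the module is loaded from a real file, `__file__` is populated, which matches the compatibility argument made for filesystem-relative resources.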

I thought it would be interesting to isolate the performance benefits of this new feature. I ran Mercurial's test harness with different variants of hg on Linux on my Ryzen 3950X.

  • traditional - A hg script with a #!/path/to/python3.7 shebang.
  • oxidized - A hg executable built with PyOxidizer, without PyOxidizer's custom module importer.
  • filesystem - A hg executable built with PyOxidizer using the new filesystem-relative resource index.
  • in-memory - A hg executable built with PyOxidizer with all resources loaded from memory (how PyOxidizer has traditionally worked).

The results are quite clear:

Variant       CPU Time (s)   Delta (s)   % Orig
traditional   11,287         0           100
oxidized      10,735         -552        95.1
filesystem    10,186         -1,101      90.2
in-memory     9,883          -1,404      87.6

We see a nice win just from using a native executable built with PyOxidizer (traditional to oxidized).

Then from oxidized to filesystem we see another jump of ~5%. This difference is attributed to using PyOxidizer's Rust-powered importer with an index of resources available on the filesystem. In other words, all that work that Python's standard importer is doing to discover files and then operate on them is non-trivial!

Finally, the smaller jump from filesystem to in-memory isolates the benefits of importing resource data from memory instead of involving filesystem I/O. (Filesystems are generally slow.) While I haven't measured explicitly, I hypothesize that macOS and Windows will see a bigger jump between these two variants, as the filesystem performance on these platforms generally isn't as good as it is on Linux.
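For readers who want to check the arithmetic, the deltas and percentages follow directly from the CPU times reported above. The snippet below only re-derives them from those four numbers; the variable names are illustrative.

```python
# CPU times in seconds, as reported in the results above.
cpu_seconds = {
    "traditional": 11_287,
    "oxidized": 10_735,
    "filesystem": 10_186,
    "in-memory": 9_883,
}

baseline = cpu_seconds["traditional"]

# Delta and percentage of the original, both relative to the traditional variant.
deltas = {name: secs - baseline for name, secs in cpu_seconds.items()}
percent_of_original = {
    name: round(100 * secs / baseline, 1) for name, secs in cpu_seconds.items()
}

# Step between adjacent variants, as a share of the baseline.
oxidized_to_filesystem = round(
    100 * (cpu_seconds["oxidized"] - cpu_seconds["filesystem"]) / baseline, 1
)
filesystem_to_in_memory = round(
    100 * (cpu_seconds["filesystem"] - cpu_seconds["in-memory"]) / baseline, 1
)

for name in cpu_seconds:
    print(name, cpu_seconds[name], deltas[name], percent_of_original[name])
print(oxidized_to_filesystem, filesystem_to_in_memory)
```

The step from oxidized to filesystem works out to roughly 4.9% of the baseline, and the step from filesystem to in-memory to roughly 2.7%.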

PyOxidizer's Future

With PyOxidizer now supporting a couple of much-needed features that open it up to a broader set of users, I'm hoping that future releases continue to broaden its utility.

The over-arching goal of PyOxidizer is to solve large aspects of the Python application packaging and distribution problem. So far a lot of focus has been spent on the former. PyOxidizer in its current form can materialize files on the filesystem that you can copy or package up manually and distribute. But I want these processes to be part of PyOxidizer: I want it to be possible for PyOxidizer to emit a Windows MSI installer, a macOS dmg, a Debian package, etc for a Python application.

In order to support the aforementioned marquee features of this PyOxidizer release, I had to pay down a lot of technical debt in the code base left over from the science experiment phase of PyOxidizer's inception.

In the short term, I plan to continue shoring up the code base and rounding out support for features requested in the issue tracker on GitHub. The next release of PyOxidizer will also likely require Python 3.8, as this will improve run-time control over the embedded Python interpreter and enable PyOxidizer to better support package metadata (importlib.metadata), enabling support for features like entry points.

I've also been thinking about extracting PyOxidizer's custom module importer to be usable as a standalone Python extension module. I think there's some value in publishing a pyoxidizer_importer package on PyPI that you can easily add to your installed packages to speed up Python's standard filesystem importer by a few percent. If nothing else, this may drum up interest in the larger Python community for standardizing a format for serializing Python resources in a single file. Perhaps we can get other Python packaging tools producing the same packed resources data blob that PyOxidizer uses so we can all standardize on a more efficient mechanism for loading Python modules. Time will tell.

Enjoy the new release. File issues at https://github.com/indygreg/PyOxidizer as you encounter them.