Announcing the PyOxy Python Runner
May 10, 2022 at 08:00 AM | categories: Python, PyOxidizer
I'm pleased to announce the initial release of PyOxy. Binaries are available on GitHub.
(Yes, I used my pure Rust Apple code signing implementation to remotely sign the macOS binaries from GitHub Actions using a YubiKey plugged into my Windows desktop: that experience still feels magical to me.)
PyOxy is all of the following:
- An executable program used for running Python interpreters.
- A single file and highly portable (C)Python distribution.
- An alternative python driver providing more control over the interpreter than what python itself provides.
- A way to make some of PyOxidizer's technology more broadly available without using PyOxidizer.
Read the following sections for more details.
pyoxy Acts Like python
The pyoxy executable has a run-python sub-command that will essentially do what python would do:
$ pyoxy run-python
Python 3.9.12 (main, May 3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
A Python REPL. That's familiar!
You can even pass python arguments to it:
$ pyoxy run-python -- -c 'print("hello, world")'
hello, world
When a pyoxy executable is renamed to any filename beginning with python, it implicitly behaves like pyoxy run-python --.
$ mv pyoxy python3.9
$ ls -al python3.9
-rwxrwxr-x 1 gps gps 120868856 May 10 2022 python3.9
$ ./python3.9
Python 3.9.12 (main, May 3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
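The renaming behavior can be modeled in a few lines. This is only a sketch of the rule as described above, not PyOxy's actual (Rust) implementation; the function name effective_args is made up for illustration.

```python
import os


def effective_args(argv):
    """Model the dispatch rule described above (hypothetical helper).

    If the executable's file name begins with "python", act as though
    "run-python --" had been inserted right after the program name.
    """
    exe_name = os.path.basename(argv[0])
    if exe_name.startswith("python"):
        return [argv[0], "run-python", "--", *argv[1:]]
    return list(argv)


# A renamed binary forwards everything to the Python interpreter.
print(effective_args(["./python3.9", "-c", "print(1)"]))
# An un-renamed binary keeps its normal sub-command handling.
print(effective_args(["./pyoxy", "run-yaml", "hello_world.yaml"]))
```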
Single File Python Distributions
The official pyoxy executables are built with PyOxidizer and leverage the Python distributions provided by my python-build-standalone project. On Linux and macOS, a fully featured Python interpreter and its library dependencies are statically linked into pyoxy. The pyoxy executable also embeds a copy of the Python standard library and imports it from memory using the oxidized_importer Python extension module.
What this all means is that the official pyoxy executables can function as single file CPython distributions! Just download a pyoxy executable, rename it to python, python3, python3.9, etc. and it should behave just like a normal python would!
Your Python installation has never been so simple. And fast: pyoxy should be a few milliseconds faster to initialize a Python interpreter, mostly because oxidized_importer avoids the filesystem overhead of looking for and loading .py[c] files.
Low-Level Control Over the Python Interpreter with YAML
The pyoxy run-yaml command takes the path to a YAML file defining the embedded Python interpreter configuration and then launches that Python interpreter in-process:
$ cat > hello_world.yaml <<EOF
---
allocator_debug: true
interpreter_config:
  run_command: 'print("hello, world")'
...
EOF
$ pyoxy run-yaml hello_world.yaml
hello, world
Under the hood, PyOxy uses the pyembed Rust crate to manage embedded Python interpreters. The YAML document that PyOxy uses is simply deserialized into a pyembed::OxidizedPythonInterpreterConfig Rust struct, which pyembed uses to spawn a Python interpreter. This Rust struct offers near complete control over how the embedded Python interpreter behaves: it even allows you to tweak settings that are impossible to change from environment variables or python command arguments! (Beware: this power means you can easily cause the interpreter to crash if you feed it a bad configuration!)
YAML Based Python Applications
pyoxy run-yaml ignores all file content before the YAML --- start document delimiter. This means that on UNIX-like platforms you can create executable YAML files defining your Python application. For example:
$ mkdir -p myapp
$ cat > myapp/__main__.py << EOF
print("hello from myapp")
EOF
$ cat > say_hello <<"EOF"
#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
  module_search_paths: ["$ORIGIN"]
...
EOF
$ chmod +x say_hello
$ ./say_hello
hello from myapp
This means that to distribute a Python application, you can drop a copy of pyoxy in a directory, then define an executable YAML file masquerading as a shell script, and you can run Python code with as few as two files!
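To illustrate why the shell preamble is harmless, here is a rough sketch of discarding everything before the YAML start document delimiter. It assumes the delimiter is recognized as a line consisting solely of ---; how PyOxy actually locates it is not specified here, and yaml_document_text is a made-up name.

```python
def yaml_document_text(file_text):
    """Return the text starting at the first line that is exactly '---'.

    Anything before that line (such as a shell preamble) is dropped.
    Raises ValueError if no start document delimiter is present.
    """
    lines = file_text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.rstrip("\r\n") == "---":
            return "".join(lines[index:])
    raise ValueError("no YAML '---' start document delimiter found")


script = '''#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
...
'''

# Only the YAML document remains; the two shell lines are gone.
print(yaml_document_text(script))
```

The shell, for its part, never reaches the YAML because exec replaces the shell process on the second line.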
The Future of PyOxy
PyOxy is very young. I hacked it together on a weekend in September 2021. I wanted to shore up some functionality before releasing it then. But I got perpetually sidetracked and never did the work. I figured it would be better to make a smaller splash with a lesser-baked product now than wait even longer. Anyway...
As part of building PyOxidizer I've built some peripheral technology:
- Standalone and highly distributable Python builds via the python-build-standalone project.
- The pyembed Rust crate for managing an embedded Python interpreter.
- The oxidized_importer Python package/extension for importing modules from memory, among other things.
- The Python packed resources data format for representing a collection of Python modules and resource files for efficient loading (by oxidized_importer).
I conceived PyOxy as a vehicle to enable people to leverage PyOxidizer's technology without imposing PyOxidizer onto them. I feel that PyOxidizer's broader technology is generally useful and too valuable to be gated behind using PyOxidizer.
PyOxy is only officially released for Linux and macOS for the moment. It definitely builds on Windows. However, I want to improve the single file executable experience before officially releasing PyOxy on Windows. This requires an extensive overhaul to oxidized_importer and the way it serializes Python resources to be loaded from memory.
I'd like to add a sub-command to produce a Python packed resources payload. With this, you could bundle/distribute a Python application as pyoxy plus a file containing your application's packed resources alongside YAML configuring the Python interpreter. Think of this as a more modern and faster version of the venerable zipapp approach. This would enable PyOxy to satisfy packaging scenarios provided by tools like Shiv, PEX, and XAR. However, unlike Shiv and PEX, pyoxy also provides an embedded Python interpreter, so applications are much more portable since there isn't reliance on the host machine having a Python interpreter installed.
I'm really keen to see how others want to use pyoxy.
The YAML based control over the Python interpreter could be super useful for testing, benchmarking, and general Python interpreter configuration experimentation. It essentially opens the door to things previously only possible if you wrote code interfacing with Python's C APIs.
I can also envision tools that hide the existence of Python wanting to leverage the single file Python distribution property of pyoxy. For example, tools like Ansible could copy pyoxy to a remote machine to provide a well-defined Python execution environment without having to rely on what packages are installed. Or pyoxy could be copied into a container or other sandboxed/minimal environment to provide a Python interpreter.
And that's PyOxy. I hope you find it useful. Please file any bug reports or feature requests in PyOxidizer's issue tracker.
Announcing the 0.9 Release of PyOxidizer
October 18, 2020 at 10:00 PM | categories: Python, PyOxidizer
I have decided to make up for the 6 month lull between PyOxidizer's 0.7 and 0.8 releases by releasing PyOxidizer 0.9 just 1 week after 0.8!
The full 0.9 changelog is found in the docs. First time user? See the Getting Started documentation.
While the 0.9 release is far smaller in terms of features compared to 0.8, it is an important release because of progress closing compatibility gaps.
Build a python Executable
PyOxidizer 0.8 quietly shipped the ability to build executables that behave like python executables via enhancements to the configurability of embedded Python interpreters.
PyOxidizer 0.9 made some minor changes to make this scenario work better and there is even official documentation on how to achieve this. So now you can emit a python executable next to your application's executable. Or you could use PyOxidizer to build a highly portable, self-contained python executable and ship your Python scripts next to it, using PyOxidizer's python in your #!.
Support Packaging Files as Files for Maximum Compatibility
There is a long-tail of Python packages that don't just work with PyOxidizer. A subset of these packages don't work because of bugs with how PyOxidizer attempts to classify files as specific types of Python resources.
The way that normal Python works is you materialize a bunch of files on the filesystem and at run-time the filesystem-based importer stat()s a bunch of paths until it finds a candidate file satisfying the import request. This works of course. But it is inefficient. Since PyOxidizer has awareness of every resource being packaged at build time, it attempts to index all known resources and serialize them to an efficient data structure so finding and loading a resource can be extremely quick (effectively just a hashmap lookup in Rust code to resolve the memory address of data).
PyOxidizer's approach does work in the majority of cases. But there are edge cases. For example, NumPy's binary wheels have installed file paths like numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so. The numpy.libs directory is not a valid Python package directory since its name contains a "." and since it doesn't have an __init__.py[c] file. This is a case where PyOxidizer's code for turning files into resources is currently confused.
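The two properties that make numpy.libs an odd directory can be checked mechanically. The following is a deliberately simplified heuristic for illustration, not PyOxidizer's classification code; the function name and the numpy/core file names in the sample set are assumptions.

```python
from pathlib import PurePosixPath


def looks_like_package_path(relative_path, known_files):
    """Heuristic: does every parent directory look like an importable package?

    A directory qualifies when its name is a valid Python identifier and
    `known_files` contains an __init__.py or __init__.pyc inside it.
    """
    parts = PurePosixPath(relative_path).parts[:-1]  # parent directories only
    for depth, name in enumerate(parts, start=1):
        if not name.isidentifier():
            return False  # e.g. "numpy.libs" contains a "."
        directory = "/".join(parts[:depth])
        has_init = any(
            f"{directory}/__init__{suffix}" in known_files
            for suffix in (".py", ".pyc")
        )
        if not has_init:
            return False
    return True


# Illustrative file listing; only the numpy.libs path comes from the text above.
files = {
    "numpy/__init__.py",
    "numpy/core/__init__.py",
    "numpy/core/_multiarray_umath.so",
    "numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so",
}

for path in sorted(files):
    print(path, looks_like_package_path(path, files))
```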
It is tempting to argue that file layouts like NumPy's are wrong. But there doesn't seem to be any formal specification preventing the use of such layouts. The arbiter of truth here is what Python packaging tools accept and the current code for installing wheels gladly accepts file layouts like these. So I've accepted that PyOxidizer is just going to have to support edge cases like this. (I've captured more details about this particular issue in the docs).
Anyway, PyOxidizer 0.9 ships a new, simpler mode for handling files: files mode. In files mode, PyOxidizer disables its code for classifying files as typed Python resources (like module sources and extension modules) and instead treats a file as... a file.
When in files mode, actions that invoke Python packaging tools return files objects instead of classified resources. If you then add these files for packaging, those files are materialized on the filesystem next to your built executable. You can then use Python's standard filesystem importer to load these files at run-time.
This allows you to use PyOxidizer with packages like NumPy that were previously incompatible due to bugs with file/resource classification. In fact, getting NumPy working with PyOxidizer is now in the official documentation!
Files mode is still in its infancy. There exists code for embedding files data in the produced executable. I plan to eventually teach PyOxidizer's run-time code to extract these embedded files to a temporary directory, SquashFS FUSE filesystem, etc. This is the approach that other Python packaging tools like PyInstaller and XAR use. While it is less efficient, this approach is highly compatible with Python code in the wild since you sidestep issues with __file__ and other assumptions about installed file layouts. So it makes sense for PyOxidizer to provide support for this so you can still achieve the friendliness of a self-contained executable without worrying about compatibility. Look for improvements to files mode in future releases.
And to help debug issues with PyOxidizer's file handling and resource classification, the new pyoxidizer find-resources command can be used to invoke PyOxidizer's code for scanning and classifying files. Hopefully this makes it easier to diagnose bugs in this critical component of PyOxidizer!
Some Important Bug Fixes
PyOxidizer 0.8 shipped with some pretty annoying bugs and behavior quirks.
The ability to set custom sys.path values via Starlark was broken. How I managed to ship that, I'm not sure. But it is fixed in 0.9.
Another bug I can't believe I shipped was the PythonExecutable.read_virtualenv() Starlark method being broken due to a typo. You can read from virtualenvs again in PyOxidizer 0.9.
Another important improvement is in the default Python interpreter configuration. We now automatically initialize Python's locales configuration by default. Without this, the encoding of filesystem paths and sys.argv may not have been correct. If someone passed a non-ASCII argument, the Python str value was likely mangled. PyOxidizer built binaries should behave reasonably by default now. The issue is a good read if the subtle behaviors of how encodings work in Python and on different operating systems are interesting to you.
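As an illustration of the kind of mangling involved, the snippet below decodes the same argument bytes two ways. It is a sketch of one plausible failure mode (UTF-8 bytes interpreted under an ASCII locale with Python's surrogateescape error handler), not a reproduction of the exact bug.

```python
# Bytes as the operating system might hand them to the process for "café".
raw_argument = "caf\u00e9".encode("utf-8")

# Decoding with the encoding the user intended recovers the text.
as_utf8 = raw_argument.decode("utf-8")

# Decoding as ASCII with surrogateescape keeps the undecodable bytes as
# lone surrogate code points, so the str no longer matches what was typed.
as_ascii = raw_argument.decode("ascii", errors="surrogateescape")

print(repr(as_utf8), len(as_utf8))
print(repr(as_ascii), len(as_ascii))

# The original bytes are still recoverable, but naive string comparisons fail.
print(as_ascii.encode("ascii", errors="surrogateescape") == raw_argument)
print(as_ascii == as_utf8)
```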
Better Binary Portability Documentation
The documentation on binary portability has been overhauled. Hopefully it is much more clear about the capabilities of PyOxidizer to produce a binary that just works on other machines.
I eventually want to get PyOxidizer to a point where users don't have to think about binary portability. But until PyOxidizer starts generating installers and providing the ability to run builds in deterministic and reproducible environments, it is sadly a problem that is being externalized to end users.
In Conclusion
PyOxidizer 0.9 is a small release representing just 1 week of work. But it contains some notable features that I wanted to get out the door.
As always, please report any issues or feedback in the GitHub issue tracker or the users mailing list.
Announcing the 0.8 Release of PyOxidizer
October 12, 2020 at 12:45 AM | categories: Python, PyOxidizer
I am very excited to announce the 0.8 release of PyOxidizer, a modern Python application packaging tool. You can find the full changelog in the docs. First time user? See the Getting Started documentation.
Foremost, I apologize that this release took so long to publish (0.7 was released on 2020-04-09). I fervently believe that frequent releases are a healthy software development practice. And 6 months between PyOxidizer releases was way too long. Part of the delay was due to world events (it has proven difficult to focus on... anything given a global pandemic, social unrest, and wildfires further undermining any semblance of lifestyle normalcy in California). Another contributing factor was that I was waiting on a few 3rd party Rust crates to have new versions published to crates.io (you can't release a crate to crates.io unless all your dependencies are also published there).
Release delay and general life hardships aside, the 0.8 release is here and it is full of notable improvements!
Python 3.8 and 3.9 Support
PyOxidizer 0.8 now targets Python 3.8 by default and support for Python 3.9 is available by tweaking configuration files. Previously, we only supported Python 3.7 and this release drops support for Python 3.7. I feel a bit bad for dropping compatibility. But Python 3.8 introduced a new C API for initializing Python interpreters (thank you Victor Stinner!) and this makes PyOxidizer's run-time code for interfacing with Python interpreters vastly simpler. I decided that given the beta nature of PyOxidizer, it wasn't worth maintaining complexity to continue to support Python 3.7. I'm optimistic that I'll be able to support Python 3.8 as a baseline for a while.
Better Default Packaging Settings
PyOxidizer started as a science experiment of sorts to see if I could achieve the elusive goal of producing a single file executable providing a Python application. I was successful in proving this hypothesis. But the cost to achieving this outcome was rather high in terms of end-user experience: in order to produce single file executables, you had to break a lot of assumptions about how Python typically works and this in turn broke a lot of Python code and packages in the wild.
In other words, PyOxidizer's opinionated defaults of producing a single file executable were externalizing hardship on end-users and preventing them from using PyOxidizer.
PyOxidizer 0.8 contains a handful of changes to defaults that should hopefully lessen the friction.
On Windows, the default Python distribution now has a more traditional build configuration (using .pyd extension modules and a pythonXY.dll file). This means that PyOxidizer can consume pre-built extension modules without having to recompile them from source. If you publish a Windows binary wheel on PyPI, in many cases it will just work with PyOxidizer 0.8! (There are some notable exceptions to this, such as numpy, which is doing wonky things with the location of shared libraries in wheels - but I aim to fix this soon.)
Also on Windows, we no longer attempt to embed Python extension modules (.pyd files) and their shared library dependencies in the produced binary and load them from memory by default. This is because PyOxidizer's from-memory library loader didn't work in all cases. For example, some OpenSSL functionality used by the _ssl module in the standard library didn't work, preventing Python from establishing TLS connections. The old mode enabling you to produce a single file executable on Windows is still available. But you have to opt in to it (at the likely cost of more packaging and compatibility pain).
Starlark Configuration Overhaul
PyOxidizer 0.8 contains a ton of changes to its Starlark configuration files. There are so many changes that you may find it easier to port to PyOxidizer 0.8 by creating a new configuration file rather than attempting to port an existing one.
I apologize for this churn and recognize it will be disruptive. However, this churn needed to happen for various reasons.
Much of the old Starlark configuration semantics was rooted in the days when configuration files were static TOML files. Now that configuration files provide the power of a (Python-inspired) programming language, we are free to expose much more flexibility. But that flexibility requires refactoring things so the experience feels more native.
Many changes to Starlark were rooted in necessity. For example, the methods for invoking setup.py or pip install used to live on a Python distribution type and have been moved to a type representing executables. This is because the binary we are targeting influences how packaging actions behave. For example, if the binary only supports loading resources from memory (as opposed to standalone files), we need to know that when invoking the packaging tool so we can produce files (notably Python extension modules) compatible with the destination.
A major change to Starlark in 0.8 is around resource location handling. Before, you could define a static string denoting the resources policy for where things should be placed. And there were 10+ methods for adding different resource types (source, bytecode, extensions, package data) to different load locations (memory, filesystem). This mechanism is vastly simplified and more powerful in PyOxidizer 0.8!
In PyOxidizer 0.8, there is a single add_python_resource() method for adding a resource to a binary and the Starlark objects you add can denote where they should be added by defining attributes on those objects.
Furthermore, you can define a Starlark function that is called when resource objects are created to apply custom packaging rules using custom Starlark code defined in your PyOxidizer config file. So rather than having everyone try to abide by a few pre-canned policies for packaging resources, you can define a proper function in your config file that can be as complex as you want/need it to be! I feel this is vastly simpler and more powerful than implementing a custom DSL in static configuration files (like TOML, JSON, YAML, etc).
While the ability to implement your own arbitrarily complex packaging policies is useful, there is a new PythonPackagingPolicy Starlark type with enough flexibility to suit most needs.
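To make the callback idea concrete, here is a plain Python model of it. Everything in this sketch (the Resource class, its attribute names, and the collect helper) is hypothetical and is not PyOxidizer's Starlark API; it only shows the shape of "a function is called per resource and mutates attributes that decide where the resource goes".

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a resource object; not a PyOxidizer type."""
    name: str
    kind: str                    # e.g. "source" or "extension"
    location: str = "in-memory"  # where the resource should be loaded from
    include: bool = True         # whether to package it at all


def packaging_rule(resource):
    """Custom rule: extensions live on the filesystem, test packages are dropped."""
    if resource.kind == "extension":
        resource.location = "filesystem"
    if "tests" in resource.name.split("."):
        resource.include = False


def collect(resources, callback):
    """Apply the callback to every resource, then keep the included ones."""
    for resource in resources:
        callback(resource)
    return [resource for resource in resources if resource.include]


selected = collect(
    [
        Resource("myapp", "source"),
        Resource("myapp.tests.test_cli", "source"),
        Resource("myapp._speedups", "extension"),
    ],
    packaging_rule,
)

for resource in selected:
    print(resource.name, resource.location)
```

Because the rule is ordinary code, it can branch on anything the resource object exposes, which is the flexibility a fixed set of named policies cannot offer.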
Shipping oxidized_importer
During the development of PyOxidizer 0.8, I broke out the custom Rust-based Python meta-path importer used by PyOxidizer's run-time code into a standalone Python package. This sub-project is called oxidized_importer and I previously blogged about it.
PyOxidizer 0.8 ships oxidized_importer and all of its useful APIs available to Python. Read more in the official docs.
The new Python APIs should make debugging issues with PyOxidizer-packaged applications vastly simpler: I found them invaluable when tracking down user-reported bugs!
Tons of New Tests and Refactored Code
PyOxidizer was my first non-toy Rust project. And the quality of the Rust code I produced in early versions of PyOxidizer clearly showed it. And when I was in the rapid-prototyping phase of PyOxidizer, I eschewed writing tests in favor of short-term progress.
PyOxidizer 0.8 pays down a ton of technical debt in the code base. Lots of Rust code has been refactored and is using somewhat reasonable practices. I'm not yet a Rust guru. But I'm at the point where I cringe when I look at some of the early code I wrote, which is a good sign. I do have to say that Rust has been a dream to work with during this transition. Despite being a low-level language, my early misuse of Rust did not result in crashes like you would see in languages like C/C++. And Rust's seemingly omniscient compiler and IDE tools facilitating refactoring have ensured that code changes aren't accompanied by subtle random bugs that would occur in dynamic programming languages. I really need to write a dedicated post espousing the virtues of Rust...
There are a ton of new tests in PyOxidizer 0.8 and I now feel somewhat confident that the main branch of PyOxidizer should be considered production-ready at any time assuming the tests pass. This will hopefully lead to more rapid releases in the future.
There are now tests for the pyembed Rust crate, which provides the run-time code for PyOxidizer-built binaries. We even have Python-based unit tests for validating the Python-exposed APIs behave as expected. These tests have been invaluable for ensuring that the run-time code works as expected. So now when someone files a bug I can easily write a test to capture it and keep the code working as intended through various refactors.
The packaging-time Rust code has also gained its fair share of tests. We now have fairly comprehensive test coverage around how resources are added/packaged. Python extension modules have proved to be highly nuanced in how they are handled. Tremendously helping testing of extension modules is that we're able to run tests for platform non-native extensions! While not yet exposed/supported by Starlark configuration files, I've taught PyOxidizer's core Rust code to be cross-compiling aware so that we can e.g. test Windows or macOS behavior from Linux. Before, I'd have to test Windows wheel handling on Windows. But after writing a wheel parser in Rust and teaching PyOxidizer to use a different Python distribution for the host architecture from the target architecture, I'm now able to write tests for platform-specific functionality that run on any platform that PyOxidizer can run on. This may eventually lead to proper cross-compiling support (at least in some configuration). Time will tell. But the foundation is definitely there!
New Rust Crates
As part of the aforementioned refactoring of PyOxidizer's Rust code, I've been extracting some useful/generic functionality built as part of developing PyOxidizer to their own Rust crates.
As part of this release, I'm publishing the initial 0.1 release of the python-packaging crate (docs). This crate provides pure Rust code for various Python packaging related functionality. This includes:
- Rust types representing Python resource types (source modules, bytecode modules, extension modules, package resources, etc).
- Scanning the filesystem for Python resource files.
- Configuring an embedded Python interpreter.
- Parsing PKG-INFO and related files.
- Parsing wheel files.
- Collecting Python resources and serializing them to a data structure.
The crate is somewhat PyOxidizer centric. But if others are interested in improving its utility, I'll happily accept pull requests!
PyOxidizer's crates footprint now includes:
Major Documentation Updates
I strongly believe that software should be documented thoroughly and I strive for PyOxidizer's documentation to be useful and comprehensive.
There have been a lot of changes to PyOxidizer's documentation since the 0.7 release.
All configuration file documentation has been consolidated.
Likewise, I've attempted to consolidate a lot of the paved road documentation for how to use PyOxidizer in the Packaging User Guide section of the docs.
I'll be honest, since I have so much of PyOxidizer's workings internalized, it can be difficult for me to empathize with PyOxidizer's users. So if you have difficulty with the readability of the documentation, please file an issue and report what is confusing so the documentation can be improved!
Mercurial Shipping With PyOxidizer 0.8
PyOxidizer is arguably an epic yak shave of mine to help the Mercurial version control tool transition to Python 3 and Rust.
I'm pleased to report that Mercurial is now shipping PyOxidizer-built distributions on Windows as of the 5.2.2 release a few days ago! If a complex Python application like Mercurial can be configured to work with PyOxidizer, chances are your Python application will work as well.
What's Next
I view PyOxidizer 0.8 as a pivotal release where PyOxidizer is turning the corner from a prototyping science experiment to something more generally usable. The investments in test coverage and refactoring of the Rust internals are paving the way towards future features and bug fixes.
In upcoming releases, I'd like to close remaining known compatibility gaps with popular Python packages (such as numpy and other packages in the scientific/data space). I have a general idea of what work needs to be done and I've been laying the ground work via various refactorings to execute here.
I want a general theme of future releases to be eliminating reasons why people can't use PyOxidizer. PyOxidizer's historical origin was as a science experiment to see if single file Python applications were possible. It is clear that achieving this is fundamentally at odds with supporting tons of Python packages in the wild. I'd like to find a way where PyOxidizer can achieve 99% package compatibility by default so new users don't get discouraged when using PyOxidizer. And for the subset of users who want single file executables, they can spend the magnitude of additional effort to achieve that.
At some point, I also want to make a pivot towards focusing on producing distributable artifacts (Debian/RPM packages, MSI installers, macOS DMG files, etc). I'm slightly bummed that I haven't made much progress here. But I have a vision in my mind of where I want to go (I'll be making a standalone Rust crate + Starlark dialect to facilitate producing distributable artifacts for any application) and I'm anticipating starting this work in the next few months. In the meantime, PyOxidizer 0.8 should be able to give people a directory tree that they can coerce into distributable artifacts using existing packaging tooling. That's not as turnkey as I would like it to be. But the technical problems around building a distributable Python application binary still need some work and I view that as the most pressing need for the Python ecosystem. So I'll continue to focus there so there is a solid foundation to build upon.
In conclusion, I hope you enjoy the new release! Please report any issues or feedback in the GitHub issue tracker.
Using Rust to Power Python Importing With oxidized_importer
May 10, 2020 at 01:15 PM | categories: Python, PyOxidizer
I'm pleased to announce the availability of the oxidized_importer Python package, a standalone version of the custom Python module importer used by PyOxidizer.
oxidized_importer - a Python extension module implemented in Rust - enables Python applications to start and run quicker by providing an alternate, more efficient mechanism for loading Python resources (such as source and bytecode modules).
Installation instructions and detailed usage information are available in the official documentation. The rest of this post hopefully answers the questions of why are you doing this and why should I care.
In a traditional Python process, Python's module importer inspects the filesystem at run-time to find and load resources like Python source and bytecode modules. It is highly dynamic in nature and relies on the filesystem as a point-in-time source of truth for resource availability.
oxidized_importer takes a different approach to resource loading that is more static in nature and more suitable to application environments (where Python resources aren't changing). Instead of dynamically probing the filesystem for available resources, resources are instead indexed ahead of time. When Python goes to resolve a resource (say it is looking to import a module), oxidized_importer simply needs to perform a lookup in an in-memory data structure to locate said resource. This means oxidized_importer only has marginal reliance on the filesystem, which can make it much faster than Python's traditional importer. (Performance benefits of binaries built with PyOxidizer have already been clearly demonstrated.)
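The general technique (answering imports from a pre-built in-memory index instead of probing the filesystem) can be sketched with nothing but the standard library. This toy finder is not oxidized_importer and has none of its performance characteristics; the class name InMemoryFinder and the module name hello_from_memory are made up for the example.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve imports from a dict mapping module names to source text."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Resolving a module is a single dict lookup: no stat() calls.
        if fullname not in self._sources:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Register ahead of the filesystem-based finders.
sys.meta_path.insert(
    0, InMemoryFinder({"hello_from_memory": "GREETING = 'hello, world'\n"})
)

import hello_from_memory

print(hello_from_memory.GREETING)
```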
The oxidized_importer Python extension module exposes parts of PyOxidizer's packaging and run-time functionality to Python code, without requiring the full use of PyOxidizer for application packaging. Specifically, oxidized_importer allows you to:
- Install a custom, high-performance module importer (OxidizedFinder) to service Python import statements and resource loading (potentially from memory, using zero-copy).
- Scan the filesystem for Python resources (source modules, bytecode files, package resources, distribution metadata, etc) and turn them into Python objects, which can be loaded into OxidizedFinder instances.
- Serialize Python resource data into an efficient binary data structure for loading into an OxidizedFinder instance. This facilitates producing a standalone resources blob that can be distributed with a Python application which contains all the Python modules, bytecode, etc required to power that application. See the docs on freezing an application with oxidized_importer.
oxidized_importer can be thought of as PyOxidizer-lite: it provides just enough functionality to allow Python application maintainers to leverage some of the technical advancements of PyOxidizer (such as in-memory module imports) without using PyOxidizer for application packaging. oxidized_importer can work with the Python distribution already installed on your system. You just pip install it like any other Python package.
By releasing oxidized_importer as a standalone Python package, my hope is to allow more people to leverage some of the technical achievements and performance benefits coming out of PyOxidizer. I also hope that having more users of PyOxidizer's underlying code will help uncover bugs and conformance issues, raising the quality and viability of the projects.
I would also like to use oxidized_importer as an opportunity to advance the
discourse around Python's resource loading mechanism. Filesystem I/O can be
extremely slow, especially in mobile and embedded environments. Dynamically
probing the filesystem to service module imports can therefore be slow. (The
Python standard library has the zipimport module for importing Python resources
from a zip file. But in my opinion, we can do much better.) I would like to
see Python move towards leveraging immutable, serialized data structures for
loading resources as efficiently as possible. After all, Python resources
like the Python standard library are likely not changing between Python process
invocations. The performance zealot in me cringes thinking of all the overhead
that Python's filesystem probing approach incurs - all of the excessive stat()
and other filesystem I/O calls that must be performed to answer questions about
state that is easily indexed and often doesn't change. oxidized_importer
represents my vision for what a high-performance Python resource loader should
look like. I hope it can be successful in steering Python towards a better
approach for resource loading.
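To make the idea of an immutable, serialized resources structure a bit more concrete, here is a minimal Python sketch. It is not the format oxidized_importer uses. The layout (an 8-byte length, a JSON index, then the raw data) and the names pack_resources and open_resources are assumptions invented purely for illustration.

```python
import json
import struct


def pack_resources(resources):
    """Pack {name: bytes} into one blob: index length, JSON index, raw data.

    Hypothetical layout for illustration only; not oxidized_importer's format.
    """
    index = {}
    payload = bytearray()
    for name, data in resources.items():
        # Record where each resource starts in the payload and how long it is.
        index[name] = (len(payload), len(data))
        payload += data
    index_bytes = json.dumps(index).encode("utf-8")
    return struct.pack("<Q", len(index_bytes)) + index_bytes + bytes(payload)


def open_resources(blob):
    """Parse the index once and return a lookup function over the blob."""
    view = memoryview(blob)
    (index_len,) = struct.unpack_from("<Q", view, 0)
    index = json.loads(bytes(view[8:8 + index_len]))
    data = view[8 + index_len:]

    def get(name):
        offset, length = index[name]  # plain dict lookup, no filesystem I/O
        return data[offset:offset + length]  # memoryview slice, no copy

    return get


blob = pack_resources({"app": b"x = 1\n", "app.util": b"y = 2\n"})
get_resource = open_resources(blob)
util_source = bytes(get_resource("app.util"))
```

The point of the sketch is that the index is parsed once and every later lookup is a dictionary access plus a slice of memory that is already loaded.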
I plan to release oxidized_importer independently from PyOxidizer. While
the projects will continue to be developed in the same
repository and will leverage the
same underlying Rust code, I view them as somewhat independent and serving
different audiences.
While oxidized_importer evolved from facilitating PyOxidizer's run-time use
cases, I'm not opposed to taking it in new directions. For example, I would
entertain implementing Python's dynamic filesystem probing logic in
oxidized_importer, allowing it to serve as a functional stand-in for the
official importer shipped with the Python standard library. I have little
doubt an importer implemented in 100% Rust would outperform the official
importer, which is implemented in Python. There are all kinds of possibilities
here, such as using a background thread to index sys.path outside the
constraints of the GIL. But I don't want to get ahead of myself...
If you are a Python application maintainer and want to make your Python
processes execute a bit faster by leveraging a pre-built index of available
Python resources and/or taking advantage of in-memory module importing,
I highly encourage you to take a look at oxidized_importer!
PyOxidizer 0.7
April 09, 2020 at 09:00 PM | categories: Python, PyOxidizer
I am very pleased to announce the 0.7 release of PyOxidizer, a modern Python application packaging tool.
There are a host of notable new features in this release. You can read all about them in the project history.
I want to use this blog post to call out the more meaningful ones.
I started PyOxidizer as a science experiment of sorts: I set out to prove the hypothesis that it was possible to produce high performance single file executables embedding Python and all of its resources (Python modules, non-module resource files, compiled extensions, etc). PyOxidizer has achieved this on Windows, Linux, and macOS since its very earliest releases. Hypothesis confirmed!
In order to actually achieve single file executables, you have to
fundamentally change aspects of Python's behavior. Some of these
changes invalidate deeply rooted assumptions about how Python works,
such as the existence of __file__ in modules. As you can imagine,
these broken assumptions translated to numerous compatibility issues
and PyOxidizer didn't work with many popular Python packages.
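To illustrate the __file__ problem, here is a small sketch using only the standard library. It is not PyOxidizer's importer, which is written in Rust. The InMemoryFinder class and the hello_from_memory module name are made up for this example. It serves module source from a dict, and the resulting module has no __file__ attribute because the module spec has no filesystem location.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer that serves module source from a dict."""

    def __init__(self, sources):
        self._sources = sources  # module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # let the other finders on sys.meta_path try
        # No origin/location is given, so __file__ will not be set.
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module):
        code = compile(self._sources[module.__name__], "<memory>", "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, InMemoryFinder({"hello_from_memory": "VALUE = 42\n"}))

import hello_from_memory

has_file = hasattr(hello_from_memory, "__file__")
print(hello_from_memory.VALUE, has_file)
```

Any package that does something like os.path.dirname(__file__) to find its data files would raise an exception when imported this way, which is the kind of breakage described above.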
With the science experiment phase of PyOxidizer out of the way, I have been making a concerted effort to broaden the user base of PyOxidizer. While single file executables can be an amazing property, it isn't critical for many use cases and the issues it was causing were preventing people from exploring PyOxidizer.
This brings us to what I think are the major new features in PyOxidizer 0.7.
Better Support for Loading Extension Modules
Earlier versions of PyOxidizer insisted that you compile Python (C) extension modules from source and statically link them into a produced binary. This requirement prevented the use of pre-built extension modules (commonly found in Python binary wheels available on PyPI) with PyOxidizer, forcing people to compile them locally. While this often just worked for many extension modules, it frequently failed on complex extension modules and it frequently failed on Windows.
PyOxidizer now supports loading compiled extension modules from
standalone files (typically .so or .pyd files, which are actually
shared libraries). There are still some sharp edges and known
deficiencies. But in many cases, if you tell PyOxidizer to run
pip install and package the result, pre-built wheels can be
installed and PyOxidizer will pick up the standalone files.
On Windows, PyOxidizer even supports embedding the shared library
data into the produced .exe and loading the .pyd/DLL directly
from memory.
Loading Resources from the Filesystem
Binaries built with PyOxidizer contain a blob holding an index of available Python resources along with their data.
Earlier versions of PyOxidizer only allowed you to define resources as in-memory. If the resource was defined in this blob, it was imported from memory. Otherwise it wasn't known to PyOxidizer. You could still install files next to the produced binary and tell PyOxidizer to enable Python's default filesystem-based importer. But PyOxidizer didn't explicitly know about these files on the filesystem.
In PyOxidizer 0.7, the blob index of Python resources is able to express different locations for each resource. Currently, a resource can have its data made available in-memory or filesystem-relative. in-memory works as before: the raw data is embedded next to the index in memory and loaded from there (using 0-copy). filesystem-relative encodes a filesystem path to the resource. During packaging, PyOxidizer will place the resource next to the executable (using a typical Python file layout scheme) and store the relative path to that resource in the resources index.
The filesystem-relative resource indexing feature has a few implications for PyOxidizer.
First, it is more standard. When PyOxidizer loads a Python
module from the filesystem, it sets __file__, __path__,
etc and the module semantics should behave as if the file
were imported by Python's standard importer. This means that
if a package is having issues with in-memory importing, you
can simply fall back to filesystem-relative to get standard
Python behavior and everything should just work.
Second, PyOxidizer's filesystem resource loading is faster
than Python's! When Python's standard importer goes to
import a module, it needs to stat() various paths to
first locate the file. It then performs some sanity checking
and other minor actions before actually importing the module.
All of this has overhead. Since the goal of PyOxidizer is
to produce standalone applications and applications should
be immutable, PyOxidizer can avoid most of this overhead.
PyOxidizer simply tries to open() and read() the relative
path baked into the resource index at build time. If that
works, the resource is loaded. Else there is a failure.
The code path in PyOxidizer to locate a Python resource
is effectively a lookup in a Rust HashMap<&str, T>.
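The approach can be sketched in Python, with a dict standing in for the Rust HashMap. This is only an illustration under assumptions: build_index and load_resource are hypothetical names, and the real index is produced by PyOxidizer at packaging time rather than by scanning a directory like this.

```python
import tempfile
from pathlib import Path


def build_index(root):
    """'Build time': scan root once, mapping module names to relative paths."""
    root = Path(root)
    index = {}
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        parts = list(rel.with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # a package is named after its directory
        if parts:
            index[".".join(parts)] = rel.as_posix()
    return index


def load_resource(root, index, name):
    """'Run time': one dict lookup, then open() and read(). No probing."""
    rel = index[name]  # KeyError means the resource is unknown
    with open(Path(root) / rel, "rb") as fh:
        return fh.read()


# Demo: lay out a tiny package in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_bytes(b"")
    (pkg_dir / "mod.py").write_bytes(b"ANSWER = 42\n")

    index = build_index(tmp)
    source = load_resource(tmp, index, "pkg.mod")
```

Because the application is assumed to be immutable after packaging, the run-time side never has to search directories or check whether something changed. It either reads the recorded path or fails.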
I thought it would be interesting to isolate the performance
benefits of this new feature. I ran Mercurial's test harness
with different variants of hg on Linux on my Ryzen 3950X.
- traditional - A hg script with a #!/path/to/python3.7 shebang.
- oxidized - A hg executable built with PyOxidizer, without PyOxidizer's custom module importer.
- filesystem - A hg executable built with PyOxidizer using the new filesystem-relative resource index.
- in-memory - A hg executable built with PyOxidizer with all resources loaded from memory (how PyOxidizer has traditionally worked).
The results are quite clear:
Variant | CPU Time (s) | Delta (s) | % Orig |
---|---|---|---|
traditional | 11,287 | 0 | 100 |
oxidized | 10,735 | -552 | 95.1 |
filesystem | 10,186 | -1,101 | 90.2 |
in-memory | 9,883 | -1,404 | 87.6 |
We see a nice win just from using a native executable built with PyOxidizer (traditional to oxidized).
Then from oxidized to filesystem we see another jump of ~5%. This difference is attributed to using PyOxidizer's Rust-powered importer with an index of resources available on the filesystem. In other words, all that work that Python's standard importer is doing to discover files and then operate on them is non-trivial!
Finally, the smaller jump from filesystem to in-memory isolates the benefits of importing resource data from memory instead of involving filesystem I/O. (Filesystems are generally slow.) While I haven't measured explicitly, I hypothesize that macOS and Windows will see a bigger jump between these two variants, as the filesystem performance on these platforms generally isn't as good as it is on Linux.
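For anyone who wants to check the arithmetic, the delta and percentage columns follow directly from the CPU times. A quick sketch, where the only inputs are the four CPU time values reported above and everything else is derived:

```python
# CPU time in seconds for each variant, as reported in the table.
cpu_seconds = {
    "traditional": 11287,
    "oxidized": 10735,
    "filesystem": 10186,
    "in-memory": 9883,
}

baseline = cpu_seconds["traditional"]

# For each variant: (delta vs. traditional, percent of traditional CPU time).
rows = {
    name: (secs - baseline, round(100 * secs / baseline, 1))
    for name, secs in cpu_seconds.items()
}

for name, (delta, pct) in rows.items():
    print(f"{name:12} {delta:7,d} {pct:6.1f}")

# Gap between oxidized and filesystem, in percentage points of the baseline.
importer_gain = round(rows["oxidized"][1] - rows["filesystem"][1], 1)
```

The gap between oxidized and filesystem works out to 4.9 percentage points of the original CPU time, which is where the ~5% figure comes from.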
PyOxidizer's Future
With PyOxidizer now supporting a couple of much-needed features to support a broader set of users, I'm hoping that future releases of PyOxidizer continue to broaden the utility of PyOxidizer.
The over-arching goal of PyOxidizer is to solve large aspects of the Python application packaging and distribution problem. So far a lot of focus has been spent on the former. PyOxidizer in its current form can materialize files on the filesystem that you can copy or package up manually and distribute. But I want these processes to be part of PyOxidizer: I want it to be possible for PyOxidizer to emit a Windows MSI installer, a macOS dmg, a Debian package, etc for a Python application.
In order to support the aforementioned marquee features of this PyOxidizer release, I had to pay down a lot of technical debt in the code base left over from the science experiment phase of PyOxidizer's inception.
In the short term, I plan to continue shoring up the code base
and rounding out support for features requested in the
issue tracker on GitHub. The next release of PyOxidizer will
also likely require Python 3.8, as this will improve run-time
control over the embedded Python interpreter and enable PyOxidizer
to better support package metadata (importlib.metadata), enabling
support for features like entry points.
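For readers unfamiliar with it, importlib.metadata is the standard library module (new in Python 3.8) for reading the metadata of installed packages. Here is a small sketch of listing console script entry points with it. The console_scripts helper is a made-up name, and what it returns depends entirely on what is installed in your environment. It says nothing about how PyOxidizer will integrate with this module.

```python
from importlib import metadata


def console_scripts():
    """Map console script names to their 'module:callable' targets.

    Hypothetical helper: walks every installed distribution that
    importlib.metadata can discover and collects its entry points.
    """
    found = {}
    for dist in metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group == "console_scripts":
                found[entry_point.name] = entry_point.value
    return found


scripts = console_scripts()
for name in sorted(scripts):
    print(f"{name} -> {scripts[name]}")
```

Discovery like this relies on the importer being able to locate package metadata, which is why a custom importer needs explicit support for it.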
I've also been thinking about extracting PyOxidizer's custom
module importer to be usable as a standalone Python extension
module. I think there's some value in publishing a
pyoxidizer_importer package on PyPI that you can easily
add to your installed packages to speed up Python's
standard filesystem importer by a few percent. If nothing else,
this may drum up interest in the larger Python community for
standardizing a format for serializing Python resources in a
single file. Perhaps we can get other Python packaging tools
producing the same packed resources data blob that PyOxidizer
uses so we can all standardize on a more efficient mechanism
for loading Python modules. Time will tell.
Enjoy the new release. File issues at https://github.com/indygreg/PyOxidizer as you encounter them.