Using Rust to Power Python Importing With oxidized_importer
May 10, 2020 at 01:15 PM | categories: Python, PyOxidizer
I'm pleased to announce the availability of the oxidized_importer Python package, a standalone version of the custom Python module importer used by PyOxidizer.
oxidized_importer - a Python extension module implemented in Rust - enables Python applications to start and run more quickly by providing an alternate, more efficient mechanism for loading Python resources (such as source and bytecode modules).
Installation instructions and detailed usage information are available in the official documentation. The rest of this post hopefully answers the questions "why are you doing this?" and "why should I care?"
In a traditional Python process, Python's module importer inspects the filesystem at run-time to find and load resources like Python source and bytecode modules. It is highly dynamic in nature and relies on the filesystem as a point-in-time source of truth for resource availability.
oxidized_importer takes a different approach to resource loading that is more static in nature and more suitable to application environments (where Python resources aren't changing). Instead of dynamically probing the filesystem for available resources, it indexes resources ahead of time. When Python goes to resolve a resource (say it is looking to import a module), oxidized_importer simply needs to perform a lookup in an in-memory data structure to locate said resource. This means oxidized_importer has only marginal reliance on the filesystem, which can make it much faster than Python's traditional importer. (Performance benefits of binaries built with PyOxidizer have already been clearly demonstrated.)
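To make the lookup idea concrete, here is a minimal pure-Python sketch of an importer that resolves modules from an index built ahead of time instead of probing the filesystem. It is not how oxidized_importer is implemented (that code is in Rust); the class name InMemoryFinder, the module name demo_inmemory_mod, and the index layout are all hypothetical and exist only for illustration. It uses nothing beyond the standard importlib machinery.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve imports from a prebuilt {module name: source text} index."""

    def __init__(self, index):
        # The index is built ahead of time; resolving a module never
        # touches the filesystem.
        self._index = index

    def find_spec(self, fullname, path=None, target=None):
        # A single dict lookup replaces probing each sys.path entry.
        if fullname not in self._index:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to Python's default module creation

    def exec_module(self, module):
        source = self._index[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Hypothetical module name and contents, for illustration only.
index = {"demo_inmemory_mod": "def greet():\n    return 'hello from memory'\n"}
sys.meta_path.insert(0, InMemoryFinder(index))

import demo_inmemory_mod

print(demo_inmemory_mod.greet())
```

Names that are missing from the index fall through to the regular finders, so the standard library keeps importing as usual. This sketch stores source text and compiles it on import; a real implementation would more likely store bytecode to skip that step.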
The oxidized_importer Python extension module exposes parts of PyOxidizer's packaging and run-time functionality to Python code, without requiring the full use of PyOxidizer for application packaging.
Specifically, oxidized_importer allows you to:
- Install a custom, high-performance module importer (OxidizedFinder) to service Python import statements and resource loading (potentially from memory, using zero-copy).
- Scan the filesystem for Python resources (source modules, bytecode files, package resources, distribution metadata, etc.) and turn them into Python objects, which can be loaded into OxidizedFinder instances.
- Serialize Python resource data into an efficient binary data structure for loading into an OxidizedFinder instance. This facilitates producing a standalone resources blob that can be distributed with a Python application and that contains all the Python modules, bytecode, etc. required to power that application. See the docs on freezing an application with oxidized_importer.
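The scan-then-serialize workflow in the list above can be sketched with the standard library alone. The example below is a stand-in, not oxidized_importer's API or its binary format: the helper names scan_sources, serialize, and deserialize are hypothetical, and marshal is used only because it ships with Python (its output is specific to the Python version that produced it, so it is not a portable format).

```python
import marshal
import pathlib
import tempfile


def scan_sources(root):
    """Map dotted module names to marshalled bytecode for .py files under root."""
    root = pathlib.Path(root)
    index = {}
    for path in sorted(root.rglob("*.py")):
        parts = list(path.relative_to(root).with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # a package is named after its directory
        if not parts:
            continue  # ignore a stray top-level __init__.py
        code = compile(path.read_text(encoding="utf-8"), str(path), "exec")
        index[".".join(parts)] = marshal.dumps(code)
    return index


def serialize(index):
    """Pack the whole index into a single bytes blob."""
    return marshal.dumps(index)


def deserialize(blob):
    """Recover the {module name: bytecode bytes} index from a blob."""
    return marshal.loads(blob)


# Build a throwaway package on disk, index it, and serialize the index.
with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "util.py").write_text("ANSWER = 6 * 7\n", encoding="utf-8")
    blob = serialize(scan_sources(tmp))

# The source directory no longer exists; the blob alone is enough.
restored = deserialize(blob)
namespace = {}
exec(marshal.loads(restored["demo_pkg.util"]), namespace)
print(sorted(restored), namespace["ANSWER"])
```

The point of the sketch is the shape of the workflow: the filesystem is walked once at build time, and everything needed at run time travels in one blob that can be shipped alongside the application.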
oxidized_importer can be thought of as PyOxidizer-lite: it provides just enough functionality to allow Python application maintainers to leverage some of the technical advancements of PyOxidizer (such as in-memory module imports) without using PyOxidizer for application packaging. oxidized_importer can work with the Python distribution already installed on your system. You just pip install it like any other Python package.
By releasing oxidized_importer as a standalone Python package, my hope is to allow more people to leverage some of the technical achievements and performance benefits coming out of PyOxidizer. I also hope that having more users of PyOxidizer's underlying code will help uncover bugs and conformance issues, raising the quality and viability of the projects.
I would also like to use oxidized_importer as an opportunity to advance the discourse around Python's resource loading mechanism. Filesystem I/O can be extremely slow, especially in mobile and embedded environments. Dynamically probing the filesystem to service module imports can therefore be slow. (The Python standard library has the zipimport module for importing Python resources from a zip file. But in my opinion, we can do much better.) I would like to see Python move towards leveraging immutable, serialized data structures for loading resources as efficiently as possible. After all, Python resources like the Python standard library are likely not changing between Python process invocations. The performance zealot in me cringes thinking of all the overhead that Python's filesystem probing approach incurs - all of the excessive stat() and other filesystem I/O calls that must be performed to answer questions about state that is easily indexed and often doesn't change. oxidized_importer represents my vision for what a high-performance Python resource loader should look like. I hope it can be successful in steering Python towards a better approach for resource loading.
I plan to release oxidized_importer independently from PyOxidizer. While the projects will continue to be developed in the same repository and will leverage the same underlying Rust code, I view them as somewhat independent and serving different audiences.
While oxidized_importer evolved from facilitating PyOxidizer's run-time use cases, I'm not opposed to taking it in new directions. For example, I would entertain implementing Python's dynamic filesystem probing logic in oxidized_importer, allowing it to serve as a functional stand-in for the official importer shipped with the Python standard library. I have little doubt an importer implemented in 100% Rust would outperform the official importer, which is implemented in Python. There are all kinds of possibilities here, such as using a background thread to index sys.path outside the constraints of the GIL. But I don't want to get ahead of myself...
If you are a Python application maintainer and want to make your Python processes execute a bit faster by leveraging a pre-built index of available Python resources and/or taking advantage of in-memory module importing, I highly encourage you to take a look at oxidized_importer!