Building Standalone Python Applications with PyOxidizer
June 24, 2019 at 09:00 AM | categories: Python, PyOxidizer, RustPython application distribution is generally considered an unsolved problem. At their PyCon 2019 keynote talk, Russel Keith-Magee identified code distribution as a potential black swan - an existential threat for longevity - for Python. In their words, Python hasn't ever had a consistent story for how I give my code to someone else, especially if that someone else isn't a developer and just wants to use my application. I completely agree. And I want to add my opinion that unless your target user is a Python developer, they shouldn't need to know anything about Python packaging, Python itself, or even the existence of Python in order to use your application. (And you can replace Python in the previous sentence with any programming language or software technology: most end-users don't care about the technical implementation, they just want to get stuff done.)
Today, I'm excited to announce the first release of PyOxidizer (project, documentation), an open source utility that aims to solve the Python application distribution problem! (The installation instructions are in the docs.)
Standalone Single File, No Dependencies Executable Python Applications
PyOxidizer's marquee feature is that it can produce a single file
executable containing a fully-featured Python interpreter, its
extensions, standard library, and your application's modules and
resources. In other words, you can have a single .exe
providing
your application. And unlike other tools in this space which tend to
be operating system specific, PyOxidizer works across platforms
(currently Windows, macOS, and Linux - the most popular platforms for
Python today). Executables built with PyOxidizer have minimal
dependencies on the host environment nor do they do anything
complicated at run-time. I believe PyOxidizer is the only open
source tool to have all these attributes.
On Linux, it is possible to build a fully statically linked executable. You can drop this executable into a chroot or container where it is the only file and it will just work. On macOS and Windows, the only library dependencies are on always-present or extremely common libraries. More details are in the docs.
At execution time, binaries built with PyOxidizer do not do anything
special to run the Python interpreter. (Other tools in this space do
things like create a temporary directory or SquashFS filesystem and
extract Python to it.) PyOxidizer loads everything from memory and there
is no explicit I/O being performed. When you import
a Python module,
the bytecode for that module is being loaded from a memory address in
the executable using zero-copy. This makes PyOxidizer executables
faster
to start and import
- faster than a python
executable itself!
Current Release and Future Roadmap
Today's release of PyOxidizer is just the first release milestone in what I envision is a long and successful project history. While my over-arching goal with PyOxidizer is to solve vast swaths of the Python application distribution problem, I want to be clear that this first release comes nowhere close to doing so. I toiled with what features must be in the initial release. I ultimately decided that PyOxidizer's current functionality is extremely valuable to some audiences and that the project has matured to the point where more eyeballs and users would substantially help its development. (I could definitely use some help prioritizing which features to work on and for that I need users and user feedback.)
In today's release, PyOxidizer is good at producing executables embedding Python. It doesn't yet venture too far into the distribution part of the problem (I want it to be trivial to produce MSI installers, DMG images, deb/rpm packages, etc). But on Linux, this is already a huge step forward because PyOxidizer makes it easy (hopefully!) to produce binaries that should just work on other machines. (Anyone who has attempted to distribute Linux applications will tell you how painful this problem can be.)
Despite its limitations, I believe today's release of PyOxidizer to be a viable tool for some applications. And I believe PyOxidizer can start to replace existing tools in this space. (See the Comparisons to Other Tools document for how PyOxidizer compares to other Python packaging and distribution tools.)
Using today's release of PyOxidizer, larger user-facing applications using Python (like Dropbox, Kodi, MusicBrainz Picard, etc) could use PyOxidizer to produce self-contained executables. This would likely cut down on installer size, decrease install/update time (fewer files means faster operations), and hopefully make packaging simpler for application maintainers. Maintainers of Python utilities could produce self-contained executables, making their utilities faster to start and easier to package and distribute.
New Possibilities and Reliability for Python
By enabling support for self-contained, single file Python applications, PyOxidizer opens exciting new doors for Python. Because Python has historically required an explicit, separate runtime not part of the executable, Python was not viable (or was a hinderance) in many domains. For example, if you wanted to use Python to bootstrap a fresh server or empty container environment, you had a chicken-and-egg problem because you needed to install Python before you could use it.
Let's take Ansible for example. One of Ansible's features is that it remotes into a machine and runs things. The way it does this is it dynamically generates Python scripts locally, uploads them to the remote machine, and tells the remote to execute them. Those Python scripts require the existence of a Python interpreter on the remote machine. This means you need to install Python on a machine before you can control it with Ansible. Furthermore, because the remote's Python isn't under Ansible's control, you can assume very little about its behavior and capabilities, making interaction a bit brittle.
Using PyOxidizer, projects like Ansible could produce a self-contained executable containing a Python interpreter. They could transfer that single binary to the remote machine and execute it, instantly giving the remote machine access to a fully-featured and modern Python interpreter. From there, the sky is the limit. In Ansible's case, the executable could contain the full Ansible runtime, along with any 3rd party Python packages they wanted to leverage. This would allow execution to occur (possibly mostly independently) on the remote machine. This architecture is simpler, scales better, would likely result in faster operations, and would probably improve the quality of life for everyone involved, from application developers to its end users.
Self-contained Python applications built with PyOxidizer essentially
solve the Python interpreter bootstrapping and reliability problems.
By providing a Python interpreter and a known set of Python modules, you
provide a highly deterministic and reliable execution environment for your
application. You don't need to fret about which version of Python
is installed: you know which version of Python you are using.
You don't need to worry about which Python packages are installed:
you control explicitly which packages are available. You don't need to
worry about whether you are running in a virtualenv, what sys.path
is set to, whether .pth
files come into play, whether various
PYTHON*
environment variables can mess up your application, whether
some Linux distribution packaged Python differently, what to put in your
script's shebang, etc: executables built with PyOxidizer behave as
you have instructed them to because they are compiled that way.
All of the concerns in the previous paragraph contribute to a larger problem in the eyes of application maintainers that can be summarized as Python isn't reliable. And because Python isn't reliable, many people reach the conclusion that Python shouldn't be used (this is the black swan that was referred to earlier). With PyOxidizer, the Python environment is isolated and highly deterministic making the reliability problem largely go away. This makes Python a more viable technology choice. And it enables application maintainers to aggressively adopt modern Python versions, utilize third party packages fearlessly, and spend far less time chasing an extremely long tail of issues related to Python environment variance. Succinctly, application developers can focus on building great applications instead of toiling with Python environment problems.
Project Status
PyOxidizer is still in its relative infancy. While it is far from feature complete, I'm mentally committed to working on the remaining major functionality. The Status document lists major missing functionality, lesser missing functionality, and potential future value-add functionality.
I want PyOxidizer to provide a Python application packaging and distribution experience that just works with minimal cognitive effort from Python application maintainers. I have spent a lot of effort documenting PyOxidizer. I care passionately about user experience and want everything about PyOxidizer to be simple and frustration free. I know things aren't there yet. The problems that PyOxidizer is attempting to solve are hard (that's a reason nobody has solved them well yet). I know there's details floating around in my head that haven't been added to the documentation yet. I know there's missing features and bugs in PyOxidizer. I know there are Packaging Pitfalls yet to be discovered.
This is where you come in.
I need your help to make PyOxidizer great. I encourage Python application maintainers reading this to head over to Getting Started and the Packaging User Guide and try to package your applications with PyOxidizer. If things don't work, let me know by filing an issue. If you are confused by lack of or unclear documentation, file an issue. If something frustrates you, file an issue. If you want to suggest I work on a certain feature or fix a bug, file an issue! Tweet to @indygreg to engage with me there. Join the pyoxidizer-users mailing list. While I feel PyOxidizer is usable today (that's why I'm announcing it), I need your feedback to help guide future prioritization.
Finally, I know PyOxidizer has significant implications for some companies and projects that use Python. While I'm not looking to enrich myself or make my livelihood from PyOxidizer, if PyOxidizer is useful to you and you'd like to send money my way as appreciation, you can do so on Patreon or PayPal. If not, that's totally fine: I wouldn't be making PyOxidizer open source if I didn't want to share it with the world for free! And I am financially well off as well. I just feel like there should be more financial contribution to open source because it would improve the health of the ecosystem and I can help achieve that end by advocating for it and giving myself.
Leveraging Rust
The oxidize part of PyOxidizer comes from Rust (See the Wikipedia Rust article - for the chemical not the programming language - to understand where oxidize comes from.) The build time packaging and building functionality is implemented in Rust. And the binary that embeds and controls the Python interpreter in built applications is Rust code. Rationale for these decisions is explained in the FAQ.
This is my first non-toy project using Rust and I have to say that Rust is... incredible! I may have to author a dedicated blog post extolling the virtues of Rust. In short, Rust is now my go-to language for systems level projects. Unless you need the target platform versatility, I don't think C or C++ are defensibles choices in 2019 given their security deficiencies. Languages like Go, Java, and various JVM or CLR languages are acceptable if you can tolerate having a garbage collector and/or a larger runtime. But what makes Rust superior in my mind is the ability for the compiler to prevent large classes of software bugs (especially those that turn into CVEs) and inefficiencies that have plagued our industry for decades. Rust is the first programming language I've used where I feel like the language itself, the compiler, the tools around it (cargo, rustfmt, clippy, rustup, etc), and the community surrounding it all actually care about and assist me with writing high quality software. Nothing else I've used comes even close.
What I've been most surprised about Rust is how high level it feels
for a systems level language that isn't garbage collected. When you
program lower-level languages like C or C++, compared to a higher level
language like Python, you have to type a lot more and be more explicit
in nearly everything you do. While Rust is certainly not as expressive
or compact as say Python, it is far, far closer to Python than I was
expecting it to be. Yes, you do have to type more and think more about
your code to appease the Rust compiler's constraints. But the return
on that investment is the compiler preventing entire classes of bugs
and C/C++ levels of performance. When I started PyOxidizer, the build
time logic was implemented in Python and only the run-time pieces were
in Rust. After learning a bit more Rust and realizing the obvious
code quality benefits, I ditched Python and adopted Rust for the build
time logic. And as the code base has grown and gone through various
refactorings, I am so glad I did so! The Rust compiler has caught
dozens of would-be bugs in Python. Granted, many of these can be
attributed to having strong typing and compile time type checking and
Rust is little different than say Java on this front. But a significant
number of prevented bugs covered invariants in the code because of the
way Rust's type system often intersects with control flow. e.g. match
arms must be exhaustive, so you can't have unhandled values/types and
unchecked Result
instances result ina compiler warning. And clippy
has been just fantastic helping to guide me towards writing more
acceptable code following community accepted best practices.
Even though PyOxidizer is implemented in Rust, most end-users shouldn't
have to care (beyond having to install a Rust compiler and build
PyOxidizer from source). The existence of Rust should be abstracted
away from Python packagers. I did this on purpose because I believe that
users of an application shouldn't have to care about the technical
implementation of that application. It is a bit unfortunate that I
force users to install Rust before using PyOxidizer, but in my defense
the target audience is technically savvy developers, bootstrapping Rust
is easy, and PyOxidizer is young, so I think it is acceptble for now.
If people get hung up on it, I can provide pre-compiled pyoxidizer
executables.
But if you do know Rust, PyOxidizer being implemented in Rust opens up some exciting possibilities!
One exciting possibility with PyOxidizer is the ability to add Rust
code to your Python application. PyOxidizer works by generating a
default Rust application (main.rs
) that simply instantiates and runs
an embedded Python interpreter then exits. It essentially does what
python
or a Python script would do. The key takeaway here is your
Python application is technically a Rust application (in the same
way that python
is technically a C application). And being a
Rust application means you can add Rust code to that application. You
can modify the autogenerated main.rs
to do things before, during, and
after the embedded Python interpreter runs. It's a regular Rust program
and can do anything that Rust programs can do!
Another possibility - and variant of above - is embedding Python in existing Rust projects. PyOxidizer's mechanism for embedding a Python interpreter is implemented as a standalone Rust crate. One can add the pyembed crate to an existing Rust project and a little of build system magic later, your Rust project can now embed and run a Python interpreter!
There's a lot of potential for hybrid Rust + Python programs. And I am very excited about the possibilities.
If you are a Rust programmer, PyOxidizer allows you to easily embed
Python in your Rust application. If you are a Python programmer,
PyOxidizer allows your to easily leverage Rust in your Python
application. In short, the package ecosystem of the other becomes
available to you. And if you aren't familiar with Rust, there are some
potentially crazy possibilities. For example,
Alacritty is a GPU accelerated
terminal emulator written in Rust and
Servo is an entire web browser
engine written in Rust. With PyOxidizer, you could integrate a
terminal emulator or browser engine as part of your Python
application if you really wanted to. And, yes, Rust's packaging
tools are so good that stuff like this tends to just work. As
a concrete example, the pyoxidizer
CLI tool contains libgit2
for performing in-process interactions with Git repositories. Adding
this required a single line change to a Cargo.toml
file and it
just worked on Linux, macOS, and Windows. Stuff like this often
takes hours to days to integrate in C/C++. It is quite ridiculous
how easy it is to add (complex) components to Rust projects!
For years, Python projects have implemented extensions in C to realize performance wins. If your Python application is a Rust executable, then implementing this functionality in Rust (rather than C) seems rationale. So we may see oxidized Python applications have their performance critical pieces slowly be rewritten in Rust. (Honestly, the Rust crates to interface between Rust and the CPython API still leave a bit to be desired, so the experience of writing this Rust code still isn't great. But things will certainly improve over time.)
This type of inside-out split language work has been practiced in Python for years. What PyOxidizer brings to the table is the ability to more easily port code outside-in. For example, you could implement performance-criticial, early application logic such as config file parsing and command line argument parsing in Rust. You could then have Rust service some application functionality without Python. Why would you want this? Performance is a valid reason. Starting a Python interpreter, importing modules, and running code can consume several dozen or even hundreds of milliseconds. If you are writing performance sensitive applications, the existence of any Python can add enough latency that people no longer perceive the interaction as instananeous. This added latency can make Python totally inappropriate for some contexts, such as for programs that run as part of populating your shell's prompt. Writing such code in Rust instead of Python dramatically increases the probability that the code is fast and likely delivers stronger correctness guarantees courtesy of Rust's compile time validation as well!
An extreme practice of outside-in porting of Python to Rust would
be to incrementally rewrite an entire Python application in Rust.
Rust's ergonomics are exceptional and I do think we'll see people
choose Rust where they previously would have chosen Python. I've
done this myself with PyOxidizer and feel it is a very defensible
decision to reach! I feel a bit conflicted releasing a tool which
may undermine Python's popularity by encouraging use of Rust over
Python. But at the end of the day, PyOxidizer increases the utility
of both Python and Rust by giving each more readily accessible
access to the other and PyOxidizer improves the overall utility of
Python by improving the application distribution story. I have no
doubt PyOxidizer is a net benefit for the Python ecosystem, even if
it does help usher in more people choosing Rust over Python. If
I have an ulterior motive in developing PyOxidizer, it is to enable
Mercurial's official distribution to be a Rust executable and
for some functionality (like hg status
) to be runnable without
Python (for performance reasons).
Another possible use of PyOxidizer is as a library. All the build time
functionality of PyOxidizer exists in a Rust crate. So, you can add
the pyoxidizer
crate to your own Rust project and use its code to
do things like build a library containing Python, compile Python
source modules to bytecode, or walk a directory tree and find Python
resources within. The code is still heavily geared towards PyOxidizer
and there's no promise of API stability. But this potential for library
usage exists and if others want to experiment with building custom
Python binaries not using the pyoxidizer
CLI tool, using PyOxidizer
as a library might save you a lot of time.
Standalone Python Distributions
One of the most time consuming parts of building PyOxidizer was
figuring out how to build self-contained Python distributions.
Typically, a Python build consists of a library, shared libraries for
various extension modules, shared libraries required by the prior
items, and a hodgepodge of other files, such as .py files implementing
the Python standard library. The
python-build-standalone
project was created to automate creating special builds of Python
which are self-contained and distributable. This requires doing
dirty things with build systems. But I don't want to inflict the
details on you here. What I do think is worth mentioning is how
those Python distributions are distributed. The output of the
build is a tarball containing the Python installation, build artifacts
that can be used to link a custom libpython, and a PYTHON.json
file describing the contents of the distribution. PyOxidizer reads
the PYTHON.json
file and learns how it should interact with that
distribution. If you produce a Python distribution conforming to the
format that python-build-standalone
defines, you can use that
Python with PyOxidizer.
While I have no urgency to do so at this time, I could see a future
where this Python distribution format is standardized. Then maintainers
of various Python distributions (CPython, PyPy, etc) would independently
produce their own distributable artifacts conforming to this standard,
in turn allowing machine consumers of Python distributions (such as
PyOxidizer) to easily consume different Python distributions and do
interesting things with them. You could even imagine these Python
distribution archives being readily available as packages in your
system's package manager and their locations exposed via the
sysconfig
Python module, making it easy for tools (like PyOxidizer)
to find and use them.
Over time, I could see PyOxidizer's functionality rolling up into
official packaging tools like pip
, which would know how to consume
the distribution archives and produce an executable containing a
Python interpreter, required Python modules, etc.
Getting PyOxidizer's functionality rolled into official Python
packaging tools is likely years away (if it ever happens). But I think
standardizing a format describing a Python distribution and (optionally)
contains build artifacts that can be used to repackage it is a
prerequisite and would be a good place to start this journey. I would
certainly love for Python distributions (like CPython) to be in charge of
producing official repackagable distributions because this is not
something I want to be in the business of doing long term (I'm
lazy, less equipped to make the correct decisions, and there are
various trust and security concerns). And while I'm here, I am
definitely interested in upstreaming some of the
python-build-standalone
functionality into the existing CPython
build system because coercing CPython's build system to produce
distributable binaries is currently a major pain and I'd love to enable
others to do this. I just haven't had time nor do I know if the patches
would be well received. If a CPython maintainer wants to get in
touch, I'd love to have a conversation!
Conclusion
I started hacking on PyOxidizer in November 2018. After months of chipping away at it, I think I finally have a useful utility for some audiences. There's still a lot of missing features and some rough edges. But the core functionality is there and I'm convinced that PyOxidizer or its underlying technology could be an integral part of solving Python's application distribution black swan problem. I'm particularly proud of the hacks I concocted to coerce Python into importing module bytecode from memory using zero-copy. Those are documented in this blog post and in the pyembed crate docs.
So what are you waiting for? Head on over to the documentation, install PyOxidizer, and let me know how it goes by filing issues!
I hope you enjoy oxidizing your Python applications!