My Shifting Open Source Priorities

March 17, 2024 at 09:00 PM | categories: Personal, PyOxidizer

I'm a maintainer of a handful of open source projects, some of which have millions of downloads and/or are used in important workloads, including in production.

I have a full time job as a software engineer and my open source work is effectively a side job. (Albeit one I try very hard to not let intersect with my day job.)

Historically, my biggest contributions to my open source projects have come when I'm not working full time:

  • python-zstandard was started when I was on medical leave, recovering from a surgery.
  • python-build-standalone and PyOxidizer were mainly built when I was between jobs, after leaving Mozilla.
  • apple-codesign was built in the height of COVID when I took a voluntary leave of absence from work to reconstitute my mental and physical health.

When working full time, my time to contribute to open source has been carved out of weekday nights and weekends, especially in the winter months. I believe that code is an art form and programming a form of creative expression. My open source contributions provide a relaxing avenue for me to express my artistic creativity, when able.

My open source contributions reflect my personal priorities of where and what to spend my free time on.

The only constant in life is change.

In the middle of 2022, I switched job roles and found myself reinvigorated by my new role - Infrastructure Performance - which is at the intersection of some of my strongest technical and professional skills. I found myself willingly pouring more energy and time into my day job. That had the side effect of reducing my open source contributions.

In 2023Q1 I got married. In the months leading up to and after, I chose to prioritize spending time with my now wife and all the time commitments that entails. This also reduced the amount of time available for open source contributions.

In 2023Q4 I became a father to a beautiful baby girl. While on my employer's generous-for-the-United-States fourteen week paternity leave, I somehow found some time to contribute to open source. As refreshing as that was, it didn't last. My man cave where my desktop computer resides has been converted into a nursery. And for the past few months it has been occupied by my mother-in-law, who has generously been serving as what is effectively a live-in nanny. Even when I'm able to sit down at my desktop, it's hard to get into a state of flow due to the added entropy from the additional three people now living with me.

After realizing the new normal in 2024Q1, I purchased a Wahoo KICKR MOVE bicycle trainer and now spend considerable time doing virtual bicycle rides on Zwift because it's one of the few leisure activities I can do at home without drawing scrutiny from my wife and mother-in-law (but 98% my mother-in-law because I've observed that my wife is effectively infallible). I now get excited about virtually summiting famous climbs instead of contributing to open source. (Today's was Mont Ventoux - an absolute beast of a climb that reminded me a lot of my real world ride up Pike's Peak in 2020.)

Various changes in the past eighteen or so months have created additional time constraints and prioritization changes that have resulted in my open source contributions withering.

In addition, my technical interests have been shifting.

I've always gravitated to more systems-level areas of computers. My degree is in Computer Engineering and I have a stereotypical engineer mindset: I have an insatiable curiosity about how things work and interact and I want to always be tinkering. I prefer to be closer to hardware instead of abstracted far away from it. I enjoy interacting with the building blocks of software ecosystems: operating systems, filesystems, runtimes, file formats, compilers, etc.

Historically, my open source contributions to my preferred areas of computing were limited. Again, to me open source is an enjoyable form of creative expression. That means I do it for fun. Historically, the systems-level programming space was limited to languages like C and C++, which I consider frustrating and painful to use. If I'm going to subject myself to misery when programming, you are going to have to pay me well to do it.

As part of creating PyOxidizer, I learned Rust.

When I became proficient in Rust, I realized that Rust unlocks all kinds of systems-level problems that were effectively off-limits for my open source contributions. Would I implement Debian packaging primitives in Python? Or a tool to bulk analyze Linux packages and peek inside ELF binaries for insights about what compiler/linker features are used in the wild in Python/C/C++? Not unless you pay me to do it!

As I learned Rust, I also found myself being drawn away from Python, my prior go-to language. As I wrote in Rust is for Professionals, Rust feels surprisingly high level. It isn't as terse as Python but it is a lot closer than I thought it would be. And Rust gives you vastly stronger compile-time guarantees and run-time performance than Python. I felt like Rust's tooling ecosystem was supporting me instead of standing in my way. I felt that when you consider the overall software development lifecycle - not just the edit-build-run loop that people tend to fixate on, likely because it is the easiest to measure - Rust was vastly more productive and a joy to work with than Python. All those countless hours debugging, fixing, and authoring tests for TypeError and ValueError Python exceptions you see in production just don't happen with Rust and that time can be better spent iterating on core functionality, which is what actually matters.

On top of the Rust undercurrents, I've also become somewhat disenchanted with the Python ecosystem. As I wrote in 2020's Mercurial's Journey to and Reflections on Python 3, the Python 3 transition was bungled and resulted in years - if not a full decade - of lost opportunity. As I wrote in 2023's My User Experience Porting Off setup.py, the Python packaging story feels as discombobulated and frustrating as ever. PyOxidizer additionally brushed up against several limitations in how Python is designed and implemented, many of which are not trivially fixable. As a systems-level guy, I am frequently questioning various aspects of the Python ecosystem which I have contrasting opinions on, including the importance of correctness and performance.

In 2021, I started gravitating towards writing more Rust code and solving problems in the systems domain that were previously off-limits to me, like Apple code signing. Initially the work was in support of PyOxidizer: I was going to implement all these packaging primitives in pure Rust and enable people to distribute Python applications without requiring access to a Windows or macOS machine! Over time, this work consumed me. Apple code signing turned into a major time sink because of its complexity and the fact I was having to reverse engineer a lot of its internals. But I was having a ton of fun doing it: more fun than swimming upstream against decades of encrusted technical debt in the Python ecosystem.

By late 2021, I realized I made a series of mistakes with PyOxidizer.

I started PyOxidizer as a science experiment to see if it was possible to achieve a single file executable Python application without requiring a temporary filesystem at run-time. I succeeded. But the cost was compatibility with the larger pre-built Python package ecosystem. I built all this complexity into PyOxidizer to allow people to tweak how Python resources are packaged so they could choose to build a single file application if they wanted. This ballooned into a hot mess and was obviously not user-friendly. It violated various personal principles about optimizing for end-user experience.

Armed with knowledge of all the pitfalls, I realized that there was a 90% use case for Python application packaging that was simple for end users and technically achievable using all the code primitives - like the pyembed Rust crate - that I built out for PyOxidizer.

Thus the PyOxy project was born and released in May 2022.

While I believe PyOxy is already a generally useful primitive to have in the Python ecosystem, I had bigger goals in mind.

My intent with PyOxy was to build in a simplified and opinionated PyOxidizer lite mode. The pyoxy executable is already a chameleon: if you rename it to python it behaves like a python executable. I wanted to extend this so you could do something like pyoxy build-app and it would collect all dependencies, assemble a Python packed resources blob, and embed that in a copy of the pyoxy binary as an ELF, Mach-O, or PE segment. Then at run-time, the variant executable binary would load the application configuration and Python resources metadata from its own binary and execute the application. Essentially, PyOxy would evolve into a self-packaging Python application. I just needed to evolve the Python packed resources format, implement a very crude ELF, Mach-O, and PE linker to append resources data to an executable, and teach pyembed to read resources data from an ELF, Mach-O, or PE segment. All within my sphere of technical competency. And I was excited to build it and forever alter people's perceptions of how easy it could be to produce a distributable Python application.
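
To make the "embed resources in a copy of the binary, then read them back at run-time" idea concrete, here is a minimal Python sketch. It is a deliberate simplification: rather than adding a real ELF, Mach-O, or PE segment, it appends the payload to the end of the file, followed by a small trailer. The trailer layout and the names MAGIC, TRAILER, append_resources, and read_resources are hypothetical and exist only for illustration. None of this is the Python packed resources format or code from pyembed.

```python
import struct
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical trailer: an 8-byte magic value followed by the payload
# length as a little-endian u64. This is NOT the packed resources format.
MAGIC = b"PYRSRC01"
TRAILER = struct.Struct("<8sQ")


def append_resources(binary: Path, payload: bytes, out: Path) -> None:
    """Write a copy of `binary` with `payload` and a trailer appended."""
    trailer = TRAILER.pack(MAGIC, len(payload))
    out.write_bytes(binary.read_bytes() + payload + trailer)


def read_resources(binary: Path) -> Optional[bytes]:
    """Return the appended payload, or None if no valid trailer is present."""
    data = binary.read_bytes()
    if len(data) < TRAILER.size:
        return None
    magic, length = TRAILER.unpack(data[-TRAILER.size:])
    end = len(data) - TRAILER.size
    if magic != MAGIC or length > end:
        return None
    return data[end - length:end]


def demo_self_packaging() -> Optional[bytes]:
    """Pack a fake executable, then read the payload back out of it."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "pyoxy"
        original.write_bytes(b"\x7fELF fake executable bytes")
        packed = Path(tmp) / "myapp"
        append_resources(original, b"resources blob", packed)
        return read_resources(packed)
```

At start-up, a program built this way would call something like read_resources on its own executable path and fall back to normal behavior when it returns None.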

Then the roller coaster of my personal life took over. I felt newly invigorated with a new job role. I got engaged and married. I became a father.

By early 2023, it was clear my ability to contribute to open source would be vastly diminished for the foreseeable future. PyOxidizer and PyOxy fell into a state of neglect. Weeks went by without me even tinkering on my local computer, much less pushing commits or publishing a release. Weeks turned into months. Months into quarters. At this point, I haven't pushed a commit to indygreg/PyOxidizer since January 2023. And I'm not sure when I next will, if ever.

In my limited open source contribution time, I've prioritized other projects over PyOxidizer.

python-build-standalone has gained a life outside PyOxidizer. It is now used by rye, Bazel's rules_python, briefcase, and a myriad of other consumers. The release assets have been downloaded over 23 million times and the download rate appears to be accelerating. I still actively support python-build-standalone and intend for the project to be actively supported for the indefinite future: it has become too important to abandon. I'm actively recruiting assistance to help maintain the project and I'm not concerned about its future.

Apple code signing still actively draws my engagement. What I love about the project is it either works or it doesn't: there are few extra features we can add to it since Apple mostly dictates the feature set. And I perceive the current project to be mostly done.

python-zstandard is downloaded ~8 million times per month. The project is long overdue for some modernization. I'm sitting on a pile of commits to improve it, but progress has been slow. I just learned this weekend that the maintainer of the other popular zstandard Python package deleted their GitHub account recently and now users are looking to onboard to my package. Nothing quite like unanticipated distractions!

That's a very long-winded way of saying that PyOxidizer and all the projects under its umbrella are effectively in a zombie state. I'm hesitant to say dead because if I suddenly found myself with lots of free time I'd love to brush off the cobwebs and bring the projects back to life. But who am I kidding: they are effectively dead at the moment because with everything happening in my personal life, I don't see where I'd find the time to resuscitate them. And that assumes I even want to: again, I've become somewhat disenchanted by the state of Python. The main thing that draws me to it is the size of the community and the potential for impact. But to realize that impact I feel like I'd be pushing Python in directions it isn't well-equipped to go in. Quite frankly - and, yes, selfishly - I don't want to subject myself to the misery unless I'm being well paid to do it. Again, I view my open source contributions as a fun outlet for my creative expression and nudging Python packaging in directions it is obviously ill-equipped to go in just isn't fun.

If anyone reading has an interest in taking ownership or maintenance responsibilities of PyOxidizer, any projects under its umbrella, or any of my other open source projects, I'm receptive to proposals. Send me an email or create an issue or discussion on GitHub if you want to do it publicly.

But I'm going to assume that PyOxidizer is going to wither and die - or at least incur some massive backwards incompatible breaks if it continues to live. I've already filed issues against python-build-standalone - such as removing Windows static builds - to make the project easier to support and less work for future maintainers.

If I have one regret about how this has played out, it is my failure to communicate developments in my open source commitments / expectations in a timely manner. I knew the future was bleak in early 2023 but didn't publicly say anything. I still thought there was a chance that things were going to change and I didn't want to make a hard decision prematurely. Writing this post has been on my mind since the middle of 2023 but I just couldn't bring myself to write it. And - surprise - having a newborn at home is a giant time and mental commitment! I'm writing this now because people are (finally!) noticing my lack of contributions to PyOxidizer and asking questions. And I'm home alone for a few days and actually have time to sit down and compose this post. (Yes, I'm that stretched for time in my personal life.)

In 2023, I struggled with the idea of letting people down by declaring PyOxidizer dead. But when I wake up every morning, walk into the nursery, and cause my daughter to smile and flail her arms and legs with unbridled excitement when she sees me, I'd have it no other way. When it comes to choosing between open source and family, I choose family.

It feels appropriate to end this post with a link to XKCD 2347: Dependency. But I'm not the random person in Nebraska: I'm a husband and father.


Announcing the PyOxy Python Runner

May 10, 2022 at 08:00 AM | categories: Python, PyOxidizer

I'm pleased to announce the initial release of PyOxy. Binaries are available on GitHub.

(Yes, I used my pure Rust Apple code signing implementation to remotely sign the macOS binaries from GitHub Actions using a YubiKey plugged into my Windows desktop: that experience still feels magical to me.)

PyOxy is all of the following:

  • An executable program used for running Python interpreters.
  • A single file and highly portable (C)Python distribution.
  • An alternative python driver providing more control over the interpreter than what python itself provides.
  • A way to make some of PyOxidizer's technology more broadly available without using PyOxidizer.

Read the following sections for more details.

pyoxy Acts Like python

The pyoxy executable has a run-python sub-command that will essentially do what python would do:

$ pyoxy run-python
Python 3.9.12 (main, May  3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

A Python REPL. That's familiar!

You can even pass python arguments to it:

$ pyoxy run-python -- -c 'print("hello, world")'
hello, world

When a pyoxy executable is renamed to any filename beginning with python, it implicitly behaves like pyoxy run-python --.

$ mv pyoxy python3.9
$ ls -al python3.9
-rwxrwxr-x  1 gps gps 120868856 May 10  2022 python3.9

$ ./python3.9
Python 3.9.12 (main, May  3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
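
The rename trick is a form of dispatching on the program's own name. Here is a minimal Python sketch of that idea. The function name effective_args is hypothetical and this is not how pyoxy (which is written in Rust) implements it. It only mirrors the behavior described above: a file name beginning with python is treated as if run-python -- had been typed.

```python
import os


def effective_args(argv: list[str]) -> list[str]:
    """Return the arguments to act on, based on the executable's file name."""
    name = os.path.basename(argv[0])
    if name.startswith("python"):
        # Behave as though the user ran `pyoxy run-python -- <args>`.
        return ["run-python", "--", *argv[1:]]
    # Otherwise the first argument is an explicit sub-command.
    return argv[1:]
```

With this sketch, effective_args(["./python3.9", "-c", "print(1)"]) yields the same argument list as an explicit run-python -- invocation would.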

Single File Python Distributions

The official pyoxy executables are built with PyOxidizer and leverage the Python distributions provided by my python-build-standalone project. On Linux and macOS, a fully featured Python interpreter and its library dependencies are statically linked into pyoxy. The pyoxy executable also embeds a copy of the Python standard library and imports it from memory using the oxidized_importer Python extension module.

What this all means is that the official pyoxy executables can function as single file CPython distributions! Just download a pyoxy executable, rename it to python, python3, python3.9, etc and it should behave just like a normal python would!

Your Python installation has never been so simple. And fast: pyoxy should be a few milliseconds faster to initialize a Python interpreter, mostly because oxidized_importer avoids the filesystem overhead of looking for and loading .py[c] files.

Low-Level Control Over the Python Interpreter with YAML

The pyoxy run-yaml command takes the path to a YAML file defining the embedded Python interpreter configuration and then launches that Python interpreter in-process:

$ cat > hello_world.yaml <<EOF
---
allocator_debug: true
interpreter_config:
  run_command: 'print("hello, world")'
...
EOF

$ pyoxy run-yaml hello_world.yaml
hello, world

Under the hood, PyOxy uses the pyembed Rust crate to manage embedded Python interpreters. The YAML document that PyOxy uses is simply deserialized into a pyembed::OxidizedPythonInterpreterConfig Rust struct, which pyembed uses to spawn a Python interpreter. This Rust struct offers near complete control over how the embedded Python interpreter behaves: it even allows you to tweak settings that are impossible to change from environment variables or python command arguments! (Beware: this power means you can easily cause the interpreter to crash if you feed it a bad configuration!)
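
For readers who think in Python rather than Rust, the "document deserialized into a config struct" step can be sketched with dataclasses. This is an analogy only. The class names and the from_mapping and load_config helpers are hypothetical, the plain dict stands in for an already-parsed YAML document, and rejecting unknown keys is an assumption of this sketch, not a statement about how pyembed behaves. The field names are the ones used in the YAML examples in this post.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class InterpreterConfig:
    run_command: Optional[str] = None
    run_module: Optional[str] = None
    module_search_paths: list = field(default_factory=list)


@dataclass
class OxidizedConfig:
    allocator_debug: bool = False
    interpreter_config: InterpreterConfig = field(default_factory=InterpreterConfig)


def from_mapping(cls, data: dict):
    """Build `cls` from a mapping, rejecting keys the class does not define."""
    known = {f.name for f in fields(cls)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown keys for {cls.__name__}: {sorted(unknown)}")
    return cls(**data)


def load_config(doc: dict) -> OxidizedConfig:
    """Turn a parsed document (a dict) into a typed configuration object."""
    doc = dict(doc)  # Copy so the caller's mapping is not modified.
    inner = from_mapping(InterpreterConfig, doc.pop("interpreter_config", {}))
    return from_mapping(OxidizedConfig, {**doc, "interpreter_config": inner})
```

Calling load_config with the mapping equivalent of hello_world.yaml gives an object whose interpreter_config.run_command holds the print statement, and a misspelled key fails early instead of being silently ignored.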

YAML Based Python Applications

pyoxy run-yaml ignores all file content before the YAML --- start document delimiter. This means that on UNIX-like platforms you can create executable YAML files defining your Python application. e.g.

$ mkdir -p myapp
$ cat > myapp/__main__.py << EOF
print("hello from myapp")
EOF

$ cat > say_hello <<"EOF"
#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
  module_search_paths: ["$ORIGIN"]
...
EOF

$ chmod +x say_hello

$ ./say_hello
hello from myapp

This means that to distribute a Python application, you can drop a copy of pyoxy in a directory then define an executable YAML file masquerading as a shell script and you can run Python code with as little as two files!
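
The reason the shell preamble is harmless is that everything before the --- line is discarded before the YAML is parsed. A minimal Python sketch of that step follows. The function name yaml_document is hypothetical, and matching a line that is exactly --- is an assumption of this sketch: the text above does not specify how pyoxy locates the delimiter.

```python
def yaml_document(text: str) -> str:
    """Return `text` starting at the first line that is exactly `---`."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.rstrip("\r\n") == "---":
            return "".join(lines[index:])
    raise ValueError("no YAML document start marker found")


# The executable YAML file from the example above, shell preamble included.
SCRIPT = '''#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
'''
```

Passing SCRIPT through yaml_document leaves only the part from --- onward, so the YAML parser never sees the shebang or the exec line.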

The Future of PyOxy

PyOxy is very young. I hacked it together on a weekend in September 2021. I wanted to shore up some functionality before releasing it then. But I got perpetually sidetracked and never did the work. I figured it would be better to make a smaller splash with a lesser-baked product now than wait even longer. Anyway...

As part of building PyOxidizer I've built some peripheral technology:

  • Standalone and highly distributable Python builds via the python-build-standalone project.
  • The pyembed Rust crate for managing an embedded Python interpreter.
  • The oxidized_importer Python package/extension for importing modules from memory, among other things.
  • The Python packed resources data format for representing a collection of Python modules and resource files for efficient loading (by oxidized_importer).

I conceived PyOxy as a vehicle to enable people to leverage PyOxidizer's technology without imposing PyOxidizer onto them. I feel that PyOxidizer's broader technology is generally useful and too valuable to be gated behind using PyOxidizer.

PyOxy is only officially released for Linux and macOS for the moment. It definitely builds on Windows. However, I want to improve the single file executable experience before officially releasing PyOxy on Windows. This requires an extensive overhaul to oxidized_importer and the way it serializes Python resources to be loaded from memory.

I'd like to add a sub-command to produce a Python packed resources payload. With this, you could bundle/distribute a Python application as pyoxy plus a file containing your application's packed resources alongside YAML configuring the Python interpreter. Think of this as a more modern and faster version of the venerable zipapp approach. This would enable PyOxy to satisfy packaging scenarios provided by tools like Shiv, PEX, and XAR. However, unlike Shiv and PEX, pyoxy also provides an embedded Python interpreter, so applications are much more portable since there isn't reliance on the host machine having a Python interpreter installed.
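
For comparison, this is roughly what the zipapp approach mentioned above looks like using only the Python standard library. It is a sketch of the existing technique, not of any planned pyoxy feature, and the build_and_run helper is a hypothetical name. Note that the resulting .pyz file still needs a Python interpreter on the host to run it, which is the dependency an embedded interpreter would remove.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run() -> str:
    """Build a .pyz archive from a tiny app directory and execute it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "myapp"
        src.mkdir()
        (src / "__main__.py").write_text('print("hello from myapp")\n')

        target = Path(tmp) / "myapp.pyz"
        zipapp.create_archive(src, target)

        # The archive is run by an interpreter that is already installed.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```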

I'm really keen to see how others want to use pyoxy.

The YAML based control over the Python interpreter could be super useful for testing, benchmarking, and general Python interpreter configuration experimentation. It essentially opens the door to things previously only possible if you wrote code interfacing with Python's C APIs.

I can also envision tools that hide the existence of Python wanting to leverage the single file Python distribution property of pyoxy. For example, tools like Ansible could copy pyoxy to a remote machine to provide a well-defined Python execution environment without having to rely on what packages are installed. Or pyoxy could be copied into a container or other sandboxed/minimal environment to provide a Python interpreter.

And that's PyOxy. I hope you find it useful. Please file any bug reports or feature requests in PyOxidizer's issue tracker.


Pure Rust Implementation of Apple Code Signing

April 14, 2021 at 01:45 PM | categories: PyOxidizer, Apple, Rust

A few weeks ago I (foolishly?) set out to implement Apple code signing (what Apple's codesign tool does) in pure Rust.

I wanted to quickly announce on this blog the existence of the project and the news that as of a few minutes ago, the tugger-apple-codesign crate implementing the code signing functionality is now published on crates.io!

So, you can now sign Apple binaries and bundles on non-Apple hardware by doing something like this:

$ cargo install tugger-apple-codesign
$ rcodesign sign /path/to/input /path/to/output

Current features include:

  • Robust support for parsing embedded signatures and most related data structures. rcodesign extract can be used to extract various signature data in raw or human readable form.
  • Parse and verify RFC 5652 Cryptographic Message Syntax (CMS) signature data.
  • Sign binaries. If a code signing certificate key pair is provided, a CMS signature will be created. This includes support for Time-Stamp Protocol (TSP) / RFC 3161 tokens. If no key pair is provided, you get an ad-hoc signature.
  • Sign bundles. Nested bundles and binaries will automatically be signed. Non-code resources will be digested and a CodeResources XML file will be produced.

The most notable missing features are:

  • No support for obtaining signing keys from keychains. If you want to sign with a cryptographic key pair, you'll need to point the tool at a PEM encoded key pair and CA chain.
  • No support for parsing the Code Signing Requirements language. We can parse the binary encoding produced by csreq -b and convert it back to this DSL. But we don't parse the human friendly language.
  • No support for notarization.

All of these could likely be implemented. However, I am not actively working on any of these features. If you would like to contribute support, make noise in the GitHub issue tracker.

The Rust API, CLI, and documentation are still a bit rough around the edges. I haven't performed thorough QA on aspects of the functionality. However, the tool is able to produce signed binaries that Apple's canonical codesign tool says are well-formed. So I'm reasonably confident some of the functionality works as intended. If you find bugs or missing features, please report them on GitHub. Or even better: submit pull requests!

As part of this project, I also created and published the cryptographic-message-syntax crate, which is a pure Rust partial implementation of RFC 5652, which defines the cryptographic message signing mechanism. This RFC is a bit dated and seems to have been superseded by RPKI. So you may want to look elsewhere before inventing new signing mechanisms that use this format.

Finally, it appears the Windows code signing mechanism (Authenticode) also uses RFC 5652 (or a variant thereof) for cryptographic signatures. So by implementing Apple code signatures, I believe I've done most of the legwork to implement Windows/PE signing! I'll probably implement Windows signing in a new crate whenever I hook up automatic code signing to PyOxidizer, which was the impetus for this work (I want to make it possible to build distributable Apple programs without Apple hardware, using as many open source Rust components as possible).


Announcing the 0.9 Release of PyOxidizer

October 18, 2020 at 10:00 PM | categories: Python, PyOxidizer

I have decided to make up for the 6 month lull between PyOxidizer's 0.7 and 0.8 releases by releasing PyOxidizer 0.9 just 1 week after 0.8!

The full 0.9 changelog is found in the docs. First time user? See the Getting Started documentation.

While the 0.9 release is far smaller in terms of features compared to 0.8, it is an important release because of progress closing compatibility gaps.

Build a python Executable

PyOxidizer 0.8 quietly shipped the ability to build executables that behave like python executables via enhancements to the configurability of embedded Python interpreters.

PyOxidizer 0.9 made some minor changes to make this scenario work better and there is even official documentation on how to achieve this. So now you can emit a python executable next to your application's executable. Or you could use PyOxidizer to build a highly portable, self-contained python executable and ship your Python scripts next to it, using PyOxidizer's python in your #!.

Support Packaging Files as Files for Maximum Compatibility

There is a long tail of Python packages that don't just work with PyOxidizer. A subset of these packages don't work because of bugs with how PyOxidizer attempts to classify files as specific types of Python resources.

The way that normal Python works is you materialize a bunch of files on the filesystem and at run-time the filesystem-based importer stat()s a bunch of paths until it finds a candidate file satisfying the import request. This works of course. But it is inefficient. Since PyOxidizer has awareness of every resource being packaged at build time, it attempts to index all known resources and serialize them to an efficient data structure so finding and loading a resource can be extremely quick (effectively just a hashmap lookup in Rust code to resolve the memory address of data).
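
The difference between probing the filesystem and consulting a pre-built index can be illustrated with a small in-memory importer written in Python. This is only a conceptual sketch: the class name InMemoryImporter and the demo_import helper are hypothetical, the index is a plain dict of module source text, and PyOxidizer's real implementation is Rust code working from a serialized data structure.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve imports with a single dict lookup instead of stat() calls."""

    def __init__(self, sources: dict[str, str]):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # Let the next finder on sys.meta_path try.
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # Use Python's default module object.

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


def demo_import() -> str:
    """Import a module that exists only in the in-memory index."""
    importer = InMemoryImporter({"hello_mem": "GREETING = 'hello from memory'\n"})
    sys.meta_path.insert(0, importer)
    try:
        import hello_mem
        return hello_mem.GREETING
    finally:
        sys.meta_path.remove(importer)
        sys.modules.pop("hello_mem", None)
```

Because the index is known up front, a failed lookup is also cheap: the importer can say "not mine" without touching the disk.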

PyOxidizer's approach does work in the majority of cases. But there are edge cases. For example, NumPy's binary wheels have installed file paths like numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so. The numpy.libs directory is not a valid Python package directory since it has a . and since it doesn't have an __init__.py[c] file. This is a case where PyOxidizer's code for turning files into resources is currently confused.
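
A simplified Python sketch of the naming half of that check is shown below. The helper could_be_package_dir is hypothetical and is not PyOxidizer's classification code. It only tests whether each path component could be a package name, and it deliberately ignores the __init__.py[c] requirement and other details (for example, str.isidentifier also accepts Python keywords).

```python
from pathlib import PurePosixPath


def could_be_package_dir(directory: str) -> bool:
    """True if every path component is usable as a Python package name."""
    parts = PurePosixPath(directory).parts
    return bool(parts) and all(part.isidentifier() for part in parts)
```

Under this rule numpy/core passes while numpy.libs does not, because the . makes it an invalid identifier.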

It is tempting to argue that file layouts like NumPy's are wrong. But there doesn't seem to be any formal specification preventing the use of such layouts. The arbiter of truth here is what Python packaging tools accept and the current code for installing wheels gladly accepts file layouts like these. So I've accepted that PyOxidizer is just going to have to support edge cases like this. (I've captured more details about this particular issue in the docs).

Anyway, PyOxidizer 0.9 ships a new, simpler mode for handling files: files mode. In files mode, PyOxidizer disables its code for classifying files as typed Python resources (like module sources and extension modules) and instead treats a file as... a file.

When in files mode, actions that invoke Python packaging tools return files objects instead of classified resources. If you then add these files for packaging, those files are materialized on the filesystem next to your built executable. You can then use Python's standard filesystem importer to load these files at run-time.

This allows you to use PyOxidizer with packages like NumPy that were previously incompatible due to bugs with file/resource classification. In fact, getting NumPy working with PyOxidizer is now in the official documentation!

Files mode is still in its infancy. There exists code for embedding files data in the produced executable. I plan to eventually teach PyOxidizer's run-time code to extract these embedded files to a temporary directory, SquashFS FUSE filesystem, etc. This is the approach that other Python packaging tools like PyInstaller and XAR use. While it is less efficient, this approach is highly compatible with Python code in the wild since you sidestep issues with __file__ and other assumptions about installed file layouts. So it makes sense for PyOxidizer to provide support for this so you can still achieve the friendliness of a self-contained executable without worrying about compatibility. Look for improvements to files mode in future releases.
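
The extract-to-a-temporary-directory approach can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not PyOxidizer code: EMBEDDED_FILES is a hypothetical stand-in for file data embedded in an executable, and run_from_extracted_files is a made-up helper name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for files embedded in an executable: relative path -> content.
EMBEDDED_FILES = {
    "extracted_pkg/__init__.py": b"from .data import VALUE\n",
    "extracted_pkg/data.py": b"VALUE = 42\n",
}


def run_from_extracted_files() -> tuple:
    """Materialize the files on disk, then import them the standard way."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, data in EMBEDDED_FILES.items():
            dest = Path(root) / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(data)

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("extracted_pkg")
            # __file__ points at a real file, which in-memory imports lack.
            return module.VALUE, Path(module.__file__).name
        finally:
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "extracted_pkg"]:
                del sys.modules[name]
```

The cost is the extraction work at start-up. The benefit is that code relying on __file__ and on-disk layouts keeps working unchanged.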

And to help debug issues with PyOxidizer's file handling and resource classification, the new pyoxidizer find-resources command can be used to invoke PyOxidizer's code for scanning and classifying files. Hopefully this makes it easier to diagnose bugs in this critical component of PyOxidizer!

Some Important Bug Fixes

PyOxidizer 0.8 shipped with some pretty annoying bugs and behavior quirks.

The ability to set custom sys.path values via Starlark was broken. How I managed to ship that, I'm not sure. But it is fixed in 0.9.

Another bug I can't believe I shipped was the PythonExecutable.read_virtualenv() Starlark method being broken due to a typo. You can read from virtualenvs again in PyOxidizer 0.9.

Another important improvement is in the default Python interpreter configuration. We now automatically initialize Python's locales configuration by default. Without this, the encoding of filesystem paths and sys.argv may not have been correct. If someone passed a non-ASCII argument, the Python str value was likely mangled. PyOxidizer built binaries should behave reasonably by default now. The issue is a good read if the subtle behaviors of how encodings work in Python and on different operating systems are interesting to you.
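
To see what "mangled" means here, consider the raw bytes a UTF-8 terminal passes for a non-ASCII argument and what happens when they are decoded with the wrong encoding. The sketch below is a simplified model, not PyOxidizer or CPython code. The helper decode_argument is hypothetical, and it assumes the surrogateescape error handler that Python uses for the filesystem encoding on POSIX systems.

```python
def decode_argument(raw: bytes, encoding: str) -> str:
    """Decode raw argument bytes as an interpreter using `encoding` would."""
    return raw.decode(encoding, errors="surrogateescape")


# The bytes a UTF-8 terminal would pass for the argument "caf\u00e9".
RAW = "caf\u00e9".encode("utf-8")

as_utf8 = decode_argument(RAW, "utf-8")      # the intended 4-character string
as_latin1 = decode_argument(RAW, "latin-1")  # 5 characters: mojibake
as_ascii = decode_argument(RAW, "ascii")     # non-ASCII bytes become surrogates
```

Only the UTF-8 decode round-trips to the text the user typed. The Latin-1 result has one character too many, and the ASCII result contains lone surrogate code points that cannot be encoded or printed as ordinary text.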

Better Binary Portability Documentation

The documentation on binary portability has been overhauled. Hopefully it is much more clear about the capabilities of PyOxidizer to produce a binary that just works on other machines.

I eventually want to get PyOxidizer to a point where users don't have to think about binary portability. But until PyOxidizer starts generating installers and providing the ability to run builds in deterministic and reproducible environments, it is sadly a problem that is being externalized to end users.

In Conclusion

PyOxidizer 0.9 is a small release representing just 1 week of work. But it contains some notable features that I wanted to get out the door.

As always, please report any issues or feedback in the GitHub issue tracker or the users mailing list.


Announcing the 0.8 Release of PyOxidizer

October 12, 2020 at 12:45 AM | categories: Python, PyOxidizer

I am very excited to announce the 0.8 release of PyOxidizer, a modern Python application packaging tool. You can find the full changelog in the docs. First time user? See the Getting Started documentation.

Foremost, I apologize that this release took so long to publish (0.7 was released on 2020-04-09). I fervently believe that frequent releases are a healthy software development practice. And 6 months between PyOxidizer releases was way too long. Part of the delay was due to world events (it has proven difficult to focus on... anything given a global pandemic, social unrest, and wildfires further undermining any semblance of lifestyle normalcy in California). Another contributing factor was I was waiting on a few 3rd party Rust crates to have new versions published to crates.io (you can't release a crate to crates.io unless all your dependencies are also published there).

Release delay and general life hardships aside, the 0.8 release is here and it is full of notable improvements!

Python 3.8 and 3.9 Support

PyOxidizer 0.8 now targets Python 3.8 by default and support for Python 3.9 is available by tweaking configuration files. Previously, we only supported Python 3.7 and this release drops support for Python 3.7. I feel a bit bad for dropping compatibility. But Python 3.8 introduced a new C API for initializing Python interpreters (thank you Victor Stinner!) and this makes PyOxidizer's run-time code for interfacing with Python interpreters vastly simpler. I decided that given the beta nature of PyOxidizer, it wasn't worth maintaining complexity to continue to support Python 3.7. I'm optimistic that I'll be able to support Python 3.8 as a baseline for a while.

Better Default Packaging Settings

PyOxidizer started as a science experiment of sorts to see if I could achieve the elusive goal of producing a single file executable providing a Python application. I was successful in proving this hypothesis. But the cost to achieving this outcome was rather high in terms of end-user experience: in order to produce single file executables, you had to break a lot of assumptions about how Python typically works and this in turn broke a lot of Python code and packages in the wild.

In other words, PyOxidizer's opinionated defaults of producing a single file executable were externalizing hardship on end-users and preventing them from using PyOxidizer.

PyOxidizer 0.8 contains a handful of changes to defaults that should hopefully lessen the friction.

On Windows, the default Python distribution now has a more traditional build configuration (using .pyd extension modules and a pythonXY.dll file). This means that PyOxidizer can consume pre-built extension modules without having to recompile them from source. If you publish a Windows binary wheel on PyPI, in many cases it will just work with PyOxidizer 0.8! (There are some notable exceptions to this, such as numpy, which is doing wonky things with the location of shared libraries in wheels - but I aim to fix this soon.)

Also on Windows, we no longer attempt to embed Python extension modules (.pyd files) and their shared library dependencies in the produced binary and load them from memory by default. This is because PyOxidizer's from-memory library loader didn't work in all cases. For example, some OpenSSL functionality used by the _ssl module in the standard library didn't work, preventing Python from establishing TLS connections. The old mode enabling you to produce a single file executable on Windows is still available. But you have to opt in to it (at the likely cost of more packaging and compatibility pain).

Starlark Configuration Overhaul

PyOxidizer 0.8 contains a ton of changes to its Starlark configuration files. There are so many changes that you may find it easier to port to PyOxidizer 0.8 by creating a new configuration file rather than attempting to port an existing one.

I apologize for this churn and recognize it will be disruptive. However, this churn needed to happen for various reasons.

Much of the old Starlark configuration semantics was rooted in the days when configuration files were static TOML files. Now that configuration files provide the power of a (Python-inspired) programming language, we are free to expose much more flexibility. But that flexibility requires refactoring things so the experience feels more native.

Many changes to Starlark were rooted in necessity. For example, the methods for invoking setup.py or pip install used to live on a Python distribution type and have been moved to a type representing executables. This is because the binary we are targeting influences how packaging actions behave. For example, if the binary only supports loading resources from memory (as opposed to standalone files), we need to know that when invoking the packaging tool so we can produce files (notably Python extension modules) compatible with the destination.

A major change to Starlark in 0.8 is around resource location handling. Before, you could define a static string denoting the resources policy for where things should be placed. And there were 10+ methods for adding different resource types (source, bytecode, extensions, package data) to different load locations (memory, filesystem). This mechanism is vastly simplified and more powerful in PyOxidizer 0.8!

In PyOxidizer 0.8, there is a single add_python_resource() method for adding a resource to a binary and the Starlark objects you add can denote where they should be added by defining attributes on those objects.

Furthermore, you can define a Starlark function that is called when resource objects are created to apply custom packaging rules using custom Starlark code defined in your PyOxidizer config file. So rather than having everyone try to abide by a few pre-canned policies for packaging resources, you can define a proper function in your config file that can be as complex as you want/need it to be! I feel this is vastly simpler and more powerful than implementing a custom DSL in static configuration files (like TOML, JSON, YAML, etc).

While the ability to implement your own arbitrarily complex packaging policies is useful, there is a new PythonPackagingPolicy Starlark type with enough flexibility to suit most needs.
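The callback idea can be modeled in a few lines of plain Python. This is a rough sketch of the concept only: the Resource class, its attributes, and the collect() helper are hypothetical stand-ins and are not PyOxidizer's Starlark API. See the configuration file documentation for the real types and method names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    """Hypothetical stand-in for a Python resource seen at packaging time."""
    name: str
    kind: str                    # e.g. "source", "extension", "package-data"
    location: str = "in-memory"  # where the resource should be loaded from


def packaging_rule(resource: Resource) -> None:
    """Example rule: keep Python source in memory, put everything else on disk."""
    if resource.kind != "source":
        resource.location = "filesystem"


def collect(
    resources: List[Resource], callback: Callable[[Resource], None]
) -> List[Resource]:
    """Run the callback on every resource, as a packaging tool might."""
    for resource in resources:
        callback(resource)
    return resources


resources = collect(
    [Resource("myapp", "source"), Resource("_ssl", "extension")],
    packaging_rule,
)
for r in resources:
    print(r.name, r.kind, r.location)
```

The point of the design is that the rule is ordinary code: it can branch on any property of the resource instead of being limited to a fixed set of named policies.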

Shipping oxidized_importer

During the development of PyOxidizer 0.8, I broke out the custom Rust-based Python meta-path importer used by PyOxidizer's run-time code into a standalone Python package. This sub-project is called oxidized_importer and I previously blogged about it.

PyOxidizer 0.8 ships oxidized_importer and all of its useful APIs available to Python. Read more in the official docs. The new Python APIs should make debugging issues with PyOxidizer-packaged applications vastly simpler: I found them invaluable when tracking down user-reported bugs!
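For readers unfamiliar with meta-path importers, the following is a minimal pure-Python sketch of one that serves module source from memory using only the standard library's importlib. It is not how oxidized_importer is implemented (that is Rust and does far more); the InMemoryFinder class and the hello_from_memory module name are invented for this example.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve module source code from a dict instead of the filesystem."""

    def __init__(self, sources):
        self._sources = sources  # maps module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the modules we hold; defer everything else.
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# "hello_from_memory" is a made-up module name used only for this demo.
sys.meta_path.insert(0, InMemoryFinder({"hello_from_memory": "GREETING = 'hi'\n"}))

import hello_from_memory

print(hello_from_memory.GREETING)
```

Because the finder sits first on sys.meta_path, the import statement resolves without touching the filesystem, which is the same hook that lets a packaged binary load its modules from embedded data.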

Tons of New Tests and Refactored Code

PyOxidizer was my first non-toy Rust project. And the quality of the Rust code I produced in early versions of PyOxidizer clearly showed it. And when I was in the rapid-prototyping phase of PyOxidizer, I eschewed writing tests in favor of short-term progress.

PyOxidizer 0.8 pays down a ton of technical debt in the code base. Lots of Rust code has been refactored and is using somewhat reasonable practices. I'm not yet a Rust guru. But I'm at the point where I cringe when I look at some of the early code I wrote, which is a good sign. I do have to say that Rust has been a dream to work with during this transition. Despite being a low-level language, my early misuse of Rust did not result in crashes like you would see in languages like C/C++. And Rust's seemingly omniscient compiler and IDE tools facilitating refactoring have ensured that code changes aren't accompanied by subtle random bugs that would occur in dynamic programming languages. I really need to write a dedicated post espousing the virtues of Rust...

There are a ton of new tests in PyOxidizer 0.8 and I now feel somewhat confident that the main branch of PyOxidizer should be considered production-ready at any time assuming the tests pass. This will hopefully lead to more rapid releases in the future.

There are now tests for the pyembed Rust crate, which provides the run-time code for PyOxidizer-built binaries. We even have Python-based unit tests for validating the Python-exposed APIs behave as expected. These tests have been invaluable for ensuring that the run-time code works as expected. So now when someone files a bug I can easily write a test to capture it and keep the code working as intended through various refactors.

The packaging-time Rust code has also gained its fair share of tests. We now have fairly comprehensive test coverage around how resources are added/packaged. Python extension modules have proved to be highly nuanced in how they are handled. A tremendous help when testing extension modules is that we're able to run tests for platform non-native extensions! While not yet exposed/supported by Starlark configuration files, I've taught PyOxidizer's core Rust code to be cross-compiling aware so that we can e.g. test Windows or macOS behavior from Linux. Before, I'd have to test Windows wheel handling on Windows. But after writing a wheel parser in Rust and teaching PyOxidizer to use a different Python distribution for the host architecture from the target architecture, I'm now able to write tests for platform-specific functionality that run on any platform that PyOxidizer can run on. This may eventually lead to proper cross-compiling support (at least in some configurations). Time will tell. But the foundation is definitely there!

New Rust Crates

As part of the aforementioned refactoring of PyOxidizer's Rust code, I've been extracting some useful/generic functionality built as part of developing PyOxidizer to their own Rust crates.

As part of this release, I'm publishing the initial 0.1 release of the python-packaging crate (docs). This crate provides pure Rust code for various Python packaging related functionality. This includes:

  • Rust types representing Python resource types (source modules, bytecode modules, extension modules, package resources, etc).
  • Scanning the filesystem for Python resource files.
  • Configuring an embedded Python interpreter.
  • Parsing PKG-INFO and related files.
  • Parsing wheel files.
  • Collecting Python resources and serializing them to a data structure.

The crate is somewhat PyOxidizer centric. But if others are interested in improving its utility, I'll happily accept pull requests!
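To give a feel for what parsing wheel files and their metadata involves, here is a small Python sketch using only the standard library. A wheel is a zip archive, and its METADATA file uses email-style headers, so zipfile and email.parser are enough for a basic reader. This is not the python-packaging crate's API (which is Rust); read_wheel_metadata and make_demo_wheel are hypothetical helpers, and the demo archive is fabricated in memory.

```python
import io
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_bytes: bytes) -> dict:
    """Return the Name and Version fields from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # A wheel stores its metadata in <name>-<version>.dist-info/METADATA.
        metadata_path = next(
            n for n in zf.namelist() if n.endswith(".dist-info/METADATA")
        )
        message = Parser().parsestr(zf.read(metadata_path).decode("utf-8"))
    return {"name": message["Name"], "version": message["Version"]}


def make_demo_wheel() -> bytes:
    """Build a tiny in-memory archive laid out like a wheel (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
        )
    return buf.getvalue()


info = read_wheel_metadata(make_demo_wheel())
print(info)
```

Nothing here depends on the platform the wheel targets, which is why this kind of parsing can be exercised in tests on any operating system.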

PyOxidizer's crates footprint now includes:

Major Documentation Updates

I strongly believe that software should be documented thoroughly and I strive for PyOxidizer's documentation to be useful and comprehensive.

There have been a lot of changes to PyOxidizer's documentation since the 0.7 release.

All configuration file documentation has been consolidated.

Likewise, I've attempted to consolidate a lot of the paved road documentation for how to use PyOxidizer in the Packaging User Guide section of the docs.

I'll be honest, since I have so much of PyOxidizer's workings internalized, it can be difficult for me to empathize with PyOxidizer's users. So if you have difficulty with the readability of the documentation, please file an issue and report what is confusing so the documentation can be improved!

Mercurial Shipping With PyOxidizer 0.8

PyOxidizer is arguably an epic yak shave of mine to help the Mercurial version control tool transition to Python 3 and Rust.

I'm pleased to report that Mercurial is now shipping PyOxidizer-built distributions on Windows as of the 5.2.2 release a few days ago! If a complex Python application like Mercurial can be configured to work with PyOxidizer, chances are your Python application will work as well.

What's Next

I view PyOxidizer 0.8 as a pivotal release where PyOxidizer is turning the corner from a prototyping science experiment to something more generally usable. The investments in test coverage and refactoring of the Rust internals are paving the way towards future features and bug fixes.

In upcoming releases, I'd like to close remaining known compatibility gaps with popular Python packages (such as numpy and other packages in the scientific/data space). I have a general idea of what work needs to be done and I've been laying the groundwork via various refactorings to execute here.

I want a general theme of future releases to be eliminating reasons why people can't use PyOxidizer. PyOxidizer's historical origin was as a science experiment to see if single file Python applications were possible. It is clear that achieving this is fundamentally at odds with supporting tons of Python packages in the wild. I'd like to find a way where PyOxidizer can achieve 99% package compatibility by default so new users don't get discouraged when using PyOxidizer. And the subset of users who want single file executables can spend the additional effort required to achieve that.

At some point, I also want to make a pivot towards focusing on producing distributable artifacts (Debian/RPM packages, MSI installers, macOS DMG files, etc). I'm slightly bummed that I haven't made much progress here. But I have a vision in my mind of where I want to go (I'll be making a standalone Rust crate + Starlark dialect to facilitate producing distributable artifacts for any application) and I'm anticipating starting this work in the next few months. In the meantime, PyOxidizer 0.8 should be able to give people a directory tree that they can coerce into distributable artifacts using existing packaging tooling. That's not as turnkey as I would like it to be. But the technical problems around building a distributable Python application binary still need some work and I view that as the most pressing need for the Python ecosystem. So I'll continue to focus there so there is a solid foundation to build upon.

In conclusion, I hope you enjoy the new release! Please report any issues or feedback in the GitHub issue tracker.

