Transferring Python Build Standalone Stewardship to Astral
December 03, 2024 at 11:30 AM | categories: Python
My Python Standalone Builds (PBS) project provides self-contained Python distributions that aim to just work.
PBS Python distributions are used by uv, Rye, Bazel's rules_python, and other tools in the Python ecosystem. The GitHub release artifacts have been downloaded over 70,000,000 times.
With ruff and uv, Charlie Marsh and the folks at Astral have been speedrunning building world class tooling for the Python ecosystem.
As I wrote in My Shifting Open Source Priorities in March 2024, changes in my life have resulted in me paring back my open source activities.
Since that post, uv has exploded onto the scene. There have been over 50 million downloads of PBS release assets since March.
Charlie and Astral have stepped up and been outstanding open source citizens when it comes to evolving PBS to support their work. When I told Charlie I could use assistance supporting PBS, Astral employees started contributing to the project. They have built out various functionality, including Python 3.13 support (including free-threaded builds), turnkey automated release publishing, and debug symbol stripped builds to further reduce the download/install size. Multiple Astral employees now have GitHub permissions to approve/merge PRs and publish releases. All releases since April have been performed by Astral employees.
PBS today is effectively an Astral maintained project and has been that way for months. As far as I can tell there have been only positive side effects of this transition. When I ask myself why I should continue being the GitHub maintainer, every answer distills down to personal pride or building a personal brand. I need neither.
I agree with Charlie that formally transferring stewardship of my standalone Python builds project to Astral is in the best interest of not only the project itself but of the wider Python community.
On 2024-12-17 I will transfer indygreg/python-build-standalone into the astral-sh GitHub organization. From there, Astral will lead its development and evolution.
I will retain my GitHub permissions on the project and hope to stay involved in its development, if nothing more than as a periodic advisor.
Astral has clearly demonstrated competency for execution and has the needs of Python developers at heart. I have no doubt that PBS will thrive under Astral's stewardship. I can't wait to see what they do next.
Astral has also blogged about this announcement and I encourage interested parties to read it as well.
My User Experience Porting Off setup.py
October 30, 2023 at 06:00 AM | categories: Python
In the past week I went to add Python 3.12 support to my zstandard Python package. A few hours into the unexpected yak shave / rat hole, I decided to start chronicling my experience so that I may share it with the broader Python community. My hope is that by sharing my (unfortunately painful) end-user experience I can draw attention to aspects of Python packaging that are confusing, so that better informed and empowered people can improve matters and make future Python packaging decisions that help scenarios like the one I'm about to describe.
This blog post is purposefully verbose and contains a very lightly edited stream of my mental thoughts. Think of it as a self-assessed user experience study of Python packaging.
Some Background
I'm no stranger to the Python ecosystem or Python packaging. I've been programming Python for 10+ years. I've even authored a Python application packaging tool, PyOxidizer.
When programming, I strive to understand how things work. I try to not blindly copy-paste or cargo cult patterns unless I understand how they work. This means I often scope bloat myself and slow down velocity in the short term. But I justify this practice because I find it often pays dividends in the long term: I actually understand how things work.
I also have a passion for security and supply chain robustness. After you've helped maintain complex CI systems for multiple companies, you learn the hard way that it is important to do things like transitively pin dependencies and reduce surface area for failures so that build automation breaks in reaction to code changes in your version control, not spooky-action-at-a-distance when state on a third party server changes (e.g. a new package version is uploaded).
I've been aware of the emergence of pyproject.toml. But I've largely sat on the sidelines and held off adopting it, mainly for if it isn't broken, don't fix it reasons. Plus, my perception has been that the tooling still hasn't stabilized: I'm not going to incur work now if it is going to invite churn that could be avoided by sitting on my hands a little longer.
Now, on to my user experience of adding Python 3.12 to python-zstandard and the epic packaging yak shave that entailed.
The Journey Begins
When I attempted to run CI against Python 3.12 on GitHub Actions, running python setup.py complained that setuptools couldn't be imported.
Huh? I thought setuptools was installed in pretty much every Python distribution by default? It was certainly installed in all previous Python versions by the actions/setup-python GitHub Action. I was aware distutils was removed from the Python 3.12 standard library. But setuptools and distutils are not the same! Why did setuptools disappear?
I look at the CI logs for the passing Python 3.11 job and notice a message:
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.

See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
I had several immediate reactions:
- OK, maybe this is a sign I should be modernizing to pyproject.toml and moving away from python setup.py. Maybe the missing setuptools in the 3.12 CI environment is a side-effect of this policy shift?
- What are pypa/build and pypa/installer? I've never heard of them. I know pypa is the Python Packaging Authority (I suspect most Python developers don't know this). Are these GitHub org/repo identifiers?
- What exactly is a standards-based tool? Is pip not a standards-based tool?
- Speaking of pip, why isn't it mentioned? I thought pip was the de facto packaging tool and had been for a while!
- It's linking a URL for more info. But why is this a link to what looks like an individual's blog and not to some more official site, like the setuptools or pip docs? Or anything under python.org?
Learning That I Shouldn't Invoke python setup.py
I open https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html in my browser and see a 4,000+ word blog post. Oof. Do I really want/need to read this? Fortunately, the author included a tl;dr and linked to a summary section telling me a lot of useful information! It informs me (my commentary in parentheses):
- The setuptools project has stopped maintaining all direct invocations of setup.py years ago. (What?!)
- There are undoubtedly many ways that your setup.py-based system is broken today, even if it's not failing loudly or obviously. (What?! Surely this can't be true. I didn't see any warnings from tooling until recently. How was I supposed to know this?)
- PEP 517, 518 and other standards-based packaging are the future of the Python ecosystem. (A ha - a definition of standards-based tooling. I guess I have to look at PEP 517 and PEP 518 in more detail. I'm pretty sure these are the PEPs that define pyproject.toml.)
- At this point you may be expecting me to give you a canonical list of the right way to do everything that setup.py used to do, and unfortunately the answer here is that it's complicated. (You are telling me that we had a working python setup.py solution for 10+ years, this workflow is now quasi deprecated, and the recommended replacement is it's complicated?! I'm just trying to get my package modernized. Why does that need to be complicated?)
- That said, I can give you some simple "works for most people" recommendations for some of the common commands. (Great, this is exactly what I was looking for!)
Then I look at the table mapping old ways to new ways. In the new column, it references the following tools: build, pytest, tox, nox, pip, and twine. That's quite the tooling salad! (And that build tool must be the pypa/build referenced in the setuptools warning message. One mystery solved!)
I scroll back to the top of the article and notice the date: October 2021. Two years old. The summary section also mentioned that there's been a lot of activity around packaging tooling occurring. So now I'm wondering if this blog post is outdated. Either way, it is clear I have to perform some additional research to figure out how to migrate off python setup.py so I can be compliant with the new world order.
Learning About pyproject.toml and Build Systems
I had pre-existing knowledge of pyproject.toml as the modern way to define build system metadata. So I decide to start my research by Googling pyproject.toml. The first results are:
- https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/
- https://stackoverflow.com/questions/62983756/what-is-pyproject-toml-file-for
- https://python-poetry.org/docs/pyproject/
- https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html
- https://godatadriven.com/blog/a-practical-guide-to-setuptools-and-pyproject-toml/
- https://towardsdatascience.com/pyproject-python-9df8cc092f61
I click pip's documentation first because pip is known to me and it seems a canonical source. Pip's documentation proceeds to link to PEP-518, PEP-517, PEP-621, and PEP-660 before telling me how projects with pyproject.toml are built, without giving me - a package maintainer - much useful advice for what to do or how to port from setup.py. This seems like a dead end.
Then I look at the Stack Overflow link. Again, telling me a lot of what I don't really care about. (I've somewhat lost faith in Stack Overflow and only really skimmed this page: I would much prefer to get an answer from a first party source.)
I click on the Poetry link. It documents TOML fields. But only for the [tool.poetry] section. While I've heard about Poetry, I know that I probably don't want to scope bloat myself to learn how Poetry works so I can use it. (No offence meant to the Poetry project here but I don't perceive my project as needing whatever features Poetry provides: I'm just trying to publish a simple library package.) I go back to the search results.
I click on the setuptools link. I'm using setuptools via setup.py so this content looks promising! It gives me a nice example TOML of how to configure [build-system] and [project] metadata. It links to PyPA's Declaring project metadata content, which I open in a new tab, as the content seems useful. I continue reading the setuptools documentation. I land on its Quickstart documentation, which seems useful. I start reading it and it links to the build tool documentation. That's the second link to the build tool. So I open that in a new tab.
At this point, I think I have all the documentation on pyproject.toml. But I'm still trying to figure out what to replace python setup.py with. The build tool certainly seems like a contender since I've seen multiple references to it. But I'm still looking for modern, actively maintained documentation pointing me in a blessed direction.
The next Google link is A Practical Guide to Setuptools and Pyproject.toml. I start reading that. I'm immediately confused because it is recommending I put setuptools metadata in setup.cfg files. But I just read all about defining this metadata in pyproject.toml files in setuptools' own documentation! Is this blog post out of date? March 12, 2022. Seems pretty modern. I look at the setuptools documentation again and see the pyproject.toml metadata pieces are in version 61.0.0 and newer. I go to https://github.com/pypa/setuptools/releases/tag/v61.0.0 and see version 61.0.0 was released on March 25, 2022. So the fifth Google link was seemingly obsoleted 13 days after it was published. Good times. I pretend I never read this content because it seems out of date.
The next Google link is https://towardsdatascience.com/pyproject-python-9df8cc092f61. I click through. But Medium wants me to log in to read it all and it is unclear it is going to tell me anything important, so I back out.
Learning About the build Tool
I give up on Google for the moment and start reading up on the build tool from its docs.
The only usage documentation for the build tool is on its root documentation page. And that documentation basically prints what python -m build --help would print: it says what the tool does but doesn't give any guidance on where I should be using it or how to replace existing tools (like python setup.py invocations).
Yes, I can piece the parts together and figure out that python -m build can be used as a replacement for python setup.py sdist and python setup.py bdist_wheel (and maybe pip wheel?). But should it be the replacement I choose? I make use of python setup.py develop and the aforementioned blog post recommended replacing that with python -m pip install -e. Perhaps I can use pip as the singular replacement for building source distributions and binary wheels so I have N-1 packaging tools? I keep researching.
Exploring the Python Packaging User Guide
I had previously opened https://packaging.python.org/en/latest/specifications/declaring-project-metadata/ in a browser tab without really looking at it. On second glance, I see it is part of a broader Python Packaging User Guide.
Oh, this looks promising! A guide on how to do what I'm seeking, maintained by the Python Packaging Authority (PyPA), the group who I know to be the, well, authorities on Python packaging. It is published under the canonical python.org domain. Surely the answer will be here.
I immediately click on the link to Packaging Python Projects to hopefully see what the PyPA folks are recommending.
Is Hatch the Answer?
I skim through. I see recommendations to use a pyproject.toml with a [build-system] to define the build backend. This matches my expectations. But they are using Hatchling as their build backend. Another tool I don't really know about. I click through some inline links and eventually arrive at https://github.com/pypa/hatch. (I'm kind of confused why the PyPA tutorial said Hatchling when the project and tool is apparently named Hatch. But whatever.)
I skim Hatch's GitHub README. It looks like a unified packaging tool. Build system. Package uploading/publishing. Environment management (sounds like a virtualenv alternative?). This tool actually seems quite nice! I start skimming the docs. Like Poetry, it seems like this is yet another new tool that I'd need to learn and would require me to blow up my existing setup.py in order to adopt. Do I really want to put in that effort? I'm just trying to get python-zstandard back on the paved road and avoid seemingly deprecated workflows: I'm not looking to adopt new tooling stacks.
I'm also further confused by the existence of Hatch under the PyPA GitHub Organization. That's the same GitHub organization hosting the Python packaging tools that are known to me, namely build, pip, and setuptools. Those three projects are pinned repositories. (The other three pinned repositories are virtualenv, wheel, and twine.) Hatch is seemingly a replacement for pip, setuptools, virtualenv, twine, and possibly other tools. But it isn't a pinned repository. Yet it is the default tool used in the PyPA maintained Packaging Python Projects guide. (That guide also suggests using other tools like setuptools, flit, and pdm. But the default is Hatch and that has me asking questions. Also, I didn't initially notice that Creating pyproject.toml has multiple tabs for different backends.)
While Hatch looks interesting, I'm just not getting a strong signal that Hatch is sufficiently stable or warrants my time investment to switch to. So I go back to reading the Python Packaging User Guide.
The PyPA User Guide Search Continues
As I click around the User Guide, it is clear the PyPA folks really want me to use pyproject.toml for packaging. I suppose that's the future and that's a fair ask. But I'm still confused how I should migrate my setup.py to it.
What are the risks with replacing my setup.py with pyproject.toml? Could I break someone installing my package on an old Linux distribution or old virtualenv using an older version of setuptools or pip? Will my adoption of build, hatch, poetry, whatever constitute a one way door where I lock out users in older environments? My package is downloaded over one million times per month and if I break packaging someone is likely to complain.
I'm desperately looking for guidance from the PyPA at https://packaging.python.org/ on how to manage this migration. But I just... can't find it. Guides surprisingly has nothing on the topic.
Outdated Tool Recommendations from the PyPA
Finally I find Tool recommendations in the PyPA User Guide. Under Packaging tool recommendations it says:
- Use setuptools to define projects.
- Use build to create Source Distributions and wheels.
- If you have binary extensions and want to distribute wheels for multiple platforms, use cibuildwheel as part of your CI setup to build distributable wheels.
- Use twine for uploading distributions to PyPI.
Finally, some canonical documentation from the PyPA that comes out and suggests what to use!
But my relief immediately turns to questioning whether this tooling recommendations documentation is up to date:
- If setuptools is recommended, why does the Packaging Python Projects tutorial use Hatch?
- How exactly should I be using setuptools to define projects? Is this referring to setuptools as a [build-system] backend? The existence of define seemingly implies using setup.py or setup.cfg to define metadata. But I thought these distutils/setuptools specific mechanisms were deprecated in favor of the more generic pyproject.toml?
- Why aren't other tools like Hatch, pip, poetry, flit, and pdm mentioned on this page? Where's the guidance on when to use these alternative tools?
- There are footnotes referencing distutils as if it is still a modern practice. No mention that it was removed from the standard library in Python 3.12.
- But the build tool is referenced and that tool is relatively new. So the docs have to be somewhat up-to-date, right?
Sadly, I reach the conclusion that this Tool recommendations documentation is inconsistent with newer documentation and can't be trusted.
But it did mention the build tool and we now have multiple independent sources steering me in the direction of the build tool (at least for source distribution and wheel building), so it seems like we have a winner on our hands.
Initial Failures Running build
So let's use the build tool. I remember docs saying to invoke it with python -m build, so I try that:
$ python3.12 -m build --help
No module named build.__main__; 'build' is a package and cannot be directly executed
So the build package exists but it doesn't have a __main__. Ummm.
$ python3.12
Python 3.12.0 (main, Oct 23 2023, 19:58:35) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import build
>>> build.__spec__
ModuleSpec(name='build', loader=<_frozen_importlib_external.NamespaceLoader object at 0x10d403bc0>, submodule_search_locations=_NamespacePath(['/Users/gps/src/python-zstandard/build']))
Oh, it picked up the build directory from my source checkout because sys.path has the current directory by default. Good times.
$ (cd ~ && python3.12 -m build)
/Users/gps/.pyenv/versions/3.12.0/bin/python3.12: No module named build
I guess build isn't installed in my Python distribution / environment. You used to be able to build packages using just the Python standard library. I guess this battery is no longer included in the stdlib. I shrug and continue.
Installing build
I go to the Build installation docs. It says to pip install build. (I thought I read years ago that one should use python3 -m pip to invoke pip. Strange that a PyPA maintained tool is telling me to invoke pip directly since I'm pretty sure a lot of the reasons to use python -m to invoke tools are still valid. But I digress.)
I follow the instructions, installing it to the global site-packages because I figure I'll use this tool a lot and I'm not a virtual environment purist:
$ python3.12 -m pip install build
Collecting build
  Obtaining dependency information for build from https://files.pythonhosted.org/packages/93/dd/b464b728b866aaa62785a609e0dd8c72201d62c5f7c53e7c20f4dceb085f/build-1.0.3-py3-none-any.whl.metadata
  Downloading build-1.0.3-py3-none-any.whl.metadata (4.2 kB)
Collecting packaging>=19.0 (from build)
  Obtaining dependency information for packaging>=19.0 from https://files.pythonhosted.org/packages/ec/1a/610693ac4ee14fcdf2d9bf3c493370e4f2ef7ae2e19217d7a237ff42367d/packaging-23.2-py3-none-any.whl.metadata
  Downloading packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pyproject_hooks (from build)
  Using cached pyproject_hooks-1.0.0-py3-none-any.whl (9.3 kB)
Using cached build-1.0.3-py3-none-any.whl (18 kB)
Using cached packaging-23.2-py3-none-any.whl (53 kB)
Installing collected packages: pyproject_hooks, packaging, build
Successfully installed build-1.0.3 packaging-23.2 pyproject_hooks-1.0.0
That downloads and installs wheels for build, packaging, and pyproject_hooks.
At this point the security aware part of my brain is screaming because we didn't pin versions or SHA-256 digests of any of these packages anywhere. So if a malicious version of any of these packages is somehow uploaded to PyPI that's going to be a nightmare software supply chain vulnerability having similar industry impact as log4shell. Nowhere in build's documentation does it mention this or say how to securely install build. I suppose you have to just know about the supply chain gotchas with pip install in order to mitigate this risk for yourself.
Initial Results With build Are Promising
After getting build installed, python3.12 -m build --help works now and I can build a wheel:
$ python3.12 -m build --wheel .
* Creating venv isolated environment...
* Installing packages in isolated environment... (setuptools >= 40.8.0, wheel)
* Getting build dependencies for wheel...
...
* Installing packages in isolated environment... (wheel)
* Building wheel...
running bdist_wheel
running build
running build_py
...
Successfully built zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl
That looks promising! It seems to have invoked my setup.py without me having to define a [build-system] in my pyproject.toml! Yay for backwards compatibility.
The Mystery of the Missing cffi Package
But I notice something.
My setup.py script conditionally builds a zstandard._cffi extension module if import cffi succeeds. Building with build isn't building this extension module.
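(For readers unfamiliar with the pattern, the conditional logic is essentially the following. This is a hedged, minimal sketch, not the actual python-zstandard setup.py, which builds its FFI definition elsewhere and is considerably more involved; the set_source()/cdef() contents here are placeholders.)

from setuptools import setup

try:
    import cffi
except ImportError:
    cffi = None

ext_modules = []

if cffi is not None:
    # Only register the CFFI backend when cffi is importable in the
    # environment that is actually executing setup.py.
    ffi = cffi.FFI()
    ffi.set_source("zstandard._cffi", "#include <zstd.h>", libraries=["zstd"])  # placeholder source
    ffi.cdef("size_t ZSTD_compressBound(size_t srcSize);")  # placeholder declaration
    ext_modules.append(ffi.distutils_extension())

setup(name="zstandard", version="0.0.0", ext_modules=ext_modules)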
Before using build, I had to run setup.py using a python having the cffi package installed, usually a project-local virtualenv. So let's try that:
$ venv/bin/python -m pip install build cffi
...
$ venv/bin/python -m build --wheel .
...
And I get the same behavior: no CFFI extension module.
Staring at the output, I see what looks like a smoking gun:
* Creating venv isolated environment...
* Installing packages in isolated environment... (setuptools >= 40.8.0, wheel)
* Getting build dependencies for wheel...
...
* Installing packages in isolated environment... (wheel)
OK. So it looks like build is creating its own isolated environment (disregarding the invoked Python environment having cffi installed), installing setuptools >= 40.8.0 and wheel into it, and then executing the build from that environment.
So build sandboxes builds in an ephemeral build environment. This actually seems like a useful feature to help with deterministic and reproducible builds: I like it! But at this moment it stands in the way of progress. So I run python -m build --help, spot a --no-isolation argument and do the obvious:
$ venv/bin/python -m build --wheel --no-isolation .
...
building 'zstandard._cffi' extension
...
Success!
And I don't see any deprecation warnings either. So I think I'm all good.
But obviously I've ventured off the paved road here, as we had to violate the default constraints of build to get things to work. I'll get back to that later.
Reproducing Working Wheel Builds With pip
Just for good measure, let's see if we can use pip wheel to produce wheels, as I've seen references that this is a supported mechanism for building wheels.
$ venv/bin/python -m pip wheel .
Processing /Users/gps/src/python-zstandard
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: zstandard
  Building wheel for zstandard (pyproject.toml) ... done
  Created wheel for zstandard: filename=zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl size=407841 sha256=a2e1cc1ad570ab6b2c23999695165a71c8c9e30823f915b88db421443749f58e
  Stored in directory: /Users/gps/Library/Caches/pip/wheels/eb/6b/3e/89aae0b17b638c9cdcd2015d98b85ee7fb3ef00325bb44a572
Successfully built zstandard
That output is a bit terse, since the setuptools build logs are getting swallowed. That's fine. Rather than run with -v to get those logs, I manually inspect the built wheel:
$ unzip -lv zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl
Archive:  zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    7107  Defl:N     2490  65% 10-23-2023 08:36 7bb42fff  zstandard/__init__.py
   13938  Defl:N     2498  82% 10-23-2023 08:36 8d8d1316  zstandard/__init__.pyi
  919352  Defl:N   366631  60% 10-26-2023 08:28 3aeefc48  zstandard/backend_c.cpython-312-darwin.so
  152430  Defl:N    32528  79% 10-26-2023 05:37 fc1a3c0c  zstandard/backend_cffi.py
       0  Defl:N        2   0% 12-26-2020 16:12 00000000  zstandard/py.typed
    1484  Defl:N      784  47% 10-26-2023 08:28 facba579  zstandard-0.22.0.dev0.dist-info/LICENSE
    2863  Defl:N      847  70% 10-26-2023 08:28 b8d80875  zstandard-0.22.0.dev0.dist-info/METADATA
     111  Defl:N      106   5% 10-26-2023 08:28 878098e6  zstandard-0.22.0.dev0.dist-info/WHEEL
      10  Defl:N       12 -20% 10-26-2023 08:28 a5f38e4e  zstandard-0.22.0.dev0.dist-info/top_level.txt
     841  Defl:N      509  40% 10-26-2023 08:28 e9a804ae  zstandard-0.22.0.dev0.dist-info/RECORD
--------          -------  ---                            -------
 1098136           406407  63%                            10 files
(Python wheels are just zip files with certain well-defined paths having special meanings. I know this because I wrote Rust code for parsing wheels as part of developing PyOxidizer.)
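(For what it's worth, the same inspection can be done from Python itself with nothing but the standard library. A minimal sketch, assuming the wheel filename above:)

import zipfile

with zipfile.ZipFile("zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl") as wheel:
    for info in wheel.infolist():
        # Print the uncompressed size and archive path of each member.
        print(f"{info.file_size:>10}  {info.filename}")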
Looks like the zstandard/_cffi.cpython-312-darwin.so extension module is missing.
Well, at least pip is consistent with build! Although somewhat confusingly I don't see any reference to a separate build environment in the pip output. But I suspect it is there because cffi is installed in the virtual environment I invoke pip from!
Reading pip help output, I find the relevant argument to not spawn a new environment and try again:
$ venv/bin/python -m pip wheel --no-build-isolation .
<same exact output except the wheel size and digest changes>
$ unzip -lv zstandard-0.22.0.dev0-cp312-cp312-macosx_14_0_x86_64.whl
...
 1002664  Defl:N   379132  62% 10-26-2023 08:33 48afe5ba  zstandard/_cffi.cpython-312-darwin.so
...
(I'm happy to see build and pip agreeing on the no isolation terminology.)
OK, so I got build and pip to behave nearly identically. I feel like I finally understand this!
I also run pip -v wheel and pip -vv wheel to peek under the covers and see what it's doing. Interestingly, I don't see any hint of a virtual environment or temporary directory until I go to -vv. I find it interesting that build presents details about this by default but you have to put pip in very verbose mode to get it. I'm glad I used build first because the ephemeral build environment was the source of my missing dependency and pip buried this important detail behind a ton of other output in -vv, making it much harder to discover!
Understanding How setuptools Gets Installed
When looking at pip's verbose output, I also see references to installing the setuptools and wheel packages:
Processing /Users/gps/src/python-zstandard
  Running command pip subprocess to install build dependencies
  Collecting setuptools>=40.8.0
    Using cached setuptools-68.2.2-py3-none-any.whl.metadata (6.3 kB)
  Collecting wheel
    Using cached wheel-0.41.2-py3-none-any.whl.metadata (2.2 kB)
  Using cached setuptools-68.2.2-py3-none-any.whl (807 kB)
  Using cached wheel-0.41.2-py3-none-any.whl (64 kB)
  Installing collected packages: wheel, setuptools
  Successfully installed setuptools-68.2.2 wheel-0.41.2
  Installing build dependencies ... done
There's that setuptools>=40.8.0 constraint again. (We also saw it in build.)
I rg 40.8.0 my source checkout (note: the . in there are wildcard characters since 40.8.0 is a regexp so this could over match) and come up with nothing. If it's not coming from my code, where is it coming from?
In the pip documentation, Fallback behaviour says that a missing [build-system] from pyproject.toml is implicitly translated to the following:
[build-system]
requires = ["setuptools>=40.8.0", "wheel"]
build-backend = "setuptools.build_meta:__legacy__"
For build, I go to the source code and discover that similar functionality was added in May 2020.
I'm not sure if this default behavior is specified in a PEP or what. But build and pip seem to be agreeing on the behavior of adding setuptools>=40.8.0 and wheel to their ephemeral build environments and invoking setuptools.build_meta:__legacy__ as the build backend as implicit defaults if your pyproject.toml lacks a [build-system]. OK.
Being Explicit About The Build System
Perhaps I should consider defining [build-system] and being explicit about things? After all, the tools aren't printing anything indicating they are assuming implicit defaults and for all I know the defaults could change in a backwards incompatible manner in any release and break my build. (Although I would hope to see a deprecation warning before that occurs.)
So I modify my pyproject.toml accordingly:
[build-system]
requires = [
    "cffi==1.16.0",
    "setuptools==68.2.2",
    "wheel==0.41.2",
]
build-backend = "setuptools.build_meta:__legacy__"
I pinned all the dependencies to specific versions because I like determinism and reproducibility. I really don't like when the upload of a new package version breaks my builds!
Software Supply Chain Weaknesses in pyproject.toml
When I pinned dependencies in [build-system] in pyproject.toml, the security part of my brain is screaming over the lack of SHA-256 digest pinning.
How am I sure that we're using well-known, trusted versions of these dependencies? Are all the transitive dependencies even pinned?
Before pyproject.toml, I used pip-compile from pip-tools to generate a requirements.txt containing SHA-256 digests for all transitive dependencies. I would use python3 -m venv to create a virtualenv, venv/bin/python -m pip install -r requirements.txt to materialize a (highly deterministic) set of packages, then run venv/bin/python setup.py to invoke a build in this stable and securely created environment. (Some) software supply chain risks averted! But, uh, how do I do that with pyproject.toml build-system.requires? Does it even support pinning SHA-256 digests?
I skim the PEPs related to pyproject.toml and don't see anything. Surely I'm missing something.
In desperation I check the pip-tools project and sure enough they document pyproject.toml integration. However, they tell you how to feed requirements.txt files into the dynamic dependencies consumed by the build backend: there's nothing on how to securely install the build backend itself.
As far as I can tell pyproject.toml has no facilities for securely installing (read: pinning content digests for all transitive dependencies) the build backend itself. This is left as an exercise to the reader. But, um, the build frontend (which I was also instructed to download insecurely via python -m pip install) is the thing installing the build backend. How am I supposed to subvert the build frontend to securely install the build backend? Am I supposed to disable default behavior of using an ephemeral environment in order to get secure backend installs? Doesn't the ephemeral environment give me additional, desired protections for build determinism and reproducibility? That seems wrong.
It kind of looks like pyproject.toml wasn't designed with software supply chain risk mitigation as a criterion. This is extremely surprising for a build system abstraction designed in the past few years. I shrug my shoulders and move on.
Porting python setup.py develop Invocations
Now that I figure I have a working pyproject.toml, I move on to removing python setup.py invocations.
First up is a python setup.py develop --rust-backend invocation.
My setup.py performs very crude scanning of sys.argv looking for command arguments like --system-zstd and --rust-backend as a way to influence the build. We just sniff these special arguments and remove them from sys.argv so they don't confuse the setuptools options parser. (I don't believe this is a blessed way of doing custom options handling in distutils/setuptools. But it is simple and has worked since I introduced the pattern in 2016.)
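(For readers who haven't seen the pattern, it is essentially this hypothetical, minimal sketch, not the real setup.py:)

import sys

from setuptools import setup

# Sniff our custom flags out of sys.argv and strip them so the setuptools
# option parser never sees them.
SYSTEM_ZSTD = "--system-zstd" in sys.argv
RUST_BACKEND = "--rust-backend" in sys.argv
sys.argv = [a for a in sys.argv if a not in ("--system-zstd", "--rust-backend")]

setup(
    name="zstandard",
    version="0.0.0",
    # ... extension modules assembled differently depending on
    # SYSTEM_ZSTD / RUST_BACKEND ...
)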
Is --global-option the Answer?
With python setup.py invocations going away and a build frontend invoking setup.py, I need to find an alternative mechanism to pass settings into my setup.py.
Why you shouldn't invoke setup.py directly tells me I should use pip install -e. I'm guessing there's a way to instruct pip install to pass arguments to setup.py.
$ venv/bin/python -m pip install --help
...
  -C, --config-settings <settings>
                              Configuration settings to be passed to the PEP
                              517 build backend. Settings take the form
                              KEY=VALUE. Use multiple --config-settings options
                              to pass multiple keys to the backend.
  --global-option <options>   Extra global options to be supplied to the
                              setup.py call before the install or bdist_wheel
                              command.
...
Hmmm. Not really sure which of these to use. But --global-option mentions setup.py and I'm using setup.py. So I try that:
$ venv/bin/python -m pip install --global-option --rust-backend -e .
Usage:
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] <requirement specifier> [package-index-options] ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] -r <requirements file> [package-index-options] ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] [-e] <vcs project url> ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] [-e] <local project path> ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] <archive url/path> ...

no such option: --rust-backend
Oh, duh, --rust-backend looks like an argument and makes pip's own argument parsing ambiguous as to how to handle it. Let's try that again with --global-option=--rust-backend:
$ venv/bin/python -m pip install --global-option=--rust-backend -e .
DEPRECATION: --build-option and --global-option are deprecated. pip 24.0 will enforce this behaviour change. A possible replacement is to use --config-settings. Discussion can be found at https://github.com/pypa/pip/issues/11859
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option.
Obtaining file:///Users/gps/src/python-zstandard
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: zstandard
  WARNING: Ignoring --global-option when building zstandard using PEP 517
  Building editable for zstandard (pyproject.toml) ... done
  Created wheel for zstandard: filename=zstandard-0.22.0.dev0-0.editable-cp312-cp312-macosx_14_0_x86_64.whl size=4379 sha256=05669b0a5fd8951cac711923d687d9d4192f6a70a8268dca31bdf39012b140c8
  Stored in directory: /private/var/folders/dd/xb3jz0tj133_hgnvdttctwxc0000gn/T/pip-ephem-wheel-cache-6amdpg21/wheels/eb/6b/3e/89aae0b17b638c9cdcd2015d98b85ee7fb3ef00325bb44a572
Successfully built zstandard
Installing collected packages: zstandard
Successfully installed zstandard-0.22.0.dev0
I immediately see the three DEPRECATION and WARNING lines (which are color highlighted in my terminal, yay):
DEPRECATION: --build-option and --global-option are deprecated. pip 24.0 will enforce this behaviour change. A possible replacement is to use --config-settings. Discussion can be found at https://github.com/pypa/pip/issues/11859
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option.
WARNING: Ignoring --global-option when building zstandard using PEP 517
Yikes. It looks like --global-option is deprecated and will be removed in pip 24.0.
And, later it says --global-option was ignored. Is that true?!
$ ls -al zstandard/*cpython-312*.so
-rwxr-xr-x  1 gps  staff  1002680 Oct 27 11:35 zstandard/_cffi.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff   919352 Oct 27 11:35 zstandard/backend_c.cpython-312-darwin.so
Not seeing a backend_rust library like I was expecting. So, yes, it does look like --global-option was ignored.
This behavior is actually pretty concerning to me. It certainly seems like at one time --global-option (and a --build-option which doesn't exist on the pip install command I guess) did get threaded through to setup.py. However, it no longer does.
I find an entry in the pip 23.1 changelog: Deprecate --build-option and --global-option. Users are invited to switch to --config-settings. (#11859). Deprecate. What is pip's definition of deprecate? I click the link to #11859. An open issue with a lot of comments. I scan the issue history to find referenced PRs and click on #11861.
OK, it is just an advertisement. Maybe --global-option never got threaded through to setup.py? But its help usage text clearly says it is related to setup.py! Maybe the presence of [build-system] in pyproject.toml is somehow engaging different semantics that result in --global-option not being passed to setup.py? The warning message did say Ignoring --global-option when building zstandard using PEP 517.
I try commenting out the [build-system] section in my pyproject.toml and trying again. Same result. Huh? Reading the pip install --help output, I see --no-use-pep517 and try it:
$ venv/bin/python -m pip install --global-option=--rust-backend --no-use-pep517 -e .
...
$ ls -al zstandard/*cpython-312*.so
-rwxr-xr-x  1 gps  staff  1002680 Oct 27 11:35 zstandard/_cffi.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff   919352 Oct 27 11:35 zstandard/backend_c.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff  2727920 Oct 27 11:53 zstandard/backend_rust.cpython-312-darwin.so
Ahh, so pip's default PEP-517 build mode is causing --global-option to get ignored. So I guess older versions of pip honored --global-option and when pip switched to PEP-517 build mode by default --global-option just stopped working and emitted a warning instead. That's quite the backwards incompatible behavior break! I really wish tools would fail fast when making these kinds of breaks or at least offer a --warnings-as-errors mode so I can opt into fatal errors when these kinds of breaks / deprecations are introduced. I would 100% opt into this since these warnings are often the figurative needle in a haystack of CI logs and easy to miss. Especially if the build environment is non-deterministic and new versions of tools like pip get installed randomly without a version control commit.
Pip's allowing me to specify --global-option but then only issuing a warning when it is ignored doesn't sit well with me. But what can I do? It is obvious --global-option is a non-starter here.
Attempts at Using --config-setting
Fortunately, pip's deprecation message suggests a path forward:
A possible replacement is to use --config-settings. Discussion can be found at https://github.com/pypa/pip/issues/11859
First, kudos for actionable warning messages. However, the wording says possible replacement. Are there other alternatives I didn't see in the pip install --help output?
Anyway, I decide to go with that --config-settings suggestion.
$ venv/bin/python -m pip install --config-settings=--rust-backend -e .
Usage:
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] <requirement specifier> [package-index-options] ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] -r <requirements file> [package-index-options] ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] [-e] <vcs project url> ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] [-e] <local project path> ...
  /Users/gps/src/python-zstandard/venv/bin/python -m pip install [options] <archive url/path> ...

Arguments to --config-settings must be of the form KEY=VAL
Hmmm. Let's try adding a trailing =?
$ venv/bin/python -m pip install --config-settings=--rust-backend= -e .
Obtaining file:///Users/gps/src/python-zstandard
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: zstandard
  Building editable for zstandard (pyproject.toml) ... done
  Created wheel for zstandard: filename=zstandard-0.22.0.dev0-0.editable-cp312-cp312-macosx_14_0_x86_64.whl size=4379 sha256=619db9806bc4c39e973c3197a0ddb9b03b49fff53cd9ac3d7df301318d390b5e
  Stored in directory: /private/var/folders/dd/xb3jz0tj133_hgnvdttctwxc0000gn/T/pip-ephem-wheel-cache-gtsvw78d/wheels/eb/6b/3e/89aae0b17b638c9cdcd2015d98b85ee7fb3ef00325bb44a572
Successfully built zstandard
Installing collected packages: zstandard
  Attempting uninstall: zstandard
    Found existing installation: zstandard 0.22.0.dev0
    Uninstalling zstandard-0.22.0.dev0:
      Successfully uninstalled zstandard-0.22.0.dev0
Successfully installed zstandard-0.22.0.dev0
No warnings or deprecations. That's promising. Did it work?
$ ls -al zstandard/*cpython-312*.so
-rwxr-xr-x  1 gps  staff  1002680 Oct 27 12:11 zstandard/_cffi.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff   919352 Oct 27 12:11 zstandard/backend_c.cpython-312-darwin.so
No backend_rust extension module. Boo. So what actually happened?
$ venv/bin/python -m pip -v install --config-settings=--rust-backend= -e .
I don't see --rust-backend anywhere in that log output. I try with more verbosity:
$ venv/bin/python -m pip -vvvvv install --config-settings=--rust-backend= -e .
Still nothing!
Maybe that -- prefix is wrong?
$ venv/bin/python -m pip -vvvvv install --config-settings=rust-backend= -e .
Still nothing!
I have no clue how --config-settings= is getting passed to setup.py nor where it is seemingly getting dropped on the floor.
How Does setuptools Handle --config-settings?
This must be documented in the setuptools project. So I open those docs in my web browser and do a search for settings. I open the first three results in separate tabs:
- Running setuptools commands
- Configuration File Options
- develop - Deploy the project source in "Development Mode"
That first link has docs on the deprecated setuptools commands and how to invoke python setup.py directly. (Note: there is a warning box here saying that python setup.py is deprecated. I guess I somehow missed this document when looking at setuptools documentation earlier! In hindsight, it appears to be buried at the figurative bottom of the docs tree as the last item under a Backward compatibility & deprecated practice section. Talk about burying the lede!) These docs aren't useful.
The second link also takes me to deprecated documentation related to direct python setup.py command invocations.
The third link is also useless.
I continue opening search results in new tabs. Surely the answer is in here.
I find an Adding Arguments section telling me that Adding arguments to setup is discouraged as such arguments are only supported through imperative execution and not supported through declarative config. I think that's an obtuse way of saying that sys.argv arguments are only supported via python setup.py invocations and not via setup.cfg or pyproject.toml? But the example only shows me how to use setup.cfg and doesn't have any mention of pyproject.toml. So is this documentation even relevant to pyproject.toml?
Eventually I stumble across Build System Support. In the Dynamic build dependencies and other build_meta tweaks section, I notice the following example code:
from setuptools import build_meta as _orig
from setuptools.build_meta import *

def get_requires_for_build_wheel(config_settings=None):
    return _orig.get_requires_for_build_wheel(config_settings) + [...]

def get_requires_for_build_sdist(config_settings=None):
    return _orig.get_requires_for_build_sdist(config_settings) + [...]
config_settings=None. OK, this might be the --config-settings values passed to the build frontend getting fed into the build backend.
I Google get_requires_for_build_wheel. One of the top results is PEP-517, which I click on.
I see that the Build backend interface consists of a handful of functions that are invoked by the build frontend. These functions all seem to take a config_settings=None argument. Great, now I know the interface between build frontends and backends at the Python API level.
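(For my own notes, that interface boils down to a few module-level functions along these lines. This is paraphrased from my reading of PEP-517, with build_wheel and build_sdist being the mandatory hooks and the rest optional:)

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    ...  # build a .whl into wheel_directory, return its basename

def build_sdist(sdist_directory, config_settings=None):
    ...  # build a source distribution into sdist_directory, return its basename

def get_requires_for_build_wheel(config_settings=None):
    ...  # optional: extra requirements beyond build-system.requires

def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
    ...  # optional: produce a .dist-info directory ahead of the build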
Where was I in this yak shave? I remember from pyproject.toml that one of the lines is build-backend = "setuptools.build_meta:__legacy__". That setuptools.build_meta:__legacy__ bit looks like a Python symbol reference. Since the setuptools documentation didn't answer my question on how to thread --config-settings into setup.py invocations, I open the build_meta.py source code.
(Aside: experience has taught me that when in doubt on how something works, consult the source code: code doesn't lie.)
I search for config_settings. I immediately see class _ConfigSettingsTranslator: whose purported job is Translate config_settings into distutils-style command arguments. Only a limited number of options is currently supported. Oh, this looks relevant. But there's a fair bit of code in here. Do I really need to grok it all? I keep scanning the source.
In a def _build_with_temp_dir() I spot the following code:
sys.argv = [
    *sys.argv[:1],
    *self._global_args(config_settings),
    *setup_command,
    "--dist-dir",
    tmp_dist_dir,
    *self._arbitrary_args(config_settings),
]
Ahh, cool. It looks to be calling self._global_args() and self._arbitrary_args() and adding the arguments those functions return to sys.argv before evaluating setup.py in the current interpreter.
I look at the definition of _arbitrary_args() and I'm onto something:
def _arbitrary_args(self, config_settings: _ConfigSettings) -> Iterator[str]:
    """
    Users may expect to pass arbitrary lists of arguments to a command
    via "--global-option" (example provided in PEP 517 of a "escape hatch").
    ...
    """
    args = self._get_config("--global-option", config_settings)
    global_opts = self._valid_global_options()
    bad_args = []

    for arg in args:
        if arg.strip("-") not in global_opts:
            bad_args.append(arg)
        yield arg

    yield from self._get_config("--build-option", config_settings)

    if bad_args:
        SetuptoolsDeprecationWarning.emit(
            "Incompatible `config_settings` passed to build backend.",
            f"""
            The arguments {bad_args!r} were given via `--global-option`.
            Please use `--build-option` instead,
            `--global-option` is reserved for flags like `--verbose` or `--quiet`.
            """,
            due_date=(2023, 9, 26),  # Warning introduced in v64.0.1, 11/Aug/2022.
        )
It looks to peek inside config_settings and handle --global-option and --build-option specially. But we clearly see --global-option is deprecated in favor of --build-option.
So is the --config-settings key name --build-option and its value the setup.py argument we want to insert?
I try that:
$ venv/bin/python -m pip install --config-settings=--build-option=--rust-backend -e .
...
$ ls -al zstandard/*cpython-312*.so
-rwxr-xr-x  1 gps  staff  1002680 Oct 27 12:54 zstandard/_cffi.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff   919352 Oct 27 12:53 zstandard/backend_c.cpython-312-darwin.so
-rwxr-xr-x  1 gps  staff  2727920 Oct 27 12:54 zstandard/backend_rust.cpython-312-darwin.so
It worked!
Disbelief Over --config-settings=--build-option=
But, um, --config-settings=--build-option=--rust-backend. We've triple encoded command arguments here. This feels exceptionally weird. Is that really the supported/preferred interface? Surely there's something simpler.
def _arbitrary_args()'s docstring mentioned escape hatch in the context of PEP-517. I open PEP-517 and search for that term, finding Config settings. Sure enough, it is describing the mechanism I just saw the source code to. And its pip example is using pip install's --global-option and --build-option arguments. So this all seems to check out. (Although these pip arguments are deprecated in favor of -C/--config-settings.)
Thinking I missed some obvious documentation, I search the setuptools documentation for --build-option. The only hits are in the v64.0.0 changelog entry.
So you are telling me this feature of passing arbitrary config settings into setup.py via PEP-517 build frontends is only documented in the changelog?!
Ok, I know my setup.py is abusing sys.argv. I'm off the paved road for passing settings into setup.py. What is the preferred pyproject.toml era mechanism for passing settings into setup.py? These settings can't be file based because they are dynamic. There must be a config_settings mechanism to thread dynamic settings into setup.py that doesn't rely on these magical --build-option and --global-option settings keys.
I stare and stare at the build_meta.py source code looking for an answer. But all I see is the def _build_with_temp_dir() calling into self._global_args() and self._arbitrary_args() to append arguments to sys.argv. Huh? Surely this isn't the only solution. Surely there's a simpler way. The setuptools documentation said Adding arguments to setup is discouraged, seemingly implying a better way of doing it. And yet the only code I'm seeing in build_meta.py for passing custom config_settings values in is literally via additional setup.py process arguments. This can't be right.
I start unwinding my mental stack and browser tabs trying to come across something I missed.
I again look at Dynamic build dependencies and other build_meta tweaks and see its code is defining a custom [build-system] backend that does a from setuptools.build_meta import *, defines some custom build backend interface APIs (which receive config_settings), and then proxies into the original implementations. While the example is related to build metadata, I'm thinking do I need to implement my own setuptools wrapping build backend that implements a custom def build_wheel() to intercept config_settings? Surely this is avoidable complexity.
Pip's Eager Deprecations
I keep unwinding context and again notice pip's warning message telling me A possible replacement is to use --config-settings. Discussion can be found at https://github.com/pypa/pip/issues/11859.
I open pip issue #11859. Oh, that's the same issue tracking the --global-option deprecation I encountered earlier. I again scan the issue timeline. It is mostly references from other GitHub projects. Telltale sign that this deprecation is creating waves.
The issue is surprisingly light on comments for how many references it has.
The comment with the most emoji reactions says:
Is there an example showing how to use --config-settings with setup.py and/or newer alternatives? The setuptools documentation is awful and the top search results are years/decades out-of-date and wildly contradictory.
I don't know who you are, @alexchandel, but we're on the same wavelength.
Then the next comment says:
Something like this seems to work to pass global options to setuptools. pip -vv install --config-setting="--global-option=--verbose" . Passing --build-option in the same way does not work, as setuptools attempts to pass these to the egg_info command where they are not supported.
So there it seemingly is, confirmation that my independently derived solution of --config-settings=--build-option=... is in fact the way to go. But this commenter says to use --global-option, which appears to be deprecated in modern setuptools. Oof.
The next comment links to pypa/setuptools#3896 where apparently there's been an ongoing conversation since April about how setuptools should design and document a stable mechanism to pass config_settings to PEP517 backend.
If I'm interpreting this correctly, it looks like distutils/setuptools - the primary way to define Python packages for the better part of twenty years - doesn't have a stable mechanism for passing configuration settings from modern pyproject.toml [build-system] frontends. Meanwhile pip is deprecating long-working mechanisms to pass options to setup.py and forcing people to use a mechanism that setuptools doesn't explicitly document, much less say is stable. This is all taking place six years after PEP-517 was accepted.
I'm kind of at a loss for words here. I understand pip's desire to delete some legacy code and standardize on the new way of doing things. But it really looks like they are breaking backwards compatibility for setup.py a bit too eagerly. That's a questionable decision in my mind, so I write a detailed comment on the pip issue explaining how the interface works and asking the pip folks to hold off on deprecation until setuptools has a stable, documented solution. Time will tell what happens.
In Summary
What an adventure that Python packaging yak shave was! I feel like I just learned a whole lot of things that I shouldn't have needed to learn in order to keep my Python package building without deprecation warnings. Yes, I scope bloated myself to understanding how things worked because that's my ethos. But even without that extra work, there's a lot here that I feel I shouldn't have needed to do, like figure out the undocumented --config-settings=--build-option= interface.
Despite having ported my python setup.py invocations to modern, PEP-517 build frontends (build and pip) and gotten rid of various deprecation messages and warnings, I'm still not sure of the implications of that transition. I really want to understand the trade-offs of adopting pyproject.toml and using the modern build frontends for doing things. But I couldn't find any documentation on this anywhere! I don't know basic things like whether my adoption of pyproject.toml will break end-users stuck on older Python versions or what.
I still haven't ported my project metadata from setup.py to pyproject.toml because I don't understand the implications. I feel like I'm flying blind and am bound to make mistakes with undesirable impacts to end-users of my package.
But at least I was able to remove deprecation warnings from my packaging CI with just several hours of work.
I recognize this post is light on constructive feedback and suggestions for how to improve matters.
One reason is that I think a lot of the improvements are self-explanatory - clearer warning messages, better documentation, not deprecating things prematurely, etc. I prefer to just submit PRs instead of long blog posts. But I just don't know what is appropriate in some cases: one of the themes of this post is I just don't grok the state of Python packaging right now.
This post did initially contain a few thousand words expanding on what all I thought was broken and how it should be fixed. But I stripped the content because I didn't want my (likely controversial) opinions to distract from the self-assessed user experience study documented in this post. This content is probably better posted to a PyPA mailing list anyway, otherwise I'm just another guy complaining on the Internet.
I've posted a link to this post to the packaging category on discuss.python.org so the PyPA (and other subscribed parties) are aware of all the issues I stumbled over. Hopefully people with more knowledge of the state of Python packaging see this post, empathize with my struggles, and enact meaningful improvements so others can port off setup.py with a fraction of the effort it took me.
Announcing the PyOxy Python Runner
May 10, 2022 at 08:00 AM | categories: Python, PyOxidizer
I'm pleased to announce the initial release of PyOxy. Binaries are available on GitHub.
(Yes, I used my pure Rust Apple code signing implementation to remotely sign the macOS binaries from GitHub Actions using a YubiKey plugged into my Windows desktop: that experience still feels magical to me.)
PyOxy is all of the following:
- An executable program used for running Python interpreters.
- A single file and highly portable (C)Python distribution.
- An alternative python driver providing more control over the interpreter than what python itself provides.
- A way to make some of PyOxidizer's technology more broadly available without using PyOxidizer.
Read the following sections for more details.
pyoxy Acts Like python
The pyoxy executable has a run-python sub-command that will essentially do what python would do:
$ pyoxy run-python
Python 3.9.12 (main, May 3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
A Python REPL. That's familiar!
You can even pass python arguments to it:
$ pyoxy run-python -- -c 'print("hello, world")'
hello, world
When a pyoxy executable is renamed to any filename beginning with python, it implicitly behaves like pyoxy run-python --.
$ mv pyoxy python3.9
$ ls -al python3.9
-rwxrwxr-x 1 gps gps 120868856 May 10 2022 python3.9
$ ./python3.9
Python 3.9.12 (main, May 3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Single File Python Distributions
The official pyoxy executables are built with PyOxidizer and leverage the Python distributions provided by my python-build-standalone project. On Linux and macOS, a fully featured Python interpreter and its library dependencies are statically linked into pyoxy. The pyoxy executable also embeds a copy of the Python standard library and imports it from memory using the oxidized_importer Python extension module.
What this all means is that the official pyoxy executables can function as single file CPython distributions! Just download a pyoxy executable, rename it to python, python3, python3.9, etc. and it should behave just like a normal python would!
Your Python installation has never been so simple. And fast: pyoxy should be a few milliseconds faster to initialize a Python interpreter, mostly because oxidized_importer avoids the filesystem overhead of looking for and loading .py[c] files.
Low-Level Control Over the Python Interpreter with YAML
The pyoxy run-yaml command takes the path to a YAML file defining the embedded Python interpreter configuration and then launches that Python interpreter in-process:
$ cat > hello_world.yaml <<EOF
---
allocator_debug: true
interpreter_config:
  run_command: 'print("hello, world")'
...
EOF
$ pyoxy run-yaml hello_world.yaml
hello, world
Under the hood, PyOxy uses the pyembed Rust crate to manage embedded Python interpreters. The YAML document that PyOxy uses is simply deserialized into a pyembed::OxidizedPythonInterpreterConfig Rust struct, which pyembed uses to spawn a Python interpreter. This Rust struct offers near complete control over how the embedded Python interpreter behaves: it even allows you to tweak settings that are impossible to change from environment variables or python command arguments! (Beware: this power means you can easily cause the interpreter to crash if you feed it a bad configuration!)
YAML Based Python Applications
pyoxy run-yaml ignores all file content before the YAML --- start document delimiter. This means that on UNIX-like platforms you can create executable YAML files defining your Python application. e.g.
$ mkdir -p myapp
$ cat > myapp/__main__.py << EOF
print("hello from myapp")
EOF
$ cat > say_hello <<"EOF"
#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
  module_search_paths: ["$ORIGIN"]
...
EOF
$ chmod +x say_hello
$ ./say_hello
hello from myapp
This means that to distribute a Python application, you can drop a copy of pyoxy in a directory, define an executable YAML file masquerading as a shell script, and run Python code with as little as two files!
The Future of PyOxy
PyOxy is very young. I hacked it together on a weekend in September 2021. I wanted to shore up some functionality before releasing it then. But I got perpetually sidetracked and never did the work. I figured it would be better to make a smaller splash with a lesser-baked product now than wait even longer. Anyway...
As part of building PyOxidizer I've built some peripheral technology:
- Standalone and highly distributable Python builds via the python-build-standalone project.
- The pyembed Rust crate for managing an embedded Python interpreter.
- The oxidized_importer Python package/extension for importing modules from memory, among other things.
- The Python packed resources data format for representing a collection of Python modules and resource files for efficient loading (by oxidized_importer).
I conceived PyOxy as a vehicle to enable people to leverage PyOxidizer's technology without imposing PyOxidizer onto them. I feel that PyOxidizer's broader technology is generally useful and too valuable to be gated behind using PyOxidizer.
PyOxy is only officially released for Linux and macOS for the moment. It definitely builds on Windows. However, I want to improve the single file executable experience before officially releasing PyOxy on Windows. This requires an extensive overhaul to oxidized_importer and the way it serializes Python resources to be loaded from memory.
I'd like to add a sub-command to produce a Python packed resources payload. With this, you could bundle/distribute a Python application as pyoxy plus a file containing your application's packed resources alongside YAML configuring the Python interpreter. Think of this as a more modern and faster version of the venerable zipapp approach. This would enable PyOxy to satisfy packaging scenarios provided by tools like Shiv, PEX, and XAR. However, unlike Shiv and PEX, pyoxy also provides an embedded Python interpreter, so applications are much more portable since they don't rely on the host machine having a Python interpreter installed.
I'm really keen to see how others want to use pyoxy.
The YAML based control over the Python interpreter could be super useful for testing, benchmarking, and general Python interpreter configuration experimentation. It essentially opens the door to things previously only possible if you wrote code interfacing with Python's C APIs.
I can also envision tools that hide the existence of Python wanting to leverage the single file Python distribution property of pyoxy. For example, tools like Ansible could copy pyoxy to a remote machine to provide a well-defined Python execution environment without having to rely on what packages are installed. Or pyoxy could be copied into a container or other sandboxed/minimal environment to provide a Python interpreter.
And that's PyOxy. I hope you find it useful. Please file any bug reports or feature requests in PyOxidizer's issue tracker.
Announcing the 0.9 Release of PyOxidizer
October 18, 2020 at 10:00 PM | categories: Python, PyOxidizerI have decided to make up for the 6 month lull between PyOxidizer's 0.7 and 0.8 releases by releasing PyOxidizer 0.9 just 1 week after 0.8!
The full 0.9 changelog is found in the docs. First time user? See the Getting Started documentation.
While the 0.9 release is far smaller in terms of features compared to 0.8, it is an important release because of progress closing compatibility gaps.
Build a python Executable
PyOxidizer 0.8 quietly shipped the ability to build executables that behave like python executables via enhancements to the configurability of embedded Python interpreters.
PyOxidizer 0.9 made some minor changes to make this scenario work better and there is even official documentation on how to achieve this. So now you can emit a python executable next to your application's executable. Or you could use PyOxidizer to build a highly portable, self-contained python executable and ship your Python scripts next to it, using PyOxidizer's python in your #!.
Support Packaging Files as Files for Maximum Compatibility
There is a long-tail of Python packages that don't just work with PyOxidizer. A subset of these packages don't work because of bugs with how PyOxidizer attempts to classify files as specific types of Python resources.
The way that normal Python works is you materialize a bunch of files on the filesystem and at run-time the filesystem-based importer stat()s a bunch of paths until it finds a candidate file satisfying the import request. This works, of course. But it is inefficient. Since PyOxidizer has awareness of every resource being packaged at build time, it attempts to index all known resources and serialize them to an efficient data structure so finding and loading a resource can be extremely quick (effectively just a hashmap lookup in Rust code to resolve the memory address of data).
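If you want to see the filesystem-based machinery described above for yourself, here is a small illustrative check (my addition, using only standard library APIs) that prints the finders CPython consults for every import; the last entry, PathFinder, is the one that probes paths on sys.path:
import sys

# CPython asks each meta-path finder, in order, to satisfy an import. The
# filesystem-based PathFinder is the finder that stat()s candidate paths.
for finder in sys.meta_path:
    print(finder)

# path_hooks create the per-directory FileFinder instances PathFinder uses
# to locate .py / .pyc / extension module files.
print(sys.path_hooks)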
PyOxidizer's approach does work in the majority of cases. But there are edge cases. For example, NumPy's binary wheels have installed file paths like numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so. The numpy.libs directory is not a valid Python package directory since it has a . and since it doesn't have an __init__.py[c] file. This is a case where PyOxidizer's code for turning files into resources is currently confused.
It is tempting to argue that file layouts like NumPy's are wrong. But there doesn't seem to be any formal specification preventing the use of such layouts. The arbiter of truth here is what Python packaging tools accept and the current code for installing wheels gladly accepts file layouts like these. So I've accepted that PyOxidizer is just going to have to support edge cases like this. (I've captured more details about this particular issue in the docs).
Anyway, PyOxidizer 0.9 ships a new, simpler mode for handling files: files mode. In files mode, PyOxidizer disables its code for classifying files as typed Python resources (like module sources and extension modules) and instead treats a file as... a file.
When in files mode, actions that invoke Python packaging tools return files objects instead of classified resources. If you then add these files for packaging, those files are materialized on the filesystem next to your built executable. You can then use Python's standard filesystem importer to load these files at run-time.
This allows you to use PyOxidizer with packages like NumPy that were previously incompatible due to bugs with file/resource classification. In fact, getting NumPy working with PyOxidizer is now in the official documentation!
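As a quick illustration (my own sketch, not from the docs): when a package is handled in files mode, it is materialized as regular files next to the built executable and imported by Python's standard filesystem importer, so you can confirm where it came from at run-time:
import numpy

# In files mode, numpy is installed as plain files next to the executable,
# so it is imported from disk rather than from an in-memory resource index.
print(numpy.__file__)    # a real filesystem path
print(numpy.__loader__)  # a standard filesystem loader, not an in-memory one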
Files mode is still in its infancy. There is already code for embedding file data in the produced executable. I plan to eventually teach PyOxidizer's run-time code to extract these embedded files to a temporary directory, SquashFS FUSE filesystem, etc. This is the approach that other Python packaging tools like PyInstaller and XAR use. While it is less efficient, this approach is highly compatible with Python code in the wild since you sidestep issues with __file__ and other assumptions about installed file layouts. So it makes sense for PyOxidizer to provide support for this so you can still achieve the friendliness of a self-contained executable without worrying about compatibility. Look for improvements to files mode in future releases.
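To show why sidestepping __file__ issues buys so much compatibility, here is the kind of pattern (purely illustrative; data.json is a made-up filename) that is everywhere in third-party packages and breaks when modules are imported from memory:
import os

# Common pattern: locate a data file relative to the module's on-disk
# location. If the module was imported from memory, __file__ may be missing
# or may not point at a real file, and this fails.
_here = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(_here, "data.json")) as f:
    config = f.read()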
And to help debug issues with PyOxidizer's file handling and resource classification, the new pyoxidizer find-resources command can be used to invoke PyOxidizer's code for scanning and classifying files. Hopefully this makes it easier to diagnose bugs in this critical component of PyOxidizer!
Some Important Bug Fixes
PyOxidizer 0.8 shipped with some pretty annoying bugs and behavior quirks.
The ability to set custom sys.path values via Starlark was broken. How I managed to ship that, I'm not sure. But it is fixed in 0.9.
Another bug I can't believe I shipped was the PythonExecutable.read_virtualenv() Starlark method being broken due to a typo. You can read from virtualenvs again in PyOxidizer 0.9.
Another important improvement is in the default Python interpreter configuration. We now automatically initialize Python's locales configuration by default. Without this, the encoding of filesystem paths and sys.argv may not have been correct. If someone passed a non-ASCII argument, the Python str value was likely mangled. PyOxidizer built binaries should behave reasonably by default now. The issue is a good read if the subtle behaviors of how encodings work in Python and on different operating systems are interesting to you.
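If you want to sanity check this behavior yourself, a minimal (illustrative) probe is to print the relevant encodings and arguments from a built binary and pass it a non-ASCII argument:
import sys

# With the locales configuration initialized, these should report a sensible
# encoding and non-ASCII arguments should round-trip without mangling.
print(sys.getfilesystemencoding())
print(sys.stdout.encoding)
print(sys.argv)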
Better Binary Portability Documentation
The documentation on binary portability has been overhauled. Hopefully it is much more clear about the capabilities of PyOxidizer to produce a binary that just works on other machines.
I eventually want to get PyOxidizer to a point where users don't have to think about binary portability. But until PyOxidizer starts generating installers and providing the ability to run builds in deterministic and reproducible environments, it is sadly a problem that is being externalized to end users.
In Conclusion
PyOxidizer 0.9 is a small release representing just 1 week of work. But it contains some notable features that I wanted to get out the door.
As always, please report any issues or feedback in the GitHub issue tracker or the users mailing list.
Announcing the 0.8 Release of PyOxidizer
October 12, 2020 at 12:45 AM | categories: Python, PyOxidizerI am very excited to announce the 0.8 release of PyOxidizer, a modern Python application packaging tool. You can find the full changelog in the docs. First time user? See the Getting Started documentation.
Foremost, I apologize that this release took so long to publish (0.7 was released on 2020-04-09). I fervently believe that frequent releases are a healthy software development practice. And 6 months between PyOxidizer releases was way too long. Part of the delay was due to world events (it has proven difficult to focus on... anything given a global pandemic, social unrest, and wildfires further undermining any semblance of lifestyle normalcy in California). Another contributing factor was that I was waiting on a few 3rd party Rust crates to have new versions published to crates.io (you can't release a crate to crates.io unless all your dependencies are also published there).
Release delay and general life hardships aside, the 0.8 release is here and it is full of notable improvements!
Python 3.8 and 3.9 Support
PyOxidizer 0.8 now targets Python 3.8 by default and support for Python 3.9 is available by tweaking configuration files. Previously, we only supported Python 3.7 and this release drops support for Python 3.7. I feel a bit bad for dropping compatibility. But Python 3.8 introduced a new C API for initializing Python interpreters (thank you Victor Stinner!) and this makes PyOxidizer's run-time code for interfacing with Python interpreters vastly simpler. I decided that given the beta nature of PyOxidizer, it wasn't worth maintaining complexity to continue to support Python 3.7. I'm optimistic that I'll be able to support Python 3.8 as a baseline for a while.
Better Default Packaging Settings
PyOxidizer started as a science experiment of sorts to see if I could achieve the elusive goal of producing a single file executable providing a Python application. I was successful in proving this hypothesis. But the cost to achieving this outcome was rather high in terms of end-user experience: in order to produce single file executables, you had to break a lot of assumptions about how Python typically works and this in turn broke a lot of Python code and packages in the wild.
In other words, PyOxidizer's opinionated defaults of producing a single file executable were externalizing hardship on end-users and preventing them from using PyOxidizer.
PyOxidizer 0.8 contains a handful of changes to defaults that should hopefully lessen the friction.
On Windows, the default Python distribution now has a more traditional build configuration (using .pyd extension modules and a pythonXY.dll file). This means that PyOxidizer can consume pre-built extension modules without having to recompile them from source. If you publish a Windows binary wheel on PyPI, in many cases it will just work with PyOxidizer 0.8! (There are some notable exceptions to this, such as numpy, which is doing wonky things with the location of shared libraries in wheels - but I aim to fix this soon.)
Also on Windows, we no longer attempt to embed Python extension modules (.pyd files) and their shared library dependencies in the produced binary and load them from memory by default. This is because PyOxidizer's from-memory library loader didn't work in all cases. For example, some OpenSSL functionality used by the _ssl module in the standard library didn't work, preventing Python from establishing TLS connections. The old mode enabling you to produce a single file executable on Windows is still available. But you have to opt in to it (at the likely cost of more packaging and compatibility pain).
Starlark Configuration Overhaul
PyOxidizer 0.8 contains a ton of changes to its Starlark configuration files. There are so many changes that you may find it easier to port to PyOxidizer 0.8 by creating a new configuration file rather than attempting to port an existing one.
I apologize for this churn and recognize it will be disruptive. However, this churn needed to happen for various reasons.
Much of the old Starlark configuration semantics was rooted in the days when configuration files were static TOML files. Now that configuration files provide the power of a (Python-inspired) programming language, we are free to expose much more flexibility. But that flexibility requires refactoring things so the experience feels more native.
Many changes to Starlark were rooted in necessity. For example, the methods for invoking setup.py or pip install used to live on a Python distribution type and have been moved to a type representing executables. This is because the binary we are targeting influences how packaging actions behave. For example, if the binary only supports loading resources from memory (as opposed to standalone files), we need to know that when invoking the packaging tool so we can produce files (notably Python extension modules) compatible with the destination.
A major change to Starlark in 0.8 is around resource location handling. Before, you could define a static string denoting the resources policy for where things should be placed. And there were 10+ methods for adding different resource types (source, bytecode, extensions, package data) to different load locations (memory, filesystem). This mechanism is vastly simplified and more powerful in PyOxidizer 0.8!
In PyOxidizer 0.8, there is a single add_python_resource() method for adding a resource to a binary and the Starlark objects you add can denote where they should be added by defining attributes on those objects.
Furthermore, you can define a Starlark function that is called when resource objects are created to apply custom packaging rules using custom Starlark code defined in your PyOxidizer config file. So rather than having everyone try to abide by a few pre-canned policies for packaging resources, you can define a proper function in your config file that can be as complex as you want/need it to be! I feel this is vastly simpler and more powerful than implementing a custom DSL in static configuration files (like TOML, JSON, YAML, etc).
While the ability to implement your own arbitrarily complex packaging policies is useful, there is a new PythonPackagingPolicy Starlark type with enough flexibility to suit most needs.
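To give a feel for the shape of this, here is a rough sketch of a config file using a resource callback. Aside from add_python_resource() and PythonPackagingPolicy, which are mentioned above, the method and attribute names below are my assumptions rather than the documented API; consult the configuration docs for the real spelling.
# Sketch of a PyOxidizer Starlark config fragment. Names marked "assumed" are
# placeholders; see the official configuration docs for the real API.

def resource_callback(policy, resource):
    # Custom packaging rule: put my application's code on the filesystem next
    # to the executable and keep everything else in memory.
    if resource.name.startswith("myapp"):
        resource.add_location = "filesystem-relative:lib"  # assumed attribute/value
    else:
        resource.add_location = "in-memory"                # assumed attribute/value

def make_exe(dist):
    policy = dist.make_python_packaging_policy()           # assumed method
    policy.register_resource_callback(resource_callback)   # assumed method

    exe = dist.to_python_executable(
        name="myapp",
        packaging_policy=policy,
    )

    for resource in exe.pip_install(["myapp"]):            # assumed method
        exe.add_python_resource(resource)

    return exe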
Shipping oxidized_importer
During the development of PyOxidizer 0.8, I broke out the custom Rust-based Python meta-path importer used by PyOxidizer's run-time code into a standalone Python package. This sub-project is called oxidized_importer and I previously blogged about it.
PyOxidizer 0.8 ships oxidized_importer and makes all of its useful APIs available to Python. Read more in the official docs.
The new Python APIs should make debugging issues with PyOxidizer-packaged applications vastly simpler: I found them invaluable when tracking down user-reported bugs!
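For a flavor of that debugging workflow, here is a sketch based on my reading of the oxidized_importer docs; indexed_resources() and the resource attributes should be treated as assumptions. You can poke at the importer from a PyOxidizer-built binary's REPL:
import sys

# In a PyOxidizer-built binary, the Rust-backed meta-path finder is installed
# ahead of the standard import machinery.
finder = sys.meta_path[0]
print(type(finder))

# Assumption: the finder exposes its resource index via indexed_resources();
# each entry describes one packaged resource (module source, bytecode, etc.).
for resource in finder.indexed_resources()[:10]:
    print(resource.name)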
Tons of New Tests and Refactored Code
PyOxidizer was my first non-toy Rust project. And the quality of the Rust code I produced in early versions of PyOxidizer clearly showed it. And when I was in the rapid-prototyping phase of PyOxidizer, I eschewed writing tests in favor of short-term progress.
PyOxidizer 0.8 pays down a ton of technical debt in the code base. Lots of Rust code has been refactored and is using somewhat reasonable practices. I'm not yet a Rust guru. But I'm at the point where I cringe when I look at some of the early code I wrote, which is a good sign. I do have to say that Rust has been a dream to work with during this transition. Despite being a low-level language, my early misuse of Rust did not result in crashes like you would see in languages like C/C++. And Rust's seemingly omniscient compiler and IDE tools facilitating refactoring have ensured that code changes aren't accompanied by subtle random bugs that would occur in dynamic programming languages. I really need to write a dedicated post espousing the virtues of Rust...
There are a ton of new tests in PyOxidizer 0.8 and I now feel somewhat confident that the main branch of PyOxidizer should be considered production-ready at any time, assuming the tests pass. This will hopefully lead to more rapid releases in the future.
There are now tests for the pyembed Rust crate, which provides the run-time code for PyOxidizer-built binaries. We even have Python-based unit tests for validating that the Python-exposed APIs behave as expected. These tests have been invaluable for ensuring that the run-time code works as intended. So now when someone files a bug I can easily write a test to capture it and keep the code working as intended through various refactors.
The packaging-time Rust code has also gained its fair share of tests. We now have fairly comprehensive test coverage around how resources are added/packaged. Python extension modules have proved to be highly nuanced in how they are handled. Tremendously helping testing of extension modules is that we're able to run tests for platform non-native extensions! While not yet exposed/supported by Starlark configuration files, I've taught PyOxidizer's core Rust code to be cross-compiling aware so that we can e.g. test Windows or macOS behavior from Linux. Before, I'd have to test Windows wheel handling on Windows. But after writing a wheel parser in Rust and teaching PyOxidizer to use a different Python distribution for the host architecture from the target architecture, I'm now able to write tests for platform-specific functionality that run on any platform that PyOxidizer can run on. This may eventually lead to proper cross-compiling support (at least in some configuration). Time will tell. But the foundation is definitely there!
New Rust Crates
As part of the aforementioned refactoring of PyOxidizer's Rust code, I've been extracting some useful/generic functionality built as part of developing PyOxidizer to their own Rust crates.
As part of this release, I'm publishing the initial 0.1 release of the python-packaging crate (docs). This crate provides pure Rust code for various Python packaging related functionality. This includes:
- Rust types representing Python resource types (source modules, bytecode modules, extension modules, package resources, etc).
- Scanning the filesystem for Python resource files.
- Configuring an embedded Python interpreter.
- Parsing PKG-INFO and related files.
- Parsing wheel files.
- Collecting Python resources and serializing them to a data structure.
The crate is somewhat PyOxidizer centric. But if others are interested in improving its utility, I'll happily accept pull requests!
PyOxidizer's crates footprint now includes:
Major Documentation Updates
I strongly believe that software should be documented thoroughly and I strive for PyOxidizer's documentation to be useful and comprehensive.
There have been a lot of changes to PyOxidizer's documentation since the 0.7 release.
All configuration file documentation has been consolidated.
Likewise, I've attempted to consolidate a lot of the paved road documentation for how to use PyOxidizer in the Packaging User Guide section of the docs.
I'll be honest, since I have so much of PyOxidizer's workings internalized, it can be difficult for me to empathize with PyOxidizer's users. So if you have difficulty with the readability of the documentation, please file an issue and report what is confusing so the documentation can be improved!
Mercurial Shipping With PyOxidizer 0.8
PyOxidizer is arguably an epic yak shave of mine to help the Mercurial version control tool transition to Python 3 and Rust.
I'm pleased to report that Mercurial is now shipping PyOxidizer-built distributions on Windows as of the 5.2.2 release a few days ago! If a complex Python application like Mercurial can be configured to work with PyOxidizer, chances are your Python application will work as well.
What's Next
I view PyOxidizer 0.8 as a pivotal release where PyOxidizer is turning the corner from a prototyping science experiment to something more generally usable. The investments in test coverage and refactoring of the Rust internals are paving the way towards future features and bug fixes.
In upcoming releases, I'd like to close remaining known compatibility gaps with popular Python packages (such as numpy and other packages in the scientific/data space). I have a general idea of what work needs to be done and I've been laying the ground work via various refactorings to execute here.
I want a general theme of future releases to be eliminating reasons why people can't use PyOxidizer. PyOxidizer's historical origin was as a science experiment to see if single file Python applications were possible. It is clear that achieving that goal is fundamentally at odds with compatibility with tons of Python packages in the wild. I'd like to find a way where PyOxidizer can achieve 99% package compatibility by default so new users don't get discouraged when using PyOxidizer. And the subset of users who want single file executables can spend the additional effort required to achieve that.
At some point, I also want to make a pivot towards focusing on producing distributable artifacts (Debian/RPM packages, MSI installers, macOS DMG files, etc). I'm slightly bummed that I haven't made much progress here. But I have a vision in my mind of where I want to go (I'll be making a standalone Rust crate + Starlark dialect to facilitate producing distributable artifacts for any application) and I'm anticipating starting this work in the next few months. In the meantime, PyOxidizer 0.8 should be able to give people a directory tree that they can coerce into distributable artifacts using existing packaging tooling. That's not as turnkey as I would like it to be. But the technical problems around building a distributable Python application binary still need some work, and I view that as the most pressing need for the Python ecosystem. So I'll continue to focus there so there is a solid foundation to build upon.
In conclusion, I hope you enjoy the new release! Please report any issues or feedback in the GitHub issue tracker.