Technical Notes
How It Works
The first thing the build-* scripts do is bootstrap an environment
for building Python. On Linux, a base Docker image based on a deterministic
snapshot of Debian Wheezy is created. A modern binutils and GCC are built
in this environment. That modern GCC is then used to build a modern Clang.
Clang is then used to build all of Python’s dependencies (openssl, ncurses,
libedit, sqlite, etc). Finally, Python itself is built.
Python is built in such a way that extensions are statically linked
against their dependencies. For example, instead of the sqlite3 Python
extension having a run-time dependency on libsqlite3.so, the SQLite
symbols are statically linked into the Python extension object file.
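One way to confirm that an extension carries no run-time dependency on a shared library is to inspect its dynamic section. The sketch below is a hypothetical helper, not part of the build scripts. It assumes the text passed in is the output of GNU readelf -d for the extension file, and it only parses that text.

```python
import re

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return _NEEDED_RE.findall(readelf_output)


def has_runtime_dependency(readelf_output, prefix):
    """Report whether any needed library name starts with `prefix`."""
    return any(
        name.startswith(prefix) for name in needed_libraries(readelf_output)
    )
```

For a statically linked sqlite3 extension, has_runtime_dependency(output, "libsqlite3") would be expected to return False.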
From the built Python, we produce an archive containing the raw Python
distribution (as if you had run make install) as well as other files
useful for downstream consumers.
Setup.local Hackery
Python’s build system reads the Modules/Setup and Modules/Setup.local
files to influence how C extensions are built. By default, many extensions
have no entry in these files and the setup.py script performs work
to compile these extensions. (setup.py looks for headers, libraries,
etc., and sets up the proper compiler flags.)
setup.py doesn’t provide a lot of flexibility and relies on a lot of
default behavior in distutils as well as other inline code in setup.py.
This default behavior often conflicts with our goal of producing a
standalone Python distribution.
Since the build environment is mostly deterministic and since we have
special requirements, we generate a custom Setup.local file that
builds C extensions in a specific manner. The undesirable behavior of
setup.py is bypassed and the Python C extensions are compiled just
the way we want.
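To make the idea of a generated Setup.local concrete, here is a minimal sketch of rendering module lines in the Modules/Setup syntax (module name, then sources, then compiler flags and libraries). The helper names, the source file list, the define, and the /deps paths are all hypothetical and chosen for illustration. The use of the *static* marker is also an assumption about how one might lay out the file, not a description of the project's actual generator.

```python
def format_setup_line(name, sources, defines=(), include_dirs=(), libraries=()):
    """Build one module line: <module> <sources> <-D/-I flags> <libraries>."""
    parts = [name, *sources]
    parts += [f"-D{define}" for define in defines]
    parts += [f"-I{directory}" for directory in include_dirs]
    # Libraries are passed through verbatim, e.g. a path to a static archive.
    parts += list(libraries)
    return " ".join(parts)


def render_setup_local(lines):
    """Place the module lines under a *static* marker (assumed layout)."""
    return "\n".join(["*static*", *lines]) + "\n"


if __name__ == "__main__":
    # Illustrative values only; real source lists and flags differ per version.
    line = format_setup_line(
        "_sqlite3",
        ["_sqlite/module.c"],
        defines=["EXAMPLE=1"],
        include_dirs=["/deps/include"],
        libraries=["/deps/lib/libsqlite3.a"],
    )
    print(render_setup_local([line]), end="")
```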
Dependency Notes
DBM
Python has the option of building its _dbm extension against
NDBM, GDBM, and Berkeley DB. Both NDBM and GDBM are GNU GPL Version 3.
Modern versions of Berkeley DB are GNU AGPL v3. Versions 6.0.19 and
older are licensed under the Sleepycat License. The Sleepycat License
is more permissive. So we build the _dbm extension against BDB 6.0.19.
We explicitly disable the _gdbm extension on all targets to avoid
the GPL dependency.
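Consumers who want to verify which DBM extensions a given distribution ships can probe for them at run time. The following is a small sketch using only importlib from the standard library; the function names are hypothetical.

```python
import importlib.util


def extension_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


def dbm_backends():
    """Map each DBM extension module name to whether it is present."""
    return {name: extension_available(name) for name in ("_dbm", "_gdbm")}


if __name__ == "__main__":
    for module, present in dbm_backends().items():
        print(f"{module}: {'present' if present else 'missing'}")
```

Given the build choices described above, _gdbm would be expected to be reported as missing in these distributions.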
readline / libedit / ncurses
Python has the option of building its readline extension against
either libreadline or libedit. libreadline is licensed under GNU
GPL Version 3 and libedit has a more permissive license.
libedit/libreadline link against a curses library, most likely
ncurses. And ncurses has tie-ins with a terminal database. This
is a thorny situation, as terminal databases can be difficult to
distribute because end-users often want software to respect their
terminal databases. But for that to work, ncurses needs to be compiled
in a way that respects the user’s environment.
On macOS, we use the system libedit and libncurses, which are
typically provided in /usr/lib.
On Linux, we build libedit and ncurses from source and statically
link against their respective libraries. Project releases before 2023 linked
against readline on Linux.
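Because the backing library differs between releases and platforms, code that depends on readline-specific behavior may want to detect which library is in use. The sketch below relies on the readline.backend attribute available in Python 3.13 and later, and otherwise falls back to looking for "libedit" in the module docstring. The function name is hypothetical.

```python
def readline_backend():
    """Return "editline", "readline", or None if the module is unavailable."""
    try:
        import readline
    except ImportError:
        # The readline extension is not present in this distribution.
        return None
    # Python 3.13+ exposes the backing library name directly.
    backend = getattr(readline, "backend", None)
    if backend is not None:
        return backend
    # Older versions: a libedit-backed module mentions libedit in its docstring.
    return "editline" if "libedit" in (readline.__doc__ or "") else "readline"


if __name__ == "__main__":
    print(readline_backend())
```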
gettext / locale Module
The locale Python module exposes some functionality from the gettext
software (specifically libintl). (Technically, this functionality is exposed
from the _locale C extension module and locale re-exports symbols.)
gettext is licensed under GPL version 3 or later, so having it statically
linked in the Python distribution via the _locale module can have licensing
implications.
Python’s configure script probes for the ability to compile/link with
-lintl. If it works, Python is linked against libintl. If it doesn’t,
libintl is omitted. (Search configure for ac_cv_lib_intl_textdomain
and -lintl references.)
With the container-based build environment on Linux, presence of gettext
and libintl is deterministic. However, on macOS where there is no
sandboxing of the build environment, Python’s configure script can find and
use a gettext/libintl installed outside the system default (e.g. via
Homebrew or MacPorts). This can result in the built Python referencing a shared
library not reliably present on every macOS machine. So our build system
disables the configure check.
This means that the gettext/libintl features in the Python distribution
are not available.
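The libintl-backed functions are only re-exported by locale when the _locale extension was linked against libintl, so their absence can be detected at run time. A minimal sketch, with a hypothetical function name:

```python
import locale

# Functions that locale re-exports from _locale only when libintl is linked in.
_LIBINTL_FUNCTIONS = (
    "gettext",
    "dgettext",
    "dcgettext",
    "textdomain",
    "bindtextdomain",
)


def libintl_functions_present():
    """Map each libintl-backed function name to whether locale exposes it."""
    return {name: hasattr(locale, name) for name in _LIBINTL_FUNCTIONS}


if __name__ == "__main__":
    for name, present in libintl_functions_present().items():
        print(f"locale.{name}: {'available' if present else 'not available'}")
```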
libnsl / nis Module
The nis Python extension module has a dependency on libnsl.
libnsl has historically been in base Linux distribution installations,
but it is being phased out and is an optional install in modern
versions of Fedora and RHEL.
Because the nis extension is likely to be rarely used, we’ve decided
not to build it rather than add complexity to deal with
the libnsl dependency. See further discussion in
https://github.com/indygreg/python-build-standalone/issues/51.
If nis functionality is important to you, please file a GitHub issue
to request it.
Upgrading CPython
This section documents some of the work that needs to be performed when upgrading CPython major versions.
Review Release Notes
CPython’s release notes often have a section on build system changes (e.g. https://docs.python.org/3/whatsnew/3.8.html#build-and-c-api-changes). These sections must be reviewed.
Modules/Setup
The Modules/Setup file defines the default extension build settings
for boring extensions which are always compiled the same way.
We need to audit it for differences such as added or removed extensions and changes to compile settings, in case we have special code handling an extension defined in this file.
See code in cpython.py dealing with this file.
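Part of this audit can be mechanized by comparing the active module lines of the old and new Modules/Setup files. The sketch below is a hypothetical helper, not the logic in cpython.py. It assumes the simple line format of module name followed by arguments, ignores commented-out entries (including the default entries CPython ships commented out), and skips markers such as *static* and NAME=value assignments.

```python
def parse_setup_modules(text):
    """Return {module name: argument list} for active lines of a Setup file."""
    modules = {}
    # Join backslash-continued lines before splitting.
    joined = text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        tokens = line.split()
        first = tokens[0]
        # Skip section markers (*static*, *shared*, ...) and variable assignments.
        if first.startswith("*") or "=" in first:
            continue
        modules[first] = tokens[1:]
    return modules


def diff_setup_modules(old, new):
    """Summarize added, removed, and changed modules between two parses."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(n for n in set(old) & set(new) if old[n] != new[n]),
    }
```

Feeding it the text of Modules/Setup from the previous and the new CPython release gives a short list of entries to review by hand.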
setup.py / static-modules
The setup.py script in the Python source distribution defines
logic for dynamically building C extensions depending on environment
settings.
Because we don’t like what this file does by default in many cases,
we have instead defined static compilation invocations for various
extensions in static-modules.* files. Presence of an extension
in these files overrides CPython’s setup.py logic. Essentially what
we’ve done is encoded what setup.py would have done into our
static-modules.* files, bypassing setup.py.
This means that we need to audit setup.py every time we perform
an upgrade to see if we need to adjust the content of our
static-modules.* files.
A telltale way to find added extensions is to look for .so files
in python/install/lib/pythonX.Y/lib-dynload. If an extension
exists in a static build, it is being built by setup.py and
we may be missing an entry in our static-modules.* files.
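The lib-dynload check can be scripted. This sketch lists the extension names found as .so files in a directory and reports any that are not in an expected set. The function names and the idea of an "expected" set are assumptions for illustration; the module name is taken as the part of the file name before the first dot.

```python
from pathlib import Path


def dynamic_extension_names(lib_dynload):
    """Return the module names of all .so files directly in `lib_dynload`."""
    # e.g. "_sqlite3.cpython-38-x86_64-linux-gnu.so" -> "_sqlite3"
    return {path.name.split(".", 1)[0] for path in Path(lib_dynload).glob("*.so")}


def unexpected_extensions(lib_dynload, expected):
    """Return sorted names built as shared objects but absent from `expected`."""
    return sorted(dynamic_extension_names(lib_dynload) - set(expected))
```

For a static build, calling unexpected_extensions with an empty expected set on python/install/lib/pythonX.Y/lib-dynload lists every extension that setup.py built on its own.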
The most robust method to audit changes is to run a build of CPython
out of a source checkout and then manually compare the compiler
invocations for each extension against what exists in our
static-modules.* files. Differences like missing source files
should be obvious, as they usually result in a compilation failure.
But differences in preprocessor defines are more subtle and can
sneak in if we aren’t careful.
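Since differences in preprocessor defines are the easiest to miss, it can help to extract the -D flags from two compiler command lines and diff them. The sketch below is a hypothetical aid for the manual comparison, assuming each invocation is available as a single shell-style command string.

```python
import shlex


def preprocessor_defines(command):
    """Return the set of -D definitions in a shell-style compiler command."""
    defines = set()
    tokens = shlex.split(command)
    for index, token in enumerate(tokens):
        if token == "-D" and index + 1 < len(tokens):
            # Separated form: -D NAME=value
            defines.add(tokens[index + 1])
        elif token.startswith("-D") and len(token) > 2:
            # Joined form: -DNAME=value
            defines.add(token[2:])
    return defines


def compare_defines(ours, theirs):
    """Report defines present in only one of the two invocations."""
    a, b = preprocessor_defines(ours), preprocessor_defines(theirs)
    return {"only_ours": sorted(a - b), "only_theirs": sorted(b - a)}
```

Here "ours" would be the invocation implied by a static-modules.* entry and "theirs" the one logged by a stock CPython build for the same extension.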