Technical Notes

CPython Initialization

Most code lives in pylifecycle.c.

Call tree with Python 3.7:

``Py_Initialize()``
  ``Py_InitializeEx()``
    ``_Py_InitializeFromConfig(_PyCoreConfig config)``
      ``_Py_InitializeCore(PyInterpreterState, _PyCoreConfig)``
        Sets up allocators.
        ``_Py_InitializeCore_impl(PyInterpreterState, _PyCoreConfig)``
          Does most of the initialization.
          Runtime, new interpreter state, thread state, GIL, built-in types,
          Initializes sys module and sets up sys.modules.
          Initializes builtins module.
          ``_PyImport_Init()``
            Copies ``interp->builtins`` to ``interp->builtins_copy``.
          ``_PyImportHooks_Init()``
            Sets up ``sys.meta_path``, ``sys.path_importer_cache``,
            ``sys.path_hooks`` to empty data structures.
          ``initimport()``
            ``PyImport_ImportFrozenModule("_frozen_importlib")``
            ``PyImport_AddModule("_frozen_importlib")``
            ``interp->importlib = importlib``
            ``interp->import_func = interp->builtins.__import__``
            ``PyInit__imp()``
              Initializes ``_imp`` module, which is implemented in C.
            ``sys.modules["_imp"} = imp``
            ``importlib._install(sys, _imp)``
            ``_PyImportZip_Init()``

      ``_Py_InitializeMainInterpreter(interp, _PyMainInterpreterConfig)``
        ``_PySys_EndInit()``
          ``sys.path = XXX``
          ``sys.executable = XXX``
          ``sys.prefix = XXX``
          ``sys.base_prefix = XXX``
          ``sys.exec_prefix = XXX``
          ``sys.base_exec_prefix = XXX``
          ``sys.argv = XXX``
          ``sys.warnoptions = XXX``
          ``sys._xoptions = XXX``
          ``sys.flags = XXX``
          ``sys.dont_write_bytecode = XXX``
        ``initexternalimport()``
          ``interp->importlib._install_external_importers()``
        ``initfsencoding()``
          ``_PyCodec_Lookup(Py_FilesystemDefaultEncoding)``
            ``_PyCodecRegistry_Init()``
              ``interp->codec_search_path = []``
              ``interp->codec_search_cache = {}``
              ``interp->codec_error_registry = {}``
              # This is the first non-frozen import during startup.
              ``PyImport_ImportModuleNoBlock("encodings")``
            ``interp->codec_search_cache[codec_name]``
            ``for p in interp->codec_search_path: p[codec_name]``
        ``initsigs()``
        ``add_main_module()``
          ``PyImport_AddModule("__main__")``
        ``init_sys_streams()``
          ``PyImport_ImportModule("encodings.utf_8")``
          ``PyImport_ImportModule("encodings.latin_1")``
          ``PyImport_ImportModule("io")``
          Consults ``PYTHONIOENCODING`` and gets encoding and error mode.
          Sets up ``sys.__stdin__``, ``sys.__stdout__``, ``sys.__stderr__``.
        Sets warning options.
        Sets ``_PyRuntime.initialized``, which is what ``Py_IsInitialized()``
        returns.
        ``initsite()``
          ``PyImport_ImportModule("site")``

CPython Importing Mechanism

Lib/importlib defines importing mechanisms and is 100% Python.

Programs/_freeze_importlib.c is a program that takes a path to an input .py file and path to output .h file. It initializes a Python interpreter and compiles the .py file to marshalled bytecode. It writes out a .h file with an inline const unsigned char _Py_M__importlib array containing bytecode.

Lib/importlib/_bootstrap_external.py compiled to Python/importlib_external.h with _Py_M__importlib_external[].

Lib/importlib/_bootstrap.py compiled to Python/importlib.h with _Py_M__importlib[].

Python/frozen.c has _PyImport_FrozenModules[] effectively mapping _frozen_importlib to importlib._bootstrap and _frozen_importlib_external to importlib._bootstrap_external.

initimport() calls PyImport_ImportFrozenModule("_frozen_importlib"), effectively import importlib._bootstrap. Module import doesn’t appear to have meaningful side-effects.

importlib._bootstrap.__import__ is installed as interp->import_func.

C implemented _imp module is initialized.

importlib._bootstrap._install(sys, _imp is called. Calls _setup(sys, _imp) and adds BuiltinImporter and FrozenImporter to sys.meta_path.

_setup() defines globals _imp and sys. Populates __name__, __loader__, __package__, __spec__, __path__, __file__, __cached__ on all sys.modules entries. Also loads builtins _thread, _warnings, and _weakref.

Later during interpreter initialization, initexternal() effectively calls importlib._bootstrap._install_external_importers(). This runs import _frozen_importlib_external, which is effectively import importlib._bootstrap_external. This module handle is aliased to importlib._bootstrap._bootstrap_external.

importlib._bootstrap_external import doesn’t appear to have significant side-effects.

importlib._bootstrap_external._install() is called with a reference to importlib._bootstrap. _setup() is called.

importlib._bootstrap._setup() imports builtins _io, _warnings, _builtins, marshal. Either posix or nt imported depending on OS. Various module-level attributes set defining run-time environment. This includes _winreg. SOURCE_SUFFIXES and EXTENSION_SUFFIXES are updated accordingly.

importlib._bootstrap._get_supported_file_loaders() returns various loaders. ExtensionFileLoader configured from _imp.extension_suffixes(). SourceFileLoader configured from SOURCE_SUFFIXES. SourcelessFileLoader configured from BYTECODE_SUFFIXES.

FileFinder.path_hook() called with all loaders and result added to sys.path_hooks. PathFinder added to sys.meta_path.

sys.modules After Interpreter Init

Module

Type

Source

__main__

add_main_module()

_abc

builtin

abc

_codecs

builtin

initfsencoding()

_frozen_importlib

frozen

initimport()

_frozen_importlib_external

frozen

initexternal()

_imp

builtin

initimport()

_io

builtin

importlib._bootstrap._setup()

_signal

builtin

initsigs()

_thread

builtin

importlib._bootstrap._setup()

_warnings

builtin

importlib._bootstrap._setup()

_weakref

builtin

importlib._bootstrap._setup()

_winreg

builtin

importlib._bootstrap._setup()

abc

py

builtins

builtin

_Py_InitializeCore_impl()

codecs

py

encodings via initfsencoding()

encodings

py

initfsencoding()

encodings.aliases

py

encodings

encodings.latin_1

py

init_sys_streams()

encodings.utf_8

py

init_sys_streams() + initfsencoding()

io

py

init_sys_streams()

marshal

builtin

importlib._bootstrap._setup()

nt

builtin

importlib._bootstrap._setup()

posix

builtin

importlib._bootstrap._setup()

readline

builtin

sys

builtin

_Py_InitializeCore_impl()

zipimport

builtin

initimport()

Modules Imported by site.py

_collections_abc _sitebuiltins _stat atexit genericpath os os.path posixpath rlcompleter site stat

Random Notes

Frozen importer iterates an array looking for module names. On each item, it calls _PyUnicode_EqualToASCIIString(), which verifies the search name is ASCII. Performing an O(n) scan for every frozen module if there are a large number of frozen modules could contribute performance overhead. A better frozen importer would use a map/hash/dict for lookups. This //may// require CPython API breakages, as the PyImport_FrozenModules data structure is documented as part of the public API and its value could be updated dynamically at run-time.

importlib._bootstrap cannot call import because the global import hook isn’t registered until after initimport().

importlib._bootstrap_external is the best place to monkeypatch because of the limited run-time functionality available during importlib._bootstrap.

It’s a bit wonky that Py_Initialize() will import modules from the standard library and it doesn’t appear possible to disable this. If site.py is disabled, non-extension builtins are limited to codecs, encodings, abc, and whatever encodings.* modules are needed by initfsencoding() and init_sys_streams().

An attempt was made to freeze the set of standard library modules loaded during initialization. However, the built-in extension importer doesn’t set all of the module attributes that are expected of the modules system. The from . import aliases in encodings/__init__.py is confused without these attributes. And relative imports seemed to have issues as well. One would think it would be possible to run an embedded interpreter with all standard library modules frozen, but this doesn’t work.