API Reference

Module Level Functions

oxidized_importer.decode_source(io_module, source_bytes) str

Decodes Python source code bytes to a str.

This is effectively a reimplementation of importlib._bootstrap_external.decode_source().
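The standard library exposes the same routine as importlib.util.decode_source(source_bytes). The stdlib-only sketch below shows the behavior decode_source() is meant to reproduce: the PEP 263 encoding declaration is honored and Windows line endings are translated to plain newlines. It does not call oxidized_importer, so the extra io_module argument is not exercised here.

```python
import importlib.util

# Latin-1 encoded source with a PEP 263 coding declaration and \r\n line endings.
source_bytes = b"# -*- coding: latin-1 -*-\r\nname = '\xe9'\r\n"

# Decodes using the declared encoding and translates \r\n to \n.
text = importlib.util.decode_source(source_bytes)
print(repr(text))
```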

oxidized_importer.find_resources_in_path(path) List

This function will scan the specified filesystem path and return an iterable of objects representing found resources. Each object will be an instance of one of the types documented in oxidized_importer Python Resource Types.

Only directories can be scanned.

oxidized_importer.register_pkg_resources()

Enables pkg_resources integration.

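This function effectively registers oxidized_importer’s types with pkg_resources: pkg_resources_find_distributions() is registered to resolve distributions for OxidizedPathEntryFinder instances, and OxidizedPkgResourcesProvider is registered so pkg_resources can access package metadata and resources.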

It is safe to call this function multiple times, as behavior should be deterministic.

oxidized_importer.pkg_resources_find_distributions(finder: OxidizedPathEntryFinder, path_item: str, only=False) list

Resolve pkg_resources.Distribution instances given an OxidizedPathEntryFinder and search criteria.

This function is what is registered with pkg_resources for distribution resolution and you likely don’t need to call it directly.

The OxidizedFinder Class

class oxidized_importer.OxidizedFinder

A meta path finder that resolves indexed resources. See OxidizedFinder Meta Path Finder for more high-level documentation.

This type implements the following interfaces:

  • importlib.abc.MetaPathFinder

  • importlib.abc.Loader

  • importlib.abc.InspectLoader

  • importlib.abc.ExecutionLoader

See the importlib.abc documentation for more on these interfaces.

In addition to the methods on the above interfaces, the following methods defined elsewhere in importlib are exposed:

  • get_resource_reader(fullname: str) -> importlib.abc.ResourceReader

  • find_distributions(context: Optional[DistributionFinder.Context]) -> [Distribution]

ResourceReader is documented alongside other importlib.abc interfaces. find_distributions() is documented in importlib.metadata.

Instances have additional functionality beyond what is defined by importlib. This functionality allows you to construct, inspect, and manipulate instances.

multiprocessing_set_start_method

(Optional[str]) Value to pass to multiprocessing.set_start_method() when the multiprocessing module is imported.

None means the method won’t be called.

origin

(str) The path this instance is using as the anchor for relative path references.

path_hook_base_str

(str) The base path that the path hook handler on this instance will respond to.

This value is often the same as sys.executable but isn’t guaranteed to be that exact value.

pkg_resources_import_auto_register

(bool) Whether this instance will be registered via pkg_resources.register_finder() upon this instance importing the pkg_resources module.

__new__(cls, relative_path_origin: Optional[os.PathLike]) OxidizedFinder

Construct a new instance of OxidizedFinder.

New instances of OxidizedFinder can be constructed like normal Python types:

from oxidized_importer import OxidizedFinder

finder = OxidizedFinder()

The constructor takes the following named arguments:

relative_path_origin

A path-like object denoting the filesystem path that should be used as the origin value for relative path resources. Filesystem-based resources are stored as a relative path to an anchor value. This is that anchor value. If not specified, the directory of the current executable will be used.

See the python_packed_resources Rust crate for the specification of the binary data blob defining packed resources data.

Important

The packed resources data format is still evolving. It is recommended to use the same version of the oxidized_importer extension to produce and consume this data structure to ensure compatibility.

index_bytes(data: bytes) None

This method parses any bytes-like object and indexes the resources within.

index_file_memory_mapped(path: pathlib.Path) None

This method parses the file at the given path-like argument and indexes the resources within. Memory mapped I/O is used to read the file. Rust manages the memory map via the memmap crate: this does not use the Python interpreter’s memory mapping code.

index_interpreter_builtins() None

This method indexes Python resources that are built-in to the Python interpreter itself. This indexes built-in extension modules and frozen modules.

index_interpreter_builtin_extension_modules() None

This method will index Python extension modules that are compiled into the Python interpreter itself.

index_interpreter_frozen_modules() None

This method will index Python modules whose bytecode is frozen into the Python interpreter itself.

indexed_resources() List[OxidizedResource]

This method returns a list of resources that are indexed by the instance. It allows Python code to inspect what the finder knows about.

Any mutations to returned values are not reflected in the finder.

See OxidizedResource for more on the returned type.

add_resource(resource: OxidizedResource)

This method registers an OxidizedResource instance with the finder, enabling the finder to use it to service lookups.

When an OxidizedResource is registered, its data is copied into the finder instance. So changes to the original OxidizedResource are not reflected on the finder. (This is because OxidizedFinder maintains an index and it is important for the data behind that index to not change out from under it.)

Resources are stored in an internal hash map where they are indexed by the name attribute. When a resource is added, any existing resource with the same name has its data replaced by the incoming OxidizedResource instance.

If you have source code and want to produce bytecode, you can do something like the following:

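import marshal

from oxidized_importer import OxidizedResource


def register_module(finder, module_name, source):
    # in_memory_source holds bytes, so encode str source as UTF-8.
    if isinstance(source, str):
        source = source.encode("utf-8")

    # Compile and serialize to raw bytecode (no .pyc header).
    code = compile(source, module_name, "exec")
    bytecode = marshal.dumps(code)

    resource = OxidizedResource()
    resource.name = module_name
    resource.is_module = True
    resource.in_memory_bytecode = bytecode
    resource.in_memory_source = source

    finder.add_resource(resource)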

add_resources(resources: List[OxidizedResource])

This method is syntactic sugar for calling add_resource() for every item in an iterable. It is exposed because function call overhead in Python can be non-trivial and it can be quicker to pass in an iterable of OxidizedResource than to call add_resource() potentially hundreds of times.

serialize_indexed_resources(ignore_builtin=True, ignore_frozen=True) bytes

This method serializes all resources currently indexed by the instance into an opaque bytes instance. The returned data can be loaded into a separate OxidizedFinder instance by passing it to OxidizedFinder.index_bytes().

Arguments:

ignore_builtin (bool)

Whether to exclude built-in extension modules from the serialized data.

Default is True.

ignore_frozen (bool)

Whether to exclude frozen modules from the serialized data.

Default is True.

Entries for built-in and frozen modules are ignored by default because they aren’t portable, as they are compiled into the interpreter and aren’t guaranteed to work from one Python interpreter to another. The serialized format does support expressing them. Use at your own risk.

path_hook(path: Union[str, bytes, os.PathLike[AnyStr]]) OxidizedPathEntryFinder

Implements a path hook for obtaining a PathEntryFinder from a sys.path entry. See Paths Hooks Compatibility for details.

Raises ImportError if the given path isn’t serviceable. The exception should have .__cause__ set to an inner exception with more details on why the path was rejected.
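Entries in sys.path_hooks follow a simple protocol: given a path entry, return a path entry finder or raise ImportError. The sketch below uses hypothetical names (demo_path_hook, DemoPathEntryFinder) and a made-up acceptance rule to illustrate that protocol, including chaining the detailed reason as __cause__ as described above. It is not OxidizedFinder’s actual acceptance logic.

```python
import os
import sys


class DemoPathEntryFinder:
    """Hypothetical stand-in for a path entry finder such as OxidizedPathEntryFinder."""

    def __init__(self, path: str):
        self.path = path

    def find_spec(self, fullname, target=None):
        # This demo finder never locates anything.
        return None


def demo_path_hook(path):
    """Hypothetical path hook: only services entries under sys.prefix."""
    try:
        # os.fsdecode() accepts str, bytes, and os.PathLike values.
        entry = os.fsdecode(path)
        if not entry.startswith(sys.prefix):
            raise ValueError(f"{entry!r} is not under {sys.prefix!r}")
    except (TypeError, ValueError) as e:
        # "raise ... from e" sets __cause__ to the detailed inner exception.
        raise ImportError(f"path not serviceable: {path!r}") from e
    return DemoPathEntryFinder(entry)
```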

The OxidizedDistribution Class

class oxidized_importer.OxidizedDistribution

Represents the metadata of a Python package. Comparable to importlib.metadata.Distribution. Instances of this type are emitted by OxidizedFinder.find_distributions.

classmethod from_name(cls, name: str) OxidizedDistribution

Resolve the instance for the given package name.

classmethod discover(cls, **kwargs) list[OxidizedDistribution]

Resolve instances for all known packages.

read_text(filename) str

Attempt to read the metadata file with the given filename.

property metadata
Type: email.message.EmailMessage

Return the parsed metadata for this distribution.
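Distribution METADATA files use an email-header format, which is why this property yields an email.message.EmailMessage. A stdlib-only sketch of parsing such a file into that type follows. The package fields are hypothetical, and whether OxidizedDistribution parses the data in exactly this way is an assumption.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser

# Hypothetical contents of a <package>-<version>.dist-info/METADATA file.
metadata_text = (
    "Metadata-Version: 2.1\n"
    "Name: my-package\n"
    "Version: 1.0.0\n"
    "Requires-Dist: requests\n"
)

# policy.default makes the parser return EmailMessage instead of Message.
metadata = Parser(policy=policy.default).parsestr(metadata_text)

print(metadata["Name"], metadata["Version"], metadata.get_all("Requires-Dist"))
```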

property name
Type: str

Return the Name metadata for this distribution package.

property _normalized_name
Type: str

Return the normalized version of the Name.
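The exact normalization rule used here is not stated. PEP 503 defines one widely used rule: collapse runs of "-", "_", and "." into a single "-" and lowercase the result. A sketch of that rule, assuming (not confirming) that _normalized_name behaves similarly; normalize_name is a hypothetical helper:

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("Foo.Bar__baz"))
```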

property version
Type: str

Return the Version metadata for this distribution package.

property entry_points

Resolve entry points for this distribution package.

property files

Not implemented. Always raises an exception when accessed.

property requires

Generated requirements specified for this distribution.

The OxidizedResourceReader Class

class oxidized_importer.OxidizedResourceReader

An importlib.abc.ResourceReader implementation for OxidizedFinder.

open_resource(resource: str)
resource_path(resource: str)
is_resource(name: str) bool
contents() list[str]
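These four methods make up the importlib.abc.ResourceReader interface. To illustrate their contract, here is a hypothetical dict-backed reader (InMemoryResourceReader is not part of oxidized_importer) whose data mirrors the in_memory_package_resources mapping of virtual filenames to bytes. It assumes flat filenames with no subdirectories, and OxidizedResourceReader’s own handling of these cases may differ.

```python
import io
from typing import Dict, Iterable


class InMemoryResourceReader:
    """Hypothetical reader following the importlib.abc.ResourceReader protocol."""

    def __init__(self, files: Dict[str, bytes]):
        # Virtual filename -> raw data, like in_memory_package_resources.
        self._files = dict(files)

    def open_resource(self, resource: str) -> io.BytesIO:
        try:
            return io.BytesIO(self._files[resource])
        except KeyError:
            raise FileNotFoundError(resource) from None

    def resource_path(self, resource: str) -> str:
        # In-memory data has no concrete filesystem path; the protocol
        # says to raise FileNotFoundError in that case.
        raise FileNotFoundError(resource)

    def is_resource(self, name: str) -> bool:
        return name in self._files

    def contents(self) -> Iterable[str]:
        return list(self._files)
```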

The OxidizedPathEntryFinder Class

class oxidized_importer.OxidizedPathEntryFinder

A path entry finder that can find resources contained in an associated OxidizedFinder instance.

Instances are created via OxidizedFinder.path_hook.

Direct use of OxidizedPathEntryFinder is generally unnecessary: OxidizedFinder is the primary interface to the custom importer.

See Paths Hooks Compatibility for more on path hook and path entry finder behavior in oxidized_importer.

find_spec(fullname: str, target: Optional[types.ModuleType] = None) Optional[importlib.machinery.ModuleSpec]

Search for modules visible to the instance.

invalidate_caches() None

Invoke the same method on the OxidizedFinder instance with which the OxidizedPathEntryFinder instance was constructed.

iter_modules(prefix: str = '') List[pkgutil.ModuleInfo]

Iterate over the visible modules. This method complies with pkgutil.iter_modules’s protocol.
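For reference, that protocol yields pkgutil.ModuleInfo(module_finder, name, ispkg) tuples, with prefix prepended to each name. The stdlib-only sketch below exercises the protocol against a temporary directory through the standard filesystem finder rather than OxidizedPathEntryFinder. The module names are hypothetical.

```python
import os
import pkgutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A plain module and a package on disk.
    open(os.path.join(root, "alpha.py"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()

    # Each entry is a pkgutil.ModuleInfo; "demo." is prepended to each name.
    infos = sorted(pkgutil.iter_modules([root], prefix="demo."), key=lambda m: m.name)

for info in infos:
    print(info.name, info.ispkg)
```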

The OxidizedPkgResourcesProvider Class

class oxidized_importer.OxidizedPkgResourcesProvider

A pkg_resources.IMetadataProvider and pkg_resources.IResourceProvider enabling pkg_resources to access package metadata and resources.

All members of the aforementioned interfaces are implemented. Divergence from pkg_resources defined behavior is documented next to the method.

has_metadata(name: str) bool
get_metadata(name: str) str
get_metadata_lines(name: str) List[str]

Returns a list instead of a generator.

metadata_isdir(name: str) bool
metadata_listdir(name: str) List[str]
run_script(script_name: str, namespace: Any)

Always raises NotImplementedError.

Please leave a comment in #384 if you would like this functionality implemented.

get_resource_filename(manager, resource_name: str)

Always raises NotImplementedError.

This behavior appears to be allowed given code in pkg_resources. However, it means that pkg_resources.resource_filename() will not work. Please leave a comment in #383 if you would like this functionality implemented.

get_resource_stream(manager, resource_name: str) io.BytesIO
get_resource_string(manager, resource_name: str) bytes
has_resource(resource_name: str) bool
resource_isdir(resource_name: str) bool
resource_listdir(resource_name: str) List[str]

Returns a list instead of a generator.

The OxidizedResource Class

class oxidized_importer.OxidizedResource

Represents a resource that is indexed by an OxidizedFinder instance.

Each instance represents a named entity with associated metadata and data. e.g. an instance can represent a Python module with associated source and bytecode.

New instances can be constructed via OxidizedResource(). This returns an instance whose name is "" and whose other properties are all None or False.

is_module

A bool indicating if this resource is a Python module. Python modules are backed by source or bytecode.

is_builtin_extension_module

A bool indicating if this resource is a Python extension module built-in to the Python interpreter.

is_frozen_module

A bool indicating if this resource is a Python module whose bytecode is frozen into the Python interpreter.

is_extension_module

A bool indicating if this resource is a Python extension module.

is_shared_library

A bool indicating if this resource is a shared library.

name

The str name of the resource.

is_package

A bool indicating if this resource is a Python package.

is_namespace_package

A bool indicating if this resource is a Python namespace package.

in_memory_source

bytes or None holding Python module source code that should be imported from memory.

in_memory_bytecode

bytes or None holding Python module bytecode that should be imported from memory.

This is raw Python bytecode, as produced from the marshal module. .pyc files have a header before this data that will need to be stripped should you want to move data from a .pyc file into this field.
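A stdlib-only sketch of stripping that header, assuming CPython 3.7 or newer, where the .pyc header is 16 bytes (a 4-byte magic number, 4 bytes of flags, and 8 bytes holding either source mtime and size or a source hash, per PEP 552). The helper name pyc_to_raw_bytecode is hypothetical.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

PYC_HEADER_SIZE = 16  # magic (4) + flags (4) + mtime/size or hash (8)


def pyc_to_raw_bytecode(pyc_data: bytes) -> bytes:
    """Strip the .pyc header, leaving the raw marshal data."""
    if pyc_data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("not a .pyc file for this interpreter version")
    return pyc_data[PYC_HEADER_SIZE:]


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as fh:
        fh.write("answer = 42\n")
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "mod.pyc"), doraise=True)
    with open(pyc_path, "rb") as fh:
        raw = pyc_to_raw_bytecode(fh.read())

# The stripped data is exactly what marshal.loads() expects.
namespace = {}
exec(marshal.loads(raw), namespace)
print(namespace["answer"])
```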

in_memory_bytecode_opt1

bytes or None holding Python module bytecode at optimization level 1 that should be imported from memory.

This is raw Python bytecode, as produced from the marshal module. .pyc files have a header before this data that will need to be stripped should you want to move data from a .pyc file into this field.

in_memory_bytecode_opt2

bytes or None holding Python module bytecode at optimization level 2 that should be imported from memory.

This is raw Python bytecode, as produced from the marshal module. .pyc files have a header before this data that will need to be stripped should you want to move data from a .pyc file into this field.
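These optimization levels correspond to the optimize argument of compile(): level 1 removes assert statements, and level 2 additionally removes docstrings. A stdlib-only sketch producing headerless marshal data at each level, the raw form these fields describe; the module source and the run() helper are hypothetical.

```python
import marshal

source = '''
def greet():
    """Docstrings are removed at optimization level 2."""
    return "hi"

assert False, "assert statements are removed at levels 1 and 2"
'''

# Raw marshal data (no .pyc header) for each optimization level.
bytecode = {
    level: marshal.dumps(compile(source, "demo_module", "exec", optimize=level))
    for level in (0, 1, 2)
}


def run(level):
    """Execute the bytecode for one level and return its namespace."""
    namespace = {}
    exec(marshal.loads(bytecode[level]), namespace)
    return namespace


for level in (1, 2):
    print(level, run(level)["greet"].__doc__)
```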

in_memory_extension_module_shared_library

bytes or None holding native machine code defining a Python extension module shared library that should be imported from memory.

in_memory_package_resources

dict[str, bytes] or None holding resource files to make available to the importlib.resources APIs via in-memory data access. The name of this object will be a Python package name. Keys in this dict are virtual filenames under that package. Values are raw file data.

in_memory_distribution_resources

dict[str, bytes] or None holding resource files to make available to the importlib.metadata API via in-memory data access. The name of this object will be a Python package name. Keys in this dict are virtual filenames. Values are raw file data.

in_memory_shared_library

bytes or None holding a shared library that should be imported from memory.

shared_library_dependency_names

list[str] or None holding the names of shared libraries that this resource depends on. If this resource defines a loadable shared library, this list can be used to express what other shared libraries it depends on.

relative_path_module_source

pathlib.Path or None holding the relative path to Python module source that should be imported from the filesystem.

relative_path_module_bytecode

pathlib.Path or None holding the relative path to Python module bytecode that should be imported from the filesystem.

relative_path_module_bytecode_opt1

pathlib.Path or None holding the relative path to Python module bytecode at optimization level 1 that should be imported from the filesystem.

relative_path_module_bytecode_opt2

pathlib.Path or None holding the relative path to Python module bytecode at optimization level 2 that should be imported from the filesystem.

relative_path_extension_module_shared_library

pathlib.Path or None holding the relative path to a Python extension module that should be imported from the filesystem.

relative_path_package_resources

dict[str, pathlib.Path] or None holding resource files to make available to the importlib.resources APIs via filesystem access. The name of this object will be a Python package name. Keys in this dict are filenames under that package. Values are relative paths to files from which to read data.

relative_path_distribution_resources

dict[str, pathlib.Path] or None holding resource files to make available to the importlib.metadata APIs via filesystem access. The name of this object will be a Python package name. Keys in this dict are filenames under that package. Values are relative paths to files from which to read data.

The OxidizedResourceCollector Class

class oxidized_importer.OxidizedResourceCollector

Provides functionality for turning instances of Python resource types into a collection of OxidizedResource for loading into an OxidizedFinder instance.

__new__(cls, allowed_locations: list[str])

Construct an instance by defining locations that resources can be loaded from.

The accepted string values are in-memory and filesystem-relative.

allowed_locations

(list[str]) Exposes allowed locations where resources can be loaded from.

add_in_memory_resource(resource)

Adds a Python resource type (PythonModuleSource, PythonModuleBytecode, etc.) to the collector and marks it for loading via in-memory mechanisms.

add_filesystem_relative(prefix, resource)

Adds a Python resource type (PythonModuleSource, PythonModuleBytecode, etc.) to the collector and marks it for loading via a relative path next to some origin path (as specified to the OxidizedFinder). That relative path can have a prefix value prepended to it. If no prefix is desired and you want the resource placed next to the origin, use an empty str for prefix.

oxidize() tuple[list[OxidizedResource], list[tuple[pathlib.Path, bytes, bool]]]

Takes all the resources collected so far and turns them into data structures to facilitate later use.

The first element in the returned tuple is a list of OxidizedResource instances.

The second is a list of 3-tuples containing the relative filesystem path for a file, the content to write to that path, and whether the file should be marked as executable.
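A stdlib-only sketch of consuming that second element: each file is written under a destination directory and marked executable where requested. Treating the destination as the origin later given to OxidizedFinder as relative_path_origin is an assumption. The helper write_oxidized_files and the sample file list are hypothetical.

```python
import os
import pathlib
import stat
import tempfile
from typing import Iterable, Tuple


def write_oxidized_files(
    dest: pathlib.Path, files: Iterable[Tuple[pathlib.Path, bytes, bool]]
) -> None:
    """Write (relative_path, data, executable) tuples under dest."""
    for rel_path, data, executable in files:
        target = dest / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        if executable:
            # Add execute permission for user, group, and others.
            mode = target.stat().st_mode
            target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


with tempfile.TemporaryDirectory() as tmp:
    dest = pathlib.Path(tmp)
    write_oxidized_files(dest, [
        (pathlib.Path("lib/demo.py"), b"x = 1\n", False),
        (pathlib.Path("bin/tool"), b"#!/bin/sh\necho hi\n", True),
    ])
    demo_contents = (dest / "lib" / "demo.py").read_bytes()
    tool_is_executable = os.access(dest / "bin" / "tool", os.X_OK)
```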

The OxidizedResourceReader Class

class oxidized_importer.OxidizedResourceReader

An implementation of importlib.abc.ResourceReader to facilitate resource reading from an OxidizedFinder.

See Support for ResourceReader for more.

The OxidizedZipFinder Class

class oxidized_importer.OxidizedZipFinder

A meta path finder that operates on zip files.

This type attempts to be a pure Rust reimplementation of the Python standard library zipimport.zipimporter type.

This type implements the following interfaces:

  • importlib.abc.MetaPathFinder

  • importlib.abc.Loader

  • importlib.abc.InspectLoader

from_zip_data(cls, source: bytes, path: Union[bytes, str, pathlib.Path, None] = None) OxidizedZipFinder

Construct an instance from zip archive data.

The source argument can be any bytes-like object. A reference to the original Python object will be kept and zip I/O will be performed against the memory tracked by that object. It is possible to trigger an out-of-bounds memory read if the source object is mutated after being passed into this function.

The path argument denotes the path to the zip archive. This path will be advertised in __file__ attributes. If not defined, the path of the current executable will be used.

from_path(cls, path: Union[bytes, str, pathlib.Path]) OxidizedZipFinder

Construct an instance from a filesystem path.

The path argument is the path to a file containing zip archive data. The file will be opened using Rust file I/O, and its content will be read lazily.

If you don’t already have a copy of the zip data and the zip file will be immutable for the lifetime of the constructed instance, this method may yield better performance than opening the file, reading its content, and calling OxidizedZipFinder.from_zip_data() because it may incur less overall I/O.
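Because OxidizedZipFinder aims to reimplement zipimport.zipimporter, the standard library’s behavior is a useful reference. This stdlib-only sketch builds an archive in memory (the kind of bytes from_zip_data() accepts), then imports from it through zipimport by placing the archive file on sys.path. The module name zipdemo_mod is hypothetical.

```python
import importlib
import io
import os
import sys
import tempfile
import zipfile

# Build a zip archive entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("zipdemo_mod.py", "GREETING = 'hello from a zip'\n")
zip_bytes = buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    # The stdlib zipimporter reads from a file, so write the bytes out first.
    archive = os.path.join(tmp, "demo.zip")
    with open(archive, "wb") as fh:
        fh.write(zip_bytes)

    sys.path.insert(0, archive)
    try:
        mod = importlib.import_module("zipdemo_mod")
        module_file = mod.__file__  # Points inside the archive.
    finally:
        sys.path.remove(archive)
        sys.path_importer_cache.pop(archive, None)

print(mod.GREETING, module_file)
```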

The PythonModuleSource Class

class oxidized_importer.PythonModuleSource

Represents Python module source code. e.g. a .py file.

module

(str) The fully qualified Python module name. e.g. my_package.foo.

source

(bytes) The source code of the Python module.

Note that source code is stored as bytes, not str. Most Python source is encoded as UTF-8, so you can use .encode("utf-8") or .decode("utf-8") to convert between bytes and str.

is_package

(bool) Whether this module is a Python package.

The PythonModuleBytecode Class

class oxidized_importer.PythonModuleBytecode

Represents Python module bytecode. e.g. what a .pyc file holds (but without the header that a .pyc file has).

module

(str) The fully qualified Python module name.

bytecode

(bytes) The bytecode of the Python module.

This is what you would get by compiling Python source code via something like marshal.dumps(compile(source, filename, "exec")). The bytecode does not contain a header like the one found in a .pyc file.

optimize_level

(int) The bytecode optimization level. Either 0, 1, or 2.

is_package

(bool) Whether this module is a Python package.

The PythonPackageResource Class

class oxidized_importer.PythonPackageResource

Represents a non-module resource file. These are files that live next to Python modules that are typically accessed via the APIs in importlib.resources.

package

(str) The name of the leaf-most Python package this resource is associated with.

With OxidizedFinder, an importlib.abc.ResourceReader associated with this package will be used to load the resource.

name

(str) The name of the resource within its package. This is typically the filename of the resource. e.g. resource.txt or child/foo.png.

data

(bytes) The raw binary content of the resource.

The PythonPackageDistributionResource Class

class oxidized_importer.PythonPackageDistributionResource

Represents a non-module resource file living in a package distribution directory (e.g. <package>-<version>.dist-info or <package>-<version>.egg-info).

These resources are typically accessed via the APIs in importlib.metadata.

package

(str) The name of the Python package this resource is associated with.

version

(str) The version string of the Python package this resource is associated with.

name

(str) The name of the resource within the metadata distribution. This is typically the filename of the resource. e.g. METADATA.

data

(bytes) The raw binary content of the resource.

The PythonExtensionModule Class

class oxidized_importer.PythonExtensionModule

Represents a Python extension module. This is a shared library defining a Python extension implemented in native machine code that can be loaded into a process and defines a Python module. Extension modules are typically defined by .so, .dylib, or .pyd files.

Note

Properties of this type are read-only.
