OxidizedFinder Behavior and Compliance
OxidizedFinder strives to be as compliant as possible with
other meta path importers. Generally speaking, the behavior
described by the importlib documentation should apply.
In other words, things should mostly just work,
and any deviation from the importlib documentation constitutes a bug
worth reporting.
That being said, OxidizedFinder’s approach to loading
resources is drastically different from more traditional means, notably
loading files from the filesystem. oxidized_finder
breaks a lot of
assumptions about how things have worked in Python and there is some
behavior that may seem odd or in violation of documented behavior in Python.
The sections below attempt to call out known areas where
OxidizedFinder
deviates from typical behavior.
__file__ and __cached__ Module Attributes
Python modules typically have a __file__
attribute holding a str
defining the filesystem path the source module was imported from (usually
a path to a .py
file). There is also the similar - but lesser known -
__cached__
attribute holding the filesystem path of the bytecode module
(usually the path to a .pyc
file).
Important
OxidizedFinder
will not set either attribute when
importing modules from memory.
These attributes are not set because it isn’t obvious what the values
should be! Typically, __file__
is used by Python as an anchor point
to derive the path to some other file. However, when loading modules
from memory, the traditional filesystem hierarchy of Python modules
does not exist. In the opinion of PyOxidizer’s maintainer, exposing
__file__
would be lying and this would cause more potential for
harm than good.
While we may someday make it possible to define __file__ (and __cached__)
on modules imported from memory, we do not yet support this.
OxidizedFinder
does, however, set __file__
and
__cached__
on modules imported from the filesystem. So, a
workaround to restore these missing attributes is to avoid in-memory
loading.
Note
Use of __file__
is commonly encountered in code loading resource
files. See Loading Resource Files for more on this topic, including
how to port code to more modern Python APIs for loading resources.
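Code that has to run both with and without in-memory loading can avoid assuming that __file__ exists. The following is a minimal sketch using only the standard library. The helper name module_file is hypothetical and not part of OxidizedFinder, and a bare types.ModuleType object stands in for a module imported from memory.

```python
import json
import types
from typing import Optional


def module_file(module: types.ModuleType) -> Optional[str]:
    """Return the module's __file__, or None when the attribute is not set."""
    return getattr(module, "__file__", None)


# A freshly created module object has no __file__, which mimics what code
# observes for a module imported from memory (an assumption for illustration).
in_memory_like = types.ModuleType("in_memory_like")

for mod in (in_memory_like, json):
    path = module_file(mod)
    if path is None:
        print(f"{mod.__name__}: no __file__; prefer importlib.resources")
    else:
        print(f"{mod.__name__}: loaded from {path}")
```

Checking for the attribute up front lets calling code fall back to a resource API rather than failing with AttributeError.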
__path__ Module Attribute
Python modules that are also packages must have a __path__
attribute
containing an iterable of str
. The iterable can be empty.
If a module is imported from the filesystem,
OxidizedFinder
will set __path__
to the parent
directory of the module’s file, just like the standard filesystem
importer would.
If a module is imported from memory, __path__ will be set to the
path of the current executable joined with the package name. For example, if
the current executable is /usr/bin/myapp and the module/package name
is foo.bar, __path__ will be ["/usr/bin/myapp/foo/bar"].
On Windows, paths might look like C:\dev\myapp.exe\foo\bar.
Python’s zipimport importer uses the same approach for modules
imported from zip files, so there is precedent for
OxidizedFinder doing things this way.
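The mapping from executable path and package name to a virtual __path__ entry can be sketched as follows. This only illustrates the scheme described above and is not OxidizedFinder's actual implementation. The function name virtual_package_path and its pathmod parameter are hypothetical; posixpath and ntpath are used so that both styles can be shown on any platform.

```python
import ntpath
import posixpath


def virtual_package_path(exe_path: str, package: str, pathmod=posixpath) -> str:
    """Join the executable path with one path component per package level."""
    return pathmod.join(exe_path, *package.split("."))


# POSIX-style executable path.
print(virtual_package_path("/usr/bin/myapp", "foo.bar"))
# Windows-style executable path.
print(virtual_package_path(r"C:\dev\myapp.exe", "foo.bar", pathmod=ntpath))
```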
Support for __init__ in Module Names
There exists Python code that does things like from .__init__ import X
.
__init__ is special in Python module names because __init__.py is the
filename used to denote a Python package. So syntax like
from .__init__ import X is probably intended to be equivalent to
from . import X. Likewise, import foo.__init__ is probably intended to be
written as import foo.
Python’s filesystem importer doesn’t treat __init__
in module names
as special. If you attempt to import a module named foo.__init__
,
it will attempt to locate a file named foo/__init__.py
. If that
module is a package, this will succeed. However, the module name seen by
the importer has __init__
in it and the name on the created module
object will have __init__
in it. This means that you can have both a
module foo
and foo.__init__
. These will both be derived from the
same file but are actually separate module objects.
PyOxidizer will automatically remove trailing .__init__
from
module names. This will enable PyOxidizer to work with syntax such
as import foo.__init__
and from .__init__ import X
and therefore
be compatible with Python code in the wild. However, PyOxidizer may not
preserve the .__init__
in the module name. For example, with Python’s
path based importer, you could have both foo
and foo.__init__
in
sys.modules
but PyOxidizer will only have foo
.
A limitation of PyOxidizer's module name normalization is that it only removes
a single trailing .__init__ from the module name: occurrences of __init__
inside the module name are not normalized. For example,
foo.__init__.bar is not normalized to foo.bar. This may introduce
incompatibilities with Python code in the wild. However, for this to
matter, the filesystem layout would have to be something like
foo/__init__/bar.py. This hopefully does not occur in the wild, but
it is conceivable that it does.
See https://github.com/indygreg/PyOxidizer/issues/317 and https://bugs.python.org/issue42564 for more discussion on this issue.
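The normalization rule described above can be illustrated with a short sketch. This is not PyOxidizer's implementation (which is written in Rust). The function name normalize_module_name is hypothetical and only mirrors the documented behavior of stripping one trailing .__init__ component.

```python
def normalize_module_name(name: str) -> str:
    """Strip a single trailing ".__init__" from a dotted module name."""
    suffix = ".__init__"
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# A trailing component is removed.
print(normalize_module_name("foo.__init__"))
# An interior component is left untouched.
print(normalize_module_name("foo.__init__.bar"))
```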
ResourceReader Compatibility
ResourceReader
has known compatibility differences with Python’s default
filesystem-based importer. See Support for ResourceReader for details.
ResourceLoader Compatibility
The ResourceLoader
interface is implemented but behavior of
get_data(path)
has some variance with Python’s filesystem-based importer.
See Support for ResourceLoader for details.
Note
ResourceLoader
is deprecated as of Python 3.7. Code should be ported
to ResourceReader
/ importlib.resources
if possible.
importlib.metadata Compatibility
OxidizedFinder
implements find_distributions()
and
therefore provides the required hook for importlib.metadata
to
resolve Distribution
instances. However, the returned objects do
not implement the full Distribution
interface.
Here are the known differences between OxidizedDistribution
and
importlib.metadata.Distribution
instances:
- OxidizedDistribution is not an instance of importlib.metadata.Distribution.
- locate_file() is not defined.
- @staticmethod at() is not defined.
- @property files raises NotImplementedError.
There are additional _
prefixed attributes of
importlib.metadata.Distribution
that are not implemented. But we do not
consider these part of the public API and don’t feel they are worth calling
out.
In addition, OxidizedFinder.find_distributions()
ignores the path
attribute of the passed Context
instance. Only the name
attribute
is consulted. If name is None, all packages with registered
distribution files will be returned. Otherwise the returned list
contains at most one OxidizedDistribution corresponding to the
requested package name.
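Given these differences, code that consumes importlib.metadata may want to limit itself to the widely supported parts of the interface and tolerate a files property that raises. The sketch below uses only the standard library. The helper name describe_distribution is hypothetical, and the assumption that name and version lookups behave the same under OxidizedFinder is not verified here.

```python
from importlib import metadata
from typing import Optional


def describe_distribution(name: str) -> Optional[dict]:
    """Return basic metadata for a distribution, or None if it is not found."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None

    info = {"name": dist.metadata["Name"], "version": dist.version}

    # `files` may raise NotImplementedError (as documented above) or return
    # None when no file listing is available, so treat both as "unknown".
    try:
        files = dist.files
    except NotImplementedError:
        files = None
    info["file_count"] = None if files is None else len(files)
    return info


print(describe_distribution("pip"))
```

The helper avoids locate_file() and at() entirely, since neither is defined on OxidizedDistribution.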
pkgutil Compatibility
The pkgutil package
in Python’s standard library reacts to special functionality on
MetaPathFinder
instances.
pkgutil.iter_modules()
attempts to use an iter_modules()
method
to obtain results.
OxidizedFinder
implements iter_modules(prefix="")
and
pkgutil.iter_modules()
should work. However, there are some
differences in behavior:
- iter_modules() is defined to be a generator but OxidizedFinder.iter_modules() returns a list. A list is iterable, so this difference should hopefully be a harmless implementation detail.
- Support for the path argument to pkgutil.iter_modules() requires that OxidizedFinder’s path_hook is installed in sys.path_hooks. This will be done automatically if OxidizedFinder is installed at interpreter initialization time.
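Callers that only iterate over the results are unaffected by the generator versus list difference. The following sketch shows one way to consume pkgutil.iter_modules() without relying on generator-specific behavior. The helper name module_names is hypothetical, and a temporary directory stands in for a real search path.

```python
import os
import pkgutil
import tempfile
from typing import Iterable, List, Optional


def module_names(path: Optional[Iterable[str]] = None, prefix: str = "") -> List[str]:
    """Collect module names by plain iteration, which works for any iterable."""
    return sorted(info.name for info in pkgutil.iter_modules(path, prefix))


# Demonstrate against a throwaway directory containing a single module.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_mod_a.py"), "w") as fh:
        fh.write("VALUE = 1\n")
    found = module_names([tmp])

print(found)
```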
Path Hooks Compatibility
The OxidizedFinder.path_hook method
of an instance can be installed on sys.path_hooks to
enable an OxidizedFinder to function as a
path entry finder.
As a brief refresher, callables on sys.path_hooks
are called with
paths, giving them the opportunity to service a particular path.
If a path hook responds to a path by returning a path entry finder,
that returned object will service that path. Often, the paths passed
to path hooks are from sys.path
. However, arbitrary paths can be
passed in. A property of the returned path entry finder is that it only
targets a particular level in the package hierarchy. Unlike meta
path finders (which can service any named resource they know about),
path entry finders are bound to a specific package target level
and will only return resources existing at that level.
Path hooks are used by the following mechanisms:
- The standard library PathFinder (the meta path finder that Python uses to load resources from the filesystem) uses sys.path_hooks as part of resolving a finder for a given sys.path entry.
- pkgutil.get_importer() uses them to resolve the finder for a given sys.path entry. This in turn is used by various code, including other pkgutil APIs.
- pkg_resources maps path entry finder types to functions to enable resolution of pkg_resources.Distribution instances for individual paths.
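The way path hooks are consulted can be sketched with a toy hook. Everything in this example (ToyFinder, toy_hook, resolve_finder, and the toy:// prefix) is hypothetical and exists only to illustrate the protocol: each hook is tried in order, a hook declines a path by raising ImportError, and the first hook that returns an object becomes the finder for that path.

```python
from typing import Callable, Iterable, Optional


class ToyFinder:
    """Stand-in path entry finder bound to a single path."""

    def __init__(self, path: str) -> None:
        self.path = path

    def find_spec(self, fullname: str, target=None):
        # A real finder would look up `fullname` relative to self.path.
        return None


def toy_hook(path) -> ToyFinder:
    """Accept only str paths starting with a made-up prefix."""
    if not isinstance(path, str) or not path.startswith("toy://"):
        raise ImportError(f"toy_hook does not handle {path!r}")
    return ToyFinder(path)


def resolve_finder(path, hooks: Iterable[Callable]) -> Optional[object]:
    """Try each hook in order; the first one that does not raise wins."""
    for hook in hooks:
        try:
            return hook(path)
        except ImportError:
            continue
    return None


print(type(resolve_finder("toy://pkg", [toy_hook])).__name__)
print(resolve_finder("/some/other/path", [toy_hook]))
```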
When installed on sys.path_hooks
,
OxidizedFinder.path_hook
will respond
to the following path values:
- The path to the current executable, as defined by OxidizedFinder.path_hook_base_str.
- A virtual sub-directory of the path to the current executable, as defined by OxidizedFinder.path_hook_base_str.
Important
path_hook
is very strict about
what values it will respond to.
The value must be a str
and be equal to
OxidizedFinder.path_hook_base_str
or have
OxidizedFinder.path_hook_base_str
plus a directory separator
as the exact string prefix.
path_hook
will not respond
to bytes
, pathlib.Path
, or any other path-like type.
OxidizedFinder.path_hook_base_str
may not be the same value as
sys.executable
. Always use OxidizedFinder.path_hook_base_str
to derive sys.path
values to ensure the path hook will respond.
When path_hook is called with its
OxidizedFinder.path_hook_base_str value, an
OxidizedPathEntryFinder bound to the source
OxidizedFinder is returned. This finder is able to service
root resources (i.e. top-level modules and packages).
When path_hook
is called with
a virtual sub-directory of OxidizedFinder.path_hook_base_str
, the same
thing happens except the returned OxidizedPathEntryFinder
will only service resources at the exact package hierarchy specified
by that virtual sub-directory.
The validation and normalization of path values is similar to the following:
def path_hook(self, path: str):
    # A path exactly matching path_hook_base_str is bound to resources at root.
    if path == self.path_hook_base_str:
        return ...
    # Virtual sub-directories must begin with path_hook_base_str + directory
    # separator.
    if not path.startswith((self.path_hook_base_str + "/", self.path_hook_base_str + "\\")):
        raise ImportError()
    # Part after directory separator.
    package_part = path[len(self.path_hook_base_str) + 1:]
    # Normalize to UNIX style directory separators, allowing Windows
    # separators to exist.
    package_part = package_part.replace("\\", "/")
    # Ban leading, trailing, and consecutive directory separators.
    # (Separators are already normalized to "/" at this point.)
    if package_part.startswith("/") or package_part.endswith("/") or "//" in package_part:
        raise ImportError()
    # Ban dots in directory components.
    for part in package_part.split("/"):
        if part.startswith(".") or part.endswith(".") or ".." in part:
            raise ImportError()
    # Normalize directory tree to package hierarchy. e.g. foo/bar -> foo.bar.
    package = package_part.replace("/", ".")
    # When converting the package string to a Rust string to facilitate
    # resource name comparisons, it is encoded to UTF-8, replacing
    # "bad" code points with the Unicode replacement code point.
    rust_package_string = package.encode("utf-8", "replace")
Note that when the package component of virtual sub-directories is converted
to a Rust string, we use the UTF-8 encoding, not Python’s active filesystem
encoding. This is to keep things simpler. And since OxidizedFinder
indexes resource names using Rust’s UTF-8 backed string type anyway, this seems
semantically correct from the perspective of oxidized_importer
.
As an example, if path
were
os.path.join(finder.path_hook_base_str, "a")
, the
finder would only service modules of the form a.*
. So a
, a.b
would
match but a.b.c
and d
would not.
For best results, use os.path.join(finder.path_hook_base_str, str)
to define
values that will be accepted by the path hook.
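A small sketch of deriving such values follows. Both helper names (virtual_path and hook_would_respond) are hypothetical, the base string is a placeholder for finder.path_hook_base_str, and hook_would_respond only mirrors the type and prefix rules stated above, not the full validation.

```python
import os
import pathlib


def virtual_path(base: str, package: str) -> str:
    """Build a virtual sub-directory path for a dotted package name."""
    return os.path.join(base, *package.split("."))


def hook_would_respond(base: str, candidate) -> bool:
    """Approximate the documented type and prefix checks only."""
    # Only str is accepted; bytes and path-like objects are rejected.
    if not isinstance(candidate, str):
        return False
    return candidate == base or candidate.startswith((base + "/", base + "\\"))


# Placeholder standing in for finder.path_hook_base_str (an assumption).
base = "/usr/bin/myapp"
entry = virtual_path(base, "foo.bar")

print(hook_would_respond(base, entry))
print(hook_would_respond(base, pathlib.Path(entry)))
```

Passing str(path) rather than a pathlib.Path object when populating sys.path keeps entries within what the hook accepts.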
OxidizedPathEntryFinder
complies with the
PathEntryFinder
protocol and implements OxidizedPathEntryFinder.find_spec()
and OxidizedPathEntryFinder.invalidate_caches()
. However,
support for the deprecated methods find_loader
and find_module
is not implemented. Instances also implement
OxidizedPathEntryFinder.iter_modules(), enabling them to be
used by pkgutil.iter_modules().
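Because only find_spec() is available, code that talks to path entry finders directly should avoid the deprecated find_loader and find_module methods. The sketch below shows the find_spec()-only pattern against Python's regular filesystem finder, using a temporary directory. The helper name find_spec_for_path_entry is hypothetical, and applying the same pattern to an OxidizedPathEntryFinder is an assumption based on the description above.

```python
import os
import pkgutil
import tempfile


def find_spec_for_path_entry(path_entry: str, name: str):
    """Resolve a module spec through the path entry finder for one path."""
    finder = pkgutil.get_importer(path_entry)
    # Rely only on find_spec(); find_loader()/find_module() may not exist.
    if finder is None or not hasattr(finder, "find_spec"):
        return None
    return finder.find_spec(name)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_mod_b.py"), "w") as fh:
        fh.write("VALUE = 1\n")
    spec = find_spec_for_path_entry(tmp, "demo_mod_b")
    spec_name = None if spec is None else spec.name
    missing = find_spec_for_path_entry(tmp, "demo_mod_missing")

print(spec_name)
```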
pkg_resources Compatibility
OxidizedFinder
can be registered as a provider for
pkg_resources
, enabling pkg_resources
APIs to be used with
resources tracked by OxidizedFinder
instances.
However, there are known compatibility differences. See Support for pkg_resources for more.