Brief Talk on Module Import in Python

  • 2021-12-09 09:16:54
  • OfStack

Directory modules and packages __import__ Module cache imp and importlib modules Inert import Summary reference materials

This article will not discuss the import mechanism of Python (the underlying implementation details), but only the concepts related to modules and packages, and import statements. Typically, the import module uses the following statement:


import ...
import ... as ...
from ... import ...
from ... import ... as ...

1 In general, it is enough to import modules with the above statement. However, in some special scenarios, other import methods may be needed. For example, Python also provides __import__ built-in functions and importlib modules for dynamic import. The advantage of dynamic import is that it can delay the loading of modules, and the import action is only supported when the modules are used.

Although the application of __import__ function and importlib module can realize delayed loading of modules, its disadvantage is that the same import statement should be implemented once in any place where a specified module is needed, which is inconvenient to maintain and very troublesome. If you can implement lazy import at the top level, it is a better choice, which is the final point of this article.

Before discussing some advanced uses of 1, we need to understand the concepts of modules and packages under 1.

Modules and packages

Modules can be understood as code files that Python can load and execute. Code files can be not only. py files, but also. so and other types of files. Python has only one module object type, and all modules are of this type. To facilitate the organization of multiple modules and provide a naming of a module hierarchy, Python provides the concept of packages.

You can simply think of packages as directories of a file system and modules as code files in a directory (note that this cannot be completely considered because packages and modules are not only from the file system, but also from compressed files, networks, etc.). Similar to the directory structure of a file system, packages are organized hierarchically, and the packages themselves can contain sub-packages and regular modules.

Package can actually be regarded as a special module. For example, the directory of the regular package (the concept of regular package will be introduced below) needs to contain the file __init__. py. When the package is imported, the top-level code of this file is implicitly executed, just like the top-level code is executed when the module is imported, and this file is just like the package code 1. So package is a special module. Remember that all packages are modules, but not all modules are packages. Both subpackages and modules in a package have the __path__ attribute, specifically, any module that contains the __path__ attribute is considered a package. All modules have a name, similar to the standard attribute access syntax, with dots separating the names of child packages from their parent packages.

Python defines two types of packages, regular packages and namespace packages. Regular packages are traditional packages that exist in Python 3.2 and earlier. The regular package is the directory that contains the __init__. py file. When importing a regular package, the __init__. py file is executed implicitly, and the objects it defines are bound to the names in the package namespace. __init__. The py file can contain the same Python code that any other module can contain, and Python adds one additional attribute to the module when importing.

Beginning with Python 3.3, Python introduced the concept of namespace packages. Namespace packages are composites of different filesets, each contributing one child package to the parent package, and none of the packages need to contain the __init__. py file. Filesets can be stored in different locations on the file system. File set lookups include compressed files, networks, or other places that Python searches during the import process. Namespace packages may or may not correspond directly to objects of the file system; They may be real modules without specific description. Refer to PEP 420 for updated instructions on namespace packages.

The __path__ attribute of a namespace package differs from a regular package in that it uses a custom iterator type to traverse all paths that contain the command space package. If the path of their parent package (or sys. path of the higher-order package) changes, it will automatically re-search the package part in the package the next time it attempts to import.

If there is the following directory structure:

.
(bar-package)
-nsp
The-bar. py
foo-package
nsp
foo. py

Then nsp can be a namespace package, and the following is the test code (remember to run the test with Python version 3.3 and later):


import sys
sys.path.extend(['foo-package', 'bar-package'])

import nsp
import nsp.bar
import nsp.foo

print(nsp.__path__)

#  Output: 
# _NamespacePath(['foo-package/nsp', 'bar-package/nsp'])

Namespace packages have the following characteristics:

1. The lowest priority, after all import rules in the existing version

2. It is no longer necessary to include the __init__. py file in the package

3. You can import and organize the code of directory dispersion

4. Depends on the left-to-right search order in sys. path

__import__

The __import__ function can be used to import modules, and the import statement also calls the function. It is defined as:

__import__(name[, globals[, locals[, fromlist[, level]]]])

Parameter description:

name (required): The name of the loaded module globals (optional): Dictionary containing global variables, this option is rarely used, with the default value global () locals (optional): A dictionary containing a local variable that is not used in the internal standard implementation, using the default value-local () fromlist (Optional): The name of the imported submodule level (Optional): Import path option, default to-1 in Python 2, indicating that both absolute import and relative import are supported. The default in Python 3 is 0, which means that only absolute import is supported. If it is greater than 0, it indicates the number of stages relative to the imported parent directory, that is, 1 is similar to '.' and 2 is similar to '..'.

Use example:


# import spam
spam = __import__('spam')

# import spam.ham
spam = __import__('spam.ham')

# from spam.ham import eggs, sausage as saus
_temp = __import__('spam.ham', fromlist=['eggs', 'sausage'])
eggs = _temp.eggs
saus = _temp.sausage

Module cache

When performing a module import, the import system for Python first attempts to look from sys. modules. In sys. modules is a cache for all imported modules, including intermediate paths. That is, if foo. bar. baz is imported, sys. modules will contain caches into foo, foo. bar, and foo. bar. baz modules. In fact, one dict type, each key has its own value, corresponding to the corresponding module object.

During the import process, first look for the module name in sys. modules, and if it exists, return to the module and end the import process. If the module name is not found, Python continues to search for the module (find and load from sys. path). sys. modules is writable, and deleting one key causes the cache implementation of the specified module, and the specified module will be searched again on the next import, which is similar to reload of the module.

Note that if you keep the module object reference, invalidate the cache in sys. modules, and then re-import the specified module, the two module objects are not the same. In contrast, when importlib. reload () reloads the module, it uses the same module object and simply reinitializes the module contents by rerunning the module code.

imp and importlib modules

The imp module provides interfaces that are implemented internally in the import statement. For example, module lookup (find_module), module load (load_module), and so on (the module import process includes module lookup, loading, caching, and so on). You can use this module to simply implement the built-in __import__ function functionality:


import imp
import sys

def __import__(name, globals=None, locals=None, fromlist=None):
    #  First look from the cache 
    try:
        return sys.modules[name]
    except KeyError:
        pass

    #  If it is not in the module cache, start from the  sys.path  Find module in 
    fp, pathname, description = imp.find_module(name)

    #  How to find a module and load it 
    try:
        return imp.load_module(name, fp, pathname, description)
    finally:
        if fp:
            fp.close()

The importlib module was created in python 2.7 and contains only one function:


importlib.import_module(name, package=None)

This function is an encapsulation of __import__ for easier dynamic importing of modules. For example, use it to implement relative import:


import importlib

#  Similar to  'from . import b'
b = importlib.import_module('.b', __package__)

Beginning with python 3, the built-in reload function was moved to the imp module. Since Python 3.4, the imp module has been rejected and no longer recommended, and its included functions have been moved to the importlib module. That is, starting with Python 3.4, the importlib module is a collection of previous imp modules and importlib modules.

Inert import

Most of the content introduced above is to pave the way for lazy import, while the other small content is only an extension (that is, a little more content is introduced casually). A lazy import is a delayed module import, which takes place only when the module is actually used, and never happens if the module is not used.

The requirement of lazy import is still very common. 1 It is recommended that modules only be imported at the top level, and sometimes importing modules at the top level is not the best choice. For example, if a module is used in only one function or class method, you can use local import (import is performed in local scope), so that the module is imported only when the function or method is executed, thus avoiding introducing module variables in the top-level namespace. For another example, in the project I am responsible for, I need to use the pandas package, and the import of the pandas package will take up 1 bit of memory (not much, but not too little, like a few 10 megabytes), so when the pandas package is not used, we hope it will not be imported. Some of our own packages will take a lot of time to load (it will take a few seconds to 10 seconds to import because we have to read the configuration and so on), so the lazy import feature is also extremely needed.

The following is a simple implementation of lazy import for reference:


import sys
from types import ModuleType


class LazyModuleType(ModuleType):

    @property
    def _mod(self):
        name = super(LazyModuleType, self).__getattribute__("__name__")
        if name not in sys.modules:
            __import__(name)
        return sys.modules[name]

    def __getattribute__(self, name):
        if name == "_mod":
            return super(LazyModuleType, self).__getattribute__(name)

        try:
            return self._mod.__getattribute__(name)
        except AttributeError:
            return super(LazyModuleType, self).__getattribute__(name)

    def __setattr__(self, name, value):
        self._mod.__setattr__(name, value)


def lazy_import(name, package=None):
    if name.startswith('.'):
        if not package:
            raise TypeError("relative imports require the 'package' argument")
        level = 0
        for character in name:
            if character != '.':
                break
            level += 1

        if not hasattr(package, 'rindex'):
            raise ValueError("'package' not set to a string")
        dot = len(package)
        for _ in range(level, 1, -1):
            try:
                dot = package.rindex('.', 0, dot)
            except ValueError:
                raise ValueError("attempted relative import beyond top-level "
                                 "package")

        name = "{}.{}".format(package[:dot], name[level:])

    return LazyModuleType(name)

Summarize

References

https://docs.python.org/3/reference/import.html https://github.com/nipy/nitime https://github.com/mnmelo/lazy_import

Related articles: