Troubleshooting / Package cannot be imported or version error

1. Preface

In normal Python development, a running program has to be restarted before changes to third-party packages take effect.

In DataFlux Func, however, for convenience during development, third-party packages can be installed without restarting the entire DataFlux Func system.

This approach is not guaranteed to work in every case.

Without a restart of DataFlux Func, a third-party package can only be installed and upgraded reliably if it meets both of the following conditions:

  1. It is written purely in Python (i.e., it contains no C extensions)
  2. It has no preloaded content (e.g., large data models)

When a third-party package does not meet the above conditions, only the first installation can be guaranteed to work properly, and subsequent updates require a full restart of DataFlux Func.
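
As a rough way to check the first condition, the sketch below scans an installed package folder for compiled files. It is an illustration only: the function name `find_compiled_files` and the suffix list are assumptions made for this example, not part of DataFlux Func, and an empty result says nothing about the second condition (preloaded content).

```python
from pathlib import Path

# File suffixes that usually indicate compiled (non pure-Python) code.
# This list is an assumption for illustration and is not exhaustive.
COMPILED_SUFFIXES = {".so", ".pyd", ".dll", ".dylib"}


def find_compiled_files(package_dir):
    """Return sorted relative paths of compiled files found under package_dir."""
    root = Path(package_dir)
    found = []
    for path in root.rglob("*"):
        # path.suffixes also catches versioned names such as "libfoo.so.1".
        if path.is_file() and COMPILED_SUFFIXES.intersection(path.suffixes):
            found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

For example, `find_compiled_files("/data/resources/extra-python-packages/numpy")` would list numpy's extension modules when run inside the container; a non-empty list suggests that upgrades of that package need a full restart.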

2. Failure Case: numpy

The commonly used numpy package is a typical example of a package that uses C extensions. After DataFlux Func is deployed, the first installation of numpy works normally, but if you later install a different version of numpy, you may encounter the following types of failures:

2.1 numpy cannot be imported

When the script reaches import numpy, the following error is thrown:

#1 --------------------
Executing function: demo__demo.test_numpy()

Error stack:
Traceback (most recent call last):
  File "demo__demo", line 1, in <module>
    import numpy
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
    https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
  * The Python version is: Python3.8 from "/opt/python/bin/python3.8"
  * The NumPy version is: "1.22.1"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: libopenblas64_p-r0-2f7c42d4.3.18.so: cannot open shared object file: No such file or directory

2.2 numpy can be used but calls the old version

Taking numpy as an example, printing the __version__ of the package object shows that the version of numpy actually loaded at run time does not match the installed version.

import numpy

print(numpy.__version__)

This situation does not mean the code will keep running normally. As soon as DataFlux Func is restarted, it turns into the issue described in 2.1 above, so do not ignore it.
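
To make the mismatch explicit, the version reported by the loaded package can be compared with the version recorded on disk. The sketch below is one possible approach, assuming the installed version can be read from the name of the `<name>-<version>.dist-info` folder in extra-python-packages; the helper names `installed_versions` and `is_stale` are hypothetical.

```python
from pathlib import Path


def installed_versions(packages_dir, name):
    """Read version strings from '<name>-<version>.dist-info' folder names."""
    prefix = name + "-"
    suffix = ".dist-info"
    versions = []
    for entry in Path(packages_dir).iterdir():
        if (entry.is_dir()
                and entry.name.startswith(prefix)
                and entry.name.endswith(suffix)):
            versions.append(entry.name[len(prefix):-len(suffix)])
    return sorted(versions)


def is_stale(loaded_version, packages_dir, name):
    """True if the version loaded in this process is not among those on disk."""
    return loaded_version not in installed_versions(packages_dir, name)
```

Inside a script this could be called as `is_stale(numpy.__version__, "/data/resources/extra-python-packages", "numpy")`. A result of `True` corresponds to the situation described in this section.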

3. Cause Explanation

Third-party packages like numpy may bring in additional resources during installation (such as code in other languages that needs to be compiled, data models, etc.). After the first installation, Python can read the package normally and load these external resources.

However, when a different version of the same package is installed later, the resource files of the previous version are still in use and cannot actually be updated or overwritten, so the new installation is left incomplete.

At this point, there are several scenarios for DataFlux Func:

  1. If a Worker process has already imported this third-party package, it may keep running on cached data, but the version actually in use is the old one (that is, the version that was cached first).
  2. If a Worker process has never imported this third-party package, its first import has to load the related resources from disk. Because the second installation did not complete properly, the import fails.
  3. At this time, if DataFlux Func is restarted, scenario 1 above will change into scenario 2 because the cache is released upon restart.
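
Scenario 1 can be reproduced in plain Python, independent of DataFlux Func. The sketch below shows one of the caching mechanisms involved: once a module has been imported, later imports in the same process are served from `sys.modules` rather than from disk. The module name `fake_pkg_demo` is a throwaway name invented for this demonstration, and the example uses a pure-Python file rather than a real C extension.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_import_cache():
    """Import a throwaway module, change it on disk, and import it again."""
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / "fake_pkg_demo.py"  # hypothetical module
        module_file.write_text('__version__ = "1.0"\n')
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module("fake_pkg_demo").__version__
            # Simulate installing a different version over the old one.
            module_file.write_text('__version__ = "2.0"\n')
            # The second import is served from sys.modules, not from disk.
            second = importlib.import_module("fake_pkg_demo").__version__
        finally:
            # Clean up so the demo leaves no trace in this process.
            sys.path.remove(tmp)
            sys.modules.pop("fake_pkg_demo", None)
    return first, second


print(demo_import_cache())  # ('1.0', '1.0')
```

Both imports report the old version even though the file on disk has changed. Only a new process would read the new file, which is why a restart changes the behavior.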

4. Solution

For example, after numpy 1.22.1 is installed, the following folders are created in the extra-python-packages folder of the Resource Catalog:

  1. numpy
  2. numpy-1.22.1.dist-info
  3. numpy.libs

The default location of the extra-python-packages folder on the host machine is: /usr/local/dataflux-func/data/resources/extra-python-packages

The location of the same folder inside the container is: /data/resources/extra-python-packages

The steps to resolve this issue are as follows:

  1. Completely delete the numpy-related folders listed above
  2. Restart DataFlux Func
  3. Reinstall numpy
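
Step 1 can be scripted. The following is a minimal sketch, assuming the three folder name patterns listed above (`numpy`, `numpy-<version>.dist-info`, `numpy.libs`) are the only ones to remove. The function names `package_dirs` and `remove_package` are hypothetical, and the sketch covers deletion only; steps 2 and 3 are still carried out through DataFlux Func itself.

```python
import shutil
from pathlib import Path


def package_dirs(packages_dir, name):
    """List the folders that belong to one installed package."""
    root = Path(packages_dir)
    candidates = [root / name, root / (name + ".libs")]
    candidates += sorted(root.glob(name + "-*.dist-info"))
    return [p for p in candidates if p.is_dir()]


def remove_package(packages_dir, name, dry_run=True):
    """Delete the package folders; with dry_run=True only report them."""
    targets = package_dirs(packages_dir, name)
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return [t.name for t in targets]
```

Calling `remove_package("/usr/local/dataflux-func/data/resources/extra-python-packages", "numpy")` on the host first returns the folder names that would be deleted. Passing `dry_run=False` performs the deletion, so it is worth reviewing the dry-run result beforehand.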