Troubleshooting / Package Cannot Be Imported or Version Error

1. Preface

In ordinary Python development, a change to an installed third-party package only takes effect after the running Python process is restarted.

However, in DataFlux Func, for development convenience, installing third-party packages does not require restarting the entire DataFlux Func system.

However, this is not guaranteed to work in every case.

For third-party packages to be installed or upgraded normally without restarting DataFlux Func, the following conditions must be met:

  1. The package is written purely in Python (e.g., without C extensions).
  2. It has no preloaded content (e.g., various large data models).

When a third-party package does not meet the above conditions, it is only guaranteed to work properly after its first installation. Any subsequent update requires restarting DataFlux Func entirely.
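One rough way to check the first condition is to look for compiled extension files inside the installed package folder. The sketch below is only an illustration: the function name `find_extension_modules` is hypothetical and not part of DataFlux Func, and an empty result says nothing about the second condition (preloaded content).

```python
import importlib.machinery
from pathlib import Path


def find_extension_modules(package_dir):
    """Return the compiled extension files found under package_dir.

    Hypothetical helper for illustration; not a DataFlux Func API.
    """
    # EXTENSION_SUFFIXES lists the file suffixes this interpreter accepts
    # for compiled modules, e.g. ".so" on Linux or ".pyd" on Windows.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        path
        for path in Path(package_dir).rglob("*")
        if path.is_file() and path.name.endswith(suffixes)
    )
```

For example, calling `find_extension_modules("/data/resources/extra-python-packages/numpy")` inside the container would be expected to return a non-empty list, which suggests that upgrading the package needs a restart.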

2. Case Study: numpy

The commonly used numpy package is a typical example of a package that uses C extensions. After deploying DataFlux Func, numpy can be installed and used normally the first time. However, if a different version of numpy is installed later, you may encounter the following types of failures:

2.1 numpy Cannot Be Imported

Taking numpy as an example, when the script executes import numpy, the following error is thrown:

#1 --------------------
Executing function: demo__demo.test_numpy()

Error stack:
Traceback (most recent call last):
  File "demo__demo", line 1, in <module>
    import numpy
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
    https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
  * The Python version is: Python3.8 from "/opt/python/bin/python3.8"
  * The NumPy version is: "1.22.1"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: libopenblas64_p-r0-2f7c42d4.3.18.so: cannot open shared object file: No such file or directory

2.2 numpy Is Usable but Calls the Old Version

Taking numpy as an example, printing the __version__ attribute of the package object shows the version of numpy actually loaded during code execution, and it does not match the version that was most recently installed.

import numpy
print(numpy.__version__)

Being in this state does not mean the code is running correctly. Restarting DataFlux Func will simply turn it into the import failure described in 2.1 above, so do not ignore it.
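To make the mismatch visible in one place, the loaded version can be compared with the version recorded in the installed metadata. This is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is in the standard library), that the distribution name equals the import name (true for numpy, not for every package), and that the package's `.dist-info` folder is on `sys.path`. The function name `report_versions` is hypothetical.

```python
import importlib
import importlib.metadata


def report_versions(package_name):
    """Compare the version loaded in this process with the one on disk.

    Hypothetical helper for illustration; not a DataFlux Func API.
    """
    # Version of the module object held by the current process.
    module = importlib.import_module(package_name)
    loaded = getattr(module, "__version__", None)

    # Version recorded in the installed *.dist-info metadata, if any.
    try:
        installed = importlib.metadata.version(package_name)
    except importlib.metadata.PackageNotFoundError:
        installed = None

    return {
        "loaded": loaded,
        "installed": installed,
        "match": loaded is not None and loaded == installed,
    }
```

Calling `report_versions("numpy")` from a script and finding `"match": False` would correspond to the situation described in this section.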

3. Explanation of the Cause

For third-party packages like numpy, installation may bring in additional resources (e.g., code in other languages that needs to be compiled, data models, etc.). After the first installation, Python can read the package and load these external resources normally.

However, when installing a different version of the same package again, the resource files from the previous version are still in use and cannot actually be updated/overwritten, leading to an incorrect installation.

At this point, the following scenarios exist for DataFlux Func:

  1. If a Worker process has previously imported this third-party package, that Worker process may still be able to run relying on cached data, but it manifests as calling the old version (the version cached during the first import).
  2. If a Worker process has never imported this third-party package before, when it first imports and loads related resources, because the package was not installed correctly during the second installation, it cannot be imported normally.
  3. At this point, if DataFlux Func is restarted, scenario 1 above will change to scenario 2 above because the restart releases the cache.
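The caching behavior behind scenarios 1 and 3 can be reproduced with standard Python alone. The sketch below uses a throwaway pure-Python module (the name `demo_cached_pkg` is made up for the demonstration) and Python's `sys.modules` cache. It illustrates general interpreter behavior, not the internals of DataFlux Func Worker processes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name used only for this demonstration.
MODULE_NAME = "demo_cached_pkg"

# Avoid writing .pyc files so a stale bytecode cache cannot affect the result.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / f"{MODULE_NAME}.py"
    sys.path.insert(0, tmp)
    try:
        # "Install" version 1.0 and import it for the first time.
        module_file.write_text('__version__ = "1.0"\n')
        first = importlib.import_module(MODULE_NAME).__version__

        # "Upgrade" the file on disk, then import again in the same process.
        # The module object cached in sys.modules is returned unchanged.
        module_file.write_text('__version__ = "2.0"\n')
        second = importlib.import_module(MODULE_NAME).__version__

        # Dropping the cache entry stands in for a restarted process:
        # the next import reads whatever is on disk now.
        del sys.modules[MODULE_NAME]
        importlib.invalidate_caches()
        fresh = importlib.import_module(MODULE_NAME).__version__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(first, second, fresh)  # 1.0 1.0 2.0
```

Here the fresh import succeeds because the file on disk is valid. For a package such as numpy whose on-disk files were left inconsistent by the second installation, that same fresh import is the point where the error from 2.1 appears.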

4. Solution

Taking numpy 1.22.1 as an example, after installation, the following folders will be generated in the Resource Catalog / extra-python-packages:

  1. numpy
  2. numpy-1.22.1.dist-info
  3. numpy.libs

The default host location for the Resource Catalog is: /usr/local/dataflux-func/data/resources/extra-python-packages

The container location for the Resource Catalog is: /data/resources/extra-python-packages

The steps to resolve this issue are as follows:

  1. Completely delete the folders related to numpy mentioned above.
  2. Restart DataFlux Func.
  3. Reinstall numpy.
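Step 1 can be scripted so that no leftover folder is missed. The following is a sketch under stated assumptions: it uses the container path quoted above as the default, the function names are hypothetical, and the three name patterns are taken from the numpy example (other packages may leave differently named folders). It defaults to a dry run so the matches can be reviewed before anything is deleted.

```python
import shutil
from pathlib import Path

# Container path quoted above; pass the host path when running on the host.
PACKAGES_DIR = Path("/data/resources/extra-python-packages")


def find_package_entries(package_name, packages_dir=PACKAGES_DIR):
    """Return the entries in packages_dir that belong to package_name."""
    patterns = (
        package_name,                   # e.g. numpy
        f"{package_name}-*.dist-info",  # e.g. numpy-1.22.1.dist-info
        f"{package_name}.libs",         # e.g. numpy.libs
    )
    found = set()
    for pattern in patterns:
        found.update(Path(packages_dir).glob(pattern))
    return sorted(found)


def remove_package_entries(package_name, packages_dir=PACKAGES_DIR, dry_run=True):
    """Return the matching entries, deleting them only when dry_run is False."""
    entries = find_package_entries(package_name, packages_dir)
    if not dry_run:
        for entry in entries:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return entries
```

A cautious sequence is to call `remove_package_entries("numpy")` first, check the returned paths, and only then repeat the call with `dry_run=False` before continuing with steps 2 and 3.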