DataFlux Func GSE Edition

This article describes the DataFlux Func GSE Edition.

1. Background

In practice, we found that a significant portion of users rely on DataFlux Func mainly to collect data from cloud platforms and to deploy self-built inspections that connect with TrueWatch.

Although the original edition of DataFlux Func fully supports all of this, users still have to install various third-party Python packages and the core packages from the official script market themselves. Some third-party Python packages take a long time to install and are prone to version dependency conflicts, which makes setup time-consuming and labor-intensive.

For this use case, we built the "DataFlux Func GSE Edition" on top of the original DataFlux Func, with the necessary third-party Python packages and script sets pre-installed.

2. GSE Edition vs Original Edition

The following are the differences between the GSE Edition and the original edition:

| Comparison Item | GSE Edition | Original Edition |
| --- | --- | --- |
| Pre-installed Script Set | Official Script Set: 1. Integrated Core Package; 2. Self-built Inspection Core Package; 3. Algorithm Library; 4. Tool Package. Automatically updated to the latest version upon each restart | None |
| Pre-installed Python Packages | In addition to the packages DataFlux Func itself depends on: 1. Third-party packages that the official script set depends on; 2. numpy, pandas mathematical packages; 3. jinja2, mailer, openpyxl and other packages | Limited to packages DataFlux Func itself depends on |
| Pre-added Connectors | Connects to the host's DataKit (ID: datakit, Host: 172.17.0.1:9529) | None |
| Pre-added Script Market | Official Script Market | None |
| Access to Public Network | Required to initialize the pre-installed script sets. DataFlux Func itself must be able to access the public network, otherwise it may fail to start properly | Not required |
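The pre-added connector points at 172.17.0.1:9529, so a quick way to see whether that address is reachable is a plain TCP connection test. The following is a minimal sketch rather than an official tool: the function name `can_connect` and the 3-second timeout are illustrative assumptions, and a successful connection only shows that something is listening on the port, not that DataKit is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of DataFlux Func.
    """
    try:
        # create_connection resolves the address and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        return False


# Example (address and port taken from the pre-added connector above):
# can_connect("172.17.0.1", 9529)
```

For a meaningful result, the check would need to run from the same network context as DataFlux Func itself, for example inside its container.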

Additional Pre-installed Python Packages in the GSE Edition

The following list is current as of version 3.5.2.

| Package | Version | Description |
| --- | --- | --- |
| aliyun-python-sdk-core | 2.13.31 | Dependency for integrated Core package (guance_integration) |
| azure-core | 1.29.4 | Dependency for integrated Core package (guance_integration) |
| azure-identity | 1.14.1 | Dependency for integrated Core package (guance_integration) |
| azure-mgmt-core | 1.4.0 | Dependency for integrated Core package (guance_integration) |
| boto3 | 1.34.75 | Dependency for integrated Core package (guance_integration) |
| cos-python-sdk-v5 | 1.9.15 | Dependency for integrated Core package (guance_integration) |
| esdk-obs-python | 3.22.2 | Dependency for integrated Core package (guance_integration) |
| huaweicloudsdkcore | 3.1.45 | Dependency for integrated Core package (guance_integration) |
| oss2 | 2.15.0 | Dependency for integrated Core package (guance_integration) |
| tencentcloud-sdk-python | 3.0.657 | Dependency for integrated Core package (guance_integration) |
| tos | 2.6.8 | Dependency for integrated Core package (guance_integration) |
| volcengine-python-sdk | 1.0.65 | Dependency for integrated Core package (guance_integration) |
| tomli | 2.0.1 | Dependency for self-built inspection Core package (guance_monitor) |
| adtk | 0.6.2 | Dependency for algorithm library (guance_algorithm) |
| drain3 | 0.9.11 | Dependency for algorithm library (guance_algorithm) |
| numpy | 1.23.2 | Dependency for algorithm library (guance_algorithm) |
| pandas | 1.3.1 | Dependency for algorithm library (guance_algorithm) |
| scikit-learn | 1.2.2 | Dependency for algorithm library (guance_algorithm) |
| scipy | 1.8.0 | Dependency for algorithm library (guance_algorithm) |
| statsmodels | 0.13.2 | Dependency for algorithm library (guance_algorithm) |
| jsonpath2 | 0.4.5 | Used for JSON handling |
| mailer | 0.8.1 | Used for sending emails |
| Mako | 1.1.3 | Used for text rendering |
| Markdown | 3.4.1 | Used for Markdown rendering |
| openpyxl | 3.0.7 | Used for Excel handling |
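To compare a running environment against this list, the installed versions can be read with Python's standard `importlib.metadata` module. The sketch below is an illustration under assumptions: `EXPECTED` holds only a few entries copied from the list above, and `find_mismatches` is a hypothetical helper name, not something shipped with DataFlux Func.

```python
from importlib.metadata import PackageNotFoundError, version

# A few entries copied from the list above; extend as needed
EXPECTED = {
    "numpy": "1.23.2",
    "pandas": "1.3.1",
    "openpyxl": "3.0.7",
}


def find_mismatches(expected, get_version=version):
    """Return {package: (expected_version, installed_version_or_None)}.

    Only packages that are missing or whose installed version differs
    from the expected one are included in the result.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Example: run inside the Python environment you want to inspect
# for name, (wanted, installed) in find_mismatches(EXPECTED).items():
#     print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every listed package is installed at the listed version. Since the list is tied to version 3.5.2, differences on other releases are to be expected.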

3. Version Selection

The guidance below explains how to choose between the GSE Edition and the original edition.

When should you choose the GSE Edition?

  • You need to collect data from various cloud vendors via DataFlux Func and report it to TrueWatch
  • You need to implement self-built inspections within TrueWatch
  • You want to avoid installing the various SDKs, dependency packages, and script packages yourself
  • You do not want end-users to arbitrarily modify SDK versions, dependency packages, or script packages

When should you not choose the GSE Edition?

  • The target deployment host cannot access the public network (the GSE Edition connects to the internet on every start to update the core packages from the official script market)
  • You use DataFlux Func independently and do not need to connect it with TrueWatch
  • You have special customization requirements, such as modifying the core packages from the official script market
  • You need to use specific versions of cloud vendor SDKs

4. Switching Between GSE Edition and Original Edition

Regardless of which edition is currently installed, you can switch to the other edition when needed.

The steps are as follows:

4.1 Stop DataFlux Func

  1. Use the docker stack rm dataflux-func command to shut down DataFlux Func (this step may take some time)
  2. Use docker ps to confirm that all containers have exited
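The two steps above can be scripted. The sketch below is one possible approach under stated assumptions: it requires the `docker` CLI on the PATH, the helper names (`docker`, `remaining_containers`, `stop_stack`) and the timeout values are illustrative, and it only looks for containers whose names start with `dataflux-func_`, which is how Docker names the containers of a stack called `dataflux-func`.

```python
import subprocess
import time

STACK = "dataflux-func"  # stack name used in the command above


def remaining_containers(ps_names_output: str, stack: str = STACK) -> list:
    """Filter `docker ps --format '{{.Names}}'` output down to stack containers."""
    prefix = stack + "_"
    return [name for name in ps_names_output.split() if name.startswith(prefix)]


def docker(*args: str) -> str:
    """Run a docker CLI command and return its standard output."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def stop_stack(stack=STACK, run=docker, timeout=120.0, interval=2.0,
               sleep=time.sleep) -> bool:
    """Remove the stack, then poll until its containers are gone.

    Returns True once no stack container is running, or False if
    containers are still present when the timeout expires.
    """
    run("stack", "rm", stack)
    deadline = time.monotonic() + timeout
    while True:
        left = remaining_containers(run("ps", "--format", "{{.Names}}"), stack)
        if not left:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Example:
# if not stop_stack():
#     print("Some containers are still running; check `docker ps`")
```

The `run` and `sleep` parameters exist only so the polling logic can be exercised without a Docker daemon. In normal use the defaults are enough.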

4.2 Modify Docker Stack File

Since the image names for the GSE Edition and the original edition are different, the Docker Stack configuration needs to be modified before switching versions.

Open the Docker Stack configuration file docker-stack.yaml and change all the image names in the image fields to those of the target edition, for example:

From the GSE Edition to the original edition:

```diff
 server:
-  image: pubrepo.jiagouyun.com/dataflux-func/dataflux-func-gse:x.y.z
+  image: pubrepo.jiagouyun.com/dataflux-func/dataflux-func:x.y.z
```

From the original edition to the GSE Edition:

```diff
 server:
-  image: pubrepo.jiagouyun.com/dataflux-func/dataflux-func:x.y.z
+  image: pubrepo.jiagouyun.com/dataflux-func/dataflux-func-gse:x.y.z
```
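If the file contains several such image lines, the rename can be applied with a small text substitution instead of by hand. This is a sketch under assumptions: `switch_edition` is a hypothetical helper, and the pattern assumes every DataFlux Func image line follows the `.../dataflux-func[-gse]:tag` shape shown in the diffs above. Lines that do not match this shape are left untouched.

```python
import re

# Matches e.g. "image: <registry>/dataflux-func-gse:x.y.z" and captures
# the part before the optional "-gse" suffix and the ":tag" after it
_IMAGE_RE = re.compile(r"(image:[ \t]*\S*/dataflux-func)(-gse)?(:\S+)")


def switch_edition(stack_yaml: str, target: str) -> str:
    """Return the stack file text with image names set to the target edition.

    target must be "gse" or "original".
    """
    if target not in ("gse", "original"):
        raise ValueError("target must be 'gse' or 'original'")
    suffix = "-gse" if target == "gse" else ""
    return _IMAGE_RE.sub(lambda m: m.group(1) + suffix + m.group(3), stack_yaml)


# Example (back up docker-stack.yaml before overwriting it):
# with open("docker-stack.yaml", encoding="utf-8") as f:
#     text = f.read()
# with open("docker-stack.yaml", "w", encoding="utf-8") as f:
#     f.write(switch_edition(text, "gse"))
```

Operating on plain text rather than parsing the YAML keeps comments and formatting in the file unchanged. Reviewing the result with a diff tool before reinstalling is still advisable.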

4.3 Reinstall Using Target Version Installation Package

After modifying the Docker Stack configuration, reinstall DataFlux Func using the installation script of the target edition.