
Deployment and Maintenance / Architecture, Scaling, and Resource Limits

This guide introduces the overall architecture of DataFlux Func and how to scale it to improve processing capability.

1. Architecture

Internally, DataFlux Func follows the typical "Producer -> Consumer" model: every execution of a Python function goes through "Task Generation -> Enqueue -> Dequeue -> Execution -> Return Result".

A Python function is first wrapped into a "Task" and placed into its corresponding "Work Queue" (numbered from #0), and is then fetched and executed by the corresponding "Worker Unit" (numbered from worker-0).

```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER

    FUNC_SERVER --Function Execution Task Enqueue--> REDIS_QUEUE_N

    REDIS_QUEUE_N --Function Execution Task Dequeue--> FUNC_WORKER_N

    FUNC_BEAT --"Function Execution Task Enqueue<br>(Cron Job)"--> REDIS_QUEUE_N
```
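The flow above can be illustrated with a minimal producer/consumer sketch. This is not DataFlux Func's actual implementation; the queue, function registry, and all names here are illustrative stand-ins for the Server, a Redis work queue, and a Worker Unit:

```python
from collections import deque

# Stand-in for one Redis work queue (e.g. queue #1).
work_queue = deque()

def enqueue_task(func_id, kwargs):
    """Server side: wrap a function call into a task and enqueue it."""
    work_queue.append({"funcId": func_id, "kwargs": kwargs})

def run_next_task(functions):
    """Worker side: dequeue the oldest task, execute it, return the result."""
    task = work_queue.popleft()
    return functions[task["funcId"]](**task["kwargs"])

# Example: a user function registered in a script (hypothetical ID).
functions = {"demo.hello": lambda name: f"Hello, {name}!"}

enqueue_task("demo.hello", {"name": "DataFlux Func"})
print(run_next_task(functions))  # → Hello, DataFlux Func!
```

In the real system, the queue is a Redis list shared between the Server/Beat processes (producers) and Worker Units (consumers), which is what allows each side to be scaled independently.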

1.1 Services and Their Purposes

DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:

| Service | Purpose |
| --- | --- |
| server | Web service, providing:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers |
| worker-{queue number} | Worker Unit that executes user scripts, including:<br>1. Sync API<br>2. Async API<br>3. Cron Job<br>Also handles some system-level background tasks (see the queue details below) |
| beat | Trigger for Cron Jobs |
| mysql | Database |
| redis | Cache / function execution task queue |

1.2 Worker Unit and Queue Listening Relationship

Each worker-{queue number} service (Worker Unit) listens only to specific queues:

Queues and Worker Units Do Not Need to Be One-to-One

Queues and Worker Units do not have to be one-to-one. For example, the Worker Unit worker-0 is not limited to listening to queue #0 tasks. Each Worker Unit can listen to any one or more queues.

Additionally, the same queue can be listened to by multiple Worker Units simultaneously, or not listened to at all (not recommended).

Standalone Deployment Func vs. Data Platform Attached Func Have Different Queues

Since most standalone Func deployments are lightly used, standalone deployments run fewer Worker Units than there are queues, to avoid unnecessary resource consumption.

Conversely, a Data Platform attached Func handles heavy business such as monitors and the message sending module (Message Desk), so its Worker Units and queues correspond one-to-one, and it has more Worker Units and queues than a standalone deployment.

(The queue layout differs across Func versions; the variants are shown below, newest first.)

| Worker Unit | Queue (Standalone Deployment) | Queue (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |

| Worker Unit | Queue (Standalone Deployment) | Queue (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |

| Worker Unit | Queue |
| --- | --- |
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |

2. Services / Queues and Their Responsibilities and Scaling Recommendations

Scaling Requires More Hardware Investment

Scaling places higher demands on the underlying infrastructure, including but not limited to the host itself, the database service, Redis, etc.

Generally, scaling DataFlux Func only requires increasing the number of replicas of the corresponding services, so users should first understand their actual business workload and scale accordingly.

The complete services, queues, their responsibilities, and scaling recommendations are as follows:

(The responsibilities and queue layout differ across Func versions; the variants are shown below, newest first.)

| Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- | --- |
| server | Web service, providing:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | ← Same as Left | 1 | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal cluster API calls | 1 | Generally does not need scaling |
| worker-0<br>Queue #0 | System Worker Unit, does not directly participate in user code processing | ← Same as Left | 2 | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from Sync API functions | ← Same as Left | 1 | Scale when needing to increase Sync API concurrency |
| worker-2<br>Queue #2 | Executes tasks from Cron Jobs | ← Same as Left | 1 | Scale when needing to increase Cron Job concurrency |
| worker-3<br>Queue #3 | Executes tasks from Async API functions | ← Same as Left | 1 | Scale when needing to increase Async API concurrency |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | 0 | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as Left | 1 | Scale when needing to support more users developing scripts |
| worker-6<br>Queue #6 | Executes tasks from Connector subscription message processing | ← Same as Left | 1 | Scale when needing to increase Connector subscription message processing concurrency |
| worker-7<br>Queue #7 | (Reserved) | Executes Data Platform system business tasks, e.g., Data Platform backend admin login, updating various caches, releasing message aggregation pools, etc. | 2 | Scale when having a large number of monitors |
| worker-8<br>Queue #8 | (Reserved) | Executes Data Platform threshold detection and other general monitor, metric generation related tasks | 5 | Scale when having a large number of general monitors |
| worker-9<br>Queue #9 | (Reserved) | Executes Data Platform advanced detection, intelligent monitor tasks | 3 | Scale when having a large number of advanced detection, intelligent monitors |
| worker-10<br>Queue #10 | (No such service) | Executes Data Platform user-reported event tasks | 1 | Scale when having a large volume of user-reported events |
| worker-11<br>Queue #11 | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when having a large volume of message sending |
| worker-12<br>Queue #12 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-13<br>Queue #13 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-14<br>Queue #14 | (No such service) | Executes AI-related processing that requires an immediate user response, e.g., calling "Auto Pipeline Writing" | 2 | Scale when needing to support more users writing Pipelines |
| worker-15<br>Queue #15 | (No such service) | Executes AI-related processing that does not require an immediate user response, e.g., calling "Alert Compression and Merging" | 2 | Scale when having a large number of monitors using AI alert aggregation |
| beat | Trigger for Cron Jobs | ← Same as Left | 1 | Do not scale; ensure a single global instance |
| mysql | Database | (No such service) | - | Does not need scaling; use self-built or cloud services for higher demands |
| redis | Cache / function execution task queue | (No such service) | - | Does not need scaling; use self-built or cloud services for higher demands |

| Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- |
| server | Web service, providing:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | ← Same as Left | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal cluster API calls | Generally does not need scaling |
| worker-0<br>Queue #0 | System Worker Unit, does not directly participate in user code processing | ← Same as Left | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from Sync API functions | ← Same as Left | Scale when needing to increase Sync API concurrency |
| worker-2<br>Queue #2 | Executes tasks from Cron Jobs | ← Same as Left | Scale when needing to increase Cron Job concurrency |
| worker-3<br>Queue #3 | Executes tasks from Async API functions | ← Same as Left | Scale when needing to increase Async API concurrency |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as Left | Scale when needing to support more users developing scripts |
| worker-6<br>Queue #6 | Executes tasks from Connector subscription message processing | ← Same as Left | Scale when needing to increase Connector subscription message processing concurrency |
| worker-7<br>Queue #7 | (Reserved) | Executes Data Platform system business and message sending tasks, e.g., Data Platform backend admin login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when having a large volume of message sending |
| worker-8<br>Queue #8 | (Reserved) | Executes Data Platform threshold detection and other general monitor related tasks | Scale when having a large number of general monitors |
| worker-9<br>Queue #9 | (Reserved) | Executes Data Platform advanced detection, intelligent monitor tasks | Scale when having a large number of advanced detection, intelligent monitors |
| beat | Trigger for Cron Jobs | ← Same as Left | Do not scale; ensure a single global instance |
| mysql | Database | (No such service) | Does not need scaling; use self-built or cloud services for higher demands |
| redis | Cache / function execution task queue | (No such service) | Does not need scaling; use self-built or cloud services for higher demands |

| Service | Responsibility | Scaling Recommendation |
| --- | --- | --- |
| server | Web service, providing:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | Generally does not need scaling |
| worker-0<br>Queue #0 | System Worker Unit, does not directly participate in user code processing | Generally does not need scaling |
| worker-1-6<br>Queue #1, #2, #3, #4, #5, #6 | By default, responsible for synchronous function call processing, such as:<br>1. Sync API<br>2. Subscription message processing | Scale when needing to increase Sync API, subscription message processing concurrency |
| worker-7<br>Queue #7 | By default, responsible for debug code processing (i.e., directly running functions in the Web interface) | Scale when needing to support more users developing scripts |
| worker-8-9<br>Queue #8, #9 | By default, responsible for asynchronous function call processing, such as:<br>1. Async API<br>2. Cron Jobs | Scale when needing to increase Cron Job, Async API processing concurrency |
| beat | Trigger for Cron Jobs | Do not scale; ensure a single global instance |
| mysql | Database | Does not need scaling; use self-built or cloud services for higher demands |
| redis | Cache / function execution task queue | Does not need scaling; use self-built or cloud services for higher demands |

Example: When needing to enhance Cron Job processing capability...

As mentioned above, Cron Jobs are located in "Queue #8", and "Queue #8" corresponds to "Service worker-8", so scaling "Service worker-8" is sufficient.

Estimating Scaling Requirements

Taking the common worker-8 as an example:

worker-8 in the Data Platform attached version is mainly responsible for executing monitor tasks. Assuming a detection task takes T milliseconds, then 1 minute can execute 60 × 1,000 ÷ T detections. By default, worker-8 has 5 processes per Pod.

Thus, the detection capability of a single worker-8 Pod is 5 × (60 × 1,000 ÷ T) monitors.

Formula

```
A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T
```

A: detection capability of a single Pod (detections per minute)

T: execution time of a single detection task (milliseconds)

Based on different execution times for each monitor, the following table can be listed:

| Single Detection Time (ms) | Single Pod Detection Capability (per minute) | Compared to Baseline |
| --- | --- | --- |
| 300 | 1,000 | 167% |
| 500 | 600 | Baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |
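The capacity formula above can be checked with a short calculation (a sketch; the function name is illustrative, and the default of 5 worker processes per Pod matches the text above):

```python
def pod_capacity(t_ms: float, processes: int = 5) -> float:
    """Detections per minute for a single worker-8 Pod,
    given a single detection time of t_ms milliseconds."""
    return processes * (60 * 1000 / t_ms)

print(pod_capacity(300))  # → 1000.0
print(pod_capacity(500))  # → 600.0
```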

Conversely, assuming the total number of monitors is M, the required number of Pods is M ÷ (5 × (60 × 1,000 ÷ T)), rounded up.

Formula

```
P = M ÷ (300,000 ÷ T)
  = M × T ÷ 300,000
```

P: Required number of Pods

M: Number of monitors

T: Detection task execution time (milliseconds)
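The Pod-count formula above, with rounding up, can be sketched as follows (the function name is illustrative):

```python
import math

def required_pods(monitors: int, t_ms: float) -> int:
    """Pods needed to run `monitors` detections per minute,
    with a single detection time of t_ms milliseconds:
    P = ceil(M × T ÷ 300,000)."""
    return math.ceil(monitors * t_ms / 300_000)

print(required_pods(1000, 500))    # → 2
print(required_pods(10000, 2000))  # → 67
```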

Based on different numbers of monitors and execution times, the following tables can be listed:

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | Baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | Baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | Baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |

Operation Method

For standalone deployments of DataFlux Func, scaling can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml), increasing the deploy.replicas of the corresponding service.

Refer to Official Documentation

For complete information on the deploy.replicas option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

To improve the processing capability of worker-8, the specific modification part is as follows:

Example is only a partial excerpt

The example only shows the key modification part. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modification Part
```yaml
services:
  worker-8:
    deploy:
      # Start 2 Worker Units for Queue #8 simultaneously
      replicas: 2
```

3. Resource Limits

Resource limits should be adjusted reasonably based on actual business

Adjust resource limits reasonably based on actual business needs.

Blindly limiting resources may lead to longer task execution times or insufficient memory to complete code execution.

Operation Method

For standalone deployments of DataFlux Func, resource limits can be implemented by modifying the configuration ({installation directory}/docker-stack.yaml), adding deploy.resources for the corresponding service.

Refer to Official Documentation

For complete information on the deploy.resources option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each worker-N replica can occupy up to 5 CPU cores (i.e., each Worker Unit has 5 worker processes).

To limit the resources occupied by worker-8, the specific modification part is as follows:

Example is only a partial excerpt

The example only shows the key modification part. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modification Part
```yaml
services:
  worker-8:
    deploy:
      resources:
        limits:
          cpus  : '2.50' # Limit CPU usage to a maximum of 2.5 cores
          memory: 4G     # Limit memory usage to a maximum of 4 GB
```

4. Splitting Worker Units

All Worker Units are already split in newer versions

In standalone deployments of Func version 3.2.0 and later, all non-reserved Worker Units are split by default. Users can enable reserved queues as needed.

In Data Platform attached Func version 1.77.145 and later, all Worker Units are split by default, and users no longer need to split them manually.

In special cases, default merged Worker Units (e.g., worker-1-6) can be split to achieve finer-grained task scheduling, enabling scaling and resource limits for Worker Units responsible for specific queues.

Assuming the business requires higher performance for subscription processing in DataFlux Func, and subscription message processing should not interfere with Sync API processing, then worker-1-6 can be split into worker-1-5 and worker-6.

Operation Method

For standalone deployments of DataFlux Func, splitting Worker Units can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml), adding or modifying the corresponding service, and changing the queue number specified in command.

The queues a Worker Unit listens to are specified by the arguments passed to ./run-worker-by-queue.sh. The service name itself is only a label; keeping it consistent with the queues actually listened to is recommended to avoid confusion.

Example is only a partial excerpt

The example only shows the key modification part. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modification Part
```yaml
services:
  # Delete the original "worker-1-6" and replace it with the following content

  worker-1-5:
    # Specify this Worker Unit to handle queues 1 ~ 5
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Specify this Worker Unit to handle queue 6
    command: ./run-worker-by-queue.sh 6
```