
Architecture, Scaling, and Resource Limits

This article introduces the overall architecture of DataFlux Func and explains how to scale it and set resource limits to match processing needs.

1. Architecture

Internally, the system follows a typical "producer -> consumer" model. Any execution of a Python function goes through the process of "task generation -> enqueue -> dequeue -> execution -> return result."

Any Python function to be executed is first wrapped into a "task" and placed into its corresponding "work queue" (numbered from #0); the corresponding "worker unit" (numbered from worker-0) then dequeues and executes it.

```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER

    FUNC_SERVER --Function Execution Task Enqueue--> REDIS_QUEUE_N

    REDIS_QUEUE_N --Function Execution Task Dequeue--> FUNC_WORKER_N

    FUNC_BEAT --"Function Execution Task Enqueue
    (Cron Job)"--> REDIS_QUEUE_N
```
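
For intuition, the enqueue / dequeue flow can be sketched in a few lines of Python on top of Redis lists. This is only a conceptual illustration of the producer → consumer model; the queue key, task fields, and function names below are invented for the example and do not reflect DataFlux Func's actual internal task format:

```python
import json

import redis  # third-party client: pip install redis

r = redis.Redis()

# Producer side (Server / Beat): wrap a function call into a "task"
# and push it onto a work queue. Key and fields are illustrative only.
task = {"func_id": "demo.hello", "kwargs": {"name": "world"}}
r.lpush("worker_queue_1", json.dumps(task))

# Consumer side (Worker): block until a task arrives on one of the
# listened queues, then dequeue and execute it.
queue, payload = r.brpop(["worker_queue_1"])
task = json.loads(payload)
print(f"dequeued from {queue.decode()}: {task['func_id']}")
```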

1.1 Services and Their Purposes

DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:

| Service | Purpose |
| --- | --- |
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers |
| worker-{queue number} | Worker unit, used to execute user scripts, including:<br>1. Func API<br>2. Sync / Async API (legacy "Auth Link" / "Batch")<br>3. Cron Job (legacy: automatic trigger configuration)<br>Also handles some system-level background tasks; see the queue descriptions below for details |
| beat | Cron Job trigger |
| mysql | Database |
| redis | Cache / function execution task queue |

1.2 Worker Unit and Queue Listening Relationship

Each worker-{queue number} service (worker unit) listens only to specific queues:

Queues and Worker Units Do Not Need to Be One-to-One

Queues and worker units do not have to be one-to-one. For example, the worker unit worker-0 is not limited to listening to queue #0. Each worker unit can listen to any one or more queues.

Additionally, the same queue can be listened to by multiple worker units simultaneously, or not listened to at all (not recommended).

Different Queues for Standalone Func and Data Platform Attached Func

Since most standalone Func deployments are relatively lightweight, standalone Func runs fewer worker units than queues to reduce unnecessary resource consumption.

Conversely, the Data Platform Attached Func handles heavy tasks such as monitors and message sending (Message Desk), so its worker units and queues correspond one-to-one, and it has more worker units and queues than standalone Func.

Current versions (all worker units split):

| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |

Earlier versions (queues up to #9):

| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |

Older standalone versions (merged worker units):

| Worker Unit | Queue |
| --- | --- |
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |

2. Services / Queues and Their Responsibilities and Scaling Recommendations

Scaling Requires More Hardware Investment

Scaling places higher demands on the underlying hardware and services, including but not limited to the host itself, the database service, and Redis.

Generally, scaling DataFlux Func only requires increasing the replica count of the corresponding service, so users should first understand their actual workload and scale accordingly.

The complete services, queues, their responsibilities, and scaling recommendations are as follows:

Current versions:

| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- | --- |
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | ← Same as left | 1 | Generally does not need scaling |
| server-inner | (No such service) | Web service dedicated to internal API calls within the cluster | 1 | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit; does not directly participate in user code processing | ← Same as left | 2 | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from:<br>1. Func API (sync execution)<br>2. Legacy Sync API<br>3. Legacy Auth Link | ← Same as left | 1 | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | 1 | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes tasks from:<br>1. Func API (async execution)<br>2. Legacy Async API<br>3. Legacy Batch | ← Same as left | 1 | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | 0 | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., running functions directly in the Web interface | ← Same as left | 1 | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | 1 | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, etc. | 2 | Scale when the total number of monitors is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, metric generation, etc. | 5 | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | 3 | Scale when the number of advanced detection and intelligent monitors is large |
| worker-10<br>Queue #10 | (No such service) | Executes function tasks for receiving user-reported events | 1 | Scale when the volume of user-reported events is large |
| worker-11<br>Queue #11 | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when the volume of messages sent is large |
| worker-12<br>Queue #12 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-13<br>Queue #13 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-14<br>Queue #14 | (No such service) | Executes AI-related processing that requires an immediate user response, e.g., calling "Auto Pipeline Writing" | 2 | Scale when needing to support more users writing Pipelines simultaneously |
| worker-15<br>Queue #15 | (No such service) | Executes AI-related processing that does not require an immediate user response, e.g., calling "Alert Compression and Merging" | 2 | Scale when many monitors use AI alert aggregation |
| beat | Cron Job trigger | ← Same as left | 1 | Do not scale; ensure a single global instance |
| mysql | Database | (No such service) | - | Does not need scaling; use a self-built or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | - | Does not need scaling; use a self-built or cloud service for higher demands |
Earlier versions:

| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- |
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | ← Same as left | Generally does not need scaling |
| server-inner | (No such service) | Web service dedicated to internal API calls within the cluster | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit; does not directly participate in user code processing | ← Same as left | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes Sync API tasks (legacy: Auth Link) | ← Same as left | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes Async API tasks (legacy: Batch) | ← Same as left | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., running functions directly in the Web interface | ← Same as left | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business and message sending, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the volume of messages sent is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, etc. | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | Scale when the number of advanced detection and intelligent monitors is large |
| beat | Cron Job trigger | ← Same as left | Do not scale; ensure a single global instance |
| mysql | Database | (No such service) | Does not need scaling; use a self-built or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | Does not need scaling; use a self-built or cloud service for higher demands |
Older standalone versions (merged worker units):

| Service | Responsibility | Scaling Recommendation |
| --- | --- | --- |
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintains subscribers | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit; does not directly participate in user code processing | Generally does not need scaling |
| worker-1-6<br>Queues #1, #2, #3, #4, #5, #6 | By default, responsible for synchronous function call processing, such as:<br>1. Auth Link processing<br>2. Subscription message processing | Scale when needing to increase concurrency for Auth Link and subscription message processing |
| worker-7<br>Queue #7 | By default, responsible for debug code processing (i.e., running functions directly in the Web interface) | Scale when needing to support more users developing scripts simultaneously |
| worker-8-9<br>Queues #8, #9 | By default, responsible for asynchronous function call processing, such as:<br>1. Automatic trigger processing<br>2. Batch processing | Scale when needing to increase concurrency for automatic trigger and batch processing |
| beat | Cron Job trigger | Do not scale; ensure a single global instance |
| mysql | Database | Does not need scaling; use a self-built or cloud service for higher demands |
| redis | Cache / function execution task queue | Does not need scaling; use a self-built or cloud service for higher demands |

Example: When needing to enhance Cron Job processing capability...

As shown in the tables above, Cron Job tasks enter queue #2, which corresponds to the worker-2 service. Scaling the worker-2 service is therefore sufficient.

Estimating Scaling Amount

Taking the commonly scaled worker-8 as an example:

In the Data Platform Attached version, worker-8 is mainly responsible for executing monitor tasks. Assuming a single detection task takes T milliseconds, one process can execute 60 × 1,000 ÷ T detections per minute. By default, each worker-8 Pod starts 5 processes.

That is, the detection capability of a single worker-8 Pod is 5 × (60 × 1,000 ÷ T) detections per minute.

Formula

```
A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T
```

A: detection capability (detections per minute)

T: execution time of a single detection task (milliseconds)

Based on different monitor execution times, the following table can be derived:

| Single Detection Time (ms) | Single Pod Detection Capability (detections/min) | Compared to Baseline |
| --- | --- | --- |
| 300 | 1,000 | 167% |
| 500 | 600 | Baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |

Conversely, if the total number of monitors is M, the required number of Pods can be calculated as M ÷ (5 × (60 × 1,000 ÷ T)).

Formula

```
P = M ÷ (300,000 ÷ T)
  = M × T ÷ 300,000
```

P: required number of Pods (rounded up to a whole Pod)

M: number of monitors

T: execution time of a single detection task (milliseconds)

Based on the number of monitors and their execution times, the following tables can be derived:

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | Baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | Baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |

| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | Baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |
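
As a sanity check, both formulas can be evaluated directly; the short Python sketch below (assuming the default of 5 worker processes per Pod stated above) reproduces the table values:

```python
import math

PROCESSES_PER_POD = 5      # default worker processes per worker-8 Pod
MS_PER_MINUTE = 60 * 1000

def capability_per_pod(t_ms: float) -> float:
    """A = 5 × (60 × 1,000 ÷ T): detections a single Pod can run per minute."""
    return PROCESSES_PER_POD * MS_PER_MINUTE / t_ms

def required_pods(monitors: int, t_ms: float) -> int:
    """P = M × T ÷ 300,000, rounded up to a whole Pod."""
    return math.ceil(monitors * t_ms / (PROCESSES_PER_POD * MS_PER_MINUTE))

print(capability_per_pod(500))    # 600.0 -> the baseline row above
print(required_pods(5000, 800))   # 14    -> matches the 5,000-monitor table
```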

Operation Method

For single-machine deployed DataFlux Func, scaling can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml) and increasing the deploy.replicas for the corresponding service.

Refer to Official Documentation

For complete information on the deploy.replicas option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

To enhance the processing capability of worker-8, the specific modification part is as follows:

The Example Is Only an Excerpt

The example shows only the key part to modify. Keep the rest of the configuration intact when editing.

Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      # Start 2 worker units processing queue #8 simultaneously
      replicas: 2
```
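
After saving the change, re-apply the stack configuration so the new replica count takes effect; on a typical single-machine deployment this means re-running `docker stack deploy -c docker-stack.yaml` with the stack name chosen at installation.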

3. Resource Limits

Adjust Resource Limits to Match Actual Business Needs

Set resource limits according to the actual workload.

Limiting resources blindly may lead to longer task execution times or leave insufficient memory for code to finish executing.

Operation Method

For single-machine deployed DataFlux Func, resource limits can be implemented by modifying the configuration ({installation directory}/docker-stack.yaml) and adding deploy.resources for the corresponding service.

Refer to Official Documentation

For complete information on the deploy.resources option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each worker-N replica can occupy up to 5 CPU cores (i.e., each worker unit has 5 worker processes).

To limit the resources occupied by worker-8, the specific modification part is as follows:

The Example Is Only an Excerpt

The example shows only the key part to modify. Keep the rest of the configuration intact when editing.

Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      resources:
        limits:
          cpus  : '2.50' # Limit CPU usage to a maximum of 2.5 cores
          memory: 4G     # Limit memory usage to a maximum of 4 GB
```

4. Splitting Worker Units

All worker units have been split in the new version

In standalone Func version 3.2.0 and later, all non-reserved worker units have been split by default. Users can enable reserved queues as needed.

In Data Platform Attached Func version 1.77.145 and later, all worker units have been split by default, and users no longer need to split them manually.

In special cases, the default merged worker units (e.g., worker-1-6) can be split to achieve finer-grained task scheduling, enabling scaling and resource limiting for worker units responsible for specific queues.

Suppose that, for business reasons, subscription message processing requires higher performance and should not interfere with Sync API (legacy: Auth Link) processing. In that case, worker-1-6 can be split into worker-1-5 and worker-6.

Operation Method

For single-machine deployed DataFlux Func, worker units can be split by modifying the configuration ({installation directory}/docker-stack.yaml): add or modify the corresponding services and change the queue numbers specified in command.

The queues a worker unit listens to are determined by the arguments passed to ./run-worker-by-queue.sh. The service name itself is only a label; keeping it consistent with the queues actually listened to is recommended to avoid confusion.

The Example Is Only an Excerpt

The example shows only the key part to modify. Keep the rest of the configuration intact when editing.

Key Modification Part of docker-stack.yaml

```yaml
services:
  # Delete the original "worker-1-6" and replace it with the following content

  worker-1-5:
    # Have this worker unit process work queues 1 ~ 5
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Have this worker unit process work queue 6
    command: ./run-worker-by-queue.sh 6
```
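
After the split, worker-6 runs as an independent service, so the deploy.replicas and deploy.resources options described in the previous sections can be applied to it (and to worker-1-5) separately.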