Deployment and Maintenance / Architecture, Scaling, and Resource Limits

This article describes the overall architecture of DataFlux Func and how to scale it to increase processing capacity.

1. Architecture

Internally, the system follows a typical "producer -> consumer" model. Any execution of a Python function goes through the process of "task generation -> enqueue -> dequeue -> execution -> result return".

Any Python function is first packaged into a "task" and enters its designated "work queue" (numbered starting from #0). Subsequently, the corresponding "work unit" (numbered starting from worker-0) retrieves the task from the queue for execution.

flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER

    FUNC_SERVER --Function Execution Task Enqueues--> REDIS_QUEUE_N

    REDIS_QUEUE_N --Function Execution Task Dequeues--> FUNC_WORKER_N

    FUNC_BEAT --"Function Execution Task Enqueues
    (Scheduled Tasks)"--> REDIS_QUEUE_N
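
The "task generation -> enqueue -> dequeue -> execution -> result return" flow can be illustrated with a minimal in-process sketch. It is only an illustration of the producer/consumer pattern, not the actual DataFlux Func implementation: the real system uses Redis queues and separate services, whereas this sketch substitutes Python's standard `queue.Queue` and a thread. The names `work_queues`, `submit`, `worker`, and `run_demo` are hypothetical.

```python
import queue
import threading

# Hypothetical in-process stand-in for the Redis-backed work queues (#0, #1, #2).
work_queues = {number: queue.Queue() for number in range(3)}
results = {}


def submit(task_id, queue_number, func, *args):
    """Producer side: package a function call as a task and enqueue it."""
    work_queues[queue_number].put((task_id, func, args))


def worker(queue_number):
    """Consumer side: dequeue tasks from one queue, execute them, store results."""
    q = work_queues[queue_number]
    while True:
        task = q.get()
        if task is None:  # sentinel value: stop this worker
            break
        task_id, func, args = task
        results[task_id] = func(*args)


def run_demo():
    """Start one worker on queue #1, submit two tasks, and collect the results."""
    thread = threading.Thread(target=worker, args=(1,))
    thread.start()
    submit("task-a", 1, sum, [1, 2, 3])
    submit("task-b", 1, max, 4, 9)
    work_queues[1].put(None)  # tell the worker to stop after the queued tasks
    thread.join()
    return dict(results)


if __name__ == "__main__":
    print(run_demo())
```

The producer never calls the function directly; it only describes the call and places it on a numbered queue, which is what allows the consuming side to be scaled independently.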

1.1 Services and Their Purposes

DataFlux Func consists of multiple services, each with different responsibilities. The specific services are as follows:

| Service | Purpose |
| --- | --- |
| server | Web service, providing the following functionalities: 1. Web interface; 2. API interfaces; 3. Maintaining subscribers |
| worker-{queue number} | Work unit, used to execute user scripts, including: 1. Synchronously executed function APIs; 2. Asynchronously executed function APIs; 3. Scheduled tasks. Also handles some system-level background tasks. See the queue description for details. |
| beat | Trigger for scheduled tasks |
| mysql | Database |
| redis | Cache / Function execution task queue |

1.2 Work Unit and Queue Listening Relationships

For the service worker-{queue number} (work unit), each Worker service only listens to specific queues:

Queues and work units do not have to correspond one-to-one

Queues and work units are not required to have a one-to-one correspondence. For example, work unit worker-0 is not limited to listening only to tasks from queue #0. Each work unit can listen to any one or more queues.

Furthermore, the same queue can be listened to by multiple work units simultaneously, or not listened to at all (not recommended).

Different queues for standalone Func and data platform attached Func

Since most standalone Func deployments see relatively light usage, standalone Func has fewer work units than queues to avoid unnecessary resource consumption.

Conversely, because data platform attached Func handles heavy workloads such as monitors and the message sending module (Message Desk), its work units and queues correspond one-to-one, and it has more work units and queues than standalone Func.

| Work Unit | Queues (Standalone Deployment) | Queues (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |

| Work Unit | Queues (Standalone Deployment) | Queues (Data Platform Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |

| Work Unit | Queues |
| --- | --- |
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |
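
The listening relationships can be checked with a small sketch. The mapping below copies the standalone deployment column listed above; the helper names `listeners_by_queue` and `unlistened_queues` are hypothetical and exist only for illustration. The sketch shows two points from the text: one work unit may listen to several queues, and a queue with no listener (not recommended) is easy to detect.

```python
from collections import defaultdict

# Standalone deployment listening relationships, copied from the listing above.
STANDALONE = {
    "worker-0": [0, 4, 7, 8, 9],
    "worker-1": [1],
    "worker-2": [2],
    "worker-3": [3],
    "worker-5": [5],
    "worker-6": [6],
}


def listeners_by_queue(worker_queues):
    """Invert the mapping: queue number -> list of work units listening to it."""
    listeners = defaultdict(list)
    for worker, queues in worker_queues.items():
        for number in queues:
            listeners[number].append(worker)
    return dict(listeners)


def unlistened_queues(worker_queues, all_queues):
    """Return the queue numbers that no work unit listens to."""
    listened = {number for queues in worker_queues.values() for number in queues}
    return sorted(set(all_queues) - listened)


if __name__ == "__main__":
    print(listeners_by_queue(STANDALONE))
    print(unlistened_queues(STANDALONE, range(10)))
```

Running the same check after editing the service configuration is a cheap way to confirm that every queue in use still has at least one listener.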

2. Services / Queues, Their Responsibilities, and Scaling Recommendations

Scaling requires more hardware investment

Scaling places higher demands on the underlying hardware, including but not limited to the host server, the database service, and Redis.

Generally, scaling DataFlux Func only requires increasing the replica count of the corresponding services. Users should therefore first understand their actual workload so that they can scale the right services.

The complete services, queues, their responsibilities, and scaling recommendations are as follows:

| Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- | --- |
| server | Web service, providing the following functionalities: 1. Web interface; 2. API interfaces; 3. Maintaining subscribers | ← Same as left | 1 | Generally no need to scale |
| server-inner | (No such service) | Web service, dedicated for internal cluster API calls | 1 | Generally no need to scale |
| worker-0 / Queue #0 | System work unit, not directly involved in user code processing | ← Same as left | 2 | Generally no need to scale |
| worker-1 / Queue #1 | Executes function tasks from synchronously executed function APIs | ← Same as left | 1 | Scale when needing to increase concurrency of synchronously executed function APIs |
| worker-2 / Queue #2 | Executes function tasks from scheduled tasks | ← Same as left | 1 | Scale when needing to increase concurrency of scheduled tasks |
| worker-3 / Queue #3 | Executes function tasks from asynchronously executed function APIs | ← Same as left | 1 | Scale when needing to increase concurrency of asynchronously executed function APIs |
| worker-4 / Queue #4 | (Reserved) | (Reserved) | 0 | No need to scale |
| worker-5 / Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | 1 | Scale when needing to support more users developing scripts simultaneously |
| worker-6 / Queue #6 | Executes function tasks from connector subscription message processing | ← Same as left | 1 | Scale when needing to increase concurrency of connector subscription message processing |
| worker-7 / Queue #7 | (Reserved) | Executes data platform system business function tasks, e.g., logging in via data platform backend admin, updating various caches, releasing message aggregation pools, etc. | 2 | Scale when the total number of monitors is high |
| worker-8 / Queue #8 | (Reserved) | Executes data platform threshold detection and other ordinary monitor, metric generation related function tasks | 5 | Scale when the number of ordinary monitors is high |
| worker-9 / Queue #9 | (Reserved) | Executes data platform advanced detection, intelligent monitor function tasks | 3 | Scale when the number of advanced detection and intelligent monitors is high |
| worker-10 / Queue #10 | (No such service) | Executes data platform user-reported event receiving function tasks | 1 | Scale when the volume of user-reported events is large |
| worker-11 / Queue #11 | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when the volume of message sending is large |
| worker-12 / Queue #12 | (No such service) | (Reserved) | 0 | No need to scale |
| worker-13 / Queue #13 | (No such service) | (Reserved) | 0 | No need to scale |
| worker-14 / Queue #14 | (No such service) | Executes AI-related processing requiring immediate user response, e.g., calling "Auto-write Pipeline" etc. | 2 | Scale when needing to support more users writing Pipelines simultaneously |
| worker-15 / Queue #15 | (No such service) | Executes AI-related processing not requiring immediate user response, e.g., calling "Alert compression and merging" processing, etc. | 2 | Scale when there are many monitors using AI for alert aggregation |
| beat | Trigger for scheduled tasks | ← Same as left | 1 | Must not scale, ensure global single replica |
| mysql | Database | (No such service) | - | No need to scale, choose self-built or cloud service for higher demands |
| redis | Cache / Function execution task queue | (No such service) | - | No need to scale, choose self-built or cloud service for higher demands |

| Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Scaling Recommendation |
| --- | --- | --- | --- |
| server | Web service, providing the following functionalities: 1. Web interface; 2. API interfaces; 3. Maintaining subscribers | ← Same as left | Generally no need to scale |
| server-inner | (No such service) | Web service, dedicated for internal cluster API calls | Generally no need to scale |
| worker-0 / Queue #0 | System work unit, not directly involved in user code processing | ← Same as left | Generally no need to scale |
| worker-1 / Queue #1 | Executes function tasks from synchronously executed function APIs | ← Same as left | Scale when needing to increase concurrency of synchronously executed function APIs |
| worker-2 / Queue #2 | Executes function tasks from scheduled tasks | ← Same as left | Scale when needing to increase concurrency of scheduled tasks |
| worker-3 / Queue #3 | Executes function tasks from asynchronously executed function APIs | ← Same as left | Scale when needing to increase concurrency of asynchronously executed function APIs |
| worker-4 / Queue #4 | (Reserved) | (Reserved) | No need to scale |
| worker-5 / Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | Scale when needing to support more users developing scripts simultaneously |
| worker-6 / Queue #6 | Executes function tasks from connector subscription message processing | ← Same as left | Scale when needing to increase concurrency of connector subscription message processing |
| worker-7 / Queue #7 | (Reserved) | Executes data platform system business and message sending function tasks, e.g., logging in via data platform backend admin, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the volume of message sending is large |
| worker-8 / Queue #8 | (Reserved) | Executes data platform threshold detection and other ordinary monitor related function tasks | Scale when the number of ordinary monitors is high |
| worker-9 / Queue #9 | (Reserved) | Executes data platform advanced detection, intelligent monitor function tasks | Scale when the number of advanced detection and intelligent monitors is high |
| beat | Trigger for scheduled tasks | ← Same as left | Must not scale, ensure global single replica |
| mysql | Database | (No such service) | No need to scale, choose self-built or cloud service for higher demands |
| redis | Cache / Function execution task queue | (No such service) | No need to scale, choose self-built or cloud service for higher demands |

| Service / Queue | Responsibility | Scaling Recommendation |
| --- | --- | --- |
| server | Web service, providing the following functionalities: 1. Web interface; 2. API interfaces; 3. Maintaining subscribers | Generally no need to scale |
| worker-0 / Queue #0 | System work unit, not directly involved in user code processing | Generally no need to scale |
| worker-1-6 / Queue #1, #2, #3, #4, #5, #6 | By default, responsible for synchronous function call processing, such as: 1. Synchronously executed function APIs; 2. Subscription message processing | Scale when needing to increase concurrency of synchronously executed function APIs and subscription message processing |
| worker-7 / Queue #7 | By default, responsible for debug code processing (i.e., directly running functions in the Web interface) | Scale when needing to support more users developing scripts simultaneously |
| worker-8-9 / Queue #8, #9 | By default, responsible for asynchronous function call processing, such as: 1. Asynchronously executed function APIs; 2. Scheduled tasks | Scale when needing to increase concurrency of scheduled tasks and asynchronously executed function API processing |
| beat | Trigger for scheduled tasks | Must not scale, ensure global single replica |
| mysql | Database | No need to scale, choose self-built or cloud service for higher demands |
| redis | Cache / Function execution task queue | No need to scale, choose self-built or cloud service for higher demands |

Example: When needing to enhance scheduled task processing capacity...

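According to the tables above, where work units are split, scheduled tasks run in "Queue #2", which corresponds to "Service worker-2", so scaling "Service worker-2" is sufficient. Where work units are still merged, scheduled tasks run in "Queue #8" and "Queue #9", which correspond to "Service worker-8-9", so scale that service instead.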

Estimating Scaling Amount

Taking worker-8, a common scaling target, as an example:

In the data platform attached version, worker-8 is mainly responsible for executing monitor tasks. If one detection task takes T milliseconds, a single process can execute 60 × 1,000 ÷ T detections per minute. By default, each worker-8 Pod starts 5 processes.

That is, a single worker-8 Pod can execute 5 × (60 × 1,000 ÷ T) detections per minute. The formula treats this figure as the number of monitors the Pod can serve, i.e., it assumes each monitor is detected once per minute.

Formula

A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T

A: Detection capacity

T: Detection task execution time (milliseconds)

Based on different monitor execution durations, the following table can be listed:

| Single Detection Time (ms) | Single Pod Detection Capacity | Compared to Baseline |
| --- | --- | --- |
| 300 | 1,000 | 167% |
| 500 | 600 | baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |
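
The capacity formula can be evaluated with a short sketch. The function name `pod_capacity` and its parameters are hypothetical; the default of 5 processes per Pod is the value stated above.

```python
def pod_capacity(task_ms, processes_per_pod=5):
    """Detections one Pod can complete per minute: processes x (60 x 1,000 / T)."""
    if task_ms <= 0:
        raise ValueError("task_ms must be positive")
    return processes_per_pod * 60 * 1000 / task_ms


if __name__ == "__main__":
    # Print the capacity for the detection times used in the table above.
    for task_ms in (300, 500, 800, 1000, 2000, 3000):
        print(task_ms, "ms ->", pod_capacity(task_ms), "detections per minute")
```

If a deployment changes the number of processes per Pod, pass that value as `processes_per_pod` instead of relying on the default.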

Conversely, assuming the total number of monitors is M, the required number of Pods can be derived from M ÷ (5 × (60 × 1,000 ÷ T)), rounded up to the next whole Pod.

Formula

P = M ÷ (300,000 ÷ T)
  = M × T ÷ 300,000

P: Required number of Pods

M: Number of monitors

T: Detection task execution time (milliseconds)

Based on different monitor counts and execution durations, the following tables can be listed:

| Monitor Count | Single Detection Time (ms) | Required Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |

| Monitor Count | Single Detection Time (ms) | Required Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |

| Monitor Count | Single Detection Time (ms) | Required Pods | Compared to Baseline |
| --- | --- | --- | --- |
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |
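
The Pod counts listed above follow from the formula once the result is rounded up to a whole Pod. A minimal sketch, assuming integer inputs; the function name `required_pods` is hypothetical, and the default of 5 processes per Pod is the value stated earlier.

```python
def required_pods(monitors, task_ms, processes_per_pod=5):
    """Pods needed for M monitors: M x T / (processes x 60,000), rounded up."""
    if monitors < 0 or task_ms <= 0 or processes_per_pod <= 0:
        raise ValueError("monitors must be >= 0; task_ms and processes_per_pod must be > 0")
    per_pod_ms = processes_per_pod * 60 * 1000  # 300,000 with the default of 5
    # Integer ceiling division avoids floating-point rounding surprises.
    return (monitors * task_ms + per_pod_ms - 1) // per_pod_ms


if __name__ == "__main__":
    for monitors in (1000, 5000, 10000):
        for task_ms in (300, 500, 800, 1000, 2000, 3000):
            print(monitors, task_ms, required_pods(monitors, task_ms))
```

The result is a lower bound under the formula's assumptions; it leaves no headroom for detections that occasionally take longer than T.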

Operation Method

For single-machine deployed DataFlux Func, scaling can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml) and increasing the deploy.replicas for the corresponding service.

Please refer to the official documentation

For complete information on the deploy.replicas option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

For example, to increase the processing capacity of worker-8, modify the following part:

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  worker-8:
    deploy:
      # Start 2 work units processing queue 8 simultaneously
      replicas: 2

3. Resource Limits

Resource limits should be adjusted reasonably based on actual business

Please adjust resource limits reasonably based on actual business.

Limiting resources without regard to the workload may lengthen task execution times or leave too little memory for code to run.

Operation Method

For single-machine deployed DataFlux Func, resource limits can be implemented by modifying the configuration ({installation directory}/docker-stack.yaml) and adding deploy.resources for the corresponding service.

Please refer to the official documentation

For complete information on the deploy.resources option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each worker-N replica can occupy up to 5 CPU cores (i.e., each work unit has 5 worker processes).

For example, to limit the resources of worker-8, modify the following part:

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  worker-8:
    deploy:
      resources:
        limits:
          cpus  : '2.50' # Limit CPU usage to at most 2.5 cores
          memory: 4G     # Limit memory usage to at most 4 GB
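
When replicas and resource limits are combined, it can help to estimate the worst-case footprint of a service before deploying. The sketch below is a rough estimate under stated assumptions: without a CPU limit, each replica is taken to use at most 5 cores (the default mentioned above), and memory is only counted when a limit is set. The function name `worst_case_footprint` and its parameters are hypothetical.

```python
def worst_case_footprint(replicas, cpu_limit=None, memory_limit_gb=None, default_cpus=5.0):
    """Return (total CPU cores, total memory in GB or None) for one service."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    # A replica cannot use more cores than it has worker processes.
    if cpu_limit is None:
        cpus_per_replica = default_cpus
    else:
        cpus_per_replica = min(cpu_limit, default_cpus)
    total_cpus = replicas * cpus_per_replica
    total_memory = None if memory_limit_gb is None else replicas * memory_limit_gb
    return total_cpus, total_memory


if __name__ == "__main__":
    # Two worker-8 replicas with the limits from the excerpt above.
    print(worst_case_footprint(2, cpu_limit=2.5, memory_limit_gb=4))
```

Summing these figures across all services gives an upper bound to compare against the capacity of the host.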

4. Splitting Work Units

All work units are already split in the new version

In standalone Func version 3.2.0 and later, all non-reserved work units are split by default. Users can enable reserved queues as needed.

In data platform attached Func version 1.77.145 and later, all work units are split by default, and users no longer need to split them manually.

In special cases, work units that are merged by default (e.g., worker-1-6) can be split to achieve finer-grained task scheduling, so that the work units responsible for specific queues can be scaled and resource-limited independently.

For example, suppose the business places high performance demands on subscription processing, and subscription message processing must not interfere with synchronous function API processing. In that case, worker-1-6 can be split into worker-1-5 and worker-6.

Operation Method

For single-machine deployed DataFlux Func, splitting work units can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml), adding or modifying corresponding services, and changing the queue numbers specified in the command.

Specifying the queues a work unit listens to is achieved through the parameters after ./run-worker-by-queue.sh. The service name itself is mainly for labeling purposes; it is recommended to keep it consistent with the actual listening queues to avoid confusion.

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  # Delete the original "worker-1-6" and replace it with the following content

  worker-1-5:
    # Specify the work unit to process work queues 1 ~ 5
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Specify the work unit to process work queue 6
    command: ./run-worker-by-queue.sh 6
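
To keep service names consistent with the queues they listen to, the name and command can be derived from the same queue list. This is only a sketch: the function name `worker_service` is hypothetical, and the naming rule (worker-N for a single queue, worker-first-last for a contiguous range) is inferred from the names used in this article. For non-contiguous queue sets the caller must supply a name, since the article does not define a convention for them.

```python
def worker_service(queues, name=None):
    """Return (service name, command) for a work unit listening to the given queues."""
    numbers = sorted(set(queues))
    if not numbers:
        raise ValueError("at least one queue number is required")
    if name is None:
        contiguous = numbers == list(range(numbers[0], numbers[-1] + 1))
        if len(numbers) == 1:
            name = f"worker-{numbers[0]}"
        elif contiguous:
            name = f"worker-{numbers[0]}-{numbers[-1]}"
        else:
            raise ValueError("pass an explicit name for non-contiguous queues")
    # The queue numbers are passed as arguments after the script name.
    command = "./run-worker-by-queue.sh " + " ".join(str(n) for n in numbers)
    return name, command


if __name__ == "__main__":
    print(worker_service([1, 2, 3, 4, 5]))
    print(worker_service([6]))
```

Generating both values from one list avoids the mismatch the text warns about, where a service label no longer reflects the queues it actually listens to.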