Deployment and Maintenance / Architecture, Scaling, and Resource Limits
This article introduces the overall architecture of DataFlux Func and how to scale it to improve processing capacity.
1. Architecture
Internally, the system follows a typical "producer -> consumer" model. Any execution of a Python function goes through the process of "task generation -> enqueue -> dequeue -> execution -> return result."
Every Python function call is first wrapped into a "task" and placed into its corresponding work queue (numbered starting from `#0`). A worker unit (numbered starting from `worker-0`) then dequeues and executes it.
```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER
    FUNC_SERVER --Function Execution Task Enqueue--> REDIS_QUEUE_N
    REDIS_QUEUE_N --Function Execution Task Dequeue--> FUNC_WORKER_N
    FUNC_BEAT --"Function Execution Task Enqueue<br>(Cron Job)"--> REDIS_QUEUE_N
```
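The lifecycle shown above can be sketched in pure Python. This is an illustrative simulation only: an in-memory `queue.Queue` stands in for the Redis work queue, and names such as `enqueue_task` and `run_worker_once` are hypothetical, not DataFlux Func's actual API.

```python
import json
import queue

# In-memory stand-in for one Redis work queue (e.g. queue "#1")
work_queue = queue.Queue()

def enqueue_task(func_id, kwargs):
    """Producer side: wrap a function call into a task and enqueue it."""
    task = {"func_id": func_id, "kwargs": kwargs}
    work_queue.put(json.dumps(task))  # Redis would store a string payload

# Registry of executable functions (user scripts in a real deployment)
FUNCS = {"demo.add": lambda a, b: a + b}

def run_worker_once():
    """Consumer side: dequeue one task, execute it, return the result."""
    task = json.loads(work_queue.get())
    return FUNCS[task["func_id"]](**task["kwargs"])

enqueue_task("demo.add", {"a": 1, "b": 2})
print(run_worker_once())  # -> 3
```

The real system differs in the transport (Redis instead of `queue.Queue`) and in returning results asynchronously, but the "wrap, enqueue, dequeue, execute" flow is the same.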
1.1 Services and Their Purposes
DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:
| Service | Purpose |
|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers |
| worker-{queue number} | Worker unit, used to execute user scripts, including:<br>1. Func API<br>2. Sync / Async API (legacy "Auth Link" / "Batch")<br>3. Cron Job (legacy: automatic trigger configuration)<br>Also handles some system-level background tasks. See the queue descriptions below for details. |
| beat | Cron Job trigger |
| mysql | Database |
| redis | Cache / function execution task queue |
1.2 Worker Unit and Queue Listening Relationship
For the `worker-{queue number}` service (worker unit), each Worker service only listens to specific queues:

Queues and Worker Units Do Not Need to Be One-to-One

Queues and worker units do not have to map one-to-one. For example, the worker unit `worker-0` is not limited to listening to queue `#0`; each worker unit can listen to any one or more queues.

Additionally, the same queue can be listened to by multiple worker units simultaneously, or by none at all (not recommended).
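This many-to-many mapping can be illustrated with a small simulation. In-memory queues stand in for Redis, and `poll_worker` is a hypothetical name, not part of DataFlux Func:

```python
import queue

# One in-memory queue per numbered work queue (#0 .. #9)
queues = {n: queue.Queue() for n in range(10)}

def poll_worker(listen_queues):
    """Return the next task from the first non-empty queue this worker
    unit listens to, or None if all listened queues are empty."""
    for n in listen_queues:
        try:
            return queues[n].get_nowait()
        except queue.Empty:
            continue
    return None

# Standalone layout: worker-0 listens to queues #0, #4, #7, #8, #9
WORKER_0_QUEUES = [0, 4, 7, 8, 9]

queues[7].put("task-A")
print(poll_worker(WORKER_0_QUEUES))  # -> 'task-A' (dequeued from #7)
print(poll_worker(WORKER_0_QUEUES))  # -> None (all listened queues empty)
```

A queue not present in any worker's `listen_queues` list would simply never be drained, which is why leaving a queue unlistened is not recommended.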
Different Queues for Standalone Func and Data Platform Attached Func
Since most standalone Func deployments are relatively lightweight, standalone Func runs fewer worker units than queues to reduce unnecessary resource consumption.

Conversely, the Data Platform Attached Func handles heavy tasks such as monitors and the message sending module (Message Desk), so its worker units and queues correspond one-to-one, and it has more worker units and queues than standalone Func.
| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
|---|---|---|
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |
| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
|---|---|---|
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| Worker Unit | Queue |
|---|---|
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |
2. Services / Queues and Their Responsibilities and Scaling Recommendations
Scaling Requires More Hardware Investment
Scaling places higher demands on the underlying infrastructure, including but not limited to the host server itself, the database service, and Redis.
Generally, scaling DataFlux Func only requires increasing the number of replicas for the corresponding service. Therefore, users should first understand their actual business situation to scale accordingly.
The complete services, queues, their responsibilities, and scaling recommendations are as follows:
| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation |
|---|---|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | ← Same as left | 1 | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal API calls within the cluster | 1 | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | ← Same as left | 2 | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from:<br>1. Func API (sync execution)<br>2. Legacy Sync API<br>3. Legacy Auth Link | ← Same as left | 1 | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | 1 | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes tasks from:<br>1. Func API (async execution)<br>2. Legacy Async API<br>3. Legacy Batch | ← Same as left | 1 | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | 0 | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | 1 | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | 1 | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, etc. | 2 | Scale when the total number of monitors is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, metric generation, etc. | 5 | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | 3 | Scale when the number of advanced detection and intelligent monitors is large |
| worker-10<br>Queue #10 | (No such service) | Executes function tasks for receiving user-reported events on the Data Platform | 1 | Scale when the volume of user-reported events is large |
| worker-11<br>Queue #11 | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when the volume of message sending is large |
| worker-12<br>Queue #12 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-13<br>Queue #13 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-14<br>Queue #14 | (No such service) | Executes AI-related processing that requires an immediate user response, e.g., calling "Auto Pipeline Writing" | 2 | Scale when needing to support more users writing Pipelines simultaneously |
| worker-15<br>Queue #15 | (No such service) | Executes AI-related processing that does not require an immediate user response, e.g., "Alert Compression and Merging" processing | 2 | Scale when the number of monitors using AI alert aggregation is large |
| beat | Cron Job trigger | ← Same as left | 1 | Do not scale; ensure a global single instance |
| mysql | Database | (No such service) | - | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | - | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Scaling Recommendation |
|---|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | ← Same as left | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal API calls within the cluster | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | ← Same as left | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from the Sync API (legacy: Auth Link) | ← Same as left | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes tasks from the Async API (legacy: Batch) | ← Same as left | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business and message sending, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the volume of message sending is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, etc. | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | Scale when the number of advanced detection and intelligent monitors is large |
| beat | Cron Job trigger | ← Same as left | Do not scale; ensure a global single instance |
| mysql | Database | (No such service) | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| Service | Responsibility | Scaling Recommendation |
|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | Generally does not need scaling |
| worker-1-6<br>Queue #1, #2, #3, #4, #5, #6 | By default, responsible for function synchronous call processing, such as:<br>1. Auth Link processing<br>2. Subscription message processing | Scale when needing to increase concurrency for Auth Link or subscription message processing |
| worker-7<br>Queue #7 | By default, responsible for debug code processing (i.e., directly running functions in the Web interface) | Scale when needing to support more users developing scripts simultaneously |
| worker-8-9<br>Queue #8, #9 | By default, responsible for function asynchronous call processing, such as:<br>1. Automatic trigger processing<br>2. Batch processing | Scale when needing to increase concurrency for automatic trigger or batch processing |
| beat | Cron Job trigger | Do not scale; ensure a global single instance |
| mysql | Database | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | Does not need scaling; choose a self-hosted or cloud service for higher demands |
Example: When needing to enhance Cron Job processing capability...

In the legacy layout above, Cron Jobs (automatic triggers) are located in queue `#8`, which corresponds to the `worker-8` service. Therefore, scaling the `worker-8` service is sufficient.
Estimating Scaling Amount
Taking the common `worker-8` as an example:

`worker-8` in the Data Platform Attached version is mainly responsible for executing monitor tasks. Assuming a single detection task takes `T` milliseconds, then 60 × 1,000 ÷ T detections can be executed per minute. By default, each `worker-8` Pod starts 5 processes.

That is, the detection capability of a single `worker-8` Pod is 5 × (60 × 1,000 ÷ T) monitors per minute.
Formula

```text
A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T
```

- A: Detection capability (detections per minute)
- T: Detection task execution time (milliseconds)
Based on the different execution times of monitors, the following table can be listed:
| Single Detection Time (ms) | Single Pod Detection Capability (per minute) | Compared to Baseline |
|---|---|---|
| 300 | 1,000 | 167% |
| 500 | 600 | Baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |
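The capability column above follows directly from the formula and can be reproduced with a short calculation (`pod_capacity` is an illustrative helper, not part of DataFlux Func):

```python
PROCESSES_PER_POD = 5  # each worker-8 Pod starts 5 processes by default

def pod_capacity(t_ms):
    """Detections a single Pod can run per minute: A = 5 × (60 × 1,000 ÷ T)."""
    return PROCESSES_PER_POD * 60 * 1000 // t_ms

for t_ms in (300, 500, 800, 1000, 2000, 3000):
    print(t_ms, pod_capacity(t_ms))  # e.g. 500 ms -> 600 detections/minute
```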
Conversely, assuming the total number of monitors is `M`, the required number of Pods can be calculated as M ÷ (5 × (60 × 1,000 ÷ T)).
Formula

```text
P = M ÷ (5 × (60 × 1,000 ÷ T))
  = M × T ÷ 300,000
```

- P: Required number of Pods (rounded up)
- M: Number of monitors
- T: Detection task execution time (milliseconds)
Based on the number of monitors and their execution times, the following table can be listed:
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | Baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | Baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | Baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |
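The Pod counts in the tables above follow from the formula, rounding up to a whole number of Pods (`required_pods` is an illustrative helper, not part of DataFlux Func):

```python
import math

PROCESSES_PER_POD = 5  # each worker-8 Pod starts 5 processes by default

def required_pods(monitors, t_ms):
    """P = M ÷ (5 × (60 × 1,000 ÷ T)) = M × T ÷ 300,000, rounded up."""
    return math.ceil(monitors * t_ms / (PROCESSES_PER_POD * 60 * 1000))

print(required_pods(1000, 500))    # baseline row -> 2
print(required_pods(5000, 800))    # -> 14
print(required_pods(10000, 2000))  # -> 67
```

Note that rounding up matters: 1,000 monitors at 500 ms needs only 1.67 Pods' worth of capacity, but 2 Pods must actually be deployed.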
Operation Method
For single-machine deployed DataFlux Func, scaling can be achieved by modifying the configuration file (`{installation directory}/docker-stack.yaml`) and increasing `deploy.replicas` for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.replicas option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas
To enhance the processing capability of `worker-8`, the key part to modify is as follows:
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      # Illustrative replica count; adjust to actual load
      replicas: 2
```
3. Resource Limits
Adjust Resource Limits Based on Actual Business Load

Please adjust resource limits reasonably based on your actual business load.
Blindly limiting resources may lead to longer task execution times or insufficient memory to complete code execution.
Operation Method
For single-machine deployed DataFlux Func, resource limits can be implemented by modifying the configuration file (`{installation directory}/docker-stack.yaml`) and adding `deploy.resources` for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.resources option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources
By default, each `worker-N` replica can occupy up to 5 CPU cores (i.e., each worker unit runs 5 worker processes).

To limit the resources occupied by `worker-8`, the key part to modify is as follows:
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      resources:
        limits:
          # Illustrative values; adjust to actual load
          cpus: '2.5'
          memory: 4G
```
4. Splitting Worker Units
All worker units have been split in the new version
In standalone Func version 3.2.0 and later, all non-reserved worker units have been split by default. Users can enable reserved queues as needed.
In Data Platform Attached Func version 1.77.145 and later, all worker units have been split by default, and users no longer need to split them manually.
In special cases, the default merged worker units (e.g., `worker-1-6`) can be split to achieve finer-grained task scheduling, enabling scaling and resource limiting for the worker units responsible for specific queues.

Assuming that, based on business requirements, higher performance is needed for subscription processing, and subscription message processing should not interfere with Sync API (legacy: Auth Link) processing, then `worker-1-6` can be split into `worker-1-5` and `worker-6`.
Operation Method
For single-machine deployed DataFlux Func, splitting worker units can be achieved by modifying the configuration file (`{installation directory}/docker-stack.yaml`), adding or modifying the corresponding services, and changing the queue numbers specified in `command`.

The queues a worker unit listens to are specified through the parameters after `./run-worker-by-queue.sh`. The service name itself is mainly a label; keeping it consistent with the queues actually listened to is recommended to avoid confusion.
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  # Split from the original worker-1-6 service;
  # other fields (image, etc.) stay the same as worker-1-6
  worker-1-5:
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Other fields same as the original worker-1-6
    command: ./run-worker-by-queue.sh 6
```