Deployment and Maintenance / Architecture, Scaling, and Resource Limits
This guide introduces the overall architecture of DataFlux Func and how to scale it to improve processing capability.
1. Architecture
Internally, it follows the typical "Producer -> Consumer" model. Any execution of a Python function will go through the process of "Task Generation -> Enqueue -> Dequeue -> Execution -> Return Result".
Any Python function is first wrapped into a "Task" and placed into its corresponding "Work Queue" (numbered from #0), and is then fetched and executed by the corresponding "Worker Unit" (numbered from worker-0).
```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER
    FUNC_SERVER --Function Execution Task Enqueue--> REDIS_QUEUE_N
    REDIS_QUEUE_N --Function Execution Task Dequeue--> FUNC_WORKER_N
    FUNC_BEAT --"Function Execution Task Enqueue
    (Cron Job)"--> REDIS_QUEUE_N
```
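The "Task Generation -> Enqueue -> Dequeue -> Execution -> Return Result" flow can be illustrated with a toy sketch. This uses plain in-memory queues as stand-ins for the Redis work queues; in the real system the producer (Server / Beat) and consumers (Worker Units) run as separate services:

```python
import queue

# Stand-ins for the numbered Redis work queues (#0, #1, ...)
work_queues = {n: queue.Queue() for n in range(3)}

def enqueue_task(queue_number, func, *args):
    """Producer side (Server / Beat): wrap a function call into a task and enqueue it."""
    work_queues[queue_number].put((func, args))

def worker_run_once(queue_number):
    """Consumer side (Worker Unit): dequeue one task, execute it, return the result."""
    func, args = work_queues[queue_number].get()
    return func(*args)

enqueue_task(0, lambda x: x * 2, 21)
print(worker_run_once(0))  # prints 42
```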
1.1 Services and Their Purposes
DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:
Service | Purpose |
---|---|
server | Web service, providing the following functionalities: 1. Web interface 2. API interface 3. Maintains subscribers |
worker-{queue number} | Worker unit, used to execute user scripts, including: 1. Sync API 2. Async API 3. Cron Job. Also handles some system-level background tasks (see the queue details below) |
beat | Trigger for Cron Jobs |
mysql | Database |
redis | Cache / Function execution task queue |
1.2 Worker Unit and Queue Listening Relationship
Each worker-{queue number} service (Worker Unit) listens only to specific queues:
Queues and Worker Units Do Not Need to Be One-to-One
Queues and Worker Units do not have to correspond one-to-one. For example, the Worker Unit worker-0 is not limited to tasks from queue #0; each Worker Unit can listen to any one or more queues.
Additionally, the same queue can be listened to by multiple Worker Units simultaneously, or not listened to at all (not recommended).
Standalone Deployment Func vs. Data Platform Attached Func Have Different Queues
Since most standalone deployments of Func are lightly loaded, standalone deployments run fewer Worker Units than queues to reduce unnecessary resource consumption.

Conversely, a Data Platform attached Func handles heavy business such as monitors and the message sending module (Message Desk), so its Worker Units and queues correspond one-to-one, and it has more Worker Units and queues than a standalone deployment.
Worker Unit | Queue (Standalone Deployment) | Queue (Data Platform Attached)
---|---|---
worker-0 | #0, #4, #7, #8, #9 | #0 |
worker-1 | #1 | #1 |
worker-2 | #2 | #2 |
worker-3 | #3 | #3 |
worker-4 | - | #4 |
worker-5 | #5 | #5 |
worker-6 | #6 | #6 |
worker-7 | - | #7 |
worker-8 | - | #8 |
worker-9 | - | #9 |
worker-10 | - | #10 |
worker-11 | - | #11 |
worker-12 | - | #12 |
worker-13 | - | #13 |
worker-14 | - | #14 |
worker-15 | - | #15 |
Worker Unit | Queue (Standalone Deployment) | Queue (Data Platform Attached)
---|---|---
worker-0 | #0, #4, #7, #8, #9 | #0 |
worker-1 | #1 | #1 |
worker-2 | #2 | #2 |
worker-3 | #3 | #3 |
worker-4 | - | #4 |
worker-5 | #5 | #5 |
worker-6 | #6 | #6 |
worker-7 | - | #7 |
worker-8 | - | #8 |
worker-9 | - | #9 |
Worker Unit | Queue |
---|---|
worker-0 | #0 |
worker-1-6 | #1, #2, #3, #4, #5, #6 |
worker-7 | #7 |
worker-8-9 | #8, #9 |
2. Services / Queues and Their Responsibilities and Scaling Recommendations
Scaling Requires More Hardware Investment
Scaling places higher performance demands on the infrastructure, including but not limited to the server itself, the database service, Redis, etc.

Generally, scaling DataFlux Func only requires increasing the replica count of the corresponding services. Users should therefore first understand their actual business load and scale accordingly.
The complete services, queues, their responsibilities, and scaling recommendations are as follows:
Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation
---|---|---|---|---
server | Web service, providing: 1. Web interface 2. API interface 3. Maintains subscribers | ← Same as Left | 1 | Generally does not need scaling
server-inner | (No such service) | Web service, specifically for internal cluster API calls | 1 | Generally does not need scaling
worker-0 (Queue #0) | System Worker Unit; does not directly participate in user code processing | ← Same as Left | 2 | Generally does not need scaling
worker-1 (Queue #1) | Executes tasks from Sync API functions | ← Same as Left | 1 | Scale when needing to increase Sync API concurrency
worker-2 (Queue #2) | Executes tasks from Cron Jobs | ← Same as Left | 1 | Scale when needing to increase Cron Job concurrency
worker-3 (Queue #3) | Executes tasks from Async API functions | ← Same as Left | 1 | Scale when needing to increase Async API concurrency
worker-4 (Queue #4) | (Reserved) | (Reserved) | 0 | Does not need scaling
worker-5 (Queue #5) | Debug code execution (i.e., directly running functions in the Web interface) | ← Same as Left | 1 | Scale when needing to support more users developing scripts
worker-6 (Queue #6) | Executes tasks from Connector subscription message processing | ← Same as Left | 1 | Scale when needing to increase Connector subscription message processing concurrency
worker-7 (Queue #7) | (Reserved) | Executes Data Platform system business tasks (e.g., Data Platform backend admin login, updating various caches, releasing message aggregation pools) | 2 | Scale when having a large number of monitors
worker-8 (Queue #8) | (Reserved) | Executes Data Platform threshold detection and other general monitor and metric generation tasks | 5 | Scale when having a large number of general monitors
worker-9 (Queue #9) | (Reserved) | Executes Data Platform advanced detection and intelligent monitor tasks | 3 | Scale when having a large number of advanced detection and intelligent monitors
worker-10 (Queue #10) | (No such service) | Executes Data Platform user-reported event tasks | 1 | Scale when having a large volume of user-reported events
worker-11 (Queue #11) | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when having a large volume of message sending
worker-12 (Queue #12) | (No such service) | (Reserved) | 0 | Does not need scaling
worker-13 (Queue #13) | (No such service) | (Reserved) | 0 | Does not need scaling
worker-14 (Queue #14) | (No such service) | Executes AI-related processing that requires an immediate user response (e.g., "Auto Pipeline Writing") | 2 | Scale when needing to support more users writing Pipelines
worker-15 (Queue #15) | (No such service) | Executes AI-related processing that does not require an immediate user response (e.g., "Alert Compression and Merging") | 2 | Scale when having a large number of monitors using AI alert aggregation
beat | Trigger for Cron Jobs | ← Same as Left | 1 | Do not scale; ensure a global single instance
mysql | Database | (No such service) | - | Does not need scaling; choose self-built or cloud services for higher demands
redis | Cache / function execution task queue | (No such service) | - | Does not need scaling; choose self-built or cloud services for higher demands
Service / Queue | Responsibility (Standalone Deployment) | Responsibility (Data Platform Attached) | Scaling Recommendation
---|---|---|---
server | Web service, providing: 1. Web interface 2. API interface 3. Maintains subscribers | ← Same as Left | Generally does not need scaling
server-inner | (No such service) | Web service, specifically for internal cluster API calls | Generally does not need scaling
worker-0 (Queue #0) | System Worker Unit; does not directly participate in user code processing | ← Same as Left | Generally does not need scaling
worker-1 (Queue #1) | Executes tasks from Sync API functions | ← Same as Left | Scale when needing to increase Sync API concurrency
worker-2 (Queue #2) | Executes tasks from Cron Jobs | ← Same as Left | Scale when needing to increase Cron Job concurrency
worker-3 (Queue #3) | Executes tasks from Async API functions | ← Same as Left | Scale when needing to increase Async API concurrency
worker-4 (Queue #4) | (Reserved) | (Reserved) | Does not need scaling
worker-5 (Queue #5) | Debug code execution (i.e., directly running functions in the Web interface) | ← Same as Left | Scale when needing to support more users developing scripts
worker-6 (Queue #6) | Executes tasks from Connector subscription message processing | ← Same as Left | Scale when needing to increase Connector subscription message processing concurrency
worker-7 (Queue #7) | (Reserved) | Executes Data Platform system business and message sending tasks (e.g., Data Platform backend admin login, updating various caches, releasing message aggregation pools, Message Desk message sending) | Scale when having a large volume of message sending
worker-8 (Queue #8) | (Reserved) | Executes Data Platform threshold detection and other general monitor tasks | Scale when having a large number of general monitors
worker-9 (Queue #9) | (Reserved) | Executes Data Platform advanced detection and intelligent monitor tasks | Scale when having a large number of advanced detection and intelligent monitors
beat | Trigger for Cron Jobs | ← Same as Left | Do not scale; ensure a global single instance
mysql | Database | (No such service) | Does not need scaling; choose self-built or cloud services for higher demands
redis | Cache / function execution task queue | (No such service) | Does not need scaling; choose self-built or cloud services for higher demands
Service | Responsibility | Scaling Recommendation
---|---|---
server | Web service, providing: 1. Web interface 2. API interface 3. Maintains subscribers | Generally does not need scaling
worker-0 (Queue #0) | System Worker Unit; does not directly participate in user code processing | Generally does not need scaling
worker-1-6 (Queues #1, #2, #3, #4, #5, #6) | By default, responsible for synchronous function call processing, such as: 1. Sync API 2. Subscription message processing | Scale when needing to increase Sync API and subscription message processing concurrency
worker-7 (Queue #7) | By default, responsible for debug code processing (i.e., directly running functions in the Web interface) | Scale when needing to support more users developing scripts
worker-8-9 (Queues #8, #9) | By default, responsible for asynchronous function call processing, such as: 1. Async API 2. Cron Jobs | Scale when needing to increase Cron Job and Async API processing concurrency
beat | Trigger for Cron Jobs | Do not scale; ensure a global single instance
mysql | Database | Does not need scaling; choose self-built or cloud services for higher demands
redis | Cache / function execution task queue | Does not need scaling; choose self-built or cloud services for higher demands
Example: When needing to enhance Cron Job processing capability...
As mentioned above, Cron Jobs are located in Queue #8, and Queue #8 corresponds to the service worker-8, so scaling the service worker-8 is sufficient.
Estimating Scaling Requirements
Taking the common worker-8 as an example:

In the Data Platform attached version, worker-8 is mainly responsible for executing monitor tasks. Assuming a detection task takes T milliseconds, one minute can execute 60 × 1,000 ÷ T detections. By default, each worker-8 Pod runs 5 worker processes.

Thus, the detection capability of a single worker-8 Pod is 5 × (60 × 1,000 ÷ T) monitors.
Formula

```
A = 5 × (60 × 1,000 ÷ T)
```

A: Detection capability of a single Pod (detections per minute)
T: Detection task execution time (milliseconds)
Based on different execution times for each monitor, the following table can be listed:
Single Detection Time (ms) | Single Pod Detection Capability (per minute) | Compared to Baseline
---|---|---
300 | 1,000 | 167% |
500 | 600 | Baseline |
800 | 375 | 63% |
1,000 | 300 | 50% |
2,000 | 150 | 25% |
3,000 | 100 | 17% |
Conversely, assuming the total number of monitors is M, the required number of Pods can be calculated as M ÷ (5 × (60 × 1,000 ÷ T)), rounded up.
Formula

```
P = M ÷ (5 × (60 × 1,000 ÷ T))
```

P: Required number of Pods (rounded up)
M: Number of monitors
T: Detection task execution time (milliseconds)
Based on different numbers of monitors and execution times, the following tables can be listed:
Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline
---|---|---|---
1,000 | 300 | 1 | 50% |
1,000 | 500 | 2 | Baseline |
1,000 | 800 | 3 | 150% |
1,000 | 1,000 | 4 | 200% |
1,000 | 2,000 | 7 | 350% |
1,000 | 3,000 | 10 | 500% |
Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline
---|---|---|---
5,000 | 300 | 5 | 56% |
5,000 | 500 | 9 | Baseline |
5,000 | 800 | 14 | 156% |
5,000 | 1,000 | 17 | 189% |
5,000 | 2,000 | 34 | 378% |
5,000 | 3,000 | 50 | 556% |
Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline
---|---|---|---
10,000 | 300 | 10 | 59% |
10,000 | 500 | 17 | Baseline |
10,000 | 800 | 27 | 159% |
10,000 | 1,000 | 34 | 200% |
10,000 | 2,000 | 67 | 394% |
10,000 | 3,000 | 100 | 588% |
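The estimates in the tables above can be reproduced with a short script. This is a sketch: the constant of 5 worker processes per Pod comes from the default noted above, and Pod counts are rounded up as in the tables:

```python
import math

PROCESSES_PER_POD = 5  # default number of worker processes per worker-8 Pod

def pod_capacity(task_ms):
    """Detections a single Pod can execute per minute: 5 × (60 × 1,000 ÷ T)."""
    return PROCESSES_PER_POD * (60 * 1000 / task_ms)

def required_pods(monitors, task_ms):
    """Pods needed to detect every monitor once per minute, rounded up."""
    return math.ceil(monitors / pod_capacity(task_ms))

print(pod_capacity(500))         # 600.0 detections per minute
print(required_pods(5000, 800))  # 14 Pods
```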
Operation Method
For standalone deployments of DataFlux Func, scaling can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml) and increasing deploy.replicas for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.replicas option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas
To improve the processing capability of worker-8, the key part to modify is as follows:
Example is only a partial excerpt
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
docker-stack.yaml key modification part (a minimal sketch; the replica count shown is illustrative):

```yaml
services:
  worker-8:
    # ... keep the existing image / command configuration ...
    deploy:
      replicas: 2  # illustrative: raise the replica count as needed
```
3. Resource Limits
Resource limits should be adjusted reasonably based on actual business
Adjust resource limits reasonably based on actual business needs.
Blindly limiting resources may lead to longer task execution times or insufficient memory to complete code execution.
Operation Method
For standalone deployments of DataFlux Func, resource limits can be implemented by modifying the configuration ({installation directory}/docker-stack.yaml) and adding deploy.resources for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.resources option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources
By default, each worker-N replica can occupy up to 5 CPU cores (i.e., each Worker Unit runs 5 worker processes).

To limit the resources occupied by worker-8, the key part to modify is as follows:
Example is only a partial excerpt
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
docker-stack.yaml key modification part (a minimal sketch; the limit values are illustrative):

```yaml
services:
  worker-8:
    # ... keep the existing image / command configuration ...
    deploy:
      resources:
        limits:
          cpus: '5'    # illustrative: matches the default of 5 worker processes
          memory: 4G   # illustrative memory cap
```
4. Splitting Worker Units
All Worker Units are already split in newer versions
In standalone deployments of Func version 3.2.0 and later, all non-reserved Worker Units are split by default. Users can enable reserved queues as needed.
In Data Platform attached Func version 1.77.145 and later, all Worker Units are split by default, and users no longer need to split them manually.
In special cases, Worker Units that are merged by default (e.g., worker-1-6) can be split to achieve finer-grained task scheduling, enabling scaling and resource limits for the Worker Units responsible for specific queues.

For example, if the business requires higher performance for subscription processing in DataFlux Func, and subscription message processing should not interfere with Sync API processing, worker-1-6 can be split into worker-1-5 and worker-6.
Operation Method
For standalone deployments of DataFlux Func, splitting Worker Units can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml): add or modify the corresponding services and change the queue numbers specified in command.

The queues a Worker Unit listens to are specified by the parameters after ./run-worker-by-queue.sh. The service name itself is mainly a label; it is recommended to keep it consistent with the queues actually listened to, to avoid confusion.
Example is only a partial excerpt
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
docker-stack.yaml key modification part (a minimal sketch; the exact queue-number argument format is an assumption based on the description above):

```yaml
services:
  worker-1-5:
    # ... same configuration as the original worker-1-6 service ...
    command: ./run-worker-by-queue.sh 1 2 3 4 5
  worker-6:
    # ... same configuration as the original worker-1-6 service ...
    command: ./run-worker-by-queue.sh 6
```