Architecture, Scaling, and Resource Limitations
This article primarily introduces the overall architecture of DataFlux Func and how to scale it to improve processing capabilities.
1. Architecture
Internally, the system follows a typical "producer -> consumer" model: every execution of a Python function goes through "generate task -> enqueue -> dequeue -> execute -> return result".
Any Python function is first wrapped into a "task" that enters its corresponding "work queue" (numbered from `#0`), and is then fetched from the queue and executed by the corresponding "work unit" (named from `worker-0`).
```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER
    FUNC_SERVER --Enqueue function execution task--> REDIS_QUEUE_N
    REDIS_QUEUE_N --Dequeue function execution task--> FUNC_WORKER_N
    FUNC_BEAT --"Enqueue function execution tasks (Scheduled Tasks)"--> REDIS_QUEUE_N
```
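The task lifecycle above can be sketched with an in-process queue standing in for Redis (a conceptual illustration only; the real system uses Redis queues and separate Worker processes):

```python
import queue

# Stand-in for a Redis work queue (conceptual only).
work_queue = queue.Queue()

def enqueue_task(func, *args, **kwargs):
    """Producer side: wrap a Python function call into a task and enqueue it."""
    work_queue.put({"func": func, "args": args, "kwargs": kwargs})

def worker_step():
    """Consumer side: dequeue one task, execute it, and return the result."""
    task = work_queue.get()
    return task["func"](*task["args"], **task["kwargs"])

# The Server (or Beat) generates and enqueues a task...
enqueue_task(lambda x, y: x + y, 1, 2)

# ...and a work unit later fetches and executes it.
result = worker_step()
print(result)  # 3
```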
1.1 Services and Their Purposes
DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:
Service | Purpose |
---|---|
server | Web service, providing the following functions: 1. Web Interface 2. API Interface 3. Subscription Maintainer |
worker-{Queue Number} | Work unit that executes user scripts, including: 1. Synchronous API (Old Version: Authorized Link) 2. Asynchronous API (Old Version: Batch Processing) 3. Scheduled Tasks (Old Version: Automatic Trigger Configuration). Also handles some system-level background tasks (see the queue descriptions below) |
beat | Trigger for scheduled tasks |
mysql | Database |
redis | Cache / Function Execution Task Queue |
1.2 Work Unit and Queue Listening Relationship
For the `worker-{Queue Number}` services (work units), each Worker service listens only to certain queues:
Queues and work units do not need to be one-to-one

Queues and work units are not necessarily one-to-one. For example, work unit `worker-0` does not only listen to queue `#0`: each work unit can listen to any one or more queues. Moreover, the same queue can be listened to by multiple work units at once, or by none at all (not recommended).
Independently deployed Func and TrueWatch-attached Func use different queue layouts

Since most independently deployed Funcs carry light workloads, an independent deployment runs fewer work units than there are queues, to avoid unnecessary resource consumption.
Conversely, because the TrueWatch-attached Func handles heavy workloads such as monitors and the message sending module (Message Desk), its work units and queues are one-to-one, and it has more work units and queues than an independent deployment.
Work Unit | Queue (Independent Deployment) | Queue (TrueWatch Attached) |
---|---|---|
worker-0 | #0, #4, #7, #8, #9 | #0 |
worker-1 | #1 | #1 |
worker-2 | #2 | #2 |
worker-3 | #3 | #3 |
worker-4 | - | #4 |
worker-5 | #5 | #5 |
worker-6 | #6 | #6 |
worker-7 | - | #7 |
worker-8 | - | #8 |
worker-9 | - | #9 |
worker-10 | - | #10 |
worker-11 | - | #11 |
worker-12 | - | #12 |
worker-13 | - | #13 |
worker-14 | - | #14 |
worker-15 | - | #15 |
Work Unit | Queue (Independent Deployment) | Queue (TrueWatch Attached) |
---|---|---|
worker-0 | #0, #4, #7, #8, #9 | #0 |
worker-1 | #1 | #1 |
worker-2 | #2 | #2 |
worker-3 | #3 | #3 |
worker-4 | - | #4 |
worker-5 | #5 | #5 |
worker-6 | #6 | #6 |
worker-7 | - | #7 |
worker-8 | - | #8 |
worker-9 | - | #9 |
Work Unit | Queue |
---|---|
worker-0 | #0 |
worker-1-6 | #1, #2, #3, #4, #5, #6 |
worker-7 | #7 |
worker-8-9 | #8, #9 |
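The many-to-many relationship can be made concrete with a small sketch of the independent-deployment mapping from the first table above (an illustration only, not DataFlux Func's actual configuration format):

```python
# Worker -> queue mapping for an independent deployment
# (from the first table above; workers 4 and 7-9 have no dedicated service).
WORKER_QUEUES = {
    "worker-0": {0, 4, 7, 8, 9},
    "worker-1": {1},
    "worker-2": {2},
    "worker-3": {3},
    "worker-5": {5},
    "worker-6": {6},
}

def workers_for_queue(queue_number):
    """Return every work unit that listens to the given queue."""
    return sorted(w for w, qs in WORKER_QUEUES.items() if queue_number in qs)

# Queue #4 has no dedicated worker-4 service; worker-0 serves it instead.
print(workers_for_queue(4))  # ['worker-0']
print(workers_for_queue(1))  # ['worker-1']
```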
2. Services / Queues and Responsibilities with Scaling Suggestions
Scaling requires additional hardware investment

Scaling places higher performance demands on the underlying infrastructure, including but not limited to the host server itself, the database service, Redis, etc.
Generally speaking, scaling DataFlux Func just means increasing the replica count of the corresponding services. Users should therefore first understand their actual business load so they can scale the right services.
The complete list of services, queues, their responsibilities, and scaling suggestions are as follows:
Service / Queue | Responsibility (Independent Deployment) | Responsibility (TrueWatch Attached) | Default Pod Count (TrueWatch Attached) | Scaling Suggestions |
---|---|---|---|---|
server | Web service, providing the following functions: 1. Web Interface 2. API Interface 3. Subscription Maintainer | ← Same as Left | 1 | Generally no need for scaling |
server-inner | (No Such Service) | Web service, specifically for internal cluster API calls | 1 | Generally no need for scaling |
worker-0 (Queue #0) | System work unit, does not directly handle user code | ← Same as Left | 2 | Generally no need for scaling |
worker-1 (Queue #1) | Executes functions from the synchronous API (Old Version: Authorized Link) | ← Same as Left | 1 | Scale to increase concurrency of the synchronous API (Old Version: Authorized Link) |
worker-2 (Queue #2) | Executes functions from scheduled tasks (Old Version: Automatic Trigger Configuration) | ← Same as Left | 1 | Scale to increase concurrency of scheduled tasks (Old Version: Automatic Trigger Configuration) |
worker-3 (Queue #3) | Executes functions from the asynchronous API (Old Version: Batch Processing) | ← Same as Left | 1 | Scale to increase concurrency of the asynchronous API (Old Version: Batch Processing) |
worker-4 (Queue #4) | (Reserved) | (Reserved) | 0 | No need for scaling |
worker-5 (Queue #5) | Executes debug code, i.e., functions run directly from the Web Interface | ← Same as Left | 1 | Scale to support more users developing scripts simultaneously |
worker-6 (Queue #6) | Executes functions from connector subscription message processing | ← Same as Left | 1 | Scale to increase concurrency of connector subscription message processing |
worker-7 (Queue #7) | (Reserved) | Executes TrueWatch system business functions, e.g., TrueWatch backend admin login, updating various caches, releasing message aggregation pools | 2 | Scale when there are more monitoring agents |
worker-8 (Queue #8) | (Reserved) | Executes threshold detection and other common monitor-related functions | 5 | Scale when there are more common monitors |
worker-9 (Queue #9) | (Reserved) | Executes advanced detection and intelligent monitoring functions | 3 | Scale when there are more advanced detection and intelligent monitors |
worker-10 (Queue #10) | (No Such Service) | Executes user-reported event handling functions in TrueWatch | 1 | Scale when the volume of user-reported events is large |
worker-11 (Queue #11) | (No Such Service) | Executes Message Desk message sending tasks in TrueWatch | 3 | Scale when the message sending volume is large |
worker-12 (Queue #12) | (No Such Service) | (Reserved) | 0 | No need for scaling |
worker-13 (Queue #13) | (No Such Service) | (Reserved) | 0 | No need for scaling |
worker-14 (Queue #14) | (No Such Service) | Executes immediate-response AI-related processing in TrueWatch, e.g., "automatic Pipeline writing" | 2 | Scale to support more users writing Pipelines simultaneously |
worker-15 (Queue #15) | (No Such Service) | Executes non-immediate-response AI-related processing in TrueWatch, e.g., "alarm compression merging" | 2 | Scale when using AI-aggregated alarms for more monitors |
beat | Trigger for scheduled tasks | ← Same as Left | 1 | Must not scale; ensure a single global replica |
mysql | Database | (No Such Service) | - | No need for scaling; for higher demands use self-hosted or cloud services |
redis | Cache / Function Execution Task Queue | (No Such Service) | - | No need for scaling; for higher demands use self-hosted or cloud services |
Service / Queue | Responsibility (Independent Deployment) | Responsibility (TrueWatch Attached) | Scaling Suggestions |
---|---|---|---|
server | Web service, providing the following functions: 1. Web Interface 2. API Interface 3. Subscription Maintainer | ← Same as Left | Generally no need for scaling |
server-inner | (No Such Service) | Web service, specifically for internal cluster API calls | Generally no need for scaling |
worker-0 (Queue #0) | System work unit, does not directly handle user code | ← Same as Left | Generally no need for scaling |
worker-1 (Queue #1) | Executes functions from the synchronous API (Old Version: Authorized Link) | ← Same as Left | Scale to increase concurrency of the synchronous API (Old Version: Authorized Link) |
worker-2 (Queue #2) | Executes functions from scheduled tasks (Old Version: Automatic Trigger Configuration) | ← Same as Left | Scale to increase concurrency of scheduled tasks (Old Version: Automatic Trigger Configuration) |
worker-3 (Queue #3) | Executes functions from the asynchronous API (Old Version: Batch Processing) | ← Same as Left | Scale to increase concurrency of the asynchronous API (Old Version: Batch Processing) |
worker-4 (Queue #4) | (Reserved) | (Reserved) | No need for scaling |
worker-5 (Queue #5) | Executes debug code, i.e., functions run directly from the Web Interface | ← Same as Left | Scale to support more users developing scripts simultaneously |
worker-6 (Queue #6) | Executes functions from connector subscription message processing | ← Same as Left | Scale to increase concurrency of connector subscription message processing |
worker-7 (Queue #7) | (Reserved) | Executes TrueWatch system business and message sending functions, e.g., TrueWatch backend admin login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the message sending volume is large |
worker-8 (Queue #8) | (Reserved) | Executes threshold detection and other common monitor-related functions | Scale when there are more common monitors |
worker-9 (Queue #9) | (Reserved) | Executes advanced detection and intelligent monitoring functions | Scale when there are more advanced detection and intelligent monitors |
beat | Trigger for scheduled tasks | ← Same as Left | Must not scale; ensure a single global replica |
mysql | Database | (No Such Service) | No need for scaling; for higher demands use self-hosted or cloud services |
redis | Cache / Function Execution Task Queue | (No Such Service) | No need for scaling; for higher demands use self-hosted or cloud services |
Service | Responsibility | Scaling Suggestions |
---|---|---|
server | Web service, providing the following functions: 1. Web Interface 2. API Interface 3. Subscription Maintainer | Generally no need for scaling |
worker-0 (Queue #0) | System work unit, does not directly handle user code | Generally no need for scaling |
worker-1-6 (Queues #1-#6) | By default handles synchronous function calls, such as: 1. Authorized Link handling 2. Subscription message handling | Scale to increase concurrency of authorized links and subscription messages |
worker-7 (Queue #7) | By default handles debug code (i.e., functions run directly from the Web Interface) | Scale to support more users developing scripts simultaneously |
worker-8-9 (Queues #8, #9) | By default handles asynchronous function calls, such as: 1. Automatic trigger handling 2. Batch processing | Scale to increase concurrency of automatic triggers and batch processing |
beat | Trigger for scheduled tasks | Must not scale; ensure a single global replica |
mysql | Database | No need for scaling; for higher demands use self-hosted or cloud services |
redis | Cache / Function Execution Task Queue | No need for scaling; for higher demands use self-hosted or cloud services |
Example: enhancing the processing capability of scheduled tasks

From the table above, scheduled tasks are located in queue `#8`, and queue `#8` corresponds to the service `worker-8`, so scaling the service `worker-8` suffices.
Estimating Scaling Capacity

Taking the common `worker-8` as an example:

In the TrueWatch-attached version, `worker-8` mainly executes monitor tasks. Assuming a single detection task takes T milliseconds, it can execute 60 × 1,000 ÷ T detections in one minute. By default, each `worker-8` Pod starts 5 processes.

Thus, the detection capability of a single `worker-8` Pod is 5 × (60 × 1,000 ÷ T) monitors.
Formula

```
A = 5 × (60 × 1,000 ÷ T)
```

A: Detection Capability
T: Detection Task Execution Time (milliseconds)
Based on each monitor's execution time, the per-Pod detection capability is as follows:

Single Detection Time (ms) | Detection Capability per Pod (per minute) | Compared to Baseline |
---|---|---|
300 | 1,000 | 167% |
500 | 600 | Baseline |
800 | 375 | 63% |
1,000 | 300 | 50% |
2,000 | 150 | 25% |
3,000 | 100 | 17% |
Conversely, given a total of M monitors, the required number of Pods is M ÷ (5 × (60 × 1,000 ÷ T)), rounded up.
Formula

```
P = M ÷ (5 × (60 × 1,000 ÷ T))
```

P: Required Number of Pods (rounded up)
M: Monitor Count
T: Detection Task Execution Time (milliseconds)
Based on the number of monitors and each one's execution time, the required Pod counts are as follows:

Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
---|---|---|---|
1,000 | 300 | 1 | 50% |
1,000 | 500 | 2 | Baseline |
1,000 | 800 | 3 | 150% |
1,000 | 1,000 | 4 | 200% |
1,000 | 2,000 | 7 | 350% |
1,000 | 3,000 | 10 | 500% |
Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
---|---|---|---|
5,000 | 300 | 5 | 56% |
5,000 | 500 | 9 | Baseline |
5,000 | 800 | 14 | 156% |
5,000 | 1,000 | 17 | 189% |
5,000 | 2,000 | 34 | 378% |
5,000 | 3,000 | 50 | 556% |
Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
---|---|---|---|
10,000 | 300 | 10 | 59% |
10,000 | 500 | 17 | Baseline |
10,000 | 800 | 27 | 159% |
10,000 | 1,000 | 34 | 200% |
10,000 | 2,000 | 67 | 394% |
10,000 | 3,000 | 100 | 588% |
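The two formulas above can be checked with a short script that reproduces the table values (`PROCESSES_PER_POD = 5` is the default per-Pod process count stated earlier):

```python
import math

PROCESSES_PER_POD = 5  # default number of processes per worker-8 Pod

def detections_per_pod(t_ms):
    """Detection capability of one Pod per minute: A = 5 * (60 * 1000 / T)."""
    return PROCESSES_PER_POD * (60 * 1000 / t_ms)

def required_pods(monitor_count, t_ms):
    """Required Pod count, rounded up: P = M / (5 * (60 * 1000 / T))."""
    return math.ceil(monitor_count / detections_per_pod(t_ms))

print(detections_per_pod(500))     # 600.0 (the baseline row)
print(required_pods(1000, 800))    # 3
print(required_pods(10000, 3000))  # 100
```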
Operation Method

A single-machine DataFlux Func deployment can be scaled by modifying the configuration (`{Installation Directory}/docker-stack.yaml`) and increasing `deploy.replicas` for the corresponding service.

Please refer to the official documentation

For complete information about the `deploy.replicas` option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

To enhance the processing capability of `worker-8`, the key modification is as follows (the replica count shown is an example; choose a value that fits your workload):

Example is excerpt only

The example shows only the key parts to modify; please ensure the configuration remains complete in actual operation.

docker-stack.yaml Key Modification Parts:

```yaml
services:
  worker-8:
    deploy:
      # Example value: raise the replica count to scale out worker-8
      replicas: 2
```
3. Resource Limitation

Resource limits must be adjusted to actual business needs

Please adjust resource limits according to actual business needs. Arbitrary limits may result in longer task execution times or insufficient memory to complete code execution.

Operation Method

A single-machine DataFlux Func deployment can limit resources by modifying the configuration (`{Installation Directory}/docker-stack.yaml`) and adding `deploy.resources` for the corresponding service.

Please refer to the official documentation

For complete information about the `deploy.resources` option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each `worker-N` replica can occupy up to 5 CPU cores (i.e., each work unit runs 5 worker processes).

To limit the resources used by `worker-8`, the key modification is as follows (the limit values shown are illustrative; choose values that fit your workload):

Example is excerpt only

The example shows only the key parts to modify; please ensure the configuration remains complete in actual operation.

docker-stack.yaml Key Modification Parts:

```yaml
services:
  worker-8:
    deploy:
      resources:
        limits:
          # Illustrative values only
          cpus: '2'
          memory: 4G
```
4. Splitting Work Units

In newer versions, all work units are already split

In independently deployed Func 3.2.0 and later, all non-reserved work units are split by default, and users can enable reserved queues as needed.
In TrueWatch-attached Func 1.77.145 and later, all work units are split by default, so no manual splitting is needed.

In special cases, the default merged work units (e.g., `worker-1-6`) can be split to achieve finer-grained task scheduling, allowing the work units responsible for specific queues to be scaled and resource-limited individually.

Suppose that, according to business needs, DataFlux Func requires higher performance for subscription processing, and subscription message processing should not interfere with synchronous API (Old Version: Authorized Link) processing. Then `worker-1-6` can be split into `worker-1-5` and `worker-6`.

Operation Method

A single-machine DataFlux Func deployment can split work units by modifying the configuration (`{Installation Directory}/docker-stack.yaml`): add and modify the corresponding services, and specify the queue numbers in the `command`.

The queues that a work unit listens to are specified as parameters after `./run-worker-by-queue.sh`. The service name itself is mainly a label; it is recommended to keep it consistent with the queues actually listened to, to avoid confusion.

Example is excerpt only

The example shows only the key parts to modify; please ensure the configuration remains complete in actual operation. The argument format below follows the description above (queue numbers passed to `./run-worker-by-queue.sh`).

docker-stack.yaml Key Modification Parts:

```yaml
services:
  # Split the former worker-1-6 into two services
  worker-1-5:
    command: ./run-worker-by-queue.sh 1 2 3 4 5
  worker-6:
    command: ./run-worker-by-queue.sh 6
```