Architecture, Scaling, and Resource Limitations

This article introduces the overall architecture of DataFlux Func, how to scale it to increase processing capacity, and how to limit the resources it uses.

1. Architecture

Internally, the system follows a typical "Producer -> Consumer" model: every execution of a Python function goes through "Generate Task -> Enqueue -> Dequeue -> Execute -> Return Result".

Every Python function call is first wrapped into a "task" and placed into its corresponding "work queue" (numbered from #0); the corresponding "work unit" (numbered from worker-0) then fetches the task from the queue and executes it.

flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER

    FUNC_SERVER --Enqueue function execution task--> REDIS_QUEUE_N

    REDIS_QUEUE_N --Dequeue function execution task--> FUNC_WORKER_N

    FUNC_BEAT --"Enqueue function execution tasks
    (Scheduled Tasks)"--> REDIS_QUEUE_N

1.1 Services and Their Purposes

DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:

| Service | Purpose |
| --- | --- |
| server | Web service, providing: 1. Web Interface; 2. API Interface; 3. Subscription Maintainer |
| worker-{Queue Number} | Work unit, executes user scripts, including: 1. Synchronous API (Old Version: Authorized Link); 2. Asynchronous API (Old Version: Batch Processing); 3. Scheduled Tasks (Old Version: Automatic Trigger Configuration). Also handles some system-level background tasks. See the queue description below. |
| beat | Trigger for scheduled tasks |
| mysql | Database |
| redis | Cache / Function Execution Task Queue |

1.2 Work Unit and Queue Listening Relationship

For the worker-{Queue Number} (Work Unit) services, each worker service listens only to certain queues:

Queues and work units do not need to be one-to-one

Queues and work units are not necessarily one-to-one. For example, work unit worker-0 is not limited to listening to queue #0; each work unit can listen to any one or more queues.

Moreover, the same queue can be listened to by multiple work units simultaneously, or by no work unit at all (not recommended).

Independent deployment Func and TrueWatch attached Func have different queues

Since most independently deployed Funcs carry light workloads, an independently deployed Func runs fewer work units than there are queues, to avoid unnecessary resource consumption.

Conversely, because the TrueWatch attached Func handles monitors, message sending (Message Desk), and other heavy workloads, its work units and queues map one-to-one, and it has more work units and queues than an independently deployed Func.

Different versions of DataFlux Func use different queue layouts; the tables below are ordered from newest to oldest.

Current versions:

| Work Unit | Queue (Independent Deployment) | Queue (TrueWatch Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |

Earlier versions:

| Work Unit | Queue (Independent Deployment) | Queue (TrueWatch Attached) |
| --- | --- | --- |
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |

Legacy versions (merged work units):

| Work Unit | Queue |
| --- | --- |
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |
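
In practice, this listening relationship is expressed through each work unit's startup command. As a rough sketch, assuming the same run-worker-by-queue.sh mechanism described in Section 4 (image and other service settings are omitted, so treat this as an illustration rather than the shipped configuration), the independent deployment's worker-0, which listens to queues #0, #4, #7, #8, and #9, could be declared in docker-stack.yaml as follows:

docker-stack.yaml (illustrative excerpt only)
services:
  worker-0:
    # Listen to queues #0, #4, #7, #8 and #9 (independent deployment layout)
    command: ./run-worker-by-queue.sh 0 4 7 8 9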

2. Services / Queues and Responsibilities with Scaling Suggestions

Scaling requires additional hardware investment

Scaling places higher performance demands on the underlying infrastructure, including but not limited to the host server itself, the database service, Redis, and so on.

Generally, scaling DataFlux Func simply means increasing the number of replicas of the corresponding services. Users should therefore first understand their actual business workload so they can scale the appropriate services.

The complete list of services and queues, their responsibilities, and scaling suggestions is as follows:

Current versions:

| Service / Queue | Responsibility (Independent Deployment) | Responsibility (TrueWatch Attached) | Default Pod Count (TrueWatch Attached) | Scaling Suggestions |
| --- | --- | --- | --- | --- |
| server | Web service, providing: 1. Web Interface; 2. API Interface; 3. Subscription Maintainer | ← Same as left | 1 | Generally no need for scaling |
| server-inner | (No Such Service) | Web service, specifically for internal cluster API calls | 1 | Generally no need for scaling |
| worker-0 (Queue #0) | System work unit, does not directly handle user code | ← Same as left | 2 | Generally no need for scaling |
| worker-1 (Queue #1) | Executes functions from the synchronous API (Old Version: Authorized Link) | ← Same as left | 1 | Scale to increase the concurrency of the synchronous API (Old Version: Authorized Link) |
| worker-2 (Queue #2) | Executes functions from scheduled tasks (Old Version: Automatic Trigger Configuration) | ← Same as left | 1 | Scale to increase the concurrency of scheduled tasks (Old Version: Automatic Trigger Configuration) |
| worker-3 (Queue #3) | Executes functions from the asynchronous API (Old Version: Batch Processing) | ← Same as left | 1 | Scale to increase the concurrency of the asynchronous API (Old Version: Batch Processing) |
| worker-4 (Queue #4) | (Reserved) | (Reserved) | 0 | No need for scaling |
| worker-5 (Queue #5) | Debugging code execution, i.e. running functions directly on the Web Interface | ← Same as left | 1 | Scale to support more users developing scripts simultaneously |
| worker-6 (Queue #6) | Executes functions from connector subscription message processing | ← Same as left | 1 | Scale to increase the concurrency of connector subscription message processing |
| worker-7 (Queue #7) | (Reserved) | Executes TrueWatch system business functions, e.g. TrueWatch backend admin login, updating various caches, releasing message aggregation pools | 2 | Scale when there are more monitoring agents |
| worker-8 (Queue #8) | (Reserved) | Executes threshold detection and other common monitor functions | 5 | Scale when there are more common monitors |
| worker-9 (Queue #9) | (Reserved) | Executes advanced detection and intelligent monitoring functions | 3 | Scale when there are more advanced detection and intelligent monitors |
| worker-10 (Queue #10) | (No Such Service) | Executes user-reported event handling functions in TrueWatch | 1 | Scale when the volume of user-reported events is large |
| worker-11 (Queue #11) | (No Such Service) | Executes Message Desk message sending tasks in TrueWatch | 3 | Scale when the message sending volume is large |
| worker-12 (Queue #12) | (No Such Service) | (Reserved) | 0 | No need for scaling |
| worker-13 (Queue #13) | (No Such Service) | (Reserved) | 0 | No need for scaling |
| worker-14 (Queue #14) | (No Such Service) | Executes immediate-response AI processing in TrueWatch, e.g. invoking "automatic Pipeline writing" | 2 | Scale to support more users writing Pipelines simultaneously |
| worker-15 (Queue #15) | (No Such Service) | Executes non-immediate-response AI processing in TrueWatch, e.g. invoking "alarm compression merging" | 2 | Scale when using AI-aggregated alarms for more monitors |
| beat | Trigger for scheduled tasks | ← Same as left | 1 | Must not be scaled; keep a single global replica |
| mysql | Database | (No Such Service) | - | No need for scaling; choose self-hosted or cloud services for higher demands |
| redis | Cache / Function Execution Task Queue | (No Such Service) | - | No need for scaling; choose self-hosted or cloud services for higher demands |

Earlier versions:

| Service / Queue | Responsibility (Independent Deployment) | Responsibility (TrueWatch Attached) | Scaling Suggestions |
| --- | --- | --- | --- |
| server | Web service, providing: 1. Web Interface; 2. API Interface; 3. Subscription Maintainer | ← Same as left | Generally no need for scaling |
| server-inner | (No Such Service) | Web service, specifically for internal cluster API calls | Generally no need for scaling |
| worker-0 (Queue #0) | System work unit, does not directly handle user code | ← Same as left | Generally no need for scaling |
| worker-1 (Queue #1) | Executes functions from the synchronous API (Old Version: Authorized Link) | ← Same as left | Scale to increase the concurrency of the synchronous API (Old Version: Authorized Link) |
| worker-2 (Queue #2) | Executes functions from scheduled tasks (Old Version: Automatic Trigger Configuration) | ← Same as left | Scale to increase the concurrency of scheduled tasks (Old Version: Automatic Trigger Configuration) |
| worker-3 (Queue #3) | Executes functions from the asynchronous API (Old Version: Batch Processing) | ← Same as left | Scale to increase the concurrency of the asynchronous API (Old Version: Batch Processing) |
| worker-4 (Queue #4) | (Reserved) | (Reserved) | No need for scaling |
| worker-5 (Queue #5) | Debugging code execution, i.e. running functions directly on the Web Interface | ← Same as left | Scale to support more users developing scripts simultaneously |
| worker-6 (Queue #6) | Executes functions from connector subscription message processing | ← Same as left | Scale to increase the concurrency of connector subscription message processing |
| worker-7 (Queue #7) | (Reserved) | Executes TrueWatch system business and message sending functions, e.g. TrueWatch backend admin login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the message sending volume is large |
| worker-8 (Queue #8) | (Reserved) | Executes threshold detection and other common monitor functions | Scale when there are more common monitors |
| worker-9 (Queue #9) | (Reserved) | Executes advanced detection and intelligent monitoring functions | Scale when there are more advanced detection and intelligent monitors |
| beat | Trigger for scheduled tasks | ← Same as left | Must not be scaled; keep a single global replica |
| mysql | Database | (No Such Service) | No need for scaling; choose self-hosted or cloud services for higher demands |
| redis | Cache / Function Execution Task Queue | (No Such Service) | No need for scaling; choose self-hosted or cloud services for higher demands |

Legacy versions (merged work units):

| Service | Responsibility | Scaling Suggestions |
| --- | --- | --- |
| server | Web service, providing: 1. Web Interface; 2. API Interface; 3. Subscription Maintainer | Generally no need for scaling |
| worker-0 (Queue #0) | System work unit, does not directly handle user code | Generally no need for scaling |
| worker-1-6 (Queues #1, #2, #3, #4, #5, #6) | By default handles synchronous function calls, such as: 1. Authorized Link handling; 2. Subscription message handling | Scale to increase the concurrency of authorized links and subscription messages |
| worker-7 (Queue #7) | By default handles debugging code (i.e. running functions directly on the Web Interface) | Scale to support more users developing scripts simultaneously |
| worker-8-9 (Queues #8, #9) | By default handles asynchronous function calls, such as: 1. Automatic Trigger handling; 2. Batch Processing | Scale to increase the concurrency of automatic triggers and batch processing |
| beat | Trigger for scheduled tasks | Must not be scaled; keep a single global replica |
| mysql | Database | No need for scaling; choose self-hosted or cloud services for higher demands |
| redis | Cache / Function Execution Task Queue | No need for scaling; choose self-hosted or cloud services for higher demands |

Example: enhancing the processing capability of scheduled tasks

From the tables above, scheduled tasks are handled through Queue #2, and Queue #2 corresponds to the worker-2 service, so scaling the worker-2 service is sufficient. (In legacy layouts with merged work units, automatic triggers are handled through Queues #8 and #9, i.e. the worker-8-9 service.)
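
A minimal illustrative docker-stack.yaml excerpt for this case, mirroring the worker-8 example shown later under "Operation Method" (excerpt only; the rest of the service definition is assumed to stay unchanged):

docker-stack.yaml (illustrative excerpt only)
services:
  worker-2:
    deploy:
      # Start 2 replicas to process queue #2 (scheduled tasks) simultaneously
      replicas: 2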

Estimating Scaling Capacity

Taking the commonly scaled worker-8 as an example:

In the TrueWatch attached version, worker-8 mainly executes monitor detection tasks. Assuming a single detection task takes T milliseconds, one work process can run 60 × 1,000 ÷ T detections per minute. By default, each worker-8 Pod starts 5 work processes.

The detection capability of a single worker-8 Pod is therefore 5 × (60 × 1,000 ÷ T) monitors (assuming each monitor runs one detection per minute).

Formula

A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T

A: Detection Capability (monitors one Pod can check per minute)

T: Detection Task Execution Time (milliseconds)

Based on the execution time of a single detection, the per-Pod detection capability works out as follows:

| Single Detection Time (ms) | Detection Capability per Pod | Compared to Baseline |
| --- | --- | --- |
| 300 | 1,000 | 167% |
| 500 | 600 | Baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |

Conversely, given a fixed total number of monitors M, the required number of Pods can be calculated as M ÷ (5 × (60 × 1,000 ÷ T)).

Formula

P = M ÷ (300,000 ÷ T)
  = M × T ÷ 300,000

P: Required Number of Pods

M: Monitor Count

T: Detection Task Execution Time (milliseconds)

Based on the number of monitors and the execution time of a single detection, the required Pod count (rounded up) works out as follows:

| Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
| --- | --- | --- | --- |
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | Baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |

| Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
| --- | --- | --- | --- |
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | Baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |

| Monitor Count | Single Detection Time (ms) | Required Pod Count | Compared to Baseline |
| --- | --- | --- | --- |
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | Baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |
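
For example, substituting 5,000 monitors and a single detection time of 800 ms (one of the rows above) into the formula gives:

P = 5,000 × 800 ÷ 300,000
  ≈ 13.4, rounded up to 14 Pods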

Operation Method

For a single-machine DataFlux Func deployment, scaling is done by editing the configuration ({Installation Directory}/docker-stack.yaml) and increasing deploy.replicas for the corresponding service.

Please refer to the official documentation

For complete information about the deploy.replicas option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

To enhance the processing capability of worker-8, the specific modification part is as follows:

Example is excerpt only

The example shows only the key modification parts; please ensure the configuration is complete during actual operations.

docker-stack.yaml Key Modification Parts
services:
  worker-8:
    deploy:
      # Start 2 replicas (work units) to process queue #8 simultaneously
      replicas: 2
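
After modifying docker-stack.yaml, re-deploy the stack so the change takes effect; for a Docker Swarm based single-machine deployment this is typically done with docker stack deploy -c docker-stack.yaml followed by your stack name (the exact stack name depends on your installation). The same applies to the other docker-stack.yaml modifications described below.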

3. Resource Limitation

Adjust resource limits according to actual business needs

Please adjust resource limits to match your actual business needs.

Arbitrary limits may cause tasks to take longer to execute, or leave insufficient memory for code to finish running.

Operation Method

For a single-machine DataFlux Func deployment, resources are limited by editing the configuration ({Installation Directory}/docker-stack.yaml) and adding deploy.resources for the corresponding service.

Please refer to the official documentation

For complete information about the deploy.resources option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each worker-N replica can occupy up to 5 CPU cores (i.e., there are 5 work processes in each work unit).

To limit the resources used by worker-8, the specific modification part is as follows:

Example is excerpt only

The example shows only the key modification parts; please ensure the configuration is complete during actual operations.

docker-stack.yaml Key Modification Parts
services:
  worker-8:
    deploy:
      resources:
        limits:
          cpus  : '2.50' # Limit CPU usage to a maximum of 2.5 cores
          memory: 4G     # Limit memory usage to a maximum of 4 GB

4. Splitting Work Units

In newer versions, all work units are already split

In independently deployed Func 3.2.0 and later, all non-reserved work units are split by default, and users can enable reserved queues as needed.

In TrueWatch attached Func 1.77.145 and later, all work units are split by default, and no manual splitting is required.

In special cases, the default merged work units (e.g., worker-1-6) can be split for finer-grained task scheduling, so that the work units responsible for specific queues can be scaled or resource-limited individually.

Suppose, for example, that the business requires higher performance for subscription processing and that subscription message handling should not interfere with synchronous API (Old Version: Authorized Link) handling; in that case, worker-1-6 can be split into worker-1-5 and worker-6.

Operation Method

For a single-machine DataFlux Func deployment, work units can be split by editing the configuration ({Installation Directory}/docker-stack.yaml), adding or modifying the corresponding services, and specifying the queue numbers in each service's command.

The queues a work unit listens to are specified as arguments to ./run-worker-by-queue.sh. The service name itself is essentially a label; it is recommended to keep it consistent with the queues actually listened to, to avoid confusion.

Example is excerpt only

The example shows only the key modification parts; please ensure the configuration is complete during actual operations.

docker-stack.yaml Key Modification Parts
services:
  # Delete the original "worker-1-6" and replace it with the following content

  worker-1-5:
    # Specify the work unit to handle queues 1 ~ 5
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Specify the work unit to handle queue 6
    command: ./run-worker-by-queue.sh 6