Deployment and Maintenance / Architecture, Scaling, and Resource Limits
This article introduces the overall architecture of DataFlux Func and how to scale it to improve processing capacity.
1. Architecture
Internally, the system follows a typical "producer -> consumer" model. Any execution of a Python function goes through the process of "task generation -> enqueue -> dequeue -> execution -> return result."
Every Python function call is first wrapped into a "task" and placed into its corresponding work queue (numbered starting from `#0`). A worker unit (numbered starting from `worker-0`) then dequeues and executes it.
```mermaid
flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER
    FUNC_SERVER --Function Execution Task Enqueue--> REDIS_QUEUE_N
    REDIS_QUEUE_N --Function Execution Task Dequeue--> FUNC_WORKER_N
    FUNC_BEAT --"Function Execution Task Enqueue<br>(Cron Job)"--> REDIS_QUEUE_N
```
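The lifecycle shown above can be sketched in pure Python. This is an illustrative simulation only: an in-memory `queue.Queue` stands in for the Redis work queue, and names such as `enqueue_task` and `run_worker_once` are hypothetical, not DataFlux Func's actual API.

```python
import json
import queue

# In-memory stand-in for one Redis work queue (e.g. queue "#1")
work_queue = queue.Queue()

def enqueue_task(func_id, kwargs):
    """Producer side: wrap a function call into a task and enqueue it."""
    task = {"func_id": func_id, "kwargs": kwargs}
    work_queue.put(json.dumps(task))  # Redis would store a string payload

# Registry of executable functions (user scripts in a real deployment)
FUNCS = {"demo.add": lambda a, b: a + b}

def run_worker_once():
    """Consumer side: dequeue one task, execute it, return the result."""
    task = json.loads(work_queue.get())
    return FUNCS[task["func_id"]](**task["kwargs"])

enqueue_task("demo.add", {"a": 1, "b": 2})
print(run_worker_once())  # -> 3
```

The real system differs in the transport (Redis instead of `queue.Queue`) and in returning results asynchronously, but the "wrap, enqueue, dequeue, execute" flow is the same.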
1.1 Services and Their Purposes
DataFlux Func includes multiple services, each with different responsibilities. The specific services are as follows:
| Service | Purpose |
|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers |
| worker-{queue number} | Worker unit, used to execute user scripts, including:<br>1. Func API<br>2. Sync / Async API (legacy "Auth Link" / "Batch")<br>3. Cron Job (legacy: automatic trigger configuration)<br>Also handles some system-level background tasks. See the queue descriptions below for details. |
| beat | Cron Job trigger |
| mysql | Database |
| redis | Cache / function execution task queue |
1.2 Worker Unit and Queue Listening Relationship
For the `worker-{queue number}` service (worker unit), each Worker service only listens to specific queues:

Queues and Worker Units Do Not Need to Be One-to-One

Queues and worker units do not have to map one-to-one. For example, the worker unit `worker-0` is not limited to listening to queue `#0`; each worker unit can listen to any one or more queues.

Additionally, the same queue can be listened to by multiple worker units simultaneously, or by none at all (not recommended).
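This many-to-many mapping can be illustrated with a small simulation. In-memory queues stand in for Redis, and `poll_worker` is a hypothetical name, not part of DataFlux Func:

```python
import queue

# One in-memory queue per numbered work queue (#0 .. #9)
queues = {n: queue.Queue() for n in range(10)}

def poll_worker(listen_queues):
    """Return the next task from the first non-empty queue this worker
    unit listens to, or None if all listened queues are empty."""
    for n in listen_queues:
        try:
            return queues[n].get_nowait()
        except queue.Empty:
            continue
    return None

# Standalone layout: worker-0 listens to queues #0, #4, #7, #8, #9
WORKER_0_QUEUES = [0, 4, 7, 8, 9]

queues[7].put("task-A")
print(poll_worker(WORKER_0_QUEUES))  # -> 'task-A' (dequeued from #7)
print(poll_worker(WORKER_0_QUEUES))  # -> None (all listened queues empty)
```

A queue not present in any worker's `listen_queues` list would simply never be drained, which is why leaving a queue unlistened is not recommended.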
Different Queues for Standalone Func and Data Platform Attached Func
Since most standalone Func deployments are relatively lightweight, standalone Func runs fewer worker units than queues to reduce unnecessary resource consumption.

Conversely, the Data Platform Attached Func handles heavy tasks such as monitors and the message sending module (Message Desk), so its worker units and queues correspond one-to-one, and it has more worker units and queues than standalone Func.
| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
|---|---|---|
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| worker-10 | - | #10 |
| worker-11 | - | #11 |
| worker-12 | - | #12 |
| worker-13 | - | #13 |
| worker-14 | - | #14 |
| worker-15 | - | #15 |
| Worker Unit | Queue (Standalone) | Queue (Data Platform Attached) |
|---|---|---|
| worker-0 | #0, #4, #7, #8, #9 | #0 |
| worker-1 | #1 | #1 |
| worker-2 | #2 | #2 |
| worker-3 | #3 | #3 |
| worker-4 | - | #4 |
| worker-5 | #5 | #5 |
| worker-6 | #6 | #6 |
| worker-7 | - | #7 |
| worker-8 | - | #8 |
| worker-9 | - | #9 |
| Worker Unit | Queue |
|---|---|
| worker-0 | #0 |
| worker-1-6 | #1, #2, #3, #4, #5, #6 |
| worker-7 | #7 |
| worker-8-9 | #8, #9 |
2. Services / Queues and Their Responsibilities and Scaling Recommendations
Scaling Requires More Hardware Investment
Scaling places higher demands on the underlying infrastructure, including but not limited to the host server itself, the database service, and Redis.
Generally, scaling DataFlux Func only requires increasing the number of replicas for the corresponding service. Therefore, users should first understand their actual business situation to scale accordingly.
The complete services, queues, their responsibilities, and scaling recommendations are as follows:
| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Default Pod Count (Data Platform Attached) | Scaling Recommendation |
|---|---|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | ← Same as left | 1 | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal API calls within the cluster | 1 | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | ← Same as left | 2 | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from:<br>1. Func API (sync execution)<br>2. Legacy Sync API<br>3. Legacy Auth Link | ← Same as left | 1 | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | 1 | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes tasks from:<br>1. Func API (async execution)<br>2. Legacy Async API<br>3. Legacy Batch | ← Same as left | 1 | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | 0 | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | 1 | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | 1 | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, etc. | 2 | Scale when the total number of monitors is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, metric generation, etc. | 5 | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | 3 | Scale when the number of advanced detection and intelligent monitors is large |
| worker-10<br>Queue #10 | (No such service) | Executes function tasks for receiving user-reported events on the Data Platform | 1 | Scale when the volume of user-reported events is large |
| worker-11<br>Queue #11 | (No such service) | Executes Message Desk message sending tasks | 3 | Scale when the volume of message sending is large |
| worker-12<br>Queue #12 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-13<br>Queue #13 | (No such service) | (Reserved) | 0 | Does not need scaling |
| worker-14<br>Queue #14 | (No such service) | Executes AI-related processing that requires an immediate user response, e.g., calling "Auto Pipeline Writing" | 2 | Scale when needing to support more users writing Pipelines simultaneously |
| worker-15<br>Queue #15 | (No such service) | Executes AI-related processing that does not require an immediate user response, e.g., "Alert Compression and Merging" processing | 2 | Scale when the number of monitors using AI alert aggregation is large |
| beat | Cron Job trigger | ← Same as left | 1 | Do not scale; ensure a global single instance |
| mysql | Database | (No such service) | - | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | - | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| Service / Queue | Responsibility (Standalone) | Responsibility (Data Platform Attached) | Scaling Recommendation |
|---|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | ← Same as left | Generally does not need scaling |
| server-inner | (No such service) | Web service, specifically for internal API calls within the cluster | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | ← Same as left | Generally does not need scaling |
| worker-1<br>Queue #1 | Executes tasks from the Sync API (legacy: Auth Link) | ← Same as left | Scale when needing to increase concurrency for the Sync API (legacy: Auth Link) |
| worker-2<br>Queue #2 | Executes Cron Job tasks (legacy: automatic trigger configuration) | ← Same as left | Scale when needing to increase concurrency for Cron Jobs (legacy: automatic trigger configuration) |
| worker-3<br>Queue #3 | Executes tasks from the Async API (legacy: Batch) | ← Same as left | Scale when needing to increase concurrency for the Async API (legacy: Batch) |
| worker-4<br>Queue #4 | (Reserved) | (Reserved) | Does not need scaling |
| worker-5<br>Queue #5 | Debug code execution, i.e., directly running functions in the Web interface | ← Same as left | Scale when needing to support more users developing scripts simultaneously |
| worker-6<br>Queue #6 | Executes Connector subscription message processing tasks | ← Same as left | Scale when needing to increase concurrency for Connector subscription message processing |
| worker-7<br>Queue #7 | (Reserved) | Executes function tasks for Data Platform system business and message sending, e.g., Data Platform backend administrator login, updating various caches, releasing message aggregation pools, Message Desk message sending | Scale when the volume of message sending is large |
| worker-8<br>Queue #8 | (Reserved) | Executes function tasks for Data Platform threshold detection, etc. | Scale when the number of ordinary monitors is large |
| worker-9<br>Queue #9 | (Reserved) | Executes function tasks for Data Platform advanced detection and intelligent monitoring | Scale when the number of advanced detection and intelligent monitors is large |
| beat | Cron Job trigger | ← Same as left | Do not scale; ensure a global single instance |
| mysql | Database | (No such service) | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | (No such service) | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| Service | Responsibility | Scaling Recommendation |
|---|---|---|
| server | Web service, providing the following functions:<br>1. Web interface<br>2. API interface<br>3. Maintaining subscribers | Generally does not need scaling |
| worker-0<br>Queue #0 | System worker unit, does not directly participate in user code processing | Generally does not need scaling |
| worker-1-6<br>Queue #1, #2, #3, #4, #5, #6 | By default, responsible for function synchronous call processing, such as:<br>1. Auth Link processing<br>2. Subscription message processing | Scale when needing to increase concurrency for Auth Link or subscription message processing |
| worker-7<br>Queue #7 | By default, responsible for debug code processing (i.e., directly running functions in the Web interface) | Scale when needing to support more users developing scripts simultaneously |
| worker-8-9<br>Queue #8, #9 | By default, responsible for function asynchronous call processing, such as:<br>1. Automatic trigger processing<br>2. Batch processing | Scale when needing to increase concurrency for automatic trigger or batch processing |
| beat | Cron Job trigger | Do not scale; ensure a global single instance |
| mysql | Database | Does not need scaling; choose a self-hosted or cloud service for higher demands |
| redis | Cache / function execution task queue | Does not need scaling; choose a self-hosted or cloud service for higher demands |
Example: When needing to enhance Cron Job processing capability...

In the legacy layout above, Cron Jobs (automatic triggers) are located in queue `#8`, which corresponds to the `worker-8` service. Therefore, scaling the `worker-8` service is sufficient.
Estimating Scaling Amount
Taking the common `worker-8` as an example:

`worker-8` in the Data Platform Attached version is mainly responsible for executing monitor tasks. Assuming a single detection task takes `T` milliseconds, then 60 × 1,000 ÷ T detections can be executed per minute. By default, each `worker-8` Pod starts 5 processes.

That is, the detection capability of a single `worker-8` Pod is 5 × (60 × 1,000 ÷ T) monitors per minute.
Formula

```text
A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T
```

- A: Detection capability (detections per minute)
- T: Detection task execution time (milliseconds)
Based on the different execution times of monitors, the following table can be listed:
| Single Detection Time (ms) | Single Pod Detection Capability (per minute) | Compared to Baseline |
|---|---|---|
| 300 | 1,000 | 167% |
| 500 | 600 | Baseline |
| 800 | 375 | 63% |
| 1,000 | 300 | 50% |
| 2,000 | 150 | 25% |
| 3,000 | 100 | 17% |
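The capability column above follows directly from the formula and can be reproduced with a short calculation (`pod_capacity` is an illustrative helper, not part of DataFlux Func):

```python
PROCESSES_PER_POD = 5  # each worker-8 Pod starts 5 processes by default

def pod_capacity(t_ms):
    """Detections a single Pod can run per minute: A = 5 × (60 × 1,000 ÷ T)."""
    return PROCESSES_PER_POD * 60 * 1000 // t_ms

for t_ms in (300, 500, 800, 1000, 2000, 3000):
    print(t_ms, pod_capacity(t_ms))  # e.g. 500 ms -> 600 detections/minute
```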
Conversely, assuming the total number of monitors is `M`, the required number of Pods can be calculated as M ÷ (5 × (60 × 1,000 ÷ T)).
Formula

```text
P = M ÷ (5 × (60 × 1,000 ÷ T))
  = M × T ÷ 300,000
```

- P: Required number of Pods (rounded up)
- M: Number of monitors
- T: Detection task execution time (milliseconds)
Based on the number of monitors and their execution times, the following table can be listed:
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 1,000 | 300 | 1 | 50% |
| 1,000 | 500 | 2 | Baseline |
| 1,000 | 800 | 3 | 150% |
| 1,000 | 1,000 | 4 | 200% |
| 1,000 | 2,000 | 7 | 350% |
| 1,000 | 3,000 | 10 | 500% |
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 5,000 | 300 | 5 | 56% |
| 5,000 | 500 | 9 | Baseline |
| 5,000 | 800 | 14 | 156% |
| 5,000 | 1,000 | 17 | 189% |
| 5,000 | 2,000 | 34 | 378% |
| 5,000 | 3,000 | 50 | 556% |
| Number of Monitors | Single Detection Time (ms) | Required Number of Pods | Compared to Baseline |
|---|---|---|---|
| 10,000 | 300 | 10 | 59% |
| 10,000 | 500 | 17 | Baseline |
| 10,000 | 800 | 27 | 159% |
| 10,000 | 1,000 | 34 | 200% |
| 10,000 | 2,000 | 67 | 394% |
| 10,000 | 3,000 | 100 | 588% |
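The Pod counts in the tables above follow from the formula, rounding up to a whole number of Pods (`required_pods` is an illustrative helper, not part of DataFlux Func):

```python
import math

PROCESSES_PER_POD = 5  # each worker-8 Pod starts 5 processes by default

def required_pods(monitors, t_ms):
    """P = M ÷ (5 × (60 × 1,000 ÷ T)) = M × T ÷ 300,000, rounded up."""
    return math.ceil(monitors * t_ms / (PROCESSES_PER_POD * 60 * 1000))

print(required_pods(1000, 500))    # baseline row -> 2
print(required_pods(5000, 800))    # -> 14
print(required_pods(10000, 2000))  # -> 67
```

Note that rounding up matters: 1,000 monitors at 500 ms needs only 1.67 Pods' worth of capacity, but 2 Pods must actually be deployed.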
Operation Method
For single-machine deployed DataFlux Func, scaling can be achieved by modifying the configuration file (`{installation directory}/docker-stack.yaml`) and increasing `deploy.replicas` for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.replicas option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas
To enhance the processing capability of `worker-8`, the key part to modify is as follows:
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      # Illustrative replica count; adjust to actual load
      replicas: 2
```
3. Resource Limits
Adjust Resource Limits Based on Actual Business Load

Please adjust resource limits reasonably based on your actual business load.
Blindly limiting resources may lead to longer task execution times or insufficient memory to complete code execution.
Operation Method
For single-machine deployed DataFlux Func, resource limits can be implemented by modifying the configuration file (`{installation directory}/docker-stack.yaml`) and adding `deploy.resources` for the corresponding service.
Refer to Official Documentation
For complete information on the deploy.resources option, refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources
By default, each `worker-N` replica can occupy up to 5 CPU cores (i.e., each worker unit runs 5 worker processes).

To limit the resources occupied by `worker-8`, the key part to modify is as follows:
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  worker-8:
    deploy:
      resources:
        limits:
          # Illustrative values; adjust to actual load
          cpus: '2.5'
          memory: 4G
```
4. Splitting Worker Units
All worker units have been split in the new version
In standalone Func version 3.2.0 and later, all non-reserved worker units have been split by default. Users can enable reserved queues as needed.
In Data Platform Attached Func version 1.77.145 and later, all worker units have been split by default, and users no longer need to split them manually.
In special cases, the default merged worker units (e.g., `worker-1-6`) can be split to achieve finer-grained task scheduling, enabling scaling and resource limiting for the worker units responsible for specific queues.

Assuming that, based on business requirements, higher performance is needed for subscription processing, and subscription message processing should not interfere with Sync API (legacy: Auth Link) processing, then `worker-1-6` can be split into `worker-1-5` and `worker-6`.
Operation Method
For single-machine deployed DataFlux Func, splitting worker units can be achieved by modifying the configuration file (`{installation directory}/docker-stack.yaml`), adding or modifying the corresponding services, and changing the queue numbers specified in `command`.

The queues a worker unit listens to are specified through the parameters after `./run-worker-by-queue.sh`. The service name itself is mainly a label; keeping it consistent with the queues actually listened to is recommended to avoid confusion.
Example is only a selection
The example only shows the key modification part. Please ensure the configuration is complete during actual operation.
Key Modification Part of docker-stack.yaml

```yaml
services:
  # Split from the original worker-1-6 service;
  # other fields (image, etc.) stay the same as worker-1-6
  worker-1-5:
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Other fields same as the original worker-1-6
    command: ./run-worker-by-queue.sh 6
```