Deployment and Maintenance / Architecture, Scaling, and Resource Limits

This article mainly introduces the overall architecture of DataFlux Func and how to scale to improve processing capacity.

1. Architecture

Internally, the system follows a typical "producer -> consumer" model. Any execution of a Python function goes through the process of "task generation -> enqueue -> dequeue -> execution -> result return".

Any Python function is first packaged into a "task" and enters its designated "work queue" (numbered starting from #0). Subsequently, the corresponding "work unit" (numbered starting from worker-0) retrieves the task from the queue for execution.

flowchart TB
    USER[User]
    FUNC_SERVER[Func Server Service]
    REDIS_QUEUE_N[Redis Queue #N]
    FUNC_WORKER_N[Func Worker-N Service]
    FUNC_BEAT[Func Beat Service]

    USER --HTTP Request--> FUNC_SERVER

    FUNC_SERVER --Function Execution Task Enqueues--> REDIS_QUEUE_N

    REDIS_QUEUE_N --Function Execution Task Dequeues--> FUNC_WORKER_N

    FUNC_BEAT --"Function Execution Task Enqueues
    (Scheduled Tasks)"--> REDIS_QUEUE_N

1.1 Services and Their Purposes

DataFlux Func consists of multiple services, each with different responsibilities. The specific services are as follows:

Service	Purpose
server	Web service, providing the following functionalities: 1. Web interface 2. API interfaces 3. Maintaining subscribers
worker-{queue number}	Work unit, used to execute user scripts, including: 1. Function APIs 2. Function APIs 3. Scheduled tasks Also handles some system-level background tasks See queue description for details
beat	Trigger for scheduled tasks
mysql	Database
redis	Cache / Function execution task queue

1.2 Work Unit and Queue Listening Relationships

For the service worker-{queue number} (work unit), each Worker service only listens to specific queues:

Queues and work units do not have to correspond one-to-one

Queues and work units are not required to have a one-to-one correspondence. For example, work unit worker-0 is not limited to listening only to tasks from queue #0. Each work unit can listen to any one or more queues.

Furthermore, the same queue can be listened to by multiple work units simultaneously, or not listened to at all (not recommended).

Different queues for standalone Func and data platform attached Func

Since most standalone Func deployments are for relatively light usage, to reduce unnecessary resource consumption, standalone Func has fewer work units than queues.

Conversely, data platform attached Func, due to handling heavy business like monitors and message sending modules (Message Desk), has a one-to-one correspondence between work units and queues, and has more numbered work units and queues than standalone Func.

Version 6.0.4 and laterVersion 3.2.0 (1.77.145) and laterVersion 3.1.0 (1.79.155) and earlier

Work Unit	Queues Standalone Deployment	Queues Data Platform Attached
worker-0	#0, #4, #7, #8, #9	#0
worker-1	#1	#1
worker-2	#2	#2
worker-3	#3	#3
worker-4	-	#4
worker-5	#5	#5
worker-6	#6	#6
worker-7	-	#7
worker-8	-	#8
worker-9	-	#9
worker-10	-	#10
worker-11	-	#11
worker-12	-	#12
worker-13	-	#13
worker-14	-	#14
worker-15	-	#15

Work Unit	Queues Standalone Deployment	Queues Data Platform Attached
worker-0	#0, #4, #7, #8, #9	#0
worker-1	#1	#1
worker-2	#2	#2
worker-3	#3	#3
worker-4	-	#4
worker-5	#5	#5
worker-6	#6	#6
worker-7	-	#7
worker-8	-	#8
worker-9	-	#9

Work Unit	Queues
worker-0	#0
worker-1-6	#1, #2, #3, #4, #5, #6
worker-7	#7
worker-8-9	#8, #9

2. Services / Queues, Their Responsibilities, and Scaling Recommendations

Scaling requires more hardware investment

Scaling requires higher performance from the server, including but not limited to the server itself, database service, Redis, etc.

Generally, scaling DataFlux Func only requires increasing the replica count of the corresponding services. Therefore, users should first understand their actual business situation to scale targeted.

The complete services, queues, their responsibilities, and scaling recommendations are as follows:

Version 6.0.4 and laterVersion 3.2.0 (1.77.145) and laterVersion 3.1.0 (1.79.155) and earlier

Service / Queue	Responsibility Standalone Deployment	Responsibility Data Platform Attached	Default Pod Count Data Platform Attached	Scaling Recommendation
server	Web service, providing the following functionalities: 1. Web interface 2. API interfaces 3. Maintaining subscribers	← Same as left	1	Generally no need to scale
server-inner	(No such service)	Web service, dedicated for internal cluster API calls	1	Generally no need to scale
worker-0 Queue #0	System work unit, not directly involved in user code processing	← Same as left	2	Generally no need to scale
worker-1 Queue #1	Executes function tasks from synchronously executed function APIs	← Same as left	1	Scale when needing to increase concurrency of synchronously executed function APIs
worker-2 Queue #2	Executes function tasks from scheduled tasks	← Same as left	1	Scale when needing to increase concurrency of scheduled tasks
worker-3 Queue #3	Executes function tasks from asynchronously executed function APIs	← Same as left	1	Scale when needing to increase concurrency of asynchronously executed function APIs
worker-4 Queue #4	(Reserved)	(Reserved)	0	No need to scale
worker-5 Queue #5	Debug code execution i.e., directly running functions in the Web interface	← Same as left	1	Scale when needing to support more users developing scripts simultaneously
worker-6 Queue #6	Executes function tasks from connector subscription message processing	← Same as left	1	Scale when needing to increase concurrency of connector subscription message processing
worker-7 Queue #7	(Reserved)	Executes data platform system business function tasks e.g., logging in via data platform backend admin, updating various caches, releasing message aggregation pools, etc.	2	Scale when the total number of monitors is high
worker-8 Queue #8	(Reserved)	Executes data platform threshold detection and other ordinary monitor, metric generation related function tasks	5	Scale when the number of ordinary monitors is high
worker-9 Queue #9	(Reserved)	Executes data platform advanced detection, intelligent monitor function tasks	3	Scale when the number of advanced detection and intelligent monitors is high
worker-10 Queue #10	(No such service)	Executes data platform user-reported event receiving function tasks	1	Scale when the volume of user-reported events is large
worker-11 Queue #11	(No such service)	Executes Message Desk message sending tasks	3	Scale when the volume of message sending is large
worker-12 Queue #12	(No such service)	(Reserved)	0	No need to scale
worker-13 Queue #13	(No such service)	(Reserved)	0	No need to scale
worker-14 Queue #14	(No such service)	Executes AI-related processing requiring immediate user response e.g., calling "Auto-write Pipeline" etc.	2	Scale when needing to support more users writing Pipelines simultaneously
worker-15 Queue #15	(No such service)	Executes AI-related processing not requiring immediate user response e.g., calling "Alert compression and merging" processing, etc.	2	Scale when there are many monitors using AI for alert aggregation
beat	Trigger for scheduled tasks	← Same as left	1	Must not scale, ensure global single replica
mysql	Database	(No such service)	-	No need to scale, choose self-built or cloud service for higher demands
redis	Cache / Function execution task queue	(No such service)	-	No need to scale, choose self-built or cloud service for higher demands

Service / Queue	Responsibility Standalone Deployment	Responsibility Data Platform Attached	Scaling Recommendation
server	Web service, providing the following functionalities: 1. Web interface 2. API interfaces 3. Maintaining subscribers	← Same as left	Generally no need to scale
server-inner	(No such service)	Web service, dedicated for internal cluster API calls	Generally no need to scale
worker-0 Queue #0	System work unit, not directly involved in user code processing	← Same as left	Generally no need to scale
worker-1 Queue #1	Executes function tasks from synchronously executed function APIs	← Same as left	Scale when needing to increase concurrency of synchronously executed function APIs
worker-2 Queue #2	Executes function tasks from scheduled tasks	← Same as left	Scale when needing to increase concurrency of scheduled tasks
worker-3 Queue #3	Executes function tasks from asynchronously executed function APIs	← Same as left	Scale when needing to increase concurrency of asynchronously executed function APIs
worker-4 Queue #4	(Reserved)	(Reserved)	No need to scale
worker-5 Queue #5	Debug code execution i.e., directly running functions in the Web interface	← Same as left	Scale when needing to support more users developing scripts simultaneously
worker-6 Queue #6	Executes function tasks from connector subscription message processing	← Same as left	Scale when needing to increase concurrency of connector subscription message processing
worker-7 Queue #7	(Reserved)	Executes data platform system business and message sending function tasks e.g., logging in via data platform backend admin, updating various caches, releasing message aggregation pools, Message Desk message sending	Scale when the volume of message sending is large
worker-8 Queue #8	(Reserved)	Executes data platform threshold detection and other ordinary monitor related function tasks	Scale when the number of ordinary monitors is high
worker-9 Queue #9	(Reserved)	Executes data platform advanced detection, intelligent monitor function tasks	Scale when the number of advanced detection and intelligent monitors is high
beat	Trigger for scheduled tasks	← Same as left	Must not scale, ensure global single replica
mysql	Database	(No such service)	No need to scale, choose self-built or cloud service for higher demands
redis	Cache / Function execution task queue	(No such service)	No need to scale, choose self-built or cloud service for higher demands

Service	Responsibility	Scaling Recommendation
server	Web service, providing the following functionalities: 1. Web interface 2. API interfaces 3. Maintaining subscribers	Generally no need to scale
worker-0 Queue #0	System work unit, not directly involved in user code processing	Generally no need to scale
worker-1-6 Queue #1, #2, #3, #4, #5, #6	By default, responsible for synchronous function call processing, such as: 1. Synchronously executed function APIs 2. Subscription message processing	Scale when needing to increase concurrency of synchronously executed function APIs and subscription message processing
worker-7 Queue #7	By default, responsible for debug code processing (i.e., directly running functions in the Web interface)	Scale when needing to support more users developing scripts simultaneously
worker-8-9 Queue #8, #9	By default, responsible for asynchronous function call processing, such as: 1. Asynchronously executed function APIs 2. Scheduled tasks	Scale when needing to increase concurrency of scheduled tasks and asynchronously executed function API processing
beat	Trigger for scheduled tasks	Must not scale, ensure global single replica
mysql	Database	No need to scale, choose self-built or cloud service for higher demands
redis	Cache / Function execution task queue	No need to scale, choose self-built or cloud service for higher demands

Example: When needing to enhance scheduled task processing capacity...

From the above, scheduled tasks are located in "Queue #8", and "Queue #8" corresponds to "Service worker-8". Therefore, scaling "Service worker-8" is sufficient.

Estimating Scaling Amount

Taking the common worker-8 as an example:

worker-8 in the data platform attached version is mainly responsible for executing monitor tasks. Assuming one detection task takes T milliseconds, then 1 minute can execute 60 × 1,000 ÷ T detections. By default, each worker-8 Pod starts 5 processes.

That is, the detection capacity of a single worker-8 Pod is 5 × (60 × 1,000 ÷ T) monitors.

Formula

Text Only
A = 5 × (60 × 1,000 ÷ T)
  = 300,000 ÷ T

A: Detection capacity

T: Detection task execution time (milliseconds)

Based on different monitor execution durations, the following table can be listed:

Single Detection Time (ms)	Single Pod Detection Capacity	Compared to Baseline
300	1,000	167%
500	600	baseline
800	375	63%
1,000	300	50%
2,000	150	25%
3,000	100	17%

Conversely, assuming the total number of monitors is M, then the required number of Pods can be derived from M ÷ (5 × (60 × 1,000 ÷ T)).

Formula

Text Only
P = M ÷ (300,000 ÷ T)
  = M × T ÷ 300,000

P: Required number of Pods

M: Number of monitors

T: Detection task execution time (milliseconds)

Based on different monitor counts and execution durations, the following tables can be listed:

1000 monitors5000 monitors10,000 monitors

Monitor Count	Single Detection Time (ms)	Required Pods	Compared to Baseline
1,000	300	1	50%
1,000	500	2	baseline
1,000	800	3	150%
1,000	1,000	4	200%
1,000	2,000	7	350%
1,000	3,000	10	500%

Monitor Count	Single Detection Time (ms)	Required Pods	Compared to Baseline
5,000	300	5	56%
5,000	500	9	baseline
5,000	800	14	156%
5,000	1,000	17	189%
5,000	2,000	34	378%
5,000	3,000	50	556%

Monitor Count	Single Detection Time (ms)	Required Pods	Compared to Baseline
10,000	300	10	59%
10,000	500	17	baseline
10,000	800	27	159%
10,000	1,000	34	200%
10,000	2,000	67	394%
10,000	3,000	100	588%

Operation Method

For single-machine deployed DataFlux Func, scaling can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml) and increasing the deploy.replicas for the corresponding service.

Please refer to the official documentation

For complete information on the deploy.replicas option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / replicas

Taking improving worker-8 processing capacity as an example, the specific modified part is as follows:

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  worker-8:
    deploy:
      # Start 2 work units processing queue 8 simultaneously
      replicas: 2

3. Resource Limits

Resource limits should be adjusted reasonably based on actual business

Please adjust resource limits reasonably based on actual business.

Blindly limiting resources may lead to longer task execution times or insufficient memory preventing code execution.

Operation Method

For single-machine deployed DataFlux Func, resource limits can be implemented by modifying the configuration ({installation directory}/docker-stack.yaml) and adding deploy.resources for the corresponding service.

Please refer to the official documentation

For complete information on the deploy.resources option, please refer to the Docker official documentation: Docker Documentation / Compose file deploy reference / resources

By default, each worker-N replica can occupy up to 5 CPU cores (i.e., each work unit has 5 worker processes).

Taking limiting resources for worker-8 as an example, the specific modified part is as follows:

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  worker-8:
    deploy:
      resources:
        limits:
          cpus  : '2.50' # Limit CPU usage to at most 2.5 cores
          memory: 4G     # Limit memory usage to at most 4 GB

4. Splitting Work Units

All work units are already split in the new version

In standalone Func version 3.2.0 and later, all non-reserved work units are split by default. Users can enable reserved queues as needed.

In data platform attached Func version 1.77.145 and later, all work units are split by default, and users no longer need to split them manually.

Under special circumstances, default merged work units (e.g., worker-1-6) can be split to achieve finer-grained task scheduling, enabling scaling and resource limiting for work units responsible for specific queues.

Assuming based on business requirements, DataFlux Func has high performance requirements for subscription processing and hopes that subscription message processing does not interfere with synchronous function API processing, then worker-1-6 can be split into worker-1-5 and worker-6.

Operation Method

For single-machine deployed DataFlux Func, splitting work units can be achieved by modifying the configuration ({installation directory}/docker-stack.yaml), adding or modifying corresponding services, and changing the queue numbers specified in the command.

Specifying the queues a work unit listens to is achieved through the parameters after ./run-worker-by-queue.sh. The service name itself is mainly for labeling purposes; it is recommended to keep it consistent with the actual listening queues to avoid confusion.

Example is only an excerpt

The example only shows the key modified parts. Please ensure the configuration is complete during actual operation.

docker-stack.yaml Key Modified Part
services:
  # Delete the original "worker-1-6" and replace it with the following content

  worker-1-5:
    # Specify the work unit to process work queues 1 ~ 5
    command: ./run-worker-by-queue.sh 1 2 3 4 5

  worker-6:
    # Specify the work unit to process work queue 6
    command: ./run-worker-by-queue.sh 6