
Interpretation of "Monitor" Logs

2024-03-04

In Guance and TrueWatch, the term "Monitor" actually refers to scheduled tasks in DataFlux Func (Automata). Viewing "Monitor" logs means checking the logs for these "scheduled tasks (old version: automatic trigger configurations)."

For details on how to view the logs of a "scheduled task (old version: automatic trigger configuration)", refer to Deployment and Maintenance / System Metrics and Task Records / Task Records.

1. Basic Format

Each line of the "Monitor" log follows this format:

| Time | Difference from previous log entry (milliseconds) | Total time from task start to current log entry (milliseconds) | Module | Content |
| --- | --- | --- | --- | --- |
| [03-25 11:04:05] | [+1ms] | [64ms] | [Function] | Function call: guance__api_impl.custom_check |

Specific Example
[03-25 11:04:05] [+1ms] [64ms] [Function] Function call: guance__api_impl.custom_check

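Log lines may also use the following format, which prints a full timestamp and has no total-elapsed-time column (the examples dated 2024-03-06 in this document use it):

| Time | Difference from previous log entry (milliseconds) | Module | Content |
| --- | --- | --- | --- |
| [2024-03-06 20:58:05.088] | [+1ms] | [Function] | Function call: guance__api_impl.custom_check |
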
Specific Example
[2024-03-06 20:58:05.088] [+1ms] [Function] Function call: guance__api_impl.custom_check
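
The two line layouts above can be split into fields with a single regular expression. The following is a minimal sketch, not part of DataFlux Func: the pattern and the names `LogEntry` and `parse_log_line` are assumptions made for illustration. The module tag is treated as optional because the summary line at the end of the complete example carries none.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: time, delta, optional total elapsed, optional module, content.
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] "
    r"\[\+(?P<delta>\d+)ms\] "
    r"(?:\[(?P<total>\d+)ms\] )?"
    r"(?:\[(?P<module>[^\]]+)\] )?"
    r"(?P<content>.*)$"
)


class LogEntry(NamedTuple):
    time: str
    delta_ms: int
    total_ms: Optional[int]  # None when the line has no total-elapsed column
    module: Optional[str]    # None when the line has no module tag
    content: str


def parse_log_line(line: str) -> LogEntry:
    """Split one "Monitor" log line into its fields."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    total = match.group("total")
    return LogEntry(
        time=match.group("time"),
        delta_ms=int(match.group("delta")),
        total_ms=int(total) if total is not None else None,
        module=match.group("module"),
        content=match.group("content"),
    )


entry = parse_log_line(
    "[03-25 11:04:05] [+1ms] [64ms] [Function] Function call: guance__api_impl.custom_check"
)
print(entry.module, entry.total_ms)
```

Filtering parsed entries by `module` (for example `KODO` or `Studio`) is a quick way to isolate the fixed-format blocks described later.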

2. Log Reduction Options

Monitor logic has grown increasingly complex, which makes the logs longer and harder to read. Starting with the 2024-03-27 iteration, Guance and TrueWatch therefore output concise logs by default, which still cover basic troubleshooting needs.

If you wish to output full logs, you can create specific environment variables to enable detailed logging for Guance and TrueWatch:

| Environment Variable | Data Type | Value |
| --- | --- | --- |
| ENABLE_DETAILED_GUANCE_LOG | Boolean | Enabled: true; Disabled: false |

3. Fixed Format Log Blocks

Certain processes use a fixed format to output log blocks.

3.1 [KODO] DQL Query Logs

When executing DQL queries within monitors, the Kodo component's API must be called. Each DQL query records a log, as shown below:

Specific Example
[03-25 11:04:05] [+0ms] [67ms] [KODO] Executing DQL query -> Time range: 2024-03-25 10:55:00 ~ 2024-03-25 11:01:00, up to 5 pages
[03-25 11:04:05] [+0ms] [67ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[03-25 11:04:05] [+0ms] [68ms] [KODO] Calling KODO API -> POST /v1/query

The logs record:

  • DQL query time range
  • Specific API method and path used
Specific Example
[2024-03-06 20:58:05.092] [+0ms] [KODO] Executing DQL query
[2024-03-06 20:58:05.092] [+0ms] [KODO] --> Maximum page flipping: 20 pages
[2024-03-06 20:58:05.093] [+0ms] [KODO] --> Time range: 2024-03-06 20:51:00 ~ 2024-03-06 20:55:00
[2024-03-06 20:58:05.093] [+0ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[2024-03-06 20:58:05.093] [+0ms] [KODO] Calling KODO API
[2024-03-06 20:58:05.093] [+0ms] [KODO] >> Request: POST /v1/query
[2024-03-06 20:58:05.093] [+0ms] [KODO] >>>> Body: {"echo_explain":false,"queries":[ ... ],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.093] [+0ms] [KODO] >> First request
[2024-03-06 20:58:05.111] [+18ms] [KODO] >> Response: `200 OK` => `{"content":[ ... ]}`

The logs record:

  • DQL query time range
  • Specific API method, path, request body
  • Original response from the Kodo API
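
When troubleshooting, it is often useful to confirm how long a queried window is. A minimal sketch, assuming the `Time range: <start> ~ <end>` text shown in the examples above; the function name `query_window_seconds` is hypothetical:

```python
import re
from datetime import datetime

TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
TIME_RANGE = re.compile(rf"Time range: ({TIMESTAMP}) ~ ({TIMESTAMP})")


def query_window_seconds(log_line: str) -> int:
    """Return the length, in seconds, of the DQL time range found in a [KODO] line."""
    match = TIME_RANGE.search(log_line)
    if match is None:
        raise ValueError("no DQL time range found in line")
    start, end = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in match.groups())
    return int((end - start).total_seconds())


line = (
    "[03-25 11:04:05] [+0ms] [67ms] [KODO] Executing DQL query -> "
    "Time range: 2024-03-25 10:55:00 ~ 2024-03-25 11:01:00, up to 5 pages"
)
print(query_window_seconds(line))  # 360
```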

3.2 [Studio] Studio Inner API Call Logs for Guance, TrueWatch

To obtain business data from Guance or TrueWatch, monitors must call the Studio Inner API. Each Inner API call records a log, as shown below:

Specific Example
[03-25 11:04:05] [+0ms] [172ms] [Studio] Calling Studio Inner API -> GET /api/v1/inner/alert_opt/get

The logs record:

  • Specific API method and path
Specific Example
[2024-03-06 20:58:05.169] [+0ms] [Studio] Calling Studio Inner API
[2024-03-06 20:58:05.169] [+0ms] [Studio] >> Request: GET /api/v1/inner/alert_opt/get
[2024-03-06 20:58:05.169] [+0ms] [Studio] >>>> Query: {"checkerUUID":"rul_xxxxx","workspaceUUID":"wksp_xxxxx"}`
[2024-03-06 20:58:05.169] [+0ms] [Studio] >> First request
[2024-03-06 20:58:05.177] [+7ms] [Studio] >> Response: `200 OK` => `{"code":200,"content":{"data":{ ... }},"errorCode":"","message":"","success":true,"traceId":"TRACE-XXXXX"}`

The logs record:

  • Specific API method, path, and request body
  • Original response from the Guance, TrueWatch Studio Inner API
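
The `>> Response:` line wraps the status and the raw body in backticks, so both can be extracted mechanically. The sketch below is an illustration only: the pattern and the name `parse_response` are assumptions, and it works only on complete bodies (the examples above elide part of the JSON with `...`, which will not parse).

```python
import json
import re

# Matches: >> Response: `<code> <reason>` => `<raw body>`
RESPONSE = re.compile(r">> Response: `(\d{3}) ([^`]*)` => `(.*)`\s*$")


def parse_response(log_line: str):
    """Return (status_code, reason, decoded_json_body) from a response log line."""
    match = RESPONSE.search(log_line)
    if match is None:
        raise ValueError("not a response line")
    status_code, reason, raw_body = match.groups()
    return int(status_code), reason, json.loads(raw_body)


line = (
    '[2024-03-06 20:58:05.123] [+9ms] [Studio] >> Response: `200 OK` => '
    '`{"code":200,"content":{},"errorCode":"","message":"","success":true,"traceId":"TRACE-XXXXX"}`'
)
status, reason, body = parse_response(line)
print(status, reason, body["success"])
```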

4. Complete Example Analysis

Log content may change after each iteration

Because new features are added and logging is adjusted to address issues found in earlier versions, the details of the logs may vary slightly after each iteration.

Below is an annotated complete example

In the following logs:

Lines starting with # are explanatory notes, not part of the original logs.

Additional blank lines have been inserted for readability; there are no blank lines in the original logs.

Other content is directly from the logs.

Log Interpretation

# Counting 'task scheduling' in Guance and TrueWatch
[04-01 03:43:00] [+100ms] [100ms] [Usage Quota] Data query range is 15 minutes, not exceeding 15 minutes, no additional measurement needed
[04-01 03:43:00] [+0ms] [100ms] [Usage Quota] workspace_uuid parameter exists, value is wksp_xxxxx, need to measure once

# Current workspace information
[04-01 03:43:00] [+1ms] [101ms] [Studio] Workspace information (from cache): {"declaration":{"b":["asfawfgajfasfafgafwba","asfgahjfaf"],"business":"aaa","organization":"64fe7b4062f74d0007b46676"},"isJobDisabled":false,"isSMSDisabled":false,"language":"en","name":"[Doris] Development Testing Together_","token":"tkn_xxxxx"}

# ID and parameter list of the function executed in this task
[04-01 03:43:00] [+1ms] [103ms] [Function] Function call: guance__api_impl.custom_check
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: checker="custom_metric"
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: kwargs={"version":"v2"}
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: targets=[{"alias":"Result","dql":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","queryType":"dql","range":900}]
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: channels=["chan_xxxxx"]
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: extra_data={"type":"simpleCheck"}
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: checker_opt={"id":"rul_xxxxx","infoEvent":false,"label":["xxxxx_test"],"message":"Content: xxxxx-Monitor (Single) {df_dimension_tags}\n1: {{ (Result * 100) | to_int }}\n2: {{ Result | to_int * 100 }}","name":"Title: xxxxx-Monitor (M, Single) {df_dimension_tags}","noDataAction":"noData","noDataInterval":120,"noDataMessage":"","noDataTitle":"","recoverInterval":120,"rules":[{"conditionLogic":"and","conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}],"title":"Title: xxxxx-Monitor (M, Single) {df_dimension_tags}"}
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: monitor_opt={"id":"monitor_xxxxx","name":"default"}
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: workspace_uuid="wksp_xxxxx"
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: workspace_token="tkn_xxxxx"
[04-01 03:43:00] [+0ms] [103ms] [Function] --> Parameter: disable_check_end_time=false
[04-01 03:43:00] [+0ms] [104ms] [Function] --> Parameter: at_accounts=null
[04-01 03:43:00] [+0ms] [104ms] [Function] --> Parameter: at_accounts_nodata=null

# Monitor frequency configuration
[04-01 03:43:00] [+2ms] [106ms] [Monitor] Calculating detection interval based on actual Crontab (*/1 * * * *)
[04-01 03:43:00] [+0ms] [106ms] [Monitor] --> Detection interval: 60 seconds

# Query recent data within the user-configured no-data range and two previous time ranges (2 DQL queries)
[04-01 03:43:00] [+0ms] [106ms] [Monitor] ----------------- Loading Gap / New Object Information ------------------
[04-01 03:43:00] [+0ms] [106ms] [Monitor] No-data range configured: 120 seconds
[04-01 03:43:00] [+0ms] [106ms] [Monitor] Querying last round data: T - (Detection Frequency 60 seconds) - (No-data Range 120 seconds) - (3x Redundant No-data Range 360 seconds) ~ T - (No-data Range 120 seconds)
[04-01 03:43:00] [+0ms] [106ms] [KODO] Executing DQL query -> Time range: 2024-04-01 03:33:00 ~ 2024-04-01 03:40:00, up to 20 pages
[04-01 03:43:00] [+0ms] [106ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[04-01 03:43:00] [+0ms] [106ms] [KODO] Calling KODO API -> POST /v1/query
[04-01 03:43:00] [+14ms] [121ms] [Studio] Metric unit (from cache): wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1711914240}
[04-01 03:43:00] [+0ms] [121ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1711913580000,1711914000000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-31T19:39:50Z",56.08730158730159]]}]}
[04-01 03:43:00] [+0ms] [121ms] [Monitor] Querying this round data: T - (No-data Range 120 seconds) ~ T
[04-01 03:43:00] [+0ms] [122ms] [KODO] Executing DQL query -> Time range: 2024-04-01 03:40:00 ~ 2024-04-01 03:42:00, up to 20 pages
[04-01 03:43:00] [+0ms] [122ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[04-01 03:43:00] [+0ms] [122ms] [KODO] Calling KODO API -> POST /v1/query
[04-01 03:43:00] [+18ms] [140ms] [Studio] Metric unit (from cache): wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1711914240}
[04-01 03:43:00] [+0ms] [140ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1711914000000,1711914120000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-31T19:41:50Z",49.69444444444444]]}]}
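
The two query windows in the log lines above follow directly from the formula printed in the log. The sketch below replays that arithmetic. It assumes `T` is 2024-04-01 03:42:00, a value inferred from the logged time ranges rather than stated in the log; the function name `no_data_windows` is hypothetical.

```python
from datetime import datetime, timedelta


def no_data_windows(t, detection_interval_s, no_data_range_s, redundancy_factor=3):
    """Compute the (last round, this round) windows described by the log formula.

    last round: T - interval - no_data - factor * no_data  ~  T - no_data
    this round: T - no_data  ~  T
    """
    no_data = timedelta(seconds=no_data_range_s)
    lookback = timedelta(
        seconds=detection_interval_s + no_data_range_s + redundancy_factor * no_data_range_s
    )
    last_round = (t - lookback, t - no_data)
    this_round = (t - no_data, t)
    return last_round, this_round


t = datetime(2024, 4, 1, 3, 42, 0)  # assumed value of T
last_round, this_round = no_data_windows(t, detection_interval_s=60, no_data_range_s=120)
print(last_round[0], "~", last_round[1])
print(this_round[0], "~", this_round[1])
```

With these inputs the windows are 03:33:00 ~ 03:40:00 and 03:40:00 ~ 03:42:00, matching the two `Time range` entries logged by `[KODO]`.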

# Based on the queried data from the two time ranges, determine if there is a data gap or re-reporting, and generate corresponding [No-data Event] or [No-data Recovery Event]
[04-01 03:43:00] [+0ms] [140ms] [Monitor] ----------------- Gap / New Object Load Results ------------------
[04-01 03:43:00] [+0ms] [140ms] [Monitor] --> Last round existing objects: {"tag":"fake-data-1"}
[04-01 03:43:00] [+0ms] [141ms] [Monitor] --> This round existing objects: {"tag":"fake-data-1"}
[04-01 03:43:00] [+0ms] [141ms] [Monitor] ----> Data gap objects (Last round exists -> This round does not exist): None
[04-01 03:43:00] [+0ms] [141ms] [Monitor] --------------------- Determine Data Gap ---------------------
[04-01 03:43:00] [+0ms] [141ms] [Monitor] --> No data gap objects
[04-01 03:43:00] [+0ms] [141ms] [Monitor] ------------------- Determine Data Recovery from Gap --------------------
[04-01 03:43:00] [+0ms] [141ms] [Monitor] --> Object: {"tag":"fake-data-1"}
[04-01 03:43:00] [+4ms] [146ms] [Monitor] Fault cycle information (fault_info) for object {"tag":"fake-data-1"}: null
[04-01 03:43:00] [+0ms] [146ms] [Monitor] ----> No last no-data event
[04-01 03:43:00] [+0ms] [146ms] [Monitor] ----> No active no-data event, no need to generate a no-data recovery event

# Determine whether to generate [Alert Event] based on user-configured detection rules
[04-01 03:43:00] [+0ms] [146ms] [Monitor] -------------------- Execute Data Value Detection --------------------
[04-01 03:43:00] [+0ms] [146ms] [Monitor] Query pending detection data
[04-01 03:43:00] [+0ms] [146ms] [KODO] Executing DQL query -> Time range: 2024-04-01 03:27:00 ~ 2024-04-01 03:42:00, up to 20 pages
[04-01 03:43:00] [+0ms] [146ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[04-01 03:43:00] [+0ms] [146ms] [KODO] Calling KODO API -> POST /v1/query
[04-01 03:43:00] [+21ms] [168ms] [Studio] Metric unit (from cache): wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1711914240}
[04-01 03:43:00] [+0ms] [169ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1711913220000,1711914120000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-31T19:41:50Z",54.370370370370374]]}]}

# Iterate through all detection objects sequentially and execute detections
[04-01 03:43:00] [+0ms] [169ms] [Monitor] [Detection Object 1/1] {"tag":"fake-data-1"}
[04-01 03:43:00] [+2ms] [171ms] [General Threshold Detection] Pending detection data: {'Result': [54.370370370370374]}

# Iterate through all configured rules to determine which detection rule matches
[04-01 03:43:00] [+0ms] [171ms] [General Threshold Detection] [Threshold Rule 1/1] critical: Result >= ['0']
[04-01 03:43:00] [+0ms] [171ms] [Condition Check] [Condition 1/1] IF Result (ANY[54.370370370370374]) >= ["0"]
[04-01 03:43:00] [+0ms] [171ms] [Condition Check] --> Intermediate result is True, condition relation is AND, continue
[04-01 03:43:00] [+0ms] [171ms] [General Threshold Detection] --> Matched successfully, end check
[04-01 03:43:00] [+0ms] [172ms] [General Threshold Detection] Threshold rule match result: {"check_data":{"Result":54.370370370370374},"conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}
[04-01 03:43:00] [+0ms] [172ms] [Monitor] --> Detection object: {"tag":"fake-data-1"}: has reached fault conditions
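
The rule matching logged above can be approximated as follows. This is a simplified sketch, not the monitor's actual code: it assumes each condition compares ANY queried value against the first operand, that conditions are combined with `conditionLogic`, and that the first matching rule wins. Only `>=` appears in the log; the other operators in the table are added for illustration.

```python
import operator

# Only ">=" appears in the log; the remaining entries are assumptions.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def condition_matches(values, op, operands):
    """True if ANY queried value satisfies the comparison (cf. 'ANY[...]' in the log)."""
    threshold = float(operands[0])
    return any(OPERATORS[op](value, threshold) for value in values)


def first_matching_status(rules, check_data):
    """Return the status of the first rule whose conditions match, else None."""
    for rule in rules:
        combine = all if rule["conditionLogic"] == "and" else any
        if combine(
            condition_matches(check_data[c["alias"]], c["operator"], c["operands"])
            for c in rule["conditions"]
        ):
            return rule["status"]
    return None


# Rule taken from the checker_opt parameter in the log.
rules = [
    {
        "conditionLogic": "and",
        "conditions": [{"alias": "Result", "operands": ["0"], "operator": ">="}],
        "status": "critical",
    }
]
print(first_matching_status(rules, {"Result": [54.370370370370374]}))  # critical
```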

# Call Guance, TrueWatch Studio to get alert strategies configured for this monitor
[04-01 03:43:00] [+2ms] [175ms] [Studio] Alert information cache disabled
[04-01 03:43:00] [+0ms] [175ms] [Studio] Calling Studio Inner API -> GET /api/v1/inner/alert_opt/get
[04-01 03:43:00] [+20ms] [195ms] [Studio] Alert configuration (from API): rul_xxxxx => {"_DFF_CACHE_EXPIRE_TIME":1711914360,"alertPolicies":[{"aggClusterFields":[],"aggFields":[],"aggInterval":0,"aggLabels":[],"id":"altpl_xxxxx","minInterval":900,"name":"xxxxx-Alert Policy1","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"Custom Notification Config1","targets":[{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"warning","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule1","status":"warning","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}],"upgradeTargets":[{"duration":180,"name":"xxxxx-Alert Policy1-critical-3 Minutes Upgrade","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"duration":600,"name":"xxxxx-Alert Policy1-critical-10 Minutes Upgrade","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"duration":600,"name":"xxxxx-Alert Policy1-critical-10 Minutes Upgrade-2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]},{"targets":[{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"warning","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule2","status":"warning","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}],"upgradeTargets":[{"duration":180,"name":"xxxxx-Alert Policy1-critical-3 Minutes Upgrade","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"duration":600,"name":"xxxxx-Alert Policy1-critical-10 Minutes Upgrade","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"duration":600,"name":"xxxxx-Alert Policy1-critical-10 Minutes Upgrade-2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]}],"workspaceUUID":"wksp_xxxxx"}],"silent":[]}
[04-01 03:43:00] [+1ms] [196ms] [Studio] Constant configuration (from cache): envName => {"_DFF_CACHE_EXPIRE_TIME":1711914334,"value":"Test Environment"}
[04-01 03:43:00] [+2ms] [199ms] [Studio] Constant configuration (from cache): UsePublicAlertLink => {"_DFF_CACHE_EXPIRE_TIME":1711914215,"value":false}
[04-01 03:43:00] [+1ms] [200ms] [Studio] Constant configuration (from cache): consoleBaseURL => {"_DFF_CACHE_EXPIRE_TIME":1711914215,"value":"http://testing-ft2x.dataflux.cn"}

# Render event titles/content based on user-configured alert templates and event data
[04-01 03:43:00] [+1ms] [202ms] [Text Renderer] Rendering template: Content: xxxxx-Monitor (Single) {df_dimension_tags} 1: {{ (Result * 100) | to_int }} 2: {{ Result | to_int * 100 }}
[04-01 03:43:00] [+1ms] [204ms] [Text Renderer] --> Rendering successful. Output: Content: xxxxx-Monitor (Single) {"tag":"fake-data-1"} 1: 5437 2: 5400
[04-01 03:43:00] [+3ms] [207ms] [Text Renderer] Rendering template: Title: xxxxx-Monitor (M, Single) {df_dimension_tags}
[04-01 03:43:00] [+1ms] [208ms] [Text Renderer] --> Rendering successful. Output: Title: xxxxx-Monitor (M, Single) {"tag":"fake-data-1"}
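
The two template expressions render differently (5437 versus 5400) because of where the `to_int` filter applies. The sketch below reproduces the arithmetic in plain Python. It assumes `to_int` truncates like Python's `int()` and that the filter binds more tightly than `*`, which is consistent with the logged output; the helper `to_int` here is a stand-in, not the renderer's own filter.

```python
def to_int(value):
    """Stand-in for the template's `to_int` filter (assumed to truncate like int())."""
    return int(value)


result = 54.370370370370374  # value of Result in this detection round

# {{ (Result * 100) | to_int }}: multiply first, then truncate
line_1 = to_int(result * 100)

# {{ Result | to_int * 100 }}: truncate Result first, then multiply
line_2 = to_int(result) * 100

print(line_1, line_2)  # 5437 5400
```

When a template should keep the decimals until the last step, the parenthesized form is the one to use.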

# Iterate through events/mute rules to determine if each event should be muted
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] [Event 1/1]
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] No mute rules, no need to mute

# Iterate through alert policies to determine which alert policy/rule the event matches
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] [Alert Policy 1/1] xxxxx-Alert Policy1 (altpl_xxxxx)
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] --------------------- Send Event Alert ---------------------
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] [Alert Rule 1/2] Looping by Crontab 00 09 * * *, each loop lasts 39600 seconds
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] --> Repeat intervals configured but not within repeat interval range
[04-01 03:43:00] [+0ms] [209ms] [Event Alarm] --> Does not meet repeat interval alert, skip
[04-01 03:43:00] [+0ms] [210ms] [Event Alarm] [Alert Rule 2/2] Remaining other intervals
[04-01 03:43:00] [+0ms] [210ms] [Event Alarm] Successfully matched alert rule, need to alert

# Read event status duration
[04-01 03:43:00] [+0ms] [210ms] [Event Alarm] -------------------- Event Status Duration --------------------
[04-01 03:43:00] [+1ms] [211ms] [Event Alarm] --> Current event status is critical, clear non-critical status durations
[04-01 03:43:00] [+1ms] [212ms] [Event Alarm] --> Start time of current event status critical has been recorded, start time is 2024-03-26 20:01:00

# Generate general alert notifications
# Iterate through notification targets to check if they are in a mute period
# (All alert notification targets under the same alert policy/rule will align their mute periods)
[04-01 03:43:00] [+1ms] [213ms] [Event Alarm] --------------------- General Alert Notifications ---------------------
[04-01 03:43:00] [+0ms] [213ms] [Event Alarm] [Alert Notification Target 1/4] dingTalkRobot/xxxxx-Alert Policy1-Rule2 (critical)
[04-01 03:43:00] [+0ms] [214ms] [Event Alarm] Matching event status: critical <=> critical
[04-01 03:43:00] [+1ms] [215ms] [Event Alarm] --> Last alert at 2024-04-01 03:36:00, mute for 900 seconds. Mute period ends at 2024-04-01 03:51:00 (540 seconds later)
[04-01 03:43:00] [+0ms] [215ms] [Event Alarm] ----> Currently in mute period, skip
[04-01 03:43:00] [+0ms] [215ms] [Event Alarm] [Alert Notification Target 2/4] wechatRobot/xxxxx-WeChat (critical)
[04-01 03:43:00] [+0ms] [215ms] [Event Alarm] Matching event status: critical <=> critical
[04-01 03:43:00] [+1ms] [216ms] [Event Alarm] --> Last alert at 2024-04-01 03:36:00, mute for 900 seconds. Mute period ends at 2024-04-01 03:51:00 (540 seconds later)
[04-01 03:43:00] [+0ms] [216ms] [Event Alarm] ----> Currently in mute period, skip
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] [Alert Notification Target 3/4] dingTalkRobot/xxxxx-Alert Policy1-Rule2 (warning)
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] Matching event status: critical <=> warning
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] --> Not met, skip
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] [Alert Notification Target 4/4] wechatRobot/xxxxx-WeChat (warning)
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] Matching event status: critical <=> warning
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] --> Not met, skip
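
The mute-period figures in the log can be checked by hand. The sketch below assumes the mute length is the alert policy's `minInterval` (900 seconds) and that "now" is 2024-04-01 03:42:00, the end of the detection range; that reference time is inferred because it yields the logged 540 seconds. The function name `mute_status` is hypothetical.

```python
from datetime import datetime, timedelta


def mute_status(last_alert, mute_seconds, now):
    """Return (mute_end, seconds_remaining, is_muted) for one notification target."""
    mute_end = last_alert + timedelta(seconds=mute_seconds)
    remaining = int((mute_end - now).total_seconds())
    return mute_end, remaining, remaining > 0


mute_end, remaining, muted = mute_status(
    last_alert=datetime(2024, 4, 1, 3, 36, 0),
    mute_seconds=900,                    # minInterval from the alert policy
    now=datetime(2024, 4, 1, 3, 42, 0),  # assumed reference time
)
print(mute_end, remaining, muted)
```

The result is a mute end of 03:51:00 with 540 seconds remaining, so the target is skipped, as the log reports.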

# Generate escalation alert notifications
# Iterate through escalation notification targets to check if the escalation time limit has been reached
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] --------------------- Escalation Alert Notifications ---------------------
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] [Escalation Alert Notification Target 1/3] dingTalkRobot/xxxxx-Alert Policy1-critical-3 Minutes Upgrade (180/critical)
[04-01 03:43:00] [+0ms] [217ms] [Event Alarm] Matching event status: critical <=> critical
[04-01 03:43:00] [+1ms] [218ms] [Event Alarm] --> Escalation alert already sent at 2024-03-26 20:04:00, no need for alert escalation
[04-01 03:43:00] [+0ms] [218ms] [Event Alarm] [Escalation Alert Notification Target 2/3] dingTalkRobot/xxxxx-Alert Policy1-critical-10 Minutes Upgrade (600/critical)
[04-01 03:43:00] [+0ms] [218ms] [Event Alarm] Matching event status: critical <=> critical
[04-01 03:43:00] [+1ms] [220ms] [Event Alarm] --> Escalation alert already sent at 2024-03-26 20:11:00, no need for alert escalation
[04-01 03:43:00] [+0ms] [220ms] [Event Alarm] [Escalation Alert Notification Target 3/3] dingTalkRobot/xxxxx-Alert Policy1-critical-10 Minutes Upgrade-2 (600/critical)
[04-01 03:43:00] [+0ms] [220ms] [Event Alarm] Matching event status: critical <=> critical
[04-01 03:43:00] [+1ms] [221ms] [Event Alarm] --> Escalation alert already sent at 2024-03-26 20:11:00, no need for alert escalation
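
The logged escalation times are consistent with adding each target's `duration` to the recorded start of the critical status (2024-03-26 20:01:00). The sketch below only checks that relationship; whether the monitor computes it exactly this way is an assumption, and `escalation_due_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def escalation_due_times(status_start, upgrade_targets):
    """Map each escalation target name to status_start + duration."""
    return {
        target["name"]: status_start + timedelta(seconds=target["duration"])
        for target in upgrade_targets
    }


# Durations and names taken from "upgradeTargets" in the alert configuration.
upgrade_targets = [
    {"duration": 180, "name": "xxxxx-Alert Policy1-critical-3 Minutes Upgrade"},
    {"duration": 600, "name": "xxxxx-Alert Policy1-critical-10 Minutes Upgrade"},
    {"duration": 600, "name": "xxxxx-Alert Policy1-critical-10 Minutes Upgrade-2"},
]
status_start = datetime(2024, 3, 26, 20, 1, 0)

for name, due in escalation_due_times(status_start, upgrade_targets).items():
    print(name, "->", due)
```

The computed times are 20:04:00 for the 180-second target and 20:11:00 for both 600-second targets, the same times the log gives for the escalations already sent.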

# Cache events generated for use by the next monitor task
[04-01 03:43:00] [+2ms] [223ms] [Internal DataWay] Cache events
[04-01 03:43:00] [+0ms] [223ms] [Internal DataWay] Cache fault information
[04-01 03:43:00] [+0ms] [224ms] [Internal DataWay] --> Create cache: key=rul_xxxxx-check, field={"tag":"fake-data-1"}

# Write events into Guance, TrueWatch
[04-01 03:43:00] [+2ms] [226ms] [Internal DataWay] Write events
[04-01 03:43:00] [+0ms] [226ms] [Internal DataWay] --> [Event 1/1] Title: xxxxx-Monitor (M, Single) {"tag":"fake-data-1"} (event-xxxxx)
[04-01 03:43:00] [+1ms] [228ms] [Internal DataWay] Line protocol write data -> POST /v1/write/keyevent, workspace Token: tkn_xxxxx
[04-01 03:43:00] [+0ms] [228ms] [Internal DataWay] --> First/1 data example: {"fields":{"df_alert_policy_ids":["altpl_xxxxx"],"df_alert_policy_names":["xxxxx-Alert Policy1"],"df_at_accounts":"[]","df_at_accounts_nodata":"[]","df_channels":"[\"chan_xxxxx\"]","df_check_range_end":1711914120,"df_check_range_start":1711913220,"df_date_range":900,"df_dimension_tags":"{\"tag\":\"fake-data-1\"}","df_event_reason":"Meets the conditions for recognizing faults in the monitor, generating a fault event","df_fault_duration":459840,"df_fault_start_time":1711454280,"df_issue_duration":459840,"df_issue_start_time":1711454280,"df_matched_alert_policy_rules":["xxxxx-Alert Policy1 / -"],"df_message":"Content: xxxxx-Monitor (Single) {\"tag\":\"fake-data-1\"} \n1: 5437 \n2: 5400","df_meta":"Omitted, see corresponding event data for detailed content","df_monitor_checker_name":"Title: xxxxx-Monitor (M, Single) {df_dimension_tags}","df_monitor_checker_value":"54.370370370370374","df_monitor_name":"xxxxx-Alert Policy1","df_title":"Title: xxxxx-Monitor (M, Single) {\"tag\":\"fake-data-1\"}","df_workspace_declaration":"{\"b\":[\"asfawfgajfasfafgafwba\",\"asfgahjfaf\"],\"business\":\"aaa\",\"organization\":\"64fe7b4062f74d0007b46676\"}"},"measurement":"keyevent","tags":{"df_crontab_exec_mode":"crontab","df_event_id":"event-xxxxx","df_fault_id":"event-xxxxx","df_fault_status":"fault","df_label":"[\"xxxxx_test\"]","df_language":"en","df_monitor_checker":"custom_metric","df_monitor_checker_event_ref":"xxxxx","df_monitor_checker_id":"rul_xxxxx","df_monitor_checker_ref":"xxxxx","df_monitor_checker_sub":"check","df_monitor_checker_type":"monitor","df_monitor_id":"altpl_xxxxx","df_monitor_type":"custom","df_site_name":"Test Environment","df_source":"monitor","df_status":"critical","df_sub_status":"critical","df_workspace_name":"[Doris] Development Testing Together_","df_workspace_uuid":"wksp_xxxxx","tag":"fake-data-1"},"timestamp":1711914120}
[04-01 03:43:00] [+14ms] [243ms] [Internal DataWay] --> Response: [200 OK] ""
[04-01 03:43:00] [+0ms] [243ms] [Studio] Buffer events that need to notify Studio
[04-01 03:43:00] [+6ms] [249ms] [Studio] --> Event: Title: xxxxx-Monitor (M, Single) {"tag":"fake-data-1"} (wksp_xxxxx/event-xxxxx)

# Notify Guance, TrueWatch Studio with generated events as per user configuration for tracking
# (The monitor only notifies here; specific incident handling is implemented by Guance, TrueWatch Studio)
[04-01 03:43:00] [+0ms] [249ms] [Studio] Buffer events that need to notify Studio
[04-01 03:43:00] [+3ms] [252ms] [Studio] --> Event: Title: Yiling-Zhou-Monitor (M, Single) {"tag":"fake-data-1"} (wksp_xxxxx/event-xxxxx)

# Number of events generated during this detection
[04-01 03:43:00] [+1ms] [253ms] This detection generates 1 monitor event
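
The numeric fields of the written event (see the "data example" entry in the log above) can be cross-checked against each other. The sketch below uses only values from that entry. Two points are assumptions rather than documented behaviour: that `df_fault_duration` equals `df_check_range_end - df_fault_start_time`, and that the log timestamps are shown in UTC+8.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the event written to /v1/write/keyevent in the log.
fields = {
    "df_check_range_start": 1711913220,
    "df_check_range_end": 1711914120,
    "df_date_range": 900,
    "df_fault_start_time": 1711454280,
    "df_fault_duration": 459840,
}

check_window = fields["df_check_range_end"] - fields["df_check_range_start"]
fault_duration = fields["df_check_range_end"] - fields["df_fault_start_time"]

# Assumed display timezone of the log (UTC+8).
log_tz = timezone(timedelta(hours=8))
range_end_local = datetime.fromtimestamp(fields["df_check_range_end"], tz=log_tz)

print(check_window == fields["df_date_range"])         # True
print(fault_duration == fields["df_fault_duration"])   # True
print(range_end_local.strftime("%Y-%m-%d %H:%M:%S"))   # 2024-04-01 03:42:00
```

The end of the check range, 03:42:00, is also the end of the `Time range` used by the detection query earlier in the log.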

Log Interpretation
# Counting 'task scheduling' in Guance and TrueWatch
[2024-03-06 20:58:05.084] [+0ms] [Usage Quota] Data query range is 1 minute, not exceeding 15 minutes, no additional measurement needed
[2024-03-06 20:58:05.084] [+0ms] [Usage Quota] `workspace_uuid` parameter exists, value is wksp_xxxxx, need to measure once

# Current workspace information
[2024-03-06 20:58:05.086] [+1ms] [Studio] Workspace information (from cache): {"declaration":{"test":["value1","value2"],"test2":"value3","test3":"value4"},"isJobDisabled":false,"isSMSDisabled":false,"language":"en","name":"[Doris] Development Testing Together_","token":"tkn_xxxxx"}

# ID and parameter list of the function executed in this task
[2024-03-06 20:58:05.088] [+1ms] [Function] Function call: guance__api_impl.custom_check
[2024-03-06 20:58:05.088] [+0ms] [Function] --> Parameter: `checker`=`"custom_metric"`
[2024-03-06 20:58:05.088] [+0ms] [Function] --> Parameter: `kwargs`=`{"version":"v2"}`
[2024-03-06 20:58:05.088] [+0ms] [Function] --> Parameter: `targets`=`[{"alias":"Result","dql":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","queryType":"dql","range":60}]`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `channels`=`["chan_xxxxx"]`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `extra_data`=`{"type":"simpleCheck"}`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `checker_opt`=`{"id":"rul_xxxxx","infoEvent":false,"label":["xxxxx_test"],"message":"Content: xxxxx-Monitor (Single) {df_dimension_tags}\nSecond line\nThird line","name":"Title: xxxxx-Monitor (Single) {df_dimension_tags}","noDataAction":"noData","noDataInterval":120,"noDataMessage":"","noDataTitle":"","recoverInterval":120,"rules":[{"conditionLogic":"and","conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}],"title":"Title: xxxxx-Monitor (Single) {df_dimension_tags}"}`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `monitor_opt`=`{"id":"monitor_xxxxx","name":"default"}`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `workspace_uuid`=`"wksp_xxxxx"`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `workspace_token`=`"tkn_xxxxx"`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `disable_check_end_time`=`false`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `at_accounts`=`null`
[2024-03-06 20:58:05.089] [+0ms] [Function] --> Parameter: `at_accounts_nodata`=`null`

# Monitor frequency configuration
[2024-03-06 20:58:05.092] [+2ms] [Monitor] Calculating detection interval based on actual Crontab (*/1 * * * *)
[2024-03-06 20:58:05.092] [+0ms] [Monitor] --> This trigger time: 2024-03-06 20:57:00
[2024-03-06 20:58:05.092] [+0ms] [Monitor] --> Last trigger time: 2024-03-06 20:56:00
[2024-03-06 20:58:05.092] [+0ms] [Monitor] --> Detection interval: 60 seconds

# Query recent data within the user-configured no-data range and two previous time ranges (2 DQL queries)
[2024-03-06 20:58:05.092] [+0ms] [Monitor] ----------------- Loading Gap / New Object Information ------------------
[2024-03-06 20:58:05.092] [+0ms] [Monitor] No-data range configured: 120 seconds
[2024-03-06 20:58:05.092] [+0ms] [Monitor] Query last round data
[2024-03-06 20:58:05.092] [+0ms] [KODO] Executing DQL query
[2024-03-06 20:58:05.092] [+0ms] [KODO] --> Maximum pages: 20 pages
[2024-03-06 20:58:05.093] [+0ms] [KODO] --> Time range: 2024-03-06 20:51:00 ~ 2024-03-06 20:55:00
[2024-03-06 20:58:05.093] [+0ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[2024-03-06 20:58:05.093] [+0ms] [KODO] Calling KODO API
[2024-03-06 20:58:05.093] [+0ms] [KODO] >> Request: POST /v1/query
[2024-03-06 20:58:05.093] [+0ms] [KODO] >>>> Body: {"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729460000,1709729700000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.093] [+0ms] [KODO] >> First request
[2024-03-06 20:58:05.111] [+18ms] [KODO] >> Response: `200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"4.172766ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729699000,56.4206008583691]]}],"window":0}]}`
[2024-03-06 20:58:05.113] [+1ms] [Studio] Calling Studio Inner API
[2024-03-06 20:58:05.113] [+0ms] [Studio] >> Request: GET /api/v1/inner/metrics_units
[2024-03-06 20:58:05.113] [+0ms] [Studio] >>>> Query: {"metrics":"fake_data_for_test","workspaceUUID":"wksp_xxxxx"}`
[2024-03-06 20:58:05.113] [+0ms] [Studio] >> First request
[2024-03-06 20:58:05.123] [+9ms] [Studio] >> Response: `200 OK` => `{"code":200,"content":{},"errorCode":"","message":"","success":true,"traceId":"TRACE-3645139A-BC9D-48E7-A17F-3CC93C51E650"}`
[2024-03-06 20:58:05.124] [+1ms] [Studio] Metric unit (from API): wksp_xxxxx/fake_data_for_test => `{"_DFF_CACHE_EXPIRE_TIME":1709730065}`
[2024-03-06 20:58:05.125] [+0ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1709729460000,1709729700000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:54:59Z",56.4206008583691]]}]}
[2024-03-06 20:58:05.125] [+0ms] [Monitor] Query this round data
[2024-03-06 20:58:05.125] [+0ms] [KODO] Executing DQL query
[2024-03-06 20:58:05.125] [+0ms] [KODO] --> Maximum pages: 20 pages
[2024-03-06 20:58:05.125] [+0ms] [KODO] --> Time range: 2024-03-06 20:55:00 ~ 2024-03-06 20:57:00
[2024-03-06 20:58:05.125] [+0ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[2024-03-06 20:58:05.125] [+0ms] [KODO] Calling KODO API
[2024-03-06 20:58:05.125] [+0ms] [KODO] >> Request: POST /v1/query
[2024-03-06 20:58:05.125] [+0ms] [KODO] >>>> Body: {"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729700000,1709729820000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.125] [+0ms] [KODO] >> First request
[2024-03-06 20:58:05.144] [+18ms] [KODO] >> Response: `200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"4.405232ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729819000,53.91111111111111]]}],"window":0}]}`
[2024-03-06 20:58:05.146] [+1ms] [Studio] Metric unit (from cache): wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1709730065}
[2024-03-06 20:58:05.146] [+0ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1709729700000,1709729820000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:56:59Z",53.91111111111111]]}]}

# Compare the data queried for the two time ranges to determine whether data has stopped arriving (data gap) or has resumed, and generate the corresponding [No-data Event] or [No-data Recovery Event]
[2024-03-06 20:58:05.146] [+0ms] [Monitor] ----------------- Gap / New Object Load Results ------------------
[2024-03-06 20:58:05.147] [+0ms] [Monitor] --> Last round existing objects: {"tag":"fake-data-1"}
[2024-03-06 20:58:05.147] [+0ms] [Monitor] --> This round existing objects: {"tag":"fake-data-1"}
[2024-03-06 20:58:05.147] [+0ms] [Monitor] ----> Data gap objects (Last round exists -> This round does not exist): None
[2024-03-06 20:58:05.147] [+0ms] [Monitor] --------------------- Determine Data Gap ---------------------
[2024-03-06 20:58:05.147] [+0ms] [Monitor] --> No data gap objects
[2024-03-06 20:58:05.147] [+0ms] [Monitor] ------------------- Determine Data Recovery from Gap --------------------
[2024-03-06 20:58:05.147] [+0ms] [Monitor] --> Object: {"tag":"fake-data-1"}
[2024-03-06 20:58:05.151] [+4ms] [Monitor] Fault cycle information (fault_info) for object {"tag":"fake-data-1"}: {"date":1709729160,"faultDuration":720,"faultId":"event-xxxxx","faultStartTime":1709728440,"status":"ok"}
[2024-03-06 20:58:05.151] [+0ms] [Monitor] ----> Last no-data event was a no-data recovery, no active no-data event exists
[2024-03-06 20:58:05.151] [+0ms] [Monitor] ----> No active no-data event, no need to generate a no-data recovery event

# Determine whether to generate [Alert Event] based on user-configured detection rules
[2024-03-06 20:58:05.151] [+0ms] [Monitor] -------------------- Execute Data Value Detection --------------------
[2024-03-06 20:58:05.151] [+0ms] [Monitor] Query pending detection data
[2024-03-06 20:58:05.151] [+0ms] [KODO] Executing DQL query
[2024-03-06 20:58:05.151] [+0ms] [KODO] --> Maximum pages: 20 pages
[2024-03-06 20:58:05.152] [+0ms] [KODO] --> Time range: 2024-03-06 20:56:00 ~ 2024-03-06 20:57:00
[2024-03-06 20:58:05.152] [+0ms] [KODO] --> Page 1 (soffset = 0 ~ 500)
[2024-03-06 20:58:05.152] [+0ms] [KODO] Calling KODO API
[2024-03-06 20:58:05.152] [+0ms] [KODO] >> Request: POST /v1/query
[2024-03-06 20:58:05.152] [+0ms] [KODO] >>>> Body: {"echo_explain":false,"queries":[{"mask_visible":true,"qtype":"dql","query":"M::`fake_data_for_test`:(avg(`field_int`)) { `tag` = 'fake-data-1' } BY `tag`","slimit":500,"soffset":0,"time_range":[1709729760000,1709729820000]}],"workspace_uuid":"wksp_xxxxx"}
[2024-03-06 20:58:05.152] [+0ms] [KODO] >> First request
[2024-03-06 20:58:05.162] [+9ms] [KODO] >> Response: `200 OK` => `{"content":[{"async_id":"","complete":false,"cost":"3.042875ms","group_by":["tag"],"index_name":"","index_names":"","index_store_type":"","interval":0,"is_running":false,"next_cursor_time":-1,"points":null,"query_parse":{"fields":{"avg(field_int)":"field_int"},"funcs":{"avg(field_int)":["avg"]},"namespace":"metric","sources":{"fake_data_for_test":"exact"}},"query_type":"guancedb","sample":1,"scan_completed":false,"scan_index":"","series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[[1709729819000,54.90555555555556]]}],"window":0}]}`
[2024-03-06 20:58:05.163] [+1ms] [Studio] Metric unit (from cache): wksp_xxxxx/fake_data_for_test => {"_DFF_CACHE_EXPIRE_TIME":1709730065}
[2024-03-06 20:58:05.163] [+0ms] [KODO] --> Unpacking DQL result data: {"metric_units":{"field_int":null},"query_time_range":[1709729760000,1709729820000],"series":[{"columns":["time","avg(field_int)"],"name":"fake_data_for_test","tags":{"tag":"fake-data-1"},"values":[["2024-03-06T12:56:59Z",54.90555555555556]]}]}

# Sequentially iterate through all detection objects and execute detections
[2024-03-06 20:58:05.164] [+0ms] [Monitor] Detection objects: Total 1
[2024-03-06 20:58:05.164] [+0ms] [Monitor] [Detection Object 1/1] {"tag":"fake-data-1"}
[2024-03-06 20:58:05.165] [+1ms] [General Threshold Detection] Pending detection data: {'Result': [54.90555555555556]}

# Sequentially iterate through all configured rules to determine which detection rule matches
[2024-03-06 20:58:05.166] [+0ms] [General Threshold Detection] Threshold rules: Total 1
[2024-03-06 20:58:05.166] [+0ms] [General Threshold Detection] [Threshold Rule 1/1] critical: Result >= ['0']
[2024-03-06 20:58:05.166] [+0ms] [Condition Check] [Condition 1/1] IF Result (ANY[54.90555555555556]) >= ["0"]
[2024-03-06 20:58:05.166] [+0ms] [Condition Check] --> Intermediate result is True, condition relation is AND, continue
[2024-03-06 20:58:05.166] [+0ms] [General Threshold Detection] --> Matched successfully, end check
[2024-03-06 20:58:05.166] [+0ms] [General Threshold Detection] Threshold rule match result: {"check_data":{"Result":54.90555555555556},"conditions":[{"alias":"Result","operands":["0"],"operator":">="}],"status":"critical"}
[2024-03-06 20:58:05.166] [+0ms] [Monitor] --> Detection object: {"tag":"fake-data-1"}: has reached fault conditions

# Call Guance/TrueWatch Studio to get the alert policies configured for this monitor
[2024-03-06 20:58:05.169] [+2ms] [Studio] Alert information cache disabled
[2024-03-06 20:58:05.169] [+0ms] [Studio] Calling Studio Inner API
[2024-03-06 20:58:05.169] [+0ms] [Studio] >> Request: GET /api/v1/inner/alert_opt/get
[2024-03-06 20:58:05.169] [+0ms] [Studio] >>>> Query: {"checkerUUID":"rul_xxxxx","workspaceUUID":"wksp_xxxxx"}
[2024-03-06 20:58:05.169] [+0ms] [Studio] >> First request
[2024-03-06 20:58:05.177] [+7ms] [Studio] >> Response: `200 OK` => `{"code":200,"content":{"data":{"alertPolicies":[{"aggClusterFields":[],"aggFields":[],"aggInterval":0,"aggLabels":[],"id":8222,"minInterval":900,"name":"xxxxx-Alert Policy1","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"Custom Notification Config1","targets":[{"name":"xxxxx-Alert Policy2-Rule1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]},{"targets":[{"name":"xxxxx-Alert Policy2-Rule2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"}],"silent":[]}},"errorCode":"","message":"","success":true,"traceId":"TRACE-XXXXX"}`
[2024-03-06 20:58:05.178] [+1ms] [Studio] Alert configuration (from API): rul_xxxxx => `{"_DFF_CACHE_EXPIRE_TIME":1709730065,"alertPolicies":[{"aggClusterFields":[],"aggFields":[],"aggInterval":0,"aggLabels":[],"id":8222,"minInterval":900,"name":"xxxxx-Alert Policy1","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"Custom Notification Config1","targets":[{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]},{"targets":[{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"},{"name":"xxxxx-Alert Policy1-Rule2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"},{"aggClusterFields":["df_title"],"aggFields":["CLUSTER"],"aggInterval":60,"aggLabels":[],"id":8223,"minInterval":900,"name":"xxxxx-Alert Policy2","ruleTimezone":"Asia/Shanghai","rules":[{"crontab":"00 09 * * *","crontabDuration":39600,"name":"Custom Notification Config1","targets":[{"name":"xxxxx-Alert Policy2-Rule1","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]},{"targets":[{"name":"xxxxx-Alert Policy2-Rule2","status":"critical","type":"dingTalkRobot","webhook":"https://oapi.dingtalk.com/robot/send?access_token=xxxxx"},{"name":"xxxxx-WeChat","status":"critical","type":"wechatRobot","webhook":"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx"}]}],"status":0,"uuid":"altpl_xxxxx","workspaceUUID":"wksp_xxxxx"}],"silent":[]}`
[2024-03-06 20:58:05.180] [+1ms] [Studio] Constant configuration (from cache): envName => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":"Test Environment"}
[2024-03-06 20:58:05.183] [+2ms] [Studio] Constant configuration (from cache): UsePublicAlertLink => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":false}
[2024-03-06 20:58:05.184] [+1ms] [Studio] Constant configuration (from cache): consoleBaseURL => {"_DFF_CACHE_EXPIRE_TIME":1709730060,"value":"http://testing.domain.com"}

# Render event titles/content based on user-configured alert templates and event data
[2024-03-06 20:58:05.186] [+1ms] [Text Renderer] Rendering template:
Content: xxxxx-Monitor (Single) {df_dimension_tags}
Second line
Third line
[2024-03-06 20:58:05.187] [+1ms] [Text Renderer] --> Rendering successful. Output:
Content: xxxxx-Monitor (Single) {"tag":"fake-data-1"}
Second line
Third line
[2024-03-06 20:58:05.191] [+4ms] [Text Renderer] Rendering template:
Title: xxxxx-Monitor (Single) {df_dimension_tags}
[2024-03-06 20:58:05.192] [+0ms] [Text Renderer] --> Rendering successful. Output:
Title: xxxxx-Monitor (Single) {"tag":"fake-data-1"}

# Sequentially iterate through events/mute rules to determine if each event should be muted
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] [Event 1/1] <critical event@monitor:{"tag":"fake-data-1"}: Title: xxxxx-Monitor (Single) {"tag":"fake-data-1"}>
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] No mute rules, no need to mute

# Sequentially iterate through alert policies to determine which alert policy/rule the event matches
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] Alert policies: Total 2
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] [Alert Policy 1/2] xxxxx-Alert Policy1 (8222)
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] --------------------- Send Event Alert ---------------------
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] Alert rules: Total 2
[2024-03-06 20:58:05.192] [+0ms] [Event Alarm] [Alert Rule 1/2] Looping by Crontab `00 09 * * *`, each loop lasts 39600 seconds
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] --> Repeat intervals configured but not within repeat interval range
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] --> Does not meet repeat interval alert, skip
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] [Alert Rule 2/2] Remaining other intervals
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] Successfully matched alert rule, need to alert

# Sequentially iterate through notification targets to check if they are in a mute period
# (All alert notification targets under the same alert policy/rule will align their mute periods)
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] Alert notification targets: Total 2
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] [Alert Notification Target 1/2] dingTalkRobot/xxxxx-Alert Policy1-Rule2 (critical)
[2024-03-06 20:58:05.193] [+0ms] [Event Alarm] Check event status match action: `critical` => critical
[2024-03-06 20:58:05.195] [+1ms] [Event Alarm] --> Last alert at 2024-03-06 20:46:00, mute for 900 seconds. Mute period ends at 2024-03-06 21:01:00 (240 seconds later)
[2024-03-06 20:58:05.195] [+0ms] [Event Alarm] ----> Currently in mute period, skip
[2024-03-06 20:58:05.195] [+0ms] [Event Alarm] [Alert Notification Target 2/2] wechatRobot/xxxxx-WeChat (critical)
[2024-03-06 20:58:05.195] [+0ms] [Event Alarm] Check event status match action: `critical` => critical
[2024-03-06 20:58:05.197] [+1ms] [Event Alarm] --> Last alert at 2024-03-06 20:46:00, mute for 900 seconds. Mute period ends at 2024-03-06 21:01:00 (240 seconds later)
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] ----> Currently in mute period, skip
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] [Alert Policy 2/2] xxxxx-Alert Policy2 (8223)
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] --------------------- Send Event Alert ---------------------
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] Alert rules: Total 2
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] [Alert Rule 1/2] Looping by Crontab `00 09 * * *`, each loop lasts 39600 seconds
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] --> Repeat intervals configured but not within repeat interval range
[2024-03-06 20:58:05.197] [+0ms] [Event Alarm] --> Does not meet repeat interval alert, skip
[2024-03-06 20:58:05.198] [+0ms] [Event Alarm] [Alert Rule 2/2] Remaining other intervals
[2024-03-06 20:58:05.198] [+0ms] [Event Alarm] Successfully matched alert rule, need to alert
[2024-03-06 20:58:05.198] [+0ms] [Event Alarm] Alert notification targets: Total 2
[2024-03-06 20:58:05.198] [+0ms] [Event Alarm] [Alert Notification Target 1/2] dingTalkRobot/xxxxx-Alert Policy2-Rule2 (critical)
[2024-03-06 20:58:05.198] [+0ms] [Event Alarm] Check event status match action: `critical` => critical
[2024-03-06 20:58:05.199] [+1ms] [Event Alarm] --> Last alert at 2024-03-06 20:46:00, mute for 900 seconds. Mute period ends at 2024-03-06 21:01:00 (240 seconds later)
[2024-03-06 20:58:05.199] [+0ms] [Event Alarm] ----> Currently in mute period, skip
[2024-03-06 20:58:05.200] [+0ms] [Event Alarm] [Alert Notification Target 2/2] wechatRobot/xxxxx-WeChat (critical)
[2024-03-06 20:58:05.200] [+0ms] [Event Alarm] Check event status match action: `critical` => critical
[2024-03-06 20:58:05.201] [+1ms] [Event Alarm] --> Last alert at 2024-03-06 20:46:00, mute for 900 seconds. Mute period ends at 2024-03-06 21:01:00 (240 seconds later)
[2024-03-06 20:58:05.201] [+0ms] [Event Alarm] ----> Currently in mute period, skip

# Cache generated events for use by the next monitor task
[2024-03-06 20:58:05.204] [+2ms] [Internal DataWay] Cache events
[2024-03-06 20:58:05.204] [+0ms] [Internal DataWay] Cache fault information
[2024-03-06 20:58:05.204] [+0ms] [Internal DataWay] --> Create cache: key=`rul_xxxxx-check`, field=`{"tag":"fake-data-1"}`

# Write events to Guance/TrueWatch
[2024-03-06 20:58:05.206] [+2ms] [Internal DataWay] Write events
[2024-03-06 20:58:05.208] [+1ms] [Internal DataWay] Line protocol write data
[2024-03-06 20:58:05.208] [+0ms] [Internal DataWay] --> Workspace TOKEN: `tkn_xxxxx`
[2024-03-06 20:58:05.208] [+0ms] [Internal DataWay] --> Request: POST /v1/write/keyevent
[2024-03-06 20:58:05.208] [+0ms] [Internal DataWay] --> First 1/1 data example: [{"fields":{"df_alert_policy_ids":["altpl_xxxxx","altpl_xxxxx"],"df_alert_policy_names":["xxxxx-Alert Policy1","xxxxx-Alert Policy2"],"df_at_accounts":"[]","df_at_accounts_nodata":"[]","df_channels":"[\"chan_xxxxx\"]","df_check_range_end":1709729820,"df_check_range_start":1709729760,"df_date_range":60,"df_dimension_tags":"{\"tag\":\"fake-data-1\"}","df_event_reason":"Meets the conditions for recognizing faults in the monitor, generating a fault event","df_fault_duration":2880,"df_fault_start_time":1709726940,"df_issue_duration":2880,"df_issue_start_time":1709726940,"df_matched_alert_policy_rules":["xxxxx-Alert Policy1 / -","xxxxx-Alert Policy2 / -"],"df_message":"Content: xxxxx-Monitor (Single) {\"tag\":\"fake-data-1\"}\nSecond line\nThird line","df_meta":"{\"alert_info\":{\"matchedAlertPolicyRules\":[{\"aggClusterFields\":[],\"aggFields\":[],\"aggInterval\":0,\"aggLabels\":[],\"id\":8222,\"minInterval\":900,\"name\":\"xxxxx-Alert Policy1\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-Alert Policy1-Rule2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-WeChat\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"},{\"aggClusterFields\":[\"df_title\"],\"aggFields\":[\"CLUSTER\"],\"aggInterval\":60,\"aggLabels\":[],\"id\":8223,\"minInterval\":900,\"name\":\"xxxxx-Alert Policy2\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-Alert Policy2-Rule2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-WeChat\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"}],\"matchedSilentRule\":null,\"targets\":[{\"hasSecret\":false,i... <Length: 6784>
[2024-03-06 20:58:05.209] [+0ms] [Internal DataWay] --> First data line protocol example: `keyevent,df_crontab_exec_mode=crontab,df_event_id=event-xxxxx,df_fault_id=event-xxxxx,df_fault_status=fault,df_label=["xxxxx_test"],df_language=en,df_monitor_checker=custom_metric,df_monitor_checker_event_ref=xxxxx,df_monitor_checker_id=rul_xxxxx,df_monitor_checker_ref=xxxxx,df_monitor_checker_sub=check,df_monitor_checker_type=monitor,df_monitor_id=altpl_xxxxx;altpl_xxxxx,df_monitor_type=custom,df_site_name=Test Environment,df_source=monitor,df_status=critical,df_sub_status=critical,df_workspace_name=[Doris] Development Testing Together_,df_workspace_uuid=wksp_xxxxx,tag=fake-data-1 df_alert_policy_ids=["altpl_xxxxx","altpl_xxxxx"],df_alert_policy_names=["xxxxx-Alert Policy1","xxxxx-Alert Policy2"],df_at_accounts="[]",df_at_accounts_nodata="[]",df_channels="[\"chan_xxxxx\"]",df_check_range_end=1709729820i,df_check_range_start=1709729760i,df_date_range=60i,df_dimension_tags="{\"tag\":\"fake-data-1\"}",df_event_reason="Meets the conditions for recognizing faults in the monitor, generating a fault event",df_fault_duration=2880i,df_fault_start_time=1709726940i,df_issue_duration=2880i,df_issue_start_time=1709726940i,df_matched_alert_policy_rules=["xxxxx-Alert Policy1 / -","xxxxx-Alert Policy2 / -"],df_message="Content: xxxxx-Monitor (Single) {\"tag\":\"fake-data-1\"}
Second line
Third line",df_meta="{\"alert_info\":{\"matchedAlertPolicyRules\":[{\"aggClusterFields\":[],\"aggFields\":[],\"aggInterval\":0,\"aggLabels\":[],\"id\":8222,\"minInterval\":900,\"name\":\"xxxxx-Alert Policy1\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-Alert Policy1-Rule2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-WeChat\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"},{\"aggClusterFields\":[\"df_title\"],\"aggFields\":[\"CLUSTER\"],\"aggInterval\":60,\"aggLabels\":[],\"id\":8223,\"minInterval\":900,\"name\":\"xxxxx-Alert Policy2\",\"rule\":{\"md5\":\"xxxxx\",\"seq\":2,\"targets\":[{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-Alert Policy2-Rule2\",\"status\":\"critical\",\"type\":\"dingTalkRobot\",\"webhook\":\"https://oapi.dingtalk.com/robot/send?access_token=xxxxx\"},{\"hasSecret\":false,\"ignoreReason\":\"Currently in mute period. Last alert at 2024-03-06 20:46:00 (mute 900 seconds), mute period ends at 2024-03-06 21:01:00 (240 seconds later)\",\"isIgnored\":true,\"name\":\"xxxxx-WeChat\",\"status\":\"critical\",\"type\":\"wechatRobot\",\"webhook\":\"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxx\"}]},\"ruleTimezone\":\"Asia/Shanghai\",\"status\":0,\"uuid\":\"altpl_xxxxx\",\"workspaceUUID\":\"wksp_xxxxx\"}],\"matchedSilentRule\":null,\"targets\":[{\"hasSecret\":false,... <Length: 6616>
[2024-03-06 20:58:05.216] [+6ms] [Internal DataWay] --> Response result: `200 OK`
[2024-03-06 20:58:05.216] [+0ms] [Internal DataWay] --> Response content:""

# Notify Guance/TrueWatch Studio of the generated events, as configured by the user, for incident tracking
# (The monitor only sends the notification here; the incident handling itself is implemented by Guance/TrueWatch Studio)
[2024-03-06 20:58:05.216] [+0ms] [Studio] Buffer events that need to notify Studio
[2024-03-06 20:58:05.220] [+4ms] [Studio] --> Event: `{"df_at_accounts":[],"df_at_accounts_nodata":[],"df_channels":["chan_xxxxx"],"df_check_range_end":1709729820,"df_check_range_start":1709729760,"df_crontab_exec_mode":"crontab","df_date_range":60,"df_dimension_tags":"{\"tag\":\"fake-data-1\"}","df_event_id":"event-xxxxx","df_fault_duration":2880,"df_fault_id":"event-xxxxx","df_fault_start_time":1709726940,"df_fault_status":"fault","df_label":"[\"xxxxx_test\"]","df_message":"Content: xxxxx-Monitor (Single) {\"tag\":\"fake-data-1\"} \nSecond line \nThird line","df_monitor_checker":"custom_metric","df_monitor_checker_event_ref":"xxxxx","df_monitor_checker_id":"rul_xxxxx","df_monitor_checker_name":"Title: xxxxx-Monitor (Single) {df_dimension_tags}","df_monitor_checker_ref":"xxxxx","df_monitor_checker_sub":"check","df_monitor_checker_type":"monitor","df_monitor_checker_value":"54.90555555555556","df_monitor_id":"altpl_xxxxx;altpl_xxxxx","df_monitor_name":"xxxxx-Alert Policy1;xxxxx-Alert Policy2","df_monitor_type":"custom","df_site_name":"Test Environment","df_source":"monitor","df_status":"critical","df_sub_status":"critical","df_title":"Title: xxxxx-Monitor (Single) {\"tag\":\"fake-data-1\"}","df_workspace_uuid":"wksp_xxxxx","timestamp":1709729820}`

# Number of events generated during this detection
[2024-03-06 20:58:05.221] [+1ms] This detection generates 1 monitor event
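
The alert-routing decisions in the log above (alert rule 1 skipped, every notification target muted) come down to simple time arithmetic. The following Python sketch reproduces them from values that appear in the log. It is not the monitor's actual implementation: the helper names `in_daily_window` and `mute_remaining` are hypothetical, the crontab `00 09 * * *` is hard-coded as a daily 09:00 start instead of being parsed, and the reference time is assumed to be the end of the detection range (`df_check_range_end`, 20:57:00) rather than the wall-clock time of the log line, since that assumption is what yields the logged 240 seconds.

```python
from datetime import datetime, timedelta, timezone

# "ruleTimezone" is Asia/Shanghai, which is UTC+8 with no daylight saving time.
TZ = timezone(timedelta(hours=8))


def in_daily_window(now, start_hour, start_minute, duration_s):
    """Return True if `now` lies in a window that opens every day at
    start_hour:start_minute and stays open for duration_s seconds."""
    start_today = now.replace(
        hour=start_hour, minute=start_minute, second=0, microsecond=0
    )
    # Also check yesterday's window, because a long window may cross midnight.
    for start in (start_today, start_today - timedelta(days=1)):
        if start <= now < start + timedelta(seconds=duration_s):
            return True
    return False


def mute_remaining(now, last_alert, min_interval_s):
    """Seconds left in the mute period, or 0 if it has already ended."""
    mute_end = last_alert + timedelta(seconds=min_interval_s)
    return max(0, int((mute_end - now).total_seconds()))


# Values copied from the log. Using df_check_range_end as "now" is an assumption.
now = datetime.fromtimestamp(1709729820, tz=TZ)  # 2024-03-06 20:57:00 (UTC+8)
last_alert = datetime(2024, 3, 6, 20, 46, 0, tzinfo=TZ)

# Alert rule 1: crontab `00 09 * * *`, each loop lasts 39600 seconds (11 hours).
rule1_active = in_daily_window(now, 9, 0, 39600)

# Alert rule 2 targets: minInterval = 900 seconds since the last alert.
remaining = mute_remaining(now, last_alert, 900)

print(rule1_active)  # False: the window 09:00-20:00 has closed by 20:57
print(remaining)     # 240: mute period ends at 21:01:00
```

Under these assumptions, rule 1 is skipped because its window closed at 20:00, the event falls through to rule 2 ("Remaining other intervals"), and both targets are skipped because 240 seconds of the 900-second mute period remain.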