
Server Application Error Inspection

Background

When the server encounters runtime errors, they need to be detected early and reported promptly so that developers and operations staff can troubleshoot them and quickly confirm whether the errors may affect the application. The server application error inspection event notifies developers and operations staff of new errors that occurred in the last hour, pinpoints where each error occurred, and provides related diagnostic clues.

Prerequisites

  1. Applications already integrated with TrueWatch "Application Performance Monitoring"
  2. Offline deployment of self-hosted DataFlux Func
  3. Enable the self-hosted DataFlux Func Script Market
  4. Create an API Key for performing operations under TrueWatch "Management / API Key Management"
  5. In the self-hosted DataFlux Func, install the "Self-built Inspection Core Package", "Algorithm Library", and "Self-built Inspection (APM Errors)" via the "Script Market"
  6. Write a custom inspection processing function in the self-hosted DataFlux Func
  7. In the self-hosted DataFlux Func, create scheduled tasks (Old version: Automatic Trigger Configuration) for the written functions through "Management / Scheduled Tasks (Old version: Automatic Trigger Configuration)"

If you plan to deploy DataFlux Func offline on a cloud server, make sure it is deployed with the same operator and in the same region as the TrueWatch SaaS you are currently using.

Configuring Inspection

Create a new script set in the self-hosted DataFlux Func to configure the server application error inspection.

Python
from guance_monitor__runner import Runner
from guance_monitor__register import self_hosted_monitor
import guance_monitor_apm_error__main as main

# Account Configuration
API_KEY_ID  = 'wsak_xxxxxx'
API_KEY     = '5K3Ixxxxxx'

# The function's filters parameter takes precedence over the detection settings in Studio "Monitoring / Smart Inspection".
# If filters are configured here, there is no need to change the detection settings in Studio; if both are configured, the filters parameter takes effect first.

def filter_project_servcie_sub(data):
    # Example inputs: {'project': None, 'service_sub': 'mysql:dev'}, {'project': None, 'service_sub': 'redis:dev'},
    # {'project': None, 'service_sub': 'ruoyi-gateway:dev'}, {'project': None, 'service_sub': 'ruoyi-modules-system:dev'}
    project = data['project']          # available if you want to filter by project
    service_sub = data['service_sub']
    # Return True to include this item in the inspection; return False to skip it
    if service_sub in ['ruoyi-08-auth:dev:1.0']:
        return True
    return False

'''
Task configuration parameters should be set as follows:
@DFF.API('Server Application Error Inspection', fixed_crontab='0 * * * *', timeout=1800)

fixed_crontab: fixed execution frequency, once per hour (at minute 0)
timeout: task execution timeout in seconds; 1800 keeps each run within 30 minutes
'''

# Server Application Error Inspection configuration (users do not need to modify this)
@self_hosted_monitor(API_KEY_ID, API_KEY)
@DFF.API('Server Application Error Inspection', fixed_crontab='0 * * * *', timeout=1800)
def run(configs={}):
    """
    Parameters:
        configs: List of app_names to be inspected (optional; default inspects all app_names)

        Example Configuration:
        configs = {
            "project": ["project", "project2"]   Projects
            "env"    : ["env1", "env2"]          Environments
            "version": ["version1", "version2"]  Versions
            "service": ["service1", "service2"]  Services
        }
    """
    checkers = [
        # APM Error Inspection
        main.ApmErrorCheck(configs=configs, filters=[filter_project_servcie_sub]),

    ]

    Runner(checkers, debug=False).run()
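Before publishing, a filter can be checked locally on sample inputs. The following is a minimal sketch, assuming the inspection keeps an item when the filter returns True; the `samples` list is hypothetical data shaped like the example inputs in the script's filter comment, and the list comprehension only imitates applying a filter. It is not the inspection package's internal logic.

```python
# Hypothetical sample items, shaped like the example inputs in the filter comment.
samples = [
    {'project': None, 'service_sub': 'mysql:dev'},
    {'project': None, 'service_sub': 'redis:dev'},
    {'project': None, 'service_sub': 'ruoyi-08-auth:dev:1.0'},
]


def filter_project_servcie_sub(data):
    # Same rule as the script: True only for the listed service_sub values.
    return data['service_sub'] in ['ruoyi-08-auth:dev:1.0']


# Assumption: items for which the filter returns True are the ones inspected.
kept = [item['service_sub'] for item in samples if filter_project_servcie_sub(item)]
print(kept)  # ['ruoyi-08-auth:dev:1.0']
```

Keeping the filter a pure function of `data`, with no side effects, is what makes this kind of local check possible.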

Enabling Inspection

After configuring the inspection in DataFlux Func, you can test it by selecting and running the run() method directly from the page. After publishing, you can view and configure the task in the DataFlux Func "Management / Scheduled Tasks".

Viewing Events

This inspection scans for new application errors in the past hour. Once a new type of error occurs, smart inspection generates corresponding events, which can then be viewed in the "Event Center".
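The idea of "new errors in the past hour" can be illustrated with a small sketch: treat an error as new when its signature appears within the last hour but never before it. This is an assumption about how novelty might be judged; the record layout, the (service, error type, message) signature, and the function name `new_error_types` are hypothetical and do not describe the inspection package's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def new_error_types(records, now, window=timedelta(hours=1)):
    """Return {signature: count} for signatures seen in [now - window, now) but not earlier.

    Each record is a hypothetical tuple: (timestamp, service, error_type, error_message).
    """
    start = now - window
    seen_before = set()   # signatures observed before the window
    recent = {}           # signature -> occurrences inside the window
    for ts, service, err_type, message in records:
        sig = (service, err_type, message)
        if ts < start:
            seen_before.add(sig)
        elif ts < now:
            recent[sig] = recent.get(sig, 0) + 1
    # Only signatures that never appeared before the window count as new.
    return {sig: n for sig, n in recent.items() if sig not in seen_before}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    (now - timedelta(hours=3), 'gateway', 'TimeoutError', 'upstream timed out'),
    (now - timedelta(minutes=30), 'gateway', 'TimeoutError', 'upstream timed out'),
    (now - timedelta(minutes=10), 'auth', 'KeyError', 'user_id missing'),
    (now - timedelta(minutes=5), 'auth', 'KeyError', 'user_id missing'),
]
print(new_error_types(records, now))
# {('auth', 'KeyError', 'user_id missing'): 2}
```

Here the gateway timeout is not reported because the same signature already occurred three hours earlier, while the auth error is reported once with its count.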

Event Details

  • Event Overview: Describes the object and content of the anomaly inspection event.
  • Error Distribution: View changes in the number of errors that occurred in the last hour for the current anomalous application.
  • Error Details: Displays detailed information about new errors and specific error counts for the anomalous application. You can click on specific error messages, error types, and error stacks to navigate to the error detail page.
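The Error Distribution view shows how the error count changes over the last hour. As a rough sketch of that kind of aggregation, assuming 10-minute buckets (the panel's actual bucket size is not stated here) and a hypothetical `error_distribution` helper, error timestamps can be counted per time bucket like this:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def error_distribution(timestamps, now, bucket=timedelta(minutes=10), window=timedelta(hours=1)):
    """Count errors per bucket over [now - window, now); returns counts oldest-first."""
    start = now - window
    n_buckets = window // bucket  # timedelta // timedelta -> int (6 here)
    counts = Counter(
        (ts - start) // bucket   # index of the bucket the error falls into
        for ts in timestamps
        if start <= ts < now     # ignore errors outside the last hour
    )
    return [counts.get(i, 0) for i in range(n_buckets)]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ts = [now - timedelta(minutes=m) for m in (55, 52, 30, 5, 1)]
print(error_distribution(ts, now))  # [2, 0, 0, 1, 0, 2]
```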

Common Issues

1. How do I configure the detection frequency for server application error inspection?

  • In the self-hosted DataFlux Func, when writing the custom inspection processing function, add fixed_crontab='0 * * * *', timeout=1800 to the decorator, then configure the task in "Management / Scheduled Tasks (Old version: Automatic Trigger Configuration)".
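`fixed_crontab='0 * * * *'` fires the task at minute 0 of every hour, and `timeout=1800` caps each run at 1800 seconds (30 minutes), so a run should finish well before the next one starts. The sketch below, using hypothetical helper names, computes the next trigger time and the one-hour window a run would cover, assuming the scan window ends at the trigger time.

```python
from datetime import datetime, timedelta, timezone

TIMEOUT_SECONDS = 1800  # value of timeout= in the decorator (30 minutes)


def next_hourly_run(now):
    """Next trigger strictly after `now` for the crontab '0 * * * *' (minute 0 of every hour)."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


def lookback_window(run_time, hours=1):
    """Assumed [start, end) interval scanned by a run triggered at run_time."""
    return run_time - timedelta(hours=hours), run_time


now = datetime(2024, 1, 1, 12, 17, 30, tzinfo=timezone.utc)
run = next_hourly_run(now)
start, end = lookback_window(run)
print(run.isoformat())  # 2024-01-01T13:00:00+00:00
print(start.isoformat(), end.isoformat())  # 2024-01-01T12:00:00+00:00 2024-01-01T13:00:00+00:00
print(timedelta(seconds=TIMEOUT_SECONDS) < timedelta(hours=1))  # True
```

Because consecutive windows share their boundaries, hourly runs with a one-hour lookback cover the timeline without gaps or overlaps.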

2. Why might there be no anomaly analysis when the server application error inspection is triggered?

If the inspection report contains no anomaly analysis, check the data collection status of the current DataKit.

3. Under what circumstances are server application error inspection events generated?

The server application error inspection scans for new application errors in the last hour. Once a new type of error occurs, the smart inspection generates a corresponding event.

4. What should I do if a previously working script starts reporting errors during inspection?

Update the referenced script set from the DataFlux Func Script Market. The Script Market's Change Log lists its update history, so you can see what changed and update your scripts promptly.