
Memory Leak Inspection

Background

"Memory Leak" inspection is built on the memory anomaly analysis detector and periodically runs intelligent inspections on hosts. For each host with a memory anomaly, it performs root cause analysis to identify the processes and Pods associated with the anomalous time points, and determines whether hosts in the current workspace have memory leak issues.

Prerequisites

  1. Offline deployment of self-built DataFlux Func
  2. Enable the Script Market of the self-built DataFlux Func (../script-market-basic-usage/)
  3. In "TrueWatch", create an API Key for operations under "Management / API Key Management" (https://docs.guance.com/management/api-key/open-api/)
  4. In the self-built DataFlux Func, install the "Self-Built Inspection Core Package", "Algorithm Library", and "Self-Built Inspection (Memory Leak)" through the "Script Market"
  5. In the self-built DataFlux Func, write a processing function for self-built inspections
  6. In the self-built DataFlux Func, create a scheduled task (Old version: Automatic Trigger Configuration) for the written function via "Management / Scheduled Tasks (Old version: Automatic Trigger Configuration)"

If you deploy DataFlux Func offline on a cloud server, make sure it runs on the same cloud provider and in the same region as the TrueWatch SaaS you are currently using.

Configure Inspection

In the self-built DataFlux Func, create a new script set and add the following memory leak inspection configuration.

```python
from guance_monitor__register import self_hosted_monitor
from guance_monitor__runner import Runner
import guance_monitor_memory_leak__main as memory_leak_check

# Note: `DFF` is a global object provided by the DataFlux Func runtime,
# so it does not need to be imported.

# Account configuration: replace the placeholders with your own API Key ID and API Key
API_KEY_ID  = 'wsak_xxx'
API_KEY     = 'wsak_xxx'

# The `filters` parameter of the function takes priority over the detection
# settings in Studio "Monitoring / Intelligent Inspection". If `filters` is
# configured here, there is no need to change those settings; if both are
# configured, the `filters` parameter in this script takes precedence.

def filter_host(host):
    '''
    Filter hosts: define the conditions a host must match.
    Return True if the host matches, otherwise return False.
    '''
    return host in ['iZuf6aq9gu32lpgvx8ynhbZ']

# Task configuration parameters should use:
#   @DFF.API('Memory Leak Inspection', fixed_crontab='0 * * * *', timeout=900)
#
#   fixed_crontab: fixed execution frequency, "once per hour" (minute 0 of every hour)
#   timeout: task execution timeout in seconds; keep it within 15 minutes (900 s)

@self_hosted_monitor(API_KEY_ID, API_KEY)
@DFF.API('Memory Leak Inspection', fixed_crontab='0 * * * *', timeout=900)
def run(configs={}):
    '''
    Parameters:
    configs : Hosts to be inspected (optional; if not configured, all hosts
              in the current workspace are inspected)

    Example:
        configs = {
            "hosts": ["localhost"]
        }
    '''
    checkers = [
        # Example only: adjust `configs` and `filters` to your environment
        memory_leak_check.MemoryLeakCheck(configs=configs, filters=[filter_host]),
    ]

    Runner(checkers, debug=False).run()
```
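`filter_host` can express any matching rule, as long as it takes a host name and returns True or False. The sketch below shows a few alternative rules you could pass in its place via `filters=[...]`; the function names, the `prod-` prefix, and the `web-NN` naming pattern are hypothetical examples, not part of the inspection package.

```python
import re

# Hypothetical alternative filters. Each takes a host name and returns a bool,
# following the True/False contract described in filter_host's docstring.

# Exact-match allowlist (a set gives fast membership checks).
INSPECTED_HOSTS = {"iZuf6aq9gu32lpgvx8ynhbZ"}

def filter_host_exact(host):
    """Match only hosts listed in INSPECTED_HOSTS."""
    return host in INSPECTED_HOSTS

def filter_host_prefix(host):
    """Match hosts by a naming prefix (hypothetical naming scheme)."""
    return host.startswith("prod-")

# Whole-name pattern, e.g. "web-01", "web-02", ... (hypothetical naming scheme).
WEB_HOST_PATTERN = re.compile(r"web-\d{2}")

def filter_host_regex(host):
    """Match hosts whose full name fits WEB_HOST_PATTERN."""
    return WEB_HOST_PATTERN.fullmatch(host) is not None
```

For example, `filter_host_regex("web-07")` matches while `filter_host_regex("web-7")` does not. Returning an explicit boolean in every branch avoids relying on an implicit `None` being treated as "no match".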

Start Inspection

After configuring the inspection in DataFlux Func, you can test it by selecting the run() method directly on the page and running it. After you publish the script, you can view and configure its scheduled task in DataFlux Func's "Management / Scheduled Tasks".

View Events

This inspection scans memory usage information for the last 6 hours. When an abnormal condition occurs, the intelligent inspection generates corresponding events. These events can be viewed in the "Event Center" after generation.
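The detection algorithm in the Algorithm Library is not described on this page. As a rough illustration of what "abnormal" can mean for a leak, the sketch below flags a host whose memory usage rises steadily across the 6-hour window, using a least-squares trend line. The thresholds (`min_slope_per_hour`, `min_r2`) and the sample format `(timestamp_seconds, usage_percent)` are assumptions for this example only.

```python
from typing import Sequence, Tuple

WINDOW_SECONDS = 6 * 3600  # the 6-hour scan window described above

def linear_fit(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares slope (units per second) and R^2 for (t, y) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    if s_tt == 0:
        return 0.0, 0.0
    slope = s_ty / s_tt
    # For a simple linear fit, R^2 equals the squared correlation coefficient.
    r2 = (s_ty * s_ty) / (s_tt * s_yy) if s_yy > 0 else 0.0
    return slope, r2

def looks_like_leak(samples, now, min_slope_per_hour=1.0, min_r2=0.8):
    """Heuristic (assumed thresholds): steady upward trend of usage % in the window."""
    window = [(t, y) for t, y in samples if now - WINDOW_SECONDS <= t <= now]
    if len(window) < 3:
        return False
    slope, r2 = linear_fit(window)
    return slope * 3600 >= min_slope_per_hour and r2 >= min_r2
```

Requiring a high R² as well as a positive slope separates a steady climb, which is typical of a leak, from a single spike or normal fluctuation.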

Event Details

  • Event Overview: Describes the object and content of the abnormal inspection event.
  • Abnormal Details: Displays how the memory usage of the current abnormal host changed over the past 6 hours.
  • Abnormal Analysis: Shows the Top 10 processes (or Pods) consuming memory on the abnormal host.
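The "Top 10" list above is a ranking of memory consumers at the anomalous time point. The following is a minimal sketch of that ranking step, assuming per-process (or per-Pod) memory snapshots keyed by timestamp; the snapshot layout and the helper names are hypothetical and do not describe how the inspection package stores its data.

```python
import heapq
from typing import Dict, List, Tuple

def snapshot_at(snapshots: Dict[int, Dict[str, int]], anomaly_ts: int) -> Dict[str, int]:
    """Pick the snapshot whose timestamp is closest to the anomalous time point."""
    nearest = min(snapshots, key=lambda ts: abs(ts - anomaly_ts))
    return snapshots[nearest]

def top_memory_consumers(memory_by_name: Dict[str, int], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k processes (or Pods) using the most memory, largest first."""
    return heapq.nlargest(k, memory_by_name.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the full process list when only the top entries are needed, and it returns fewer than `k` items if fewer processes exist.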

Common Issues

1. How do I configure the detection frequency of the memory leak inspection?

  • In the self-built DataFlux Func, when writing the self-built inspection processing function, set fixed_crontab='0 * * * *', timeout=900 in the decorator, then configure the task in "Management / Scheduled Tasks (Old version: Automatic Trigger Configuration)".
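To see what `fixed_crontab='0 * * * *'` means in practice, the sketch below lists the next few fire times for that one pattern (minute 0 of every hour). It is deliberately not a general cron parser, and the constant names are made up for this example. It also checks that the 900-second timeout is shorter than the one-hour interval, which, assuming the timeout terminates the task, means a run that times out cannot overlap the next one.

```python
from datetime import datetime, timedelta

INTERVAL_SECONDS = 3600  # '0 * * * *' fires once per hour
TIMEOUT_SECONDS = 900    # timeout=900 from the decorator (15 minutes)

def next_hourly_runs(start: datetime, count: int = 3) -> list:
    """Return the next `count` fire times of '0 * * * *' strictly after `start`.

    Handles only this one pattern (minute 0 of every hour).
    """
    top_of_hour = start.replace(minute=0, second=0, microsecond=0)
    first = top_of_hour + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]

# A run that hits the timeout stops within 15 minutes, well before the next hourly run.
assert TIMEOUT_SECONDS < INTERVAL_SECONDS
```

For example, starting from 10:20 the next runs are 11:00, 12:00, and 13:00.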

2. Why is there no abnormal analysis when the memory leak inspection is triggered?

If the inspection report contains no abnormal analysis, check the data collection status of DataKit on the current host.
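One quick way to check collection status is to compare each host's most recent memory data point against the current time. The sketch below assumes you have already obtained those last-report timestamps (for example, from your own query of host memory metrics); the function name and the 10-minute threshold are hypothetical choices, not product defaults.

```python
from typing import Dict, Iterable, List

def stale_hosts(expected_hosts: Iterable[str],
                last_report: Dict[str, float],
                now: float,
                max_age_seconds: float = 600) -> List[str]:
    """Return hosts with no data at all, or whose latest data is older than max_age_seconds."""
    stale = []
    for host in expected_hosts:
        ts = last_report.get(host)
        if ts is None or now - ts > max_age_seconds:
            stale.append(host)
    return stale
```

Hosts returned here are the ones whose DataKit collection is worth checking first, since the inspection has little or no recent data to analyze for them.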

3. A script that previously ran normally now reports errors during inspection

Update the referenced script sets in the Script Market of DataFlux Func. You can check the update records of the Script Market in its Change Log so that you can update the scripts promptly.