Configuration Manual for "Alibaba Cloud - Cloud Monitor" Collector

Before reading this article, please read the following first:

TrueWatch Integration

Before using this collector, you must install the 'Integration Core Package' and its associated third-party dependency packages.

This collector runs multi-threaded by default, with five threads. To change the thread pool size, set the environment variable COLLECTOR_THREAD_POOL_SIZE.

1. Configuration Structure

The configuration structure of this collector is as follows:

Field	Type	Required	Description
`targets`	list	Required	Configuration list for cloud monitoring collection objects. Multiple configurations with the same namespace are combined with logical 'AND'.
`targets[#].namespace`	str	Required	The cloud monitoring namespace to collect, for example `'acs_ecs_dashboard'`. Refer to the appendix for the full table.
`targets[#].metrics`	list	Required	List of cloud monitoring metric names to collect. Refer to the appendix for the full table.
`targets[#].metrics[#]`	str	Required	Metric name pattern, supporting `"NOT"` and wildcard matching. Normally, multiple entries are combined with logical 'OR'. When the `"NOT"` marker is included, multiple entries are combined with logical 'AND'. See the details below.

2. Configuration Example

Specifying Specific Metrics

Collect 2 metrics named CPUUtilization and concurrentConnections from ECS.

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['CPUUtilization', 'concurrentConnections'],
        },
    ],
}

Wildcard Matching Metrics

Metric names can use the * wildcard for matching.

In this example, the following metrics will be collected:

Metrics named CPUUtilization
Metrics starting with CPU
Metrics ending with Connections
Metrics containing Conn

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['CPUUtilization', 'CPU*', '*Connections', '*Conn*'],
        },
    ],
}
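The matching behavior described above can be approximated with a short sketch. It assumes shell-style wildcard semantics as provided by Python's standard `fnmatch` module; the collector's actual matching code is not shown in this manual, so `match_any` and the candidate metric names are illustrative only. Note that `fnmatch` also treats `?` and `[...]` as special characters, which the collector may not.

```python
from fnmatch import fnmatchcase

def match_any(metric_name, patterns):
    """Return True if the name matches at least one pattern (logical OR)."""
    return any(fnmatchcase(metric_name, p) for p in patterns)

patterns = ['CPUUtilization', 'CPU*', '*Connections', '*Conn*']

# Illustrative names only; not a list returned by the real API
candidates = ['CPUUtilization', 'CPUIdle', 'concurrentConnections', 'DiskReadBPS']

selected = [m for m in candidates if match_any(m, patterns)]
print(selected)  # ['CPUUtilization', 'CPUIdle', 'concurrentConnections']
```

Because the entries are OR-ed, the broad pattern `'CPU*'` already covers `'CPUUtilization'`, and `'*Conn*'` already covers `'*Connections'`.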

Excluding Some Metrics

Adding "NOT" as the first entry of the list indicates that the metrics matched by the subsequent entries should be excluded.

In this example, the following metrics will not be collected:

Metrics named CPUUtilization
Metrics starting with CPU
Metrics ending with Connections
Metrics containing Conn

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['NOT', 'CPUUtilization', 'CPU*', '*Connections', '*Conn*'],
        },
    ],
}

Multiple Filtering to Specify Desired Metrics

The same namespace can be specified multiple times, filtering by metric names in order from top to bottom.

This example is equivalent to performing the following filtering steps on the metric names:

Select all metrics containing CPU in their names.
In the results of the previous step, exclude metrics named CPUUtilization.

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['*CPU*'],
        },
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['NOT', 'CPUUtilization'],
        },
    ],
}
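The top-to-bottom filtering can be sketched as follows. This is a minimal model of the documented semantics (plain entries are OR-ed, entries after a leading `"NOT"` are excluded, and targets with the same namespace are applied in order), not the collector's own code. The helpers `apply_target` and `apply_targets` and the metric name `CPUIdle` are hypothetical, and wildcard matching is assumed to behave like `fnmatch`.

```python
from fnmatch import fnmatchcase

def apply_target(metric_names, patterns):
    """Apply one `metrics` list to a list of metric names."""
    if patterns and patterns[0] == 'NOT':
        excluded = patterns[1:]
        # Keep a name only if it matches none of the excluded patterns
        return [m for m in metric_names
                if not any(fnmatchcase(m, p) for p in excluded)]
    # Keep a name if it matches at least one pattern
    return [m for m in metric_names
            if any(fnmatchcase(m, p) for p in patterns)]

def apply_targets(metric_names, targets, namespace):
    """Apply every target of the given namespace, from top to bottom."""
    result = list(metric_names)
    for target in targets:
        if target['namespace'] == namespace:
            result = apply_target(result, target['metrics'])
    return result

targets = [
    {'namespace': 'acs_ecs_dashboard', 'metrics': ['*CPU*']},
    {'namespace': 'acs_ecs_dashboard', 'metrics': ['NOT', 'CPUUtilization']},
]

# Illustrative names only
available = ['CPUUtilization', 'CPUIdle', 'DiskReadBPS']
print(apply_targets(available, targets, 'acs_ecs_dashboard'))  # ['CPUIdle']
```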

Configuring Filters (Optional)

This collector script supports user-defined filters, allowing users to filter target resources through object attributes. The filter function returns True or False.

True: Target resource needs to be collected.
False: Target resource does not need to be collected.

Supported object attributes:

Product Name	Supported Attributes
Cloud Assets (Object Data)/Cloud Server ECS	`instanceId`, `userId`
Cloud Assets (Object Data)/Database and Storage/Cloud Database RDS	`instanceId`, `userId`
Cloud Assets (Object Data)/Load Balancer SLB	`instanceId`, `userId`
Cloud Assets (Object Data)/Database and Storage/Object Storage OSS	`BucketName`, `userId`

When custom object collection is enabled, more object attributes will be supported for filtering. Refer to the documentation of each product's custom object collector for more details (under development...).

Python
# Example: Enable a filter based on the InstanceId and RegionId properties of an object, with the following configuration format:

def filter_instance(instance, namespace='acs_ecs_dashboard'):
    '''
    Collect metrics for instances with InstanceId i-xxxxxa, i-xxxxxb and RegionId cn-hangzhou
    '''
    instance_id = instance['tags'].get('InstanceId')
    region_id = instance['tags'].get('RegionId')
    if instance_id in ['i-xxxxxa', 'i-xxxxxb'] and region_id in ['cn-hangzhou']:
        return True
    return False

# `account` and `collector_configs` are assumed to be defined earlier in the script
from guance_integration__runner import Runner
import guance_aliyun_monitor__main as main

@DFF.API('AlibabaCloud-monitor', timeout=3600, fixed_crontab="*/5 * * * *")
def run():
    Runner(main.DataCollector(account, collector_configs, filters=[filter_instance])).run()

When configuring multiple filters under the same namespace, only data satisfying all filters will be reported.
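A minimal sketch of this "all filters must pass" rule is shown below. The two filter functions and the `passes_all` helper are hypothetical; the sketch only models the AND logic and ignores how the collector associates a filter with a namespace, which this manual does not describe.

```python
def filter_by_instance(instance, namespace='acs_ecs_dashboard'):
    # Keep only the listed instance IDs (placeholder values)
    return instance['tags'].get('InstanceId') in ['i-xxxxxa', 'i-xxxxxb']

def filter_by_region(instance, namespace='acs_ecs_dashboard'):
    # Keep only instances in one region
    return instance['tags'].get('RegionId') == 'cn-hangzhou'

def passes_all(instance, filters):
    """An instance is reported only if every filter returns True."""
    return all(f(instance) for f in filters)

filters = [filter_by_instance, filter_by_region]

kept = {'tags': {'InstanceId': 'i-xxxxxa', 'RegionId': 'cn-hangzhou'}}
dropped = {'tags': {'InstanceId': 'i-xxxxxa', 'RegionId': 'cn-beijing'}}

print(passes_all(kept, filters))     # True
print(passes_all(dropped, filters))  # False
```

Under this model, passing `filters=[filter_by_instance, filter_by_region]` would behave like the single `filter_instance` function in the example above.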

3. Data Reporting Format

After data synchronization, the data can be viewed in the 'Metrics' section of TrueWatch.

For example, consider the following collector configuration:

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',
            'metrics'  : ['CPUUtilization'],
        },
    ],
}

Example of reported data:

JSON
{
  "measurement": "aliyun_acs_ecs_dashboard",
  "tags": {
    "instanceId": "i-xxxxx",
    "userId"    : "xxxxx"
  },
  "fields": {
    "CPUUtilization_Average": 1.23,
    "CPUUtilization_Maximum": 1.23,
    "CPUUtilization_Minimum": 1.23
  }
}

All metric values will be reported as float type.
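The mapping from one monitoring datapoint to the reported structure can be sketched as below. This assumes that a datapoint is a dict carrying `instanceId`, `userId`, and the statistics `Average`, `Maximum`, and `Minimum`; the `build_point` helper is hypothetical and only mirrors the naming visible in the example above (`aliyun_` plus the namespace, and `{metric}_{statistic}` field names).

```python
def build_point(namespace, metric, datapoint):
    """Convert one datapoint (assumed shape) into the reported format."""
    statistics = ('Average', 'Maximum', 'Minimum')
    tag_keys = ('instanceId', 'userId')
    tags = {k: datapoint[k] for k in tag_keys if k in datapoint}
    # Every metric value is converted to float before reporting
    fields = {f'{metric}_{s}': float(datapoint[s])
              for s in statistics if s in datapoint}
    return {
        'measurement': f'aliyun_{namespace}',
        'tags': tags,
        'fields': fields,
    }

sample = {'instanceId': 'i-xxxxx', 'userId': 'xxxxx',
          'Average': 1, 'Maximum': 2, 'Minimum': 0}
point = build_point('acs_ecs_dashboard', 'CPUUtilization', sample)
print(point['fields'])
```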

4. Coordination with Custom Object Collectors

When other custom object collectors (such as ECS, RDS) are running within the same DataFlux Func, this collector will automatically attempt to match the tags.instanceId field with the tags.name field in custom objects.

Because the cloud monitoring collector needs the custom object information before it can perform this matching, it is generally recommended to place the cloud monitoring collector at the end of the list, like this:

Python
    # Create collectors
    collectors = [
        aliyun_ecs.DataCollector(account, common_aliyun_configs),
        aliyun_rds.DataCollector(account, common_aliyun_configs),
        aliyun_slb.DataCollector(account, common_aliyun_configs),
        aliyun_oss.DataCollector(account, common_aliyun_configs),
        aliyun_monitor.DataCollector(account, monitor_collector_configs), # Cloud monitoring collectors are usually placed at the end
    ]

After a successful match, the fields in the matched custom object's tags (excluding name) are added to the tags of the monitoring data. This makes it possible, for example, to filter cloud monitoring metrics by instance name. The effect is illustrated below.

Assume the original data collected by cloud monitoring is as follows:

JSON
{
  "measurement": "aliyun_acs_ecs_dashboard",
  "tags": {
    "instanceId": "i-001",
    "{other fields}": "{omitted}"
  },
  "fields": {
    "{metric}": "{metric value}"
  }
}

At the same time, the custom object data collected by the Alibaba Cloud ECS collector is as follows:

JSON
{
  "measurement": "aliyun_ecs",
  "tags": {
    "name"      : "i-001",
    "InstanceId": "i-001",
    "RegionId"  : "cn-hangzhou",
    "{other fields}": "{omitted}"
  },
  "fields": {
    "{other fields}": "{omitted}"
  }
}

Then, the final cloud monitoring data reported will be as follows:

JSON
{
  "measurement": "aliyun_acs_ecs_dashboard",
  "tags": {
    "instanceId": "i-001",
    "RegionId"  : "cn-hangzhou",
    "{other fields}": "{omitted}"
  },
  "fields": {
    "{metric}": "{metric value}"
  }
}
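The matching and merging described in this section can be sketched as follows. The `enrich_tags` helper is hypothetical, and letting the monitoring data's own tags win when a key exists on both sides is an assumption; the manual does not say how such conflicts are resolved.

```python
def enrich_tags(monitor_point, custom_objects):
    """Copy tags from the custom object whose tags.name equals tags.instanceId."""
    by_name = {obj['tags']['name']: obj['tags'] for obj in custom_objects}
    matched = by_name.get(monitor_point['tags'].get('instanceId'))
    if matched is None:
        return monitor_point  # No match: report the data unchanged
    extra = {k: v for k, v in matched.items() if k != 'name'}
    # Assumption: existing monitoring tags take precedence on key conflicts
    merged = {**extra, **monitor_point['tags']}
    return {**monitor_point, 'tags': merged}

monitor = {
    'measurement': 'aliyun_acs_ecs_dashboard',
    'tags': {'instanceId': 'i-001'},
    'fields': {'CPUUtilization_Average': 1.23},
}
ecs_object = {
    'measurement': 'aliyun_ecs',
    'tags': {'name': 'i-001', 'InstanceId': 'i-001', 'RegionId': 'cn-hangzhou'},
    'fields': {},
}

enriched = enrich_tags(monitor, [ecs_object])
print(enriched['tags'])
```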

5. Explanation of Cloud Monitoring API Call Limits

Alibaba Cloud Cloud Monitor provides a free quota for some API calls (currently, the query APIs have a free quota of 1 million calls per month, and calls beyond the quota are charged at 0.12 yuan per 10,000 calls). The DescribeMetricLast API used by this collector counts against this quota. The number of calls made by the script is explained in detail below:

1. Estimating whether the free quota will be exceeded when an account has multiple resources and multiple monitoring items:

This collector uses DescribeMetricLast, which queries the latest monitoring data for a given monitoring item. Each request returns one monitoring item for multiple resources (up to 1000; beyond that, paging is required). Examples of request counts:

An account has 1000 ECS resources and collects 1 monitoring item (CPUUtilization): 1 request is required.
An account has 1000 ECS resources and collects 2 monitoring items (CPUUtilization and DiskReadBPS): 2 requests are required (one for each monitoring item).
An account has 1001 ECS resources and collects 1 monitoring item (CPUUtilization): 2 requests are required (paging, because there are more than 1000 resources).
An account has 1001 ECS resources and collects 2 monitoring items (CPUUtilization and DiskReadBPS): 4 requests are required.
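The four examples above follow one rule: one request per monitoring item per page of up to 1000 resources. A small sketch of that arithmetic is given below; the function name `requests_per_run` is hypothetical, and treating an empty resource list as still costing one request per item is an assumption.

```python
import math

PAGE_SIZE = 1000  # Maximum resources returned per request, as stated above

def requests_per_run(resource_count, metric_count, page_size=PAGE_SIZE):
    """Estimate DescribeMetricLast requests for one collection run."""
    # Assumption: at least one request is made per monitoring item
    pages = max(1, math.ceil(resource_count / page_size))
    return metric_count * pages

print(requests_per_run(1000, 1))  # 1
print(requests_per_run(1000, 2))  # 2
print(requests_per_run(1001, 1))  # 2
print(requests_per_run(1001, 2))  # 4
```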

2. Find the actual call count by checking task execution logs:

The collector keeps track of the number of API calls made during each task execution, which can be viewed in the logs, for example:

Bash
[2023-04-21 15:32:13.194] [+0ms] Finished collecting for the 【1】st account, total execution time【274 milliseconds】, during which API was called【2 times】
[2023-04-21 15:32:13.194] [+0ms] Detailed calls are as follows:
[2023-04-21 15:32:13.194] [+0ms] -> metrics.aliyuncs.com/?Action=DescribeMetricMetaList: 1 time
[2023-04-21 15:32:13.194] [+0ms] -> metrics.aliyuncs.com/?Action=DescribeMetricLast: 1 time

Because the free quota for cloud monitoring API calls is limited, it is recommended to configure only the monitoring items that are actually needed, to avoid additional costs caused by broad wildcards.
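To relate the per-run call count from the logs to the monthly quota, a rough estimate can be computed as below. The figures of 1 million free calls and 0.12 yuan per 10,000 calls come from this section. The function `monthly_estimate` is hypothetical, and the sketch assumes that the task runs every 5 minutes (12 runs per hour, as in the `*/5 * * * *` crontab shown earlier), that a month has 30 days, that every logged call counts against the quota, and that the charge is proportional to the excess calls. Check the Alibaba Cloud billing documentation for the actual billing rules.

```python
FREE_CALLS_PER_MONTH = 1_000_000
PRICE_PER_10K_CALLS = 0.12  # yuan, as quoted in this section

def monthly_estimate(calls_per_run, runs_per_hour=12, days=30):
    """Return (total calls per month, estimated charge in yuan)."""
    total = calls_per_run * runs_per_hour * 24 * days
    billable = max(0, total - FREE_CALLS_PER_MONTH)
    # Assumption: proportional billing for the calls beyond the free quota
    return total, billable / 10_000 * PRICE_PER_10K_CALLS

# 2 calls per run, as in the log example above
total_small, cost_small = monthly_estimate(2)
print(total_small, cost_small)  # 17280 0.0

# 200 calls per run, for example when broad wildcards match many metrics
total_large, cost_large = monthly_estimate(200)
print(total_large, round(cost_large, 2))  # 1728000 8.74
```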

Precautions

Common Errors and Solutions

Number of collected instances does not match the actual number of existing instances

Reason: The missing instances are shut down.

Solution:

Start the instance.

X. Appendix

Refer to the official Alibaba Cloud documentation: