Collector「AWS-CloudWatch」Configuration Manual

Before reading this article, please first read:

TrueWatch Integration

Before using this collector, the 「Integration Core Package」 and its corresponding third-party dependency packages must be installed.

This collector supports multi-threading by default (five threads are enabled by default). To change the thread pool size, set the environment variable COLLECTOR_THREAD_POOL_SIZE.

1. Configuration Structure

The configuration structure of this collector is as follows:

Field	Type	Required	Description
`regions`	list	Required	List of CloudWatch regions to collect data from.
`regions[#]`	str	Required	Region ID, such as `'cn-northwest-1'`. Refer to the appendix for the full list.
`targets`	list	Required	List of CloudWatch collection target configurations. Multiple configurations with the same namespace are combined with 「AND」.
`targets[#].namespace`	str	Required	The CloudWatch namespace to collect data from, for example `'AWS/EC2'`. Refer to the appendix for the full list.
`targets[#].dimensions`	list	Optional	List of CloudWatch dimension names to collect data from. This is a newer option that specifies the dimensions of the collected metrics. Refer to the appendix for the full list.
`targets[#].metrics`	list	Required	List of CloudWatch metric names to collect data from. Refer to the appendix for the full list.
`targets[#].metrics[#]`	str	Required	Metric name pattern; supports the `"NOT"` marker and wildcard matching. Normally, multiple patterns are combined with 「OR」. When the list includes the `"NOT"` marker, they are combined with 「AND」. See details below.
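
The table above can be checked mechanically before a configuration is handed to the collector. The following is a minimal sketch, not part of the collector itself: the function name `validate_cloudwatch_config` is hypothetical, `dimensions` is treated as a list (as in the examples in this manual), and "Required" is assumed to mean "present and non-empty".

```python
def validate_cloudwatch_config(config):
    """Return a list of problems found in a collector configuration dict.

    Hypothetical helper: it only mirrors the field table in this manual.
    """
    problems = []

    regions = config.get('regions')
    if not isinstance(regions, list) or not regions:
        problems.append('regions must be a non-empty list')
    elif not all(isinstance(r, str) for r in regions):
        problems.append('regions[#] must be str')

    targets = config.get('targets')
    if not isinstance(targets, list) or not targets:
        problems.append('targets must be a non-empty list')
        return problems

    for i, target in enumerate(targets):
        prefix = 'targets[%d]' % i
        if not isinstance(target, dict):
            problems.append(prefix + ' must be a dict')
            continue
        if not isinstance(target.get('namespace'), str):
            problems.append(prefix + '.namespace must be str')
        # dimensions is optional, so only check it when present
        dimensions = target.get('dimensions')
        if dimensions is not None and not isinstance(dimensions, list):
            problems.append(prefix + '.dimensions must be a list when present')
        metrics = target.get('metrics')
        if not isinstance(metrics, list) or not metrics:
            problems.append(prefix + '.metrics must be a non-empty list')
        elif not all(isinstance(m, str) for m in metrics):
            problems.append(prefix + '.metrics[#] must be str')
    return problems


good = {
    'regions': ['cn-northwest-1'],
    'targets': [{'namespace': 'AWS/EC2', 'dimensions': ['InstanceId'],
                 'metrics': ['CPUCreditBalance']}],
}
bad = {'regions': [], 'targets': [{'namespace': 'AWS/EC2', 'metrics': 'CPU*'}]}

print(validate_cloudwatch_config(good))
print(validate_cloudwatch_config(bad))
```

An empty result means the configuration matches the documented structure; it does not confirm that the region, namespace, or metric names actually exist in CloudWatch.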

2. Configuration Example

Specifying Specific Metrics

Collect 2 metrics named CPUCreditBalance and MetadataNoToken in AWS/EC2 with dimension InstanceId

Python
aws_cloudwatch_configs = {
    'regions': ['cn-northwest-1'],
    'targets': [
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['CPUCreditBalance', 'MetadataNoToken'],
        }
    ],
}

Wildcard Matching Metrics

Metric names can use * wildcard to match.

In this example, the following metrics will be collected:

Metrics with dimension InstanceId
Metric named CPUCreditBalance
Metrics starting with CPU
Metrics ending with Balance
Metrics containing Credit

Python
aws_cloudwatch_configs = {
    'regions': ['cn-northwest-1'],
    'targets': [
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['CPUCreditBalance', 'CPU*', '*Balance', '*Credit*'],
        }
    ],
}
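
To preview which metric names a pattern list would select, the patterns can be tried against a list of names locally. This sketch assumes the collector's `*` wildcard behaves like shell-style matching in Python's standard `fnmatch` module (case-sensitive); the collector's real matcher may differ, and the metric names in `available` are only sample input.

```python
from fnmatch import fnmatchcase


def match_any(metric_name, patterns):
    """True when the name matches at least one pattern (the 「OR」 relationship)."""
    return any(fnmatchcase(metric_name, pattern) for pattern in patterns)


patterns = ['CPUCreditBalance', 'CPU*', '*Balance', '*Credit*']
available = ['CPUCreditBalance', 'CPUUtilization', 'CPUCreditUsage',
             'NetworkIn', 'StatusCheckFailed']

selected = [name for name in available if match_any(name, patterns)]
print(selected)
```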

Exclude Some Metrics

Adding "NOT" as the first element of the list excludes the metrics matched by the patterns that follow it.

In this example, the following metrics 【will not】 be collected:

Metrics with dimension InstanceId
Metric named CPUCreditBalance
Metrics starting with CPU
Metrics ending with Balance
Metrics containing Credit

Python
aws_cloudwatch_configs = {
    'regions': ['cn-northwest-1'],
    'targets': [
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['NOT', 'CPUCreditBalance', 'CPU*', '*Balance', '*Credit*'],
        }
    ],
}

Multiple Filtering for Desired Metrics

The same namespace can be specified multiple times; the metric name filters are then applied sequentially from top to bottom.

In this example, the following filtering steps are applied to the metric names in order:

Select metrics under dimension InstanceId whose names contain CPU
From the results of the previous step, remove the metric named CPUCreditBalance

Python
aws_cloudwatch_configs = {
    'regions': ['cn-northwest-1'],
    'targets': [
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['*CPU*'],
        },
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['NOT', 'CPUCreditBalance'],
        },
    ],
}
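
The sequential filtering described above, including the "NOT" marker, can be sketched as follows. This is an illustration under assumptions, not the collector's implementation: the helper names `apply_rule` and `filter_by_targets` are hypothetical, wildcard matching is assumed to behave like Python's `fnmatch`, and the names in `available` are sample input.

```python
from fnmatch import fnmatchcase


def apply_rule(names, patterns):
    """Apply one 'metrics' list to a list of metric names.

    Without "NOT": keep names matching any pattern (「OR」).
    With a leading "NOT": keep names matching none of the patterns (「AND」).
    """
    negate = bool(patterns) and patterns[0] == 'NOT'
    if negate:
        patterns = patterns[1:]

    def hit(name):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return [name for name in names if hit(name) != negate]


def filter_by_targets(available, targets, namespace):
    """Run every target of the given namespace in order, top to bottom."""
    names = list(available)
    for target in targets:
        if target['namespace'] == namespace:
            names = apply_rule(names, target['metrics'])
    return names


targets = [
    {'namespace': 'AWS/EC2', 'dimensions': ['InstanceId'], 'metrics': ['*CPU*']},
    {'namespace': 'AWS/EC2', 'dimensions': ['InstanceId'],
     'metrics': ['NOT', 'CPUCreditBalance']},
]
available = ['CPUUtilization', 'CPUCreditBalance', 'CPUCreditUsage', 'NetworkIn']

print(filter_by_targets(available, targets, 'AWS/EC2'))
```

Because each target only narrows the result of the previous one, the order of same-namespace targets does not change the final set in this sketch; it only changes the intermediate results.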

Configuring Filters (Optional)

This collector script supports user-defined filters, which let users filter target resources by object attributes. A filter function returns True or False:

True: The target resource needs to be collected.
False: The target resource does not need to be collected.

Supported object properties:

Product Name	Supported Attributes
Elastic Compute Cloud (EC2)	`InstanceId`
Relational Database Service (RDS)	`DBInstanceIdentifier`

When custom object collection is enabled, more object attributes can be filtered. Please refer to the corresponding product's custom object collector documentation (under support...).

Python
# Example: Enable filter, filter according to object's RegionId, InstanceId attribute, configuration format as follows:

def filter_instance(instance, namespace='AWS/EC2'):
    '''
    Collect metrics where InstanceId is i-0d7620xxxxxxxa, i-0d7620xxxxxxxb and RegionId is cn-northwest-1
    '''
    region_id = instance['tags'].get('RegionId')
    instance_id = instance['tags'].get('InstanceId')

    if instance_id in ['i-0d7620xxxxxxxa', 'i-0d7620xxxxxxxb'] and region_id in ['cn-northwest-1']:
        return True
    return False

from guance_integration__runner import Runner
import guance_aws_cloudwatch__main as main

@DFF.API('AWS-CloudWatch Collection', timeout=3600, fixed_crontab='*/15 * * * *')
def run():
    Runner(main.DataCollector(account, collector_configs, filters=[filter_instance])).run()

When multiple filters are configured under the same namespace, a resource must satisfy all of them for its data to be reported.
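
The "all filters must be satisfied" rule can be illustrated with a small sketch. It assumes that a filter is associated with a namespace through the default value of its `namespace` parameter, as in the example signature above; how the collector actually reads that association is not documented here. The helper names `declared_namespace` and `should_collect`, and the `prod-` prefix in the RDS filter, are hypothetical.

```python
import inspect


def filter_by_instance_id(instance, namespace='AWS/EC2'):
    return instance['tags'].get('InstanceId') in ['i-0d7620xxxxxxxa', 'i-0d7620xxxxxxxb']


def filter_by_region(instance, namespace='AWS/EC2'):
    return instance['tags'].get('RegionId') == 'cn-northwest-1'


def filter_rds(instance, namespace='AWS/RDS'):
    return instance['tags'].get('DBInstanceIdentifier', '').startswith('prod-')


def declared_namespace(filter_func):
    """Read the default of the filter's 'namespace' parameter (assumed convention)."""
    return inspect.signature(filter_func).parameters['namespace'].default


def should_collect(instance, namespace, filters):
    """A resource is collected only if every filter of its namespace returns True."""
    relevant = [f for f in filters if declared_namespace(f) == namespace]
    return all(f(instance) for f in relevant)


filters = [filter_by_instance_id, filter_by_region, filter_rds]
ok = {'tags': {'InstanceId': 'i-0d7620xxxxxxxa', 'RegionId': 'cn-northwest-1'}}
wrong_region = {'tags': {'InstanceId': 'i-0d7620xxxxxxxa', 'RegionId': 'cn-north-1'}}

print(should_collect(ok, 'AWS/EC2', filters))
print(should_collect(wrong_region, 'AWS/EC2', filters))
```

In this sketch a namespace with no filters collects everything, because `all()` of an empty list is True.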

3. Data Collection Explanation

Cloud Product Configuration Information

Product Name	Namespace	Dimension	Description
`Amazon EC2`	`AWS/EC2`	`InstanceId`
`Amazon RDS`	`AWS/RDS`	`DBInstanceIdentifier`
`Amazon S3`	`AWS/S3`	`*`	`*` indicates collecting all dimensions of this namespace; the same applies below. S3 request metrics require manual configuration in the console; see the attachment for details.
`Amazon OpenSearch Service`	`AWS/ES`	`*`	Same as above
`Elastic Load Balancing`	`AWS/ELB`	`LoadBalancerName`
`Elastic Load Balancing`	`AWS/NetworkELB` `AWS/GatewayELB` `AWS/ApplicationELB`	`LoadBalancer`	Filter metrics data by load balancer. Specify the load balancer in the following way: net/load-balancer-name/1234567890123456 (last part of the load balancer ARN).
`Amazon ElastiCache for Memcached`	`AWS/ElastiCache`	`CacheClusterId`	The collector currently collects `HOST-level metrics`, see appendix (Amazon ElastiCache for Memcached HOST-level metric monitoring)
`Amazon ElastiCache for Redis`	`AWS/ElastiCache`	`CacheClusterId`	The collector currently collects `HOST-level metrics`, see appendix (Amazon ElastiCache for Redis HOST-level metric monitoring)

4. Data Reporting Format

After data is synchronized normally, you can view the data in the 「Metrics」 section of TrueWatch.

For example, with the following collector configuration:

Python
aws_cloudwatch_configs = {
    'regions': ['cn-northwest-1'],
    'targets': [
        {
            'namespace' : 'AWS/EC2',
            'dimensions': ['InstanceId'],
            'metrics'   : ['CPUCreditBalance'],
        },
    ],
}

Example of reported data:

JSON
{
  "measurement": "aws_AWS/EC2",
  "tags": {
    "Dimensions": "InstanceId",
    "InstanceId": "i-xxx"
  },
  "fields": {
    "CPUCreditBalance_Average"    : 576.0,
    "CPUCreditBalance_Maximum"    : 576.0,
    "CPUCreditBalance_Minimum"    : 576.0,
    "CPUCreditBalance_SampleCount": 1.0,
    "CPUCreditBalance_Sum"        : 576.0
  }
}

All metric values are reported as float type.

This collector collects data under the AWS/EC2 namespace with the InstanceId dimension; see Data Collection Explanation.
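
The field names in the reported data follow the pattern metric name, underscore, statistic name. The sketch below shows how one datapoint shaped like a CloudWatch GetMetricStatistics result could be turned into that format. The function `build_point` is hypothetical and written only from the example above; the collector's real code may differ.

```python
STATISTICS = ['Average', 'Maximum', 'Minimum', 'SampleCount', 'Sum']


def build_point(namespace, dimension_name, dimension_value, metric_name, datapoint):
    """Build one reported point from a single GetMetricStatistics datapoint."""
    fields = {
        '%s_%s' % (metric_name, stat): float(datapoint[stat])  # always float
        for stat in STATISTICS
        if stat in datapoint
    }
    return {
        'measurement': 'aws_' + namespace,
        'tags': {'Dimensions': dimension_name, dimension_name: dimension_value},
        'fields': fields,
    }


# Sample datapoint; integer values are converted to float on purpose
datapoint = {'Average': 576, 'Maximum': 576, 'Minimum': 576,
             'SampleCount': 1, 'Sum': 576, 'Unit': 'Count'}
point = build_point('AWS/EC2', 'InstanceId', 'i-xxx', 'CPUCreditBalance', datapoint)
print(point)
```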

5. Coordination with Custom Object Collectors

When other custom object collectors (such as EC2) are running in the same DataFlux Func, this collector will collect data according to the dimensions indicated in Data Collection Explanation, attempting to match fields like tags.InstanceId with tags.name fields in custom objects.

Instance dimension metrics are automatically supplemented; for an explanation of instance dimensions, please refer to Data Collection Explanation.

Since information about custom objects must be known beforehand to enable coordination within the CloudWatch collector, it is generally recommended to place the CloudWatch collector at the end of the list, such as:

Python
# Create collector
collectors = [
    aws_ec2.DataCollector(account, common_aws_configs),
    aws_cloudwatch.DataCollector(account, aws_cloudwatch_configs) # CloudWatch collector usually placed at the end
]

After successful matching, additional fields from the matched custom object tags will be added to the CloudWatch data tags, thereby enabling effects such as filtering CloudWatch metric data using instance names. Specific effects are as follows:

Assume the original data collected by CloudWatch is as follows:

JSON
{
  "measurement": "aws_AWS/EC2",
  "tags": {
    "Dimensions": "InstanceId",
    "InstanceId": "i-xxx"
  },
  "fields": { "content omitted" }
}

At the same time, the custom object data collected by the AWS EC2 collector is as follows:

JSON
{
  "measurement": "aws_ec2",
  "tags": {
    "InstanceType"   : "c6g.xxx",
    "PlatformDetails": "xxx",
    "{other fields}" : "{omitted}"
  },
  "fields": { "content omitted" }
}

Then, the final CloudWatch data reported is as follows:

JSON
{
  "measurement": "aws_AWS/EC2",
  "tags": {
    "InstanceId"      : "i-xxx",      // Original field from CloudWatch
    "Dimensions"      : "InstanceId", // Dimension information field
    "InstanceType"    : "c6g.xxx",    // Field from custom object EC2
    "PlatformDetails" : "xxx",        // Field from custom object EC2
    "{other fields}"  : "{omitted}"
  },
  "fields": { "content omitted" }
}
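
The matching and tag supplementing described in this section can be sketched as follows. The sketch assumes the custom object's `tags.name` holds the instance ID and that CloudWatch's own tags take precedence when both sides define the same key; the function name `enrich_point` is hypothetical.

```python
def enrich_point(point, custom_objects, dimension='InstanceId'):
    """Copy tags from the matching custom object into a CloudWatch point."""
    # Index custom objects by tags.name, the field matched against tags.InstanceId
    by_name = {obj['tags'].get('name'): obj for obj in custom_objects}
    matched = by_name.get(point['tags'].get(dimension))
    if matched is None:
        return point  # no match: report the CloudWatch data unchanged

    merged_tags = dict(matched['tags'])
    merged_tags.update(point['tags'])  # assumption: CloudWatch tags win on conflict
    enriched = dict(point)
    enriched['tags'] = merged_tags
    return enriched


cloudwatch_point = {
    'measurement': 'aws_AWS/EC2',
    'tags': {'Dimensions': 'InstanceId', 'InstanceId': 'i-xxx'},
    'fields': {'CPUCreditBalance_Average': 576.0},
}
ec2_objects = [{
    'measurement': 'aws_ec2',
    'tags': {'name': 'i-xxx', 'InstanceType': 'c6g.xxx', 'PlatformDetails': 'xxx'},
    'fields': {},
}]

print(enrich_point(cloudwatch_point, ec2_objects)['tags'])
```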

6. Cloud Monitoring API Call Count Explanation

AWS CloudWatch has a free quota for some API calls (currently 1 million query API calls per month; for usage beyond that, refer to Amazon (Global Zone) CloudWatch Pricing and Amazon (China Zone) CloudWatch Pricing). The GetMetricStatistics and ListMetrics APIs used by this collector count toward this quota. The number of API calls made by the script set is explained in detail below:

1. When a user has multiple resources and needs to collect several monitoring items, determine whether the free quota will be exceeded:

This collector uses ListMetrics (lists available metrics) and GetMetricStatistics (gets aggregated statistics for a specific metric; each request returns only one metric of one resource). The call counts are illustrated as follows:

If the account has 1 EC2 resource and needs to collect 1 monitoring item (CPUUtilization), 2 requests are required (ListMetrics 1 time, GetMetricStatistics 1 time);
If the account has 2 EC2 resources and needs to collect 2 monitoring items (CPUUtilization, DiskReadBPS), 5 requests are required (ListMetrics 1 time, GetMetricStatistics 4 times).
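
The arithmetic above can be turned into a rough estimate of monthly usage. This sketch assumes one ListMetrics call per run, as in the examples above (a real run may need more), a 30-day month, and a task that runs every 15 minutes; the function names are hypothetical.

```python
FREE_QUOTA_PER_MONTH = 1_000_000  # free query API calls per month, as stated above


def requests_per_run(resource_count, metric_count, list_metrics_calls=1):
    """One GetMetricStatistics call per (resource, metric) pair, plus ListMetrics."""
    return list_metrics_calls + resource_count * metric_count


def requests_per_month(resource_count, metric_count, interval_minutes=15, days=30):
    runs = (60 // interval_minutes) * 24 * days
    return runs * requests_per_run(resource_count, metric_count)


print(requests_per_run(1, 1))      # 1 resource, 1 metric
print(requests_per_run(2, 2))      # 2 resources, 2 metrics
print(requests_per_month(2, 2))    # every 15 minutes for 30 days
print(requests_per_month(100, 10) > FREE_QUOTA_PER_MONTH)
```

Under these assumptions, 100 resources with 10 metrics each already exceed the free quota at a 15-minute interval, which is why broad wildcards deserve a quick estimate first.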

2. Find the actual call count by checking the task execution logs:

The collector counts the number of API calls made during each task execution; the counts can be viewed in the logs, for example:

Bash
[2023-04-21 15:32:13.194] [+0ms] Completed data collection for account 【1】, total execution time 【274 milliseconds】, called API 【2 times】
[2023-04-21 15:32:13.194] [+0ms] Detailed calls as follows:
[2023-04-21 15:32:13.194] [+0ms] -> monitoring.cn-northwest-1.amazonaws.com.cn/?Action=ListMetrics: 1 time
[2023-04-21 15:32:13.194] [+0ms] -> monitoring.cn-northwest-1.amazonaws.com.cn/?Action=GetMetricStatistics: 1 time
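
If the log text is saved, the per-action counts can be summed across runs with a short script. The sketch below assumes the log lines look exactly like the sample above (an arrow, the endpoint with an `Action=` parameter, a colon, and a count); the real log wording may differ, so the regular expression may need adjusting.

```python
import re
from collections import Counter

# Matches lines such as "-> host/?Action=ListMetrics: 1 time"
CALL_LINE = re.compile(r'->\s+(\S+?):\s+(\d+)\s+times?')


def count_api_calls(log_text):
    """Sum API call counts per action name across all matching log lines."""
    totals = Counter()
    for line in log_text.splitlines():
        match = CALL_LINE.search(line)
        if match:
            action = match.group(1).split('Action=')[-1]
            totals[action] += int(match.group(2))
    return totals


sample_log = '''\
[2023-04-21 15:32:13.194] [+0ms] Detailed calls as follows:
[2023-04-21 15:32:13.194] [+0ms] -> monitoring.cn-northwest-1.amazonaws.com.cn/?Action=ListMetrics: 1 time
[2023-04-21 15:32:13.194] [+0ms] -> monitoring.cn-northwest-1.amazonaws.com.cn/?Action=GetMetricStatistics: 1 time
'''

totals = count_api_calls(sample_log)
print(dict(totals))
```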

Because cloud monitoring API calls have a free quota, users are advised to configure only the monitoring items they need, to avoid extra consumption caused by wildcards.

7. IAM Policy Permissions

If users use IAM roles for resource collection, certain operation permissions must be granted.

The following operation permissions are required for this collector:

cloudwatch:GetMetricStatistics

cloudwatch:ListMetrics
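
As an illustration, the two permissions can be expressed as an IAM policy document. The sketch below builds it as a Python dict and prints the JSON to paste into the IAM console. It assumes a single statement that allows only these two actions on all resources; adjust it to your organization's policy conventions.

```python
import json

# Minimal policy sketch containing only the permissions listed above
policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Action': [
                'cloudwatch:GetMetricStatistics',
                'cloudwatch:ListMetrics',
            ],
            'Resource': '*',
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

If custom object collectors (such as EC2) run alongside this collector, they need their own permissions in addition to these.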

Notes

Task Trigger Error Scenarios and Solutions

HTTPClientError: An HTTP Client raised an unhandled exception: SoftTimeLimitExceeded()

Cause: The task ran for too long and timed out.

Solution:

Appropriately increase the task's timeout setting (e.g., @DFF.API('Execute Collection', timeout=120, fixed_crontab="* * * * *") sets the task timeout to 120 seconds).

Regarding metrics collected by the CloudWatch agent from Amazon EC2 instances and local servers

Cause: The agent collects metrics from Amazon EC2 instances and local servers.

Solution: https://docs.amazonaws.cn/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

X. Appendix

AWS CloudWatch

Please refer to AWS official documentation: