
Add Extra Tags to Cloud Resource Reported Data

1. Background

In general, a collector extracts only a small set of universally useful attributes from the cloud vendor's resources and reports them as tags. For some users this is not enough. This article describes how to add extra tags to the collected data before it is reported.

2. Solution

The official collector does not need to be modified: it provides an after_collect parameter. Users can assign a function to this parameter to post-process the collected data before it is reported, which includes adding extra tags.

Python
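# after_collect handler: receives each point the collector is about to report and returns the processed point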
def handler(point):
    point['tags']['origin'] = 'shanghai'
    return point

@DFF.API('xxx Collection', timeout=3600, fixed_crontab='* * * * *')
def run():
    Runner(main.DataCollector(account, collector_configs, after_collect=handler), debug=True).run()

The above example omits unrelated configuration; focus on the handler function. The function accepts a single parameter, point, which is a piece of data the collector is about to report. Its structure is described in each collector's documentation under "Data Reporting Format"; a point always contains three fields: measurement, tags, and fields. Readers who want to dig deeper can refer to the line protocol documentation. Here we only care about point.tags: insert the key-value pairs you want to supplement into it. In the example, this adds a key-value pair to point.tags with the key origin and the value shanghai.
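
For reference, a point passed to the handler might look like the sketch below. The measurement, tag, and field names here are purely illustrative; the actual names depend on the specific collector and are listed in its "Data Reporting Format" section.

Python
# Illustrative structure only; real measurement, tag, and field names vary by collector.
point = {
    'measurement': 'aws_ec2',
    'tags': {
        'RegionId'  : 'cn-northwest-1',
        'InstanceId': 'i-0123456789abcdef0',
    },
    'fields': {
        'State': 'running',
    },
}

# After handler(point) from the example above, point['tags'] additionally contains:
#   'origin': 'shanghai'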

3. Case Studies

Supplement the EC2 tags configured in the AWS Console into the tags of the EC2 object data collected by the collector.

Scenario One: Directly extract the Tags field from point.fields and add it to point.tags

Python
account = {
    'ak_id'     :  DFF.ENV('aws_develop_test')['ak_id'],
    'ak_secret' :  DFF.ENV('aws_develop_test')['ak_secret'],
}

collector_configs = {
    'regions': ['cn-northwest-1']
}

from guance_integration__runner import Runner
import guance_aws_ec2__main as main
from guance_integration__utils import json_loads

def add_tags(point):
    # If there are cloud resource Tags in point.fields, directly retrieve them.
    cloud_tags = json_loads(point['fields'].get('Tags'))
    if not cloud_tags:
        return point

    # Existing tag keys, lower-cased, so they are not overwritten (case-insensitive comparison)
    protected_tags = set(k.lower() for k in point['tags'].keys())

    for t in cloud_tags:
        t_key   = t['Key']
        t_value = t['Value']

        # Do not replace tags that already exist
        if t_key.lower() in protected_tags:
            continue

        # Be cautious with tag keys that start and end with double underscores;
        # this demo simply skips them.
        if t_key.startswith('__') and t_key.endswith('__'):
            continue

        point['tags'][t_key] = t_value

    return point

@DFF.API('AWS-EC2 Collection', timeout=3600, fixed_crontab='*/15 * * * *')
def run():
    Runner(main.DataCollector(account, collector_configs, after_collect=add_tags)).run()

Scenario Two: Not every collector exposes a Tags field in point.fields (support is being added to more collectors over time). If the collector does not provide it, you need to obtain the tags from the cloud vendor's open API (or, in some cases, from the customer's own API):

Python
account = {
    'ak_id'     :  DFF.ENV('aws_develop_test')['ak_id'],
    'ak_secret' :  DFF.ENV('aws_develop_test')['ak_secret'],
}

# Collector configuration
collector_configs = {
    'regions': ['cn-northwest-1']
}

from guance_integration__runner import Runner
import guance_aws_ec2__main as main
from guance_integration__utils import json_loads
from guance_integration__client import AWS

def add_tags(point):
    # If there are no cloud resource Tags in point.fields, call the cloud API to obtain them.
    client = AWS(**account)
    region_id = point['tags']['RegionId']
    instance_id = point['tags']['InstanceId']
    biz_params = {
        'Filters': [
            {
                'Name': 'resource-id',
                'Values': [
                    instance_id,
                ]
            }
        ]
    }
    api_res = client.do_api(action='describe_tags', product='ec2', region_id=region_id, **biz_params)
    if not api_res:
        return point

    cloud_tags = api_res.get('Tags')
    if not cloud_tags:
        return point

    # Existing tag keys, lower-cased, so they are not overwritten (case-insensitive comparison)
    protected_tags = set(k.lower() for k in point['tags'].keys())

    for t in cloud_tags:
        t_key   = t['Key']
        t_value = t['Value']

        # Do not replace tags that already exist
        if t_key.lower() in protected_tags:
            continue

        # Be cautious with tag keys that start and end with double underscores;
        # this demo simply skips them.
        if t_key.startswith('__') and t_key.endswith('__'):
            continue

        point['tags'][t_key] = t_value

    return point

@DFF.API('AWS-EC2 Collection', timeout=3600, fixed_crontab='*/15 * * * *')
def run():
    Runner(main.DataCollector(account, collector_configs, after_collect=add_tags)).run()

4. Key Points to Note

  1. In cloud product collectors, the tags of custom objects are automatically added to the tags of the associated metrics. Therefore, if you enable both the custom object collector and the cloud monitoring collector and need to supplement tags, you only need to supplement the object collector.
  2. When adding tags to the data reported by a collector, pay special attention to fields that must not be overwritten, such as the name field of custom objects. It is recommended to follow the examples above: if the original data's tags already contain the same key, do not add it again, to avoid unexpected behavior.
  3. The function assigned to after_collect accepts exactly one parameter, point. After processing, it must return one or more points. If it returns nothing or raises an error during processing, the raw data is reported as-is, without any processing by the function. If an after_collect function is configured but appears to have no effect, check for this possibility first; a defensive pattern is sketched below.
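
The following is a minimal sketch for point 3: a handler written defensively, so that it always returns the point and makes any processing error visible in the function log. The tag key extra_tag and its value are placeholders for illustration only.

Python
def handler(point):
    try:
        point['tags']['extra_tag'] = 'some-value'
    except Exception as e:
        # Make the failure visible; the point is still returned below, so reporting is unaffected
        print('after_collect handler failed: {0}'.format(e))

    # Always return the point: if nothing is returned, the raw data is reported unprocessed
    return point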