
<<< brand_name >>> Integration

This document introduces how to access and process data from Alibaba Cloud, AWS, and other platforms using the "Cloud Sync" series of script packages available in the Script Market.

Always use the latest version of DataFlux Func for these operations.

New features are continuously added to this script package; please keep an eye on this page for updates.

1. Prerequisites

  1. Register an account and log in to <<< brand_name >>>.

1.1 If enabling DataFlux Func (Automata)

All prerequisites are installed automatically; no additional preparation is required. Please skip ahead to script installation.

1.2 If deploying Func yourself

Bash
# Download DataFlux Func GSE
/bin/bash -c "$(curl -fsSL func.guance.com/download)" -- --for=GSE

# Install DataFlux Func
sudo /bin/bash {installation file directory}/run-portable.sh

For more information, refer to: Quick Start

  • After installation, create a new connector, select <<< brand_name >>> as the type, and configure the workspace's AK/SK in the connector.

2. Script Installation

This example assumes we need to collect Alibaba Cloud monitoring data and write it into <<< brand_name >>>.

Please prepare an Alibaba Cloud AK that meets the requirements in advance (for simplicity, you can directly grant the global read-only permission ReadOnlyAccess).

2.1 Install specific collectors

To synchronize cloud resource monitoring data, we generally need to install two scripts: one script collects basic information about corresponding cloud assets, and another script collects cloud monitoring information.

If log collection is required, the corresponding log collection script must also be enabled. If bill collection is needed, the cloud bill collection script should be enabled.

Taking Alibaba Cloud ECS collection as an example, find and open the following script packages in turn under "Management / Script Market":

  • "Integration (Alibaba Cloud - Cloud Monitoring)" (ID: guance_aliyun_monitor)
  • "Integration (Alibaba Cloud - ECS)" (ID: guance_aliyun_ecs)

After clicking [Install], enter the corresponding parameters: Alibaba Cloud AK, Alibaba Cloud account name.

Click [Deploy Startup Script], and the system will automatically create a Startup script set and automatically configure the corresponding startup scripts.

Additionally, in "Management / Scheduled Tasks (Old Version: Automatic Trigger Configuration)," you will see the corresponding scheduled tasks. Click [Execute] to run a task immediately without waiting for its scheduled time. After a short wait, you can view the task execution records and the corresponding logs.

2.2 Verify Synchronization Status

  1. In "Management / Scheduled Tasks (Old Version: Automatic Trigger Configuration)," confirm that the corresponding scheduled tasks have been created, and check the task records and logs for any abnormalities.
  2. On the <<< brand_name >>> platform, under "Infrastructure / Custom," check if asset information exists.
  3. On the <<< brand_name >>> platform, under "Metrics," check if corresponding monitoring data exists.

3. Code Details

Below is a step-by-step explanation of the code used in this example.

In fact, all "Integration" class scripts can be implemented using similar methods.

import Section

To use the scripts provided by the Script Market, after installing a script package you need to import its components using the import statement.

Python
from guance_integration__runner import Runner
import guance_aliyun_monitor__main as aliyun_monitor

Runner is the actual launcher for all collectors and must always be imported to start a collector. aliyun_monitor is the "Alibaba Cloud - Cloud Monitoring" collector required in this example.

Account Configuration Section

To properly call the cloud platform API, users also need to provide the corresponding platform AK for the collector to use.

Python
account = {
    'ak_id'    : '<An Alibaba Cloud AK ID with appropriate permissions>',
    'ak_secret': '<An Alibaba Cloud AK Secret with appropriate permissions>',

    'extra_tags': {
        'account_name': 'My Alibaba Cloud Account',
    }
}

Reference for creating Alibaba Cloud AK/SK: Create AccessKey

Besides the basic ak_id and ak_secret, some cloud platform accounts may require additional fields; for example, AWS accounts using IAM roles need assume_role_arn, role_session_name, etc. Refer to the Amazon (AWS) code example.

Finally, each account supports an extra_tags field, which uniformly adds the same tags to all data collected from that account, making it easier to identify which account data belongs to within <<< brand_name >>>.

The keys and values of extra_tags are arbitrary strings, and multiple key-value pairs are supported.

In this example, by configuring { 'account_name': 'My Alibaba Cloud Account' } for extra_tags, all data from this account is tagged with account_name="My Alibaba Cloud Account".
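Conceptually, the launcher merges an account's extra_tags into the tags of every collected point. The sketch below is hypothetical (point-level tags taking precedence on conflict is an assumption; the real Runner's merge behavior may differ):

```python
# Hypothetical sketch of extra_tags merging; not the actual Runner implementation.
account_extra_tags = {'account_name': 'My Alibaba Cloud Account'}

point = {
    'measurement': 'aliyun_ecs',
    'tags'       : {'InstanceId': 'i-xxxx', 'RegionId': 'cn-hangzhou'},
    'fields'     : {'cpu_usage': 12.5},
}

# Merge: keys already present on the point are kept (assumed precedence).
point['tags'] = {**account_extra_tags, **point['tags']}
```

After the merge, the point carries both its own tags and account_name="My Alibaba Cloud Account".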

Function Definition Section

In DataFlux Func, all code must be included in a function decorated with @DFF.API(...).

Python
@DFF.API('Execute Cloud Asset Sync')
def run():
    pass  # Specific code omitted ...

The first parameter of the @DFF.API(...) decorator is the title; its content is arbitrary.

For integration scripts, they ultimately run through "Scheduled Tasks (Old Version: Automatic Trigger Configuration)." Only functions decorated with @DFF.API(...) can be created as "Scheduled Tasks (Old Version: Automatic Trigger Configuration)."

Collector Configuration Section

In addition to configuring the corresponding cloud platform account, the collector also needs to be configured.

Collector configurations can be found in the documentation for specific collectors; this section only provides usage tips.

Basic Configuration

Python
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard', # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],  # Cloud monitoring metrics containing cpu, mem data
        },
    ],
}
collectors = [
    aliyun_monitor.DataCollector(account, collector_configs),
]

Alibaba Cloud monitoring requires configuration of the targets to be collected. In this example, we specify the collection of metrics related to CPU and memory within ECS instances.
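Entries like '*cpu*' in the metrics list are wildcard patterns. As an illustration only (not the collector's actual matching code), Python's fnmatch shows how such patterns would select metric names; the metric names and the case-insensitive matching are assumptions here:

```python
from fnmatch import fnmatch

# Hypothetical metric names as a cloud monitoring API might return them
metric_names = ['CPUUtilization', 'memory_usedutilization', 'DiskReadBPS', 'cpu_idle']

patterns = ['*cpu*', '*mem*']

# Keep any metric whose lowercased name matches at least one pattern
selected = [
    name for name in metric_names
    if any(fnmatch(name.lower(), pattern) for pattern in patterns)
]
# selected -> ['CPUUtilization', 'memory_usedutilization', 'cpu_idle']
```

Refer to the specific collector's documentation for how its patterns are actually evaluated.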

Advanced Configuration

Python
# Metric filter
def filter_ecs_metric(instance, namespace='acs_ecs_dashboard'):
    '''
    Collect metric data where instance_id is in ['xxxx']
    '''
    # return True
    instance_id = instance['tags'].get('InstanceId')
    if instance_id in ['xxxx']:
        return True
    return False

def after_collect_metric(point):
    '''
    Add tags to the collected data
    '''
    if point['tags']['name'] == 'xxx':
        point['tags']['custom_tag'] = 'c1'
    return point

collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard', # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],  # Cloud monitoring metrics containing cpu, mem data
        },
    ],
}
collectors = [
    aliyun_monitor.DataCollector(account, collector_configs, filters=filter_ecs_metric, after_collect=after_collect_metric),
]

  • filters: a filter function applied to the collected data (not all collectors support filters; see "Configuring Filters" in the specific collector's documentation). The function returns True when the data meets the condition and should be collected, and False when it should be dropped. Configure it flexibly according to your business needs.
  • after_collect: a custom post-collection function that further processes the collected data. Typical use cases: splitting log data, or adding extra fields/tags. Note: the return value of this function becomes the data to be reported. It is recommended to either modify the input point or append new points that follow the original point structure. If the function returns an empty value or False, none of the points collected by the collector will be reported.
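The interaction of the two hooks can be pictured as a simplified pipeline. This is a sketch only: the real Runner's hook signatures and call order may differ (for instance, the filter in the example above receives an instance rather than a point):

```python
# Hypothetical pipeline applying the two hooks to collected points.
def apply_hooks(points, filters=None, after_collect=None):
    reported = []
    for point in points:
        # filters: True keeps the point, False drops it
        if filters is not None and not filters(point):
            continue
        # after_collect: its return value replaces the point;
        # an empty/False return value drops the point entirely
        if after_collect is not None:
            point = after_collect(point)
            if not point:
                continue
        reported.append(point)
    return reported

points   = [{'tags': {'name': 'xxx'}}, {'tags': {'name': 'other'}}]
keep_xxx = lambda p: p['tags']['name'] == 'xxx'
add_tag  = lambda p: {**p, 'tags': {**p['tags'], 'custom_tag': 'c1'}}

result = apply_hooks(points, filters=keep_xxx, after_collect=add_tag)
# result -> [{'tags': {'name': 'xxx', 'custom_tag': 'c1'}}]
```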

Finally, the specific "collector instance" needs to be generated using the account configuration and collector configuration described above.

Launch Execution Section

The operation of the collector requires a unified Runner launcher to execute.

The launcher needs to initialize with the specific "collector instance" generated in the previous section and call the run() function to start the execution.

The launcher will iterate through all the collectors passed in and sequentially report the collected data to DataKit (the default DataKit connector ID is datakit).

Python
Runner(collectors).run()

After writing the code, if you are unsure whether the configuration is correct, you can add the debug=True parameter to the launcher to run it in debug mode.

In debug mode, the launcher performs data collection as usual, but does not actually write the results to DataKit:

Python
Runner(collectors, debug=True).run()

If the DataKit connector ID to write to is not the default datakit, you can add datakit_id="<DataKit ID>" to the launcher to specify the DataKit connector ID, as follows:

Python
Runner(collectors, datakit_id='<DataKit ID>').run()
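Putting the launcher options together, a simplified mental model of what Runner does can be sketched as follows. This MiniRunner is hypothetical code for illustration, not the actual implementation:

```python
# Hypothetical MiniRunner: iterate collectors, gather points, and either
# print them (debug mode) or "write" them (stand-in for DataKit reporting).
class MiniRunner:
    def __init__(self, collectors, datakit_id='datakit', debug=False):
        self.collectors = collectors
        self.datakit_id = datakit_id   # target DataKit connector ID
        self.debug      = debug

    def run(self):
        written = []
        for collector in self.collectors:
            for point in collector.collect():
                if self.debug:
                    print(point)           # debug mode: show, don't write
                else:
                    written.append(point)  # stand-in for writing to DataKit
        return written

class FakeCollector:
    def collect(self):
        return [{'measurement': 'demo', 'fields': {'value': 1}}]

points = MiniRunner([FakeCollector()]).run()
# points -> [{'measurement': 'demo', 'fields': {'value': 1}}]
```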

4. Other Cloud Vendors Code References

Configuration methods for other cloud vendors are similar to Alibaba Cloud.

Amazon (AWS)

Example of collecting "EC2 Instance Objects" and "EC2-related Monitoring Metrics":

Python
from guance_integration__runner import Runner
import guance_aws_ec2__main as aws_ec2
import guance_aws_cloudwatch__main as aws_cloudwatch

# Account configuration
# AWS supports collecting resources by assuming IAM roles
# If roles are needed, configure: assume_role_arn, role_session_name
# If Multi-Factor Authentication (MFA) is enabled, configure: serial_number, token_code
account = {
    'ak_id'            : '<AWS AK ID with appropriate permissions>',
    'ak_secret'        : '<AWS AK Secret with appropriate permissions>',
    'assume_role_arn'  : '<ARN of the role to assume>',
    'role_session_name': '<Role Session Name>',
    'serial_number'    : '<Identifier of the MFA device>',
    'token_code'       : '<One-time code provided by the MFA device, optional>',
    'extra_tags': {
        'account_name': 'My AWS Account',
    }
}

@DFF.API('Execute Cloud Asset Sync')
def run():
    regions = ['cn-northwest-1']

    # Collector configuration
    ec2_configs = {
        'regions': regions,
    }
    cloudwatch_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'AWS/EC2',
                'metrics'  : ['*cpu*'],
            },
        ],
    }
    collectors = [
        aws_ec2.DataCollector(account, ec2_configs),
        aws_cloudwatch.DataCollector(account, cloudwatch_configs),
    ]

    # Start execution
    Runner(collectors).run()

For AWS account configuration, refer to: Multiple Authentication Methods for AWS Clients

Tencent Cloud

Example of collecting "CVM Instance Objects" and "CVM-related Monitoring Metrics":

Python
from guance_integration__runner import Runner
import guance_tencentcloud_cvm__main as tencentcloud_cvm
import guance_tencentcloud_monitor__main as tencentcloud_monitor

# Account configuration
account = {
    'ak_id'    : '<Tencent Cloud Secret ID with appropriate permissions>',
    'ak_secret': '<Tencent Cloud Secret Key with appropriate permissions>',

    'extra_tags': {
        'account_name': 'My Tencent Cloud Account',
    }
}

@DFF.API('Execute Cloud Asset Sync')
def run():
    regions = ['ap-shanghai']

    # Collector configuration
    cvm_configs = {
        'regions': regions,
    }
    monitor_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'QCE/CVM',
                'metrics'  : ['*cpu*'],
            },
        ],
    }
    collectors = [
        tencentcloud_cvm.DataCollector(account, cvm_configs),
        tencentcloud_monitor.DataCollector(account, monitor_configs),
    ]

    # Start execution
    Runner(collectors).run()

Microsoft Azure

Example of collecting "VM Instance Objects" and "VM-related Monitoring Metrics":

Python
from guance_integration__runner import Runner
import guance_azure_vm__main as vm_main
import guance_azure_monitor__main as monitor_main

# Account configuration
account = {
    "client_id"     : "<Azure Client Id>",
    "client_secret" : "<Azure Client Secret>",
    "tenant_id"     : "<Azure Tenant Id>",
    "authority_area": "<Azure Area, Default global>",
    "extra_tags": {
        "account_name": "<Your Account Name>",
    }
}

subscriptions = "<Azure Subscriptions (separate multiple values with ',')>"
subscriptions = subscriptions.split(',')

# Collector configuration
collector_configs = {
    'subscriptions': subscriptions,
}

monitor_configs = {
    'targets': [
        {
            'namespace': 'Microsoft.Compute/virtualMachines',
            'metrics'  : [
                'CPU*'
            ],
        },
    ],
}

@DFF.API('Execute Microsoft Azure VM Resource Collection')
def run():
    collectors = [
        vm_main.DataCollector(account, collector_configs),
        monitor_main.DataCollector(account, monitor_configs),
    ]

    Runner(collectors).run()

Microsoft Azure account parameter hints:

  • client_id: Application (client) ID of the app registration
  • client_secret: Client secret value (note: the value, not the secret's ID)
  • tenant_id: Tenant (directory) ID
  • authority_area: Region, such as global (global/overseas area) or china (China area, operated by 21Vianet); optional, default is global

For obtaining Client Id, Client Secret, and Tenant Id, refer to Azure documentation: Authenticating Python Applications Running on Premises to Azure Resources