Collector Configuration Manual for «Tencent Cloud - Cloud Monitor»
Before reading this article, note the following:
- Before using this collector, you must install the «Integration Core Package» and its accompanying third-party dependency packages.
- To collect Tencent Cloud Cloud Monitor data, you must first configure the custom object collectors for the corresponding products.
- This collector is multi-threaded by default, with a pool of five threads. To change the thread pool size, set the environment variable COLLECTOR_THREAD_POOL_SIZE.
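How the collector parses COLLECTOR_THREAD_POOL_SIZE internally is not documented here. As a rough sketch, assuming the value is read as a positive integer with a fallback to the default of five, an environment-driven thread pool could look like this (get_pool_size and collect_all are hypothetical names, not part of the collector):

```python
import os
from concurrent.futures import ThreadPoolExecutor

DEFAULT_POOL_SIZE = 5  # default thread count stated above


def get_pool_size():
    """Read COLLECTOR_THREAD_POOL_SIZE, falling back to the default (assumed behavior)."""
    raw = os.environ.get("COLLECTOR_THREAD_POOL_SIZE", "")
    try:
        size = int(raw)
    except ValueError:
        # Unset or non-numeric values fall back to the default
        return DEFAULT_POOL_SIZE
    return size if size > 0 else DEFAULT_POOL_SIZE


def collect_all(tasks):
    """Run zero-argument collection callables concurrently; results keep task order."""
    with ThreadPoolExecutor(max_workers=get_pool_size()) as pool:
        return list(pool.map(lambda task: task(), tasks))
```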
1. Configuration Structure
The configuration structure of this collector is as follows:
Field | Type | Required | Description |
---|---|---|---|
regions | list | Required | List of Cloud Monitor regions to collect |
regions[#] | str | Required | Region ID, such as ap-shanghai. Refer to the appendix for the complete list |
targets | list | Required | List of Cloud Monitor target configurations. Multiple configurations with the same namespace are combined with «AND» |
targets[#].namespace | str | Required | Namespace of the Cloud Monitor data to collect, such as QCE/CVM. Refer to the appendix for the complete list |
targets[#].metrics | list | Required | List of Cloud Monitor metric names to collect. Refer to the appendix for the complete list |
targets[#].metrics[#] | str | Required | Metric name pattern, supporting "NOT" and wildcard matching. Normally, multiple patterns are combined with «OR»; when a "NOT" marker is included, they are combined with «AND». See details below |
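The structure above can be checked with a small validation sketch. validate_config is a hypothetical helper, not part of the collector, and treating an empty list as invalid is an assumption based on the Required column:

```python
def validate_config(config):
    """Return a list of problems found in a collector config dict (empty if none)."""
    errors = []

    regions = config.get("regions")
    if not isinstance(regions, list) or not regions:
        errors.append("regions must be a non-empty list")
    elif not all(isinstance(r, str) for r in regions):
        errors.append("every regions[#] entry must be a str")

    targets = config.get("targets")
    if not isinstance(targets, list) or not targets:
        errors.append("targets must be a non-empty list")
        return errors

    for i, target in enumerate(targets):
        if not isinstance(target, dict):
            errors.append(f"targets[{i}] must be a dict")
            continue
        if not isinstance(target.get("namespace"), str):
            errors.append(f"targets[{i}].namespace must be a str")
        metrics = target.get("metrics")
        if not isinstance(metrics, list) or not all(isinstance(m, str) for m in metrics):
            errors.append(f"targets[{i}].metrics must be a list of str")
    return errors
```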
2. Configuration Example
Collecting Specific Metrics
Collect the two metrics WanOuttraffic and WanOutpkg from QCE/CVM.
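The original code sample for this example is missing. A minimal sketch, assuming the collector reads a dict named collector_configs shaped like the table in section 1 (the variable name and the ap-shanghai region are assumptions):

```python
# Assumed config shape, following the "Configuration Structure" table
collector_configs = {
    "regions": ["ap-shanghai"],  # example region; replace with your own
    "targets": [
        {
            "namespace": "QCE/CVM",
            # Exact metric names: only these two metrics are collected
            "metrics": ["WanOuttraffic", "WanOutpkg"],
        },
    ],
}
```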
Wildcard Matching Metrics
Metric names can use the * wildcard for matching.
In this example, the following metrics will be collected:
- Metrics named WanOutpkg
- Metrics whose names start with Wan
- Metrics whose names end with Outpkg
- Metrics whose names contain Out
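A sketch of this example's configuration, with the four patterns reconstructed from the list above, plus a hypothetical matches_any helper showing the «OR» semantics. Python's fnmatch.fnmatchcase stands in for the collector's own matcher, so case-sensitive matching is an assumption:

```python
from fnmatch import fnmatchcase

collector_configs = {
    "regions": ["ap-shanghai"],  # example region (assumption)
    "targets": [
        {
            "namespace": "QCE/CVM",
            # exact name, prefix, suffix and substring patterns
            "metrics": ["WanOutpkg", "Wan*", "*Outpkg", "*Out*"],
        },
    ],
}


def matches_any(metric_name, patterns):
    """Patterns are combined with OR: a single match selects the metric."""
    return any(fnmatchcase(metric_name, p) for p in patterns)
```

For instance, among WanInpkg, LanOutpkg, CPUUsage and MemUsage, only WanInpkg (via Wan*) and LanOutpkg (via *Outpkg) are selected.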
Excluding Certain Metrics
Adding "NOT" as the first entry of the list excludes the metrics matched by the patterns that follow it.
In this example, the following metrics will NOT be collected:
- Metrics named WanOutpkg
- Metrics whose names start with Wan
- Metrics whose names end with Outpkg
- Metrics whose names contain Out
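The exclusion rule can be sketched as follows, again with patterns reconstructed from the list above and fnmatch.fnmatchcase as an assumed stand-in for the real matcher. With a leading "NOT", a metric is kept only if it matches none of the patterns, which is the «AND» of the negated patterns; metric_selected is a hypothetical name:

```python
from fnmatch import fnmatchcase

collector_configs = {
    "regions": ["ap-shanghai"],  # example region (assumption)
    "targets": [
        {
            "namespace": "QCE/CVM",
            "metrics": ["NOT", "WanOutpkg", "Wan*", "*Outpkg", "*Out*"],
        },
    ],
}


def metric_selected(metric_name, patterns):
    """Apply one metrics list: OR-ed patterns, or excluded patterns after 'NOT'."""
    if patterns and patterns[0] == "NOT":
        # not A AND not B AND ...: keep the metric only if nothing matches
        return not any(fnmatchcase(metric_name, p) for p in patterns[1:])
    return any(fnmatchcase(metric_name, p) for p in patterns)
```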
Multiple Filters for Specified Metrics
The same namespace can be specified multiple times; metrics are then filtered sequentially, from top to bottom.
In this example, this is equivalent to applying the following filtering steps to the metric names:
- Select all metrics whose names contain Out
- From the results of the previous step, exclude the metric named WanOutpkg
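The two filtering steps above can be modeled by applying each target for a namespace in order, each one narrowing the previous result. This is a sketch under the same assumptions as the earlier examples (fnmatch-style patterns, hypothetical helper names, a sample candidate list), not the collector's actual code:

```python
from fnmatch import fnmatchcase

collector_configs = {
    "regions": ["ap-shanghai"],  # example region (assumption)
    "targets": [
        {"namespace": "QCE/CVM", "metrics": ["*Out*"]},               # step 1
        {"namespace": "QCE/CVM", "metrics": ["NOT", "WanOutpkg"]},    # step 2
    ],
}


def metric_selected(name, patterns):
    if patterns and patterns[0] == "NOT":
        return not any(fnmatchcase(name, p) for p in patterns[1:])
    return any(fnmatchcase(name, p) for p in patterns)


def select_metrics(namespace, candidates, targets):
    """Apply every target for the namespace in order (combined with AND)."""
    ns_targets = [t for t in targets if t["namespace"] == namespace]
    if not ns_targets:
        return []  # namespace not configured: nothing is collected
    selected = list(candidates)
    for target in ns_targets:
        selected = [m for m in selected if metric_selected(m, target["metrics"])]
    return selected
```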
Configuring Filters (Optional)
This collector script supports user-defined filters, which select target resources based on object attributes. A filter function returns True or False:
- True: the target resource should be collected.
- False: the target resource should not be collected.
The attributes available for filtering are the same as the attribute data of the corresponding custom objects, such as cloud servers (CVM), cloud databases (CDB, Redis, MongoDB), load balancers (CLB) and object storage (COS). For details, refer to the documentation of the Tencent Cloud custom object collectors.
When multiple filters are configured for the same namespace, data is reported only if all of the filters are satisfied.
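The original filter example is missing. The sketch below shows the general idea under stated assumptions: each filter is a plain function that takes one object as a dict with a tags sub-dict and returns True or False, and the attribute names InstanceName and Zone, as well as the function names, are placeholders. The signature the collector actually expects may differ, so check the collector's own template:

```python
def filter_by_name(instance):
    """Hypothetical filter: keep instances whose name starts with 'prod-'."""
    return instance.get("tags", {}).get("InstanceName", "").startswith("prod-")


def filter_by_zone(instance):
    """Hypothetical filter: keep instances in one availability zone."""
    return instance.get("tags", {}).get("Zone") == "ap-shanghai-2"


def should_collect(instance, filters):
    """Filters on the same namespace are combined with AND: all must return True."""
    return all(f(instance) for f in filters)
```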
3. Data Collection Instructions
Cloud Product Configuration Information
Product Name | Namespace | Dimension | Description |
---|---|---|---|
Cloud Servers (CVM) | QCE/CVM | InstanceId | vm_uuid, vmUuid, uuid and InstanceId in the object data are all recognized as InstanceId |
Cloud Database MySQL | QCE/CDB | InstanceId, InstanceType | |
Object Storage | QCE/COS | BucketName | |
Public Load Balancer | QCE/LB_PUBLIC | vip | The Address field in the object data is recognized as vip |
Private Load Balancer | QCE/LB_PRIVATE | vip, vpcId | |
Cloud Database Redis | QCE/REDIS_MEM | InstanceId | Only Redis instance monitoring is currently supported; node monitoring is not supported |
Cloud Database MongoDB | QCE/CMONGO | InstanceId | Only MongoDB instance monitoring is currently supported; replica set and node monitoring are not supported |
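The table above can be read as a lookup from namespace to dimension names. The sketch below turns one custom object's attributes into a dimension list shaped like an entry of the Instances parameter of Tencent Cloud's GetMonitorData API. The alias handling mirrors the Description column, but how the collector actually resolves these fields is an assumption, and build_instance is a hypothetical name:

```python
# Dimension names per namespace, copied from the table above
NAMESPACE_DIMENSIONS = {
    "QCE/CVM": ["InstanceId"],
    "QCE/CDB": ["InstanceId", "InstanceType"],
    "QCE/COS": ["BucketName"],
    "QCE/LB_PUBLIC": ["vip"],
    "QCE/LB_PRIVATE": ["vip", "vpcId"],
    "QCE/REDIS_MEM": ["InstanceId"],
    "QCE/CMONGO": ["InstanceId"],
}

# Object-data fields accepted for a dimension (from the Description column)
FIELD_ALIASES = {
    ("QCE/CVM", "InstanceId"): ["InstanceId", "vm_uuid", "vmUuid", "uuid"],
    ("QCE/LB_PUBLIC", "vip"): ["vip", "Address"],
}


def build_instance(namespace, obj_fields):
    """Return {'Dimensions': [...]} for one object, or None if a dimension is missing."""
    dimensions = []
    for dim in NAMESPACE_DIMENSIONS[namespace]:
        for field in FIELD_ALIASES.get((namespace, dim), [dim]):
            if field in obj_fields:
                dimensions.append({"Name": dim, "Value": str(obj_fields[field])})
                break
        else:
            return None  # no field found for this dimension
    return {"Dimensions": dimensions}
```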
Monitoring Metrics Configuration Information
The collector currently supports only instance-level metrics. Configure metrics under their corresponding namespaces, as listed below.
QCE/CVM
Metric Name (MetricName) | Description |
---|---|
WanInpkg | External network inbound packet count |
WanIntraffic | External network inbound bandwidth |
WanOutpkg | External network outbound packet count |
WanOuttraffic | External network outbound bandwidth |
AccOuttraffic | External network outbound traffic |
BaseCpuUsage | Basic CPU usage |
CpuLoadavg | CPU one-minute average load |
CPUUsage | CPU utilization |
Cpuloadavg5m | CPU five-minute average load |
Cpuloadavg15m | CPU fifteen-minute average load |
CvmDiskUsage | Disk utilization |
LanInpkg | Internal network inbound packet count |
LanOutpkg | Internal network outbound packet count |
LanIntraffic | Internal network inbound bandwidth |
LanOuttraffic | Internal network outbound bandwidth |
MemUsage | Memory utilization |
MemUsed | Memory used |
TcpCurrEstab | TCP connection count |
TimeOffset | Difference between the instance's UTC time and NTP time |
GpuMemTotal | Total GPU memory |
GpuMemUsage | GPU memory usage rate |
GpuMemUsed | GPU memory used |
GpuPowDraw | GPU power draw |
GpuPowLimit | GPU power limit (total power capacity) |
GpuPowUsage | GPU power usage rate |
GpuTemp | GPU temperature |
GpuUtil | GPU usage rate |
QCE/CDB
Metric Name (MetricName) | Description |
---|---|
BytesReceived | Internal network inbound traffic |
BytesSent | Internal network outbound traffic |
Capacity | Disk space occupied |
ComCommit | Commit count |
ComDelete | Delete count |
ComInsert | Insert count |
ComReplace | Replace count |
ComRollback | Rollback count |
ComUpdate | Update count |
ConnectionUseRate | Connection utilization rate |
CpuUseRate | CPU utilization rate |
CreatedTmpDiskTables | Number of temporary disk tables |
CreatedTmpFiles | Number of temporary files |
CreatedTmpTables | Number of temporary memory tables |
HandlerCommit | Internal commit count |
HandlerReadRndNext | Next row read request count |
HandlerRollback | Internal rollback count |
InnodbBufferPoolPagesFree | Number of free InnoDB pages |
InnodbBufferPoolPagesTotal | Total number of InnoDB pages |
InnodbBufferPoolReadRequests | InnoDB buffer pool pre-read page count |
InnodbBufferPoolReads | InnoDB disk read page count |
InnodbCacheHitRate | InnoDB cache hit rate |
InnodbCacheUseRate | InnoDB cache usage rate |
InnodbDataReads | Total InnoDB read volume |
InnodbDataWrites | Total InnoDB write volume |
InnodbDataWritten | InnoDB write volume |
InnodbNumOpenFiles | Current number of InnoDB open tables |
InnodbOsFileReads | InnoDB disk read count |
InnodbOsFileWrites | InnoDB disk write count |
InnodbOsFsyncs | InnoDB fsync count |
InnodbRowLockTimeAvg | Average InnoDB row lock time (milliseconds) |
InnodbRowLockWaits | InnoDB row lock wait count |
InnodbRowsDeleted | InnoDB rows deleted count |
InnodbRowsInserted | InnoDB rows inserted count |
InnodbRowsRead | InnoDB rows read count |
InnodbRowsUpdated | InnoDB rows updated count |
IOPS | I/O operations per second (read/write count) |
KeyBlocksUnused | Number of unused blocks in key cache |
KeyBlocksUsed | Number of used blocks in key cache |
KeyCacheHitRate | MyISAM cache hit rate |
KeyCacheUseRate | MyISAM cache usage rate |
KeyReadRequests | Number of times data blocks are read from key cache |
KeyReads | Number of times data blocks are read from disk |
KeyWriteRequests | Number of times data blocks are written to key buffer |
KeyWrites | Number of times data blocks are written to disk |
LogCapacity | Log usage volume |
MasterSlaveSyncDistance | Master-slave delay distance |
MaxConnections | Maximum connections |
MemoryUseRate | Memory utilization rate |
MemoryUse | Memory occupied |
OpenFiles | Open file count |
OpenedTables | Number of already opened tables |
Qps | Operations executed per second |
Queries | Total access volume |
QueryRate | Access volume percentage |
RealCapacity | Disk usage space |
SecondsBehindMaster | Master-slave delay time |
SelectCount | Query count |
SelectScan | Full table scan count |
SlaveIoRunning | IO thread status |
SlaveSqlRunning | SQL thread status |
SlowQueries | Slow query count |
TableLocksImmediate | Number of immediately released table locks |
TableLocksWaited | Number of table lock waits |
ThreadsConnected | Current connection count |
ThreadsCreated | Number of created threads |
ThreadsRunning | Number of running threads |
Tps | Transactions executed per second |
VolumeRate | Disk utilization rate |
InnodbDataRead | InnoDB read volume |
QCE/COS
Metric Name (MetricName) | Description |
---|---|
StdReadRequests | Standard storage read requests |
StdRetrieval | Standard data read volume |
StdWriteRequests | Standard storage write requests |
IaRetrieval | Low-frequency data read volume |
IaWriteRequests | Low-frequency storage write requests |
IaReadRequests | Low-frequency storage read requests |
NlWriteRequests | Nl write requests |
NlRetrieval | Nl retrieval volume |
CdnOriginTraffic | CDN origin traffic |
InternetTraffic | External network downstream traffic |
InternalTraffic | Internal network downstream traffic |
InboundTraffic | Total external and internal upload traffic |
QCE/LB_PRIVATE
Metric Name (MetricName) | Description |
---|---|
ClientConnum | Active client-to-LB connections |
ClientInactiveConn | Inactive client-to-LB connections |
ClientConcurConn | Concurrent client-to-LB connections |
ClientNewConn | New client-to-LB connections |
ClientInpkg | Client-to-LB inbound packet count |
ClientOutpkg | Client-to-LB outbound packet count |
ClientAccIntraffic | Client-to-LB inbound traffic |
ClientAccOuttraffic | Client-to-LB outbound traffic |
ClientOuttraffic | Client-to-LB outbound bandwidth |
ClientIntraffic | Client-to-LB inbound bandwidth |
DropTotalConns | Dropped connections |
InDropBits | Dropped inbound bandwidth |
OutDropBits | Dropped outbound bandwidth |
InDropPkts | Dropped inbound packets |
OutDropPkts | Dropped outbound packets |
IntrafficVipRatio | Inbound bandwidth utilization rate |
OuttrafficVipRatio | Outbound bandwidth utilization rate |
UnhealthRsCount | Health check anomaly count |
QCE/LB_PUBLIC
Metric Name (MetricName) | Description |
---|---|
ClientConnum | Active client-to-LB connections |
ClientInactiveConn | Inactive client-to-LB connections |
ClientConcurConn | Concurrent client-to-LB connections |
ClientNewConn | New client-to-LB connections |
ClientInpkg | Client-to-LB inbound packet count |
ClientOutpkg | Client-to-LB outbound packet count |
ClientAccIntraffic | Client-to-LB inbound traffic |
ClientAccOuttraffic | Client-to-LB outbound traffic |
ClientIntraffic | Client-to-LB inbound bandwidth |
ClientOuttraffic | Client-to-LB outbound bandwidth |
DropTotalConns | Dropped connections |
IntrafficVipRatio | Public network inbound bandwidth utilization rate (this metric may be unavailable) |
InDropBits | Dropped inbound bandwidth |
InDropPkts | Dropped inbound packets |
OuttrafficVipRatio | Public network outbound bandwidth utilization rate (this metric may be unavailable) |
OutDropBits | Dropped outbound bandwidth |
OutDropPkts | Dropped outbound packets |
UnhealthRsCount | Health check anomaly count |
QCE/REDIS_MEM
Metric Name (MetricName) | Description |
---|---|
CpuUtil | CPU utilization rate |
CpuMaxUtil | Node maximum CPU utilization rate |
MemUsed | Memory used amount |
MemUtil | Memory utilization rate |
MemMaxUtil | Node maximum memory utilization rate |
Keys | Total number of keys |
Expired | Expired keys count |
Evicted | Evicted keys count |
Connections | Connection count |
ConnectionsUtil | Connection utilization rate |
InFlow | Inbound flow |
InBandwidthUtil | Inbound flow utilization rate |
InFlowLimit | Inbound flow throttling triggered |
OutFlow | Outbound flow |
OutBandwidthUtil | Outbound flow utilization rate |
OutFlowLimit | Outbound flow throttling triggered |
LatencyAvg | Average execution latency |
LatencyMax | Maximum execution latency |
LatencyRead | Read average latency |
LatencyWrite | Write average latency |
LatencyOther | Other commands average latency |
Commands | Total requests |
CmdRead | Read requests |
CmdWrite | Write requests |
CmdOther | Other requests |
CmdBigValue | Large Value requests |
CmdKeyCount | Key request count |
CmdMget | Mget request count |
CmdSlow | Slow queries |
CmdHits | Read request hits |
CmdMiss | Read request misses |
CmdErr | Execution errors |
CmdHitsRatio | Read request hit rate |
QCE/CMONGO
Metric Name (MetricName) | Description |
---|---|
Reads | Number of read requests |
Updates | Number of update requests |
Deletes | Number of delete requests |
Counts | Number of count requests |
Success | Number of successful requests |
Commands | Number of command requests |
Qps | Requests per second |
Delay10 | Number of requests with latency between 10 - 50 milliseconds |
Delay50 | Number of requests with latency between 50 - 100 milliseconds |
Delay100 | Number of requests with latency over 100 milliseconds |
ClusterConn | Cluster connection count |
Connper | Connection utilization rate |
ClusterDiskusage | Disk utilization rate |
4. Data Reporting Format
After the data is synchronized normally, you can view the data in the «Metrics» section of TrueWatch.
For example, consider the following collector configuration:
An example of the reported data is as follows:
All metric values are reported as float type
In this example, the collector collects the WanOutpkg metric under the QCE/CVM namespace. See the tables in «3. Data Collection Instructions» for details.
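The original configuration and data samples for this example are missing. As a sketch only: the configuration below collects WanOutpkg under QCE/CVM (the region is an assumption), and the hypothetical to_point helper shows how one value could be shaped with its dimensions as tags and the metric value cast to float. The measurement name and the exact point layout TrueWatch receives are not shown in this article, so both are placeholders:

```python
collector_configs = {
    "regions": ["ap-shanghai"],  # assumption: region not stated in the text
    "targets": [{"namespace": "QCE/CVM", "metrics": ["WanOutpkg"]}],
}


def to_point(measurement, dimensions, metric_name, value, timestamp):
    """Hypothetical helper: shape one Cloud Monitor value as a metric point."""
    return {
        "measurement": measurement,  # placeholder; the real name is not shown here
        "tags": dict(dimensions),  # e.g. {"InstanceId": ...}
        "fields": {metric_name: float(value)},  # values are always reported as float
        "time": int(timestamp),
    }
```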
5. Linkage with Custom Object Collectors
When other custom object collectors (such as CVM) are running in the same DataFlux Func, this collector supplements the Cloud Monitor data with fields from the matching custom objects, based on the dimensions described in «3. Data Collection Instructions». For example, the collector tries to match the InstanceId field returned in the Cloud Monitor data against the tags.name field of the custom objects.
Because the Cloud Monitor collector needs the custom object information before it can link the data, it is generally recommended to place the Cloud Monitor collector at the end of the collector list, for example:
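Why the order matters can be shown with two hypothetical stand-in classes that share an object cache. These are not the real DataFlux Func collector classes; they only model the dependency: the Cloud Monitor side can link an instance only if the object collector has already run. The instance ID is a placeholder:

```python
class CvmObjectCollector:
    """Hypothetical stand-in for a CVM custom object collector."""

    def __init__(self, object_cache):
        self.object_cache = object_cache

    def run(self):
        # Pretend one CVM object was fetched; index it by tags.name
        obj = {"tags": {"name": "ins-0001", "InstanceName": "web-1"}}
        self.object_cache[obj["tags"]["name"]] = obj


class MonitorCollector:
    """Hypothetical stand-in for the Cloud Monitor collector."""

    def __init__(self, object_cache):
        self.object_cache = object_cache
        self.linked = []

    def run(self):
        # Instance IDs returned by Cloud Monitor (placeholder values)
        for instance_id in ["ins-0001"]:
            if instance_id in self.object_cache:
                self.linked.append(instance_id)


def run_in_order(collectors):
    for collector in collectors:
        collector.run()
```

Running the object collector first lets the monitor collector find ins-0001; reversing the order leaves it unlinked.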
Upon a successful match, the tags of the matched custom object are added to the tags of the Cloud Monitor data. This makes it possible, for example, to filter Cloud Monitor metric data by instance name. The effect is as follows:
Assume the original cloud monitoring data collected is as follows:
At the same time, the custom object data collected by the Tencent Cloud CVM collector is as follows:
Then, the final cloud monitoring data reported is as follows:
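The original JSON samples are missing, but the merge they illustrated can be sketched: match the InstanceId tag of a Cloud Monitor point against tags.name of the custom objects, then copy the object's tags onto the point. link_tags is a hypothetical helper, the sample values are placeholders, and the choice not to overwrite existing Cloud Monitor tags is an assumption:

```python
def link_tags(monitor_point, custom_objects):
    """Return a copy of monitor_point with tags from the matching custom object added."""
    by_name = {
        obj["tags"]["name"]: obj
        for obj in custom_objects
        if "name" in obj.get("tags", {})
    }
    linked = dict(monitor_point)
    linked["tags"] = dict(monitor_point.get("tags", {}))
    match = by_name.get(linked["tags"].get("InstanceId"))
    if match is not None:
        for key, value in match["tags"].items():
            # Assumption: existing Cloud Monitor tags are kept as-is
            linked["tags"].setdefault(key, value)
    return linked
```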
6. Explanation of Cloud Monitor API Call Limits
Tencent Cloud Cloud Monitor applies a free quota to calls of some APIs. This collector requests monitoring data through the GetMonitorData API, which is one of these quota-limited APIs: each main account has a free quota of 1 million calls per month, and calls beyond the quota are charged at 0.25 RMB per 10,000 calls. In addition, once the free quota is exceeded, further calls are blocked unless "API Pay-as-you-go" is manually activated. The number of calls made by this script set is explained below:
1. Determining whether you will exceed the free quota when you have multiple resources and collect several monitoring items:
Each GetMonitorData call (which queries the latest monitoring data for a specified monitoring item) covers one monitoring item for up to 10 resources; additional resources are paginated into further calls. Examples of request counts:
- An account with 10 CVM resources collecting CpuUsage requires 1 request;
- An account with 10 CVM resources collecting CpuUsage and BaseCpuUsage requires 2 requests (one request per monitoring item);
- An account with 11 CVM resources collecting CpuUsage requires 2 requests (resources beyond the first 10 go into a second page);
- An account with 11 CVM resources collecting CpuUsage and BaseCpuUsage requires 4 requests.
2. Finding the actual call count by viewing task execution logs:
The collector records the number of API calls made during each task execution, which can be viewed in the logs, for example:
Because GetMonitorData calls beyond the free quota are charged, configure only the monitoring items you actually need, and avoid broad wildcard patterns that can match many metrics and incur unnecessary costs.
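The request-count rules in item 1 reduce to simple arithmetic. This sketch assumes the task runs once per minute (as in the fixed_crontab="* * * * *" example under Precautions) and a 30-day month; requests_per_run and monthly_calls are hypothetical helpers:

```python
import math

MAX_INSTANCES_PER_CALL = 10  # per the text: up to 10 resources per request


def requests_per_run(resource_count, metric_count):
    """One request per monitoring item per page of up to 10 resources."""
    return metric_count * math.ceil(resource_count / MAX_INSTANCES_PER_CALL)


def monthly_calls(resource_count, metric_count, runs_per_day=24 * 60, days=30):
    """Estimated monthly calls, assuming one run per minute by default."""
    return requests_per_run(resource_count, metric_count) * runs_per_day * days
```

For example, 11 CVM instances with 2 monitoring items need 4 requests per run, or 172,800 calls per month at one run per minute, within the 1 million free calls.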
Precautions
Troubleshooting Errors During Task Triggering and Solutions
HTTPClientError: An HTTP Client raised an unhandled exception: SoftTimeLimitExceeded()
Cause: The task ran longer than its allowed execution time.
Solution:
- Increase the task's timeout setting as appropriate, e.g. @DFF.API('Perform Collection', timeout=120, fixed_crontab="* * * * *") sets the task's timeout to 120 seconds.
[TencentCloudSDKException] code:InvalidParameterValue message:cannot find metricName=xxx configure
Cause: Tencent Cloud does not support collecting this metric (in some cases a metric appears in the Tencent Cloud documentation but is not actually supported).
Solution:
- Refer to «Monitoring Metrics Configuration Information» in this article and configure valid metric names.
[TencentCloudSDKException] code:InvalidParameterValue message: xxxxx does not belong to the developer ....
Cause: A product instance under the account was released (deleted) while its Cloud Monitor data was being collected, so the API returns an error. This error can be ignored.
X. Appendix
Tencent Cloud Cloud Monitor
Refer to the official Tencent Cloud documentation: