This guide explains how you can configure the Monitoring agent to recognize and export your application metrics to Cloud Monitoring.
The Monitoring agent is a collectd daemon. In addition to exporting many predefined system and third-party metrics to Cloud Monitoring, the agent can export your own collectd application metrics to Monitoring as user-defined metrics. Your collectd plugins can also export to Monitoring.
An alternative way to export application metrics to Monitoring is to use StatsD. Cloud Monitoring provides a default configuration that maps StatsD metrics to user-defined metrics. If you are satisfied with that mapping, then you don't need the customization steps described below. For more information, see the StatsD plugin.
For more information about metrics, see the following documents:
This functionality is only available for agents running on Linux. It is not available on Windows.
Before you begin
Install the most recent Monitoring agent on a VM instance and verify it is working. To update your agent, see Updating the agent.
Configure collectd to get monitoring data from your application. Collectd supports many application frameworks and standard monitoring endpoints through its read plugins. Find a read plugin that works for you.
(Optional) As a convenience, add the agent's collectd reference documentation to your system's man pages by updating the MANPATH variable and then running mandb:
export MANPATH="$MANPATH:/opt/stackdriver/collectd/share/man"
sudo mandb
The man pages are for stackdriver-collectd.
Important files and directories
The following files and directories, created by installing the agent, are relevant to using the Monitoring agent (collectd):
/etc/stackdriver/collectd.conf
The collectd configuration file used by the agent. Edit this file to change general configuration.
/etc/stackdriver/collectd.d/
The directory for user-added configuration files. To send user-defined metrics from the agent, you place the required configuration files, discussed below, in this directory. For backward compatibility, the agent also looks for files in /opt/stackdriver/collectd/etc/collectd.d/.
/opt/stackdriver/collectd/share/man/*
The documentation for the agent's version of collectd. You can add these pages to your system's set of man pages; see Before you begin for details.
/etc/init.d/stackdriver-agent
The init script for the agent.
How Monitoring handles collectd metrics
As background, the Monitoring agent processes collectd metrics and sends them to Monitoring, which treats each metric as a member of one of the following categories:
User-defined metrics. Collectd metrics that have the metadata key stackdriver_metric_type and a single data source are handled as user-defined metrics and sent to Monitoring using the projects.timeSeries.create method in the Monitoring API.
Curated metrics. All other collectd metrics are sent to Monitoring using an internal API. Only the metrics in the list of curated metrics are accepted and processed.
Discarded metrics. Collectd metrics that aren't in the curated metrics list and aren't user-defined metrics are silently discarded by Monitoring. The agent itself isn't aware of which metrics are accepted or discarded.
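The three categories above can be summarized as a small decision function. The following Python sketch is only an illustration of the rules as stated in this guide, not the agent's or the service's actual code; the function name categorize and the curated flag are hypothetical, and in practice the curated-versus-discarded decision is made by Monitoring, not by the agent.

```python
from typing import Mapping, Sequence


def categorize(metadata: Mapping[str, str],
               data_sources: Sequence[str],
               curated: bool) -> str:
    """Return the category that this guide describes for a collectd metric.

    metadata     -- the metric's collectd metadata entries
    data_sources -- names of the metric's data sources
    curated      -- hypothetical flag: True if the metric appears in the
                    curated metrics list (known only to Monitoring)
    """
    # User-defined: has the metadata key AND exactly one data source.
    if "stackdriver_metric_type" in metadata and len(data_sources) == 1:
        return "user-defined"
    # Everything else goes through the internal API and is either
    # accepted (curated) or silently dropped (discarded).
    return "curated" if curated else "discarded"


if __name__ == "__main__":
    meta = {"stackdriver_metric_type": "custom.googleapis.com/my_custom_metric"}
    print(categorize(meta, ["value"], curated=False))
    print(categorize({}, ["value"], curated=True))
    print(categorize({}, ["rx", "tx"], curated=False))
```

Note that a metric carrying the metadata key but more than one data source does not qualify as user-defined under this rule.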
Write user-defined metrics with the agent
You configure the agent to send metric data points to Monitoring. Each point must be associated with a user-defined metric, which you define with a metric descriptor. These concepts are introduced in Metrics, time series, and resources and described in detail at Structure of time series and User-defined metrics overview.
You can have a collectd metric treated as a user-defined metric by adding the proper metadata to the metric:
stackdriver_metric_type: (required) the name of the exported metric. Example: custom.googleapis.com/my_custom_metric.
label:[LABEL]: (optional) additional labels for the exported metric. For example, if you want a Monitoring STRING label named color, then your metadata key would be label:color and the value of the key could be "blue". You can have up to 10 labels per metric type.
You can use a collectd filter chain to modify the metadata for your metrics. Because filter chains can't modify the list of data sources and user-defined metrics only support a single data source, any collectd metrics that you want to use with this facility must have a single data source.
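To make the metadata convention concrete, the following Python sketch splits a metric's metadata into the exported metric type and its labels. It is a reading of the rules described above, not the agent's implementation; the function name split_metadata is hypothetical.

```python
from typing import Dict, Tuple

LABEL_PREFIX = "label:"
MAX_LABELS = 10  # limit stated in this guide


def split_metadata(metadata: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (metric_type, labels) derived from collectd metadata."""
    if "stackdriver_metric_type" not in metadata:
        raise ValueError("missing required key stackdriver_metric_type")
    metric_type = metadata["stackdriver_metric_type"]
    # Every key of the form "label:NAME" becomes a label called NAME.
    labels = {
        key[len(LABEL_PREFIX):]: value
        for key, value in metadata.items()
        if key.startswith(LABEL_PREFIX)
    }
    if len(labels) > MAX_LABELS:
        raise ValueError(f"{len(labels)} labels exceeds the limit of {MAX_LABELS}")
    return metric_type, labels


if __name__ == "__main__":
    print(split_metadata({
        "stackdriver_metric_type": "custom.googleapis.com/my_custom_metric",
        "label:color": "blue",
    }))
```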
Example
In this example we will monitor active Nginx connections from two Nginx services, my_service_a and my_service_b. We will send these to Monitoring using a user-defined metric.
We will take the following steps:
Identify the collectd metrics for each Nginx service.
Define a Monitoring metric descriptor.
Configure a collectd filter chain to add metadata to the collectd metrics, to meet the expectations of the Monitoring agent.
Incoming collectd metrics
Collectd expects metrics to consist of the following components. The first five components make up the collectd identifier for the metric:
Host, Plugin, Plugin-instance, Type, Type-instance, [value]
In this example, the metrics you want to send as a user-defined metric have the following values:
Component | Expected value(s) |
---|---|
Host | any |
Plugin | curl_json |
Plugin instance | nginx_my_service_a or nginx_my_service_b (1) |
Type | gauge |
Type instance | active-connections |
[value] | any value (2) |
Notes:
(1) In the example, this value encodes both the application (Nginx) and the connected service name.
(2) The value is typically a timestamp and double-precision number. Monitoring handles the details of interpreting the various kinds of values. Compound values aren't supported by the Monitoring agent.
Monitoring metric descriptor and time series
On the Monitoring side, design a metric descriptor for your user-defined metric. The following descriptor is a reasonable choice for the data in this example:
- Name: custom.googleapis.com/nginx/active_connections
- Labels:
  service_name (STRING): The name of the service connected to Nginx.
- Kind: GAUGE
- Type: DOUBLE
After you've designed the metric descriptor, you can create it by using projects.metricDescriptors.create, or you can let it be created for you from the time series metadata, discussed below. For more information, see Creating metric descriptors on this page.
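If you create the descriptor yourself, the request body for projects.metricDescriptors.create might look like the following. This Python sketch only builds and prints the JSON body for the descriptor designed above; it does not call the API, and the description text for the metric is an illustrative assumption rather than wording from this guide.

```python
import json
from typing import Any, Dict


def build_descriptor() -> Dict[str, Any]:
    """Build a MetricDescriptor body for the Nginx example in this guide."""
    return {
        "type": "custom.googleapis.com/nginx/active_connections",
        "metricKind": "GAUGE",
        "valueType": "DOUBLE",
        # Assumed wording; write whatever documents the metric for your team.
        "description": "Active Nginx connections, per connected service.",
        "labels": [
            {
                "key": "service_name",
                "valueType": "STRING",
                "description": "The name of the service connected to Nginx.",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_descriptor(), indent=2))
```

Sending this body requires an authenticated call to the Monitoring API, which is outside the scope of this sketch.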
The time series data for this metric descriptor must contain the following information, because of the way the metric descriptor is defined:
- Metric type: custom.googleapis.com/nginx/active_connections
- Metric label values:
  service_name: either "my_service_a" or "my_service_b"
Other time series information, including the associated monitored resource—the VM instance sending the data—and the metric's data point, is automatically obtained by the agent for all metrics. You don't have to do anything special.
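For orientation, the following Python sketch shows roughly what one time series for this metric looks like in the shape used by the projects.timeSeries.create method. The agent assembles this for you; the sketch is illustrative only. The function name build_time_series is hypothetical, and the resource label values in the usage example (project, instance ID, zone) are made-up placeholders.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

METRIC_TYPE = "custom.googleapis.com/nginx/active_connections"


def build_time_series(service_name: str,
                      value: float,
                      epoch_seconds: float,
                      resource_labels: Dict[str, str]) -> Dict[str, Any]:
    """Assemble one TimeSeries with a single GAUGE/DOUBLE point."""
    # Convert a collectd-style epoch timestamp to an RFC 3339 string.
    end_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()
    return {
        "metric": {"type": METRIC_TYPE, "labels": {"service_name": service_name}},
        # The monitored resource: the VM instance sending the data.
        "resource": {"type": "gce_instance", "labels": resource_labels},
        "metricKind": "GAUGE",
        "valueType": "DOUBLE",
        "points": [
            {"interval": {"endTime": end_time}, "value": {"doubleValue": value}}
        ],
    }


if __name__ == "__main__":
    # Placeholder resource labels, for illustration only.
    vm = {"project_id": "my-project", "instance_id": "1234567890", "zone": "us-central1-a"}
    print(json.dumps(build_time_series("my_service_a", 12.0, 1481210025.252, vm), indent=2))
```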
Your filter chain
Create a file, /opt/stackdriver/collectd/etc/collectd.d/nginx_curl_json.conf, containing the following code:
LoadPlugin match_regex
LoadPlugin target_set
LoadPlugin target_replace

# Insert a new rule in the default "PreCache" chain, to divert your metrics.
PreCacheChain "PreCache"
<Chain "PreCache">
  <Rule "jump_to_custom_metrics_from_curl_json">
    # If the plugin name and instance match, this is PROBABLY a metric we're looking for:
    <Match regex>
      Plugin "^curl_json$"
      PluginInstance "^nginx_"
    </Match>
    <Target "jump">
      # Go execute the following chain; then come back.
      Chain "PreCache_curl_json"
    </Target>
  </Rule>
  # Continue processing metrics in the default "PreCache" chain.
</Chain>

# Following is a NEW filter chain, just for your metric.
# It is only executed if the default chain "jumps" here.
<Chain "PreCache_curl_json">

  # The following rule does all the work for your metric:
  <Rule "rewrite_curl_json_my_special_metric">
    # Do a careful match for just your metrics; if it fails, drop down
    # to the next rule:
    <Match regex>
      Plugin "^curl_json$"                   # Match on plugin.
      PluginInstance "^nginx_my_service_.*$" # Match on plugin instance.
      Type "^gauge$"                         # Match on type.
      TypeInstance "^active-connections$"    # Match on type instance.
    </Match>

    <Target "set">
      # Specify the metric descriptor type:
      MetaData "stackdriver_metric_type" "custom.googleapis.com/nginx/active_connections"
      # Specify a value for the "service_name" label; clean it up in the next Target:
      MetaData "label:service_name" "%{plugin_instance}"
    </Target>

    <Target "replace">
      # Remove the "nginx_" prefix in the service_name to get the real service name:
      MetaData "label:service_name" "nginx_" ""
    </Target>
  </Rule>

  # The following rule is run after rewriting your metric, or
  # if the metric wasn't one of your user-defined metrics. The rule returns
  # to the default "PreCache" chain. The default processing
  # will write all metrics to Cloud Monitoring,
  # which will drop any unrecognized metrics: ones that aren't
  # in the list of curated metrics and don't have
  # the user-defined metric metadata.
  <Rule "go_back">
    Target "return"
  </Rule>
</Chain>
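To check what the rewrite rule does to a given metric before you deploy it, you can mimic its logic outside collectd. The following Python sketch is a rough simulation of the rule "rewrite_curl_json_my_special_metric" only; it is not collectd code, the function name rewrite is hypothetical, and it assumes that the replace target substitutes only the first regular-expression match.

```python
import re
from typing import Dict, Optional


def rewrite(plugin: str, plugin_instance: str,
            type_: str, type_instance: str) -> Optional[Dict[str, str]]:
    """Return the metadata the rule would attach, or None if it doesn't match."""
    # <Match regex>: all four patterns must match.
    matched = (
        re.search(r"^curl_json$", plugin)
        and re.search(r"^nginx_my_service_.*$", plugin_instance)
        and re.search(r"^gauge$", type_)
        and re.search(r"^active-connections$", type_instance)
    )
    if not matched:
        return None
    # <Target "set">: add the metric type and a provisional label value.
    metadata = {
        "stackdriver_metric_type": "custom.googleapis.com/nginx/active_connections",
        "label:service_name": plugin_instance,  # stands in for %{plugin_instance}
    }
    # <Target "replace">: strip the "nginx_" prefix (first match only, by assumption).
    metadata["label:service_name"] = re.sub(
        r"nginx_", "", metadata["label:service_name"], count=1
    )
    return metadata


if __name__ == "__main__":
    print(rewrite("curl_json", "nginx_my_service_a", "gauge", "active-connections"))
    print(rewrite("df", "udev", "df_complex", "free"))
```

A metric that returns None here would pass through the chain unchanged and rely on the curated metrics list instead.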
Load the new configuration
Restart your agent to pick up the new configuration by executing the following command on your VM instance:
sudo service stackdriver-agent restart
Your user-defined metric information begins to flow into Monitoring.
Reference and best practices
Metric descriptors and time series
For an introduction to Cloud Monitoring metrics, see Metrics, time series, and resources. More details are available in User-defined metrics overview and Structure of time series.
Metric descriptors. A metric descriptor has the following significant pieces:
A type of the form custom.googleapis.com/[NAME1]/.../[NAME0]. For example:
custom.googleapis.com/my_measurement
custom.googleapis.com/instance/network/received_packets_count
custom.googleapis.com/instance/network/sent_packets_count
The recommended naming is hierarchical to make the metrics easier for people to keep track of. Metric types can't contain hyphens; for the exact naming rules, see Naming metric types and labels.
Up to 10 labels to annotate the metric data, such as device_name, fault_type, or response_code. The values of the labels aren't specified in the metric descriptor.
The kind and value type of the data points, such as "a gauge value of type double". For more information, see MetricKind and ValueType.
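Because a misspelled or malformed metric type is easy to miss, a quick pre-flight check can help. The following Python sketch tests only the two rules stated in this section (the custom.googleapis.com/ prefix and the ban on hyphens); it is not a complete validator, and the function name check_metric_type is hypothetical. The authoritative rules are in Naming metric types and labels.

```python
from typing import List

PREFIX = "custom.googleapis.com/"


def check_metric_type(metric_type: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # The type must start with the prefix and have a name after it.
    if not metric_type.startswith(PREFIX) or metric_type == PREFIX:
        problems.append("must have the form custom.googleapis.com/<name>")
    if "-" in metric_type:
        problems.append("must not contain hyphens")
    return problems


if __name__ == "__main__":
    for candidate in (
        "custom.googleapis.com/nginx/active_connections",
        "custom.googleapis.com/nginx/active-connections",
    ):
        print(candidate, check_metric_type(candidate))
```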
Time series. A metric data point has the following significant pieces:
The type of the associated metric descriptor.
Values for all of the metric descriptor's labels.
A timestamped value consistent with the metric descriptor's value type and kind.
The monitored resource the data came from, typically a VM instance. Space for the resource is built in, so the descriptor doesn't need a separate label for it.
Creating metric descriptors
You don't have to create a metric descriptor ahead of time. When a data point arrives in Monitoring, the point's metric type, labels, and the point's value can be used to automatically create a gauge or cumulative metric descriptor. For more information, see Auto-creation of metric descriptors.
However, there are advantages to creating your own metric descriptor:
You can include some thoughtful documentation for the metric and its labels.
You can specify additional kinds and types of metrics. The only (kind, type) combinations supported by the agent are (GAUGE, DOUBLE) and (CUMULATIVE, INT64). For more information, see Metric kinds and value types.
You can specify label types other than STRING.
If you write a data point to Monitoring that uses a metric type that isn't defined, then a new metric descriptor is created for the data point. This behavior can be a problem when you are debugging the code that writes metric data—misspelling the metric type results in spurious metric descriptors.
After you create a metric descriptor, or after it is created for you, it cannot be changed. For example, you can't add or remove labels. You can only delete the metric descriptor—which deletes all its data—and then recreate the descriptor the way you want.
For more details about creating metric descriptors, see Creating your metric.
Pricing
In general, Cloud Monitoring system metrics are free, and metrics from external systems, agents, or applications are not. Billable metrics are billed by either the number of bytes or the number of samples ingested.
For more information about Cloud Monitoring pricing, see the following documents:
Limits
Cloud Monitoring has limits on the number of metric time series and the number of user-defined metric descriptors in each project. For details, see Quotas and limits.
If you discover that you have created metric descriptors you no longer want, you can find and delete the descriptors using the Monitoring API. For more information, see projects.metricDescriptors.
Troubleshooting
This section explains how to configure the Monitoring agent's write_log plugin to dump out the full set of metric points, including metadata. Use this output to determine which points need to be transformed and to verify that your transformations behave as expected.
Enabling write_log
The write_log plugin is included in the stackdriver-agent package. To enable the plugin:
As root, edit the following configuration file: /etc/stackdriver/collectd.conf
Right after LoadPlugin write_gcm, add:
LoadPlugin write_log
Right after <Plugin "write_gcm">…</Plugin>, add:
<Plugin "write_log">
  Format JSON
</Plugin>
Search for <Target "write">…</Target> and after every Plugin "write_gcm", add:
Plugin "write_log"
Save your changes and restart the agent:
sudo service stackdriver-agent restart
These changes will print one log line per metric value reported, including the full collectd identifier, the metadata entries, and the value.
Output of write_log
If you were successful in the previous step, you should see the output of write_log in the system logs:
- Debian-based Linux: /var/log/syslog
- Red Hat-based Linux: /var/log/messages
The sample lines below have been formatted to make them easier to read in this document.
Dec 8 15:13:45 test-write-log collectd[1061]: write_log values:#012[{
"values":[1933524992], "dstypes":["gauge"], "dsnames":["value"],
"time":1481210025.252, "interval":60.000,
"host":"test-write-log.c.test-write-log.internal",
"plugin":"df", "plugin_instance":"udev", "type":"df_complex", "type_instance":"free"}]
Dec 8 15:13:45 test-write-log collectd[1061]: write_log values:#012[{
"values":[0], "dstypes":["gauge"], "dsnames":["value"],
"time":1481210025.252, "interval":60.000,
"host":"test-write-log.c.test-write-log.internal",
"plugin":"df", "plugin_instance":"udev", "type":"df_complex", "type_instance":"reserved"}]
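When you have many such lines, a short script can pull out the JSON payload and print each point's collectd identifier. The following Python sketch assumes that each log entry is a single physical line (the samples above were wrapped for readability) and that "#012" is the syslog-escaped newline shown in the samples. The function names are hypothetical, and the "meta" key used for metadata is an assumption about the JSON layout, since the samples above carry no metadata.

```python
import json
from typing import Any, Dict, List

MARKER = "write_log values:"


def parse_write_log_line(line: str) -> List[Dict[str, Any]]:
    """Return the metric points in one syslog line, or [] if it has none."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return []
    # Undo the syslog escaping of the newline before decoding the JSON array.
    return json.loads(payload.replace("#012", "\n").strip())


def identifier(point: Dict[str, Any]) -> str:
    """Format host/plugin-instance/type-instance, omitting empty instances."""
    plugin = point["plugin"]
    if point.get("plugin_instance"):
        plugin += "-" + point["plugin_instance"]
    type_ = point["type"]
    if point.get("type_instance"):
        type_ += "-" + point["type_instance"]
    return f"{point['host']}/{plugin}/{type_}"


if __name__ == "__main__":
    sample = (
        'Dec 8 15:13:45 test-write-log collectd[1061]: write_log values:#012[{'
        '"values":[1933524992], "dstypes":["gauge"], "dsnames":["value"], '
        '"time":1481210025.252, "interval":60.000, '
        '"host":"test-write-log.c.test-write-log.internal", '
        '"plugin":"df", "plugin_instance":"udev", '
        '"type":"df_complex", "type_instance":"free"}]'
    )
    for point in parse_write_log_line(sample):
        # "meta" is an assumed key name; it is absent in this sample.
        print(identifier(point), point["values"], point.get("meta", {}))
```

For a metric rewritten by your filter chain, you would look in the printed metadata for stackdriver_metric_type and your label:... keys.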