Custom metrics let you capture application-specific data or client-side system data. The built-in metrics collected by Cloud Monitoring can give you information on backend latency or disk usage, but they can't tell you, for example, how many background routines your application spawned. You can also create metrics that are based on the content of log entries. For information about those types of custom metrics, see Log-based metrics overview.
Custom metrics, also known as application-specific metrics, let you define and collect information the built-in Cloud Monitoring metrics cannot. You capture such metrics by using an API provided by a library to instrument your code, and then you send the metrics to a backend application like Cloud Monitoring.
You can create custom metrics by using the Cloud Monitoring API directly. However, we recommend that you use OpenCensus. For information about how to create custom metrics, see the following documents:
- Create custom metrics with OpenCensus describes how to use OpenCensus, an open source monitoring and tracing library. This library lets you create custom metrics, add metric data to those metrics, and export the metric data to Cloud Monitoring.
- Create custom metrics with the API describes how to create custom metrics by using the Cloud Monitoring API and how to add metric data to those metrics. This document illustrates how to use the Monitoring API with examples that use the APIs Explorer and the C#, Go, Java, Node.js, PHP, Python, and Ruby programming languages.
As far as Cloud Monitoring is concerned, you can use custom metrics like the built-in metrics. You can chart them, set alerts on them, read them, and otherwise monitor them. For information about reading metric data, see the following documents:
- Browsing metric and resource types explains how to list and examine your custom and built-in metric types. For example, you can use the information in this document to list all custom metric descriptors in your project.
- Reading metric data explains how to retrieve time series data from custom and built-in metrics using the Monitoring API. For example, this document describes how you can use the API to get the CPU utilization for virtual machine (VM) instances in your Google Cloud project.
The Google Cloud console provides a dedicated page to help you view your usage of custom metrics. For information about the contents of this page, see View metric diagnostics.
Metric descriptors for custom metrics
Each metric type must have a metric descriptor that defines how the metric data is organized. The metric descriptor also defines the labels for the metric and the name of the metric. For example, the metrics lists show the metric descriptors for all built-in metric types.
When you use custom metrics, Cloud Monitoring can create the metric descriptor for you, by using the metric data you write. Alternatively, you can explicitly create the metric descriptor, and then write metric data. In either case, you must decide how you want to organize your metric data.
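To make the idea of a metric descriptor more concrete, the following is a minimal sketch that builds a plain Python dictionary shaped like a Monitoring API `MetricDescriptor`. It doesn't call Cloud Monitoring. The metric type `custom.googleapis.com/example/program_calls`, the `program` label, and the choice of `GAUGE` and `INT64` are assumptions made for illustration only.

```python
import json


def build_metric_descriptor(metric_type, description, label_keys):
    """Return a dict shaped like a Monitoring API MetricDescriptor.

    The metric kind and value type are fixed here for simplicity; a real
    descriptor would choose them to match the data being collected.
    """
    return {
        "type": metric_type,
        "metricKind": "GAUGE",   # assumption: one measurement per point
        "valueType": "INT64",    # assumption: integer values
        "description": description,
        "labels": [
            {"key": key, "valueType": "STRING"} for key in label_keys
        ],
    }


# Hypothetical metric type and label, used only for illustration.
descriptor = build_metric_descriptor(
    "custom.googleapis.com/example/program_calls",
    "Number of calls to an auxiliary program.",
    ["program"],
)

print(json.dumps(descriptor, indent=2))
```

The labels listed in the descriptor are the part of the design that determines how the data can later be filtered and grouped.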
Suppose you have a program that runs on a single machine, and that this program calls auxiliary programs A and B. You want to count how often programs A and B are called. You also want to know when program A is called more than 10 times per minute and when program B is called more than 5 times per minute. Lastly, assume that you have a single Google Cloud project and you plan to write the data against the global monitored resource.
This example describes a few different designs that you could use for your custom metrics:
You use two custom metrics: Metric-type-A counts calls to program A and Metric-type-B counts calls to program B. In this case, Metric-type-A contains 1 time series, and Metric-type-B contains 1 time series.
With this data model, you can create a single alerting policy with two conditions, or you can create two alerting policies, each with one condition. An alerting policy can support multiple conditions, but it has a single configuration for the notification channels.
This model might be appropriate when you aren't interested in similarities in the data between the activities being monitored. In this example, the activities are the rate of calls to programs A and B.
You use a single custom metric and use a label to store a program identifier. For example, the label might store the value A or B. Monitoring creates a time series for each unique combination of labels. Therefore, there is a time series whose label value is A and another time series whose label value is B.
As with the previous model, you can create a single alerting policy or two alerting policies. However, the conditions for the alerting policy are more complicated. A condition that generates an incident when the rate of calls for program A exceeds a threshold must use a filter that includes only data points whose label value is A.
One advantage of this model is that it is simple to compute ratios. For example, you can determine how much of the total is due to calls to A.
You use a single custom metric to count the number of calls, but you don't use a label to record which program was called. In this model, there is a single time series that combines the data for the two programs. However, you can't create an alerting policy that meets your objectives because the data for two programs can't be separated.
The first two designs let you meet your data analysis requirements; however, the last design doesn't.
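The difference between the three designs can be sketched with a small in-memory simulation. The example below assumes that a time series is identified by its metric type plus its label values, and it counts a hypothetical one-minute call log under each design. The design names, the metric name `calls`, and the call log are illustrative assumptions, not part of the Monitoring API.

```python
from collections import Counter

# Hypothetical one-minute call log: program A called 12 times, B called 3 times.
CALLS = ["A"] * 12 + ["B"] * 3


def series_key(design, program):
    """Return the (metric type, labels) pair that a call is written to."""
    if design == "two_metrics":
        return (f"Metric-type-{program}", ())
    if design == "one_metric_with_label":
        return ("calls", (("program", program),))
    if design == "one_metric_no_label":
        return ("calls", ())
    raise ValueError(f"unknown design: {design}")


def simulate(design, calls=CALLS):
    """Count calls per time series for the given design."""
    return Counter(series_key(design, program) for program in calls)


for design in ("two_metrics", "one_metric_with_label", "one_metric_no_label"):
    print(design, dict(simulate(design)))

# With the labeled design, a ratio is a simple division over one metric.
labeled = simulate("one_metric_with_label")
share_of_a = labeled[("calls", (("program", "A"),))] / sum(labeled.values())
print(f"share of calls to A: {share_of_a:.2f}")
```

In this sketch the first two designs each produce two time series, so a per-program threshold can be evaluated. The third design produces a single combined series, which is why the separate thresholds for A and B can't be checked.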
For information about creating metric descriptors, see Create metric descriptors.
Names of custom metrics
When you create a custom metric, you define a string identifier that represents
the metric type. This string must be unique among the custom metrics in your
Google Cloud project, and it must use a prefix that marks the metric as a
user-defined metric. For Monitoring, the allowable prefixes include
custom.googleapis.com/ and external.googleapis.com/prometheus. The prefix is
followed by a name that describes what you are collecting.
For details on the recommended way to name a custom metric, see
Metric naming conventions.
Here are examples of the two kinds of identifiers for metric types:
In the previous example, the prefix
custom.googleapis.com indicates that both
metrics are custom metrics. Both examples are for metrics that measure the
CPU utilization; however, they use different organizational models. When you
anticipate having a large number of custom metrics, we recommend that you
use a hierarchical naming structure like that used by the second example.
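As an illustration of the two naming styles, the following sketch checks that a metric type uses the `custom.googleapis.com/` prefix and splits the remainder into path segments. The two identifiers are hypothetical examples of a flat name and a hierarchical name for a CPU utilization metric, and the helper functions are not part of any Google Cloud library.

```python
CUSTOM_PREFIX = "custom.googleapis.com/"

# Hypothetical identifiers: one flat name and one hierarchical name.
FLAT_NAME = "custom.googleapis.com/cpu_utilization"
HIERARCHICAL_NAME = "custom.googleapis.com/instance/cpu/utilization"


def is_custom_metric_type(metric_type):
    """Return True if the type has the custom prefix followed by a name."""
    return (
        metric_type.startswith(CUSTOM_PREFIX)
        and len(metric_type) > len(CUSTOM_PREFIX)
    )


def name_segments(metric_type):
    """Return the path segments that follow the custom prefix."""
    if not is_custom_metric_type(metric_type):
        raise ValueError(f"not a custom metric type: {metric_type}")
    return metric_type[len(CUSTOM_PREFIX):].split("/")


for metric_type in (FLAT_NAME, HIERARCHICAL_NAME):
    print(metric_type, name_segments(metric_type))
```

With a hierarchical name, related metrics share leading segments, which makes a large set of custom metrics easier to browse and filter by prefix.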
All metric types have globally unique identifiers called resource names. The structure of a resource name for a metric type is:
projects/PROJECT_ID/metricDescriptors/METRIC_TYPE
where METRIC_TYPE is the string identifier of the metric type.
If the previous metric examples are created in project
then their resource names for these metrics would be the following:
Name or type? In the metric descriptor, the name field stores the
metric type's resource name and the type field stores the
METRIC_TYPE string.
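The relationship between the name and type fields can be sketched as follows. The example assumes the resource name format `projects/PROJECT_ID/metricDescriptors/METRIC_TYPE`. The project ID `example-project` and the metric type are hypothetical values.

```python
def metric_descriptor_name(project_id, metric_type):
    """Build the resource name stored in a metric descriptor's name field."""
    return f"projects/{project_id}/metricDescriptors/{metric_type}"


def split_descriptor_name(name):
    """Split a resource name back into (project_id, metric_type)."""
    # The metric type itself contains slashes, so split at most 3 times.
    parts = name.split("/", 3)
    if len(parts) != 4 or parts[0] != "projects" or parts[2] != "metricDescriptors":
        raise ValueError(f"unexpected resource name: {name}")
    return parts[1], parts[3]


# Hypothetical project ID and metric type, used only for illustration.
metric_type = "custom.googleapis.com/instance/cpu/utilization"
name = metric_descriptor_name("example-project", metric_type)

print("name:", name)
print("type:", split_descriptor_name(name)[1])
```

The type value is always the tail of the name value, so either one can be derived from the other when the project ID is known.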
Monitored-resource types for custom metrics
When you write your data to a time series, you must indicate where the data came from. To specify the source of the data, you choose a monitored-resource type that represents where your data comes from, and then use that to describe the specific origin. The monitored resource isn't part of the metric type. Instead, the time series to which you write data includes a reference to the metric type and a reference to the monitored resource. The metric type describes the data while the monitored resource describes where the data originated.
Consider the monitored resource before creating your metric descriptor. The monitored-resource type you use affects which labels you need to include in the metric descriptor. For example, the Compute Engine VM resource contains labels for the project ID, the instance ID, and the instance zone. Therefore, if you plan to write your custom metric against a Compute Engine VM resource, the resource labels already include the instance ID, so you don't need a label for the instance ID in the metric descriptor.
Each of your metric's data points must be associated with a monitored resource object. Points from different monitored-resource objects are held in different time series.
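The following sketch shows how a data point refers to both a metric type and a monitored resource. It builds plain dictionaries shaped like the Monitoring API `TimeSeries` object and doesn't send anything to Cloud Monitoring. The metric type, project ID, instance IDs, and zone are hypothetical values.

```python
from datetime import datetime, timezone


def build_time_series(metric_type, resource_type, resource_labels, value, end_time):
    """Return a dict shaped like a Monitoring API TimeSeries with one point."""
    return {
        "metric": {"type": metric_type},
        "resource": {"type": resource_type, "labels": dict(resource_labels)},
        "points": [
            {
                "interval": {"endTime": end_time.isoformat()},
                "value": {"int64Value": str(value)},
            }
        ],
    }


def series_identity(series):
    """Return the parts of a time series that identify it in this sketch."""
    resource = series["resource"]
    return (
        series["metric"]["type"],
        resource["type"],
        tuple(sorted(resource["labels"].items())),
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
metric_type = "custom.googleapis.com/example/program_calls"  # hypothetical

# Two hypothetical Compute Engine instances writing the same metric type.
series_list = [
    build_time_series(
        metric_type,
        "gce_instance",
        {"project_id": "example-project", "instance_id": instance_id, "zone": "us-central1-a"},
        value,
        now,
    )
    for instance_id, value in (("1111", 7), ("2222", 3))
]

identities = {series_identity(s) for s in series_list}
print(len(identities), "distinct time series")
```

Because the instance ID is carried by the resource labels, the two points land in different time series even though the metric type and metric labels are identical.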
You must use one of the following monitored resource types with custom metrics:
- aws_ec2_instance: Amazon EC2 instance.
- dataflow_job: Dataflow job.
- gae_instance: App Engine instance.
- gce_instance: Compute Engine instance.
- generic_node: User-specified computing node.
- generic_task: User-defined task.
- gke_container: GKE container instance.
- global: Use this resource when no other resource type is suitable. For most use cases, generic_node or generic_task are better choices than global.
- k8s_cluster: Kubernetes cluster.
- k8s_container: Kubernetes container.
- k8s_node: Kubernetes node.
- k8s_pod: Kubernetes pod.
A common practice is to use the monitored resource objects that represent the physical resources where your application code is running. This approach has several advantages:
- You get better performance compared with using a single resource type.
- You avoid out-of-order data caused by multiple processes writing to the same time series.
- You can group your custom-metric data with other metric data from the same resources.
global and generic resources
The generic_task and generic_node resource types are useful in situations
where none of the more specific resource types are appropriate. The
generic_task type is useful for defining task-like resources, and the
generic_node type is useful for defining node-like
resources such as virtual machines. Both generic types
have several common labels you can use to define unique resource objects,
making it easy to use them in metric filters for aggregations and reductions.
In contrast, the
global resource type has few labels
to distinguish one source from another. When you have many sources of metrics
within a project, using the same global resource object can cause
collisions and overwrites of your metric data.
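The collision risk can be illustrated with a toy model in which a later write to the same time series replaces the earlier one. This is a simplification of how Cloud Monitoring handles conflicting writes. The example assumes that the `global` resource carries only a `project_id` label and that `generic_task` carries `project_id`, `location`, `namespace`, `job`, and `task_id` labels. The metric type, project ID, and worker names are hypothetical.

```python
def series_key(metric_type, resource_type, resource_labels):
    """Identify a time series by metric type, resource type, and labels."""
    return (metric_type, resource_type, tuple(sorted(resource_labels.items())))


def write_point(store, key, value):
    """Toy model: a later write to the same series replaces the earlier one."""
    store[key] = value


METRIC = "custom.googleapis.com/example/queue_depth"  # hypothetical
PROJECT = "example-project"  # hypothetical
WORKERS = (("worker-1", 4), ("worker-2", 9))  # hypothetical sources

# Both workers write against the same global resource object.
global_store = {}
for _worker, depth in WORKERS:
    key = series_key(METRIC, "global", {"project_id": PROJECT})
    write_point(global_store, key, depth)

# Each worker writes against its own generic_task resource object.
task_store = {}
for worker, depth in WORKERS:
    labels = {
        "project_id": PROJECT,
        "location": "us-central1",
        "namespace": "example",
        "job": "queue-worker",
        "task_id": worker,
    }
    write_point(task_store, series_key(METRIC, "generic_task", labels), depth)

print("global series:", len(global_store))
print("generic_task series:", len(task_store))
```

In this sketch the two workers share one series under `global`, so one value is lost, while the `task_id` label keeps their data separate under `generic_task`.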
API methods that support custom metrics
The following table shows which methods in the Monitoring API support custom metrics and which methods support built-in metrics:
|Monitoring API method||Use with
Limits and latencies
For limits related to custom metrics and data retention, see Quotas and limits.
To keep your metric data beyond the retention period, you must manually copy the data to another location, such as Cloud Storage or BigQuery.
For information about latencies associated with writing data to custom metrics, see Latency of metric data.
- Create custom metrics with OpenCensus
- Create custom metrics with the API
- Introduction to the Cloud Monitoring API
- Metrics, time series, and resources