Custom metrics

Custom metrics let you capture application-specific data or client-side system data. The built-in metrics collected by Cloud Monitoring can give you information on backend latency or disk usage, but they can't tell you, for example, how many background routines your application spawned. You can also create metrics that are based on the content of log entries. For information about those types of custom metrics, see Log-based metrics overview.

Custom metrics, also known as application-specific metrics, let you define and collect information the built-in Cloud Monitoring metrics cannot. You capture such metrics by using an API provided by a library to instrument your code, and then you send the metrics to a backend application like Cloud Monitoring.

You can create custom metrics by using the Cloud Monitoring API directly. However, we recommend that you use OpenCensus. For information about how to create custom metrics, see the following documents:

  • Create custom metrics with OpenCensus describes how to use OpenCensus, an open source monitoring and tracing library. This library lets you create custom metrics, add metric data to those metrics, and export the metric data to Cloud Monitoring.

  • Create custom metrics with the API describes how to create custom metrics by using the Cloud Monitoring API and how to add metric data to those metrics. This document illustrates how to use the Monitoring API with examples that use the APIs Explorer and the C#, Go, Java, Node.js, PHP, Python, and Ruby programming languages.

Cloud Monitoring treats custom metrics like built-in metrics: you can chart them, set alerts on them, read them, and otherwise monitor them. For information about reading metric data, see the following documents:

  • Browsing metric and resource types explains how to list and examine your custom and built-in metric types. For example, you can use the information in this document to list all custom metric descriptors in your project.
  • Reading metric data explains how to retrieve time series data from custom and built-in metrics using the Monitoring API. For example, this document describes how you can use the API to get the CPU utilization for virtual machine (VM) instances in your Google Cloud project.
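
Both reading tasks map to REST list calls. The following is a minimal sketch that only builds the request URLs with the Python standard library. Sending the requests also requires an OAuth 2.0 access token, which isn't shown. The project ID, metric type, and time range are placeholders. The filter strings use the Monitoring filter syntax, with starts_with doing the prefix match on custom descriptors.

```python
from urllib.parse import urlencode

BASE_URL = "https://monitoring.googleapis.com/v3"


def list_custom_descriptors_url(project_id):
    """URL for metricDescriptors.list restricted to custom.googleapis.com types."""
    query = urlencode(
        {"filter": 'metric.type = starts_with("custom.googleapis.com/")'}
    )
    return f"{BASE_URL}/projects/{project_id}/metricDescriptors?{query}"


def list_time_series_url(project_id, metric_type, start_time, end_time):
    """URL for timeSeries.list over [start_time, end_time] (RFC 3339 strings)."""
    query = urlencode(
        {
            "filter": f'metric.type = "{metric_type}"',
            "interval.startTime": start_time,
            "interval.endTime": end_time,
        }
    )
    return f"{BASE_URL}/projects/{project_id}/timeSeries?{query}"


print(list_custom_descriptors_url("my-project-id"))
print(
    list_time_series_url(
        "my-project-id",
        "custom.googleapis.com/instance/cpu/utilization",
        "2024-01-01T00:00:00Z",
        "2024-01-01T01:00:00Z",
    )
)
```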

The Google Cloud console provides a dedicated page to help you view your usage of custom metrics. For information about the contents of this page, see View metric diagnostics.

Metric descriptors for custom metrics

Each metric type must have a metric descriptor that defines how the metric data is organized. The metric descriptor also defines the labels for the metric and the name of the metric. For example, the metrics lists show the metric descriptors for all built-in metric types.

When you use custom metrics, Cloud Monitoring can create the metric descriptor for you from the metric data you write. Alternatively, you can explicitly create the metric descriptor and then write metric data. In either case, you must decide how you want to organize your metric data.
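
If you create the descriptor explicitly, the request body for metricDescriptors.create (a POST to https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors) is a JSON MetricDescriptor. The following sketch only builds that body. The metric type custom.googleapis.com/program/call_count and the label key program are hypothetical. The CUMULATIVE kind and INT64 value type are assumptions chosen for a call counter.

```python
import json


def build_metric_descriptor(metric_type, label_keys, description):
    """Build a JSON body for metricDescriptors.create.

    Assumes a CUMULATIVE INT64 counter whose labels are all STRING labels;
    choose the kind, value type, and labels that match your own data.
    """
    return {
        "type": metric_type,
        "metricKind": "CUMULATIVE",
        "valueType": "INT64",
        "description": description,
        "labels": [{"key": key, "valueType": "STRING"} for key in label_keys],
    }


descriptor = build_metric_descriptor(
    "custom.googleapis.com/program/call_count",  # hypothetical metric type
    ["program"],  # hypothetical label recording which program was called
    "Calls to auxiliary programs.",
)
print(json.dumps(descriptor, indent=2))
```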

Design example

Suppose you have a program that runs on a single machine, and that this program calls auxiliary programs A and B. You want to count how often programs A and B are called. You also want to know when program A is called more than 10 times per minute and when program B is called more than 5 times per minute. Lastly, assume that you have a single Google Cloud project and you plan to write the data against the global monitored resource.

This example describes a few different designs that you could use for your custom metrics:

  • You use two custom metrics: Metric-type-A counts calls to program A and Metric-type-B counts calls to program B. In this case, Metric-type-A contains 1 time series, and Metric-type-B contains 1 time series.

    With this data model, you can create a single alerting policy with two conditions, or you can create two alerting policies that each have one condition. An alerting policy can support multiple conditions, but it has a single configuration for the notification channels.

    This model might be appropriate when you aren't interested in similarities in the data between the activities being monitored. In this example, the activities are the rate of calls to programs A and B.

  • You use a single custom metric and use a label to store a program identifier. For example, the label might store the value A or B. Monitoring creates a time series for each unique combination of labels. Therefore, there is a time series whose label value is A and another time series whose label value is B.

    As with the previous model, you can create a single alerting policy or two alerting policies. However, the conditions for the alerting policy are more complicated. A condition that generates an incident when the rate of calls for program A exceeds a threshold must use a filter that includes only data points whose label value is A.

    One advantage of this model is that it is simple to compute ratios. For example, you can determine how much of the total is due to calls to A.

  • You use a single custom metric to count the number of calls, but you don't use a label to record which program was called. In this model, there is a single time series that combines the data for the two programs. However, you can't create an alerting policy that meets your objectives because the data for two programs can't be separated.
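
To make the second design concrete, here is a small local simulation in Python. It doesn't call the Monitoring API. The call counts are made up, and the metric type custom.googleapis.com/program/call_count and the label key program are hypothetical. The code counts calls per label value, which corresponds to one count per time series. It then checks the thresholds from the example, builds the per-program filter (in Monitoring filter syntax) that each alerting condition would need, and computes the ratio that this design makes easy. The last value shows all that the third design would retain.

```python
from collections import Counter

METRIC_TYPE = "custom.googleapis.com/program/call_count"  # hypothetical

# Made-up calls observed in one minute; each entry is the value of the
# hypothetical "program" label that design 2 attaches to every point.
calls_this_minute = ["A"] * 12 + ["B"] * 4

# One count per label value, mirroring the two time series of design 2.
per_program = Counter(calls_this_minute)

# Thresholds from the example: A more than 10/min, B more than 5/min.
thresholds = {"A": 10, "B": 5}
alerts = {p: per_program[p] > limit for p, limit in thresholds.items()}

# Each alerting condition filters on exactly one label value.
filters = {
    p: f'metric.type = "{METRIC_TYPE}" AND metric.labels.program = "{p}"'
    for p in thresholds
}

# With both programs in one metric, a ratio is a single division.
share_of_a = per_program["A"] / sum(per_program.values())

# Design 3 keeps only this total, so A and B can't be told apart.
total_without_label = len(calls_this_minute)

print(dict(per_program), alerts, share_of_a, total_without_label)
# {'A': 12, 'B': 4} {'A': True, 'B': False} 0.75 16
```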

The first two designs let you meet your data analysis requirements; however, the last design doesn't.

For information about creating metric descriptors, see Create metric descriptors.

Names of custom metrics

When you create a custom metric, you define a string identifier that represents the metric type. This string must be unique among the custom metrics in your Google Cloud project, and it must use a prefix that marks the metric as a user-defined metric. For Monitoring, the allowable prefixes are custom.googleapis.com/, external.googleapis.com/user, and external.googleapis.com/prometheus. The prefix is followed by a name that describes what you are collecting. For details on the recommended way to name a custom metric, see Metric naming conventions. Here are two example metric-type identifiers, one with a flat name and one with a hierarchical name:

    custom.googleapis.com/cpu_utilization
    custom.googleapis.com/instance/cpu/utilization

In the previous example, the prefix custom.googleapis.com indicates that both metrics are custom metrics. Both examples are for metrics that measure the CPU utilization; however, they use different organizational models. When you anticipate having a large number of custom metrics, we recommend that you use a hierarchical naming structure like that used by the second example.
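
You can check the prefix rule and the flat-versus-hierarchical structure with a few lines of Python. This is a sketch. metric_name_segments is a hypothetical helper, the prefixes are copied from the paragraph above, and the check is a local string test, not the validation that Monitoring itself performs.

```python
# Prefixes copied from the text above.
USER_DEFINED_PREFIXES = (
    "custom.googleapis.com/",
    "external.googleapis.com/user",
    "external.googleapis.com/prometheus",
)


def metric_name_segments(metric_type):
    """Check the prefix and return the path segments that follow it.

    Raises ValueError when metric_type doesn't use a user-defined prefix.
    """
    for prefix in USER_DEFINED_PREFIXES:
        if metric_type.startswith(prefix):
            # Drop the prefix and any leading "/", then split the hierarchy.
            return metric_type[len(prefix):].lstrip("/").split("/")
    raise ValueError(f"not a user-defined metric type: {metric_type!r}")


print(metric_name_segments("custom.googleapis.com/cpu_utilization"))
# ['cpu_utilization']
print(metric_name_segments("custom.googleapis.com/instance/cpu/utilization"))
# ['instance', 'cpu', 'utilization']
```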

All metric types have globally unique identifiers called resource names. The structure of a resource name for a metric type is:

projects/PROJECT_ID/metricDescriptors/METRIC_TYPE

where METRIC_TYPE is the string identifier of the metric type. If the previous metric examples are created in project my-project-id, then their resource names are the following:

    projects/my-project-id/metricDescriptors/custom.googleapis.com/cpu_utilization
    projects/my-project-id/metricDescriptors/custom.googleapis.com/instance/cpu/utilization

Name or type? In the metric descriptor, the name field stores the metric type's resource name and the type field stores the METRIC_TYPE string.
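
The relationship between the two fields can be sketched as a pair of string helpers. descriptor_name and descriptor_type are hypothetical names. They assume the projects/PROJECT_ID/metricDescriptors/METRIC_TYPE structure described above. Because a project ID can't contain a slash, splitting on the first "/metricDescriptors/" recovers the metric type even when the type itself contains slashes.

```python
def descriptor_name(project_id, metric_type):
    """Build the value of a metric descriptor's name field."""
    return f"projects/{project_id}/metricDescriptors/{metric_type}"


def descriptor_type(name):
    """Recover the descriptor's type field (METRIC_TYPE) from its name."""
    head, sep, metric_type = name.partition("/metricDescriptors/")
    if not sep or not head.startswith("projects/"):
        raise ValueError(f"unexpected resource name: {name!r}")
    return metric_type


name = descriptor_name("my-project-id", "custom.googleapis.com/instance/cpu/utilization")
print(name)
# projects/my-project-id/metricDescriptors/custom.googleapis.com/instance/cpu/utilization
print(descriptor_type(name))
# custom.googleapis.com/instance/cpu/utilization
```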

Monitored-resource types for custom metrics

When you write your data to a time series, you must indicate where the data came from. To specify the source of the data, you choose a monitored-resource type that represents where your data comes from, and then use that to describe the specific origin. The monitored resource isn't part of the metric type. Instead, the time series to which you write data includes a reference to the metric type and a reference to the monitored resource. The metric type describes the data while the monitored resource describes where the data originated.

Consider the monitored resource before creating your metric descriptor. The monitored-resource type you use affects which labels you need to include in the metric descriptor. For example, the Compute Engine VM resource contains labels for the project ID, the instance ID, and the instance zone. Therefore, if you plan to write your custom metric against a Compute Engine VM resource, the resource labels already include the instance ID, so you don't need an instance ID label in the metric descriptor.

Each of your metric's data points must be associated with a monitored resource object. Points from different monitored-resource objects are held in different time series.
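
A time series is identified by its metric (type and labels) together with its monitored resource (type and labels). The following sketch models that identity as a Python tuple. It shows why points written from two Compute Engine VMs land in separate time series even though the metric itself has no instance ID label. The metric type, label, instance IDs, and zone are hypothetical values. series_key and vm are illustrative helpers, not part of any Google client library.

```python
def series_key(metric, resource):
    """Model a time series identity: metric type and labels plus
    monitored-resource type and labels."""
    return (
        metric["type"],
        tuple(sorted(metric.get("labels", {}).items())),
        resource["type"],
        tuple(sorted(resource["labels"].items())),
    )


# Hypothetical metric; it needs no instance_id label of its own.
metric = {
    "type": "custom.googleapis.com/program/call_count",
    "labels": {"program": "A"},
}


def vm(instance_id):
    """A gce_instance resource; its labels already carry the instance ID."""
    return {
        "type": "gce_instance",
        "labels": {
            "project_id": "my-project-id",
            "instance_id": instance_id,
            "zone": "us-central1-a",
        },
    }


# The same metric written from two VMs belongs to two time series.
print(series_key(metric, vm("1111")) == series_key(metric, vm("2222")))  # False
```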

You must use one of the following monitored resource types with custom metrics:

A common practice is to use the monitored resource objects that represent the physical resources where your application code is running. This approach has several advantages:

  • You get better performance compared with using a single resource type.
  • You avoid out-of-order data caused by multiple processes writing to the same time series.
  • You can group your custom-metric data with other metric data from the same resources.

global and generic resources

The generic_task and generic_node resource types are useful in situations where none of the more specific resource types are appropriate. The generic_task type is useful for defining task-like resources such as applications. The generic_node type is useful for defining node-like resources such as virtual machines. Both generic_* types have several common labels you can use to define unique resource objects, making it easy to use them in metric filters for aggregations and reductions.

In contrast, the global resource type has only project_id and location labels. When you have many sources of metrics within a project, using the same global resource object can cause collisions and overwrites of your metric data.
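
The collision risk can be illustrated with a local model. In this sketch, three worker processes write the same unlabeled custom metric. The metric type, namespace, job, location, and task IDs are hypothetical. When every process describes itself with the same global resource object, all of them map to one time series. With generic_task and a distinct task_id label per process, each gets its own series. resource_for, series_key, and count_series are illustrative helpers.

```python
METRIC_TYPE = "custom.googleapis.com/worker/queue_depth"  # hypothetical


def resource_for(task_id, use_global):
    """Describe the process that writes a point as a monitored resource."""
    if use_global:
        # Every process reports the same object, so task_id is lost.
        return {"type": "global", "labels": {"project_id": "my-project-id"}}
    return {
        "type": "generic_task",
        "labels": {
            "project_id": "my-project-id",
            "location": "us-central1",
            "namespace": "batch",
            "job": "worker",
            "task_id": task_id,
        },
    }


def series_key(metric_type, resource):
    """Metric type plus monitored resource identifies a series in this model."""
    return (metric_type, resource["type"], tuple(sorted(resource["labels"].items())))


def count_series(task_ids, use_global):
    """Number of distinct time series the given processes write to."""
    return len({series_key(METRIC_TYPE, resource_for(t, use_global)) for t in task_ids})


tasks = ["task-0", "task-1", "task-2"]
print(count_series(tasks, use_global=True))   # 1: all processes collide
print(count_series(tasks, use_global=False))  # 3: one series per task
```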

API methods that support custom metrics

The following table shows which methods in the Monitoring API support custom metrics and which methods support built-in metrics:

Monitoring API method                Use with custom metrics    Use with built-in metrics
monitoredResourceDescriptors.get     yes                        yes
monitoredResourceDescriptors.list    yes                        yes
metricDescriptors.get                yes                        yes
metricDescriptors.list               yes                        yes
timeSeries.list                      yes                        yes
timeSeries.create                    yes                        no
metricDescriptors.create             yes                        no
metricDescriptors.delete             yes                        no
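
Writing data with timeSeries.create is supported only for custom metrics. The following sketch builds a REST request to projects.timeSeries.create using only the Python standard library, but it doesn't send the request. It assumes a GAUGE INT64 metric named custom.googleapis.com/worker/queue_depth (hypothetical), written against a generic_task resource. It also assumes that you obtain an OAuth 2.0 access token separately; ACCESS_TOKEN is a placeholder. In practice, the Cloud Monitoring client libraries handle authentication and request construction for you.

```python
import json
import urllib.request


def build_create_request(project_id, time_series, access_token):
    """Return an unsent POST request for projects.timeSeries.create."""
    return urllib.request.Request(
        url=f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries",
        data=json.dumps({"timeSeries": time_series}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# One GAUGE INT64 point for a hypothetical custom metric on a generic_task.
series = {
    "metric": {"type": "custom.googleapis.com/worker/queue_depth"},
    "resource": {
        "type": "generic_task",
        "labels": {
            "project_id": "my-project-id",
            "location": "us-central1",
            "namespace": "batch",
            "job": "worker",
            "task_id": "task-0",
        },
    },
    "metricKind": "GAUGE",
    "valueType": "INT64",
    "points": [
        {
            # Fixed timestamp for illustration; use the current time when
            # sending, because Monitoring doesn't accept points that are
            # too far in the past.
            "interval": {"endTime": "2024-01-01T00:00:00Z"},
            "value": {"int64Value": "3"},
        }
    ],
}

request = build_create_request("my-project-id", [series], "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```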

Limits and latencies

For limits related to custom metrics and data retention, see Quotas and limits.

To keep your metric data beyond the retention period, you must manually copy the data to another location, such as Cloud Storage or BigQuery.

For information about latencies associated with writing data to custom metrics, see Latency of metric data.

What's next