User-defined metrics are all metrics that aren't defined by Google Cloud. These include metrics that you define and metrics that a third-party application defines. User-defined metrics let you capture application-specific data or client-side system data. The built-in metrics collected by Cloud Monitoring can give you information about backend latency or disk usage, but they can't tell you, for example, how many background routines your application spawned.
You can also create metrics that are based on the content of log entries. Log-based metrics are a class of user-defined metric, but you must create them from Cloud Logging. For more information about log-based metrics, see Log-based metrics overview.
User-defined metrics are sometimes called custom metrics or application-specific metrics. These metrics let you, or a third-party application, define and collect information the built-in Cloud Monitoring metrics cannot. You capture such metrics by using an API provided by a library to instrument your code, and then you send the metrics to a backend application like Cloud Monitoring.
You can create user-defined metrics, except log-based metrics, by using the Cloud Monitoring API directly. However, we recommend that you use OpenTelemetry. For information about how to create user-defined metrics, see the following documents:
Collect OTLP metrics and traces describes how to use the Ops Agent and the agent's OpenTelemetry Protocol (OTLP) receiver to collect metrics and traces from applications instrumented by using OpenTelemetry and running on Compute Engine.
Google Cloud Managed Service for Prometheus describes how to collect Prometheus metrics from applications running on Google Kubernetes Engine and Kubernetes.
Collect Prometheus metrics describes how to use the Ops Agent to collect Prometheus metrics from applications running on Compute Engine.
Create user-defined metrics with the API describes how to create metrics by using the Cloud Monitoring API and how to add metric data to those metrics. This document illustrates how to use the Monitoring API with examples using the APIs Explorer, C#, Go, Java, Node.js, PHP, Python, and Ruby programming languages.
Create custom metrics on Cloud Run shows how to use the OpenTelemetry Collector as a sidecar agent in Cloud Run deployments.
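To make the underlying data model concrete, the following sketch builds a dict shaped like one TimeSeries object in a Cloud Monitoring API timeSeries.create request body. The metric type and label names are hypothetical examples; an actual call also requires authentication and a Monitoring API client or REST request, which are omitted here.

```python
import datetime

def build_time_series(project_id, program, call_count):
    """Build a dict shaped like one TimeSeries object for the
    Cloud Monitoring v3 timeSeries.create request body.
    The metric type and the "program" label are hypothetical."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "metric": {
            "type": "custom.googleapis.com/program/call_count",  # hypothetical
            "labels": {"program": program},
        },
        "resource": {
            "type": "global",
            "labels": {"project_id": project_id},
        },
        "points": [{
            "interval": {"endTime": now.isoformat()},
            # int64 values are serialized as strings in the JSON API.
            "value": {"int64Value": str(call_count)},
        }],
    }

series = build_time_series("my-project-id", "A", 42)
print(series["metric"]["type"])
```

Each such object pairs a metric (what is measured) with a monitored resource (where it was measured), which is the structure the rest of this page discusses.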
As far as Cloud Monitoring is concerned, you can use user-defined metrics like the built-in metrics. You can chart them, set alerts on them, read them, and otherwise monitor them. For information about reading metric data, see the following documents:
- List metric and resource types explains how to list and examine your user-defined and built-in metric types. For example, you can use the information in that document to list all user-defined metric descriptors in your project.
- Retrieve time-series data explains how to retrieve time series data from metrics using the Monitoring API. For example, this document describes how you can use the API to get the CPU utilization for virtual machine (VM) instances in your Google Cloud project.
The Google Cloud console provides a dedicated page to help you view your usage of user-defined metrics. For information about the contents of this page, see View and manage metric usage.
Metric descriptors for user-defined metrics
Each metric type must have a metric descriptor that defines how the metric data is organized. The metric descriptor also defines the labels for the metric and the name of the metric. For example, the metrics lists show the metric descriptors for all built-in metric types.
Cloud Monitoring can create the metric descriptor for you based on the metric data you write, or you can explicitly create the metric descriptor and then write metric data. In either case, you must decide how you want to organize your metric data.
Design example
Suppose you have a program that runs on a single machine, and that this program calls auxiliary programs A and B. You want to count how often programs A and B are called. You also want to know when program A is called more than 10 times per minute and when program B is called more than 5 times per minute. Lastly, assume that you have a single Google Cloud project and you plan to write the data against the global monitored resource.
This example describes a few different designs that you could use for your user-defined metrics:
- You use two metrics: Metric-type-A counts calls to program A, and Metric-type-B counts calls to program B. In this case, Metric-type-A contains one time series, and Metric-type-B contains one time series. With this data model, you can create a single alerting policy with two conditions, or you can create two alerting policies, each with one condition. An alerting policy can support multiple conditions, but it has a single configuration for the notification channels. This model might be appropriate when you aren't interested in similarities in the data between the activities being monitored. In this example, the activities are the rates of calls to programs A and B.
- You use a single metric and use a label to store a program identifier. For example, the label might store the value A or B. Monitoring creates a time series for each unique combination of labels, so there is one time series whose label value is A and another whose label value is B. As with the previous model, you can create a single alerting policy or two alerting policies. However, the conditions for the alerting policy are more complicated: a condition that generates an incident when the rate of calls for program A exceeds a threshold must use a filter that includes only data points whose label value is A. One advantage of this model is that it is simple to compute ratios. For example, you can determine how much of the total is due to calls to A.
- You use a single metric to count the number of calls, but you don't use a label to record which program was called. In this model, there is a single time series that combines the data for the two programs. However, you can't create an alerting policy that meets your objectives, because the data for the two programs can't be separated.
The first two designs let you meet your data analysis requirements; however, the last design doesn't.
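For the second design, an alerting condition must restrict evaluation to the time series with the right label value. A sketch of such a Monitoring filter follows; the metric type and the program label are hypothetical names for this example:

```
metric.type = "custom.googleapis.com/program/call_count" AND
metric.labels.program = "A"
```

A condition built on this filter evaluates only the time series for program A, so calls to program B can't trigger it.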
For more information, see Create a user-defined metric.
Names of user-defined metrics
When you create a user-defined metric, you define a string identifier that
represents the metric type. This string must be unique among the
user-defined metrics in your
Google Cloud project and it must use a prefix that marks the metric as a
user-defined metric. For Monitoring, the allowable prefixes are
custom.googleapis.com/, workload.googleapis.com/, external.googleapis.com/user, and external.googleapis.com/prometheus.
The prefix is followed by a name that
describes what you are collecting.
For details on the recommended way to name a metric, see
Metric naming conventions.
Here are examples of the two kinds of identifiers for metric types:
custom.googleapis.com/cpu_utilization
custom.googleapis.com/instance/cpu/utilization
In the previous example, the prefix custom.googleapis.com
indicates that both
metrics are user-defined metrics. Both examples are for metrics that measure the
CPU utilization; however, they use different organizational models. When you
anticipate having a large number of user-defined metrics, we recommend that you
use a hierarchical naming structure like that used by the second example.
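The prefix rule can be captured mechanically. The following helper is an illustrative sketch, not part of any Google library; it checks whether a metric-type string carries one of the allowable user-defined prefixes:

```python
# Allowable prefixes for user-defined metric types in Cloud Monitoring.
ALLOWED_PREFIXES = (
    "custom.googleapis.com/",
    "workload.googleapis.com/",
    "external.googleapis.com/user",
    "external.googleapis.com/prometheus",
)

def is_user_defined_metric_type(metric_type: str) -> bool:
    """Return True if metric_type uses a user-defined prefix."""
    return metric_type.startswith(ALLOWED_PREFIXES)

print(is_user_defined_metric_type("custom.googleapis.com/instance/cpu/utilization"))  # True
print(is_user_defined_metric_type("compute.googleapis.com/instance/cpu/utilization"))  # False
```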
All metric types have globally unique identifiers called resource names. The structure of a resource name for a metric type is:
projects/PROJECT_ID/metricDescriptors/METRIC_TYPE
where METRIC_TYPE is the string identifier of the metric type.
If the previous metric examples are created in the project my-project-id, then their resource names would be the following:
projects/my-project-id/metricDescriptors/custom.googleapis.com/cpu_utilization
projects/my-project-id/metricDescriptors/custom.googleapis.com/instance/cpu/utilization
Name or type? In the metric descriptor, the name field stores the metric type's resource name, and the type field stores the METRIC_TYPE string.
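Because the resource name is built by simple concatenation, it can be sketched as follows; this helper is illustrative, not an official API:

```python
def metric_descriptor_resource_name(project_id: str, metric_type: str) -> str:
    """Build the globally unique resource name for a metric type."""
    return f"projects/{project_id}/metricDescriptors/{metric_type}"

print(metric_descriptor_resource_name(
    "my-project-id", "custom.googleapis.com/cpu_utilization"))
# projects/my-project-id/metricDescriptors/custom.googleapis.com/cpu_utilization
```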
Monitored-resource types for user-defined metrics
When you write your data to a time series, you must indicate where the data came from. To specify the source of the data, you choose a monitored-resource type that represents where your data comes from, and then use that to describe the specific origin. The monitored resource isn't part of the metric type. Instead, the time series to which you write data includes a reference to the metric type and a reference to the monitored resource. The metric type describes the data while the monitored resource describes where the data originated.
Consider the monitored resource before creating your metric descriptor. The monitored-resource type you use affects which labels you need to include in the metric descriptor. For example, the Compute Engine VM resource contains labels for the project ID, the instance ID, and the instance zone. Therefore, if you plan to write your metric against a Compute Engine VM resource, then the resource labels include the instance ID so you don't need a label for the instance ID in the metric descriptor.
Each of your metric's data points must be associated with a monitored resource object. Points from different monitored-resource objects are held in different time series.
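To make this division of labels concrete, here is a sketch of a metricDescriptors.create request body for a metric you plan to write against a Compute Engine VM resource. It is a plain dict here; sending it requires an authenticated Monitoring API client, which is omitted, and the state label is a hypothetical example:

```python
# Sketch of a metricDescriptors.create request body.
descriptor = {
    "type": "custom.googleapis.com/instance/cpu/utilization",
    "metricKind": "GAUGE",
    "valueType": "DOUBLE",
    "description": "CPU utilization sampled by the application.",
    # No instance-ID label: when data is written against a gce_instance
    # monitored resource, the resource labels already identify the VM.
    "labels": [
        {"key": "state", "valueType": "STRING",
         "description": "Hypothetical label, e.g. user or system time."},
    ],
}
print(descriptor["type"])
```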
You must use one of the following monitored resource types with user-defined metrics:
- aws_ec2_instance: Amazon EC2 instance.
- dataflow_job: Dataflow job.
- gae_instance: App Engine instance.
- gce_instance: Compute Engine instance.
- generic_node: User-specified computing node.
- generic_task: User-defined task.
- gke_container: GKE container instance.
- global: Use this resource when no other resource type is suitable. For most use cases, generic_node or generic_task are better choices than global.
- k8s_cluster: Kubernetes cluster.
- k8s_container: Kubernetes container.
- k8s_node: Kubernetes node.
- k8s_pod: Kubernetes pod.
A common practice is to use the monitored resource objects that represent the physical resources where your application code is running. This approach has several advantages:
- You get better performance compared with using a single resource type.
- You avoid out-of-order data caused by multiple processes writing to the same time series.
- You can group your user-defined-metric data with other metric data from the same resources.
global and generic resources
The generic_task and generic_node resource types are useful in situations where none of the more specific resource types are appropriate. The generic_task type is useful for defining task-like resources such as applications. The generic_node type is useful for defining node-like resources such as virtual machines. Both generic_* types have several common labels that you can use to define unique resource objects, which makes them easy to use in metric filters for aggregations and reductions.
In contrast, the global resource type has only the project_id label. When you have many sources of metrics within a project, using the same global resource object can cause collisions and overwrites of your metric data.
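As an illustration, a generic_task monitored-resource object might be populated as follows. The label values are hypothetical, but the label keys shown are the ones defined for this resource type:

```python
# Sketch of a generic_task monitored resource as used in a time series.
# Distinct label combinations produce distinct time series, which avoids
# the collisions described above for the global resource type.
resource = {
    "type": "generic_task",
    "labels": {
        "project_id": "my-project-id",   # hypothetical values
        "location": "us-central1-a",
        "namespace": "billing",
        "job": "invoice-worker",
        "task_id": "worker-0",
    },
}
print(resource["type"])
```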
API methods that support user-defined metrics
The following table shows which methods in the Monitoring API support user-defined metrics and which methods support built-in metrics:
| Monitoring API method | Use with user-defined metrics | Use with built-in metrics |
|---|---|---|
| monitoredResourceDescriptors.get | yes | yes |
| monitoredResourceDescriptors.list | yes | yes |
| metricDescriptors.get | yes | yes |
| metricDescriptors.list | yes | yes |
| timeSeries.list | yes | yes |
| timeSeries.create | yes | |
| metricDescriptors.create | yes | |
| metricDescriptors.delete | yes | |
Limits and latencies
For limits related to user-defined metrics and data retention, see Quotas and limits.
To keep your metric data beyond the retention period, you must manually copy the data to another location, such as Cloud Storage or BigQuery.
For information about latencies associated with writing data to user-defined metrics, see Latency of metric data.
What's next
- Use Google Cloud Managed Service for Prometheus to collect Prometheus metrics from applications running on Google Kubernetes Engine and Kubernetes.
- Collect Prometheus metrics from applications running on Compute Engine.
- Collect OTLP metrics and traces from applications instrumented by using OpenTelemetry and running on Compute Engine.
- Create user-defined metrics with the API
- Introduction to the Cloud Monitoring API
- Metrics, time series, and resources