Create custom metrics with OpenCensus

OpenCensus is a free, open source project whose libraries:

  • Provide vendor-neutral support for the collection of metric and trace data across multiple languages.
  • Can export the collected data to various backend applications, including Cloud Monitoring, by using exporters.

Although Cloud Monitoring provides an API that supports defining and collecting custom metrics, it is a low-level, proprietary API. OpenCensus provides an API that follows the style of the language community, along with an exporter that sends your metric data to Cloud Monitoring through the Monitoring API for you.

OpenCensus also has good support for application tracing; see OpenCensus Tracing for a general overview. Cloud Trace recommends using OpenCensus for trace instrumentation. To collect both metric and trace data from your services, you can use a single distribution of libraries. For information about using OpenCensus with Cloud Trace, see Client Libraries for Trace.

Before you begin

To use Cloud Monitoring, you must have a Cloud project with billing enabled. If necessary, do the following:

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  3. Ensure the Monitoring API is enabled. For details, see Enabling the Monitoring API.
  4. For applications running outside of Google Cloud, your application must be authenticated to your Cloud project. Typically, you configure authentication by creating a service account for your project and by configuring an environment variable.

    For applications you run on an Amazon Elastic Compute Cloud (Amazon EC2) instance, create the service account for the instance's AWS connector project.

    For information about creating a service account, see Getting started with authentication.

Install OpenCensus

To use metrics collected by OpenCensus in your Google Cloud project, you must make the OpenCensus metrics libraries and the Stackdriver exporter available to your application. The Stackdriver exporter exports the metrics that OpenCensus collects to your Google Cloud project. You can then use Cloud Monitoring to chart or monitor those metrics.

Go

Using OpenCensus requires Go version 1.11 or higher. The dependencies are handled automatically for you.

Java

For Maven, add the following to the dependencies element in your pom.xml file:
<dependency>
  <groupId>io.opencensus</groupId>
  <artifactId>opencensus-api</artifactId>
  <version>${opencensus.version}</version>
</dependency>
<dependency>
  <groupId>io.opencensus</groupId>
  <artifactId>opencensus-impl</artifactId>
  <version>${opencensus.version}</version>
</dependency>
<dependency>
  <groupId>io.opencensus</groupId>
  <artifactId>opencensus-exporter-stats-stackdriver</artifactId>
  <version>${opencensus.version}</version>
</dependency>

Node.js

  1. Before installing the OpenCensus core and exporter libraries, make sure you've prepared your environment for Node.js development.
  2. The easiest way to install OpenCensus is with npm:
    npm install @opencensus/core
    npm install @opencensus/exporter-stackdriver
  3. Place the require statements shown below at the top of your application's main script or entry point, before any other code:
const {globalStats, MeasureUnit, AggregationType} = require('@opencensus/core');
const {StackdriverStatsExporter} = require('@opencensus/exporter-stackdriver');

Python

Install the OpenCensus core and Stackdriver exporter libraries by using the following command:

pip install -r opencensus/requirements.txt

The requirements.txt file is in the GitHub repository for these samples, python-docs-samples.

Write custom metrics with OpenCensus

Instrumenting your code to use OpenCensus for metrics involves three steps:

  1. Import the OpenCensus stats and OpenCensus Stackdriver exporter packages.
  2. Initialize the Stackdriver exporter.
  3. Use the OpenCensus API to instrument your code.

The following example is a minimal program that writes metric data using OpenCensus. The program runs a loop and collects latency measures, and when the loop finishes, it exports the stats to Cloud Monitoring and exits:

Go


// metrics_quickstart is an example of exporting a custom metric from
// OpenCensus to Stackdriver.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"golang.org/x/exp/rand"
)

var (
	// The task latency in milliseconds.
	latencyMs = stats.Float64("task_latency", "The task latency in milliseconds", "ms")
)

func main() {
	ctx := context.Background()

	// Register the view. It is imperative that this step exists,
	// otherwise recorded metrics will be dropped and never exported.
	v := &view.View{
		Name:        "task_latency_distribution",
		Measure:     latencyMs,
		Description: "The distribution of the task latencies",

		// Latency in buckets:
		// [>=0ms, >=100ms, >=200ms, >=400ms, >=1s, >=2s, >=4s]
		Aggregation: view.Distribution(0, 100, 200, 400, 1000, 2000, 4000),
	}
	if err := view.Register(v); err != nil {
		log.Fatalf("Failed to register the view: %v", err)
	}

	// Enable OpenCensus exporters to export metrics
	// to Stackdriver Monitoring.
	// Exporters use Application Default Credentials to authenticate.
	// See https://developers.google.com/identity/protocols/application-default-credentials
	// for more details.
	exporter, err := stackdriver.NewExporter(stackdriver.Options{})
	if err != nil {
		log.Fatal(err)
	}
	// Flush must be called before main() exits to ensure metrics are recorded.
	defer exporter.Flush()

	if err := exporter.StartMetricsExporter(); err != nil {
		log.Fatalf("Error starting metric exporter: %v", err)
	}
	defer exporter.StopMetricsExporter()

	// Record 100 fake latency values between 0 and 5 seconds.
	for i := 0; i < 100; i++ {
		ms := float64(5*time.Second/time.Millisecond) * rand.Float64()
		fmt.Printf("Latency %d: %f\n", i, ms)
		stats.Record(ctx, latencyMs.M(ms))
		time.Sleep(1 * time.Second)
	}

	fmt.Println("Done recording metrics")
}

Java


import com.google.common.collect.Lists;
import io.opencensus.exporter.stats.stackdriver.StackdriverStatsExporter;
import io.opencensus.stats.Aggregation;
import io.opencensus.stats.BucketBoundaries;
import io.opencensus.stats.Measure.MeasureLong;
import io.opencensus.stats.Stats;
import io.opencensus.stats.StatsRecorder;
import io.opencensus.stats.View;
import io.opencensus.stats.View.Name;
import io.opencensus.stats.ViewManager;
import java.io.IOException;
import java.util.Collections;
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class Quickstart {
  private static final int EXPORT_INTERVAL = 70;
  private static final MeasureLong LATENCY_MS =
      MeasureLong.create("task_latency", "The task latency in milliseconds", "ms");
  // Latency in buckets:
  // [>=0ms, >=100ms, >=200ms, >=400ms, >=1s, >=2s, >=4s]
  private static final BucketBoundaries LATENCY_BOUNDARIES =
      BucketBoundaries.create(Lists.newArrayList(0d, 100d, 200d, 400d, 1000d, 2000d, 4000d));
  private static final StatsRecorder STATS_RECORDER = Stats.getStatsRecorder();

  public static void main(String[] args) throws IOException, InterruptedException {
    // Register the view. It is imperative that this step exists,
    // otherwise recorded metrics will be dropped and never exported.
    View view =
        View.create(
            Name.create("task_latency_distribution"),
            "The distribution of the task latencies.",
            LATENCY_MS,
            Aggregation.Distribution.create(LATENCY_BOUNDARIES),
            Collections.emptyList());

    ViewManager viewManager = Stats.getViewManager();
    viewManager.registerView(view);

    // Enable OpenCensus exporters to export metrics to Stackdriver Monitoring.
    // Exporters use Application Default Credentials to authenticate.
    // See https://developers.google.com/identity/protocols/application-default-credentials
    // for more details.
    StackdriverStatsExporter.createAndRegister();

    // Record 100 fake latency values between 0 and 5 seconds.
    Random rand = new Random();
    for (int i = 0; i < 100; i++) {
      long ms = (long) (TimeUnit.MILLISECONDS.convert(5, TimeUnit.SECONDS) * rand.nextDouble());
      System.out.println(String.format("Latency %d: %d", i, ms));
      STATS_RECORDER.newMeasureMap().put(LATENCY_MS, ms).record();
    }

    // The default export interval is 60 seconds. The thread with the StackdriverStatsExporter must
    // live for at least the interval past any metrics that must be collected, or some risk being
    // lost if they are recorded after the last export.

    System.out.println(
        String.format(
            "Sleeping %d seconds before shutdown to ensure all records are flushed.",
            EXPORT_INTERVAL));
    Thread.sleep(TimeUnit.MILLISECONDS.convert(EXPORT_INTERVAL, TimeUnit.SECONDS));
  }
}

Node.js

'use strict';

const {globalStats, MeasureUnit, AggregationType} = require('@opencensus/core');
const {StackdriverStatsExporter} = require('@opencensus/exporter-stackdriver');

const EXPORT_INTERVAL = process.env.EXPORT_INTERVAL || 60;
const LATENCY_MS = globalStats.createMeasureInt64(
  'task_latency',
  MeasureUnit.MS,
  'The task latency in milliseconds'
);

// Register the view. It is imperative that this step exists,
// otherwise recorded metrics will be dropped and never exported.
const view = globalStats.createView(
  'task_latency_distribution',
  LATENCY_MS,
  AggregationType.DISTRIBUTION,
  [],
  'The distribution of the task latencies.',
  // Latency in buckets:
  // [>=0ms, >=100ms, >=200ms, >=400ms, >=1s, >=2s, >=4s]
  [0, 100, 200, 400, 1000, 2000, 4000]
);

// Register the view with the global stats object
globalStats.registerView(view);

// Enable OpenCensus exporters to export metrics to Stackdriver Monitoring.
// Exporters use Application Default Credentials (ADCs) to authenticate.
// See https://developers.google.com/identity/protocols/application-default-credentials
// for more details.
// Expects ADCs to be provided through the environment as ${GOOGLE_APPLICATION_CREDENTIALS}
// A Stackdriver workspace is required and provided through the environment as ${GOOGLE_PROJECT_ID}
const projectId = process.env.GOOGLE_PROJECT_ID;

// GOOGLE_APPLICATION_CREDENTIALS is read by a dependency of this code,
// not by this code itself. Check that it exists, but don't retain the value.
if (!projectId || !process.env.GOOGLE_APPLICATION_CREDENTIALS) {
  throw Error('Unable to proceed without a project ID and application credentials');
}

// The minimum reporting period for Stackdriver is 1 minute.
const exporter = new StackdriverStatsExporter({
  projectId: projectId,
  period: EXPORT_INTERVAL * 1000,
});

// Pass the created exporter to Stats
globalStats.registerExporter(exporter);

// Record 100 fake latency values between 0 and 5 seconds.
for (let i = 0; i < 100; i++) {
  const ms = Math.floor(Math.random() * 5 * 1000);
  console.log(`Latency ${i}: ${ms}`);
  globalStats.record([
    {
      measure: LATENCY_MS,
      value: ms,
    },
  ]);
}

/**
 * The default export interval is 60 seconds. The thread with the
 * StackdriverStatsExporter must live for at least the interval past any
 * metrics that must be collected, or some risk being lost if they are recorded
 * after the last export.
 */
setTimeout(() => {
  console.log('Done recording metrics.');
  globalStats.unregisterExporter(exporter);
}, EXPORT_INTERVAL * 1000);

Python


from random import random
import time

from opencensus.ext.stackdriver import stats_exporter
from opencensus.stats import aggregation
from opencensus.stats import measure
from opencensus.stats import stats
from opencensus.stats import view


# A measure that represents task latency in ms.
LATENCY_MS = measure.MeasureFloat(
    "task_latency",
    "The task latency in milliseconds",
    "ms")

# A view of the task latency measure that aggregates measurements according to
# a histogram with predefined bucket boundaries. This aggregate is periodically
# exported to Stackdriver Monitoring.
LATENCY_VIEW = view.View(
    "task_latency_distribution",
    "The distribution of the task latencies",
    [],
    LATENCY_MS,
    # Latency in buckets: [>=0ms, >=100ms, >=200ms, >=400ms, >=1s, >=2s, >=4s]
    aggregation.DistributionAggregation(
        [100.0, 200.0, 400.0, 1000.0, 2000.0, 4000.0]))


def main():
    # Register the view. Measurements are only aggregated and exported if
    # they're associated with a registered view.
    stats.stats.view_manager.register_view(LATENCY_VIEW)

    # Create the Stackdriver stats exporter and start exporting metrics in the
    # background, once every 60 seconds by default.
    exporter = stats_exporter.new_stats_exporter()
    print('Exporting stats to project "{}"'
          .format(exporter.options.project_id))

    # Register exporter to the view manager.
    stats.stats.view_manager.register_exporter(exporter)

    # Record 100 fake latency values between 0 and 5 seconds.
    for num in range(100):
        ms = random() * 5 * 1000

        mmap = stats.stats.stats_recorder.new_measurement_map()
        mmap.measure_float_put(LATENCY_MS, ms)
        mmap.record()

        print("Fake latency recorded ({}: {})".format(num, ms))

    # Keep the thread alive long enough for the exporter to export at least
    # once.
    time.sleep(65)


if __name__ == '__main__':
    main()
When this metric data is exported to Cloud Monitoring, you can use it like any other data.

The program creates an OpenCensus view that is called task_latency_distribution. This string becomes part of the name of the metric when it is exported to Cloud Monitoring. See Retrieve metric descriptors for how the OpenCensus view is realized as a Cloud Monitoring metric descriptor. You can therefore use the view name as a search string when selecting a metric to chart.
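As a rough sketch, the mapping from view name to Cloud Monitoring metric type looks like the following (this helper is illustrative, not part of the exporter's API; the prefix matches the descriptor output shown later in this document):

```python
def opencensus_metric_type(view_name: str) -> str:
    """Return the Cloud Monitoring metric type for an OpenCensus view name.

    The Stackdriver exporter prefixes exported views with the
    custom-metrics domain; this helper mirrors that convention.
    """
    return "custom.googleapis.com/opencensus/" + view_name


print(opencensus_metric_type("task_latency_distribution"))
```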

If you have run the sample program, then you can use Metrics Explorer to look at your data:
  1. In the Google Cloud console, go to the Metrics Explorer page within Monitoring.

    Go to Metrics Explorer

  2. In the toolbar, select the Explorer tab.
  3. Select the Configuration tab.
  4. Expand the Select a metric menu, enter OpenCensus/task_latency_distribution in the filter bar, and then use the submenus to select a specific resource type and metric:
    1. In the Active resources menu, select your monitored resource. If you run the program on a local environment, then select Global.
    2. In the Active metric categories menu, select Custom.
    3. In the Active metrics menu, select Task latency distribution.
    4. Click Apply.
  5. Optional: To configure how the data is viewed, add filters and use the Group By, Aggregator, and chart-type menus. For this chart, expand the Line chart menu and then select Heatmap chart. For more information, see Select metrics when using Metrics Explorer.
  6. Optional: Change the graph settings:
    • For quota and other metrics that report one sample per day, set the time frame to at least one week and set the plot type to Stacked bar chart.
    • For distribution-valued metrics, set the plot type to Heatmap chart.

The following screenshot shows the time series collected after running the program on a local environment:

Metrics from OpenCensus in Cloud Monitoring.

Each bar in the heatmap represents one run of the program, and the colored components of each bar represent buckets in the latency distribution.

Read OpenCensus metrics in Cloud Monitoring

You use custom metrics, including those written by OpenCensus, in the same ways that you use built-in metrics. You can chart them, set alerts on them, read them, and otherwise monitor them.

This section illustrates how to use APIs Explorer to read metric data. For information about how to read metric data by using the Cloud Monitoring API or by using client libraries, see the following documents:

  • Browsing metrics explains how to list and examine your custom and built-in metrics.
  • Reading metrics explains how to retrieve time series data from custom and built-in metrics using the Monitoring API.

For example, the screenshot shown in the previous section is from Metrics Explorer. When you use charting tools, we recommend that you use the name of the OpenCensus view to filter the list of metrics. For more information, see Select metrics when using Metrics Explorer.

Retrieve metric descriptors

To retrieve the metric data by using the Monitoring API directly, you need to know the Cloud Monitoring names to which the OpenCensus metrics were exported. You can determine these names by retrieving the metric descriptors that the exporter creates and then looking at the type field. For details on metric descriptors, see MetricDescriptor.

To view the metric descriptors created for the exported metrics, do the following:

  1. Go to the metricDescriptors.list reference page.
  2. In the Try this API widget on the reference page, complete the following fields:

    1. Enter the name of your project in the name field. Use the following name structure: projects/PROJECT_ID. This document uses a project with the ID a-gcp-project.

    2. Enter a filter into the filter field. There are many metric descriptors in a project. Filtering lets you eliminate those descriptors that aren't of interest.

      For example, because the name of the OpenCensus view becomes part of metric name, you can add a filter like this:

      metric.type=has_substring("task_latency_distribution")

      The key metric.type is a field of the Metric type embedded in a time series. See TimeSeries for details.

    3. Click Execute.

The following shows the returned metric descriptor:

    {
      "metricDescriptors": [
        {
          "name": "projects/a-gcp-project/metricDescriptors/custom.googleapis.com/opencensus/task_latency_distribution",
          "labels": [
            {
              "key": "opencensus_task",
              "description": "Opencensus task identifier"
            }
          ],
          "metricKind": "CUMULATIVE",
          "valueType": "DISTRIBUTION",
          "unit": "ms",
          "description": "The distribution of the task latencies",
          "displayName": "OpenCensus/task_latency_distribution",
          "type": "custom.googleapis.com/opencensus/task_latency_distribution"
        }
      ]
    }

This line in the metric descriptor tells you the name of the metric type in Cloud Monitoring:

    "type": "custom.googleapis.com/opencensus/task_latency_distribution"

You now have the information you need to manually retrieve the data associated with the metric type. The value of the type field is also shown in the Google Cloud console when you chart the metric.
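For instance, the metric type can be pulled out of the API response programmatically. The following sketch parses a trimmed copy of the JSON response shown above (embedded here as a string rather than fetched from the API):

```python
import json

# A trimmed copy of the metricDescriptors.list response shown above.
response = json.loads("""
{
  "metricDescriptors": [
    {
      "name": "projects/a-gcp-project/metricDescriptors/custom.googleapis.com/opencensus/task_latency_distribution",
      "metricKind": "CUMULATIVE",
      "valueType": "DISTRIBUTION",
      "unit": "ms",
      "type": "custom.googleapis.com/opencensus/task_latency_distribution"
    }
  ]
}
""")

# The type field is the name you use to query the metric's time series.
types = [d["type"] for d in response["metricDescriptors"]]
print(types[0])
```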

Retrieve metric data

To manually retrieve time-series data from a metric type, do the following:

  1. Go to the timeSeries.list reference page.
  2. In the Try this API widget on the reference page, complete the following fields:

    1. Enter the name of your project in the name field. Use the following name structure: projects/PROJECT_ID.
    2. In the filter field, enter the following value:

      metric.type="custom.googleapis.com/opencensus/task_latency_distribution"

    3. Enter values for the interval.startTime and interval.endTime fields. These values must be entered as timestamps, for example, 2018-10-11T15:48:38-04:00. Ensure that the startTime value is earlier than the endTime value.

    4. Click the Execute button.
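If you build these interval timestamps in code rather than typing them into the widget, an aware Python datetime produces the expected RFC 3339 form; a minimal sketch:

```python
from datetime import datetime, timedelta, timezone

# End the interval at a fixed UTC time and start it one hour earlier.
end_time = datetime(2018, 10, 11, 19, 48, 38, tzinfo=timezone.utc)
start_time = end_time - timedelta(hours=1)

# isoformat() on a timezone-aware datetime yields an RFC 3339 timestamp
# suitable for interval.startTime and interval.endTime.
print("interval.startTime =", start_time.isoformat())
print("interval.endTime   =", end_time.isoformat())
```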

The following shows the result of one such retrieval:

    {
      "timeSeries": [
        {
          "metric": {
            "labels": {
              "opencensus_task": "java-3424@docbuild"
            },
            "type": "custom.googleapis.com/opencensus/task_latency_distribution"
          },
          "resource": {
            "type": "gce_instance",
            "labels": {
              "instance_id": "2455918024984027105",
              "zone": "us-east1-b",
              "project_id": "a-gcp-project"
            }
          },
          "metricKind": "CUMULATIVE",
          "valueType": "DISTRIBUTION",
          "points": [
            {
              "interval": {
                "startTime": "2019-04-04T17:49:34.163Z",
                "endTime": "2019-04-04T17:50:42.917Z"
              },
              "value": {
                "distributionValue": {
                  "count": "100",
                  "mean": 2610.11,
                  "sumOfSquaredDeviation": 206029821.78999996,
                  "bucketOptions": {
                    "explicitBuckets": {
                      "bounds": [
                        0,
                        100,
                        200,
                        400,
                        1000,
                        2000,
                        4000
                      ]
                    }
                  },
                  "bucketCounts": [
                    "0",
                    "0",
                    "1",
                    "6",
                    "13",
                    "15",
                    "44",
                    "21"
                  ]
                }
              }
            }
          ]
        },
        [ ... data from additional program runs deleted ...]
      ]
    }

The returned metric data includes the following:

  • Information about the monitored resource on which the data was collected. OpenCensus can automatically detect gce_instance, k8s_container, and aws_ec2_instance monitored resources. This data came from a program run on a Compute Engine instance. For information on using other monitored resources, see Set monitored resource for exporter.
  • Description of the kind of metric and the type of the values.
  • The actual data points collected within the time interval requested.
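To make the distribution shape concrete, the following sketch decodes the bucketOptions and bucketCounts from the sample point above (the bounds and counts are copied from the response; the bucket labels are illustrative):

```python
# Values copied from the distributionValue in the sample response.
bounds = [0, 100, 200, 400, 1000, 2000, 4000]
bucket_counts = [0, 0, 1, 6, 13, 15, 44, 21]

# With N explicit bounds there are N+1 buckets:
# (-inf, b0), [b0, b1), ..., [bN-1, +inf)
labels = (
    ["(-inf, %g)" % bounds[0]]
    + ["[%g, %g)" % (lo, hi) for lo, hi in zip(bounds, bounds[1:])]
    + ["[%g, +inf)" % bounds[-1]]
)

total = sum(bucket_counts)
for label, count in zip(labels, bucket_counts):
    print(f"{label:>14}: {count:3d} ({100 * count / total:.0f}%)")
```

Note that the counts sum to 100, matching the count field of the sample point.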

How Monitoring represents OpenCensus metrics

Direct use of the Cloud Monitoring API for custom metrics is supported; that approach is described in Create custom metrics with the API. In fact, the OpenCensus exporter for Cloud Monitoring uses this API for you. This section provides some information about how Cloud Monitoring represents the metrics written by OpenCensus.

The constructs used by the OpenCensus API differ from the constructs used by Cloud Monitoring, as does some of the terminology. Where Cloud Monitoring refers to “metrics”, OpenCensus sometimes refers to “stats”. For example, the component of OpenCensus that sends metric data to Cloud Monitoring is called the “stats exporter for Stackdriver”.

For an overview of the OpenCensus model for metrics, see OpenCensus Metrics.

The data models for OpenCensus stats and Cloud Monitoring metrics do not fall into a neat 1:1 mapping. Many of the same concepts exist in each, but they are not directly interchangeable.

  • An OpenCensus view is analogous to the MetricDescriptor in the Monitoring API. A view describes how to collect and aggregate individual measurements. Tags are included with all recorded measurements.

  • An OpenCensus tag is a key-value pair. An OpenCensus tag corresponds generally to the LabelDescriptor in the Monitoring API. Tags let you capture contextual information that you can use to filter and group metrics.

  • An OpenCensus measure describes metric data to be recorded. An OpenCensus aggregation is a function applied to data used to summarize it. These functions are used in exporting to determine the MetricKind, ValueType, and unit reported in the Cloud Monitoring metric descriptor.

  • An OpenCensus measurement is a collected data point. Measurements must be aggregated into views. Otherwise, the individual measurements are dropped. An OpenCensus measurement is analogous to a Point in the Monitoring API. When measurements are aggregated in views, the aggregated data is stored as view data, analogous to a TimeSeries in the Monitoring API.
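As an illustration of the last two points, the Stackdriver exporters translate each OpenCensus aggregation into a Monitoring metric kind and value type roughly as follows. This table is a simplified sketch: for Sum and LastValue, the value type actually follows the underlying measure; a float measure is assumed here.

```python
# Approximate mapping from OpenCensus aggregation to the MetricKind and
# ValueType the Stackdriver exporter reports (assuming a float measure
# for Sum and LastValue).
AGGREGATION_TO_MONITORING = {
    "Count":        ("CUMULATIVE", "INT64"),
    "Sum":          ("CUMULATIVE", "DOUBLE"),
    "Distribution": ("CUMULATIVE", "DISTRIBUTION"),
    "LastValue":    ("GAUGE",      "DOUBLE"),
}

# The quickstart's view uses a Distribution aggregation, so its
# descriptor reports CUMULATIVE / DISTRIBUTION, as shown earlier.
kind, value_type = AGGREGATION_TO_MONITORING["Distribution"]
print(kind, value_type)
```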

What's next