Distribution metrics

This document describes how you can create and interpret a chart that displays metric data of the Distribution value type. This value type is used by services when the individual measurements are too numerous to collect, but statistical information, such as averages or percentiles, about those measurements is valuable. For example, when an application relies on HTTP traffic, you can use a distribution-valued metric that captures HTTP response latency to evaluate how quickly HTTP requests complete.

To illustrate how a histogram is created, consider a service that measures the HTTP latency of requests and that reports this data by using a metric with a distribution value type. The data is reported every minute. The service defines ranges of values for the metric, called buckets, and records the count of measured values that fall into each bucket. For example, when an HTTP request completes, the service increments the count in the bucket whose range includes the request's latency value. These counts create a histogram of values for that minute.

Assume that the latencies measured in a one-minute interval are 5, 1, 3, 5, 6, 10, and 14. If the buckets are [0, 4), [4, 8), [8, 12), and [12, 16), then the histogram of this data is [2, 3, 1, 1]. The following table shows how individual measurements affect the count for each bucket:

Bucket     Latency measurements    Number of values in the bucket
[12,16)    14                      1
[8,12)     10                      1
[4,8)      5, 5, 6                 3
[0,4)      1, 3                    2

When this data is written to the time series, a Point object is created. For metrics with a distribution value, that object includes the histogram of values. For this sampling period, the Point contains [2, 3, 1, 1]. The individual measurements aren't written to the time series.
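
The bucketing step described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the service's implementation: the function name bucket_counts and its parameters are hypothetical, and the sketch assumes equal-width buckets that start at 0 and ignores any value outside the defined ranges.

```python
def bucket_counts(measurements, width, num_buckets):
    """Count how many measurements fall into each equal-width bucket.

    Bucket i covers the half-open range [i * width, (i + 1) * width).
    Values outside the defined buckets are ignored in this sketch.
    """
    counts = [0] * num_buckets
    for value in measurements:
        index = int(value // width)
        if 0 <= index < num_buckets:
            counts[index] += 1
    return counts


# Latencies from the one-minute interval in the example.
latencies = [5, 1, 3, 5, 6, 10, 14]
histogram = bucket_counts(latencies, width=4, num_buckets=4)
print(histogram)  # [2, 3, 1, 1]
```

Only the returned list of counts corresponds to what the Point stores; the latencies list is discarded.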

Assume that the previous table records the histogram for the latency data as measured at time 1:00. That table illustrates how to take a series of measurements and convert them into bucket counts. Suppose that the bucket counts at times 1:01, 1:02, and 1:03 are as shown in the following table:

Bucket     Histogram for 1:00    Histogram for 1:01    Histogram for 1:02    Histogram for 1:03
[12,16)    1                     6                     0                     1
[8,12)     1                     0                     2                     2
[4,8)      3                     1                     1                     8
[0,4)      2                     6                     10                    3

The previous table displays a sequence of histograms indexed by time. Each column in the table represents the latency data for a one-minute period. To get the number of measurements at a specific time, sum the bucket counts. However, the actual measurements aren't shown because those measurements aren't available in distribution-valued metrics.
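
As a small illustration of summing the bucket counts, the following Python sketch stores each column of the table as a list, ordered from the lowest bucket [0, 4) to the highest bucket [12, 16), and computes the number of measurements per minute. The variable names are hypothetical and chosen only for this example.

```python
# Bucket counts per timestamp, ordered from the lowest bucket [0, 4)
# to the highest bucket [12, 16), as read from the example table.
histograms = {
    "1:00": [2, 3, 1, 1],
    "1:01": [6, 1, 0, 6],
    "1:02": [10, 1, 2, 0],
    "1:03": [3, 8, 2, 1],
}

# The number of measurements in each minute is the sum of its bucket counts.
totals = {time: sum(counts) for time, counts in histograms.items()}
print(totals)  # {'1:00': 7, '1:01': 13, '1:02': 13, '1:03': 14}
```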

Heatmap charts

Heatmap charts are designed to display a single time series with distribution values. For these charts, the X-axis represents time, the Y-axis represents the buckets, and color represents the value. A brighter color indicates a higher value. For example, dark areas of the heatmap indicate lower bucket counts than yellow or white areas.

The following figure is one representation of a heatmap for the previous example:

Heatmap chart for the example.

In the previous figure, the heatmap uses black to represent the smallest bucket count, 0, and yellow to represent the largest bucket count, 10. Reds and oranges represent values between these two extremes.

Because heatmap charts can display only a single time series, when you have multiple time series, you must set the aggregation options to combine them into a single time series. For example, to use Metrics Explorer to create a heatmap chart that shows the sum of the time series, do the following:

  1. In the Google Cloud console, go to the Metrics Explorer page within Monitoring.
  2. In the toolbar, select the Explorer tab.
  3. Select the Configuration tab.
  4. Expand the Select a metric menu, enter RTT Latencies in the filter bar, and then use the submenus to select a specific resource type and metric:
    1. In the Active resources menu, select VM instance.
    2. In the Active metric categories menu, select Vm-flow.
    3. In the Active metrics menu, select RTT Latencies.
    4. Click Apply.
  5. In the Metrics Explorer toolbar, click Line chart, and then select Heatmap.
  6. Use the configuration pane to combine the time series into a single time series:
    • Ensure that the Group by field is empty.
    • Select sum as the Aggregator.

In the previous example, the heatmap chart is configured by selecting values from menus. However, you can also use Monitoring Query Language (MQL) to chart distribution-valued metrics. For example, you can select the MQL tab in Metrics Explorer and then enter the following query:

fetch gce_instance
| metric 'networking.googleapis.com/vm_flow/rtt'
| align delta(1m)
| every 1m
| group_by [], [aggregate(value.rtt)]

In the previous expression, the time-series data is fetched, aligned, and then grouped. The alignment process uses a delta aligner with a one-minute alignment period. Because the first argument to group_by is [], all time series are combined. The second argument, [aggregate(value.rtt)], defines how the time series are combined. In this example, for each timestamp, the values of the rtt field of the different time series are combined with the aggregate function, which is selected by MQL.

If you use menus to select the metric and then select the MQL tab, your selections are converted into an MQL query that is in strict form:

fetch gce_instance
| metric 'networking.googleapis.com/vm_flow/rtt'
| align delta(1m)
| every 1m
| group_by [], [value_rtt_aggregate: aggregate(value.rtt)]

The previous expression is functionally equivalent to the original MQL example.

For more information about MQL, see Introduction to Monitoring Query Language.

Line and bar charts

Line charts, stacked bar charts, and stacked line charts, which are designed to display scalar data, can't display distribution values. To display a metric with a distribution value with one of these chart types, you must convert the histogram values into scalar values. For example, you could compute the sum of the values in the histogram or you could select a percentile.

For example, each row in the following table includes a timestamp, a histogram, and a sum of histogram values:

Time    Histogram        Sum of histogram values
1:00    [2, 3, 1, 1]     7
1:01    [6, 1, 0, 6]     13
1:02    [10, 1, 2, 0]    13
1:03    [3, 8, 2, 1]     14

You can display the sums of histogram values from the preceding table with an X-Y plot.

For a metric that stores HTTP latency information, the sum is a meaningful measure, because it indirectly represents the number of completed HTTP requests. The data from the preceding table shows that the rate of HTTP request completion is low but relatively constant:

Line chart for the example.

Line charts only display time series with scalar values. To display a distribution-valued metric on a line chart, use the aggregation fields to convert the distribution values into scalar values. For example, to use Metrics Explorer to display the 99th percentile of a distribution-valued metric, do the following:

  1. In the Google Cloud console, select Monitoring.
  2. In the navigation pane, select Metrics Explorer.
  3. Select a distribution-valued metric and a resource. For example, select the RTT Latencies metric and the VM instance resource.
  4. Ensure the Metrics Explorer toolbar shows Line chart.
  5. In the configuration pane, select 99th percentile for the Aggregator.

Aggregation and distribution metrics

Aggregation is the process of regularizing points within a time series and of combining multiple time series. Aggregation is the same for distribution type metrics as it is for metrics that have a value type of integer or double. However, the chart type enforces some requirements on the choices used for aligning and grouping time series.

Heatmap charts

Heatmap charts display one distribution-valued time series. When you have multiple time series, you must use aligners and grouping functions to create a single time series.

Select a sum or delta aligner when a chart displays a heatmap. These functions combine, at the bucket level, all samples for a single time series that are in the same alignment period, and the result is a distribution value. For example, if two adjacent samples of a time series are [2, 3, 1, 1] and [2, 5, 4, 1], then the sum aligner produces [4, 8, 5, 2].

The grouping function defines how different time series are combined. This function is sometimes called an aggregator or a reducer. For heatmaps, set the grouping function to the sum function. The sum function adds the values of the same buckets across all histograms, resulting in a new histogram. For example, the sum of the value [2, 3, 1, 1] from timeseries-A and the value [1, 5, 2, 2] from timeseries-B is [3, 8, 3, 3].
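
The sum aligner and the sum grouping function both add histograms bucket by bucket. They differ only in which histograms they combine: samples of one time series within an alignment period, or values from different time series at the same timestamp. The following Python sketch reproduces both examples from this section. The helper name sum_histograms is hypothetical, and the sketch assumes that all histograms use the same bucket boundaries.

```python
def sum_histograms(*histograms):
    """Add histograms bucket by bucket; all must use the same buckets."""
    if not histograms:
        raise ValueError("at least one histogram is required")
    length = len(histograms[0])
    if any(len(h) != length for h in histograms):
        raise ValueError("histograms must have the same number of buckets")
    return [sum(bucket) for bucket in zip(*histograms)]


# Alignment: two adjacent samples of one time series that fall into
# the same alignment period.
aligned = sum_histograms([2, 3, 1, 1], [2, 5, 4, 1])

# Grouping: values at the same timestamp from timeseries-A and timeseries-B.
grouped = sum_histograms([2, 3, 1, 1], [1, 5, 2, 2])

print(aligned)  # [4, 8, 5, 2]
print(grouped)  # [3, 8, 3, 3]
```

In both cases the result is still a histogram, which is why these choices suit a heatmap.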

Line charts

Line charts display only scalar-valued time series. To display a distribution-valued metric on a line chart, use the aligner or the grouping function to convert the distribution values into scalar values:

  • Percentile aligners convert a distribution value into a scalar value. With these aligners, grouping time series is optional.

  • Sum and delta aligners don't convert a distribution value into a scalar value. When you use these aligners, select a grouping function that converts distribution values into scalar values.
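
To make the conversion from a distribution value to a scalar concrete, the following Python sketch estimates a percentile from bucket counts. It is a simplified illustration, not a description of how Cloud Monitoring computes percentiles: the function name estimate_percentile is hypothetical, and the sketch assumes that values are spread evenly within each bucket, so it interpolates linearly inside the bucket that contains the target rank.

```python
def estimate_percentile(counts, bounds, percentile):
    """Estimate a percentile from bucket counts.

    counts[i] is the count for the range [bounds[i], bounds[i + 1]).
    Assumes values are spread evenly inside each bucket.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    # Rank of the requested percentile among all counted values.
    target = total * percentile / 100.0
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            # Interpolate within the bucket that contains the target rank.
            fraction = (target - cumulative) / count
            return bounds[i] + fraction * (bounds[i + 1] - bounds[i])
        cumulative += count
    return bounds[-1]


# Histogram for 1:00 from the earlier example, lowest bucket first.
counts = [2, 3, 1, 1]
bounds = [0, 4, 8, 12, 16]
median = estimate_percentile(counts, bounds, 50)
p99 = estimate_percentile(counts, bounds, 99)
print(median)  # 6.0
print(p99)
```

Because the individual measurements aren't stored, any percentile derived from a histogram is an estimate whose precision depends on the bucket widths.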

For example, to configure a line chart on a dashboard to display the 99th percentile of every time series for a distribution-valued metric, do the following:

  1. In the Google Cloud console, select Monitoring.
  2. In the navigation pane, select Dashboards, then select the dashboard that you want to view or edit.
  3. If the Edit dashboard button is shown, then click it.
  4. Add a line chart to your dashboard by selecting the line-chart widget from the Chart library.
  5. Modify the line chart configuration to display a distribution-valued metric for a specific resource. For example, select the RTT Latencies metric and the VM instance resource.
  6. Configure the chart to use a percentile aligner:

    • Basic tab: Clear Grouped and select 99th percentile.
    • Advanced tab: Select Percentile in the preprocessing step and use the menu to select the 99th percentile. Also, ensure the group-by field is empty and the group-by function is set to none.

The resulting chart can display multiple lines, one for each time series.

For another example, suppose you want to display a single time series that is the 99th percentile of the time series for a distribution-valued metric. To configure this chart, replace the final step in the previous sequence with the following steps, which specify a sum aligner and set a grouping function:

  1. Select the Advanced tab.
  2. For the preprocessing step, select No preprocessing step.
  3. Set the Alignment function to sum.
  4. Ensure the group-by field is empty, and set the Group by function to 99th percentile.

The resulting chart displays a single line.

What's next

For information about how to determine the bucket model for a metric and how to interpret percentiles, see Percentiles and distribution-valued metrics.