This document describes how you can create and interpret a chart that displays metric data of the Distribution value type.
This value type is used by services when the individual measurements are too numerous to collect, but statistical information about those measurements, such as averages or percentiles, is valuable. For example, when an application relies on HTTP traffic, you can use a distribution-valued metric that captures HTTP response latency to evaluate how quickly HTTP requests complete.
To illustrate how a histogram is created, consider a service that measures the HTTP latency of requests and reports this data by using a distribution-valued metric. The data is reported every minute. The service defines ranges of values for the metric, called buckets, and records the count of measured values that fall into each bucket. For example, when an HTTP request completes, the service increments the count in the bucket whose range includes the request's latency value. These counts create a histogram of values for that minute.
Assume that the latencies measured in a one-minute interval are 5, 1, 3, 5, 6, 10, and 14. If the buckets are [0, 4), [4, 8), [8, 12), and [12, 16), then the histogram of this data is [2, 3, 1, 1]. The following table shows how individual measurements affect the count for each bucket:
Bucket | Latency measurements | Number of values in the bucket |
---|---|---|
[12,16) | 14 | 1 |
[8,12) | 10 | 1 |
[4,8) | 5, 5, 6 | 3 |
[0,4) | 1, 3 | 2 |
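The bucketing step can be sketched in Python. This is a minimal illustration, not how any Monitoring client library works: the function name bucket_counts is hypothetical, and the sketch knows only the four finite buckets from the example, so it rejects values outside [0, 16). Bucket layouts in Cloud Monitoring also include an underflow and an overflow bucket, which this sketch omits.

```python
import bisect

# Bucket bounds from the example: [0, 4), [4, 8), [8, 12), [12, 16).
BOUNDS = [0, 4, 8, 12, 16]


def bucket_counts(measurements, bounds=BOUNDS):
    """Return how many measurements fall into each [lower, upper) bucket."""
    counts = [0] * (len(bounds) - 1)
    for value in measurements:
        # bisect_right finds the first bound greater than value;
        # the bucket that contains value starts at the bound just before it.
        i = bisect.bisect_right(bounds, value) - 1
        if not 0 <= i < len(counts):
            # No underflow or overflow buckets in this sketch.
            raise ValueError(f"{value} is outside [{bounds[0]}, {bounds[-1]})")
        counts[i] += 1  # increment the bucket whose range includes the value
    return counts


latencies = [5, 1, 3, 5, 6, 10, 14]  # one minute of measurements
print(bucket_counts(latencies))  # [2, 3, 1, 1]
```

Only the returned list of counts would be written to the time series; the latencies list itself is discarded.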
When this data is written to the time series, a Point object is created. For metrics with a distribution value, that object includes the histogram of values. For this sampling period, the Point contains [2, 3, 1, 1]. The individual measurements aren't written to the time series.
Assume that the previous table records the histogram for the latency data as measured at time 1:00. That table illustrates how to take a series of measurements and convert them into bucket counts. Suppose that the bucket counts at times 1:01, 1:02, and 1:03 are as shown in the following table:
Bucket | Histogram for 1:00 | Histogram for 1:01 | Histogram for 1:02 | Histogram for 1:03 |
---|---|---|---|---|
[12,16) | 1 | 6 | 0 | 1 |
[8,12) | 1 | 0 | 2 | 2 |
[4,8) | 3 | 1 | 1 | 8 |
[0,4) | 2 | 6 | 10 | 3 |
The previous table displays a sequence of histograms indexed by time. Each column in the table represents the latency data for a one-minute period. To get the number of measurements in a specific period, sum the bucket counts in that column. The actual measurements aren't shown because distribution-valued metrics don't store them.
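As a sketch of the summing step, the following Python snippet stores each column of the table as a list ordered from the lowest bucket to the highest (the reverse of the table's top-down order) and adds up the bucket counts for each minute. The names histograms and measurement_count are hypothetical.

```python
# Each column of the table, listed from bucket [0, 4) up to bucket [12, 16).
histograms = {
    "1:00": [2, 3, 1, 1],
    "1:01": [6, 1, 0, 6],
    "1:02": [10, 1, 2, 0],
    "1:03": [3, 8, 2, 1],
}


def measurement_count(histogram):
    """Number of measurements in one sampling period: the sum of its bucket counts."""
    return sum(histogram)


for time, histogram in histograms.items():
    print(time, measurement_count(histogram))
# 1:00 7
# 1:01 13
# 1:02 13
# 1:03 14
```

The totals say how many measurements were taken in each minute, but not what their values were; only the bucket that each value fell into is known.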
Heatmap charts
Heatmap charts are designed to display a single time series with distribution values. For these charts, the X-axis represents time, the Y-axis represents the buckets, and color represents the bucket count. Brighter colors indicate higher values; for example, dark areas of the heatmap indicate lower bucket counts than yellow or white areas.
The following figure is one representation of a heatmap for the previous example:
In the previous figure, the heatmap uses black to represent the smallest bucket count, 0, and yellow to represent the largest bucket count, 10. Reds and oranges represent values between these two extremes.
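A heatmap is essentially the preceding bucket-by-time table drawn as a grid of colored cells. The following Python sketch shows one way to turn the bucket counts into color intensities. It assumes a linear scale from the smallest count (0, drawn black) to the largest count (10, drawn yellow); that scale is an illustrative choice, not a description of the color scale that Monitoring uses, and the function name intensities is hypothetical.

```python
# Bucket counts from the table: rows are buckets (highest first, as in the
# table), columns are the times 1:00, 1:01, 1:02, and 1:03.
grid = [
    [1, 6, 0, 1],   # [12,16)
    [1, 0, 2, 2],   # [8,12)
    [3, 1, 1, 8],   # [4,8)
    [2, 6, 10, 3],  # [0,4)
]


def intensities(grid):
    """Scale every count linearly to [0, 1]: 0.0 is the smallest count, 1.0 the largest."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [[(count - lo) / span for count in row] for row in grid]


for row in intensities(grid):
    print(["%.1f" % v for v in row])
```

A plotting library would then map each intensity to a color between black and yellow.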
Because heatmap charts can display only a single time series, you must set the aggregation options to combine all time series.
To use Metrics Explorer to display the sum of the RTT latencies of a VM instance, do the following:
- In the Google Cloud console, go to the Metrics Explorer page within Monitoring.
- In the Select a metric pane, expand the Metric menu, enter RTT latencies in the filter bar, and then use the submenus to select a specific resource type and metric:
  - In the Active resources menu, select VM Instance.
  - In the Active metric categories menu, select Vm_flow.
  - In the Active metrics menu, select RTT latencies.
  - Click Apply.
- In the Group By section, do the following:
  - Select Sum as the Grouping function.
  - Ensure that no Labels are selected.
- In the Display pane, ensure that the Widget type menu is set to Heatmap.
In the previous example, the heatmap chart is configured by selecting values from menus. However, you can also use Monitoring Query Language (MQL) to chart distribution-valued metrics. For example, you can select the MQL tab in Metrics Explorer and then enter the following query:
fetch gce_instance
| metric 'networking.googleapis.com/vm_flow/rtt'
| align delta(1m)
| every 1m
| group_by [], [aggregate(value.rtt)]
In the previous expression, the time-series data is fetched, aligned, and then grouped. The alignment process uses a delta aligner with a one-minute alignment period. Because the first argument to group_by is [], all time series are combined.
The second argument, [aggregate(value.rtt)], defines how the time series are combined. In this example, for each timestamp, the values of the rtt field of the different time series are combined with the aggregate function, which applies a default aggregation that MQL selects based on the type of the values.
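To illustrate why an empty label list combines all time series, here is a Python sketch of grouping aligned distribution values by label. It assumes that the combining step is a bucket-wise sum, like the Sum grouping function in the menu-based example; the group_series helper, the zone label values, and the histograms are hypothetical.

```python
from collections import defaultdict


def group_series(series, labels):
    """Group aligned time series by label values and add each group's histograms bucket by bucket.

    series: list of (labels_dict, histogram) pairs, all aligned to the same timestamp.
    labels: label names to group by; an empty list puts every series in one group.
    """
    groups = defaultdict(list)
    for label_values, histogram in series:
        key = tuple(label_values[name] for name in labels)  # () when labels is empty
        groups[key].append(histogram)
    return {key: [sum(counts) for counts in zip(*hists)] for key, hists in groups.items()}


# Hypothetical aligned distribution values for three time series.
series = [
    ({"zone": "us-east1-b"}, [2, 3, 1, 1]),
    ({"zone": "us-east1-b"}, [1, 5, 2, 2]),
    ({"zone": "europe-west1-d"}, [0, 1, 0, 1]),
]

print(group_series(series, []))        # {(): [3, 9, 3, 4]}
print(group_series(series, ["zone"]))  # one combined histogram per zone
```

With no labels, every series maps to the same empty key, so the result is a single histogram, which is what a heatmap needs.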
If you use menus to select the metric and then select the MQL tab, your selections are converted into an MQL query in strict form:
fetch gce_instance
| metric 'networking.googleapis.com/vm_flow/rtt'
| align delta(1m)
| every 1m
| group_by [], [value_rtt_aggregate: aggregate(value.rtt)]
The previous expression is functionally equivalent to the original MQL example.
For more information about MQL, see Monitoring Query Language overview.
Line and bar charts
Line charts, stacked bar charts, and stacked line charts, which are designed to display scalar data, can't display distribution values. To display a distribution-valued metric with one of these chart types, you must convert the histogram values into scalar values. For example, you could compute the sum of the values in the histogram, or you could select a percentile.
For example, each row in the following table includes a timestamp, a histogram, and a sum of histogram values:
Time | Histogram | Sum of histogram values |
---|---|---|
1:00 | [2, 3, 1, 1] | 7 |
1:01 | [6, 1, 0, 6] | 13 |
1:02 | [10, 1, 2, 0] | 13 |
1:03 | [3, 8, 2, 1] | 14 |
You can display the sums of histogram values from the preceding table with an X-Y plot.
For a metric that stores HTTP latency information, the sum is a meaningful measure, because it indirectly represents the number of completed HTTP requests. The data from the preceding table shows that the rate of HTTP request completion is low but relatively constant:
Line charts only display time series with scalar values. To display a distribution-valued metric on a line chart, use the aggregation fields to convert the distribution values into scalar values.
To display the 99th percentile of the RTT latencies of a VM instance, do the following:
- In the Google Cloud console, go to the Metrics Explorer page within Monitoring.
- In the Select a metric pane, expand the Metric menu, enter RTT latencies in the filter bar, and then use the submenus to select a specific resource type and metric:
  - In the Active resources menu, select VM Instance.
  - In the Active metric categories menu, select Vm_flow.
  - In the Active metrics menu, select RTT latencies.
  - Click Apply.
- Configure how the data is viewed. By default, Metrics Explorer adds a grouping that averages all time series. For this chart, in the Group By section, ensure that no Labels are selected and set the Grouping function to 99th percentile. For more information, see Select metrics when using Metrics Explorer.
Aggregation and distribution metrics
Aggregation is the process of regularizing points within a time series and of combining multiple time series. Aggregation works the same way for distribution-valued metrics as it does for metrics that have a value type of integer or double. However, the chart type constrains which aligners and grouping functions you can use.
Heatmap charts
Heatmap charts display one distribution-valued time series. When you have multiple time series, you must use aligners and grouping functions to create a single time series.
Select a sum or delta aligner when a chart displays a heatmap. These functions combine, at the bucket level, all samples for a single time series that are in the same alignment period, and the result is a distribution value. For example, if two adjacent samples of a time series are [2, 3, 1, 1] and [2, 5, 4, 1], then the sum aligner produces [4, 8, 5, 2].
The grouping function defines how different time series are combined. This function is sometimes called an aggregator or a reducer. For heatmaps, set the grouping function to the sum function. The sum function adds the values of the same buckets across all histograms, resulting in a new histogram. For example, the sum of the value [2, 3, 1, 1] from timeseries-A and the value [1, 5, 2, 2] from timeseries-B is [3, 8, 3, 3].
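The sum aligner and the sum grouping function both add histograms bucket by bucket; they differ only in which values they combine: samples within one time series versus aligned values from different time series. A minimal Python sketch, with a hypothetical helper add_histograms, that assumes all histograms share the same bucket bounds:

```python
def add_histograms(*histograms):
    """Add histograms bucket by bucket; every histogram must use the same bucket bounds."""
    if len({len(h) for h in histograms}) > 1:
        raise ValueError("histograms must have the same number of buckets")
    return [sum(counts) for counts in zip(*histograms)]


# Sum aligner: combine two samples of ONE time series in the same alignment period.
aligned = add_histograms([2, 3, 1, 1], [2, 5, 4, 1])
print(aligned)  # [4, 8, 5, 2]

# Sum grouping function: combine aligned values of DIFFERENT time series at one timestamp.
grouped = add_histograms([2, 3, 1, 1], [1, 5, 2, 2])
print(grouped)  # [3, 8, 3, 3]
```

Because the result of either step is still a histogram, it can feed a heatmap directly.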
Line charts
Line charts display only scalar-valued time series. To display a distribution-valued metric on a line chart, use the aligner or the grouping function to convert the distribution values into scalar values:
Percentile aligners convert a distribution value into a scalar value. With these aligners, grouping time series is optional.
Sum and delta aligners don't convert a distribution value into a scalar value. When you use these aligners, select a grouping function that converts distribution values into scalar values.
For example, to configure a line chart on a dashboard to display the 99th percentile of every time series for a distribution-valued metric, do the following:
- In the Google Cloud console, select Monitoring.
- In the navigation pane, select Dashboards, and then select the dashboard that you want to view or edit.
- If the Edit dashboard button is shown, then click it.
- Add a line chart to your dashboard by selecting the line-chart widget from the Chart library.
- Modify the line chart configuration to display a distribution-valued metric for a specific resource. For example, select the RTT latencies metric and the VM instance resource.
- Configure the chart to use a percentile aligner:
  - Basic tab: Clear Grouped and select 99th percentile.
  - Advanced tab: Select Percentile in the preprocessing step and use the menu to select the 99th percentile. Also, ensure that the group-by field is empty and that the group-by function is set to none.
The resulting chart can display multiple lines, one for each time series.
For another example, suppose you want to display a single time series that is the 99th percentile of all the time series of a distribution-valued metric combined. To configure this chart, replace the final step in the previous sequence with the following steps, which specify a sum aligner and set a grouping function:
- Select the Advanced tab.
- In the preprocessing step, select No preprocessing step.
- Set the Alignment function to sum.
- Ensure that the group-by field is empty, and set the Group by function to 99th percentile.
The resulting chart displays a single line.
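The difference between the two configurations can be sketched in Python. The percentile function below estimates a percentile from bucket counts by linear interpolation inside the bucket that contains the requested rank. That interpolation method, the series names vm-a and vm-b, and the reuse of the [0, 4) through [12, 16) buckets from the earlier latency example are all assumptions for illustration, not a description of how Monitoring computes percentiles.

```python
BOUNDS = [0, 4, 8, 12, 16]  # assumed bucket bounds, reused from the latency example


def percentile(histogram, p, bounds=BOUNDS):
    """Estimate the p-th percentile of a histogram by linear interpolation within a bucket."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("cannot take a percentile of an empty histogram")
    rank = p / 100 * total  # how many measurements lie at or below the percentile
    cumulative = 0
    for i, count in enumerate(histogram):
        if count and cumulative + count >= rank:
            # Assume the measurements are spread evenly across the bucket's range.
            fraction = (rank - cumulative) / count
            return bounds[i] + fraction * (bounds[i + 1] - bounds[i])
        cumulative += count
    return bounds[-1]


# Hypothetical aligned values of two time series at one timestamp.
series = {"vm-a": [2, 3, 1, 1], "vm-b": [1, 5, 2, 2]}

# Percentile aligner, no grouping: one value per time series, so one line each.
per_series = {name: percentile(hist, 99) for name, hist in series.items()}

# Sum aligner plus a 99th-percentile grouping function: merge the buckets
# across time series first, then compute one value, so the chart has one line.
merged = [sum(counts) for counts in zip(*series.values())]
single = percentile(merged, 99)

print(per_series)
print(merged, single)
```

The single value comes from the merged histogram, so in general it is not the same as averaging the per-series percentiles.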
What's next
For information about how to determine the bucket model for a metric and how to interpret percentiles, see Percentiles and distribution-valued metrics.