Percentiles and distribution-valued metrics

This document describes how to understand percentiles and the histogram model for metric data with a Distribution value type. A distribution metric defines ranges of values, called buckets, and records the count of measured values that fall into each bucket. Distribution metrics don't report the individual measured values; they report a histogram of counts in buckets. Services use this value type when the individual measurements are too numerous to collect but statistical information about those measurements, such as averages or percentiles, is valuable.

When you chart a distribution-valued metric on a heatmap, you can use an option in the chart toolbar to overlay the 50th, 95th, and 99th percentiles. To display a distribution-valued metric on a line chart, you must configure the chart to convert the distribution value into a numeric value. You can perform this conversion by using an aligner that selects a percentile.

The next section of this page uses a synthetic example to show how percentiles are determined. The example shows that the percentile values depend on the number of buckets, the width of the buckets, and the total count of samples. The percentile values don't depend on the actual measured values because those values aren't available in the histogram.

Example with synthetic data

Consider an Exponential bucket model with a scale of 1 and a growth factor of 2. In a distribution that uses this bucket model, for n ≥ 1, the bucket with index n+1 is twice as wide as the bucket with index n.

This example shows that the width of the bucket determines the maximum error between the computed percentile and the measurements. It also shows that the number of samples in a histogram is important. For example, if the number of samples is less than 20, then the 95th and 99th percentiles are always in the same bucket.
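The 20-sample threshold can be checked with a short sketch. It assumes the nearest-rank rule, in which the percentile p of n samples is the sample at position ceil(p * n / 100). This page doesn't state which rule Cloud Monitoring uses to locate the bucket, so treat the rule and the function name `rank` as illustrative assumptions.

```python
def rank(percentile, count):
    """1-based nearest-rank position of a percentile among `count` samples.

    Assumption: nearest-rank rule, ceil(percentile * count / 100),
    computed with integer arithmetic to avoid floating-point rounding.
    """
    return -(-percentile * count // 100)


# Sample counts for which the 95th and 99th percentiles pick the same sample.
same_rank = [n for n in range(1, 40) if rank(95, n) == rank(99, n)]
print(same_rank)
```

Under this assumption, with fewer than 20 samples both percentiles resolve to the largest sample, so they necessarily share a bucket. At 20 or more samples the ranks differ, although the two samples can still fall into the same bucket.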

Case 1: The total number of samples is 1.

When there is a single measurement, the three percentile values differ but they only show the 50th, 95th, and 99th percentile of the same bucket. The error between the estimate and the actual measurements can't be determined because the measurement isn't known. For example, if the single measurement is in the bucket with range [128, 256), you don't know if the measured value was 128 or 255.

For example, assume that the histogram of measurements is as shown in the following table:

Bucket        Count
[0, 1)        0
[1, 2)        0
[2, 4)        0
[4, 8)        0
[8, 16)       0
[16, 32)      0
[32, 64)      0
[64, 128)     0
[128, 256)    1

To compute the 50th percentile, do the following:

  1. Use the bucket counts to determine that the [128, 256) bucket contains the 50th percentile.
  2. Assume that the measured values within the selected bucket are uniformly distributed.

With these rules, the best estimate of the 50th percentile is the bucket midpoint.

By using the same logic, you can compute any percentile for any bucket. Each row in the following table lists a percentile, the corresponding bucket, and the computed value:

Percentile    Bucket        Value
50th          [128, 256)    192
95th          [128, 256)    249.6
99th          [128, 256)    254.7
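The values in the preceding table can be reproduced with a few lines of Python. This is a minimal sketch of the uniform-distribution assumption described above; the function name `estimate_in_bucket` is hypothetical.

```python
def estimate_in_bucket(lower, upper, percentile):
    """Estimate a percentile, assuming values are spread uniformly over [lower, upper)."""
    return lower + (upper - lower) * percentile / 100


# The single measurement is in the [128, 256) bucket.
estimates = {p: estimate_in_bucket(128, 256, p) for p in (50, 95, 99)}
for p, value in estimates.items():
    print(f"{p}th percentile: {value:.1f}")
```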

The error between the estimate and the actual measurements can be bounded, but it can't be determined because the measurement isn't known.

Case 2: The total number of samples is 10.

When there are 10 samples, the 50th percentile might be in a different bucket than the 95th and 99th percentiles. However, there aren't enough measurements to allow the 95th and 99th percentiles to be in different buckets.

For example, assume that the histogram of measurements is as shown in the following table:

Bucket        Count
[0, 1)        4
[1, 2)        2
[2, 4)        1
[4, 8)        1
[8, 16)       1
[16, 32)      0
[32, 64)      0
[64, 128)     0
[128, 256)    1

By using the process described previously, the percentiles can be computed. Each row in the following table lists a percentile, the corresponding bucket, and the computed value:

Percentile    Bucket        Value    Max error
50th          [1, 2)        1.5      0.5
95th          [128, 256)    249.6    121.6
99th          [128, 256)    254.7    126.7
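The whole procedure for this case, including locating the bucket and bounding the error, can be sketched as follows. The bucket is located with the nearest-rank rule, which is an assumption made for illustration because this page doesn't name the rule. The maximum error is taken as the larger distance from the estimate to either bucket bound. All function and variable names are hypothetical.

```python
BOUNDS = [(0, 1), (1, 2), (2, 4), (4, 8), (8, 16),
          (16, 32), (32, 64), (64, 128), (128, 256)]
COUNTS = [4, 2, 1, 1, 1, 0, 0, 0, 1]


def locate_bucket(counts, percentile):
    """Index of the bucket that holds the percentile (nearest-rank assumption)."""
    total = sum(counts)
    target = max(-(-percentile * total // 100), 1)  # ceil(percentile * total / 100)
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return index
    raise ValueError("histogram contains no samples")


def estimate_with_error(bounds, counts, percentile):
    """Return (bucket, estimate, max_error) for one percentile."""
    lower, upper = bounds[locate_bucket(counts, percentile)]
    value = lower + (upper - lower) * percentile / 100
    # The true value lies somewhere in [lower, upper), so the error is
    # bounded by the larger distance to either end of the bucket.
    max_error = max(value - lower, upper - value)
    return (lower, upper), value, max_error


for p in (50, 95, 99):
    bucket, value, max_error = estimate_with_error(BOUNDS, COUNTS, p)
    print(f"{p}th: bucket={bucket} value={value:.1f} max error={max_error:.1f}")
```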

Example with real data

This section contains an example that illustrates how you can determine the bucket model used by a particular metric. This section also illustrates how you can evaluate the potential error in the computed percentile values.

Identify the bucket model

To determine the buckets used for a metric over a specific time interval, call the Cloud Monitoring API's projects.timeSeries/list method.

For example, to identify the bucket model for a metric, do the following:

  1. Go to the projects.timeSeries/list web page.
  2. In APIs Explorer, enter the filter that specifies the metric, a start time, and an end time.

    For example, to get information about the metric that stores API request latencies, enter the following:

    metric.type="serviceruntime.googleapis.com/api/request_latencies"
    resource.type="consumed_api"
    

    In this example, the filter field specifies a metric type and a resource type. For more information about these filters, see Monitoring filters.

  3. Click Execute.
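If you prefer to issue the request outside of APIs Explorer, the same query can be expressed as a REST URL. The following sketch only builds the URL for the `timeSeries` list endpoint; it doesn't handle authentication or send the request. The project ID `my-project` and the time interval are placeholder assumptions, and the helper name `list_time_series_url` is hypothetical.

```python
from urllib.parse import urlencode


def list_time_series_url(project_id, metric_filter, start_time, end_time):
    """Build the URL for a projects.timeSeries list request (no auth, no I/O)."""
    params = {
        "filter": metric_filter,
        "interval.startTime": start_time,  # RFC 3339 timestamps
        "interval.endTime": end_time,
    }
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    return f"{base}?{urlencode(params)}"


url = list_time_series_url(
    "my-project",  # placeholder project ID
    'metric.type="serviceruntime.googleapis.com/api/request_latencies" '
    'AND resource.type="consumed_api"',
    "2020-11-03T15:05:00Z",
    "2020-11-03T15:06:00Z",
)
print(url)
```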

The following is the list API response for a distribution-valued metric that is available on one Google Cloud project:

{
  "timeSeries": [
    {
      "metric": {...},
      "resource": {...},
      "metricKind": "DELTA",
      "valueType": "DISTRIBUTION",
      "points": [
        {
          "interval": {
            "startTime": "2020-11-03T15:05:00Z",
            "endTime": "2020-11-03T15:06:00Z"
          },
          "value": {
            "distributionValue": {
              "count": "3",
              "mean": 25.889,
              "bucketOptions": {
                "exponentialBuckets": {
                  "numFiniteBuckets": 66,
                  "growthFactor": 1.4,
                  "scale": 1
                }
              },
              "bucketCounts": [
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "3"
              ]
            }
          }
        },

In the API response, the value field describes the data stored in the points array. The count and mean fields report that for the specified time interval there were 3 measurements and their average value was 25.889. The bucketOptions field shows that the exponential model is configured to have 66 buckets, a scale of 1, and a growth factor of 1.4.
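To read these fields programmatically, you can parse the `distributionValue` object and look for the buckets with nonzero counts. The sketch below embeds a trimmed copy of the value shown above. Note that the API returns the counts as strings and omits trailing empty buckets.

```python
import json

# Trimmed copy of the "value" object from the response shown above.
point_value = json.loads("""
{
  "distributionValue": {
    "count": "3",
    "mean": 25.889,
    "bucketOptions": {
      "exponentialBuckets": {"numFiniteBuckets": 66, "growthFactor": 1.4, "scale": 1}
    },
    "bucketCounts": ["0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "3"]
  }
}
""")

dist = point_value["distributionValue"]
model = dist["bucketOptions"]["exponentialBuckets"]
counts = [int(c) for c in dist["bucketCounts"]]  # counts arrive as strings

# Map bucket index -> count for the buckets that received measurements.
non_empty = {index: count for index, count in enumerate(counts) if count > 0}
print(model, non_empty)
```

For this point, all three measurements fall into the bucket with index 10.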

To compute the lower and upper bounds for the bucket with index n, use the following rules:

  • Lower bound = scale * (growth factor)^(n-1)
  • Upper bound = scale * (growth factor)^n
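These two rules translate directly into code. The following is a minimal sketch for finite buckets with n ≥ 1; the function name `exponential_bucket_bounds` is hypothetical, and the defaults are the scale and growth factor from the response above.

```python
def exponential_bucket_bounds(n, scale=1.0, growth_factor=1.4):
    """Return (lower, upper) for finite bucket n (n >= 1) of an exponential model."""
    if n < 1:
        raise ValueError("bucket 0 is the underflow bucket and has no finite lower bound")
    return scale * growth_factor ** (n - 1), scale * growth_factor ** n


for n in (9, 10, 11):
    lower, upper = exponential_bucket_bounds(n)
    print(f"bucket {n}: [{lower:.2f}, {upper:.2f})")
```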

The buckets for this metric, along with the midpoint of each bucket and the percentile contribution are shown in the following table:

Bucket index (n)  Lower bound  Upper bound  Midpoint        Each percentile contribution
0                 -infinity    1            Not applicable  Not applicable
1                 1            1.4          1.2             0.004
2                 1.4          1.96         1.68            0.0056
...
9                 14.76        20.66        17.71           0.0590
10                20.66        28.93        24.79           0.0827
11                28.93        40.50        34.71           0.1157
...
...

To compute the values in the Each percentile contribution column, use the following formula:

    Each percentile contribution = (Upper bound - Lower bound) / 100

For example, for bucket 10, each percentile is (28.93-20.66)/100 = 0.0827.

To compute the 50th percentile for bucket 10, do the following:

    50th percentile = Lower bound + (50 * Each percentile contribution)
                    = 20.66 + 50*0.0827
                    = 24.79
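The same arithmetic can be written as a small function. The sketch below (the name `percentile_in_bucket` is hypothetical) also evaluates the formula with unrounded bounds, which is an illustrative choice for reducing rounding drift rather than something this page prescribes.

```python
def percentile_in_bucket(lower, upper, percentile):
    """Lower bound plus `percentile` steps of one-hundredth of the bucket width."""
    contribution = (upper - lower) / 100  # "Each percentile contribution"
    return lower + percentile * contribution


# Bounds of bucket 10, rounded to two decimals.
from_rounded = percentile_in_bucket(20.66, 28.93, 50)
# Bounds of bucket 10 computed from scale = 1 and growth factor = 1.4.
unrounded = percentile_in_bucket(1.4 ** 9, 1.4 ** 10, 50)
print(f"{from_rounded:.3f} {unrounded:.3f}")
```

The two results differ by less than 0.01; the difference comes from rounding the bucket bounds before applying the formula.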

Verify the percentile computations

Now that the bucket configuration is known, you can predict what values are returned for the 50th, 95th, and 99th percentile values. For example, if the 50th percentile is in bucket number 10, then the 50th percentile value is 24.79.

To retrieve the 50th, 95th, and 99th percentile values of the metric, you can use the API method projects.timeSeries/list, and include an alignment period and aligner. In this example, the following settings were selected:

  • Aligner: ALIGN_PERCENTILE_50, ALIGN_PERCENTILE_95, or ALIGN_PERCENTILE_99
  • Alignment Period: 60 s
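In a REST request, these settings correspond to the `aggregation.alignmentPeriod` and `aggregation.perSeriesAligner` query parameters. The sketch below only builds that part of the query string, one per aligner; the helper name `aggregation_query` is hypothetical, and the result would be appended to a list request that already carries a filter and an interval.

```python
from urllib.parse import urlencode

ALIGNERS = ("ALIGN_PERCENTILE_50", "ALIGN_PERCENTILE_95", "ALIGN_PERCENTILE_99")


def aggregation_query(aligner, period_seconds=60):
    """Query-string fragment that selects a percentile aligner and alignment period."""
    if aligner not in ALIGNERS:
        raise ValueError(f"unexpected aligner: {aligner}")
    return urlencode({
        "aggregation.alignmentPeriod": f"{period_seconds}s",
        "aggregation.perSeriesAligner": aligner,
    })


queries = [aggregation_query(aligner) for aligner in ALIGNERS]
for query in queries:
    print(query)
```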

For the ALIGN_PERCENTILE_50 selection, each value in the time series is the 50th percentile of a bucket:

{
  "timeSeries": [
    {
      "metric": {...},
      "resource": {...},
      "metricKind": "GAUGE",
      "valueType": "DOUBLE",
      "points": [
        {
          "interval": {
            "startTime": "2020-11-03T15:06:36Z",
            "endTime": "2020-11-03T15:06:36Z"
          },
          "value": {
            "doubleValue": 24.793256140799986
          }
        },
        {
          "interval": {
            "startTime": "2020-11-03T15:05:36Z",
            "endTime": "2020-11-03T15:05:36Z"
          },
          "value": {
            "doubleValue": 34.710558597119977
          }
        },
        {
          "interval": {
            "startTime": "2020-11-03T15:04:36Z",
            "endTime": "2020-11-03T15:04:36Z"
          },
          "value": {
            "doubleValue": 24.793256140799986
          }
        }
      ]
    },

For two of the samples, the 50th percentile is in bucket 10; for the other sample, it is in bucket 11.
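One way to confirm which bucket each returned value belongs to is to invert the bound formula with a logarithm and then compare the value with the bucket midpoint. The following sketch assumes the scale of 1 and growth factor of 1.4 reported for this metric; the function name `bucket_index` is hypothetical.

```python
import math


def bucket_index(value, scale=1.0, growth_factor=1.4):
    """Index n of the bucket [scale * g**(n-1), scale * g**n) that contains value."""
    if value < scale:
        return 0  # underflow bucket
    # Values that sit exactly on a bucket bound may need extra care because
    # of floating-point rounding; midpoints are not affected.
    return math.floor(math.log(value / scale, growth_factor)) + 1


# The doubleValue fields from the response above, newest point first.
returned = [24.793256140799986, 34.710558597119977, 24.793256140799986]
indexes = [bucket_index(v) for v in returned]

# Each returned value should equal the midpoint of its bucket.
midpoints = [(1.4 ** (n - 1) + 1.4 ** n) / 2 for n in indexes]
matches = [math.isclose(v, m, rel_tol=1e-9) for v, m in zip(returned, midpoints)]
print(indexes, matches)
```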

The following table shows the results of executing the projects.timeSeries/list method with different aligners. The first row corresponds to the case where the aligner isn't specified. When you don't specify an aligner, the bucket model and mean values are returned. The next three rows list the data returned when the aligner is set to ALIGN_PERCENTILE_50, ALIGN_PERCENTILE_95, and ALIGN_PERCENTILE_99:

Statistic         Sample @ 15:06  Sample @ 15:05  Sample @ 15:04
mean              25.889          33.7435         Not available
50th percentile   24.79           34.71           24.79
95th percentile   28.51           39.91           28.51
99th percentile   28.84           40.37           28.84

The percentiles in the preceding table match what is expected. For example, the analysis with the synthetic data showed that the 50th percentile value is always the midpoint of a bucket, and the values computed by ALIGN_PERCENTILE_50 are also bucket midpoints. Similarly, if you know that the 99th percentile is in bucket 10, then the expected value for the 99th percentile is approximately 20.66 + (99 * 0.0827), or 28.84. The value returned by the ALIGN_PERCENTILE_99 query matches this expected value.

To determine if the 50th percentile is a reasonable estimate of the mean, you can compare the mean value to the 50th percentile. The mean value is returned with the bucket details.

What's next

For information about how to visualize distribution-valued metrics, see Charting distribution metrics.