# Percentiles and distribution-valued metrics

This document describes how to understand percentiles and the histogram model for metric data with a `Distribution` value type. A distribution metric defines ranges of values, called buckets, and records the count of measured values that fall into each bucket. Distribution metrics don't report the individual measured values; they report a histogram of counts in buckets. This value type is used by services when the individual measurements are too numerous to collect, but statistical information about those measurements, such as averages or percentiles, is valuable.

When you chart a distribution-valued metric on a heatmap, you can use an option in the chart toolbar to overlay the 50th, 95th, and 99th percentiles. To display a distribution-valued metric on a line chart, you must configure the chart to convert the distribution value into a numeric value. You can perform this conversion by using an aligner that selects a percentile.

The next section of this page uses a synthetic example to show how percentiles are determined. The example shows that the percentile values depend on the number of buckets, the width of the buckets, the distribution of the measurements, and the total count of samples. The percentile values don't depend on the actual measured values because those values aren't available in the histogram.

## Example with synthetic data

Consider an `Exponential` bucket model with a scale of one and a growth factor of two. In a distribution that uses this bucket model, the bucket with index n+1 is twice as wide as the bucket with index n.

This example shows that the width of the bucket determines the maximum error between the computed percentile and the measurements. It also shows that the number of samples in a histogram is important. For example, if the number of samples is less than 20, then the 95th and 99th percentiles are always in the same bucket.
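The 20-sample threshold follows from how much of the percentile axis the single largest sample covers. The following is a minimal sketch, assuming that each sample covers an equal share of the 0 to 100 percentile range; the function name is illustrative and isn't part of any API:

```python
def top_sample_range_start(total_samples: int) -> float:
    """Percentile at which the range covered by the largest sample begins.

    Each sample covers 100 / total_samples percentile points, so the
    largest sample covers the range from the returned value up to 100.
    """
    if total_samples < 1:
        raise ValueError("total_samples must be at least 1")
    return 100.0 * (total_samples - 1) / total_samples


for n in (1, 10, 19, 20, 100):
    print(n, round(top_sample_range_start(n), 2))
```

For every count below 20, the returned value is below 95, so the bucket that holds the largest sample contains both the 95th and the 99th percentiles.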

### Case 1: The total number of samples is 1.

When there is a single measurement, the three percentile values differ, but all three are interpolated positions within the same bucket. The error between the estimate and the actual measurement can't be determined because the measurement isn't known. For example, if the single measurement is in the bucket with range [128, 256), you don't know whether the measured value was 128 or 255.

For example, assume that the histogram of measurements is as shown in the following table:

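| Bucket | Count | Percentile range |
| --- | --- | --- |
| [0, 1) | 0 | 0 |
| [1, 2) | 0 | 0 |
| [2, 4) | 0 | 0 |
| [4, 8) | 0 | 0 |
| [8, 16) | 0 | 0 |
| [16, 32) | 0 | 0 |
| [32, 64) | 0 | 0 |
| [64, 128) | 0 | 0 |
| [128, 256) | 1 | 0 - 100 |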

To compute the 50th percentile, do the following:

1. Use the bucket counts to determine that the [128, 256) bucket contains the 50th percentile.
2. Compute the estimate using the following rule:

```
p-th percentile = bucket_low +
    (bucket_up - bucket_low) * (p - p_low) / (p_up - p_low)
```

In the previous expression, `p_low` and `p_up` are the lower and upper bounds of the percentile range for the bucket. Similarly, `bucket_low` and `bucket_up` are the lower and upper bounds of the bucket. The values for `p_low` and `p_up` depend on how the counts are distributed between the different buckets.
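The two steps can be sketched in Python as follows. This is an illustration of the interpolation rule as described on this page, not the service's implementation; the function name and the representation of buckets as `(lower, upper)` tuples are assumptions made for the example:

```python
from typing import Sequence, Tuple


def estimate_percentile(
    buckets: Sequence[Tuple[float, float]],
    counts: Sequence[int],
    p: float,
) -> float:
    """Estimate the p-th percentile (0 < p <= 100) by linear interpolation.

    buckets[i] is the (lower, upper) bound of bucket i and counts[i] is
    the number of samples that fell into it.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    cumulative = 0
    for (bucket_low, bucket_up), count in zip(buckets, counts):
        if count == 0:
            continue  # Empty buckets cover no part of the percentile range.
        p_low = 100.0 * cumulative / total
        cumulative += count
        p_up = 100.0 * cumulative / total
        if p <= p_up:
            fraction = (p - p_low) / (p_up - p_low)
            return bucket_low + (bucket_up - bucket_low) * fraction
    raise ValueError("p must be in the range (0, 100]")


# Exponential buckets with a scale of one and a growth factor of two.
buckets = [(0, 1)] + [(2 ** i, 2 ** (i + 1)) for i in range(8)]
single_sample = [0, 0, 0, 0, 0, 0, 0, 0, 1]
print(estimate_percentile(buckets, single_sample, 50))  # 192.0
```

The loop skips empty buckets and tracks the cumulative count, which yields `p_low` and `p_up` for each non-empty bucket.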

For example, the 50th percentile is computed as:

```
50th percentile = 128 + (256-128)*(50-0)/(100-0)
                = 128 + 128 * 50 / 100
                = 128 + 64
                = 192
```

To compute the 95th percentile, replace `50` with `95` in the previous expression. For this example where there is exactly one sample, the percentiles are as follows:

| Percentile | Bucket | Value |
| --- | --- | --- |
| 50th | [128, 256) | 192 |
| 95th | [128, 256) | 249.6 |
| 99th | [128, 256) | 254.7 |

The error between the estimate and the actual measurements can be bounded, but it can't be determined because the measurement isn't known.

### Case 2: The total number of samples is 10.

When there are 10 samples, the 50th percentile might be in a different bucket than the 95th and 99th percentiles. However, there aren't enough measurements to allow the 95th and 99th percentiles to be in different buckets.

For example, assume that the histogram of measurements is as shown in the following table:

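| Bucket | Count | Percentile range |
| --- | --- | --- |
| [0, 1) | 4 | 0 - 40 |
| [1, 2) | 2 | 40 - 60 |
| [2, 4) | 1 | 60 - 70 |
| [4, 8) | 1 | 70 - 80 |
| [8, 16) | 1 | 80 - 90 |
| [16, 32) | 0 | |
| [32, 64) | 0 | |
| [64, 128) | 0 | |
| [128, 256) | 1 | 90 - 100 |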

You can use the procedure described previously to compute the 50th, 95th, and 99th percentiles. For example, the 50th percentile, which is in the [1, 2) bucket, is computed as follows:

```
50th percentile = 1 + (2-1)*(50-40)/(60-40)
                = 1 + (1 * 10 / 20)
                = 1 + 0.5
                = 1.5
```

Similarly, the 95th percentile is computed as follows:

```
95th percentile = 128 + (256-128)*(95-90)/(100-90)
                = 128 + 128 * 5 / 10
                = 128 + 64
                = 192
```

Each row in the following table lists a percentile, the corresponding bucket, the computed value, and the maximum error between that value and a measurement in the bucket:

| Percentile | Bucket | Value | Max error |
| --- | --- | --- | --- |
| 50th | [1, 2) | 1.5 | 0.5 |
| 95th | [128, 256) | 192 | 64 |
| 99th | [128, 256) | 243.2 | 115.2 |
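The maximum error appears to be the distance from the estimate to the farther edge of its bucket, because the true measurement can lie anywhere in that bucket. The following sketch works under that assumption; the function name is illustrative:

```python
def max_error(bucket_low: float, bucket_up: float, estimate: float) -> float:
    """Largest possible distance between the estimate and a value in the bucket."""
    return max(estimate - bucket_low, bucket_up - estimate)


print(max_error(1, 2, 1.5))                  # 50th percentile in [1, 2)
print(round(max_error(128, 256, 243.2), 1))  # 99th percentile in [128, 256)
```

Because the bound grows with the bucket width, estimates that land in the wide, high-valued buckets of an exponential model carry the largest potential error.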

In this example, and in the previous example, the 95th percentile is in the bucket [128, 256); however, the percentile computation is different. The difference is due to how the samples are distributed. In the first example, all samples are in the same bucket, while in the most recent example, the samples are in different buckets.

## Example with real data

This section contains an example that illustrates how you can determine the bucket model used by a particular metric. This section also illustrates how you can evaluate the potential error in the computed percentile values.

### Identify the bucket model

To determine the buckets used for a metric over a specific time interval, call the Cloud Monitoring API's `projects.timeSeries/list` method.

For example, to identify the bucket model for a metric, do the following:

1. Go to the `projects.timeSeries/list` web page.
2. In APIs Explorer, enter the filter that specifies the metric, a start time, and an end time.

For example, to get information about the metric that stores API request latencies, enter the following:

```
metric.type="serviceruntime.googleapis.com/api/request_latencies"
resource.type="consumed_api"
```

In this example, the filter field specifies a metric type and a resource type. For more information about these filters, see Monitoring filters.

3. Click Enter.

The following is the `list` API response for a distribution-valued metric that is available on one Google Cloud project:

```
{
  "timeSeries": [
    {
      "metric": {...},
      "resource": {...},
      "metricKind": "DELTA",
      "valueType": "DISTRIBUTION",
      "points": [
        {
          "interval": {
            "startTime": "2020-11-03T15:05:00Z",
            "endTime": "2020-11-03T15:06:00Z"
          },
          "value": {
            "distributionValue": {
              "count": "3",
              "mean": 25.889,
              "bucketOptions": {
                "exponentialBuckets": {
                  "numFiniteBuckets": 66,
                  "growthFactor": 1.4,
                  "scale": 1
                }
              },
              "bucketCounts": [
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                "3"
              ]
            }
          }
        },
```

In the API response, each entry in the `points` array has a `value` field that holds the distribution for that time interval. The `count` and `mean` fields report that for the specified time interval there were 3 measurements and their average value was 25.889. The `bucketOptions` field shows that the exponential model is configured to have 66 finite buckets, a scale of 1, and a growth factor of 1.4.
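To find which buckets hold the measurements, you can scan the `bucketCounts` array. The following is a minimal sketch that parses an excerpt of the `distributionValue` object from the response; the variable names are illustrative:

```python
import json

# Excerpt of the `distributionValue` object from the response.
distribution_json = """
{
  "count": "3",
  "mean": 25.889,
  "bucketOptions": {
    "exponentialBuckets": {"numFiniteBuckets": 66, "growthFactor": 1.4, "scale": 1}
  },
  "bucketCounts": ["0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "3"]
}
"""

distribution = json.loads(distribution_json)

# The counts arrive as JSON strings, so convert them before doing arithmetic.
counts = [int(c) for c in distribution["bucketCounts"]]
non_empty = {index: count for index, count in enumerate(counts) if count > 0}

print(non_empty)  # {10: 3}
assert sum(counts) == int(distribution["count"])
```

In this response, all three measurements are in the bucket with index 10.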

To compute the lower and upper bounds for the bucket with index n, use the following rules:

• Lower bound = scale * (growth factor)^(n-1)
• Upper bound = scale * (growth factor)^n
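The two rules can be expressed as a short Python function. This is a sketch that applies the rules as stated for finite buckets with index n of 1 or greater; the function name is illustrative:

```python
def exponential_bucket_bounds(n: int, scale: float, growth_factor: float):
    """Return (lower, upper) for the finite bucket with index n, where n >= 1."""
    if n < 1:
        raise ValueError("the rules apply to finite buckets, n >= 1")
    lower = scale * growth_factor ** (n - 1)
    upper = scale * growth_factor ** n
    return lower, upper


for n in (9, 10, 11):
    lower, upper = exponential_bucket_bounds(n, scale=1, growth_factor=1.4)
    midpoint = (lower + upper) / 2
    print(f"{n}: [{lower:.2f}, {upper:.2f}) midpoint {midpoint:.2f}")
```

The midpoint is the arithmetic mean of the two bounds, which is also the 50th percentile estimate when every sample falls in that bucket.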

The buckets for this metric, along with the midpoint of each bucket, are shown in the following table:

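| Bucket index n | Lower bound | Upper bound | Midpoint |
| --- | --- | --- | --- |
| 0 | -infinity | 1.00 | Not applicable |
| 1 | 1.00 | 1.40 | 1.20 |
| 2 | 1.40 | 1.96 | 1.68 |
| ... | | | |
| 9 | 14.76 | 20.66 | 17.71 |
| 10 | 20.66 | 28.93 | 24.79 |
| 11 | 28.93 | 40.50 | 34.71 |
| ... | | | |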

### Verify the percentile computations

Now that the bucket configuration is known, you can predict the 50th, 95th, and 99th percentile values for any set of measurements. For example, if there is one sample and it is in bucket number 10, then the 50th percentile value is 24.79, the midpoint of that bucket.
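The prediction for this case can be sketched as follows, assuming that every sample falls in bucket 10, so that the bucket covers the whole percentile range from 0 to 100. The function name is illustrative:

```python
def percentile_in_single_bucket(lower: float, upper: float, p: float) -> float:
    """p-th percentile when every sample falls in the bucket [lower, upper)."""
    return lower + (upper - lower) * p / 100.0


# Bounds of bucket 10 for a scale of 1 and a growth factor of 1.4.
lower, upper = 1.4 ** 9, 1.4 ** 10

for p in (50, 95, 99):
    print(p, round(percentile_in_single_bucket(lower, upper, p), 2))
```

The three printed values are the predictions to compare against the values that the API returns for the same interval.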

To retrieve the 50th, 95th, and 99th percentile values of the metric, you can use the API method `projects.timeSeries/list`, and include an alignment period and aligner. In this example, the following settings were selected:

• Aligner: `ALIGN_PERCENTILE_50`, `ALIGN_PERCENTILE_95`, or `ALIGN_PERCENTILE_99`
• Alignment Period: 60 s
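As an illustration, the following sketch builds the REST request URL for these settings without sending it. The project ID and the time interval are hypothetical placeholders, and a real request also needs valid credentials:

```python
from urllib.parse import urlencode

PROJECT_ID = "my-project"  # hypothetical project ID

params = {
    "filter": (
        'metric.type="serviceruntime.googleapis.com/api/request_latencies" '
        'AND resource.type="consumed_api"'
    ),
    # Hypothetical interval chosen for this example.
    "interval.startTime": "2020-11-03T15:04:00Z",
    "interval.endTime": "2020-11-03T15:07:00Z",
    "aggregation.alignmentPeriod": "60s",
    "aggregation.perSeriesAligner": "ALIGN_PERCENTILE_50",
}

url = (
    f"https://monitoring.googleapis.com/v3/projects/{PROJECT_ID}/timeSeries?"
    + urlencode(params)
)
print(url)
```

To request the other percentiles, change the `aggregation.perSeriesAligner` value to `ALIGN_PERCENTILE_95` or `ALIGN_PERCENTILE_99`.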

For the `ALIGN_PERCENTILE_50` selection, each value in the time series is the 50th percentile of a bucket:

```
{
  "timeSeries": [
    {
      "metric": {...},
      "resource": {...},
      "metricKind": "GAUGE",
      "valueType": "DOUBLE",
      "points": [
        {
          "interval": {
            "startTime": "2020-11-03T15:06:36Z",
            "endTime": "2020-11-03T15:06:36Z"
          },
          "value": {
            "doubleValue": 24.793256140799986
          }
        },
        {
          "interval": {
            "startTime": "2020-11-03T15:05:36Z",
            "endTime": "2020-11-03T15:05:36Z"
          },
          "value": {
            "doubleValue": 34.710558597119977
          }
        },
        {
          "interval": {
            "startTime": "2020-11-03T15:04:36Z",
            "endTime": "2020-11-03T15:04:36Z"
          },
          "value": {
            "doubleValue": 24.793256140799986
          }
        }
      ]
    },
```

For two of the samples, the 50th percentile is in bucket 10; for the other sample, it is in bucket 11.

The following table shows the results of executing the `projects.timeSeries/list` method with different aligners. The first row corresponds to the case where the aligner isn't specified. When you don't specify an aligner, the bucket model and mean values are returned. The next three rows list the data returned when the aligner is set to `ALIGN_PERCENTILE_50`, `ALIGN_PERCENTILE_95`, and `ALIGN_PERCENTILE_99`:

| Statistic | Sample @ 15:06 | Sample @ 15:05 | Sample @ 15:04 |
| --- | --- | --- | --- |
| mean | 25.889 | 33.7435 | Not available. |
| 50th percentile | 24.79 | 34.71 | 24.79 |
| 95th percentile | 28.51 | 39.91 | 28.51 |
| 99th percentile | 28.84 | 40.37 | 28.84 |

As the two examples with synthetic data illustrate, the values of the percentiles depend on how the samples are distributed. When all samples are in the same bucket, the 50th percentile is the midpoint of that bucket. However, when samples are in different buckets, that distribution affects the estimates.

To determine if the 50th percentile is a reasonable estimate of the mean, you can compare the mean value to the 50th percentile. The mean value is returned with the bucket details.

## What's next

For information about how to visualize distribution-valued metrics, see About distribution-valued metrics.
