A request-response service is one where a customer explicitly asks the service to do some work and waits for that work to be completed successfully. The most common examples of such services are:
- Web applications that human users interact with directly by using a browser.
- Mobile applications that consist of a client application on a user's mobile phone and an API backend that the client interacts with.
- API backends that are utilized by other services (rather than human users).
For all of these services, the common approach is to start with availability (measuring the ratio of successful requests) and latency (measuring the ratio of requests that complete under a time threshold) SLIs. For more information on availability and latency SLIs, see Concepts in service monitoring.
You express a request-based availability SLI by using the TimeSeriesRatio structure to set up a ratio of good requests to total requests. You decide how to filter the metric by using its available labels to arrive at your preferred determination of "good" or "valid".

You express a request-based latency SLI by using a DistributionCut structure.
Cloud Endpoints
Cloud Endpoints is a service for managing APIs. It allows you to take an existing API and expose it with authentication, quotas, and monitoring.
Endpoints is implemented as a proxy in front of the gRPC
application server. By measuring the metrics at the proxy, you can correctly
handle the case when all backends are unavailable and users are seeing errors.
Endpoints writes data to metric types beginning with the prefix serviceruntime.googleapis.com.
For more information, see the following:
- Documentation for Cloud Endpoints.
- List of serviceruntime.googleapis.com metric types.
Availability SLIs
Cloud Endpoints writes metric data to Cloud Monitoring using the api monitored-resource type and the service-runtime api/request_count metric type, which you can filter by using the response_code or response_code_class metric label to count "good" and "total" requests.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"serviceruntime.googleapis.com/api/request_count\"
resource.type=\"api\"
metric.label.\"response_code_class\"!=\"4xx\"",
"goodServiceFilter":
"metric.type=\"serviceruntime.googleapis.com/api/request_count\"
resource.type=\"api\"
(metric.label.\"response_code_class\"=\"1xx\"" OR
metric.label.\"response_code_class\"=\"2xx\""),
}
}
}
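The goodTotalRatio structure above is only the serviceLevelIndicator portion of an SLO. As a minimal sketch, the following shows how that availability SLI embeds in a complete SLO; the 95% goal and rolling one-day period are placeholder values, not recommendations:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "goodTotalRatio": {
        "totalServiceFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_count\"
           resource.type=\"api\"
           metric.label.\"response_code_class\"!=\"4xx\"",
        "goodServiceFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_count\"
           resource.type=\"api\"
           (metric.label.\"response_code_class\"=\"1xx\" OR
            metric.label.\"response_code_class\"=\"2xx\")"
      }
    }
  },
  "goal": 0.95,
  "rollingPeriod": "86400s",
  "displayName": "95% of valid requests are successful"
}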
Latency SLIs
Cloud Endpoints uses the following primary metric types to capture latency:
- api/request_latencies: a distribution of latencies in seconds for non-streaming requests. Use this metric when overall user experience is of primary importance.
- api/request_latencies_backend: a distribution of backend latencies in seconds for non-streaming requests. Use this metric to measure backend latencies directly.
- api/request_latencies_overhead: a distribution of request latencies in seconds for non-streaming requests, excluding the backend. Use this metric to measure the overhead introduced by the Endpoints proxy.
Note that the total request latency is the sum of the backend and overhead latencies:
request_latencies = request_latencies_backend + request_latencies_overhead
Endpoints writes metric data to Cloud Monitoring using the api monitored-resource type and one of the request-latency metric types. None of these metric types provides a response_code or response_code_class label; therefore, they report latencies for all requests.

You express a request-based latency SLI by using a DistributionCut structure, as shown in the following examples.
The following example SLO expects that 99% of all requests in the project fall between 0 and 100 ms in total latency over a rolling one-hour period. Because the metric reports latencies in seconds, the 100 ms threshold is expressed as a range maximum of 0.1:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_latencies\"
           resource.type=\"api\"",
        "range": {
          "min": 0,
          "max": 0.1
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% requests under 100 ms"
}
The following example SLO expects that 98% of requests fall between 0 and 100 ms in backend latency over a rolling one-hour period; the threshold is again expressed in seconds as 0.1:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_latencies_backend\"
           resource.type=\"api\"",
        "range": {
          "min": 0,
          "max": 0.1
        }
      }
    }
  },
  "goal": 0.98,
  "rollingPeriod": "3600s",
  "displayName": "98% requests under 100 ms"
}
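You can bound the overhead introduced by the Endpoints proxy in the same way by cutting the api/request_latencies_overhead distribution. The following sketch assumes an illustrative target of at most 10 ms (0.01 seconds) of proxy overhead for 99% of requests; choose a threshold that reflects your own measurements:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_latencies_overhead\"
           resource.type=\"api\"",
        "range": {
          "min": 0,
          "max": 0.01
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% requests with at most 10 ms proxy overhead"
}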
Cloud Run
Cloud Run is a fully managed compute platform for deploying and scaling containerized applications quickly and securely. It abstracts away infrastructure management: it automatically scales up and down from zero almost instantaneously in response to changes in traffic, and it charges you only for the exact resources you use.
For additional information about Cloud Run observability, see the following:
- Documentation for Cloud Run.
- List of run.googleapis.com metric types.
Availability SLIs
Cloud Run writes metric data to Cloud Monitoring using the cloud_run_revision monitored-resource type and the request_count metric type. You can filter the data by using the response_code or the response_code_class metric label to count "good" and "total" requests.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"run.googleapis.com/request_count\"
resource.type=\"cloud_run_revision\"
metric.label.\"response_code_class\"!=\"4xx\"",
"goodServiceFilter":
"metric.type=\"run.googleapis.com/request_count\"
resource.type=\"cloud_run_revision\"
(metric.label.\"response_code_class\"=\"1xx\"" OR
metric.label.\"response_code_class\"=\"2xx\""),
}
}
}
Latency SLIs
To measure latency, Cloud Run writes metric data to Cloud Monitoring using the cloud_run_revision monitored-resource type and the request_latencies metric type. The data is a distribution of request latency in milliseconds reaching the revision. You can filter the data by using the response_code or the response_code_class metric label if you need to explicitly measure the latency of all requests or only the successful requests.

You express a request-based latency SLI by using a DistributionCut structure. The following example SLO expects that 99% of requests fall between 0 and 100 ms in total latency over a rolling one-hour period:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"run.googleapis.com/request_latencies\"
           resource.type=\"cloud_run_revision\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% requests under 100 ms"
}
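If you want the latency SLI to consider only successful requests, you can add the response_code_class label described earlier to the distribution filter. The following variant is a sketch of the same SLO restricted to 2xx responses:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"run.googleapis.com/request_latencies\"
           resource.type=\"cloud_run_revision\"
           metric.label.\"response_code_class\"=\"2xx\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% of successful requests under 100 ms"
}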
Cloud Run functions
Cloud Run functions is a scalable pay-as-you-go Functions-as-a-Service offering that runs your code without the need to manage any infrastructure. Functions are used in many architecture patterns to do things like event processing, automation, and serving HTTP/S requests.
For information on Cloud Run functions observability, see the following:
- Documentation for Cloud Run functions.
- List of cloudfunctions.googleapis.com metric types.
Availability SLIs
Cloud Run functions writes metric data to Cloud Monitoring using the cloud_function monitored-resource type and the function/execution_count metric type. You can filter the data by using the status metric label to count "good" and "total" executions.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"cloudfunctions.googleapis.com/function/execution_count\"
resource.type=\"cloud_function\"",
"goodServiceFilter":
"metric.type=\"cloudfunctions.googleapis.com/function/execution_count\"
resource.type=\"cloud_function\
metric.label.\"status\"=\"ok\"",
}
}
}
Latency SLIs
To measure latency, Cloud Run functions writes metric data to Cloud Monitoring using the cloud_function monitored-resource type and the function/execution_times metric type. The data is a distribution of function execution times in nanoseconds. You can filter the data by using the status metric label if you need to explicitly measure the latency of all executions or only the successful executions.
You express a request-based latency SLI by using a DistributionCut structure. The following example SLO expects that 99% of all Cloud Run functions executions fall between 0 and 100 ms in total latency over a rolling one-hour period. Because the metric reports execution times in nanoseconds, the 100 ms threshold is expressed as a range maximum of 100000000:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"cloudfunctions.googleapis.com/function/execution_times\"
           resource.type=\"cloud_function\"",
        "range": {
          "min": 0,
          "max": 100000000
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% of executions under 100 ms"
}
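To measure the latency of successful executions only, you can add the status label to the distribution filter, as in the following sketch:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"cloudfunctions.googleapis.com/function/execution_times\"
           resource.type=\"cloud_function\"
           metric.label.\"status\"=\"ok\"",
        "range": {
          "min": 0,
          "max": 100000000
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% of successful executions under 100 ms"
}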
App Engine
App Engine provides a fully managed serverless platform to build and run applications. You have the choice of two environments, standard or flexible; for more information, see Choosing an App Engine environment.
For more information on App Engine, see the following:
- Documentation for App Engine.
- List of appengine.googleapis.com metric types.
Availability SLIs
App Engine writes metric data to Cloud Monitoring using the gae_app monitored-resource type and the http/server/response_count metric type. You can filter the data by using the response_code metric label to count "good" and "total" responses.

You express a request-based availability SLI for App Engine by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"appengine.googleapis.com/http/server/response_count\"
resource.type=\"gae_app\"
metric.label.\"response_code\">\"499\"
metric.label.\"response_code\"<\"399\"",
"goodServiceFilter":
"metric.type=\"appengine.googleapis.com/http/server/response_count\"
resource.type=\"gae_app\"
metric.label.\"response_code\"<\"299\"",
}
}
}
Latency SLIs
To measure latency, App Engine writes metric data to Cloud Monitoring using the gae_app monitored-resource type and the http/server/response_latencies metric type. You can filter the data by using the response_code metric label if you need to measure the latency of all requests or only the successful requests.

You express a request-based latency SLI for App Engine by using a DistributionCut structure. The following example SLO expects that 99% of all requests fall between 0 and 100 ms in total latency over a rolling one-hour period:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"appengine.googleapis.com/http/server/response_latencies\"
           resource.type=\"gae_app\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% requests under 100 ms"
}
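To measure the latency of successful requests only, you can add the response_code label to the distribution filter, mirroring the availability example. The following sketch restricts the distribution to the response codes the availability example counts as good:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"appengine.googleapis.com/http/server/response_latencies\"
           resource.type=\"gae_app\"
           metric.label.\"response_code\"<\"299\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% of successful requests under 100 ms"
}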
GKE and Istio
Google Kubernetes Engine (GKE) is Google's secured and managed Kubernetes service with four-way auto scaling and multi-cluster support. Istio is an open-source service mesh that allows you to connect, secure, control, and observe services. Istio can be installed on GKE as an add-on—Cloud Service Mesh—or by the user from the open source project. In both cases, Istio provides excellent telemetry, including information about traffic, errors, and latency for each service managed by the mesh.
For a full list of Istio metrics, see istio.io metric types.
Availability SLIs
Istio writes metric data to Cloud Monitoring using the service/server/request_count metric type and a Kubernetes monitored-resource type such as k8s_container. You can filter the data by using the response_code metric label to count "good" and "total" requests. You can also use the destination_service_name metric label to count requests for a specific service.

You express a request-based availability SLI for a service running on GKE managed by the Istio service mesh by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"istio.io/service/server/request_count\"
resource.type=\"k8s_container\"
metric.label.\"destination_service_name\"=\"frontend\"",
"goodServiceFilter":
"metric.type=\istio.io/server/response_count\"
resource.type=\"k8s_container\"
metric.label.\"destination_service_name\"=\"frontend\"
metric.label.\"response_code\"<\"299\"",
}
}
}
Latency SLIs
To measure latency, Istio writes metric data to Cloud Monitoring using the service/server/response_latencies metric type and a Kubernetes monitored-resource type such as k8s_container. You can filter the data by using the response_code metric label if you need to measure the latency of all requests or only the successful requests. You can also use the destination_service_name metric label to measure requests for a specific service.

You express a request-based latency SLI for a service running on GKE managed by the Istio service mesh by using a DistributionCut structure. The following example SLO expects that 99% of all requests to the frontend service fall between 0 and 100 ms in total latency over a rolling one-hour period:
{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"istio.io/service/server/response_latencies\"
           resource.type=\"k8s_container\"
           metric.label.\"destination_service_name\"=\"frontend\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% requests under 100 ms"
}
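As with availability, you can use the response_code label to restrict the latency SLI to successful requests. The following sketch keeps the destination_service_name filter and adds the same response-code condition used in the availability example:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"istio.io/service/server/response_latencies\"
           resource.type=\"k8s_container\"
           metric.label.\"destination_service_name\"=\"frontend\"
           metric.label.\"response_code\"<\"299\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "3600s",
  "displayName": "99% of successful frontend requests under 100 ms"
}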