
Request-response services

A request-response service is one where a customer explicitly asks the service to do some work and waits for that work to be completed successfully. The most common examples of such services are:

  • Web applications that human users interact with directly by using a browser.
  • Mobile applications that consist of a client application on a user's mobile phone and an API backend that the client interacts with.
  • API backends that are utilized by other services (rather than human users).

For all of these services, the common approach is to start with availability (measuring the ratio of successful requests) and latency (measuring the ratio of requests that complete under a time threshold) SLIs. For more information on availability and latency SLIs, see Concepts in service monitoring.

You express a request-based availability SLI by using the TimeSeriesRatio structure to set up a ratio of good requests to total requests. You decide how to filter the metric by using its available labels to arrive at your preferred determination of "good" or "valid".

You express a request-based latency SLI by using a DistributionCut structure.
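Conceptually, both SLI types reduce to the fraction of requests that count as "good". The following sketch, in plain Python on toy data (not the Monitoring API), shows the two computations:

```python
# Sketch of the two request-based SLI computations on toy data.
# The real SLIs are computed by Cloud Monitoring from metric filters.

def availability_sli(good_count, total_count):
    """Good/total ratio, as expressed by a TimeSeriesRatio."""
    return good_count / total_count

def latency_sli(latencies, threshold):
    """Fraction of requests at or under a threshold, as expressed by a
    DistributionCut with range [0, threshold]."""
    good = sum(1 for lat in latencies if 0 <= lat <= threshold)
    return good / len(latencies)

print(availability_sli(990, 1000))          # 0.99
print(latency_sli([40, 80, 120, 60], 100))  # 0.75
```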

Cloud Endpoints

Cloud Endpoints is a service for managing APIs. It allows you to take an existing API and expose it with authentication, quotas, and monitoring.

Endpoints is implemented as a proxy in front of the gRPC application server. By measuring the metrics at the proxy, you can correctly handle the case when all backends are unavailable and users are seeing errors. Endpoints writes data to metric types beginning with the prefix serviceruntime.googleapis.com.


Availability SLIs

Cloud Endpoints writes metric data to Cloud Monitoring using the api monitored-resource type and the serviceruntime api/request_count metric type, which you can filter by using the response_code or response_code_class metric label to count "good" and "total" requests.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"serviceruntime.googleapis.com/api/request_count\"
         resource.type=\"api\"
         metric.label.\"response_code_class\">\"199\"
         metric.label.\"response_code_class\"<\"599\"",
      "goodServiceFilter":
        "metric.type=\"serviceruntime.googleapis.com/api/request_count\"
         resource.type=\"api\"
         metric.label.\"response_code_class\"<=\"299\""
    }
  }
}

Latency SLIs

Cloud Endpoints uses the following primary metric types to capture latency:

  • api/request_latencies: a distribution of latencies in seconds for non-streaming requests. Use when overall user experience is of primary importance.
  • api/request_latencies_backend: a distribution of backend latencies in seconds for non-streaming requests. Use to measure backend latencies directly.
  • api/request_latencies_overhead: a distribution of request latencies in seconds for non-streaming requests, excluding the backend. Use to measure the overhead introduced by the Endpoints proxy.

Note that the total request latency is the sum of the backend and overhead latencies:

request_latencies = request_latencies_backend + request_latencies_overhead
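You can sanity-check this identity on sampled data; here is a toy illustration with hypothetical values in milliseconds:

```python
# Toy check of the latency decomposition for three sampled requests.
# Values are hypothetical, in milliseconds.
backend = [120, 340, 50]
overhead = [4, 6, 3]
total = [b + o for b, o in zip(backend, overhead)]
print(total)  # [124, 346, 53]
```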

Endpoints writes metric data to Cloud Monitoring using the api monitored-resource type and one of the request-latency metric types. None of these metric types provides a response_code or response_code_class label; therefore, they report latencies for all requests.

You express a request-based latency SLI by using a DistributionCut structure, as shown in the following examples.

The following example SLO expects that 99% of all requests in the project fall between 0 and 100 ms in total latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_latencies\"
           resource.type=\"api\"",
        "range": {
          "min": 0,
          "max": 0.1
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "86400s",
  "displayName": "99% of requests under 100 ms"
}

The following example SLO expects that 98% of requests fall between 0 and 100 ms in backend latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"serviceruntime.googleapis.com/api/request_latencies_backend\"
           resource.type=\"api\"",
        "range": {
          "min": 0,
          "max": 0.1
        }
      }
    }
  },
  "goal": 0.98,
  "rollingPeriod": "86400s",
  "displayName": "98% of requests under 100 ms"
}
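A goal plus a rolling period implies an error budget: the share of requests that may fail before the SLO is violated. The following small helper is illustrative only, not part of the Cloud Monitoring API:

```python
# Convert an SLO goal into an error budget over some request volume.
# Illustrative helper only; not part of the Cloud Monitoring API.

def error_budget(goal, total_events):
    """Number of events allowed to be bad over the SLO period."""
    return round((1 - goal) * total_events)

# With a 0.99 goal and 1,000,000 requests in the rolling period,
# up to 10,000 requests may be slow (or failed) before the SLO is missed.
print(error_budget(0.99, 1_000_000))  # 10000
```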

Cloud Run

Cloud Run is a fully managed compute platform for deploying and scaling containerized applications quickly and securely. It abstracts away infrastructure management: it responds to changes in traffic by automatically scaling up and down from zero almost instantaneously, and it charges you only for the exact resources you use.


Availability SLIs

Cloud Run writes metric data to Cloud Monitoring using the cloud_run_revision monitored-resource type and request_count metric type. You can filter the data by using the response_code or the response_code_class metric label to count "good" and "total" requests.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"run.googleapis.com/request_count\"
         resource.type=\"cloud_run_revision\"
         metric.label.\"response_code_class\">\"199\"
         metric.label.\"response_code_class\"<\"599\"",
      "goodServiceFilter":
        "metric.type=\"run.googleapis.com/request_count\"
         resource.type=\"cloud_run_revision\"
         metric.label.\"response_code_class\"<=\"299\""
    }
  }
}
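The response_code_class label groups HTTP status codes by their leading digit, which is what the filters above compare against. Here is a toy sketch of that grouping and the resulting ratio (hypothetical helper and data):

```python
# Group HTTP status codes the way a response-code-class label does,
# then compute the good/total availability ratio on toy data.

def response_code_class(code):
    return f"{code // 100}xx"  # e.g. 204 -> "2xx", 503 -> "5xx"

codes = [200, 204, 301, 404, 500, 200]
classes = [response_code_class(c) for c in codes]
good = sum(1 for c in classes if c == "2xx")
print(classes)            # ['2xx', '2xx', '3xx', '4xx', '5xx', '2xx']
print(good / len(codes))  # 0.5
```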

Latency SLIs

To measure latency, Cloud Run writes metric data to Cloud Monitoring using the cloud_run_revision monitored-resource type and request_latencies metric type. The data is a distribution of the latency, in milliseconds, of requests reaching the revision. You can filter the data by using the response_code or the response_code_class metric label to measure the latency of all requests or of only the successful ones.

You express a request-based latency SLI by using a DistributionCut structure. The following example SLO expects that 99% of requests fall between 0 and 100 ms in total latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"run.googleapis.com/request_latencies\"
           resource.type=\"cloud_run_revision\"",
        "range": {
           "min": 0,
           "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "86400s",
  "displayName": "99% of requests under 100 ms"
}
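Latency distributions arrive as bucket counts rather than per-request values; a DistributionCut effectively sums the buckets that fall inside its range and divides by the total count. A simplified sketch on toy buckets (real distribution metrics use Cloud Monitoring's bucketing schemes):

```python
# Simplified DistributionCut: given latency buckets as (lower, upper,
# count) tuples, compute the fraction of requests whose bucket lies
# entirely within the cut range. Toy data and toy bucketing.

def distribution_cut(buckets, cut_min, cut_max):
    good = sum(n for lo, hi, n in buckets if lo >= cut_min and hi <= cut_max)
    total = sum(n for _, _, n in buckets)
    return good / total

buckets = [(0, 50, 700), (50, 100, 200), (100, 200, 80), (200, 400, 20)]
print(distribution_cut(buckets, 0, 100))  # 0.9
```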

Cloud Functions

Cloud Functions is a scalable pay-as-you-go Functions-as-a-Service offering that runs your code without the need to manage any infrastructure. Functions are used in many architecture patterns to do things like event processing, automation, and serving HTTP/S requests.


Availability SLIs

Cloud Functions writes metric data to Cloud Monitoring using the cloud_function monitored-resource type and function/execution_count metric type. You can filter the data by using the status metric label to count "good" and "total" executions.

You express a request-based availability SLI by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"cloudfunctions.googleapis.com/function/execution_count\"
         resource.type=\"cloud_function\"",
      "goodServiceFilter":
        "metric.type=\"cloudfunctions.googleapis.com/function/execution_count\"
         resource.type=\"cloud_function\"
         metric.label.\"status\"=\"ok\""
    }
  }
}

Latency SLIs

To measure latency, Cloud Functions writes metric data to Cloud Monitoring using the cloud_function monitored-resource type and function/execution_times metric type. The data is a distribution of function execution times in nanoseconds. You can filter the data by using the status metric label to measure the latency of all executions or of only the successful ones.

You express a request-based latency SLI by using a DistributionCut structure. The following example SLO expects that 99% of all Cloud Functions executions fall between 0 and 100 ms in total latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"cloudfunctions.googleapis.com/function/execution_times\"
           resource.type=\"cloud_function\"",
        "range": {
          "min": 0,
          "max": 100000000
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "86400s",
  "displayName": "99% of requests under 100 ms"
}
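One pitfall across these products is that the range in a DistributionCut is interpreted in the metric's own unit, which differs by metric: seconds for the Endpoints latency metrics, milliseconds for Cloud Run, nanoseconds for Cloud Functions. A quick conversion of the same 100 ms threshold (unit assignments follow the metric descriptions above; verify them against the metrics list for your product):

```python
# The same 100 ms latency threshold expressed in different metric units.
# Unit assignments follow the metric descriptions in this document;
# verify them against the metrics list for your product.
THRESHOLD_MS = 100

in_seconds = THRESHOLD_MS / 1000     # e.g. Endpoints api/request_latencies
in_millis = THRESHOLD_MS             # e.g. Cloud Run request_latencies
in_nanos = THRESHOLD_MS * 1_000_000  # e.g. Cloud Functions execution_times

print(in_seconds, in_millis, in_nanos)  # 0.1 100 100000000
```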

App Engine

App Engine provides a fully managed serverless platform to build and run applications. You have the choice of two environments, standard or flexible; for more information, see Choosing an App Engine environment.

The App Engine flexible environment does not provide any metrics that you can easily use as availability or latency SLIs. For SLIs in the flexible environment, we recommend using Cloud Load Balancing metrics or logs-based metrics instead.


Availability SLIs

App Engine standard environment writes metric data to Cloud Monitoring using the gae_app monitored-resource type and the http/server/response_count metric type. You can filter the data by using the response_code metric label to count "good" and "total" responses.

You express a request-based availability SLI for App Engine standard environment by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"appengine.googleapis.com/http/server/response_count\"
         resource.type=\"gae_app\"
         metric.label.\"response_code\">\"199\"
         metric.label.\"response_code\"<\"599\"",
      "goodServiceFilter":
        "metric.type=\"appengine.googleapis.com/http/server/response_count\"
         resource.type=\"gae_app\"
         metric.label.\"response_code\"<=\"299\""
    }
  }
}

Latency SLIs

To measure latency, App Engine standard environment writes metric data to Cloud Monitoring using the gae_app monitored-resource type and the http/server/response_latencies metric type. You can filter the data by using the response_code metric label to measure the latency of all requests or of only the successful ones.

You express a request-based latency SLI for App Engine standard environment by using a DistributionCut structure. The following example SLO expects that 99% of all requests fall between 0 and 100 ms in total latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"appengine.googleapis.com/http/server/response_latencies\"
           resource.type=\"gae_app\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "86400s",
  "displayName": "99% of requests under 100 ms"
}

GKE and Istio

Google Kubernetes Engine (GKE) is Google's secured and managed Kubernetes service with four-way auto scaling and multi-cluster support. Istio is an open-source service mesh that allows you to connect, secure, control, and observe services. Istio can be installed on GKE as an add-on—Istio on GKE—or by the user from the open source project. In both cases, Istio provides excellent telemetry, including information about traffic, errors, and latency for each service managed by the mesh.

For a full list of Istio metrics, see istio.io metric types.

Availability SLIs

Istio writes metric data to Cloud Monitoring using the service/server/request_count metric type and one of several monitored-resource types; the examples in this section use k8s_container.

You can filter the data by using the response_code metric label to count "good" and "total" requests. You can also use the destination_service_name metric label to count requests for a specific service.

You express a request-based availability SLI for a service running on GKE managed by the Istio service mesh by creating a TimeSeriesRatio structure for good requests to total requests, as shown in the following example:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"istio.io/service/server/request_count\"
         resource.type=\"k8s_container\"
         metric.label.\"destination_service_name\"=\"frontend\"",
      "goodServiceFilter":
        "metric.type=\"istio.io/service/server/request_count\"
         resource.type=\"k8s_container\"
         metric.label.\"destination_service_name\"=\"frontend\"
         metric.label.\"response_code\"<=\"299\""
    }
  }
}
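The filter strings in these examples are whitespace-joined conjunctions of conditions in the Monitoring filter language. The following hypothetical helper assembles one for equality-only label matches:

```python
# Assemble a Cloud Monitoring filter string from its conjuncts.
# Hypothetical convenience helper for illustration: it handles only
# equality matches on metric labels, not the full filter language.

def build_filter(metric_type, resource_type, **labels):
    parts = [f'metric.type="{metric_type}"', f'resource.type="{resource_type}"']
    parts += [f'metric.label."{k}"="{v}"' for k, v in labels.items()]
    return " ".join(parts)

f = build_filter("istio.io/service/server/request_count", "k8s_container",
                 destination_service_name="frontend")
print(f)
```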

Latency SLIs

To measure latency, Istio writes metric data to Cloud Monitoring using the service/server/response_latencies metric type and one of several monitored-resource types; the examples in this section use k8s_container.

You can filter the data by using the response_code metric label to measure the latency of all requests or of only the successful ones. You can also use the destination_service_name metric label to measure latency for a specific service.

You express a request-based latency SLI for a service running on GKE managed by the Istio service mesh by using a DistributionCut structure. The following example SLO expects that 99% of all requests to the frontend service fall between 0 and 100 ms in total latency over a rolling 24-hour period:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"istio.io/service/server/response_latencies\"
           resource.type=\"k8s_container\"
           metric.label.\"destination_service_name\"=\"frontend\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "86400s",
  "displayName": "99% of requests under 100 ms"
}