Stackdriver Transparent Service Level Indicators (SLIs)

Monitor Google Cloud services and their effects on your workloads.


Modern IT runs on numbers

A comprehensive, metrics-driven approach is now a baseline objective for most IT ops teams, and many businesses measure IT on service availability and performance. But for IT teams that depend on cloud services, it can be hard to get solid data on services run by an outside cloud provider. If there is a problem, is it in your stack or with the service provider? Transparent SLIs help you monitor Google Cloud services and their effects on your workloads, so you can get the complete picture.


Measure all the things

To help IT understand the performance of all your service components, Google provides detailed API-level metrics for over 130 Google Cloud services. These metrics show you the error counts and latency for your applications' requests to each Google service. This lets you see correlations and side effects between your applications and the services they depend on, which speeds up root-cause analysis and shortens time to resolution.


Real transparency

SLIs go far beyond traditional notions of “service health.” You can see the specific interactions between services and correlate those with environmental data. You can cross-tabulate service metrics by attributes such as the location of the service, the credential of the app calling the service, the version, and the response code to explore relationships and determine causes and effects.
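As a rough illustration of cross-tabulation, the sketch below counts requests by pairs of attributes. The records, field names, and the `cross_tab` helper are hypothetical and stand in for data you would export from your own monitoring; they are not an official schema.

```python
from collections import Counter

# Hypothetical per-request records. The keys mirror the attributes named
# above (service, location, credential, response code) for illustration only.
sample_requests = [
    {"service": "storage.googleapis.com", "location": "us-east1", "credential": "app-a", "response_code": 200},
    {"service": "storage.googleapis.com", "location": "us-east1", "credential": "app-b", "response_code": 403},
    {"service": "storage.googleapis.com", "location": "europe-west1", "credential": "app-b", "response_code": 403},
    {"service": "pubsub.googleapis.com", "location": "us-east1", "credential": "app-a", "response_code": 200},
    {"service": "pubsub.googleapis.com", "location": "us-east1", "credential": "app-a", "response_code": 503},
]


def cross_tab(records, row_key, col_key):
    """Count records for each (row_key value, col_key value) pair."""
    return Counter((record[row_key], record[col_key]) for record in records)


# Which credentials see which response codes?
table = cross_tab(sample_requests, "credential", "response_code")
for (credential, code), count in sorted(table.items()):
    print(f"{credential:6} {code} {count}")
```

In this made-up sample, every request from one credential returns the same error code regardless of location, which is the kind of pattern a cross-tabulation makes easy to spot.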

Using Transparent SLIs in practice

  • If all calls to a service are failing for one user but not for any other, chances are there is something wrong with that account that you can easily fix yourself.
  • If you're troubleshooting a problem with your app and notice a correlation between your application's degraded performance and a sustained increase in latency for a critical GCP service, this is a sign to contact us for help.
  • If the reported latencies for a GCP service look good and are unchanged from before, but your in-app metrics report that the latency on calls to the service is abnormally high, there could be trouble in the network. Call your network provider (in some cases, Google) to get the debugging process started.
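The three cases above can be sketched as a small decision function. This is only an illustration: the `triage` function, its inputs, and the 1.5x threshold are assumptions made for the example, not part of any Google tool, and real incidents rarely separate this cleanly.

```python
CHECK_ACCOUNT = "check the affected account"
CONTACT_SUPPORT = "contact Google Cloud support"
CHECK_NETWORK = "contact your network provider"
NO_SIGNAL = "no clear signal"


def triage(failing_credentials, total_credentials,
           service_latency_ratio, app_latency_ratio, threshold=1.5):
    """Suggest a next step from a few coarse signals.

    failing_credentials / total_credentials: how many callers see failures.
    service_latency_ratio: provider-reported latency now divided by its baseline.
    app_latency_ratio: latency measured inside your app divided by its baseline.
    threshold: hypothetical ratio above which latency counts as elevated.
    """
    # Case 1: only one caller fails while others succeed.
    if failing_credentials == 1 and total_credentials > 1:
        return CHECK_ACCOUNT
    service_slow = service_latency_ratio >= threshold
    app_slow = app_latency_ratio >= threshold
    # Case 2: the app degrades together with the service's own latency.
    if service_slow and app_slow:
        return CONTACT_SUPPORT
    # Case 3: the service looks healthy, but calls are slow from the app's side.
    if app_slow and not service_slow:
        return CHECK_NETWORK
    return NO_SIGNAL


print(triage(0, 5, service_latency_ratio=1.0, app_latency_ratio=2.0))
```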

Our commitment to transparency

We here at Google Cloud are committed to sharing detailed performance information about our services. This is similar to the data that Google SREs use to keep our services up and running. With this shared data, you can easily monitor how we are doing, so that when we work together on a service ticket, everyone is on the same page. We think Transparent SLIs will improve your tech support experience and increase your confidence in cloud computing.


Get started

To start collecting and exploring Transparent SLI metrics, go to Stackdriver Metrics Explorer and select “Consumed API” as the resource type. You'll see a list of metrics you can chart based on the products and services you are using in your application. You can then pick the metrics that make the most sense for your environment. Narrow down the data you display by specifying which service, method, location, credential, or error code you want to monitor.
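If you prefer to express the same narrowing as a monitoring filter string rather than through the UI, a minimal sketch might look like the following. The `build_filter` helper is hypothetical, and the metric type and label keys shown are assumptions to check against the metrics list for your project before use.

```python
def build_filter(metric_type, resource_type="consumed_api",
                 resource_labels=None, metric_labels=None):
    """Join a metric type, resource type, and label constraints with AND."""
    parts = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Sort the labels so the resulting string is stable across runs.
    for key, value in sorted((resource_labels or {}).items()):
        parts.append(f'resource.labels.{key} = "{value}"')
    for key, value in sorted((metric_labels or {}).items()):
        parts.append(f'metric.labels.{key} = "{value}"')
    return " AND ".join(parts)


# Assumed names: verify the metric type and label keys before relying on them.
error_filter = build_filter(
    "serviceruntime.googleapis.com/api/request_count",
    resource_labels={"service": "storage.googleapis.com", "location": "us-east1"},
    metric_labels={"response_code_class": "5xx"},
)
print(error_filter)
```

Building the string from dictionaries keeps each constraint (service, location, error class) easy to add or drop as you explore.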

After you have decided which metrics matter most to your app, create custom dashboards that chart your key indicators alongside ours, so that you get the one-stop view needed to triage the general cause of a problem. Lastly, once you have a good long-term baseline of how Google services behave under your traffic and what your app’s tolerance is, consider setting alerts to let you know when behavior deviates from that baseline.
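One simple way to think about "deviation from the long-term behavior" is a standard-deviation band around the baseline. The sketch below assumes that approach; the `deviates` helper, the sample latencies, and the factor of 3 are illustrative choices, not a recommended alerting policy.

```python
from statistics import mean, stdev


def deviates(baseline, latest, k=3.0):
    """Return True if latest is more than k sample standard deviations
    away from the mean of the baseline values."""
    center = mean(baseline)
    spread = stdev(baseline)  # needs at least two baseline values
    return abs(latest - center) > k * spread


# Hypothetical latency samples in milliseconds collected over a long period.
baseline_ms = [100, 102, 98, 101, 99, 100, 103, 97]

for value in (105, 120):
    status = "alert" if deviates(baseline_ms, value) else "ok"
    print(value, status)
```

A wider band (larger `k`) gives fewer alerts at the cost of slower detection, so the factor should reflect your app's tolerance.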
