Stackdriver Service Monitoring
Modern applications are composed of multiple services, with dozens or even hundreds of dependencies supporting a given application or use case. When something fails, it often seems like many things fail at once. To help manage this complexity, Stackdriver is adding support for monitoring services that are provided through Cloud Services Platform and the Istio service mesh technology. Stackdriver Service Monitoring can monitor Google App Engine services as well.
Understanding inter-service dependencies is hard, and the relationships and connection patterns between services can be complex and varied. Stackdriver Service Monitoring provides a service graph that shows all the services in your application and their relationships, so you can see dependencies at a glance. The graph also shows the traffic, errors, and latencies between services, so you can tell which services may be affecting the performance or availability of others, and it lets you see what has changed over time, which makes problems easier to isolate. Together, this gives you a real-time and historical view of services and their dependencies, so you can visualize your application and reduce the time to root cause analysis and recovery.
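To make the idea of a service graph concrete, here is a minimal sketch in Python. It is not how Stackdriver builds or exposes its graph. The `Edge` record, the sample service names, and the helper functions are all hypothetical, and the sketch assumes you already have per-edge request and error counts of the kind a service mesh reports.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One caller -> callee relationship with its traffic counts (hypothetical shape)."""
    source: str
    destination: str
    requests: int
    errors: int


# Hypothetical telemetry for a small application.
EDGES = [
    Edge("frontend", "checkout", 1000, 20),
    Edge("frontend", "catalog", 5000, 5),
    Edge("checkout", "payments", 800, 160),
    Edge("checkout", "inventory", 900, 0),
]


def build_graph(edges):
    """Map each source service to {destination: error rate on that edge}."""
    graph = defaultdict(dict)
    for e in edges:
        graph[e.source][e.destination] = e.errors / e.requests if e.requests else 0.0
    return graph


def downstream(graph, service):
    """Return every service that `service` depends on, directly or transitively."""
    seen, stack = set(), [service]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, {}):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen - {service}  # exclude the service itself if the graph has a cycle


def worst_edge(graph):
    """Return (source, destination, error_rate) for the edge with the highest error rate."""
    candidates = (
        (src, dst, rate) for src, deps in graph.items() for dst, rate in deps.items()
    )
    return max(candidates, key=lambda item: item[2])


graph = build_graph(EDGES)
frontend_deps = downstream(graph, "frontend")
culprit = worst_edge(graph)
```

With this sample data, `frontend` depends on all four other services, and the edge with the highest error rate is `checkout` to `payments` at 20%, which is the kind of signal that would point you at `payments` first.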
Setting service level objectives
With our service monitoring functionality, you can set Service Level Objectives (SLOs), monitor them, and alert your teams when they are at risk, so your teams can focus on what matters to your business. Because Istio (and App Engine) are instrumented in an opinionated way, we know exactly what the transaction counts, error counts, and latency distributions are between services. All you need to do is set your targets for availability and performance, and we automatically generate the graphs for service level indicators (SLIs), compliance with your targets over time, and remaining “error budget.” When those targets are violated, you are alerted to take action to fix the service.
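The arithmetic behind an availability SLI and its error budget is simple enough to sketch by hand. The following Python example is an illustration only, not Stackdriver's implementation. The `SloWindow` type, the function names, and the sample numbers are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one service over a compliance window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def availability_sli(window: SloWindow) -> float:
    """Fraction of requests in the window that succeeded."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow, target: float) -> float:
    """Fraction of the error budget still unspent; a negative value means the SLO is violated."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # nothing has been spent yet
    allowed_failures = (1 - target) * window.total_requests
    return 1 - window.failed_requests / allowed_failures


window = SloWindow(total_requests=1_000_000, failed_requests=400)
target = 0.999  # a 99.9% availability target

sli = availability_sli(window)
budget = error_budget_remaining(window, target)
```

In this example the target allows roughly 1,000 failed requests out of 1,000,000. With 400 failures the SLI is 99.96%, and about 60% of the error budget remains. An alert would fire once `budget` drops below zero, or earlier if you choose to alert on how quickly the budget is being spent.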
Stackdriver Service Monitoring includes interactive dashboarding so you can dig deep into a service’s behavior across all signals without bouncing between metrics, logs, and traces. You get one dashboard scoped to the particular service, which you can scope further to a specific time range. When diagnosing availability issues, you can drill down to metrics heat maps and traces, and explore logs and error reports, getting stack traces and opening the live production debugger, if you are instrumented for it. The service dashboard provides a single coherent way to narrow scope: you can tighten from an alert on a service, to a specific bounded time range, to a subset of traffic, and finally to a potential cause. This gives you a fast, direct path to the bottom of a problem with your service.
Using Istio Service Monitoring in practice
A typical troubleshooting workflow combines the three capabilities described above:
- Use Stackdriver Service Monitoring SLOs to monitor and detect when there is a problem with an app
- Use Stackdriver’s service graph to figure out service dependencies and which service is most likely to be the cause of the problem
- Use the service dashboard to work through the various signals from the service in question and track down a root cause