This document provides resources that describe how to integrate Cloud Logging, Cloud Monitoring, and Cloud Trace with third-party and open source solutions for observability and alerting. This is a reference guide for Site Reliability Engineers (SREs), system administrators, network operations, monitoring professionals, and other practitioners responsible for reliability, availability, and performance of systems. This document assumes that you have experience developing solutions in Google Cloud.
You might use many tools that help you monitor and diagnose your enterprise IT systems by generating alerts when the performance and availability of those systems degrade. As you and your organization expand your computing footprint to Google Cloud, you can integrate Google Cloud into your observability and alerting practices by doing the following:
- Extracting data from Google Cloud: send data from Cloud Monitoring and Logging to existing tools.
- Ingesting data to Google Cloud: send data from existing tools to Cloud Monitoring and Logging.
Extracting Monitoring and Logging data
You can send observability and event data for your Google Cloud resources to the third-party monitoring, alerting, and notification tools that you use. Google Cloud services automatically generate observability data like metrics, logs, and trace data that help to provide a complete observability overview. Extracting Cloud Monitoring and Cloud Logging data lets you integrate the data into your existing reliability, alerting, and incident management processes.
Cloud Monitoring observability data
Cloud Monitoring classifies metrics into general groups based on the type of service that collects the data. To learn about metric types, resources, labels, and aggregation in Monitoring, see Understanding metrics and building charts and Structure of time series.
Cloud Monitoring collects metrics, events, and metadata from Google Cloud, Amazon Web Services (AWS), hosted uptime probes, and application instrumentation. Metrics from the following sources are recorded in Cloud Monitoring:
- Google Cloud metrics
- Kubernetes metrics
- Istio metrics generated by the Istio on Google Kubernetes Engine add-on.
- Anthos metrics generated by GKE On-Prem.
- Logs-based metrics
- Custom metrics that you can create and report for specific use cases that aren't included in metrics that are already provided by Cloud Monitoring.
Extracting metrics programmatically
The Monitoring API lets you programmatically read and write all metrics that are collected in Cloud Monitoring. The following reference guide provides architecture and code examples that you can use to read metrics from the Monitoring API:
Monitoring using third-party tools
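As a rough illustration of reading metrics programmatically, the following sketch builds the URL for a `projects.timeSeries.list` request against the Monitoring API v3 REST endpoint. The project ID and the five-minute window are hypothetical values chosen for the example, and the helper names are not part of any Google library. Sending the request also requires an OAuth 2.0 access token in an `Authorization` header, which is not shown here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

MONITORING_ENDPOINT = "https://monitoring.googleapis.com/v3"


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_time_series_url(project_id, metric_type, start, end):
    """Return the URL for a projects.timeSeries.list request."""
    params = {
        # Monitoring filter that selects a single metric type.
        "filter": f'metric.type = "{metric_type}"',
        "interval.startTime": rfc3339(start),
        "interval.endTime": rfc3339(end),
    }
    return (
        f"{MONITORING_ENDPOINT}/projects/{project_id}/timeSeries"
        f"?{urlencode(params)}"
    )


# Hypothetical project ID and time window, for illustration only.
end = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
url = build_list_time_series_url(
    "my-project",
    "compute.googleapis.com/instance/cpu/utilization",
    end - timedelta(minutes=5),
    end,
)
print(url)
```

A third-party tool could issue a GET request to this URL on a schedule and forward the returned time series to its own storage.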
Cloud Logging observability data
Google Cloud services, user resources, and code automatically generate logs that record status or an event. For information about how logs are ingested, stored, and exported, see Cloud Logging Basic concepts.
The Cloud Logging architecture includes the following components:
- Log producers: resources that generate logs in Cloud Logging.
- Logs Router: Cloud Logging and its exports.
- Log consumers: third-party and open source tools that ingest logs from Cloud Logging.
For information about how data flows from producers through the Logs Router to log consumers, see the Logs Router overview.
Setting up exports to third-party tools
You can export logs to Pub/Sub, Cloud Storage, and BigQuery, and create subscriptions for the tools that you want to use.
We recommend using Pub/Sub to export logs to third-party tools because Cloud Logging manages the export pipeline, and you are only responsible for processing the logs that arrive through the pipeline.
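To give a sense of what processing exported logs can look like, here is a minimal sketch that decodes a Pub/Sub push request body and pulls out the exported log entry. It assumes a push subscription, where the message data arrives base64-encoded inside a JSON envelope. The sample entry, subscription name, and helper functions are hypothetical and exist only for this illustration.

```python
import base64
import json


def decode_log_entry(push_body: bytes) -> dict:
    """Extract the exported LogEntry from a Pub/Sub push request body."""
    envelope = json.loads(push_body)
    # Pub/Sub push delivery base64-encodes the message data.
    data = envelope["message"]["data"]
    return json.loads(base64.b64decode(data))


def summarize(entry: dict) -> str:
    """Build a one-line summary suitable for forwarding to another tool."""
    payload = entry.get("textPayload")
    if payload is None:
        payload = json.dumps(entry.get("jsonPayload", {}), sort_keys=True)
    timestamp = entry.get("timestamp", "?")
    severity = entry.get("severity", "DEFAULT")
    return f"{timestamp} {severity} {payload}"


# Simulated push body; a real one is sent by Pub/Sub to your endpoint.
sample_entry = {
    "logName": "projects/my-project/logs/syslog",
    "severity": "ERROR",
    "timestamp": "2021-01-01T12:00:00Z",
    "textPayload": "disk full",
}
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(json.dumps(sample_entry).encode()).decode(),
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/logs-to-tool",
}).encode()

summary = summarize(decode_log_entry(sample_body))
print(summary)
```

In practice the same decoding step would sit inside whatever HTTP handler or pull-subscriber loop your tool uses.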
For more information about exporting logs to specific tools, see the following guides:
- Design patterns for exporting logging data
- Scenarios for exporting Cloud Logging: Compliance requirements
- Scenarios for exporting logging data: Security and access analytics
- Scenarios for exporting Cloud Logging data: Splunk
- Scenarios for exporting Cloud Logging: Elasticsearch
Extracting logs programmatically
Ingesting observability data into Monitoring
You can use Cloud Monitoring to ingest observability data from on-premises and third-party tools, and then generate insights using dashboards, charts, and alerts.
Monitoring AWS resources
To monitor resources running in AWS, you can use Cloud Monitoring AWS account integration to directly ingest information about those resources into Cloud Monitoring, including metrics from Amazon CloudWatch. For more information, see Quickstart for AWS and AWS metrics.
Monitoring Azure, AWS, and on-premises resources
To monitor resources running in Microsoft Azure, AWS resources that Monitoring doesn't automatically include, or on-premises resources, you can use Blue Medora's BindPlane product. BindPlane provides an integration solution to directly ingest metric data from many different sources. For more information about BindPlane integrations, see the following guides:
- Monitoring on-premises resources with Blue Medora
- Observability included: Cloud Monitoring integration
- Extending Monitoring to on-premises with the new BindPlane integration
- BindPlane documentation
Monitoring with Prometheus
Prometheus is a common open source, time-series monitoring framework that's used with Kubernetes clusters. You can use Prometheus integration to ingest infrastructure and custom application metrics into Cloud Monitoring. For more information, see the following guides:
- Using Prometheus with Monitoring
- White-box app monitoring for GKE with Prometheus
- Monitoring apps running on multiple GKE clusters using Prometheus and Cloud Monitoring
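For context on what a Prometheus-instrumented app exposes before any integration ingests it, the following sketch renders one metric family in the Prometheus text exposition format using only the standard library. The metric name, labels, and values are made up for the example, and a real app would normally use a Prometheus client library rather than hand-rendering this text.

```python
def escape_label_value(value) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        # Sort labels so the output is stable across runs.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with two label combinations.
text = render_metric(
    "http_requests_total",
    "Total HTTP requests handled.",
    "counter",
    [
        ({"method": "GET", "code": "200"}, 1027),
        ({"method": "POST", "code": "500"}, 3),
    ],
)
print(text, end="")
```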
Monitoring with Istio
Istio Observability lets you export Istio metrics. You can use the Istio on GKE add-on to automatically configure the Monitoring adapter, or manually install Istio in your clusters, and then configure the Monitoring adapter. For more information about using Istio with Cloud Monitoring, see the following guides:
You can add custom telemetry to your applications and ingest metrics into Cloud Monitoring to use in charts, dashboards, and alerting policies.
We recommend using OpenCensus if your app is written in a language that the library supports. For more information, see Custom metrics with OpenCensus and Using OpenCensus with Cloud Bigtable and Cloud Trace.
If your app is running on GKE, you can use Prometheus as described earlier in this document.
You can add custom metric instrumentation to your code by using the Monitoring API. The Monitoring API provides you with the most flexibility and control, but it is more complex to use than OpenCensus or Prometheus. For more information, see Introduction to the Monitoring API.
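To make the Monitoring API option more concrete, the following sketch builds the JSON body for a `projects.timeSeries.create` request that writes a single data point to a custom metric. The project ID, metric name, and label are hypothetical, and the `global` monitored resource is an assumption chosen for simplicity. You would POST the body to `https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries` with an OAuth 2.0 access token, which is not shown.

```python
import json
from datetime import datetime, timezone


def build_create_time_series_body(project_id, metric_name, value,
                                  labels=None, when=None):
    """Return a timeSeries.create body that holds one data point."""
    when = when or datetime.now(timezone.utc)
    end_time = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "timeSeries": [{
            "metric": {
                # Custom metric types live under custom.googleapis.com/.
                "type": f"custom.googleapis.com/{metric_name}",
                "labels": labels or {},
            },
            # Assumption: report against the generic "global" resource.
            "resource": {
                "type": "global",
                "labels": {"project_id": project_id},
            },
            "points": [{
                "interval": {"endTime": end_time},
                "value": {"doubleValue": float(value)},
            }],
        }]
    }


# Hypothetical metric and value, for illustration only.
body = build_create_time_series_body(
    "my-project",
    "checkout/queue_depth",
    42,
    labels={"environment": "staging"},
    when=datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

Building the body by hand like this shows where the extra control comes from, and also why OpenCensus or Prometheus is usually less work.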
Visualizing Monitoring data
Cloud Monitoring provides the following options for visualizing data:
- Monitoring UI: Cloud Monitoring provides built-in charts and dashboards.
- Integrated tools: Visualization tools like Grafana can integrate with the Cloud Monitoring data source. Grafana is an open source platform for metrics analysis and visualization. For more information, see the following guides:
- BigQuery data export: BigQuery is a fully managed data warehouse solution that provides data visualization using Data Studio and other integrated partner visualization tools. For information about how to configure metric extraction to BigQuery for visualization with integrated tools like Data Studio, see the following guides:
The Monitoring page in the Google Cloud Console lets you visualize data by using the following tools:
- Monitoring overview: predefined dashboards that display charts for Google Cloud resources, apps, or AWS resources. You can modify the chart configuration and the display period for these dashboards.
- Custom dashboards: customizable dashboards that you can create to display the health of services or groups of resources that you specify.
- Metrics Explorer: a web view that lets you create custom views of metrics that are collected in the workspace. You can share Metrics Explorer charts to support real-time troubleshooting and collaboration use cases.
For more information, see the Qwiklab Monitoring multiple projects with Cloud Monitoring.
Ingesting observability data into Logging
You can use Cloud Logging to ingest log data from on-premises and from third-party tools, and then store, search, analyze, monitor, and alert on log data and events.
Logging AWS resources
To ingest logs from VMs that are running in Amazon EC2, you configure VM permissions, and then install the Cloud Logging agent on the VMs. To learn more about using AWS logs with Cloud Logging, see the quickstart for AWS.
Logging Azure and on-premises resources
To ingest logs from resources running in Microsoft Azure or on-premises, you can use Blue Medora's BindPlane integration solution. To learn more, see the guide to logging on-premises resources with Blue Medora.
Ingesting traces into Cloud Trace
Cloud Trace is a distributed tracing system that helps you analyze app latency, particularly for complex, microservice-based architectures. It provides application instrumentation, a storage backend, and a visualization and analysis layer for the traces that you ingest.
Trace with Zipkin
If your app is already instrumented with Zipkin and you don't want to run your own trace backend, or if you want access to Cloud Trace's advanced analysis tools, you can ingest traces into Cloud Trace. The Zipkin project maintains this functionality, and Google doesn't officially support it. For more information, see Using Cloud Trace with Zipkin.
Trace with OpenTelemetry
If you don't have existing tracing instrumentation, and if you want to use Cloud Trace as your tracing analysis tool, we recommend using OpenTelemetry. OpenTelemetry is an open source tracing and metrics library that supports many languages. For more information about using OpenTelemetry with Cloud Trace, see the OpenTelemetry documentation. For an example using Go, see the trace exporter package.
Trace with client libraries
You can use Cloud Trace client libraries for app instrumentation when you run VMs or containers in Google Cloud, on other cloud providers, or on-premises. Google is transitioning to use OpenTelemetry. If you want to start using distributed tracing, we recommend using OpenTelemetry as described earlier in this document.
Alerts and notifications
You can use Google Cloud and external tools to send alerts and notifications for data that's ingested from external systems.
Cloud Monitoring provides functionality for alerting and for managing incidents and events. To monitor your systems, you can integrate alerts from Monitoring into tools that you have acquired or that you've built yourself.
To implement Monitoring integration, use one or more of the following options:
- Use Cloud Monitoring built-in notification integrations to send notifications to other systems.
- Use a partner solution for systems where no built-in integrations are available.
- Build custom integrations with webhooks to deliver Cloud Monitoring notifications to other systems.
Sending alerts from Monitoring
You can create Cloud Monitoring alerting policies based on logs and ingested metrics, and send notifications to different channels, like email or SMS. The following guides provide implementation examples for Slack and PagerDuty:
- PagerDuty Cloud Monitoring integration guide
- Sending Connection Notifications to Slack from Compute Engine
For more information about alerts, see Alerting policies in depth.
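As one possible way to define such a policy in code, the following sketch builds the JSON body for a `projects.alertPolicies.create` request with a single threshold condition. The policy name, threshold, duration, and notification channel name are hypothetical, and the helper function is not part of any Google library. You would POST the body to `https://monitoring.googleapis.com/v3/projects/PROJECT_ID/alertPolicies` with suitable credentials, which are not shown.

```python
import json


def build_alert_policy(display_name, metric_type, resource_type, threshold,
                       duration_seconds, channel_names):
    """Return an AlertPolicy body with one threshold condition."""
    metric_filter = (
        f'metric.type = "{metric_type}" '
        f'AND resource.type = "{resource_type}"'
    )
    return {
        "displayName": display_name,
        # With a single condition, OR simply means "that condition".
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{display_name} condition",
            "conditionThreshold": {
                "filter": metric_filter,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                # The condition must hold for this long before it fires.
                "duration": f"{duration_seconds}s",
            },
        }],
        "notificationChannels": list(channel_names),
    }


# Hypothetical values, for illustration only.
policy = build_alert_policy(
    "High CPU utilization",
    "compute.googleapis.com/instance/cpu/utilization",
    "gce_instance",
    0.8,
    300,
    ["projects/my-project/notificationChannels/1234567890"],
)
print(json.dumps(policy, indent=2))
```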
Sending alerts from a third-party solution
You can use a third-party solution to send Cloud Monitoring alerts to your system. For examples, see the following guides:
- xMatters: Monitoring integration options let you create incident resolution workflows to help coordinate and resolve incidents.
- PagerDuty: Cloud Monitoring Integration Guide lets you set up built-in, bi-directional alert policies.
Sending alerts from custom integrations
You can use a webhook to create a custom integration with many third-party monitoring systems. When you set up a custom integration, Cloud Monitoring delivers alert notifications as JSON payloads to the URLs that you provide. The following guides provide examples of custom webhook integrations:
- How to connect Google Cloud's operations suite to external monitoring
- OpsGenie Monitoring integration
- VictorOps Monitoring integration
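On the receiving side, a custom integration mostly comes down to parsing the JSON payload and mapping it into the target system's format. The following sketch turns a webhook body into a one-line message. The field names (`incident`, `state`, `policy_name`, `summary`, `url`) are assumptions about the payload shape, so verify them against the notifications that your project actually sends. The sample payload is invented for the example.

```python
import json


def format_incident(raw_body: bytes) -> str:
    """Turn a webhook notification body into a one-line message."""
    incident = json.loads(raw_body).get("incident", {})
    # Use defaults so that a missing field doesn't break the integration.
    state = incident.get("state", "unknown").upper()
    policy = incident.get("policy_name", "unnamed policy")
    summary = incident.get("summary", "no summary provided")
    url = incident.get("url", "")
    line = f"[{state}] {policy}: {summary}"
    return f"{line} ({url})" if url else line


# Invented payload, for illustration only.
sample_payload = json.dumps({
    "incident": {
        "incident_id": "abc123",
        "state": "open",
        "policy_name": "High CPU utilization",
        "summary": "CPU utilization is above the threshold of 0.8.",
        "url": "https://example.com/incidents/abc123",
    },
    "version": "1.2",
}).encode()

message = format_incident(sample_payload)
print(message)
```

The returned string could then be posted to a chat tool or ticketing system by whatever HTTP handler receives the webhook.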
Sending error notifications from Monitoring
When you ingest data to Monitoring as described previously, you can configure Error Reporting functionality for external systems to write alerts as errors and to trigger notifications. For more information, see the following guides:
What's next
- Try other Google Cloud features using our technical guides.
- Learn more about Cloud Logging, Cloud Monitoring, and Cloud Trace.
- Get hands-on experience by completing the Qwiklabs in the Google Cloud's Operations Suite quest.