Integrating Cloud Monitoring, Logging, and Trace with observability and alerting solutions

This document provides resources that describe how to integrate Cloud Logging, Cloud Monitoring, and Cloud Trace with third-party and open source solutions for observability and alerting. This is a reference guide for Site Reliability Engineers (SREs), system administrators, network operations teams, monitoring professionals, and other practitioners responsible for the reliability, availability, and performance of systems. This document assumes that you have experience developing solutions in Google Cloud.

You might use many tools that help you monitor and diagnose your enterprise IT systems by generating alerts when the performance and availability of those systems degrade. As you and your organization expand your computing footprint to Google Cloud, you can integrate Google Cloud into your observability and alerting practices by doing the following:

  • Extracting data from Google Cloud: send data from Cloud Monitoring and Logging to existing tools.
  • Ingesting data into Google Cloud: send data from existing tools to Cloud Monitoring and Logging.

Extracting Monitoring and Logging data

You can send observability and event data for your Google Cloud resources to the third-party monitoring, alerting, and notification tools that you use. Google Cloud services automatically generate observability data like metrics, logs, and trace data that help to provide a complete observability overview. Extracting Cloud Monitoring and Cloud Logging data lets you integrate the data into your existing reliability, alerting, and incident management processes.

Cloud Monitoring observability data

Cloud Monitoring classifies metrics into general groups based on the type of service that collects the data. To learn about metric types, resources, labels, and aggregation in Monitoring, see Understanding metrics and building charts and Structure of time series.

Cloud Monitoring collects metrics, events, and metadata from Google Cloud, Amazon Web Services (AWS), hosted uptime probes, and application instrumentation. Metrics from the following sources are recorded in Cloud Monitoring:

Extracting metrics programmatically

The Monitoring API lets you programmatically read and write all metrics that are collected in Cloud Monitoring. The following reference guide provides architecture and code examples that you can use to read metrics from the Monitoring API:
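Apart from that guide, a read request can be sketched with the standard library alone. The following minimal example builds the URL for a projects.timeSeries.list call that reads five minutes of one metric. The project ID `my-project` and the helper names are hypothetical, and sending the request also requires an OAuth 2.0 bearer token, which is not shown here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

MONITORING_ENDPOINT = "https://monitoring.googleapis.com/v3"


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_time_series_url(project_id, metric_type, start, end):
    """Build the URL for a projects.timeSeries.list request (hypothetical helper)."""
    params = {
        # Monitoring filter that selects a single metric type.
        "filter": f'metric.type = "{metric_type}"',
        "interval.startTime": rfc3339(start),
        "interval.endTime": rfc3339(end),
    }
    return f"{MONITORING_ENDPOINT}/projects/{project_id}/timeSeries?{urlencode(params)}"


end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
url = build_list_time_series_url(
    "my-project",  # hypothetical project ID
    "compute.googleapis.com/instance/cpu/utilization",
    end - timedelta(minutes=5),
    end,
)
print(url)
```

A third-party tool would typically issue this GET request on a schedule and advance the interval on each poll.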

Monitoring using third-party tools

Third-party solutions like Datadog and SignalFx, which provide built-in monitoring capabilities, can read metrics directly from the Monitoring API.

Cloud Logging observability data

Google Cloud services, user resources, and code automatically generate logs that record status or an event. For information about how logs are ingested, stored, and exported, see Cloud Logging Basic concepts.

The Cloud Logging architecture includes the following components:

  • Log producers: resources that generate logs in Cloud Logging.
  • Logs Router: the Cloud Logging component that routes logs to storage and to export destinations.
  • Log consumers: third-party and open source tools that ingest logs from Cloud Logging.

For information about how data flows from producers through the Logs Router to log consumers, see the Logs Router overview.

For information about how to build an application to read logs programmatically, see Configure and manage sinks and Using the Logging API.

Setting up exports to third-party tools

You can export logs to Pub/Sub, Cloud Storage, and BigQuery. For Pub/Sub exports, you create subscriptions for the tools that you want to use.

We recommend using Pub/Sub to export logs to third-party tools because Cloud Logging manages the export pipeline, and you are only responsible for processing the logs that arrive through the pipeline.
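To illustrate the processing side of such a pipeline, the following minimal sketch decodes one exported log entry from a Pub/Sub push request body. It assumes push delivery, where the LogEntry arrives as base64-encoded JSON in `message.data`. The sample entry, the subscription name, and the `summarize` helper are hypothetical.

```python
import base64
import json


def decode_exported_log(push_body: dict) -> dict:
    """Return the LogEntry carried in a Pub/Sub push request body."""
    data = push_body["message"]["data"]
    return json.loads(base64.b64decode(data).decode("utf-8"))


def summarize(entry: dict) -> str:
    """Produce a one-line summary of a LogEntry for a downstream tool."""
    payload = entry.get("textPayload") or json.dumps(
        entry.get("jsonPayload", {}), sort_keys=True
    )
    return f'{entry.get("severity", "DEFAULT")} {entry.get("logName", "?")}: {payload}'


# Hypothetical sample: a log entry wrapped the way a push subscription delivers it.
sample_entry = {
    "logName": "projects/my-project/logs/app",
    "severity": "ERROR",
    "textPayload": "disk quota exceeded",
}
sample_push = {
    "message": {
        "data": base64.b64encode(json.dumps(sample_entry).encode("utf-8")).decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/log-export",
}

line = summarize(decode_exported_log(sample_push))
print(line)
```

With a pull subscription, the same decoding applies to each received message, although the client libraries usually hand you the data already base64-decoded.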

For more information about exporting logs to specific tools, see the following guides:

Extracting logs programmatically

You can use the Logging API to read logs. The entries.list method returns a paginated list of log entries for a specific set of logs.
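The pagination pattern can be sketched as follows. This is a minimal example that assumes the entries.list request and response shapes (`resourceNames`, `filter`, `pageSize`, `pageToken`, `entries`, `nextPageToken`). The `fetch_page` callable is a hypothetical stand-in for an authenticated POST to the Logging API, and `fake_fetch_page` exists only so that the loop can run without credentials.

```python
from typing import Callable, Iterator


def iter_log_entries(
    fetch_page: Callable[[dict], dict],
    project_id: str,
    log_filter: str,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every log entry that matches log_filter, following page tokens."""
    body = {
        "resourceNames": [f"projects/{project_id}"],
        "filter": log_filter,
        "pageSize": page_size,
    }
    while True:
        response = fetch_page(body)
        yield from response.get("entries", [])
        token = response.get("nextPageToken")
        if not token:
            return  # no token means this was the last page
        body = {**body, "pageToken": token}


def fake_fetch_page(body: dict) -> dict:
    """Hypothetical transport that serves two canned pages."""
    pages = {
        None: {"entries": [{"insertId": "a"}, {"insertId": "b"}], "nextPageToken": "t1"},
        "t1": {"entries": [{"insertId": "c"}]},
    }
    return pages[body.get("pageToken")]


ids = [e["insertId"] for e in iter_log_entries(fake_fetch_page, "my-project", "severity>=ERROR")]
print(ids)
```

Passing the transport in as a parameter keeps the paging logic independent of how requests are authenticated and sent.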

Ingesting observability data into Monitoring

You can use Cloud Monitoring to ingest observability data from on-premises and third-party tools, and then generate insights using dashboards, charts, and alerts.

Monitoring AWS resources

To monitor resources running in AWS, you can use Cloud Monitoring AWS account integration to directly ingest information about those resources into Cloud Monitoring, including metrics from Amazon CloudWatch. For more information, see Quickstart for AWS and AWS metrics.

Monitoring Azure, AWS, and on-premises resources

To monitor resources running in Microsoft Azure, AWS resources that Monitoring doesn't automatically include, or on-premises resources, you can use Blue Medora's BindPlane product. BindPlane provides an integration solution to directly ingest metric data from many different sources. For more information about BindPlane integrations, see the following guides:

Monitoring with Prometheus

Prometheus is a common open source, time-series monitoring framework that's used with Kubernetes clusters. You can use Prometheus integration to ingest infrastructure and custom application metrics into Cloud Monitoring. For more information, see the following guides:

Monitoring with Istio

Istio Observability lets you export Istio metrics. You can use the Istio on GKE add-on to automatically configure the Monitoring adapter, or manually install Istio in your clusters, and then configure the Monitoring adapter. For more information about using Istio with Cloud Monitoring, see the following guides:

Custom monitoring

You can add custom telemetry to your applications and ingest metrics into Cloud Monitoring to use in charts, dashboards, and alerting policies.

We recommend using OpenCensus if your app is written in a language that the library supports. For more information, see Custom metrics with OpenCensus.

If your app is running on GKE, you can use Prometheus as described earlier in this document.

You can add custom metric instrumentation to your code by using the Monitoring API. The Monitoring API provides you with the most flexibility and control, but it is more complex to use than OpenCensus or Prometheus. For more information, see Introduction to the Monitoring API.
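As a rough illustration of what writing a custom metric through the Monitoring API involves, the following sketch builds the JSON body for a projects.timeSeries.create request. The metric type `custom.googleapis.com/queue_depth`, the `queue` label, the project ID, and the helper name are hypothetical. The sketch attaches the point to the `global` monitored resource as a simplifying assumption, and it does not send the request.

```python
import json
from datetime import datetime, timezone


def build_create_time_series_body(metric_type, value, end_time, project_id, labels=None):
    """Build a timeSeries.create body that holds one gauge point (hypothetical helper)."""
    return {
        "timeSeries": [
            {
                "metric": {"type": metric_type, "labels": labels or {}},
                # Assumption: the generic "global" resource is good enough here.
                "resource": {"type": "global", "labels": {"project_id": project_id}},
                "points": [
                    {
                        "interval": {
                            "endTime": end_time.astimezone(timezone.utc).strftime(
                                "%Y-%m-%dT%H:%M:%SZ"
                            )
                        },
                        "value": {"doubleValue": float(value)},
                    }
                ],
            }
        ]
    }


body = build_create_time_series_body(
    "custom.googleapis.com/queue_depth",  # hypothetical custom metric type
    17,
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "my-project",  # hypothetical project ID
    labels={"queue": "orders"},
)
print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST to `https://monitoring.googleapis.com/v3/projects/my-project/timeSeries`. Instrumentation libraries assemble requests like this for you, which is part of why they are simpler to use.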

Visualizing Monitoring data

The Monitoring page in the Google Cloud console lets you visualize data by using the following tools:

  • Google Cloud dashboards: display charts for Google Cloud resources, apps, or AWS resources. You can modify the chart configuration and the display period for these dashboards.
  • Custom dashboards: customizable dashboards that you can create to display the health of services or groups of resources that you specify.
  • Metrics Explorer: a web view that lets you create custom views of metrics that are collected in the workspace. You can share Metrics Explorer charts to support real-time troubleshooting and collaboration use cases.

For more information, see Monitoring multiple projects with Cloud Monitoring on Google Cloud Skills Boost.

Ingesting observability data into Logging

You can use Cloud Logging to ingest log data from on-premises and from third-party tools, and then store, search, analyze, monitor, and alert on log data and events.

Logging AWS resources

To ingest logs from VMs that are running in Amazon EC2, you configure VM permissions, and then install the Cloud Logging Agent on the VMs. To learn more about using AWS logs with Cloud Logging, see the quickstart for AWS.

Logging Azure and on-premises resources

To ingest logs from resources running in Microsoft Azure or on-premises, you can use Blue Medora's BindPlane integration solution. To learn more, see the guide to logging on-premises resources with Blue Medora.

Custom logging

To send logs directly to Cloud Logging, you can use the Logging client libraries. For more information about using the Logging client libraries to ingest logs, see the Logging API samples.
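For orientation, the following minimal sketch builds a request body for the Logging API's entries.write method, which is the kind of operation that the client libraries perform on your behalf. The project ID, the log ID `on-prem-app`, the payload fields, and the helper name are hypothetical. The sketch uses the `global` monitored resource as a simplifying assumption and does not send the request.

```python
import json


def build_write_entries_body(project_id, log_id, severity, payload):
    """Build an entries.write body that holds one structured log entry (hypothetical helper)."""
    return {
        "logName": f"projects/{project_id}/logs/{log_id}",
        # Assumption: the generic "global" resource is used for logs from outside Google Cloud.
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "entries": [{"severity": severity, "jsonPayload": payload}],
    }


body = build_write_entries_body(
    "my-project",  # hypothetical project ID
    "on-prem-app",  # hypothetical log ID
    "WARNING",
    {"message": "backup took 42 minutes", "host": "db-01"},
)
print(json.dumps(body, indent=2))
```

In practice, the client libraries also handle authentication, batching, and retries, so building requests by hand is mainly useful for understanding what is sent.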

Ingesting traces into Cloud Trace

Cloud Trace is a distributed tracing system that you can use to analyze app latency, particularly for complex, microservice-based architectures. It provides application instrumentation, a storage backend, and a visualization and analysis layer for the traces that you ingest.

Trace with Zipkin

If your app is already instrumented with Zipkin and you don't want to run your own trace backend, or if you want access to Cloud Trace's advanced analysis tools, you can ingest traces into Cloud Trace. The Zipkin project maintains this functionality, and Google doesn't officially support it. For more information, see Using Cloud Trace with Zipkin.

Trace with OpenTelemetry

If you don't have existing tracing instrumentation, and if you want to use Cloud Trace as your tracing analysis tool, we recommend using OpenTelemetry. OpenTelemetry is an open source tracing and metrics library that supports many languages. For more information about using OpenTelemetry with Cloud Trace, see the OpenTelemetry documentation. For an example using Go, see the trace exporter package.

Trace with client libraries

You can use Cloud Trace client libraries for app instrumentation when you run VMs or containers in Google Cloud, on other cloud providers, or on-premises. However, Google is transitioning to OpenTelemetry. If you want to start using distributed tracing, we recommend using OpenTelemetry as described earlier in this document.

Alerts and notifications

You can use Google Cloud and external tools to send alerts and notifications for data that's ingested from external systems.

Cloud Monitoring provides functionality for alerting and for managing incidents and events. To monitor your systems, you can integrate alerts from Monitoring into tools that you have acquired or that you've built yourself.

To implement Monitoring integration, use one or more of the following options:

  • Use Cloud Monitoring built-in notification integrations to send notifications to other systems.
  • Use a partner solution for systems where no built-in integrations are available.
  • Build custom integrations with webhooks to deliver Cloud Monitoring notifications to other systems.

Sending alerts from Monitoring

You can create Cloud Monitoring alerting policies based on logs and ingested metrics, and send notifications to different channels, like email or SMS. The following guides provide implementation examples for Slack and PagerDuty:

For more information about alerts, see Alerting policies in depth.

Sending alerts from a third-party solution

You can use a third-party solution to send Cloud Monitoring alerts to your system. For examples, see the following guides:

Sending alerts from custom integrations

You can use a webhook to create a custom integration with many third-party monitoring systems. When you set up a custom integration, Cloud Monitoring delivers alert notifications as JSON payloads to the URLs that you provide. The following guides provide examples of custom webhook integrations:
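Separately from those guides, the receiving side of a webhook can be sketched with the standard library. The following minimal example assumes that the notification payload carries an `incident` object with `state`, `policy_name`, and `summary` fields, so check the payload that your notification channel actually sends. The handler class, the `format_incident` helper, and the sample values are hypothetical, and the sketch omits authentication of incoming requests.

```python
import json
from http.server import BaseHTTPRequestHandler


def format_incident(payload: dict) -> str:
    """Turn an alert notification payload into a one-line message."""
    incident = payload.get("incident", {})
    state = incident.get("state", "unknown").upper()
    policy = incident.get("policy_name", "unknown policy")
    summary = incident.get("summary", "")
    line = f"[{state}] {policy}"
    return f"{line}: {summary}" if summary else line


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver that accepts alert notifications over HTTP POST."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_response(400)  # reject bodies that are not valid JSON
            self.end_headers()
            return
        print(format_incident(payload))  # forward to your own system here
        self.send_response(204)
        self.end_headers()


# Hypothetical sample payload, trimmed to the fields that the sketch reads.
sample = {
    "version": "1.2",
    "incident": {
        "incident_id": "0.abc",
        "policy_name": "High CPU",
        "state": "open",
        "summary": "CPU utilization above 90%",
    },
}
message = format_incident(sample)
print(message)
```

To try the handler locally, you could serve it with `http.server.HTTPServer(("", 8080), AlertWebhookHandler).serve_forever()`. A production receiver would also need HTTPS and request authentication.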

Sending error notifications from Monitoring

When you ingest data into Monitoring as described previously, you can configure Error Reporting functionality for external systems to write alerts as errors and to trigger notifications. For more information, see the following guides:

What's next