Observability for GKE


This page describes how you can understand the health of your applications and maintain application availability and reliability.

Default observability features

By default, GKE clusters are configured to do the following:

Customize and enhance data collection

By default, GKE creates a Logging repository for storing logs for each cluster. You can control which logs and which metrics, if any, are sent from your GKE cluster to Cloud Logging and Cloud Monitoring.

You can also control whether to enable Google Cloud Managed Service for Prometheus.

For GKE Autopilot clusters, you cannot disable the Cloud Monitoring and Cloud Logging integration.

Additional observability metrics

You can collect additional observability metrics by enabling one or more observability metrics packages.

  • Control plane metrics: Monitor the health of Kubernetes components by collecting metrics for the Kubernetes API server, Scheduler, and Controller Manager. These metrics are useful signals of service health for defining service level objectives (SLOs).
  • Kube state metrics: Monitor the health of Kubernetes objects such as Deployments, Nodes, and Pods.
  • cAdvisor/Kubelet metrics: Monitor the health of containers and the kubelet.

If you have enabled GKE Enterprise in your project, these metrics are enabled by default when you register the cluster to a fleet during cluster creation.
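
As a rough illustration of how these packages might be selected, the following sketch builds a value for the `--monitoring` flag of `gcloud container clusters update`. The component names in `PACKAGE_COMPONENTS` are an assumption based on the gcloud reference and may differ between gcloud versions. The helper name `monitoring_flag` and the cluster name `example-cluster` are hypothetical. Check the current gcloud documentation before relying on any of these.

```python
# Hypothetical helper: maps the metrics packages described above to
# `--monitoring` component names. The component names are assumptions;
# verify them against the gcloud reference for your version.
PACKAGE_COMPONENTS = {
    "control_plane": ["API_SERVER", "SCHEDULER", "CONTROLLER_MANAGER"],
    "kube_state": ["STORAGE", "POD", "DEPLOYMENT", "STATEFULSET", "DAEMONSET", "HPA"],
    "cadvisor_kubelet": ["CADVISOR", "KUBELET"],
}


def monitoring_flag(packages):
    """Return a --monitoring flag that keeps SYSTEM metrics and adds packages."""
    components = ["SYSTEM"]
    for package in packages:
        for component in PACKAGE_COMPONENTS[package]:
            if component not in components:
                components.append(component)
    return "--monitoring=" + ",".join(components)


if __name__ == "__main__":
    flag = monitoring_flag(["control_plane", "cadvisor_kubelet"])
    # Print the command instead of running it, so the sketch has no side effects.
    print("gcloud container clusters update example-cluster " + flag)
```

The sketch only prints the command, so you can review it before running it against a real cluster.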

Third-party and user-defined metrics

To monitor third-party applications running on your clusters, such as Postgres, MongoDB, and Redis, use Prometheus exporters with Google Cloud Managed Service for Prometheus.

You can also write custom exporters to monitor other signals of health and performance.
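
To give a sense of what a custom exporter involves, here is a minimal sketch that serves one gauge in the Prometheus text exposition format using only the Python standard library. The metric name `example_queue_depth`, the placeholder value, the `/metrics` path, and port 8000 are all assumptions chosen for illustration. A production exporter would more likely use a Prometheus client library and read real application state.

```python
# Minimal custom exporter sketch using only the standard library.
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(queue_depth):
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP example_queue_depth Number of items waiting in the queue.\n"
        "# TYPE example_queue_depth gauge\n"
        f"example_queue_depth {queue_depth}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered metrics at /metrics and 404s everywhere else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Placeholder value; a real exporter would read this from the application.
        body = render_metrics(queue_depth=3).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the exporter; blocks until the process is interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A Prometheus-compatible collector could then be pointed at that port to scrape it.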

Use collected data

Use the data you collect to analyze application health, debug, troubleshoot, and test as you develop, deploy, and maintain your applications.

GKE provides built-in observability features to get you started quickly:

  • View collected data for your clusters and workloads in GKE observability dashboards. You can customize the provided dashboards for the following purposes:

    • View key cluster metrics, such as CPU utilization, memory utilization, and the number of open incidents.
    • View clusters by their infrastructure, workloads, or Services.
    • Inspect namespaces, Nodes, workloads, Services, Pods, and containers.
    • For Pods and containers, view metrics as a function of time and view log entries.

    You can also create your own dashboards or import Grafana dashboards to meet your needs.

  • From the Observability tab, you can create recommended alert policies so that you are notified about issues. To learn more about alerting, see the Alerting overview.

  • Create SLOs to monitor your service performance goals using collected GKE metrics.

  • Use GKE playbooks to troubleshoot common issues such as unschedulable Pods and containers that repeatedly crash after restart.

  • Explore and analyze your data with tools such as Logs Explorer, Metrics Explorer, and Error Reporting.

  • Review GKE audit logs that record administrative activities and accesses as part of Cloud Audit Logs. Audit log policy determines which events are recorded and whether a log entry belongs to an Admin Activity log or a Data Access log.
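
The distinction between Admin Activity and Data Access logs in the last item above shows up in each entry's `logName`. The sketch below classifies an entry by that field, assuming the usual `projects/PROJECT_ID/logs/LOG_ID` shape with a URL-encoded log ID. The function name `audit_log_type` and the project ID `example-project` are hypothetical, and other audit log types are deliberately left unhandled.

```python
# Sketch: classify an audit log entry by the log ID in its logName.
from urllib.parse import unquote

# Only the two log types discussed above; other audit log types return None.
AUDIT_LOG_TYPES = {
    "cloudaudit.googleapis.com/activity": "Admin Activity",
    "cloudaudit.googleapis.com/data_access": "Data Access",
}


def audit_log_type(log_name):
    """Return "Admin Activity", "Data Access", or None for any other logName."""
    # logName looks like "projects/PROJECT_ID/logs/LOG_ID"; LOG_ID is URL-encoded.
    _, sep, log_id = log_name.partition("/logs/")
    if not sep:
        return None
    return AUDIT_LOG_TYPES.get(unquote(log_id))


if __name__ == "__main__":
    name = "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity"
    print(audit_log_type(name))
```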

Other features

GKE integrates with other Google Cloud services to help you monitor and manage your clusters and workloads.

Pricing

Pricing for integration with Cloud Logging (including Cloud Audit Logs), Cloud Monitoring, and Google Cloud Managed Service for Prometheus is based on the amount of logs and metrics collected. See the Pricing page for details.

Features provided by other Google Cloud services listed in Other features have separate pricing. See the Pricing section of those documentation pages for more information.
