Observability for GKE

Autopilot Standard

This page describes how you can understand the health of your applications and maintain application availability and reliability.

Default observability features

By default, GKE clusters are configured to do the following:

Send system logs, audit logs, and application logs to Cloud Logging.
Send system metrics to Cloud Monitoring.
Use Google Cloud Managed Service for Prometheus to collect configured third-party and user-defined metrics and then send them to Cloud Monitoring. Google Cloud Managed Service for Prometheus lets you monitor and alert on your workloads using Prometheus, without manually managing and operating Prometheus at scale.

Customize and enhance data collection

By default, GKE creates a Logging repository for storing logs for each cluster. You can control which logs and which metrics, if any, are sent from your GKE cluster to Cloud Logging and Cloud Monitoring.

You can also control whether to enable Google Cloud Managed Service for Prometheus.

For GKE Autopilot clusters, you cannot disable the Cloud Monitoring and Cloud Logging integration.

Additional observability metrics

You can collect additional observability metrics by enabling one or more observability metrics packages.

Control plane metrics: Monitor the health of Kubernetes components by collecting metrics for the Kubernetes API server, Scheduler, and Controller Manager. These metrics are useful signals of service health for defining service level objectives (SLOs).
Kube state metrics: Monitor the health of Kubernetes objects such as Deployments, Nodes, and Pods.
cAdvisor/Kubelet metrics: Monitor the health of containers and the kubelet.

If you have enabled GKE Enterprise in your project, these metrics are enabled by default when you register the cluster to a fleet during cluster creation.

Third-party and user-defined metrics

To monitor third-party applications running on your clusters such as Postgres, MongoDB, and Redis, use Prometheus exporters with Google Cloud Managed Service for Prometheus.

You can also write custom exporters to monitor other signals of health and performance.
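As an illustration of what a custom exporter does, the following minimal sketch serves two metrics in the Prometheus text exposition format over HTTP, using only the Python standard library. The metric names (`myapp_queue_depth`, `myapp_jobs_processed_total`), the hard-coded values, and port 8000 are hypothetical placeholders, not names that GKE or Google Cloud Managed Service for Prometheus defines. A production exporter would more likely use a Prometheus client library and read real application state.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(queue_depth: float, jobs_total: int) -> str:
    """Render two hypothetical metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_queue_depth Number of items waiting in the work queue.",
        "# TYPE myapp_queue_depth gauge",
        f"myapp_queue_depth {queue_depth}",
        "# HELP myapp_jobs_processed_total Jobs processed since startup.",
        "# TYPE myapp_jobs_processed_total counter",
        f"myapp_jobs_processed_total {jobs_total}",
    ]
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics at /metrics and return 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Placeholder values; a real exporter would read application state here.
        body = render_metrics(queue_depth=3, jobs_total=120).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the exporter and block until the process is interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A scrape configuration for Google Cloud Managed Service for Prometheus would then need to target the Pod's port and the `/metrics` path.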

Use collected data

Use the data you collect to analyze application health, debug, troubleshoot, and test as you develop, deploy, and maintain your applications.

GKE provides built-in observability features to get you started quickly:

View collected data for your clusters and workloads in GKE observability dashboards. You can customize the provided dashboards for the following purposes:
- View key cluster metrics, such as CPU utilization, memory utilization, and the number of open incidents.
- View clusters by their infrastructure, workloads, or Services.
- Inspect namespaces, Nodes, workloads, Services, Pods, and containers.
- For Pods and containers, view metrics as a function of time and view log entries.
You can also create your own dashboards or import Grafana dashboards to meet your needs.

Note: The provided GKE dashboards only display information for GKE clusters running on Google Cloud. They don't display information for GKE clusters running anywhere else, for example using on-premises or bare-metal servers.
From the Observability tab, you can create recommended alert policies so that you are notified about issues. To learn more about alerting, see the Alerting overview.
Create SLOs to monitor your service performance goals using collected GKE metrics.
Use GKE playbooks to troubleshoot common issues such as unschedulable Pods and containers that repeatedly crash after restart.
Explore and analyze your data with tools such as Logs Explorer, Metrics Explorer, and Error Reporting.
Review GKE audit logs that record administrative activities and accesses as part of Cloud Audit Logs. Audit log policy determines which events are recorded and whether a log entry belongs to an Admin Activity log or a Data Access log.
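
To make the SLO item in the list above more concrete, the following sketch shows the arithmetic behind a request-based SLO: the target defines an allowed number of failed requests (the error budget), and the remaining budget is the share of that allowance not yet consumed. The function name and the sample numbers are illustrative assumptions. This is not how Cloud Monitoring computes SLO compliance internally; it only demonstrates the general idea.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still available.

    slo_target: desired success ratio, for example 0.999 for "99.9%".
    total:      number of requests observed in the compliance period.
    failed:     number of those requests that did not meet the objective.

    A negative result means the budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    if failed < 0 or failed > total:
        raise ValueError("failed must be between 0 and total")

    # Number of failures the SLO tolerates over the period.
    allowed_failures = (1 - slo_target) * total
    return 1 - failed / allowed_failures
```

For example, with a 99.9% target and 1,000,000 requests, the budget allows roughly 1,000 failures, so 250 failed requests leave about three quarters of the budget unspent.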

Other features

GKE integrates with other Google Cloud services to help you monitor and manage your clusters and workloads.

Use the security posture dashboard to identify security concerns based on our standards and industry best practices.
View insights and recommendations to optimize your clusters.
Use network policy logging to help you troubleshoot issues with Kubernetes network policies. If you use GKE Dataplane V2, then network policy logging is built in.

Pricing

Pricing for integration with Cloud Logging (including Cloud Audit Logs), Cloud Monitoring, and Google Cloud Managed Service for Prometheus is based on the amount of logs and metrics collected. See the Pricing page for details.

Features provided by other Google Cloud services listed in Other features have separate pricing. See the Pricing section of those documentation pages for more information.

What's next

Observe your clusters. Learn how to view dashboards, organize cluster information, and view alerting details.
Learn how to enable verbose, OS-level audit logging on GKE cluster nodes and how to export the logs to Cloud Logging.
For more information about how to use observability features to troubleshoot GKE, see Introduction to GKE troubleshooting.