View observability metrics


This page shows you how to view infrastructure health metrics for your Google Kubernetes Engine (GKE) clusters and workloads. You can use these metrics to troubleshoot issues with your GKE clusters and workloads.

Observability metrics for clusters

Requirements

  • You must have system metrics enabled on your clusters to use the overview metrics in the Observability tab. System metrics are always enabled in Autopilot clusters.
  • You must have control plane metrics enabled on your clusters to use the control plane metrics in the Observability tab. Control plane metrics aren't available in Autopilot clusters.

    For more information, see Configuring Cloud Operations for GKE.
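If you manage clusters from scripts, the following Python sketch assembles a `gcloud container clusters update` command that enables system metrics and, for Standard clusters, the control plane components. It assumes that your gcloud version accepts `--location` and the `--monitoring` component values shown. Check `gcloud container clusters update --help` before you rely on it. The cluster name and location are placeholders.

```python
import shlex

# Components passed to --monitoring: SYSTEM enables system metrics; the
# remaining values enable control plane metrics (Standard clusters only).
CONTROL_PLANE_COMPONENTS = ("API_SERVER", "SCHEDULER", "CONTROLLER_MANAGER")


def build_enable_metrics_command(cluster: str, location: str,
                                 control_plane: bool = True) -> list:
    """Return the gcloud argv that enables system and, optionally, control plane metrics."""
    components = ["SYSTEM"]
    if control_plane:
        components.extend(CONTROL_PLANE_COMPONENTS)
    return [
        "gcloud", "container", "clusters", "update", cluster,
        f"--location={location}",
        f"--monitoring={','.join(components)}",
    ]


if __name__ == "__main__":
    # Hypothetical cluster; print the command instead of executing it.
    argv = build_enable_metrics_command("example-cluster", "us-central1")
    print(shlex.join(argv))
```

For an Autopilot cluster, pass `control_plane=False`, because control plane metrics aren't available there.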

Observability metrics

In the Observability tab in the Google Cloud console, you can view performance metrics for clusters and workloads.

Metrics for clusters and workloads

The following metrics are available for both clusters and workloads:

  • Overview: Shows summary infrastructure health metrics such as CPU and memory request utilization, error logs, and warning events.
  • CPU: Shows CPU and core request utilization.
  • Memory: Shows memory request utilization.
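Request utilization is the ratio of what a container uses to what it requested. Cloud Monitoring describes metrics such as `kubernetes.io/container/cpu/request_utilization` in this way and notes that the value can exceed 1 when usage is above the request. The following minimal sketch shows that arithmetic. The function name and sample values are illustrative.

```python
from typing import Optional


def request_utilization(usage: float, request: float) -> Optional[float]:
    """Fraction of the requested resource in use.

    The result can exceed 1.0 when usage is above the request. Returns None
    when no request is set, because the ratio is undefined.
    """
    if request <= 0:
        return None
    return usage / request


if __name__ == "__main__":
    # CPU in cores: 0.3 used against 0.25 requested, so roughly 1.2.
    print(request_utilization(0.3, 0.25))
    # Memory in bytes: 200 MiB used against 256 MiB requested.
    print(request_utilization(200 * 2**20, 256 * 2**20))
```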

The following metrics are available for clusters:

  • Kubernetes Events: Provides visibility into event counts over time and a detailed log of events.
  • Control plane (Standard clusters only): Provides visibility into the health of Kubernetes control plane components such as the kube-apiserver and scheduler. Also shows information such as the number of unschedulable Pods. A Pod is unschedulable when the scheduler attempted to place it and determined that it can't be placed on any node. Pods stay in this state until something in the cluster changes that makes them schedulable.
  • Cloud Ops Ingestion: Provides visibility into the volume of logging and metrics ingestion, which correlates with cost. For more information, see Google Cloud's operations suite pricing.
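To cross-check the unschedulable Pod count from the Pod side, you can look for the `PodScheduled` condition with status `False` and reason `Unschedulable`. Kubernetes sets this condition on Pods that the scheduler couldn't place. The following sketch assumes that its input is the parsed JSON from `kubectl get pods --all-namespaces -o json`. It illustrates the Pod condition and is not how GKE computes the Control plane chart. The function name and sample Pods are hypothetical.

```python
import json


def unschedulable_pods(pod_list: dict) -> list:
    """Return "namespace/name" for each Pod the scheduler marked Unschedulable."""
    result = []
    for pod in pod_list.get("items", []):
        for cond in pod.get("status", {}).get("conditions", []):
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                meta = pod.get("metadata", {})
                result.append(f"{meta.get('namespace')}/{meta.get('name')}")
                break  # One matching condition is enough for this Pod.
    return result


if __name__ == "__main__":
    # Hypothetical, trimmed-down kubectl output with one pending Pod.
    sample = json.loads("""
    {"items": [
      {"metadata": {"namespace": "default", "name": "web-1"},
       "status": {"conditions": [{"type": "PodScheduled", "status": "True"}]}},
      {"metadata": {"namespace": "batch", "name": "job-7"},
       "status": {"conditions": [{"type": "PodScheduled", "status": "False",
                                  "reason": "Unschedulable"}]}}
    ]}
    """)
    print(unschedulable_pods(sample))
```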

Interpret observability metrics

Metrics can help you troubleshoot issues with your GKE clusters, such as:

  • High CPU or memory request utilization trends might indicate that you should configure containers in a cluster or namespace to use fewer resources.
  • High container restart counts might indicate containers are crashing.
  • A high number of unschedulable Pods indicates insufficient resources or configuration errors.
  • High Cloud Logging or Managed Prometheus ingestion correlates with Google Cloud's operations suite cost. You might be able to save costs by reducing ingestion. For more information about Managed Prometheus, see Cost controls and attribution. For more information about logging, see Exclusion filters.
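You can combine the signals in the preceding list into a quick triage checklist. The following sketch is illustrative only. `ClusterSnapshot`, `triage`, and both thresholds are hypothetical names and values, not GKE defaults, and the inputs are numbers that you read from the Observability charts yourself.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; tune them for your workloads.
HIGH_UTILIZATION = 0.9  # Fraction of the request, so 0.9 means 90%.
HIGH_RESTARTS = 5       # Container restarts within the selected timeframe.


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of values read from the Observability tab."""
    cpu_request_utilization: float
    memory_request_utilization: float
    container_restarts: int
    unschedulable_pods: int


def triage(s: ClusterSnapshot) -> list:
    """Map metric readings to the follow-up checks for each signal."""
    findings = []
    if max(s.cpu_request_utilization, s.memory_request_utilization) >= HIGH_UTILIZATION:
        findings.append("review resource configuration for containers")
    if s.container_restarts >= HIGH_RESTARTS:
        findings.append("check for crashing containers")
    if s.unschedulable_pods > 0:
        findings.append("check for insufficient resources or configuration errors")
    return findings


if __name__ == "__main__":
    print(triage(ClusterSnapshot(0.95, 0.5, 0, 2)))
```

Ingestion cost isn't included because judging whether ingestion is high requires a baseline for your own project.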

View cluster and workload observability metrics

To view observability metrics for your clusters or workloads, perform the following steps in the Google Cloud console:

  1. Go to the Kubernetes Clusters or Kubernetes Workloads page:

    Go to Kubernetes Clusters

    Go to Kubernetes Workloads

  2. Select the Observability tab.

  3. Choose the timeframe over which the metrics are aggregated. Drag inside a chart to focus on a specific time range. Click Reset Zoom to go back to the previously selected range.
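The metrics behind these charts are stored in Cloud Monitoring, so you can also retrieve the same time series programmatically. The following minimal sketch assumes the `google-cloud-monitoring` client library. The helper only builds a request for `MetricServiceClient.list_time_series` that covers the last `window_seconds`, which corresponds to choosing a timeframe in step 3. The helper name, project ID, and cluster name are hypothetical.

```python
import time
from typing import Optional


def build_time_series_request(project_id: str, metric_type: str,
                              cluster_name: str, window_seconds: int,
                              now: Optional[float] = None) -> dict:
    """Build a list_time_series request for one metric over the last window."""
    end = int(now if now is not None else time.time())
    return {
        "name": f"projects/{project_id}",
        # Restrict results to one metric type and one GKE cluster.
        "filter": (
            f'metric.type = "{metric_type}" '
            f'AND resource.labels.cluster_name = "{cluster_name}"'
        ),
        # Equivalent of choosing a timeframe in the console.
        "interval": {
            "start_time": {"seconds": end - window_seconds},
            "end_time": {"seconds": end},
        },
    }


if __name__ == "__main__":
    request = build_time_series_request(
        "example-project", "kubernetes.io/container/cpu/request_utilization",
        "example-cluster", window_seconds=3600)
    print(request["filter"])
    # With google-cloud-monitoring installed and credentials configured:
    # from google.cloud import monitoring_v3
    # client = monitoring_v3.MetricServiceClient()
    # for series in client.list_time_series(request=request):
    #     print(series.resource.labels, len(series.points))
```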

Create a custom dashboard from a selected view

To add the visible charts to a customizable dashboard in Cloud Monitoring, perform the following steps in the Google Cloud console:

  1. Go to the Kubernetes Clusters or Kubernetes Workloads page:

    Go to Kubernetes Clusters

    Go to Kubernetes Workloads

  2. Select the Observability tab.

  3. Optionally, select filters for the data.

  4. Click Save as Custom Dashboard.

  5. Specify a name for the new dashboard.

  6. Click Submit to create a new dashboard.

  7. Click View in Monitoring to view the dashboard.
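Save as Custom Dashboard is the simplest route. If you prefer to keep dashboard definitions in version control, Cloud Monitoring can also create a dashboard from a JSON definition, for example by using `gcloud monitoring dashboards create --config-from-file=FILE`. The following sketch generates such a definition with one line chart per metric. The field names follow the Dashboards API v1 schema (`mosaicLayout`, `xyChart`, `timeSeriesFilter`) as we understand it, so validate the output against the API reference before use. The function name, metric list, and alignment settings are illustrative assumptions.

```python
import json


def build_dashboard(display_name: str, metric_types: list) -> dict:
    """Return a dashboard definition with one line chart per metric type."""
    tiles = []
    for i, metric_type in enumerate(metric_types):
        tiles.append({
            # Two charts per row in a 12-column mosaic grid.
            "xPos": (i % 2) * 6,
            "yPos": (i // 2) * 4,
            "width": 6,
            "height": 4,
            "widget": {
                "title": metric_type,
                "xyChart": {
                    "dataSets": [{
                        "plotType": "LINE",
                        "timeSeriesQuery": {
                            "timeSeriesFilter": {
                                "filter": f'metric.type = "{metric_type}"',
                                # Average each series over 1-minute buckets.
                                "aggregation": {
                                    "alignmentPeriod": "60s",
                                    "perSeriesAligner": "ALIGN_MEAN",
                                },
                            }
                        },
                    }]
                },
            },
        })
    return {"displayName": display_name,
            "mosaicLayout": {"columns": 12, "tiles": tiles}}


if __name__ == "__main__":
    dashboard = build_dashboard("GKE request utilization", [
        "kubernetes.io/container/cpu/request_utilization",
        "kubernetes.io/container/memory/request_utilization",
    ])
    print(json.dumps(dashboard, indent=2))
```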

What's next