Monitoring

This page explains how to use Stackdriver Monitoring to monitor your Kubernetes Engine clusters.

Overview

You can use Stackdriver Monitoring to collect metrics from your Kubernetes Engine clusters, view them in dashboards, and set up alerts on them.

Stackdriver monitors system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. Custom metrics are application-specific metrics that you define yourself, such as the total number of active user sessions or the total number of rendered pages.

For system metrics, Stackdriver creates a Deployment that periodically connects to each node and collects metrics about its Pods and containers, then sends the metrics to Stackdriver. For a list of the system metrics collected from Kubernetes Engine, refer to Metrics List in the Stackdriver documentation.

To learn how to set up custom metrics, refer to Using Custom Metrics in the Stackdriver documentation.
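As a rough illustration of what a custom metric looks like at the API level, the following shell sketch writes one data point to the Monitoring API v3 timeSeries.create method with curl. The helper name write_custom_metric, the example metric name, and the use of the global monitored-resource type are assumptions for illustration; the Using Custom Metrics guide describes the supported client libraries and resource types.

```shell
# Hypothetical helper: write one INT64 gauge point to a custom metric.
# Usage: write_custom_metric PROJECT_ID METRIC_NAME VALUE
write_custom_metric() {
  local project="$1" metric="$2" value="$3"
  local token now body
  # Access token for the account that gcloud is currently authenticated as.
  token=$(gcloud auth print-access-token)
  # A gauge point needs only an end time, in RFC 3339 UTC format.
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # Custom metric types live under the custom.googleapis.com/ prefix.
  # The "global" resource type is an assumption; pick the resource that
  # matches where your application actually runs.
  body=$(printf '{"timeSeries":[{"metric":{"type":"custom.googleapis.com/%s"},"resource":{"type":"global","labels":{"project_id":"%s"}},"points":[{"interval":{"endTime":"%s"},"value":{"int64Value":"%s"}}]}]}' \
    "$metric" "$project" "$now" "$value")
  curl -sS -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    --data "$body" \
    "https://monitoring.googleapis.com/v3/projects/${project}/timeSeries"
}
```

For example, write_custom_metric my-project active_sessions 42 records the value 42 for a metric named custom.googleapis.com/active_sessions in the project my-project.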

Enabling Stackdriver Monitoring

You can create a new cluster with monitoring enabled, or add monitoring capability to an existing cluster.

Note that your cluster's node pools (including the default node pool) must have the necessary GCP scope to interact with Stackdriver Monitoring (the https://www.googleapis.com/auth/monitoring scope). When you create a new cluster with monitoring, Kubernetes Engine sets this scope automatically; however, existing clusters might not have the necessary permissions.
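To check whether an existing node pool already has the monitoring scope, you can read its OAuth scopes with gcloud container node-pools describe. The sketch below wraps that in a hypothetical has_monitoring_scope helper. It assumes that gcloud's value() format joins list items with semicolons, and it also splits on commas and whitespace in case the separator differs.

```shell
# Hypothetical helper: succeed (exit status 0) if the node pool has the
# https://www.googleapis.com/auth/monitoring scope.
# Usage: has_monitoring_scope POOL_NAME CLUSTER_NAME ZONE
has_monitoring_scope() {
  local pool="$1" cluster="$2" zone="$3"
  local scopes
  scopes=$(gcloud container node-pools describe "$pool" \
    --cluster "$cluster" --zone "$zone" \
    --format='value(config.oauthScopes)')
  # Put one scope per line, then require an exact match so that narrower
  # scopes such as .../auth/monitoring.read are not counted.
  printf '%s\n' "$scopes" | tr -s ';, \t' '\n\n\n\n' \
    | grep -qx 'https://www.googleapis.com/auth/monitoring'
}
```

For example, has_monitoring_scope default-pool my-cluster us-central1-a && echo "scope present" prints a message only when the default node pool already has the scope.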

Creating a cluster with monitoring

You can create a cluster with Stackdriver Monitoring enabled by using either the GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

  2. Click Create cluster.

  3. Configure the cluster as desired. Ensure that Turn on Stackdriver Monitoring is selected.
  4. Click Create.

gcloud

The following command creates a cluster named my-cluster in your project's default compute zone:

gcloud container clusters create my-cluster

By default, gcloud container clusters create sets the --enable-cloud-monitoring flag, which enables Stackdriver Monitoring in the cluster. To create a cluster without monitoring, pass the --no-enable-cloud-monitoring flag instead.

Enabling monitoring for an existing cluster

To enable monitoring for an existing cluster, run the following command, where [CLUSTER-NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER-NAME] --monitoring-service=monitoring.googleapis.com
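After the update finishes, you can confirm the change by reading the cluster's monitoringService field, which should then be monitoring.googleapis.com. The following sketch combines both steps in a hypothetical enable_monitoring helper; it assumes a zonal cluster, so the zone is passed explicitly.

```shell
# Hypothetical helper: turn on Stackdriver Monitoring for an existing
# zonal cluster, then verify that the setting took effect.
# Usage: enable_monitoring CLUSTER_NAME ZONE
enable_monitoring() {
  local cluster="$1" zone="$2"
  local service
  gcloud beta container clusters update "$cluster" --zone "$zone" \
    --monitoring-service=monitoring.googleapis.com || return 1
  # Read back the monitoring service the cluster is now configured with.
  service=$(gcloud container clusters describe "$cluster" --zone "$zone" \
    --format='value(monitoringService)')
  if [ "$service" != "monitoring.googleapis.com" ]; then
    echo "monitoringService is '${service}', expected monitoring.googleapis.com" >&2
    return 1
  fi
}
```

For example, enable_monitoring my-cluster us-central1-a returns a nonzero exit status if either the update fails or the cluster still reports a different monitoring service afterward.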

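If you initially created your cluster without monitoring and want to enable it later, the cluster's node pools might not have the necessary GCP scope to interact with Stackdriver Monitoring. As a workaround, create a new node pool with the same number of nodes and the monitoring scope. If your workloads rely on other scopes, such as read access to Cloud Storage for pulling private container images, add them to the --scopes value as well, separated by commas. In the following command, [ZONE] is the cluster's compute zone and [NUM-NODES] is the number of nodes in your existing node pool: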

gcloud container node-pools create adjust-scope \
 --cluster [CLUSTER-NAME] --zone [ZONE] \
 --num-nodes [NUM-NODES] \
 --scopes https://www.googleapis.com/auth/monitoring

Once you've created the new node pool, you can move your existing Pods to the new, correctly scoped node pool to use Stackdriver Monitoring. For more information, see "updating VM scopes with zero downtime".
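One way to move the Pods is to cordon the nodes in the old pool, so that nothing new is scheduled there, and then drain them, so that their Pods are evicted and rescheduled onto the new pool. The sketch below selects nodes by the cloud.google.com/gke-nodepool label that Kubernetes Engine puts on each node. The helper name drain_node_pool is hypothetical, and depending on your workloads kubectl drain may need extra flags, for example for Pods that use local storage or aren't managed by a controller.

```shell
# Hypothetical helper: cordon, then drain, every node in a GKE node pool.
# Usage: drain_node_pool POOL_NAME
drain_node_pool() {
  local pool="$1" nodes node
  # List the pool's nodes; "--output name" prints "node/<name>" per line.
  nodes=$(kubectl get nodes --selector "cloud.google.com/gke-nodepool=${pool}" --output name)
  # Cordon every node first so evicted Pods can't land back on the old pool.
  for node in $nodes; do
    kubectl cordon "${node#node/}"
  done
  # Then evict the Pods; DaemonSet Pods are skipped because they run on every node.
  for node in $nodes; do
    kubectl drain --ignore-daemonsets "${node#node/}"
  done
}
```

For example, drain_node_pool default-pool moves workloads off the default pool. Once it is empty, you can delete it with gcloud container node-pools delete default-pool --cluster [CLUSTER-NAME] --zone [ZONE].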

Viewing metrics

You can view metrics in the GCP Console's Stackdriver Monitoring menu.

Cluster overview

Stackdriver Monitoring provides an overview menu for Kubernetes Engine that displays information about your clusters in a set of dashboards.

To view the overview menu, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Resources, then select Kubernetes Engine.

  3. Select your cluster.

The overview menu provides the following default dashboards:

  • Incidents: Violations of alerting policies.
  • Events: Chronological list of anomalies, incidents, lifecycle, tags, IAM policies, deploys, notes, cloud provider updates, and user management updates that occur in your cloud accounts.
  • CPU Usage: Displays per-cluster CPU usage percentages.
  • Disk I/O: Displays per-cluster disk I/O rates in KB/s.
  • Network Traffic: Displays per-cluster network traffic in KB/s.
  • Pods: List of Pods and nodes (Compute Engine VM instances) in all namespaces. Selecting any Pod or node opens the overview for that resource.

To learn more about viewing metrics, refer to the Stackdriver Monitoring documentation and the Monitoring Filters page.

Dashboards

You can create custom dashboards for Kubernetes Engine nodes and containers.

To create a dashboard, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Dashboards, then select Create Dashboard.

  3. Click Add Chart.
  4. In the Title field, enter a name for the chart.
  5. From the Resource Type drop-down menu, select either Instance (GCE) for nodes or GKE Container for containers.
  6. In the Metric Type field, enter the desired metric or select it from the autocomplete menu.
  7. Optionally, use the Filter field to limit the chart to a specific value, such as app, name, or version.
  8. Configure the chart further as desired, then click Save.

For example, to create a chart that monitors memory usage across Pods with a common name, set the resource type to GKE Container, set the metric to Used Memory, add a filter on Name, and enter the name as the value.
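The same selection can be written as a Monitoring filter, which is the form the Monitoring API accepts. The helper below only prints such a filter. The metric type container.googleapis.com/container/memory/bytes_used and matching on the container_name resource label are assumptions about how the console's Used Memory metric and Name filter map onto the gke_container resource type, so check them against the Metrics List and Monitoring Filters pages.

```shell
# Hypothetical helper: print a Monitoring filter that selects memory usage
# for containers with the given name on the gke_container resource type.
# The metric type and label name are assumptions (see the text above).
# Usage: gke_memory_filter CONTAINER_NAME
gke_memory_filter() {
  printf '%s AND %s AND %s\n' \
    'resource.type = "gke_container"' \
    'metric.type = "container.googleapis.com/container/memory/bytes_used"' \
    "resource.label.container_name = \"$1\""
}
```

For example, gke_memory_filter frontend prints a filter that matches containers named frontend.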

Metrics Explorer

Metrics Explorer lets you chart a specific metric from your clusters and aggregate it across resources.

To use Metrics Explorer, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Resources, then select Metrics Explorer.

  3. From the Find resource type and metric search menu, enter gke_container for Resource type.
  4. For the Metric, select the desired metric.
  5. Optionally, use the Filter menu to filter by resource.
  6. Use the Aggregation options to group and combine the matching time series.
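The aggregation options in Metrics Explorer correspond to the aggregation parameters of the Monitoring API v3 timeSeries.list method: each time series is first aligned to regular intervals (the per-series aligner), and the aligned series are then combined (the cross-series reducer), optionally grouped by labels. The sketch below sends that request with curl. The helper name, the 60-second alignment period, the mean-then-sum choice, and grouping by the namespace_id label are illustrative assumptions.

```shell
# Hypothetical helper: list time series matching FILTER between START and END
# (RFC 3339 timestamps), averaged per series over 60s windows and then summed
# across series that share the same namespace_id label.
# Usage: list_aggregated PROJECT_ID FILTER START END
list_aggregated() {
  local project="$1" filter="$2" start="$3" end="$4"
  local token
  token=$(gcloud auth print-access-token)
  # -G sends the --data-urlencode pairs as URL query parameters on a GET.
  curl -sS -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "filter=${filter}" \
    --data-urlencode "interval.startTime=${start}" \
    --data-urlencode "interval.endTime=${end}" \
    --data-urlencode "aggregation.alignmentPeriod=60s" \
    --data-urlencode "aggregation.perSeriesAligner=ALIGN_MEAN" \
    --data-urlencode "aggregation.crossSeriesReducer=REDUCE_SUM" \
    --data-urlencode "aggregation.groupByFields=resource.label.namespace_id" \
    "https://monitoring.googleapis.com/v3/projects/${project}/timeSeries"
}
```

Aligning before reducing matters because raw points from different containers rarely share timestamps; the aligner puts them on a common grid so the reducer can add them up window by window.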

Best practices

  • Alerting: You can set up alerting policies that inform you if something suspicious occurs in your cluster, such as when the number of running Pods drops too low or too many errors occur.
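As a starting point, an alerting policy can be written as JSON in the Monitoring API v3 AlertPolicy format and created from the command line. The helper below prints a policy that fires when a container's mean CPU utilization stays above a threshold for five minutes. The metric type container.googleapis.com/container/cpu/utilization, the helper name cpu_alert_policy, and the use of gcloud alpha monitoring policies create are assumptions to verify against the Metrics List and your gcloud version; the Pod-count and error-rate conditions mentioned above would use different metrics.

```shell
# Hypothetical helper: print an AlertPolicy (Monitoring API v3 JSON) that
# fires when any gke_container's mean CPU utilization exceeds THRESHOLD
# (a fraction, for example 0.9) for 5 minutes. Metric type is an assumption.
# Usage: cpu_alert_policy THRESHOLD > policy.json
cpu_alert_policy() {
  cat <<EOF
{
  "displayName": "High container CPU",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU utilization above $1",
      "conditionThreshold": {
        "filter": "resource.type = \"gke_container\" AND metric.type = \"container.googleapis.com/container/cpu/utilization\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": $1,
        "duration": "300s",
        "aggregations": [
          { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }
        ]
      }
    }
  ]
}
EOF
}
```

For example, run cpu_alert_policy 0.9 > policy.json, then gcloud alpha monitoring policies create --policy-from-file=policy.json. Without notification channels, the policy records incidents but does not notify anyone.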

Disabling monitoring

To disable monitoring for an existing cluster, run the following command, where [CLUSTER-NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER-NAME] --monitoring-service=none
