Monitoring

This page explains how to use Stackdriver Monitoring to monitor your Kubernetes Engine clusters.

Overview

You can use Stackdriver Monitoring to monitor signals and build operations in your Kubernetes Engine clusters.

Stackdriver monitors system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. Custom metrics are application-specific metrics that you define yourself, such as the total number of active user sessions or the total number of rendered pages.

For system metrics, Stackdriver creates a Deployment that periodically connects to each node and collects metrics about its Pods and containers, then sends the metrics to Stackdriver.

Metrics for usage of system resources are collected from the following sources:

  • CPU: container/cpu/usage_time
  • Memory: container/memory/bytes_used, collected from the cgroup's memory.usage_in_bytes
  • Evictable memory: container/memory/bytes_used, collected from the total_inactive_file field of memory.stat
  • Non-evictable memory: measured as memory.usage_in_bytes minus the total_inactive_file field of memory.stat
  • Disk: container/disk/bytes_used
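As a rough check of the non-evictable formula, the following shell sketch reads the two cgroup values directly on a node and subtracts them. It assumes the cgroup v1 memory controller layout. The function name non_evictable_bytes and the example path are hypothetical, and the monitoring agent's own collection path may differ.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the non-evictable memory formula for one cgroup (v1)
# memory controller directory:
#   non-evictable = memory.usage_in_bytes - total_inactive_file (memory.stat)
# Assumption: cgroup v1 layout; the function name is hypothetical.
non_evictable_bytes() {
  local cgroup_dir="$1"
  local usage inactive
  usage=$(cat "${cgroup_dir}/memory.usage_in_bytes")
  # memory.stat holds "key value" lines; match total_inactive_file exactly
  # so that the per-cgroup inactive_file line is not picked up instead.
  inactive=$(awk '$1 == "total_inactive_file" { print $2 }' "${cgroup_dir}/memory.stat")
  echo $(( usage - ${inactive:-0} ))
}

# Example on a node (the path is illustrative and varies by setup):
#   non_evictable_bytes /sys/fs/cgroup/memory/kubepods
```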

For a list of other system metrics collected from Kubernetes Engine, refer to Metrics List in the Stackdriver documentation.

To learn how to set up custom metrics, refer to Using Custom Metrics in the Stackdriver documentation or follow the Autoscaling Deployments with Custom Metrics tutorial.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • Set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • Update all gcloud commands to the latest version:
    gcloud components update

Enabling Stackdriver Monitoring

You can create a new cluster with monitoring enabled, or add monitoring capability to an existing cluster.

Note that your cluster's node pools (including the default node pool) must have the necessary GCP scope to interact with Stackdriver Monitoring (the https://www.googleapis.com/auth/monitoring scope). When you create a new cluster with monitoring, Kubernetes Engine sets this scope automatically; however, existing clusters might not have the necessary permissions.
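To see whether an existing node pool already has the scope, you can inspect its OAuth scopes with gcloud. The sketch below is one way to do that. The function name has_monitoring_scope is hypothetical, the sketch relies on the default zone set in "Before you begin", and it also treats the broader cloud-platform scope as sufficient.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a node pool's OAuth scopes include Stackdriver
# Monitoring access. The function name is hypothetical; the zone comes
# from `gcloud config set compute/zone`.
has_monitoring_scope() {
  local cluster="$1" pool="${2:-default-pool}"
  # value() prints a list field as one ';'-separated line, so split it
  # into lines and look for an exact match on either the monitoring
  # scope or the broader cloud-platform scope, which also covers it.
  gcloud container node-pools describe "$pool" \
      --cluster "$cluster" \
      --format='value(config.oauthScopes)' \
    | tr ';' '\n' \
    | grep -qxF -e 'https://www.googleapis.com/auth/monitoring' \
                -e 'https://www.googleapis.com/auth/cloud-platform'
}

# Usage:
#   has_monitoring_scope [CLUSTER_NAME] && echo "monitoring scope present"
```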

Creating a cluster with monitoring

To create a new cluster with Stackdriver Monitoring enabled, use either the GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

  2. Click Create cluster.

  3. Configure the cluster as desired. Ensure that the Turn on Stackdriver Monitoring option is selected.
  4. Click Create.

gcloud

When you create a cluster with gcloud container clusters create, the --enable-cloud-monitoring flag is set by default, which enables Stackdriver Monitoring in the cluster.

To disable this default behavior, pass the --no-enable-cloud-monitoring flag.
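For reference, the default and opt-out variants can be written as the shell functions below. This is a sketch: the function names are hypothetical, cluster names are placeholders, and the zone comes from your gcloud configuration.

```shell
#!/usr/bin/env bash
# Sketch: the two cluster-creation variants. Function names are
# hypothetical; the zone comes from `gcloud config set compute/zone`.
create_monitored_cluster() {
  # Monitoring is on by default; passing the flag only makes it explicit.
  gcloud container clusters create "$1" --enable-cloud-monitoring
}

create_unmonitored_cluster() {
  # Opt out of the default Stackdriver Monitoring integration.
  gcloud container clusters create "$1" --no-enable-cloud-monitoring
}

# Usage:
#   create_monitored_cluster [CLUSTER_NAME]
#   create_unmonitored_cluster [CLUSTER_NAME]
```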

Enabling monitoring for an existing cluster

To enable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service monitoring.googleapis.com
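After the update completes, you can confirm which monitoring service the cluster reports. This is a minimal sketch that assumes the cluster is in your default zone; the function name is hypothetical. With monitoring enabled as above, it should print monitoring.googleapis.com. After monitoring has been disabled, it should print none.

```shell
#!/usr/bin/env bash
# Sketch: print the monitoring service configured on a cluster.
# The function name is hypothetical; the zone comes from gcloud config.
monitoring_service() {
  gcloud container clusters describe "$1" --format='value(monitoringService)'
}

# Usage:
#   monitoring_service [CLUSTER_NAME]
```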

Note that if you initially created your cluster without monitoring, and want to enable it later, the cluster's node pools might not have the necessary GCP scope to interact with Stackdriver Monitoring. As a workaround, you can create a new node pool with the same number of nodes and the necessary scope as follows:

gcloud container node-pools create adjust-scope \
    --cluster [CLUSTER_NAME] \
    --num-nodes [NUM_NODES] \
    --scopes https://www.googleapis.com/auth/monitoring

Once you've created the new node pool, you can move your existing Pods to the new, correctly scoped node pool to use Stackdriver Monitoring. For more information, refer to "updating VM scopes with zero downtime".
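Moving Pods typically means cordoning the old pool's nodes and then draining them, so that their Pods reschedule onto the new pool. The sketch below assumes the old pool is the default node pool, default-pool, and relies on the cloud.google.com/gke-nodepool label that Kubernetes Engine puts on each node. The function name is hypothetical. Check that the new pool has enough capacity before you drain the old one.

```shell
#!/usr/bin/env bash
# Sketch: move Pods off an old node pool so they reschedule onto the
# correctly scoped pool. Assumes the old pool is "default-pool" unless
# another name is passed. The function name is hypothetical.
drain_node_pool() {
  local pool="${1:-default-pool}" nodes node
  # Each GKE node carries a label with the name of its node pool.
  nodes=$(kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" -o=name)
  # Cordon every node first so evicted Pods cannot land back on the old pool.
  for node in $nodes; do
    kubectl cordon "$node"
  done
  # Evict Pods. DaemonSet Pods are skipped, and --force also deletes Pods
  # that no controller manages. Pods using emptyDir volumes additionally
  # need --delete-emptydir-data (--delete-local-data in older kubectl).
  for node in $nodes; do
    kubectl drain --force --ignore-daemonsets --grace-period=10 "$node"
  done
}

# Usage:
#   drain_node_pool default-pool
```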

Viewing metrics

You can view metrics in the GCP Console's Stackdriver Monitoring menu.

Cluster overview

Stackdriver Monitoring provides an overview menu for Kubernetes Engine. This menu displays information about your clusters in a set of dashboards.

To view the overview menu, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Resources, then select Kubernetes Engine.

  3. Select your cluster.

The overview menu provides the following default dashboards:

  • Incidents: Violations of alerting policies.
  • Events: Chronological list of anomalies, incidents, lifecycle, tags, IAM policies, deploys, notes, cloud provider updates, and user management updates that occur in your cloud accounts.
  • CPU Usage: Displays per-cluster CPU usage percentages.
  • Disk I/O: Displays per-cluster disk I/O rates in KB/s.
  • Network Traffic: Displays per-cluster network traffic in KB/s.
  • Pods: List of Pods and nodes (Compute Engine VM instances) in all namespaces. Selecting any Pod or node opens the overview for that resource.

To learn more about viewing metrics, refer to the Stackdriver Monitoring documentation and the Monitoring Filters page.

Dashboards

You can create custom dashboards for Kubernetes Engine nodes and containers.

To create a dashboard, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Dashboards, then select Create Dashboard.

  3. Click Add Chart to add a chart to the new dashboard.
  4. Fill the Title field with a name for the chart.
  5. In the Find resource type and metric field, search for "instance" or "container", then select the desired metrics.
  6. In the Metric Type field, enter the desired metric or select it from the autofill menu.
  7. Optionally, use the Filter field to narrow results to a specific value, such as app, name, or version.
  8. Configure the dashboard further as desired. To create the dashboard, click Save.

Metrics Explorer

Metrics Explorer allows you to select a specific metric about your clusters and perform various aggregations.

To use Metrics Explorer, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.

  2. Hover your cursor over Resources, then select Metrics Explorer.

  3. From the Find resource type and metric search menu, enter gke_container for Resource type.
  4. For the Metric, select the desired metric.
  5. Optionally, use the Filter menu to filter by resource.
  6. Use the Aggregation options to perform a desired aggregation.
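You can also issue this kind of query outside the console through the Monitoring API v3 timeSeries.list method. The sketch below requests the CPU usage rate of gke_container resources, summed per cluster. It assumes curl and an authenticated gcloud. The function name, the time-range placeholders, and the aggregation choices are illustrative and do not reflect what Metrics Explorer sends internally.

```shell
#!/usr/bin/env bash
# Sketch: query the Monitoring API v3 timeSeries.list method for the
# CPU usage rate of gke_container resources, summed per cluster.
# Assumes curl and an authenticated gcloud; names are hypothetical.
query_cpu_by_cluster() {
  local project="$1" start="$2" end="$3"   # RFC 3339 timestamps
  # --get turns each --data-urlencode pair into a URL query parameter.
  curl --silent --get \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode 'filter=metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.type="gke_container"' \
    --data-urlencode "interval.startTime=${start}" \
    --data-urlencode "interval.endTime=${end}" \
    --data-urlencode "aggregation.alignmentPeriod=60s" \
    --data-urlencode "aggregation.perSeriesAligner=ALIGN_RATE" \
    --data-urlencode "aggregation.crossSeriesReducer=REDUCE_SUM" \
    --data-urlencode "aggregation.groupByFields=resource.label.cluster_name" \
    "https://monitoring.googleapis.com/v3/projects/${project}/timeSeries"
}

# Usage:
#   query_cpu_by_cluster [PROJECT_ID] [START_TIME] [END_TIME]
```

ALIGN_RATE turns the cumulative usage_time counter into CPU-seconds per second for each series, and REDUCE_SUM then adds those rates within each cluster.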

Best practices

  • Alerting: You can set up alerting policies that inform you if something suspicious occurs in your cluster, such as when the number of running Pods drops too low or too many errors occur.
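You can also create alerting policies from a file with the alpha Monitoring commands in gcloud. The sketch below defines a single threshold condition on the CPU usage rate of containers. The policy name, the 0.5-core threshold, and the function name are hypothetical examples, not recommended values, and no notification channel is attached.

```shell
#!/usr/bin/env bash
# Sketch: create an illustrative alerting policy from a JSON file.
# Threshold, names, and the function name are hypothetical examples.
create_cpu_alert_policy() {
  local policy_file status
  policy_file=$(mktemp)
  # Condition: per-container CPU usage rate above 0.5 cores for 5 minutes.
  cat > "$policy_file" <<'EOF'
{
  "displayName": "GKE container CPU usage (example)",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU usage rate above 0.5 cores",
      "conditionThreshold": {
        "filter": "metric.type=\"container.googleapis.com/container/cpu/usage_time\" AND resource.type=\"gke_container\"",
        "aggregations": [
          { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE" }
        ],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.5,
        "duration": "300s"
      }
    }
  ]
}
EOF
  gcloud alpha monitoring policies create --policy-from-file="$policy_file"
  status=$?
  # Remove the temporary file and pass gcloud's exit status through.
  rm -f "$policy_file"
  return "$status"
}

# Usage:
#   create_cpu_alert_policy
```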

Disabling monitoring

To disable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service none
