Monitoring

This page explains how to use Stackdriver Monitoring to monitor your Google Kubernetes Engine clusters.

Overview

You can use Stackdriver Monitoring to collect metrics from your GKE clusters, view them in dashboards, and set up alerts on them.

Stackdriver monitors system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. Custom metrics are application-specific metrics that you define yourself, such as the total number of active user sessions or the total number of rendered pages.

For system metrics, Stackdriver creates a Deployment that periodically connects to each node and collects metrics about its Pods and containers, then sends the metrics to Stackdriver.

Metrics for usage of system resources are collected from the following sources:

  • CPU: container/cpu/usage_time
  • Memory: container/memory/bytes_used, collected from memory.usage_in_bytes in cgroup
  • Evictable memory: container/memory/bytes_used, collected from the total_inactive_file field of memory.stat
  • Non-evictable memory: Measured by memory.usage_in_bytes - memory.total_inactive_file
  • Disk: container/disk/bytes_used
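The relationship between the memory figures above can be checked directly on a node. The following is a minimal sketch, assuming a cgroup v1 memory hierarchy in which memory.usage_in_bytes and memory.stat sit in the same directory. The function names and the directory argument are illustrative and are not part of Stackdriver.

```shell
# Print the total_inactive_file value (evictable memory, in bytes)
# from the memory.stat file in the given cgroup directory.
evictable_bytes() {
  awk '$1 == "total_inactive_file" { print $2 }' "$1/memory.stat"
}

# Print non-evictable memory for the given cgroup directory:
# memory.usage_in_bytes minus total_inactive_file.
# The body runs in a subshell so its variables do not leak.
non_evictable_bytes() (
  usage=$(cat "$1/memory.usage_in_bytes")
  inactive=$(evictable_bytes "$1")
  echo $((usage - inactive))
)

# Example (the path is an assumption and depends on the node's cgroup layout):
#   non_evictable_bytes /sys/fs/cgroup/memory
```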

For a list of other system metrics collected from GKE, refer to Metrics List in the Stackdriver documentation.

To learn how to set up custom metrics, refer to Using Custom Metrics in the Stackdriver documentation or follow the Autoscaling Deployments with Custom Metrics tutorial.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update

Enabling Stackdriver Monitoring

You can create a new cluster with monitoring enabled, or add monitoring capability to an existing cluster.

Your cluster's node pools (including the default node pool) must have the necessary GCP scope to interact with Stackdriver Monitoring (the https://www.googleapis.com/auth/monitoring scope). When you create a new cluster with monitoring, Kubernetes Engine sets this scope automatically; however, existing clusters might not have the necessary permissions.

Creating a cluster with monitoring

Use gcloud or the GCP Console to create a cluster with Stackdriver Monitoring enabled.

gcloud

When you create a cluster, the --enable-cloud-monitoring flag is automatically set, which enables Stackdriver Monitoring in the cluster.

To disable this default behavior, set the --no-enable-cloud-monitoring flag.
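As a sketch of how these two flags might be used when scripting cluster creation, the helper below maps an on/off choice to the corresponding flag. The function name monitoring_flag is hypothetical; the flags are the ones described above.

```shell
# Map "on" or "off" to the matching cluster-creation flag.
monitoring_flag() {
  case "$1" in
    on)  echo "--enable-cloud-monitoring" ;;
    off) echo "--no-enable-cloud-monitoring" ;;
    *)   echo "usage: monitoring_flag on|off" >&2; return 1 ;;
  esac
}

# Example: create a cluster without Stackdriver Monitoring.
#   gcloud container clusters create [CLUSTER_NAME] "$(monitoring_flag off)"
```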

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.


  2. Click Create cluster.

  3. Configure the cluster as desired.
  4. Click Advanced options. Ensure that Enable Stackdriver Monitoring service is selected.
  5. Click Create.

Enabling monitoring for an existing cluster

gcloud

To enable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service monitoring.googleapis.com

If you initially created your cluster without monitoring, and want to enable it later, the cluster's node pools might not have the necessary GCP scope to interact with Stackdriver Monitoring. As a workaround, you can create a new node pool with the same number of nodes and the necessary scope as follows:

gcloud container node-pools create adjust-scope \
    --cluster [CLUSTER_NAME] \
    --num-nodes [NUM_NODES] \
    --scopes https://www.googleapis.com/auth/monitoring

Once you've created the new node pool, you can move your existing Pods to the new, correctly-scoped node pool to use Stackdriver Monitoring. For more information, refer to "updating VM scopes with zero downtime".
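Before creating a replacement node pool, it can help to confirm whether an existing pool already has the scope. The sketch below assumes the scopes are read from the config.oauthScopes field of gcloud container node-pools describe, and that list entries are separated by semicolons or whitespace. has_monitoring_scope is a hypothetical helper and [POOL_NAME] is a placeholder. The check looks only for the exact scope named on this page.

```shell
MONITORING_SCOPE="https://www.googleapis.com/auth/monitoring"

# Succeed if the scope list (separated by ';' or whitespace)
# contains the monitoring scope as an exact entry.
has_monitoring_scope() {
  printf '%s\n' "$1" | tr '; ' '\n\n' | grep -Fxq "$MONITORING_SCOPE"
}

# Example:
#   scopes=$(gcloud container node-pools describe [POOL_NAME] \
#       --cluster [CLUSTER_NAME] --format="value(config.oauthScopes)")
#   has_monitoring_scope "$scopes" || echo "node pool lacks the monitoring scope"
```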

Console

If you initially created your cluster without monitoring, and want to enable it later, the cluster's node pools might not have the necessary GCP scope to interact with Stackdriver Monitoring. See the gcloud section prior to this one for a workaround.

  1. Visit the Google Kubernetes Engine menu in GCP Console.


  2. Click the cluster's Edit button, which looks like a pencil.

  3. Set the value of the Stackdriver Monitoring drop-down to Enabled.
  4. Click Save.

Extending infrastructure metrics

In addition to application metrics, Stackdriver custom metrics can carry measurements of your cluster's infrastructure that are not included in system metrics, such as container disk I/O. You can deploy your own infrastructure monitoring agents to collect these metrics and push them to Stackdriver.

cAdvisor

You can collect metrics using cAdvisor, the open-source monitoring agent used in Kubernetes, and use prometheus-to-sd to push them to Stackdriver.

To run cAdvisor on your own cluster, perform these steps:

  1. Clone cAdvisor:

    git clone https://github.com/google/cadvisor.git
    cd cadvisor

  2. Install kustomize, following the cAdvisor DaemonSet instructions:

    go get github.com/kubernetes-sigs/kustomize

  3. Create the example cAdvisor namespace and DaemonSet, which exports all container metrics:

    kustomize build deploy/kubernetes/overlays/examples | kubectl apply -f -

You should now see metrics about your containers in Stackdriver under the gke_container resource.

To adapt the provided example to your needs, follow the cAdvisor kustomization instructions, then apply your changes with:

    kustomize build deploy/kubernetes/overlays/ | kubectl apply -f -
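cAdvisor exposes its measurements in the Prometheus text format, which is what prometheus-to-sd reads. As a rough way to inspect disk I/O values before they reach Stackdriver, the sketch below sums one metric across all containers in a saved dump of a cAdvisor /metrics endpoint. The helper name sum_metric and the file name metrics.txt are hypothetical; container_fs_reads_bytes_total is one of cAdvisor's Prometheus metric names.

```shell
# Sum every sample of one metric in Prometheus text format read from stdin.
# $1 is the exact metric name; labels and optional timestamps are ignored.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      line = $0
      key = line
      sub(/[{ ].*/, "", key)            # metric name ends at "{" or a space
      if (key != name) next
      if (index(line, "}") > 0)
        sub(/^.*[}] */, "", line)       # drop the name and its label set
      else
        sub(/^[^ ]+ +/, "", line)       # no labels: drop just the name
      split(line, parts, " ")           # parts[1] is the sample value
      total += parts[1]
    }
    END { printf "%.0f\n", total }
  '
}

# Example:
#   sum_metric container_fs_reads_bytes_total < metrics.txt
```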

Viewing metrics

You can view metrics in the GCP Console's Stackdriver Monitoring menu.

Cluster overview

Stackdriver Monitoring provides an overview menu for GKE. This menu displays useful information about your clusters in dashboards.

To view the overview menu, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.


  2. Hover your cursor over Resources, then select GKE.

  3. Select your cluster.

The overview menu provides the following default dashboards:

  • Incidents: Violations of alerting policies.
  • Events: Chronological list of anomalies, incidents, lifecycle, tags, IAM policies, deploys, notes, cloud provider updates, and user management updates that occur in your cloud accounts.
  • CPU Usage: Displays per-cluster CPU usage percentages.
  • Disk I/O: Displays per-cluster disk I/O rates in KB/s.
  • Network Traffic: Displays per-cluster network traffic in KB/s.
  • Pods: List of Pods and nodes (Compute Engine VM instances) in all namespaces. Selecting any Pod or node opens the overview for that resource.

To learn more about viewing metrics, refer to the Stackdriver Monitoring documentation and the Monitoring Filters page.

Dashboards

You can create custom dashboards for GKE nodes and containers.

To create a dashboard, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.


  2. Hover your cursor over Dashboards, then select Create Dashboard.

  3. To create a new dashboard, click Add Chart.
  4. Fill the Title field with a name for the dashboard.
  5. In the Find resource type and metric field, search for "instance", "container", or both, then select the desired metrics.
  6. In the Metric Type field, enter the desired metric or select it from the autofill menu.
  7. Optionally, use the Filter to filter by a specific value, such as app, name, or version.
  8. Configure the dashboard further as desired. To create the dashboard, click Save.

Metrics Explorer

Metrics Explorer allows you to select a specific metric about your clusters and perform various aggregations.

To use Metrics Explorer, perform these steps:

  1. Visit the Stackdriver Monitoring menu in GCP Console.


  2. Hover your cursor over Resources, then select Metrics Explorer.

  3. From the Find resource type and metric search menu, enter gke_container for Resource type.
  4. For the Metric, select the desired metric.
  5. Optionally, use the Filter menu to filter by resource.
  6. Use the Aggregation options to perform a desired aggregation.
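The same selection can be written as a Monitoring filter string, which is useful when querying outside the console. The sketch below builds such a filter for the gke_container resource type. It assumes the system metrics listed earlier on this page carry the container.googleapis.com/ prefix; gke_metric_filter is a hypothetical helper.

```shell
# Build a Monitoring filter for one metric on the gke_container resource.
# $1 is the short metric name, for example container/cpu/usage_time.
gke_metric_filter() {
  printf 'metric.type="container.googleapis.com/%s" AND resource.type="gke_container"\n' "$1"
}

# Example:
#   gke_metric_filter container/cpu/usage_time
```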

Best practices

  • Alerting: You can set up alerting policies that inform you if something suspicious occurs in your cluster.

Disabling monitoring

gcloud

Note: To use gcloud beta commands, you must configure gcloud to use the v1beta1 API.

To disable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service none
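To confirm the result of enabling or disabling monitoring, you can inspect the cluster's configured monitoring service. This sketch assumes the value is read from the monitoringService field of gcloud container clusters describe; monitoring_state is a hypothetical helper that interprets the two values used on this page.

```shell
# Interpret a cluster's monitoring service value.
monitoring_state() {
  case "$1" in
    monitoring.googleapis.com) echo "enabled" ;;
    none)                      echo "disabled" ;;
    *)                         echo "unknown: $1" ;;
  esac
}

# Example:
#   svc=$(gcloud container clusters describe [CLUSTER_NAME] \
#       --format="value(monitoringService)")
#   monitoring_state "$svc"
```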

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.


  2. Click the cluster's Edit button, which looks like a pencil.

  3. Set the value of the Stackdriver Monitoring drop-down to Disabled.
  4. Click Save.

