Stackdriver Monitoring

This page explains how to use Stackdriver Monitoring to monitor your Google Kubernetes Engine (GKE) clusters.

Overview

You can use Monitoring to observe signals from your GKE clusters and build operational practices around them.

Stackdriver monitors system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. Custom metrics are application-specific metrics that you define yourself, such as the total number of active user sessions or the total number of rendered pages.

For system metrics, Stackdriver creates a deployment that periodically connects to each node and collects metrics about its Pods and containers, then sends the metrics to Stackdriver.

Metrics for usage of system resources are collected from the following sources:

  • CPU: container/cpu/usage_time
  • Memory: container/memory/bytes_used, collected from memory.usage_in_bytes in cgroup
  • Evictable memory: container/memory/bytes_used, collected from the total_inactive_file field of memory.stat
  • Non-evictable memory: Measured as memory.usage_in_bytes minus the total_inactive_file field of memory.stat
  • Disk: container/disk/bytes_used
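The relationship between the cgroup memory values above can be sketched in shell. This is a minimal illustration, not how Stackdriver itself collects the metric. It assumes a cgroup v1 memory directory that contains memory.usage_in_bytes and memory.stat; the function name non_evictable_bytes and its directory argument are hypothetical.

```shell
# Print non-evictable memory in bytes for one cgroup v1 memory directory.
# Assumption: the directory holds memory.usage_in_bytes and memory.stat.
non_evictable_bytes() {
  cgroup_dir="$1"
  usage=$(cat "$cgroup_dir/memory.usage_in_bytes") || return 1
  # total_inactive_file is a single "name value" line in memory.stat.
  inactive=$(awk '$1 == "total_inactive_file" { print $2 }' "$cgroup_dir/memory.stat") || return 1
  [ -n "$inactive" ] || return 1
  # Non-evictable memory = total usage minus evictable (inactive file) memory.
  echo $((usage - inactive))
}
```

On a node that uses cgroup v1, this might be called as non_evictable_bytes /sys/fs/cgroup/memory, although the exact path depends on the node image.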

For a list of other system metrics collected from GKE, refer to Metrics list in the Stackdriver documentation.

To learn how to set up custom metrics, refer to Using custom metrics or follow the Autoscaling deployments with custom metrics tutorial.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update
  • Ensure that you have created a Stackdriver Workspace. See Managing Workspaces for more information.

Enabling Monitoring

You can create a cluster with Monitoring enabled, or enable Monitoring in an existing cluster.

Your cluster's node pools (including the default node pool) must have the necessary GCP scope to interact with Monitoring (the https://www.googleapis.com/auth/monitoring scope). When you create a new cluster with monitoring, GKE sets this scope automatically; however, existing clusters might not have the necessary permissions.
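To see whether an existing node pool already has the scope, you can inspect its configuration. The following is a sketch, assuming that the node pool description exposes its scopes under config.oauthScopes and that your default zone or region is already set. The function name node_pool_has_monitoring_scope is hypothetical.

```shell
MONITORING_SCOPE="https://www.googleapis.com/auth/monitoring"

# Succeed if the node pool's OAuth scopes include the Monitoring scope.
# Assumption: config.oauthScopes in the describe output lists the scopes.
node_pool_has_monitoring_scope() {
  cluster="$1"
  pool="$2"
  gcloud container node-pools describe "$pool" \
      --cluster "$cluster" \
      --format="value(config.oauthScopes)" |
    tr ';' '\n' |
    grep -Fxq "$MONITORING_SCOPE"
}
```

The check is an exact match on the scope URL, so a node pool that only has a narrower or broader scope is reported as not matching. For example, node_pool_has_monitoring_scope [CLUSTER_NAME] default-pool returns a zero exit status only when the scope is listed.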

Creating a cluster with monitoring

gcloud

When you create a cluster, the --enable-cloud-monitoring flag is automatically set, which enables Monitoring in the cluster.

To disable this default behavior, set the --no-enable-cloud-monitoring flag.
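As an illustration, the two flags could be wrapped in a small helper so that the Monitoring choice is always stated explicitly. This is a sketch only: the function name create_cluster and its on/off argument are hypothetical, and any other flags your cluster needs are omitted.

```shell
# Create a cluster, stating the Monitoring choice explicitly.
# Second argument: "on" (default) or "off".
create_cluster() {
  name="$1"
  monitoring="${2:-on}"
  if [ "$monitoring" = "off" ]; then
    gcloud container clusters create "$name" --no-enable-cloud-monitoring
  else
    gcloud container clusters create "$name" --enable-cloud-monitoring
  fi
}
```

For example, create_cluster [CLUSTER_NAME] off creates a cluster without Monitoring.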

Console

  1. In the GCP Console, go to the Kubernetes Engine > Kubernetes clusters page:

    Go to Kubernetes clusters

  2. Click Create cluster.

  3. Configure the cluster as needed.

  4. Click Advanced options. Ensure that Enable Stackdriver Monitoring service is selected.

  5. Click Create.

Enabling monitoring for an existing cluster

gcloud

To enable Monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service monitoring.googleapis.com
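To confirm that the update took effect, you can read the setting back from the cluster. The following is a minimal sketch, assuming that the cluster description exposes a monitoringService field; the function names monitoring_service and monitoring_enabled are hypothetical.

```shell
# Print the monitoring service configured for a cluster.
# Assumption: the cluster description exposes a monitoringService field.
monitoring_service() {
  gcloud container clusters describe "$1" --format="value(monitoringService)"
}

# Succeed only if a monitoring service is set and it is not "none".
monitoring_enabled() {
  service=$(monitoring_service "$1") || return 1
  [ -n "$service" ] && [ "$service" != "none" ]
}
```

For example, monitoring_enabled [CLUSTER_NAME] && echo "Monitoring is on" prints the message only when a monitoring service is configured.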

If you initially created your cluster without Monitoring, and want to enable it later, the cluster's node pools might not have the necessary GCP scope. As a workaround, you can create a new node pool with the same number of nodes and the necessary scope as follows:

gcloud container node-pools create adjust-scope \
    --cluster [CLUSTER_NAME] \
    --num-nodes [NUM_NODES] \
    --scopes https://www.googleapis.com/auth/monitoring

After you've created the new node pool, move your existing Pods to the new, correctly scoped node pool to use Monitoring. For more information, refer to "Updating VM scopes with zero downtime".
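One way to move the Pods is to cordon and then drain the nodes of the old node pool, so that Kubernetes reschedules the Pods onto the new one. The following is a sketch under assumptions: it assumes that nodes carry the cloud.google.com/gke-nodepool label, and the function name drain_node_pool is hypothetical. Depending on your workloads, kubectl drain might need additional flags.

```shell
# Cordon and drain every node in one node pool so that its Pods are
# rescheduled elsewhere (for example, onto the new adjust-scope pool).
# Assumption: nodes carry the cloud.google.com/gke-nodepool label.
drain_node_pool() {
  pool="$1"
  nodes=$(kubectl get nodes -l "cloud.google.com/gke-nodepool=$pool" -o name) || return 1
  # Cordon all old nodes first so evicted Pods cannot land on another old node.
  for node in $nodes; do
    kubectl cordon "$node" || return 1
  done
  # Then evict the Pods node by node, leaving DaemonSet-managed Pods in place.
  for node in $nodes; do
    kubectl drain "$node" --ignore-daemonsets || return 1
  done
}
```

For example, drain_node_pool default-pool empties the default node pool once the adjust-scope node pool is ready to receive its Pods.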

Console

If you initially created your cluster without Monitoring, and want to enable it later, the cluster's node pools might not have the necessary GCP scope. See the gcloud section prior to this one for a workaround.

  1. In the GCP Console, go to the Kubernetes Engine > Kubernetes clusters page:

    Go to Kubernetes clusters

  2. Click Edit.

  3. Set the value of the Stackdriver Monitoring drop-down to Enabled.

  4. Click Save.

Extending infrastructure metrics

In addition to application metrics, Stackdriver custom metrics can carry measurements of your cluster's infrastructure that system metrics do not include, such as container disk I/O. You can deploy your own infrastructure monitoring agents to collect these metrics and push them to Stackdriver.

cAdvisor

You can collect metrics using cAdvisor, the open source monitoring agent used in Kubernetes. You can then use prometheus-to-sd to push these metrics to Stackdriver.

To run cAdvisor on your own cluster, perform these steps:

  1. Clone cAdvisor:

    git clone https://github.com/google/cadvisor.git
    cd cadvisor
    
  2. Follow the cAdvisor DaemonSet instructions to install kustomize. If you are using Cloud Shell, run:

    go get github.com/kubernetes-sigs/kustomize
    
  3. Create the example cAdvisor namespace and DaemonSet, which exports all container metrics:

    kustomize build deploy/kubernetes/overlays/examples | kubectl apply -f -
    

    You should now see Prometheus metrics in Stackdriver under the gke_container resource.

  4. Follow the cAdvisor kustomization instructions to make changes to the example provided to fit your needs. Apply your changes with:

    kustomize build deploy/kubernetes/overlays/<my_custom_patches> | kubectl apply -f -
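After applying the manifests, you might want to confirm that the DaemonSet is running on every node. The following is a sketch, assuming that the overlay creates a DaemonSet named cadvisor in a namespace named cadvisor; both names are assumptions, so adjust them if your overlay differs. The function name cadvisor_ready is hypothetical.

```shell
# Succeed once every scheduled cAdvisor Pod reports ready.
# Assumption: the overlay creates a DaemonSet named "cadvisor"
# in a namespace named "cadvisor".
cadvisor_ready() {
  status=$(kubectl get daemonset cadvisor --namespace cadvisor \
      -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 1
  desired=${status% *}   # text before the space
  ready=${status#* }     # text after the space
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}
```

For example, until cadvisor_ready; do sleep 5; done waits for the rollout to finish.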
    

Viewing metrics

You can view metrics in the Stackdriver Monitoring console.

Cluster overview

Monitoring provides an overview menu for GKE. This menu displays useful information about your clusters in dashboards.

To view the overview menu, perform these steps:

  1. In the GCP Console, go to Stackdriver Monitoring:

    Go to Monitoring

  2. Hover the pointer over Resources, then select Kubernetes Engine.

  3. Select your cluster.

The overview menu provides the following default dashboards:

  • Incidents: Violations of alerting policies.
  • Events: Chronological list of anomalies, incidents, lifecycle, tags, IAM policies, deploys, notes, cloud provider updates, and user management updates that occur in your cloud accounts.
  • CPU Usage: Displays per-cluster CPU usage percentages.
  • Disk I/O: Displays per-cluster disk I/O rates in KB/s.
  • Network Traffic: Displays per-cluster network traffic in KB/s.
  • Pods: List of Pods and nodes (Compute Engine VM instances) in all namespaces. Selecting any Pod or node opens the overview for that resource.

To learn more about viewing metrics, refer to the Monitoring documentation and the Monitoring filters page.

Dashboards

You can create custom dashboards for GKE nodes and containers.

To create a dashboard, perform these steps:

  1. In the GCP Console, go to Stackdriver Monitoring:

    Go to Monitoring

  2. Hover your pointer over Dashboards, then select Create Dashboard.

  3. To create a new dashboard, click Add Chart.

  4. Fill the Title field with a name for the dashboard.

  5. From the Find resource type and metric field, search for instance and/or container, then select the metrics you want.

  6. In the Metric Type field, enter the metrics you want or select them from the autofill menu.

  7. Optionally, use the Filter to filter by a specific value, such as app, name, or version.

  8. Configure the dashboard further as needed. To create the dashboard, click Save.

Metrics Explorer

Metrics Explorer allows you to select a specific metric about your clusters and perform various aggregations.

To use Metrics Explorer, perform these steps:

  1. In the GCP Console, go to Stackdriver Monitoring:

    Go to Monitoring

  2. Hover your pointer over Resources, then select Metrics Explorer.

  3. From the Find resource type and metric search menu, enter gke_container for Resource type.

  4. For the Metric, select the metrics you want.

  5. Optionally, use the Filter menu to filter by resource.

  6. Use the Aggregation options to perform an aggregation.
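The same selection can also be written as a Monitoring filter string, for example when you query metrics outside the console. The following is a sketch, assuming that GKE system metric types carry the container.googleapis.com/ prefix and that the gke_container resource has a cluster_name label. The function name gke_container_filter is hypothetical.

```shell
# Build a Monitoring filter for one metric on gke_container resources.
# Assumptions: metric types carry the container.googleapis.com/ prefix and
# gke_container has a cluster_name label.
gke_container_filter() {
  metric="$1"
  cluster="${2:-}"
  filter="resource.type=\"gke_container\" AND metric.type=\"container.googleapis.com/$metric\""
  # Narrow the filter to one cluster only when a cluster name is given.
  if [ -n "$cluster" ]; then
    filter="$filter AND resource.labels.cluster_name=\"$cluster\""
  fi
  echo "$filter"
}
```

For example, gke_container_filter container/cpu/usage_time [CLUSTER_NAME] prints a filter for the CPU usage metric that is restricted to one cluster.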

Best practices

  • Alerting: You can set up alerting policies that inform you if something suspicious occurs in your cluster.

Disabling monitoring

gcloud

To disable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster.

gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service none

If you are running Stackdriver Kubernetes Engine Monitoring in your cluster, you must disable both monitoring and logging by using gcloud beta to set the following flags in your cluster:

gcloud beta container clusters update [CLUSTER_NAME] --logging-service none --monitoring-service none

Console

  1. In the GCP Console, go to the Kubernetes Engine > Kubernetes clusters page:

    Go to Kubernetes clusters

  2. Click Edit.

  3. Set the value of the Stackdriver Monitoring drop-down to Disabled.

  4. Click Save.

What's next
