This page explains how to use Cloud Monitoring to monitor your Google Kubernetes Engine (GKE) clusters.
Overview
You can use Monitoring to track signals from your GKE clusters, and to build dashboards and alerting policies on those signals.
Cloud Monitoring monitors system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. Custom metrics are application-specific metrics that you define yourself, such as the total number of active user sessions or the total number of rendered pages.
For system metrics, Cloud Monitoring creates a deployment that periodically connects to each node and collects metrics about its Pods and containers, then sends the metrics to Monitoring.
Metrics for usage of system resources are collected from the following sources:
- CPU: container/cpu/usage_time
- Memory: container/memory/bytes_used, collected from memory.usage_in_bytes in cgroup
- Evictable memory: container/memory/bytes_used, collected from the total_inactive_file field of memory.stat
- Non-evictable memory: Measured by memory.usage_in_bytes - memory.total_inactive_file
- Disk: container/disk/bytes_used
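The non-evictable memory calculation in the list above can be reproduced by hand on a node. The following is a minimal sketch, assuming a cgroup v1 directory that contains memory.usage_in_bytes and memory.stat. The function name non_evictable_bytes and the example path are illustrative assumptions, not part of GKE or Cloud Monitoring.

```shell
# Hypothetical helper: print non-evictable memory, in bytes, for one cgroup.
# Assumes the cgroup v1 file layout (memory.usage_in_bytes and memory.stat).
non_evictable_bytes() {
    cgroup_dir=$1
    # Total memory charged to the cgroup.
    usage=$(cat "$cgroup_dir/memory.usage_in_bytes")
    # Evictable memory: the total_inactive_file field of memory.stat.
    inactive=$(awk '$1 == "total_inactive_file" { print $2 }' "$cgroup_dir/memory.stat")
    # Treat a missing field as zero so the subtraction stays valid.
    echo $((usage - ${inactive:-0}))
}

# Example call (the path is an assumption and varies by node image and runtime):
# non_evictable_bytes /sys/fs/cgroup/memory
```

The result is the usage value minus the inactive file cache, which matches the "Non-evictable memory" definition in the list.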
For a list of other system metrics collected from GKE, refer to Metrics list.
To learn how to set up custom metrics, refer to Using custom metrics or follow the Autoscaling deployments with custom metrics tutorial.
Before you begin
To prepare for this task, perform the following steps:
- Ensure that you have enabled the Google Kubernetes Engine API.
- Ensure that you have installed the Cloud SDK.
- Set your default project ID:
gcloud config set project [PROJECT_ID]
- If you are working with zonal clusters, set your default compute zone:
gcloud config set compute/zone [COMPUTE_ZONE]
- If you are working with regional clusters, set your default compute region:
gcloud config set compute/region [COMPUTE_REGION]
- Update gcloud to the latest version:
gcloud components update
- Access Cloud Monitoring by doing the following:
  - In the Google Cloud console, select your Google Cloud project.
  - In the navigation pane, select Monitoring.
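The choice between compute/zone and compute/region in the steps above depends only on the kind of location you use. As a rough sketch, the helper below picks the property from the location name and prints the setup commands for review instead of running them. The function names location_property and print_setup_commands are hypothetical, and the zone test assumes that zone names are a region name followed by a hyphen and one letter (for example, us-central1-a).

```shell
# Hypothetical helper: choose the gcloud config property for a location.
# Assumes zone names end in "-<letter>" and region names do not.
location_property() {
    case "$1" in
        *-[a-z]) echo "compute/zone" ;;
        *)       echo "compute/region" ;;
    esac
}

# Print (rather than run) the setup commands so they can be reviewed first.
print_setup_commands() {
    project=$1
    location=$2
    echo "gcloud config set project $project"
    echo "gcloud config set $(location_property "$location") $location"
    echo "gcloud components update"
}

# Example call with placeholder values:
# print_setup_commands my-project us-central1-a
```

Printing the commands keeps the sketch free of side effects; pipe the output to sh only after checking it.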
Enabling Monitoring
You can create a cluster with Monitoring enabled, or enable Monitoring in an existing cluster.
Your cluster's node pools (including the default node pool) must have the necessary Google Cloud scope to interact with Monitoring (the https://www.googleapis.com/auth/monitoring scope). When you create a new cluster with monitoring, GKE sets this scope automatically; however, existing clusters might not have the necessary permissions.
Creating a cluster with monitoring
gcloud
When you create a cluster, the --enable-cloud-monitoring flag is automatically set, which enables Monitoring in the cluster. To disable this default behavior, set the --no-enable-cloud-monitoring flag.
Console
In the Google Cloud console, go to the Kubernetes Engine > Kubernetes clusters page.
Click Create cluster.
Configure the cluster as needed.
Click Advanced options. Ensure that Enable Stackdriver Monitoring service is selected.
Click Create.
Enabling monitoring for an existing cluster
gcloud
To enable Monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster:
gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service monitoring.googleapis.com
If you initially created your cluster without Monitoring, and want to enable it later, the cluster's node pools might not have the necessary Google Cloud scope. As a workaround, you can create a new node pool with the same number of nodes and the necessary scope as follows:
gcloud container node-pools create adjust-scope \
    --cluster [CLUSTER_NAME] \
    --num-nodes [NUM_NODES] \
    --scopes https://www.googleapis.com/auth/monitoring
After you've created the new node pool, move your existing Pods to the new, correctly scoped node pool to use Monitoring. For more information, refer to "Updating VM scopes with zero downtime".
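Before creating a replacement node pool, it can help to check whether an existing pool already carries the Monitoring scope. The sketch below is one possible check: it reads a list of scopes on standard input and succeeds only if the scope named on this page is present. The function name has_monitoring_scope is hypothetical. The commented gcloud invocation assumes that the node pool description exposes its scopes as config.oauthScopes and that the value format separates entries with semicolons; verify both against your gcloud version.

```shell
# Hypothetical helper: succeed if the scope list on stdin includes the
# Monitoring scope. Entries may be separated by newlines, spaces, commas,
# or semicolons.
has_monitoring_scope() {
    # Put one scope on each line, then require an exact whole-line match.
    tr ';, ' '\n\n\n' | grep -q -x -F 'https://www.googleapis.com/auth/monitoring'
}

# Assumed usage (the field name and output format are assumptions):
# gcloud container node-pools describe [POOL_NAME] --cluster [CLUSTER_NAME] \
#     --format="value(config.oauthScopes)" | has_monitoring_scope \
#     && echo "scope present" || echo "scope missing"
```

This check matches only the exact scope URL from this page; it does not recognize broader scopes that might also grant access.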
Console
If you initially created your cluster without Monitoring and want to enable it later, the cluster's node pools might not have the necessary Google Cloud scope. See the preceding gcloud section for a workaround.
In the Google Cloud console, go to the Kubernetes Engine > Kubernetes clusters page.
Click Edit.
Set the value of the Stackdriver Monitoring drop-down to Enabled.
Click Save.
Extending infrastructure metrics
In addition to application metrics, Cloud Monitoring custom metrics can also carry measurements of your cluster's infrastructure that are not included in system metrics, such as container disk I/O. You can deploy your own infrastructure monitoring agents to collect and push these metrics to Cloud Monitoring.
cAdvisor
You can collect metrics using cAdvisor, the open source monitoring agent used in Kubernetes, and use prometheus-to-sd to push these metrics to Cloud Monitoring.
To run cAdvisor on your own cluster, perform these steps:
Clone cAdvisor:
git clone https://github.com/google/cadvisor.git
cd cadvisor
Follow the cAdvisor DaemonSet instructions to install kustomize. If you are using Cloud Shell, run:
go get github.com/kubernetes-sigs/kustomize
Create the example cAdvisor namespace and DaemonSet, which exports all container metrics:
kustomize build deploy/kubernetes/overlays/examples | kubectl apply -f -
You should now see Prometheus metrics in Cloud Monitoring under the gke_container resource.
Follow the cAdvisor kustomization instructions to make changes to the example provided to fit your needs. Apply your changes with:
kustomize build deploy/kubernetes/overlays/<my_custom_patches> | kubectl apply -f -
Viewing metrics
You can view metrics in the Google Cloud console.
Cluster overview
Monitoring provides an overview menu for GKE. This menu displays useful information about your clusters in dashboards.
To view the overview menu, perform these steps:
In the Google Cloud console, go to Monitoring.
Hover the pointer over Resources, then select Kubernetes Engine.
Select your cluster.
The overview menu provides the following default dashboards:
- Incidents: Violations of alerting policies.
- Events: Chronological list of anomalies, incidents, lifecycle, tags, IAM policies, deploys, notes, cloud provider updates, and user management updates that occur in your cloud accounts.
- CPU Usage: Displays per-cluster CPU usage percentages.
- Disk I/O: Displays per-cluster disk I/O rates in KB/s.
- Network Traffic: Displays per-cluster network traffic in KB/s.
- Pods: List of Pods and nodes (Compute Engine VM instances) in all namespaces. Selecting any Pod or node opens the overview for that resource.
To learn more about viewing metrics, refer to the Monitoring documentation and the Monitoring filters page.
Dashboards
You can create custom dashboards for GKE nodes and containers.
To create a dashboard, perform these steps:
In the Google Cloud console, go to Monitoring.
Hover your pointer over Dashboards, then select Create Dashboard.
To create a new dashboard, click Add Chart.
Fill the Title field with a name for the dashboard.
From the Find resource type and metric field, search for instance and/or container, then select the metrics you want.
From the Metric Type field, enter or select from the autofill menu the metrics you want.
Optionally, use the Filter to filter by a specific value, such as app, name, or version.
Configure the dashboard further as needed. To create the dashboard, click Save.
Metrics Explorer
Metrics Explorer allows you to select a specific metric about your clusters and perform various aggregations.
To use Metrics Explorer, perform these steps:
In the Google Cloud console, go to Monitoring.
Hover your pointer over Resources, then select Metrics Explorer.
From the Find resource type and metric search menu, enter gke_container for Resource type.
For the Metric, select the metrics you want.
Optionally, use the Filter menu to filter by resource.
Use the Aggregation options to perform an aggregation.
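The resource type and metric selected in the steps above can also be written as a Monitoring filter string, which is handy when you want to record or reuse a selection. The sketch below builds such a string for one metric. The function name metric_filter is hypothetical, and the container.googleapis.com/ prefix is an assumption about how the short metric names on this page are fully qualified; confirm it against the Metrics list.

```shell
# Hypothetical helper: build a Monitoring filter for one GKE container metric.
# The "container.googleapis.com/" prefix is an assumption; check the Metrics list.
metric_filter() {
    printf 'resource.type="gke_container" AND metric.type="container.googleapis.com/%s"\n' "$1"
}

# Example call using a metric named earlier on this page:
# metric_filter container/cpu/usage_time
```

The function only formats text and does not call any API.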
Best practices
- Alerting: You can set up alerting policies that notify you when something unexpected occurs in your cluster, such as sustained high CPU or memory usage.
Disabling monitoring
gcloud
To disable monitoring for an existing cluster, run the following command, where [CLUSTER_NAME] is the name of the cluster:
gcloud beta container clusters update [CLUSTER_NAME] --monitoring-service none
If you are running Cloud Operations for GKE in your cluster, you must disable both monitoring and logging by using gcloud beta to set the following flags in your cluster:
gcloud beta container clusters update [CLUSTER_NAME] --logging-service none --monitoring-service none
Console
In the Google Cloud console, go to the Kubernetes Engine > Kubernetes clusters page.
Click Edit.
Set the value of the Stackdriver Monitoring drop-down to Disabled.
Click Save.
What's next
- To learn about Cloud Monitoring's costs, see Pricing.
- For information on alerting policies and how to configure them, see Alerting policies.
- To learn about creating and using uptime checks, see Managing uptime checks.
- To explore autoscaling, follow the Autoscaling Deployments with Custom Metrics tutorial.