This page describes how to access the Stackdriver Kubernetes Engine Monitoring and Legacy Stackdriver monitoring dashboards, and how to use the Stackdriver Kubernetes Engine Monitoring dashboard.
Before you begin
If you haven't created a Workspace for your Google Cloud Platform (GCP) project, you must create one to use the Stackdriver Kubernetes Engine Monitoring or Legacy Stackdriver monitoring dashboard.
To create a Workspace, go to Getting a Workspace quickly.
Accessing the monitoring dashboard
From the GCP Console, go to Stackdriver > Monitoring.
Select the Workspace containing your GKE cluster.
Open the dashboard for your monitoring solution:
If your clusters use Stackdriver Kubernetes Engine Monitoring, go to Resources > Kubernetes Engine New.
This dashboard displays only clusters that use Stackdriver Kubernetes Engine Monitoring. If you don't see any clusters, or if you don't see all the resources in your clusters, go to the Troubleshooting section on this page.
If your clusters use Legacy Stackdriver, go to Resources > Kubernetes Engine.
The remainder of this page focuses on using the Stackdriver Kubernetes Engine Monitoring dashboard. For more information about the Legacy Stackdriver dashboard, go to Stackdriver Monitoring. For more information about viewing your logs, go to Stackdriver Logging.
Stackdriver Kubernetes Engine Monitoring dashboard interface
The Stackdriver Kubernetes Engine Monitoring dashboard is divided into three parts:
- The dashboard toolbar controls the time window for observations and provides dashboard settings and filters.
- The timeline event selector lets you select a specific time and display summaries of alerts. For detailed information, go to the Timeline events section.
- The details section lets you choose how your cluster information is presented to you. The next section provides more information on your choices.
The Stackdriver Kubernetes Engine Monitoring dashboard viewing tabs let you organize your cluster information by different hierarchies:
- Infrastructure: Aggregates resources by Cluster > Node > Pod > Container.
- Workloads: Aggregates resources by Cluster > Namespace > Workload > Pod > Container.
- Services: Aggregates resources by Cluster > Namespace > Service > Pod > Container.
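The three hierarchies above can be pictured as ordered lists of levels. The following is a minimal sketch for illustration only; the names `HIERARCHIES` and `child_level` are hypothetical and are not part of any Stackdriver API.

```python
# Illustrative only: each viewing tab mapped to the hierarchy listed above.
# HIERARCHIES and child_level are hypothetical names, not a Stackdriver API.
HIERARCHIES = {
    "Infrastructure": ["Cluster", "Node", "Pod", "Container"],
    "Workloads": ["Cluster", "Namespace", "Workload", "Pod", "Container"],
    "Services": ["Cluster", "Namespace", "Service", "Pod", "Container"],
}


def child_level(tab, level):
    """Return the level shown when a row at `level` is expanded, or None."""
    levels = HIERARCHIES[tab]
    index = levels.index(level)
    return levels[index + 1] if index + 1 < len(levels) else None


print(child_level("Infrastructure", "Cluster"))  # Node
print(child_level("Workloads", "Namespace"))     # Workload
print(child_level("Services", "Container"))      # None
```

In this model, the same pod appears under a node on the Infrastructure tab and under a workload or service on the other two tabs.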
The table is sorted to show resources with open incidents first. To view subcomponents of a resource, click the expand button for that resource. The following screenshot shows an expanded hierarchy of Kubernetes resources:
Each resource name is preceded by an indicator which is red or green. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents. To see the alerting details, metrics, and logs for a resource, click its row. For more details, go to the section on Viewing alerts, metrics, logs and details.
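The indicator and sort rules described above can be sketched as follows. This is an illustration of the stated behavior, not the dashboard's implementation: the `Resource` class and its fields are hypothetical, and ordering ties by name is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical stand-in for one dashboard row."""
    name: str
    open_incidents: int = 0
    children: List["Resource"] = field(default_factory=list)


def has_open_incident(resource):
    """True if the resource or any of its subcomponents has an open incident."""
    return resource.open_incidents > 0 or any(
        has_open_incident(child) for child in resource.children
    )


def indicator(resource):
    """Red when an incident is open anywhere in the subtree, green otherwise."""
    return "red" if has_open_incident(resource) else "green"


def sort_rows(resources):
    """Rows with open incidents first; ties ordered by name (an assumption)."""
    return sorted(resources, key=lambda r: (not has_open_incident(r), r.name))


# A pod with an open incident turns its node and cluster red as well.
pod = Resource("pod-x", open_incidents=1)
cluster_z = Resource("cluster-z", children=[Resource("node-1", children=[pod])])
cluster_a = Resource("cluster-a")

print(indicator(cluster_z))  # red
print(indicator(cluster_a))  # green
print([r.name for r in sort_rows([cluster_a, cluster_z])])  # ['cluster-z', 'cluster-a']
```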
The Stackdriver Kubernetes Engine Monitoring dashboard displays data in columns based on the selected time range:
- Name: The label you assigned to the Kubernetes resource.
- Resource Type: The possible values are Cluster, Container, Namespace, Node, Pod, and Workspace.
- Ready: The number of node instances available.
- Incidents: The number of alerting violations.
- CPU Utilization: The percent utilization compared to the requested CPU resources.
- Memory Utilization: The percent utilization of requested memory.
- Total Memory Usage: The amount of memory allocated.
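As a worked example of the utilization columns, the following sketch computes usage as a percentage of the requested amount. The function name is hypothetical, and returning `None` when nothing is requested is an assumption; how the dashboard treats resources without requests is not stated here.

```python
def utilization_percent(used, requested):
    """Return usage as a percentage of the requested amount.

    Returns None when nothing was requested (an assumption for this sketch).
    """
    if requested <= 0:
        return None
    return 100.0 * used / requested


# A container that requested 0.5 CPU cores and uses 0.25 cores.
print(utilization_percent(0.25, 0.5))  # 50.0

# A container that requested 512 MiB of memory and uses 768 MiB.
print(utilization_percent(768, 512))   # 150.0
```

Because the percentage is relative to the request, it can exceed 100 when a container uses more than it requested.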
Viewing alerts, metrics, logs, and details
The Stackdriver Kubernetes Engine Monitoring dashboard displays a summary line for each Kubernetes resource by default. Each resource with a subcomponent is listed with an expand button, and all resources are listed with a red or green indicator. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents.
- To view subcomponents of a resource, click the expand button for that resource.
- To open a pane that displays a summary of incidents, system metrics, logs, and details for a resource, click the resource's row. The information displayed depends on the resource type. For example, when you click a row for a cluster, you won't see metrics or log information. However, this information is displayed when you click a row for a pod.
In the following example, there are no open incidents on the node:
Timeline events
You can also access the alerting details panel from the Stackdriver Kubernetes Engine Monitoring dashboard timeline event selector. A timeline of incidents gives you a view of alerting violations that happened within the selected time range. If you place your pointer over a red area in the timeline, event cards appear:
Each event card provides detailed information about one incident displayed in the timeline. To view alerting details for an event, click its event card.
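The idea of showing only the incidents that fall within the selected time range can be sketched as an interval-overlap filter. This is an illustration under assumptions: the incident dictionaries and their `opened` and `closed` keys are hypothetical, and treating a still-open incident as ongoing is a choice made for this sketch.

```python
from datetime import datetime, timedelta


def incidents_in_range(incidents, start, end):
    """Return incidents whose open interval overlaps [start, end]."""
    selected = []
    for incident in incidents:
        opened = incident["opened"]
        # An incident without a close time is treated as still ongoing.
        closed = incident.get("closed") or end
        if opened <= end and closed >= start:
            selected.append(incident)
    return selected


end = datetime(2020, 1, 1, 12, 0)
start = end - timedelta(hours=1)  # a one-hour selected time range

incidents = [
    {"name": "A", "opened": datetime(2020, 1, 1, 11, 30), "closed": datetime(2020, 1, 1, 11, 45)},
    {"name": "B", "opened": datetime(2020, 1, 1, 9, 0), "closed": datetime(2020, 1, 1, 10, 0)},
    {"name": "C", "opened": datetime(2020, 1, 1, 10, 30), "closed": None},
]

print([i["name"] for i in incidents_in_range(incidents, start, end)])  # ['A', 'C']
```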
Troubleshooting
If you don't have the Kubernetes Engine New option in the Resources menu, you don't have any GKE clusters using Stackdriver Kubernetes Engine Monitoring. Similarly, if Kubernetes Engine isn't listed, then you don't have any GKE clusters using Legacy Stackdriver.
If you don't see any Kubernetes resources in your Stackdriver Kubernetes Engine Monitoring dashboard, then check the following:
- Is the correct GCP project selected at the top of the page? If not, use the drop-down list in the menu bar to select a project. You must select the project whose data you want to see.
- Does your project have any activity? If you just created your cluster, wait a few minutes for it to populate with data. See Installing Stackdriver support for details.
- Is the time range too narrow? You can use the Time menu in the dashboard toolbar to select other time ranges or define a Custom range.
- Do you have the proper permissions to view the dashboard? If you see either of the following permission-denied error messages when viewing a service's deployment details or a GCP project's metrics, you need to update your Cloud Identity and Access Management role to include roles/monitoring.viewer or roles/viewer:
  You do not have sufficient permissions to view this page
  You don't have permissions to perform the action on the selected resources
  For more details, go to Predefined roles.
- Does your cluster's service account have permission to write data into Stackdriver? If you see high error rates on your API dashboard, then your service account might be missing the following roles: