Stackdriver lets you explore monitoring and logging information in your Google Kubernetes Engine clusters and application containers using a single dashboard.
From the GCP Console, go to the Stackdriver Monitoring home page by selecting Stackdriver > Monitoring.
Select the Workspace containing your Google Kubernetes Engine cluster:
- In most cases, the Workspace is the Google Cloud Platform project containing your Google Kubernetes Engine cluster.
- You might be prompted to create a Workspace, or you may not see your GCP project in the list of accounts. In these cases, you should create a new Workspace using your GCP project. For more information, see Creating a Stackdriver Account.
- To monitor clusters from multiple projects in the same dashboard, you must create a Workspace that is different from your GCP project(s). For more information, see Monitoring multiple projects.
Navigate to the Kubernetes monitoring console:
If you're using Legacy Stackdriver, select Resources > Kubernetes Engine.
If you're using Stackdriver Kubernetes Engine Monitoring, select Resources > Kubernetes Engine NEW.
You'll only see these menu items if you have clusters using Stackdriver.
This console shows only those clusters that use Stackdriver Kubernetes Engine Monitoring. If you don't see any clusters, or you don't see all the resources in your clusters, see the Troubleshooting section on this page.
Stackdriver Kubernetes Engine Monitoring dashboard interface
The Stackdriver Kubernetes Engine Monitoring dashboard is divided into several parts, as indicated by the red numbers in the screenshot below:
The dashboard toolbar provides dashboard settings, filtering, and control over the timeline shown underneath it.
The timeline event selector lets you hover over the timeline to reveal summaries of alerting violations. See the Timeline events section below.
The details section lets you choose one of three viewing tabs: Infrastructure, Workloads, and Services. These viewing tabs are discussed in the Viewing tabs section below.
The dashboard provides multiple viewing tabs, which organize your cluster information in different ways. The possible viewing tabs are:
Infrastructure. Aggregates Kubernetes resources by this hierarchy: Cluster > Node > Pod > Container.
Workloads. Aggregates Kubernetes resources by this hierarchy: Cluster > Namespace > Workload > Pod > Container.
Services. Aggregates Kubernetes resources by this hierarchy: Cluster > Namespace > Service > Pod > Container.
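One way to picture the three tabs is as three groupings of the same container records. The following is a minimal sketch in Python, not how Stackdriver implements the tabs; the record fields and sample values are hypothetical and chosen only to mirror the hierarchies listed above.

```python
# Hypothetical container records; the field names and values are illustrative only.
CONTAINERS = [
    {"cluster": "prod", "node": "node-1", "namespace": "shop", "workload": "web",
     "service": "frontend", "pod": "web-abc", "container": "nginx"},
    {"cluster": "prod", "node": "node-2", "namespace": "shop", "workload": "web",
     "service": "frontend", "pod": "web-def", "container": "nginx"},
    {"cluster": "prod", "node": "node-1", "namespace": "shop", "workload": "cart",
     "service": "cart-api", "pod": "cart-xyz", "container": "app"},
]

# Each tab aggregates the same records along a different chain of keys.
HIERARCHIES = {
    "Infrastructure": ["cluster", "node", "pod", "container"],
    "Workloads": ["cluster", "namespace", "workload", "pod", "container"],
    "Services": ["cluster", "namespace", "service", "pod", "container"],
}


def build_tree(records, levels):
    """Nest records into dicts keyed by each level of the hierarchy in turn."""
    tree = {}
    for record in records:
        node = tree
        for level in levels:
            node = node.setdefault(record[level], {})
    return tree


infrastructure = build_tree(CONTAINERS, HIERARCHIES["Infrastructure"])
workloads = build_tree(CONTAINERS, HIERARCHIES["Workloads"])
services = build_tree(CONTAINERS, HIERARCHIES["Services"])
```

In this sketch, the same pod appears under its node in the Infrastructure tree and under its namespace and workload in the Workloads tree.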
You can select your viewing mode from the tabs above the details section:
The table is sorted to show Kubernetes resources with open incidents first. You can click the expander arrow (▸) in front of each Kubernetes resource to look at any subcomponents of the resource. The following screenshot shows an expanded hierarchy of Kubernetes resources:
Each resource name is preceded by an indicator which, if it is red, indicates that incidents have occurred in that resource or in resources lower in the hierarchy. To see the alerting details, click Name. For more details, see the Alerting details section below.
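The indicator and sort order described above can be sketched as a roll-up over the resource hierarchy. This is an illustrative Python model only; the Resource class, its fields, and the sample resources are assumptions made for the example and do not correspond to a Stackdriver API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical model of one row in the details table."""
    name: str
    open_incidents: int = 0  # incidents opened directly on this resource
    children: List["Resource"] = field(default_factory=list)


def has_incident(resource: Resource) -> bool:
    """True if the resource, or anything lower in its hierarchy, has an open incident."""
    return resource.open_incidents > 0 or any(
        has_incident(child) for child in resource.children
    )


def sort_incidents_first(resources: List[Resource]) -> List[Resource]:
    """Put resources with incidents first; sorted() is stable, so ties keep their order."""
    return sorted(resources, key=lambda r: not has_incident(r))


# A node whose only incident is on a container two levels down still counts.
node_ok = Resource("node-1", children=[Resource("pod-b")])
node_bad = Resource(
    "node-2",
    children=[Resource("pod-a", children=[Resource("container-1", open_incidents=1)])],
)
ordered = sort_incidents_first([node_ok, node_bad])
```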
Following are explanations of the columns that appear in the three tabs. The displayed values are based on the selected time range:
- Name: The label you assigned to the Kubernetes resource.
- Resource Type: The possible values are Cluster, Container, Namespace, Node, Pod, and Workload.
- Ready: The number of node instances available.
- Incidents: The number of alerting violations.
- CPU Utilization: The percent utilization compared to the requested CPU resources.
- Memory Utilization: The percent utilization of requested memory.
- Total Memory Usage: The amount of memory allocated.
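As a worked example of the two utilization columns, the sketch below computes usage as a percentage of the requested amount. It is a rough illustration in Python; the function name and sample figures are hypothetical, and how the dashboard treats a resource with no request set is an assumption here (the sketch returns None).

```python
def utilization_percent(used, requested):
    """Usage as a percentage of the requested amount.

    Returns None when nothing was requested; that choice is an assumption
    of this sketch, not documented dashboard behavior.
    """
    if requested <= 0:
        return None
    return 100.0 * used / requested


# A container that requested 0.5 CPU cores and is using 0.25 cores.
cpu_utilization = utilization_percent(0.25, 0.5)

# A container that requested 512 MiB of memory and is using 384 MiB.
memory_utilization = utilization_percent(384 * 2**20, 512 * 2**20)
```

With these sample figures, CPU utilization is 50 percent and memory utilization is 75 percent.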
The Kubernetes Monitoring dashboard displays a summary line for each Kubernetes resource by default. To see the details for a resource, click the expander arrow (▸) in front of the Kubernetes resource.
If you click the red or green indicator in front of an entry, a panel with alerting details appears:
This details view aggregates incidents, system metrics, and logs within one view.
You can also access the alerting details panel from the timeline event selector at the top of the dashboard. A timeline of incidents gives you a view of alerting violations that happened within the selected time range. If you hover over red areas in the timeline, event cards appear:
Event cards provide more information on each incident displayed in the timeline. If you click on an individual event card, you see the alerting details for the incident in a new panel.
The Kubernetes Monitoring dashboard provides a bubble visualization that allows you to explore trends and patterns that appear in your metrics. It also provides at-a-glance health information about the nodes in your cluster.
Keep in mind the following information when viewing the chart:
Each bubble (plot) represents a node, and the size of the plot represents the number of pods on the node.
A gray plot indicates a healthy node; a red plot indicates a node with an open incident.
For the beta release, you can select CPU Usage and Memory Usage for the axes of the chart. You can also select GPU Usage if your nodes are using GPUs.
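The mapping from node data to the chart can be summarized in a few lines of Python. This is only a sketch of the rules listed above; the node dictionaries and their field names are hypothetical, and the dashboard's actual scaling of plot sizes is not specified here.

```python
def to_bubble(node):
    """Translate one node record into the attributes of its plot."""
    return {
        "label": node["name"],
        "x": node["cpu_usage"],        # axis choice: CPU Usage
        "y": node["memory_usage"],     # axis choice: Memory Usage
        "size": len(node["pods"]),     # plot size follows the number of pods
        "color": "red" if node["open_incidents"] > 0 else "gray",
    }


# Hypothetical node records for illustration.
NODES = [
    {"name": "node-1", "cpu_usage": 0.42, "memory_usage": 0.61,
     "pods": ["web-abc", "cart-xyz"], "open_incidents": 0},
    {"name": "node-2", "cpu_usage": 0.93, "memory_usage": 0.48,
     "pods": ["web-def"], "open_incidents": 1},
]

bubbles = [to_bubble(node) for node in NODES]
```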
If you don't see any Kubernetes resources in your dashboard, then check the following:
Is the correct GCP project selected at the top of the page? If not, use the drop-down menu at the top of the page to select a project. You must select the project whose data you want to see.
Does your project have any activity? If you just created your cluster, wait a few minutes for it to populate with data. See Installing Stackdriver Support for details.
Is the time range too narrow? You can use the Time menu in the dashboard toolbar at the top of the page to select other time ranges or define a Custom range.
Do you have the proper permissions to view the dashboard? If you see either of the following permission-denied error messages when viewing a service's deployment details or a GCP project's metrics, you need to update your Cloud Identity and Access Management role to include roles/monitoring.viewer or roles/viewer:
You do not have sufficient permissions to view this page
You don't have permissions to perform the action on the selected resources
For more details, go to Predefined roles.
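To check which of these roles a user already holds, you can inspect the project's IAM policy. The sketch below is a minimal Python helper that assumes the policy has been exported as JSON in the standard bindings format (for example, with gcloud projects get-iam-policy PROJECT_ID --format=json). The function name, the sample policy, and the member addresses are hypothetical, and the check looks only at direct bindings: it ignores group membership, conditional bindings, and other roles that may also include viewing permissions.

```python
VIEWER_ROLES = {"roles/monitoring.viewer", "roles/viewer"}


def viewer_roles_for(policy, member):
    """Return the viewer roles that are bound directly to the given member."""
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if binding["role"] in VIEWER_ROLES and member in binding.get("members", [])
    )


# Hypothetical policy in the IAM bindings format.
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/monitoring.viewer", "members": ["user:alice@example.com"]},
        {"role": "roles/logging.viewer", "members": ["user:bob@example.com"]},
    ]
}

alice_roles = viewer_roles_for(SAMPLE_POLICY, "user:alice@example.com")
bob_roles = viewer_roles_for(SAMPLE_POLICY, "user:bob@example.com")
```

In this sample, alice has one of the required roles and bob has none, so bob would be the one to see the permission-denied messages. A project administrator can grant a missing role with gcloud projects add-iam-policy-binding, passing the member and the role.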
Does your cluster's service account have permission to write data into Stackdriver? If you see high error rates on your API dashboard, then your service account might be missing the following roles:
- Stackdriver Resource Metadata Writer