This page describes how to access the monitoring dashboards for Cloud Operations for GKE and for Legacy Logging and Monitoring, and how to use the Cloud Operations for GKE monitoring dashboard.
Accessing the monitoring dashboard
In the Cloud Console, go to Monitoring.
If your Google Cloud project is already associated with a Workspace, then the Cloud Monitoring home page is displayed. Otherwise, a Workspace is created automatically. In general, this process requires no interaction from you, but it takes a few moments to complete. In some cases, the Add your project to a Workspace dialog is displayed. In this case, the simplest action is to create a Workspace.
Select Dashboards and then select the dashboard named GKE. This dashboard lists all clusters that use Cloud Operations for GKE.
If you don't see any clusters or if you don't see all the resources in your clusters, refer to Troubleshooting your GKE dashboard.
For clusters configured with Cloud Operations for GKE, the following monitored resource types are available:
- Kubernetes Cluster: k8s_cluster.
- Kubernetes Container: k8s_container.
- Kubernetes Master Component: k8s_master_component.
- Kubernetes Node: k8s_node.
- Kubernetes Pod: k8s_pod.
For example, to create a chart of the CPU usage of a cluster configured with Cloud Operations for GKE by using Metrics Explorer, do the following:
- In the Google Cloud Console navigation pane, select Monitoring.
If this is the first time you access Cloud Monitoring for this Google Cloud project, then Cloud Monitoring creates a Workspace. Typically, this process is automatic and completes within a few minutes. If you are prompted to either select or create a Workspace, choose to create one.
- In the Monitoring navigation pane, click Metrics Explorer.
- Ensure Metric is the selected tab.
- Click in the box labeled Find resource type and metric, and then select from the menu or enter the name of the resource and metric. Use the following information to complete the fields for this text box:
- For Resource, select or enter Kubernetes Container.
- For Metric, select or enter CPU usage time.
- Use the Filter, Group By, and Aggregator menus to modify how the data is displayed. To display data by namespace, in Group By, select namespace_name. Note that this selection automatically updates the aggregation, which defines how multiple time series are combined. For more information, see Selecting metrics.
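The same selection can also be written as a Cloud Monitoring filter string, which is useful when you query metrics programmatically instead of through Metrics Explorer. The following is a minimal sketch, not an official client: it assumes that the CPU usage time metric for Kubernetes Container corresponds to the metric type kubernetes.io/container/cpu/core_usage_time, and the helper name build_filter and the cluster name my-cluster are hypothetical.

```python
def build_filter(resource_type: str, metric_type: str, **resource_labels: str) -> str:
    """Build a filter string that selects one metric on one monitored resource type.

    Hypothetical helper for illustration; it only concatenates strings.
    """
    clauses = [
        f'resource.type = "{resource_type}"',
        f'metric.type = "{metric_type}"',
    ]
    # Optional equality filters on resource labels, in a stable order.
    for key, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(clauses)


# Assumed metric type for "CPU usage time" on the k8s_container resource.
cpu_filter = build_filter(
    "k8s_container",
    "kubernetes.io/container/cpu/core_usage_time",
    cluster_name="my-cluster",  # hypothetical cluster name
)
print(cpu_filter)
```

Grouping by namespace_name, as in the last step, would be applied separately as an aggregation setting rather than as part of the filter.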
Cloud Operations for GKE dashboard interface
The Cloud Operations for GKE dashboard is divided into three parts:
The dashboard toolbar controls the time window for observations and provides dashboard settings and filters.
The timeline event selector lets you select a specific time and display summaries of alerts. For detailed information, go to the Timeline events section.
The details section lets you choose how your cluster information is presented to you. The next section provides more information on your choices.
The Cloud Operations for GKE dashboard viewing tabs let you organize your cluster information by different hierarchies:
Infrastructure: Aggregates resources by Cluster, then Node, then Pod, and then by Container.
Workloads: Aggregates resources by Cluster, then Namespace, then Workload, then Pod, and lastly by Container.
Services: Aggregates resources by Cluster, then Namespace, then Service, then Pod, and lastly by Container.
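The three hierarchies can be pictured as ordered lists of levels, where expanding a row moves one level down. The following sketch only models the order described above; the names HIERARCHIES and child_level are illustrative and do not correspond to any dashboard API.

```python
from typing import Optional

# Aggregation order for each viewing tab, from outermost to innermost level.
HIERARCHIES = {
    "Infrastructure": ["Cluster", "Node", "Pod", "Container"],
    "Workloads": ["Cluster", "Namespace", "Workload", "Pod", "Container"],
    "Services": ["Cluster", "Namespace", "Service", "Pod", "Container"],
}


def child_level(tab: str, level: str) -> Optional[str]:
    """Return the level shown when a row of `level` is expanded in `tab`."""
    levels = HIERARCHIES[tab]
    index = levels.index(level)
    # The innermost level (Container) has nothing below it.
    return levels[index + 1] if index + 1 < len(levels) else None


print(child_level("Infrastructure", "Cluster"))  # Node
print(child_level("Workloads", "Namespace"))     # Workload
print(child_level("Services", "Container"))      # None
```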
The table is sorted to show resources with open incidents first. To view subcomponents of a resource, click the expand arrow for that resource. The following screenshot shows an expanded hierarchy of Kubernetes resources:
Each resource name is preceded by a red or green indicator. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents. To see the alerting details, metrics, and logs for a resource, click its row. For more details, go to the section on Viewing alerts, metrics, logs, and details.
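The indicator and sort rules can be sketched as a small recursive check. This is an illustration of the rules as described here, not the dashboard's implementation: the Resource class and function names are hypothetical, and the sketch assumes that "open incidents first" also covers resources whose subcomponents have open incidents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    name: str
    open_incidents: int = 0
    children: List["Resource"] = field(default_factory=list)


def indicator(resource: Resource) -> str:
    """Red if the resource or any subcomponent has an open incident, else green."""
    if resource.open_incidents > 0:
        return "red"
    if any(indicator(child) == "red" for child in resource.children):
        return "red"
    return "green"


def sort_rows(resources: List[Resource]) -> List[Resource]:
    """Place red rows first; the stable sort keeps the original order otherwise."""
    return sorted(resources, key=lambda r: indicator(r) != "red")


pod = Resource("pod-a", open_incidents=1)
node_1 = Resource("node-1", children=[pod])  # red because of pod-a
node_2 = Resource("node-2")                  # green
print([r.name for r in sort_rows([node_2, node_1])])  # ['node-1', 'node-2']
```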
The Cloud Operations for GKE dashboard displays data in columns based on the selected time range:
- Name: The label you assigned to the Kubernetes resource.
- Resource Type: The possible values are Cluster, Container, Namespace, Node, Pod, and Workload.
- Ready: The number of running pods aggregated at the specified entity. A checkmark indicates that the entity has at least one pod ready and running. Note that this Ready indicator is not the same as Pod status in the GKE console. Ready only indicates that the pod is ready to serve traffic, while Pod status displays other statuses, such as Pending, Running, and Crashlooping.
- Incidents: The number of alerting violations.
- CPU Utilization: CPU usage as a percentage of the requested CPU resources.
- Memory Utilization: Memory usage as a percentage of the requested memory.
- Total Memory Usage: The amount of memory allocated.
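As a worked example of the two utilization columns, both are a ratio of usage to the requested amount, expressed as a percentage. The sketch below assumes that simple ratio; the function name and the handling of a missing request are illustrative choices, not documented dashboard behavior.

```python
from typing import Optional


def utilization_percent(used: float, requested: float) -> Optional[float]:
    """Return usage as a percentage of the requested amount.

    Returns None when nothing was requested, because the ratio is undefined.
    """
    if requested <= 0:
        return None
    return 100.0 * used / requested


# A container using 0.25 CPU cores against a request of 0.5 cores.
print(utilization_percent(0.25, 0.5))  # 50.0

# A container using 256 MiB of memory against a request of 512 MiB.
print(utilization_percent(256, 512))   # 50.0
```

Utilization above 100 percent is possible in this model, because a container can use more than it requested.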
Viewing alerts, metrics, logs, and details
The Cloud Operations for GKE dashboard displays a summary line for each Kubernetes resource by default. Each resource with a subcomponent is listed with an expand arrow, and all resources are listed with a red or green indicator. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents:
- To view subcomponents of a resource, click the expand arrow for that resource.
- To open a pane that displays a summary of incidents, system metrics, logs, and details for a resource, click the resource's row. The information that is displayed depends on the resource type. For example, when you click a row for a cluster, you won't see metrics or log information. However, this information is displayed when you click a row for a pod.
In the following example, there are no open incidents on the node:
To go to the Kubernetes page in the Cloud Console, click Manage.
You can also access the alerting details panel from the Cloud Operations for GKE dashboard timeline event selector. A timeline of incidents gives you a view of alerting violations that happened within the selected time range. If you place your pointer over a red area in the timeline, event cards appear:
Each event card provides detailed information about one incident displayed in the timeline.
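One way to think about the timeline is as a filter that keeps the incidents overlapping the selected time range. The sketch below is an assumption about that selection rule, not the dashboard's documented logic; the Incident type, its fields, and incidents_in_range are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Incident(NamedTuple):
    name: str
    opened: datetime
    closed: Optional[datetime]  # None while the incident is still open


def incidents_in_range(
    incidents: List[Incident], start: datetime, end: datetime
) -> List[Incident]:
    """Keep incidents whose open interval overlaps the window [start, end]."""
    return [
        i
        for i in incidents
        if i.opened <= end and (i.closed is None or i.closed >= start)
    ]


def at(hour: int, minute: int = 0) -> datetime:
    """Build a timestamp on an arbitrary example day."""
    return datetime(2021, 1, 1, hour, minute)


incidents = [
    Incident("a", at(9, 30), at(10, 15)),  # overlaps the start of the window
    Incident("b", at(11, 30), None),       # opened after the window ends
    Incident("c", at(10, 45), None),       # opened inside the window, still open
    Incident("d", at(8), at(9)),           # closed before the window starts
]
selected = incidents_in_range(incidents, at(10), at(11))
print([i.name for i in selected])  # ['a', 'c']
```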
For troubleshooting information, refer to Troubleshooting your GKE dashboard.