Observing your GKE clusters

This page describes how to access the monitoring dashboards for Cloud Operations for GKE and for Legacy Logging and Monitoring, and how to use the Cloud Operations for GKE monitoring dashboard.

Accessing the monitoring dashboard

  1. From the Cloud Console, go to Monitoring:

    Go to Monitoring

    If you've already associated your Google Cloud project with a Workspace, then the Cloud Monitoring home page is displayed. Otherwise, Monitoring automatically creates a Workspace. In general, this process requires no interaction from you, but it might take a few moments to complete. In some cases, the Add your project to a Workspace dialog is displayed. In this case, the simplest action is to create a Workspace.

  2. Select Dashboards and then select one of the following dashboards:

    • For clusters with Cloud Operations for GKE enabled, select the dashboard named GKE.

    • For clusters with Legacy Logging and Monitoring enabled, select the dashboard named GKE Clusters.

    If you don't see any clusters or if you don't see all the resources in your clusters, refer to Troubleshooting your GKE dashboard.

Cloud Operations for GKE dashboard interface

The Cloud Operations for GKE dashboard is divided into several parts:

Figure: The Cloud Operations for GKE dashboard, tabular view.

  1. The filter bar lets you select which GKE resources to filter on within the dashboard.

  2. The Alerts timeline lets you select a specific time span for your dashboard, and it displays a summary of alerts during that time span.

  3. The tables show your GKE fleet by resource type: clusters, namespaces, nodes, workloads, services, pods, and containers. Each row shows a single resource with its metrics. Clicking a row opens a panel with more details about that resource.

Filter bar

The filter bar lets you filter the GKE resources shown in the dashboard to display the data you're interested in. It also displays information from other resources in your cluster related to your filter selections.

Using the filter bar

To filter the data in your dashboard, complete the following steps.

  1. Click the filter bar to display the filter options.

    Figure: The filter options.

  2. Select a Kubernetes resource you want to filter on and then select the resource name.

    If more than 1 resource has that name, then select the specific resource instance you want to filter on.

  3. Click Apply.

    The dashboard refreshes to display the updated information.

When using the filter bar, keep the following points in mind:

  • After applying a filter to the dashboard, you can click the filter bar again to filter on additional resources.

  • Some resources might have too many options to display in the filter menu. In this case, you must first filter on a parent resource to narrow down the options. For example, you might have too many Pods to display, so you could first filter by Cluster or any other Kubernetes resource to narrow the list of Pods.

  • To clarify the scope of each filter string, the filter interface might display additional filters by default, based on which resource you choose to filter on. For example, if you filter on a specific Namespace, then the filter adds the Cluster that the Namespace resides in.

    Figure: Additional filter options selected by the system.
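The scoping behavior described in the last point can be pictured with a small sketch. This is only an illustration of the idea, not how the dashboard is implemented: the `PARENT` table, the `expand_filter` function, and the cluster and namespace names are all hypothetical, and the assumption that every level of the hierarchy is added (not only the Cluster for a Namespace) goes beyond what this page states.

```python
# Hypothetical model of the filter bar's scoping behavior. None of these
# names come from a Google Cloud API; they exist only for illustration.
PARENT = {
    "Cluster": None,
    "Namespace": "Cluster",
    "Pod": "Namespace",       # assumption: Pods are scoped by Namespace
    "Container": "Pod",       # assumption: Containers are scoped by Pod
}


def expand_filter(resource_type, ancestry):
    """Return the chosen filter plus one filter for each parent resource.

    `ancestry` maps each resource type in the chain to the resource name.
    """
    filters = []
    current = resource_type
    while current is not None:
        filters.append((current, ancestry[current]))
        current = PARENT[current]
    # Walked from child to parent, so reverse to list the broadest scope first.
    return list(reversed(filters))


# Filtering on a Namespace also pins the Cluster it resides in.
namespace_filters = expand_filter(
    "Namespace", {"Cluster": "prod-cluster", "Namespace": "checkout"}
)
print(namespace_filters)
```

Adding the parent scope keeps a filter unambiguous when two clusters contain a Namespace with the same name.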

Alerts timeline

The Alerts timeline shows the incidents in your clusters, that is, the alerting violations that happened within the selected time span. If you place your pointer over a red area in the timeline, event cards appear.

Figure: The timeline view of a Kubernetes alert.

Each event card provides detailed information about one incident displayed in the timeline.

The time span drop-down menu lets you set the time frame for the Alerts timeline and for the tables in your dashboard.
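To make the effect of the time span concrete, here is a minimal sketch of selecting the incidents that fall within a chosen window. It assumes that an incident counts as "within" the span when any part of it overlaps the window, and that an open incident has no closing time; the tuple layout, function name, and incident names are hypothetical and are not taken from the dashboard or from the Cloud Monitoring API.

```python
from datetime import datetime, timedelta


def incidents_in_span(incidents, span_start, span_end):
    """Return the names of incidents that overlap [span_start, span_end].

    Each incident is a (name, opened_at, closed_at) tuple. closed_at is
    None while the incident is still open.
    """
    selected = []
    for name, opened_at, closed_at in incidents:
        # An open incident is treated as lasting to the end of the window.
        effective_end = closed_at if closed_at is not None else span_end
        if opened_at <= span_end and effective_end >= span_start:
            selected.append(name)
    return selected


span_end = datetime(2021, 1, 1, 12, 0)
span_start = span_end - timedelta(hours=1)  # a one-hour time span

incidents = [
    ("high-cpu", datetime(2021, 1, 1, 11, 30), datetime(2021, 1, 1, 11, 45)),
    ("pod-restarts", datetime(2021, 1, 1, 9, 0), None),  # still open
    ("disk-full", datetime(2021, 1, 1, 8, 0), datetime(2021, 1, 1, 9, 0)),
]

visible = incidents_in_span(incidents, span_start, span_end)
print(visible)
```

In this sketch, widening the span to four hours would also bring the closed "disk-full" incident into view, which mirrors how a longer time span surfaces older alerts.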

Using the event cards

When you hold the pointer over the Alerts timeline, the dashboard displays an event card for each alerting violation. The icon shown on each alert indicates whether the incident is still open or is closed.

If the time frame you selected has more than 2 alerts, you can click View all to see all of the alerts.

To view the alerting incident in Alerting, click View incident.

In the Associated resource section, the event card shows which resource the alert is associated with. If the dashboard can't determine which resource the alert is associated with, the event card provides an Update alert policy link, which takes you to the Edit alerting policy page. From here, you can update the alerting policy with additional information so that the dashboard can find the associated resource.

Dashboard tables

The dashboard displays a table of metrics for each GKE resource. The tables display the following columns:

  • Name: the display name of the resource.

  • Alerts: the number of open and acknowledged alerts for that resource and its children that occurred within the selected time span.

  • Container restarts: the number of times a container restarted within the selected time span.

  • CPU utilization: the CPU utilization of containers that can be attributed to a resource within the selected time span.

    • The metric used is kubernetes.io/container/cpu/request_utilization.

  • Memory utilization: the memory utilization of containers that can be attributed to a resource within the selected time span.

    • The metric used is kubernetes.io/container/memory/request_utilization.

  • Disk utilization: the disk utilization of pods that can be attributed to a resource within the selected time span. In contrast to the previous two columns, this metric is reported by pods, so it isn't displayed in the Containers table.

    • The metric used is kubernetes.io/pod/volume/utilization.
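If you want to look at the same metrics outside the dashboard, for example in Metrics Explorer or through the Cloud Monitoring API, you need a filter string that names the metric type. The following sketch builds such strings for the three utilization columns. The metric types come from the list above; the `build_metric_filter` helper, the example label values, and the pairing of each metric with the `k8s_container` or `k8s_pod` monitored resource type are assumptions made for illustration, so check them against your own project before relying on them.

```python
# Metric types named in the column descriptions above.
COLUMN_METRICS = {
    "cpu": "kubernetes.io/container/cpu/request_utilization",
    "memory": "kubernetes.io/container/memory/request_utilization",
    "disk": "kubernetes.io/pod/volume/utilization",
}


def build_metric_filter(column, **resource_labels):
    """Build a Monitoring filter string for one utilization column.

    This helper is hypothetical; it only concatenates strings.
    """
    metric_type = COLUMN_METRICS[column]
    # Assumption: container metrics are written against k8s_container
    # resources and pod metrics against k8s_pod resources.
    resource_type = "k8s_container" if "/container/" in metric_type else "k8s_pod"
    clauses = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Sort the labels so the output is stable regardless of argument order.
    for label, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{label} = "{value}"')
    return " AND ".join(clauses)


cpu_filter = build_metric_filter(
    "cpu", cluster_name="prod-cluster", namespace_name="checkout"
)
disk_filter = build_metric_filter("disk", cluster_name="prod-cluster")
print(cpu_filter)
print(disk_filter)
```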

For the utilization columns, keep in mind the following information:

  • These columns don't show a ratio. Each cell displays two separate data points, separated by a slash (/). The first number is the total capacity requested for that individual resource. The second number is the percent utilization of the requested capacity.

  • The sparklines show the utilization data over the time span selected on the page.
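As a worked example of how to read a utilization cell, the sketch below derives the two data points from a requested capacity and a measured usage. The exact text the dashboard renders may differ; the function name, the units, and the sample values are hypothetical.

```python
def utilization_cell(requested, used, unit):
    """Format 'requested capacity / percent of the request in use'.

    The first value is the capacity requested for the resource; the second
    is the usage expressed as a percentage of that request, not of the
    node's total capacity.
    """
    if requested <= 0:
        # Without a request there is nothing to compute a percentage against.
        return f"{requested:g} {unit} / n/a"
    percent = 100.0 * used / requested
    return f"{requested:g} {unit} / {percent:.0f}%"


cpu_cell = utilization_cell(requested=0.5, used=0.2, unit="CPU")
memory_cell = utilization_cell(requested=4, used=1, unit="GiB")
print(cpu_cell)     # 0.5 CPU / 40%
print(memory_cell)  # 4 GiB / 25%
```

Because the percentage is relative to the request, a value above 100% is possible when a container uses more than it requested.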

Viewing resource details

The Cloud Operations for GKE dashboard displays a summary line for each Kubernetes resource by default. Clicking a row in a table shows the details for that resource.

Figure: The resource details panel.

The resource details panel displays information about the selected resource. It also provides an Incidents tab that displays information about open incidents and a Logs tab that displays logs generated by the resource.

The details panels for Pods and Containers also have a Metrics tab that displays metrics in charts. The details panels for Services and Namespaces only have an Incidents tab.

To view the alerting incident in Alerting, click View alerts.

Troubleshooting

For troubleshooting information, refer to Troubleshooting your GKE dashboard.