Observing your GKE clusters

This page describes how to access the monitoring dashboards for Cloud Operations for GKE and for Legacy Logging and Monitoring, and how to use the Cloud Operations for GKE monitoring dashboard.

Accessing the monitoring dashboard

  1. In the Cloud Console, go to Monitoring.

  2. Select Dashboards and then select one of the following dashboards:

    • For clusters with Cloud Operations for GKE enabled, select the dashboard named GKE.

    • For clusters with Legacy Logging and Monitoring enabled, select the dashboard named GKE Clusters.

    If you don't see any clusters or if you don't see all the resources in your clusters, refer to Troubleshooting your GKE dashboard.

Cloud Operations for GKE dashboard interface

The Cloud Operations for GKE dashboard is divided into several parts:

[Figure: the Cloud Operations for GKE dashboard, tabular view.]

  1. The filter bar lets you select which GKE resources to filter on within the dashboard.

  2. The Alerts timeline lets you select a specific time span for your dashboard, and it displays a summary of alerts during that time span.

  3. The tables show your GKE fleet by resource type: clusters, namespaces, nodes, workloads, services, pods, and containers. Each row shows a single resource with its metrics. Clicking a row opens a panel with more details about that resource.

Filter bar

The filter bar lets you filter the GKE resources shown in the dashboard to display the data you're interested in. It also displays information from other resources in your cluster related to your filter selections.

Using the filter bar

To filter the data in your dashboard, complete the following steps.

  1. Click the filter bar to display the filter options.

    [Figure: the filter options.]

  2. Select a Kubernetes resource you want to filter on and then select the resource name.

    If more than 1 resource has that name, then select the specific resource instance you want to filter on.

  3. Click Apply.

    The dashboard refreshes to display the updated information.

When using the filter bar, keep the following points in mind:

  • After applying a filter to the dashboard, you can click the filter bar again to filter on additional resources.

  • Some resources might have too many options to display in the filter menu. In this case, you must first filter on a parent resource to narrow down the options. For example, you might have too many Pods to display, so you could first filter by Cluster or any other Kubernetes resource to narrow the list of Pods.

  • To clarify the scope of each filter string, the filter interface might display additional filters by default, based on which resource you choose to filter on. For example, if you filter on a specific Namespace, then the filter adds the Cluster that the Namespace resides in.

    [Figure: additional filter options selected by the system.]
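The scoping behavior described in the preceding points can be sketched in code. The following minimal Python model is an illustration only, not the dashboard's implementation: the `Pod` record, the `namespace_filter` and `apply_filters` helpers, and all cluster, namespace, and Pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    cluster: str
    namespace: str
    name: str


def namespace_filter(cluster: str, namespace: str) -> dict[str, str]:
    # Assumption: selecting a Namespace instance also pins the Cluster
    # that the Namespace resides in, which removes any ambiguity when
    # the same namespace name exists in several clusters.
    return {"cluster": cluster, "namespace": namespace}


def apply_filters(pods: list[Pod], filters: dict[str, str]) -> list[Pod]:
    # Keep only the Pods that match every active filter.
    return [
        pod for pod in pods
        if all(getattr(pod, key) == value for key, value in filters.items())
    ]


# Hypothetical fleet: the "default" namespace exists in two clusters.
PODS = [
    Pod("prod-cluster", "default", "web-1"),
    Pod("prod-cluster", "payments", "api-1"),
    Pod("dev-cluster", "default", "web-1"),
]

filters = namespace_filter("prod-cluster", "default")
matches = apply_filters(PODS, filters)
print([(pod.cluster, pod.name) for pod in matches])
```

Without the added Cluster filter, a filter on the `default` namespace alone would match Pods in both clusters. That ambiguity is what the automatically added parent filter is meant to prevent.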

Alerts timeline

The Alerts timeline shows the alerting violations that happened in your clusters within the selected time span. If you place your pointer over a red area in the timeline, event cards appear.

[Figure: the timeline view of a Kubernetes alert.]

Each event card provides detailed information about one alert displayed in the timeline.

The time span drop-down menu lets you set the time frame for your alert timeline and for the tables in your dashboard.

Using the event cards

When you hover over the Alerts timeline, the dashboard displays an event card for each alerting violation. An icon on each card indicates whether the incident is still open or is closed.

If the time frame you selected has more than 2 alerts, you can use your mouse wheel to scroll through the event cards. You can also click View all alerts to display all of the event cards in a panel.

To view the alerting incident in Alerting, click View incident.

In the Associated resource section, the event card shows which resource the alert is associated with. If the dashboard can't determine which resource the alert is associated with, the event card provides an Update alert policy link, which takes you to the Edit alerting policy page. From here, you can update the alerting policy with additional information so that the dashboard can find the associated resource.

Dashboard tables

The dashboard displays a table of metrics for each GKE resource. The tables display the following columns:

  • Name: the display name of the resource.

  • Alerts: the number of open and acknowledged alerts for that resource and its children that occurred within the selected time span.

  • Service-level objectives (SLOs): a statement of desired performance for your services as measured through your selected service-level indicator (SLI).

  • Container restarts: the number of times a container restarted within the selected time span.

  • Error logs: the number of error logs associated with a resource within the selected time span.

  • CPU utilization: the CPU utilization of containers that can be attributed to a resource within the selected time span.

  • Memory utilization: the memory utilization of containers that can be attributed to a resource within the selected time span.

  • Disk utilization: the disk utilization of pods that can be attributed to a resource within the selected time span. Unlike the CPU and memory columns, this metric is reported for pods, so it doesn't appear in the Containers table.
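The Alerts column counts alerts for a resource and its children. A minimal sketch of that rollup follows. It is an illustration under stated assumptions, not the dashboard's implementation: the `Alert` and `Resource` types are hypothetical, and an alert is assumed to fall within the time span when its start time does.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Alert:
    state: str         # "open", "acknowledged", or "closed"
    started: datetime  # assumption: the start time decides time-span membership


@dataclass
class Resource:
    name: str
    alerts: list = field(default_factory=list)
    children: list = field(default_factory=list)


def count_alerts(resource: Resource, start: datetime, end: datetime) -> int:
    # Count open and acknowledged alerts on the resource itself...
    own = sum(
        1 for alert in resource.alerts
        if alert.state in ("open", "acknowledged") and start <= alert.started <= end
    )
    # ...then add the counts of all of its children, recursively.
    return own + sum(count_alerts(child, start, end) for child in resource.children)


# Hypothetical hierarchy: cluster -> namespace -> pod.
pod = Resource("web-1", alerts=[
    Alert("open", datetime(2024, 1, 1, 12)),
    Alert("closed", datetime(2024, 1, 1, 13)),
])
namespace = Resource(
    "default",
    alerts=[Alert("acknowledged", datetime(2024, 1, 1, 9))],
    children=[pod],
)
cluster = Resource(
    "prod-cluster",
    alerts=[Alert("open", datetime(2023, 12, 31, 23))],  # outside the time span
    children=[namespace],
)

span_start, span_end = datetime(2024, 1, 1), datetime(2024, 1, 2)
cluster_count = count_alerts(cluster, span_start, span_end)
pod_count = count_alerts(pod, span_start, span_end)
print(cluster_count, pod_count)
```

In this model, a parent row's count is always at least as large as the count of any of its children, because the child alerts are included in the parent total.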

For the utilization columns, keep in mind the following information:

  • These columns don't show a ratio. They show 2 separate data points, separated by a slash (/). The first number is the total capacity requested for that individual resource. The second number is the percent utilization of that requested capacity.

  • The sparklines show the utilization data over the time span selected on the page.
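Because the two numbers in a utilization cell are independent data points, the absolute usage is their product rather than their quotient. The following short sketch shows the arithmetic. The helper name, the units, and the sample values are hypothetical and aren't taken from the dashboard.

```python
def used_amount(requested: float, percent_utilization: float) -> float:
    """Return the absolute usage implied by a utilization cell.

    requested: total capacity requested (first number in the cell).
    percent_utilization: percent of that request in use (second number).
    """
    return requested * percent_utilization / 100.0


# Hypothetical cells: 2 CPU cores requested at 25% utilization,
# and 4 GiB of memory requested at 50% utilization.
cpu_used = used_amount(2.0, 25.0)
memory_used = used_amount(4.0, 50.0)
print(cpu_used, memory_used)
```

Read this way, the hypothetical CPU cell means that the containers use a quarter of the 2 requested cores. It doesn't mean that 2 is divided by 25.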

Configuring the dashboard tables

You can configure the tables in the dashboard to display the information you're most interested in seeing. Using the Configure resource tables drop-down menu next to the filter bar, you can select which columns to display. You can also choose whether or not to show sparklines in the tables.

Note that these configurations apply to all of the tables in the dashboard.

[Figure: the Configure resource tables drop-down menu.]

To configure your tables, complete the following steps.

  1. Select the Configure resource tables drop-down menu.

  2. Select the columns to display in the tables.

    The Name and Active alerts columns are required.

  3. Select whether or not to display sparklines.

  4. Click Apply to make the changes.

Viewing resource details

The Cloud Operations for GKE dashboard displays a summary line for each Kubernetes resource by default. Clicking a row in a table shows the details for that resource.

[Figure: the details panel for a resource.]

The resource details panel displays information about the selected resource. It also provides an Alerts tab that displays information about open alerts, an Events tab that displays the Kubernetes events associated with the selected resource, a Metrics tab that displays metrics in charts, and a Logs tab that displays logs generated by the resource.

To view the alerting incident in Alerting, click View alerts.

Viewing Kubernetes events

The Events tab on the resource details panel displays the Kubernetes events associated with the resource. Kubernetes events are available for all resources except containers.

[Figure: the Events tab in the details panel for a resource.]

The Events tab has a series of cards that display information about each event. If the event occurred in a child resource, the card also provides a link to that resource. You can click View log to open Logs Explorer and view the log associated with the event. You can also click Copy message to copy the log message to your clipboard.

To view all of the events in Logs Explorer, click View in Logging. Logs Explorer opens and displays all of the logs associated with the Kubernetes events.

Managing SLOs

You can track the health and performance of your applications using service-level objectives (SLOs). After configuring the dashboard to display the Service-level objectives (SLOs) column, you can see if your applications are meeting their SLOs. Your resource's SLO might have one of the following statuses:

  • Healthy: indicates that the resource meets the specified SLO. This status has a green indicator.

  • Out of error budget: indicates that the resource has depleted its error budget, meaning that additional bad events might cause your resource to violate its SLO. This status has a yellow indicator.

  • Unhealthy: indicates that the resource is out of SLO and has an alert that's firing. This status has a red indicator.

  • No status: indicates that no data exists for that SLO. This status has a gray indicator.

For more information about these concepts, refer to Concepts in service monitoring.
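The four statuses and their indicator colors can be modeled as a small lookup. The following sketch is an illustration only: the `classify` helper, its boolean inputs, and the order in which the checks run are assumptions and don't describe how the dashboard computes a status.

```python
from enum import Enum


class SloStatus(Enum):
    # The value of each member is the indicator color listed above.
    HEALTHY = "green"
    OUT_OF_ERROR_BUDGET = "yellow"
    UNHEALTHY = "red"
    NO_STATUS = "gray"


def classify(has_data: bool, budget_depleted: bool, violating_with_alert: bool) -> SloStatus:
    # Assumed precedence: missing data first, then a firing alert on an
    # SLO violation, then a depleted error budget, otherwise healthy.
    if not has_data:
        return SloStatus.NO_STATUS
    if violating_with_alert:
        return SloStatus.UNHEALTHY
    if budget_depleted:
        return SloStatus.OUT_OF_ERROR_BUDGET
    return SloStatus.HEALTHY


status = classify(has_data=True, budget_depleted=True, violating_with_alert=False)
print(status.name, status.value)
```

A resource whose error budget is depleted but that has no firing alert lands in the yellow state in this model. This matches the description of that status as a warning that further bad events might violate the SLO.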

SLO details

You can define SLOs only for the following Kubernetes resources: namespaces, workloads, and Kubernetes services. To see detailed information about your resource's SLO compliance, click the resource to open the details panel. On the details panel, click the SLOs tab.

[Figure: the SLOs tab in the details panel.]

Create SLO

You can create an SLO for your Kubernetes resource from the Cloud Operations for GKE monitoring dashboard.

You can open the Create a Service Level Objective (SLO) panel in either of the following ways:

  • On the GKE dashboard page, use the SLO creation control in the row of the Kubernetes resource.

  • On the resource's details panel, click Create SLO.

[Figure: the SLO creation panel.]

For information on completing the form to create an SLO, refer to the Creating an SLO guide.

Viewing logs in Logs Explorer

You can search for and view your cluster's log data through the Logs Explorer. Using the Logs Explorer, you can view your logs, parse and analyze your log data, and refine your query parameters.

For more details about using the Logs Explorer, refer to the Logging documentation.
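As an illustration of refining query parameters, the following sketch builds a query string that you could paste into the Logs Explorer query pane. The `k8s_container` resource type, its `cluster_name` and `namespace_name` labels, and the `severity` field are part of the Logging query language. The `build_log_filter` helper and the cluster and namespace names are hypothetical.

```python
from typing import Optional


def build_log_filter(cluster: str, namespace: Optional[str] = None,
                     min_severity: str = "ERROR") -> str:
    """Build a Logging query for container logs in one GKE cluster."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.cluster_name="{cluster}"',
    ]
    if namespace:
        # Narrow the query to one namespace when one is given.
        clauses.append(f'resource.labels.namespace_name="{namespace}"')
    clauses.append(f"severity>={min_severity}")
    # Join the comparisons with an explicit AND for readability.
    return " AND ".join(clauses)


query = build_log_filter("prod-cluster", namespace="default")
print(query)
```

This query mirrors the Error logs column of the dashboard tables in spirit: it selects entries at severity ERROR or higher. Whether the column uses the same threshold is an assumption.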

Troubleshooting

For troubleshooting information, refer to Troubleshooting your GKE dashboard.