Observing your GKE clusters

This page describes how to access the Kubernetes Engine Monitoring and Legacy Logging and Monitoring dashboards, and how to use the Kubernetes Engine Monitoring dashboard.

Accessing the monitoring dashboard

  1. From the Cloud Console, go to Monitoring:

    If your Google Cloud project is already associated with a Workspace, then the Cloud Monitoring home page is displayed. Otherwise, a Workspace is created automatically. In general, this process requires no interaction from you, but it takes a few moments to complete. In some cases, the Add your project to a Workspace dialog is displayed. In this case, the simplest action is to create a new Workspace.

  2. Select Dashboards:

    • If your clusters use Kubernetes Engine Monitoring, select the dashboard named Kubernetes Engine New.

      This dashboard only displays clusters that use Kubernetes Engine Monitoring. If you don't see any clusters or if you don't see all the resources in your clusters, see Troubleshooting.

    • If your clusters use Legacy Logging and Monitoring, select the dashboard named Kubernetes Engine.

Kubernetes Engine Monitoring dashboard interface

The Kubernetes Engine Monitoring dashboard is divided into three parts:

Display the Kubernetes Engine Monitoring dashboard tabular view.

  1. The dashboard toolbar controls the time window for observations and provides dashboard settings and filters.

  2. The timeline event selector lets you select a specific time and display summaries of alerts. For detailed information, go to the Timeline events section.

  3. The details section lets you choose how your cluster information is presented to you. The next section provides more information on your choices.

Viewing tabs

The Kubernetes Engine Monitoring dashboard viewing tabs let you organize your cluster information by different hierarchies:

  • Infrastructure: Aggregates resources by Cluster, then Node, then Pod, and then by Container.

  • Workloads: Aggregates resources by Cluster, then Namespace, then Workload, then Pod, and lastly by Container.

  • Services: Aggregates resources by Cluster, then Namespace, then Service, then Pod, and lastly by Container.

Select your Kubernetes Engine Monitoring viewing mode.
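The three hierarchies above can be illustrated with a small sketch. It assumes a flat list of container records with hypothetical field names (`cluster`, `node`, `namespace`, `workload`, `service`, `pod`, `container`). These names are illustrative only and are not taken from the dashboard or from any Monitoring API. The sketch shows how the same records nest differently depending on the selected tab.

```python
# Hypothetical flat records, one per container. Field names are assumptions
# made for this illustration, not identifiers from any Google Cloud API.
CONTAINERS = [
    {"cluster": "prod", "node": "node-1", "namespace": "default",
     "workload": "web", "service": "frontend", "pod": "web-abc",
     "container": "nginx"},
    {"cluster": "prod", "node": "node-2", "namespace": "default",
     "workload": "web", "service": "frontend", "pod": "web-def",
     "container": "nginx"},
    {"cluster": "prod", "node": "node-1", "namespace": "batch",
     "workload": "etl", "service": "etl-api", "pod": "etl-xyz",
     "container": "worker"},
]

# Grouping order for each viewing tab, as listed in this section.
TAB_HIERARCHIES = {
    "Infrastructure": ["cluster", "node", "pod", "container"],
    "Workloads": ["cluster", "namespace", "workload", "pod", "container"],
    "Services": ["cluster", "namespace", "service", "pod", "container"],
}


def build_tree(records, levels):
    """Nest records into dictionaries keyed by each level, in order."""
    tree = {}
    for record in records:
        node = tree
        for level in levels:
            # Descend one level, creating the branch if it does not exist.
            node = node.setdefault(record[level], {})
    return tree


def aggregate(records, tab):
    """Return the nested view of the records for the named viewing tab."""
    return build_tree(records, TAB_HIERARCHIES[tab])
```

Calling `aggregate(CONTAINERS, "Infrastructure")` groups the pods under their nodes, while `aggregate(CONTAINERS, "Workloads")` groups the same pods under their namespaces and workloads.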

The table is sorted to show resources with open incidents first. To view subcomponents of a resource, click expand for that resource. The following screenshot shows an expanded hierarchy of Kubernetes resources:

Display of the expanded hierarchy of Kubernetes resources.

Each resource name is preceded by a red or green indicator. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents. To see the alerting details, metrics, and logs for a resource, click its row. For more details, go to the section on Viewing alerts, metrics, logs, and details.
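The indicator and sort rules described above can be expressed as a short sketch. The `Resource` class and its fields are hypothetical and exist only for this illustration. The sketch is not the dashboard's implementation. It shows that a resource is red when it or any of its subcomponents has an open incident, and that rows with open incidents sort first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical model of one dashboard row and its subcomponents."""
    name: str
    open_incidents: int = 0  # incidents open directly on this resource
    children: List["Resource"] = field(default_factory=list)


def has_open_incident(resource):
    """True if the resource or any of its subcomponents has an open incident."""
    if resource.open_incidents > 0:
        return True
    return any(has_open_incident(child) for child in resource.children)


def indicator(resource):
    """Return the indicator color for a row."""
    return "red" if has_open_incident(resource) else "green"


def sort_rows(resources):
    """Place rows with open incidents first, otherwise keeping input order."""
    # sorted() is stable and False sorts before True, so rows with an open
    # incident (key False) come ahead of rows without one (key True).
    return sorted(resources, key=lambda r: not has_open_incident(r))


# A node is red because one of its pods has an open incident.
node_1 = Resource("node-1", children=[Resource("pod-a"),
                                      Resource("pod-b", open_incidents=1)])
node_2 = Resource("node-2", children=[Resource("pod-c")])
```

In this example, `indicator(node_1)` is `"red"` even though the incident is on `pod-b`, and `sort_rows([node_2, node_1])` lists `node-1` first.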

Column definitions

The Kubernetes Engine Monitoring dashboard displays data in columns based on the selected time range:

  • Name: The label you assigned to the Kubernetes resource.
  • Resource Type: The possible values are Cluster, Container, Namespace, Node, Pod, and Workload.
  • Ready: The number of node instances available.
  • Incidents: The number of alerting violations.
  • CPU Utilization: The percent utilization compared to the requested CPU resources.
  • Memory Utilization: The percent utilization of requested memory.
  • Total Memory Usage: The amount of memory allocated.
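As a worked example of the utilization columns, the following sketch computes usage as a percentage of the requested amount. The function name is hypothetical, and returning `None` when no request is set is an assumption of this sketch. This page does not state how the dashboard handles that case.

```python
def percent_utilization(used, requested):
    """Return usage as a percentage of the requested amount.

    Returns None when nothing was requested. That choice is an assumption
    of this sketch, not documented dashboard behavior.
    """
    if requested <= 0:
        return None
    return 100.0 * used / requested


# A container using 0.25 CPU cores against a request of 0.5 cores.
print(percent_utilization(0.25, 0.5))   # 50.0

# A container using 384 MiB of memory against a request of 512 MiB.
print(percent_utilization(384, 512))    # 75.0
```

Because the percentage is relative to the request, a value above 100 means the resource is using more than it requested.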

Viewing alerts, metrics, logs, and details

The Kubernetes Engine Monitoring dashboard displays a summary line for each Kubernetes resource by default. Each resource with a subcomponent is listed with an expand button, and all resources are listed with a red or green indicator. A red indicator means that the resource, or a subcomponent of the resource, has an open incident. A green indicator means that there are no open incidents. From a summary line, you can do the following:

  • To view subcomponents of a resource, click expand for that resource.
  • To open a pane that displays a summary of incidents, system metrics, logs, and details for a resource, click the resource's row. The information that is displayed depends on the resource type. For example, when you click a row for a cluster, you won't see metrics or log information. However, this information is displayed when you click a row for a pod.

    In the following example, there are no open incidents on the node:

    Display of a Kubernetes alerts details.

    To go to the Kubernetes page in the Cloud Console, click Manage.

Timeline events

You can also access the alerting details panel from the Kubernetes Engine Monitoring dashboard timeline event selector. A timeline of incidents gives you a view of alerting violations that happened within the selected time range. If you place your pointer over a red area in the timeline, event cards appear:

Using the timeline view of a Kubernetes alert.

Each event card provides detailed information about one incident displayed in the timeline. To view alerting details for an event, click its event card.
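The idea of showing only the incidents that fall within the selected time range can be sketched as an interval-overlap filter. The incident dictionaries and their `opened` and `closed` keys are hypothetical. Treating an incident as in range when any part of it overlaps the window is also an assumption, because this page does not define the exact rule that the timeline uses.

```python
from datetime import datetime


def incidents_in_range(incidents, start, end):
    """Return incidents whose open interval overlaps the window [start, end]."""
    selected = []
    for incident in incidents:
        opened = incident["opened"]
        # An incident without a close time is still open. For the overlap
        # test, treat it as lasting until the end of the window.
        closed = incident.get("closed") or end
        if opened <= end and closed >= start:
            selected.append(incident)
    return selected


# Hypothetical incidents used only for this illustration.
INCIDENTS = [
    {"name": "high-cpu", "opened": datetime(2020, 1, 1, 9, 0),
     "closed": datetime(2020, 1, 1, 9, 30)},
    {"name": "pod-restarts", "opened": datetime(2020, 1, 1, 11, 45),
     "closed": None},
    {"name": "disk-full", "opened": datetime(2020, 1, 1, 14, 0),
     "closed": datetime(2020, 1, 1, 15, 0)},
]

# A one-hour window from 11:30 to 12:30 contains only the open incident.
window = incidents_in_range(INCIDENTS,
                            datetime(2020, 1, 1, 11, 30),
                            datetime(2020, 1, 1, 12, 30))
```

Widening the window, like choosing a longer range in the Time menu, brings the earlier and later incidents into view.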

Troubleshooting

If the Kubernetes Engine New dashboard isn't listed, then none of your GKE clusters use Kubernetes Engine Monitoring. Similarly, if Kubernetes Engine isn't listed, then none of your GKE clusters use Legacy Logging and Monitoring.

If you don't see any Kubernetes resources in your Kubernetes Engine Monitoring dashboard, then check the following:

  • Is the correct Google Cloud project selected at the top of the page? If not, use the drop-down list in the menu bar to select a project. You must select the project whose data you want to see.

  • Does your project have any activity? If you just created your cluster, wait a few minutes for it to populate with data. See Installing monitoring and logging support for details.

  • Is the time range too narrow? You can use the Time menu in the dashboard toolbar to select other time ranges or define a Custom range.

  • Do you have the proper permissions to view the dashboard? If you see either of the following permission-denied error messages when viewing a service's deployment details or a Google Cloud project's metrics, you need to update your Cloud Identity and Access Management role to include roles/monitoring.viewer or roles/viewer:

    • You do not have sufficient permissions to view this page
    • You don't have permissions to perform the action on the selected resources

    For more details, go to Predefined roles.

  • Does your cluster's service account have permission to write data into Monitoring and Logging? If you see high error rates on your API dashboard, then your service account might be missing the following roles:

    • roles/logging.logWriter: In the Google Cloud Console, this role is named Logs Writer. For more information on Logging roles, see the Logging access control guide.

    • roles/monitoring.metricWriter: In the Google Cloud Console, this role is named Monitoring Metric Writer. This role permits a service account to write metric data to a Workspace. For more information on Monitoring roles, see the Monitoring access control guide.

    • roles/stackdriver.resourceMetadata.writer: In the Google Cloud Console, this role is named Stackdriver Resource Metadata Writer. This role permits write-only access to resource metadata, and it provides exactly the permissions needed by agents to send metadata. For more information on Monitoring roles, see the Monitoring access control guide.
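To check the service account roles listed above, one option is to export the project's IAM policy as JSON (for example, with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and inspect its bindings. The following sketch assumes the standard policy layout of `bindings` entries with `role` and `members` fields. The function name, project, and service account email are hypothetical. The sketch only detects roles granted directly to the service account. It does not detect the same permissions granted through a broader role or through a group.

```python
# Roles that this page lists as needed for writing logs, metrics, and metadata.
REQUIRED_WRITER_ROLES = (
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/stackdriver.resourceMetadata.writer",
)


def missing_roles(policy, service_account_email, required=REQUIRED_WRITER_ROLES):
    """Return the required roles that are not bound directly to the account."""
    member = f"serviceAccount:{service_account_email}"
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return [role for role in required if role not in granted]


# Hypothetical policy, shaped like the JSON output of get-iam-policy.
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/logging.logWriter",
         "members": ["serviceAccount:gke-nodes@example-project.iam.gserviceaccount.com"]},
        {"role": "roles/monitoring.metricWriter",
         "members": ["serviceAccount:gke-nodes@example-project.iam.gserviceaccount.com"]},
    ]
}

print(missing_roles(SAMPLE_POLICY,
                    "gke-nodes@example-project.iam.gserviceaccount.com"))
# ['roles/stackdriver.resourceMetadata.writer']
```

An empty list means that all three roles are bound directly to the service account. A non-empty list is a starting point for investigation and not proof of a problem, because the permissions might be granted another way.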