Managing GKE logs

This page provides an overview of the logging options available in Google Kubernetes Engine (GKE).

Overview

When you enable the Cloud Operations for GKE integration with Cloud Logging and Cloud Monitoring for your cluster, your logs are stored in a dedicated, persistent datastore. GKE itself also stores logs, but only temporarily: container logs are removed when their host Pod is removed, when the disk they are stored on runs out of space, or when they are replaced by newer logs. System logs are periodically removed to free up space for new logs, and cluster events are removed after one hour.

For container and system logs, GKE deploys, by default, a per-node logging agent that reads container logs, adds helpful metadata, and then sends them to Cloud Logging. The logging agent checks for logs in the following sources:

  • Standard output and standard error logs from containerized processes

  • kubelet and container runtime logs

  • Logs for system components, such as VM startup scripts

For events, GKE uses a deployment in the kube-system namespace that automatically collects events and sends them to Logging.

What logs are collected

By default, GKE collects logs for both your system and application workloads deployed to the cluster.

  • System logs – These include the cluster's audit logs: the Admin Activity log, the Data Access log, and the Events log. For detailed information, refer to the Audit Logs for GKE documentation. Some system components run as containers, such as those in the kube-system namespace; collection of their logs is described in Controlling the collection of your application logs.

  • Application logs – GKE collects the logs that your workload containers write to STDOUT and STDERR, as shown in the sketch after this list.
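To illustrate, a workload needs no logging library or agent configuration for its logs to be collected. Here is a minimal sketch of a containerized app; the messages and the 60-second interval are hypothetical:

```python
# Minimal sketch: a containerized app whose logs GKE collects automatically.
# Anything written to stdout or stderr is picked up by the per-node logging
# agent; no logging library or agent configuration is required.
import sys
import time

while True:
    # Routine output goes to stdout (INFO severity by default).
    print("heartbeat: worker is alive", flush=True)
    # Problems go to stderr (ERROR severity by default).
    print("warning: upstream slow to respond", file=sys.stderr, flush=True)
    time.sleep(60)
```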

Collecting your logs

When you create a new GKE cluster, Cloud Operations for GKE integration with Cloud Logging and Cloud Monitoring is enabled by default.

For Legacy Cloud Logging, follow the documentation on how to enable or disable the Logging integration.
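If you manage clusters programmatically, the same switch is exposed through the GKE API. A hedged sketch using the google-cloud-container Python client, assuming its set_logging_service method; the project, zone, and cluster names are hypothetical:

```python
# Minimal sketch: toggling the Cloud Logging integration for a cluster
# through the GKE API. Project, zone, and cluster ID are placeholders.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()
client.set_logging_service(
    project_id="my-project",      # hypothetical project ID
    zone="us-central1-a",         # hypothetical zone
    cluster_id="my-cluster",      # hypothetical cluster name
    # "logging.googleapis.com/kubernetes" selects Cloud Operations for GKE;
    # "none" disables log collection for the cluster.
    logging_service="logging.googleapis.com/kubernetes",
)
```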

Logging defaults

When logging is enabled, a dedicated agent is automatically deployed and managed. It runs on each GKE node to collect logs, adds helpful metadata about the container, Pod, and cluster, and then sends the logs to Cloud Logging. Both system logs and your workloads' application logs are then delivered to the Logs Router in Cloud Logging.

From there, logs are either ingested into Cloud Logging or excluded. The Logs Router also provides an optional step to export your logs to BigQuery, Pub/Sub, or Cloud Storage.
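As an example of the export step, here is a hedged sketch that creates an export sink with the google-cloud-logging Python client; the sink name, dataset, and filter are hypothetical:

```python
# Minimal sketch: routing matching log entries to BigQuery through a sink.
# The sink name, project, dataset, and filter are placeholders.
from google.cloud import logging

client = logging.Client(project="my-project")
sink = client.sink(
    "gke-container-logs-to-bq",
    filter_='resource.type="k8s_container"',
    destination="bigquery.googleapis.com/projects/my-project/datasets/gke_logs",
)
sink.create()  # the sink's writer identity still needs access to the dataset
```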

Customizing log collection for system logs only

Beginning with GKE version 1.15.7, you can configure Cloud Operations for GKE to capture only system logs and not collect application logs.
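On newer GKE versions, this choice is also exposed in the API's logging configuration. A hedged sketch, assuming the google-cloud-container client's LoggingConfig types; the project, zone, and cluster names are placeholders:

```python
# Minimal sketch: restricting log collection to system logs only.
# Project, zone, and cluster ID are placeholders.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()
update = container_v1.ClusterUpdate(
    desired_logging_config=container_v1.LoggingConfig(
        component_config=container_v1.LoggingComponentConfig(
            # Collect system logs only; omitting WORKLOADS skips
            # application logs.
            enable_components=[
                container_v1.LoggingComponentConfig.Component.SYSTEM_COMPONENTS,
            ]
        )
    )
)
client.update_cluster(
    project_id="my-project",
    zone="us-central1-a",
    cluster_id="my-cluster",
    update=update,
)
```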

Log collection with custom fluentd

GKE's default logging agent provides a managed solution for deploying and managing the agents that send your clusters' logs to Cloud Logging. If you want to alter the default behavior of the fluentd agents, you can run a customized fluentd agent.

Common use cases include:

  • removing sensitive data from your logs

  • collecting additional logs not written to STDOUT or STDERR

Collecting Linux auditd logs for GKE nodes

You can enable verbose operating system audit logs on Google Kubernetes Engine nodes running Container-Optimized OS. Operating system logs on your nodes provide valuable information about the state of your cluster and workloads, such as error messages, login attempts, and binary executions. You can use this information to debug problems or investigate security incidents.

To learn more, go to Enabling Linux auditd logs on GKE nodes.

GKE audit logs

For detailed information about log entries that apply to the Kubernetes Cluster and GKE Cluster Operations resource types, go to Audit logging.
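For instance, here is a hedged sketch that lists recent Admin Activity audit entries for the Kubernetes Cluster resource type, using the google-cloud-logging Python client; the project ID is hypothetical:

```python
# Minimal sketch: listing recent GKE audit log entries. The project ID is
# a placeholder; the filter matches the Admin Activity audit log for the
# Kubernetes Cluster (k8s_cluster) resource type.
from google.cloud import logging

client = logging.Client(project="my-project")
audit_filter = (
    'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"'
    ' AND resource.type="k8s_cluster"'
)
for entry in client.list_entries(filter_=audit_filter, max_results=10):
    print(entry.timestamp, entry.log_name)
```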

Logging access control

There are two aspects of logging access control: application access and user access. Cloud Logging provides IAM roles that you can use to grant appropriate access.

Application access

Applications need permission to write logs, which is granted by assigning the IAM role roles/logging.logWriter to the service account used by the application. When you create a GKE cluster, the roles/logging.logWriter role is enabled by default.
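For example, a workload whose service account holds roles/logging.logWriter can also write entries directly through the API rather than through STDOUT. A minimal sketch with the google-cloud-logging Python client; the project ID, log name, and payload are hypothetical:

```python
# Minimal sketch: writing a log entry directly with the Cloud Logging API.
# The call succeeds only if the caller's service account has the
# roles/logging.logWriter role. Names and payload are placeholders.
from google.cloud import logging

client = logging.Client(project="my-project")
logger = client.logger("my-app-log")
logger.log_struct(
    {"event": "checkout", "order_id": "12345"},
    severity="INFO",
)
```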

User view access

To view logs in your project, you need the roles/logging.viewer role. To access the Data Access logs, you need the roles/logging.privateLogViewer role.

For more information about permissions and roles, go to the Access control guide. You can also review the Best practices for Cloud Audit Logs document, which applies to Cloud Logging in general as well.

User admin access

The IAM roles roles/logging.configWriter and roles/logging.admin provide administrative capabilities. The roles/logging.configWriter role is required to create a logging sink, which is commonly used to route your logs to a specific or centralized project. For example, you might use a logging sink along with a logging filter to direct all of the logs for a namespace to a centralized logging bucket, as in the sketch below.
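A hedged sketch of that example with the google-cloud-logging Python client; the sink name, namespace, destination project, and bucket are hypothetical:

```python
# Minimal sketch: routing one namespace's container logs to a centralized
# Cloud Logging bucket. Requires roles/logging.configWriter; all names
# below are placeholders.
from google.cloud import logging

client = logging.Client(project="my-project")
sink = client.sink(
    "team-a-logs-to-central",
    filter_=(
        'resource.type="k8s_container"'
        ' AND resource.labels.namespace_name="team-a"'
    ),
    destination=(
        "logging.googleapis.com/projects/central-project"
        "/locations/global/buckets/team-a-logs"
    ),
)
sink.create()
```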

To learn more, go to the Access Control guide for Cloud Logging.

Best practices

  • Structured logging: Single-line JSON strings written to standard output or standard error are read into Google Cloud's operations suite as structured log entries; see the first sketch after this list. See Structured logging for more details. You can use Advanced logs filters to filter logs based on their fields.
  • Severities: By default, logs written to standard output are at the INFO level and logs written to standard error are at the ERROR level. Structured logs can include a severity field, which defines the log's severity.
  • Exporting to BigQuery: You can export logs to external services, such as BigQuery or Pub/Sub, for additional analysis. Logs exported to BigQuery retain their format and structure. See Overview of logs exports for further information.
  • Alerting: You can use logs-based metrics to set up alerting policies that fire when Logging records unexpected behavior; the second sketch after this list creates such a metric. For an example, see Creating a simple alerting policy on a counter metric. For detailed information on logs-based metrics, see Overview of logs-based metrics.
  • Error reporting: You can use Error Reporting to collect errors produced in your clusters.
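To illustrate the structured logging and severities bullets, here is a minimal sketch of a container emitting single-line JSON to stdout; the field names other than severity and message are hypothetical:

```python
# Minimal sketch: structured logging from a GKE container. Each single-line
# JSON object written to stdout becomes a structured log entry, and its
# "severity" field overrides the default severity (INFO for stdout).
import json

def log(severity, message, **fields):
    entry = {"severity": severity, "message": message, **fields}
    print(json.dumps(entry), flush=True)

log("ERROR", "payment failed", order_id="12345")  # hypothetical fields
```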
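And for the alerting bullet, a hedged sketch that creates a counter logs-based metric with the google-cloud-logging Python client; the metric name and filter are hypothetical. An alerting policy in Cloud Monitoring can then watch this metric:

```python
# Minimal sketch: a counter logs-based metric over container errors, which
# an alerting policy can then watch. Metric name and filter are placeholders.
from google.cloud import logging

client = logging.Client(project="my-project")
metric = client.metric(
    "container-error-count",
    filter_='resource.type="k8s_container" AND severity>=ERROR',
    description="Container log entries at ERROR severity or above",
)
metric.create()
```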