About GKE logs


This page provides an overview of the logging options available in Google Kubernetes Engine (GKE).

Overview

GKE logs sent to Cloud Logging are stored in a dedicated, persistent datastore. While GKE itself also stores logs, these logs are not stored permanently. For example, GKE container logs are removed when their host Pod is removed, when the disk on which they are stored runs out of space, or when they are replaced by newer logs. System logs are periodically removed to free up space for new logs, and cluster events are removed after one hour.

GKE logging agent

For container and system logs, GKE deploys, by default, a per-node logging agent that reads container logs, adds helpful metadata, and then stores them in Cloud Logging. The GKE logging agent checks for container logs in the following sources:

  • Standard output and standard error logs from containerized processes

  • kubelet and container runtime logs

  • Logs for system components, such as VM startup scripts

For events, GKE uses a Deployment in the kube-system namespace that automatically collects events and sends them to Logging.
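
You can confirm that both agents are present with kubectl. The resource names matched below are typical for recent GKE versions but may differ on yours, so treat them as assumptions:

    # List the per-node logging agent DaemonSet and the event exporter
    # Deployment; exact names vary by GKE version.
    kubectl get daemonsets,deployments -n kube-system | grep -E 'fluentbit|event-exporter'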

What logs are collected

By default, GKE collects several types of logs from your cluster and stores them in Cloud Logging:

  • Audit logs include the Admin Activity log, Data Access log, and the Events log. For detailed information about the Audit Logs for GKE, refer to the Audit Logs for GKE documentation. Audit logs for GKE cannot be disabled.

  • System logs include logs from the following sources:

    • All Pods running in namespaces kube-system, istio-system, knative-serving, gke-system, and config-management-system.

    • Key services that are not containerized, including the docker/containerd runtime, kubelet, kubelet-monitor, node-problem-detector, and kube-container-runtime-monitor.

    • The node's serial port output, if the VM instance metadata serial-port-logging-enable is set to true. As of GKE 1.16.13-gke.400, serial port output for nodes is collected by the Logging agent. To disable serial port output logging, set --metadata serial-port-logging-enable=false during cluster creation (see the example after this list). Serial port output is useful for troubleshooting crashes, failed boots, startup issues, or shutdown issues with GKE nodes. Disabling these logs might make troubleshooting issues more difficult.

  • Application logs include all logs generated by non-system containers running on user nodes.
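
For example, the following sketch disables serial port output logging at cluster creation time; CLUSTER_NAME and COMPUTE_ZONE are placeholders:

    # Create a cluster with node serial port output logging disabled.
    gcloud container clusters create CLUSTER_NAME \
        --zone=COMPUTE_ZONE \
        --metadata=serial-port-logging-enable=false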

Optionally, GKE can collect additional types of logs from certain Kubernetes control plane components and store them in Cloud Logging:

  • API server logs include all logs generated by the Kubernetes API server (kube-apiserver).

  • Scheduler logs include all logs generated by the Kubernetes Scheduler (kube-scheduler).

  • Controller Manager logs include all logs generated by the Kubernetes Controller Manager (kube-controller-manager).

To learn more about each of these control plane components, see GKE cluster architecture.

Collecting your logs

When you create a new GKE cluster, integration with Cloud Logging is enabled by default.

System and application logs are delivered to the Log Router in Cloud Logging. From there, logs can be ingested into Cloud Logging, excluded, or exported to BigQuery, Pub/Sub, or Cloud Storage.

Beginning with GKE version 1.15.7, you can configure a Standard cluster to only capture system logs and not collect application logs. For both Autopilot and Standard clusters, exclusion filters let you reduce the volume of logs sent to Cloud Logging.
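
For example, the following sketch shows both options; the exclusion name and filter are illustrative assumptions:

    # Collect only system logs on an existing Standard cluster.
    gcloud container clusters update CLUSTER_NAME --logging=SYSTEM

    # Add an exclusion filter to the _Default sink to drop low-severity
    # container logs; adjust the filter to match your workloads.
    gcloud logging sinks update _Default \
        --add-exclusion=name=drop-debug-logs,filter='severity<INFO resource.type="k8s_container"'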

Logging throughput

When system logging is enabled, a dedicated Cloud Logging agent is automatically deployed and managed. It runs on all GKE nodes in a cluster, collects logs, adds helpful metadata about the container, Pod, and cluster, and then sends the logs to Cloud Logging using a fluentbit-based agent.

If any GKE nodes require more than the default log throughput and your GKE Standard cluster is using control plane version 1.23.13-gke.1000 or later, you can configure GKE to deploy an alternative configuration of the Logging agent designed to maximize logging throughput.

For more information, see Adjust log throughput.
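
As a minimal sketch, you can switch an existing node pool to the high-throughput variant of the agent; NODE_POOL_NAME, CLUSTER_NAME, and COMPUTE_ZONE are placeholders:

    # Deploy the high-throughput Logging agent configuration on a node pool.
    gcloud container node-pools update NODE_POOL_NAME \
        --cluster=CLUSTER_NAME \
        --zone=COMPUTE_ZONE \
        --logging-variant=MAX_THROUGHPUT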

Log collection with custom fluentd or fluentbit

GKE's default logging agent provides a managed solution to deploy and manage the agents that send your clusters' logs to Cloud Logging. Depending on your GKE control plane version, either fluentd or fluentbit is used to collect logs. Starting with GKE 1.17, logs are collected using a fluentbit-based agent; clusters using versions prior to GKE 1.17 use a fluentd-based agent. If you want to alter the default behavior of these agents, you can run a customized agent.

Common use cases include:

  • removing sensitive data from your logs

  • collecting additional logs not written to STDOUT or STDERR

  • using specific performance-related settings

  • customized log formatting

Collecting Linux auditd logs for GKE nodes

You can enable verbose operating system audit logs on GKE nodes running Container-Optimized OS. Operating system logs on your nodes provide valuable information about the state of your cluster and workloads, such as error messages, login attempts, and binary executions. You can use this information to debug problems or investigate security incidents.

To learn more, see Enabling Linux auditd logs on GKE nodes.
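
As a rough sketch of what that guide involves, the auditd logging configuration is deployed from example manifests; the repository and manifest path here are assumptions, so follow the linked guide for the authoritative steps:

    # Deploy the auditd logging configuration on Container-Optimized OS nodes.
    git clone https://github.com/GoogleCloudPlatform/k8s-node-tools.git
    kubectl apply -f k8s-node-tools/os-audit/cos-auditd-logging.yaml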

GKE Audit Logs

For detailed information about log entries that apply to the Kubernetes Cluster and GKE Cluster Operations resource types, go to Audit logging.

Logging Access Control

There are two aspects of logging access control: application access and user access. Cloud Logging provides Identity and Access Management (IAM) roles that you can use to grant appropriate access.

Application Access

Applications need permission to write logs to Cloud Logging. You grant this permission by assigning the IAM role roles/logging.logWriter to the service account attached to the underlying node pool.
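
For example, a minimal sketch; PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders:

    # Grant the node pool's service account permission to write logs.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
        --role="roles/logging.logWriter"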

User View Access

You need the roles/logging.viewer role to view logs in your project. To view Data Access logs, you need the roles/logging.privateLogViewer IAM role.
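
For example, a minimal sketch; PROJECT_ID and USER_EMAIL are placeholders:

    # Grant a user access to all logs, including Data Access logs.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="user:USER_EMAIL" \
        --role="roles/logging.privateLogViewer"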

For more information about permissions and roles, go to the Access control guide. You can also review Best practices for Cloud Audit Logs, which also apply to Cloud Logging in general.

User Admin Access

The IAM roles roles/logging.configWriter and roles/logging.admin provide administrative capabilities. The roles/logging.configWriter role is required to create a logging sink, which is commonly used to direct logs to a specific or centralized project. For example, you might use a logging sink along with a logging filter to direct all logs for a namespace to a centralized logging bucket.
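
For example, a minimal sketch of such a sink; the sink name, namespace, bucket, and CENTRAL_PROJECT_ID are illustrative:

    # Route logs for one namespace to a centralized log bucket.
    gcloud logging sinks create my-namespace-sink \
        logging.googleapis.com/projects/CENTRAL_PROJECT_ID/locations/global/buckets/central-bucket \
        --log-filter='resource.type="k8s_container" resource.labels.namespace_name="my-namespace"'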

To learn more, go to the Access Control guide for Cloud Logging.

Best practices

  • Structured logging: The logging agent integrated with GKE reads JSON documents that are serialized to single-line strings and written to standard output or standard error, and sends them to Google Cloud Observability as structured log entries (see the example after this list).
    • See Structured logging for more details on working with an integrated logging agent.
    • You can use Advanced logs filters to filter logs based on the JSON document's fields.
    • Logs generated with glog have the common fields parsed, for example severity, pid, source_file, and source_line. However, the message payload itself is unparsed and appears verbatim in the resulting log message in Google Cloud Observability.
  • Severities: By default, logs written to the standard output are on the INFO level and logs written to the standard error are on the ERROR level. Structured logs can include a severity field, which defines the log's severity.
  • Exporting to BigQuery: For additional analysis, you can export logs to external services, such as BigQuery or Pub/Sub. Logs exported to BigQuery retain their format and structure. See Routing and storage overview for further information.
  • Alerting: When Logging logs unexpected behavior, you can use logs-based metrics to set up alerting policies. For an example, see Create an alerting policy on a counter metric. For detailed information on logs-based metrics, see Overview of logs-based metrics.
  • Error reporting: To collect errors from applications running on your clusters, you can use Error Reporting.
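
As an example of structured logging, the following sketch emits a single-line JSON document from a container process; the field names beyond severity and message are illustrative:

    # The agent parses "severity" and "message" into the corresponding
    # LogEntry fields; remaining fields land in the structured payload.
    echo '{"severity":"ERROR","message":"failed to connect to database","component":"checkout"}'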

Control plane logs

You can configure a GKE cluster to send logs emitted by the Kubernetes API server, Scheduler, and Controller Manager to Cloud Logging.

Requirements

To send logs emitted by Kubernetes control plane components to Cloud Logging, your cluster must use GKE control plane version 1.22.0 or later and must have collection of system logs enabled.

Configuring collection of control plane logs

See the instructions to configure logging support for a new cluster or for an existing cluster.
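
For example, a minimal sketch for an existing cluster; CLUSTER_NAME is a placeholder:

    # Enable control plane logs alongside system and workload logs
    # (requires control plane version 1.22.0 or later).
    gcloud container clusters update CLUSTER_NAME \
        --logging=SYSTEM,WORKLOAD,API_SERVER,SCHEDULER,CONTROLLER_MANAGER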

Pricing

GKE control plane logs are exported to Cloud Logging. Cloud Logging pricing applies.

Quota

Control plane logs consume the "Write requests per minute" quota of the Cloud Logging API. Before enabling control plane logs, check your recent peak usage of that quota. If you have many clusters in the same project or are already approaching the quota limit, then you can request a quota-limit increase before enabling control plane logs.

Access controls

If you want to limit access within your organization to Kubernetes control plane logs, you can create a separate log bucket with more limited access controls.

By storing them in a separate log bucket with limited access, control plane logs in the log bucket won't automatically be accessible to anyone with roles/logging.viewer access to the project. Additionally, if you decide to delete certain control plane logs due to privacy or security concerns, storing them in a separate log bucket with limited access makes it possible to delete the logs without impacting logs from other components or services.