GKE cluster architecture

This page describes the architecture of the Google Kubernetes Engine (GKE) clusters that run your containerized workloads. Use this page to learn about the control plane, nodes, and how the various GKE cluster components interact with each other.

This page is for Admins, Architects, and Operators who define IT solutions and system architecture. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Before reading this page, ensure that you're familiar with the Kubernetes cluster architecture.

A GKE cluster consists of a control plane and worker machines called nodes. The control plane and nodes make up the Kubernetes cluster orchestration system. GKE Autopilot manages the entire underlying infrastructure of clusters, including the control plane, nodes, and all system components.

If you use GKE Standard mode, GKE manages the control plane and system components, and you manage the nodes.

The following diagram shows the architecture of a GKE cluster:

This diagram shows the following components:

Control plane: managed by GKE. Runs the Kubernetes API server, workload controllers, Kubernetes scheduler, and cluster state storage.
Nodes: managed by GKE in Autopilot mode, and managed by you in Standard mode. All of your Pods run on nodes.
Other Google Cloud services: available to integrate with GKE.

About the control plane

The control plane runs processes such as the Kubernetes API server, scheduler, and core resource controllers. GKE manages the control plane lifecycle from cluster creation to deletion. This includes upgrades to the Kubernetes version running on the control plane, which GKE performs automatically, or manually at your request if you prefer to upgrade earlier than the automatic schedule.

Control plane and the Kubernetes API

The control plane is the unified endpoint for your cluster. You interact with the control plane through Kubernetes API calls. The control plane runs the Kubernetes API server process (kube-apiserver) to handle API requests. You can make Kubernetes API calls in the following ways:

Direct calls: HTTP/gRPC
Indirect calls: Kubernetes command-line clients such as kubectl, or the Google Cloud console.
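As a minimal sketch of a direct call, the following Python builds (but does not send) an HTTPS request that lists the Pods in a namespace. The endpoint address and token are hypothetical placeholders, not values from this page; the path `/api/v1/namespaces/<namespace>/pods` is the core Kubernetes REST path for Pods.

```python
import urllib.request


def build_list_pods_request(endpoint: str, namespace: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Pods in a namespace.

    The endpoint and token are placeholders that you supply. Nothing is sent here.
    """
    url = f"https://{endpoint}/api/v1/namespaces/{namespace}/pods"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical values for illustration only.
request = build_list_pods_request("203.0.113.10", "default", "EXAMPLE_TOKEN")
print(request.get_method(), request.full_url)
```

Sending such a request also requires trusting the cluster's CA certificate. Clients such as kubectl handle the credentials and the certificate for you, which is why indirect calls are usually more convenient.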

The API server process is the hub for all communication for the cluster. All internal cluster components such as nodes, system processes, and application controllers act as clients of the API server.

Your API requests tell Kubernetes what your desired state is for the objects in your cluster. Kubernetes continuously works to maintain that state. Kubernetes lets you configure objects in the API either imperatively or declaratively.
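The desired-state idea can be illustrated with a toy reconciliation function. This is a simplified sketch, not how Kubernetes controllers are implemented; the function name and the Pod naming scheme are made up for the example.

```python
def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the actions needed to move the observed state to the desired state."""
    actions = []
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few Pods: create the missing ones.
        actions += [f"create pod-{i}" for i in range(len(running_pods), desired_replicas)]
    elif diff < 0:
        # Too many Pods: delete the surplus ones.
        actions += [f"delete {name}" for name in running_pods[desired_replicas:]]
    return actions


print(reconcile(3, ["pod-0"]))                    # ['create pod-1', 'create pod-2']
print(reconcile(1, ["pod-0", "pod-1", "pod-2"]))  # ['delete pod-1', 'delete pod-2']
print(reconcile(2, ["pod-0", "pod-1"]))           # []
```

A declarative request only changes `desired_replicas`; the system works out the create or delete actions on its own.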

To learn more about object management in Kubernetes, see the Kubernetes documentation on imperative and declarative object management.

Control plane and the cluster state database

The open source Kubernetes project uses etcd as the storage database for all cluster data by default. The cluster state is kept in a key-value store that contains information about the state of every Kubernetes API object in your cluster. For example, the cluster state database stores every Secret, ConfigMap, and Deployment.
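To make the key-value model concrete, the following sketch stores a few objects in a Python dictionary. The key layout (`/registry/<resource>/<namespace>/<name>`) is an assumption modeled on the default prefix used by open source Kubernetes with etcd; this page doesn't describe the exact layout that GKE uses, and the object names are hypothetical.

```python
def registry_key(resource: str, namespace: str, name: str, prefix: str = "/registry") -> str:
    """Build a storage key for an API object (assumed layout, for illustration)."""
    return f"{prefix}/{resource}/{namespace}/{name}"


# A dictionary stands in for the cluster state database.
cluster_state = {
    registry_key("secrets", "default", "db-password"): {"kind": "Secret"},
    registry_key("configmaps", "default", "app-config"): {"kind": "ConfigMap"},
    registry_key("deployments", "default", "web"): {"kind": "Deployment", "replicas": 3},
}


def list_keys(store: dict, prefix: str) -> list[str]:
    """Return every key under a prefix, similar to a range read in a key-value store."""
    return sorted(key for key in store if key.startswith(prefix))


print(list_keys(cluster_state, "/registry/deployments/"))
```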

GKE clusters store the cluster state in one of the following key-value stores:

etcd: GKE stores the cluster state in etcd instances that run on every control plane virtual machine (VM).
Spanner: GKE stores the cluster state in Spanner. The Spanner database doesn't run in the cluster control plane.

Regardless of the type of database, every GKE cluster serves the etcd API in the control plane. The Kubernetes API server uses the etcd API to communicate with the backend cluster state database.
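The following sketch shows why a common storage API lets the API server stay the same regardless of the backend. All class names are hypothetical, and both backends are in-memory stand-ins; this isn't the etcd API or the Spanner API.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class StateBackend(Protocol):
    """The one storage interface that the API server sketch depends on."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictBackend:
    """Stand-in for a key-value backend."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class RowBackend:
    """Stand-in for a database backend that appends rows and reads the newest one."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self._rows.append((key, value))

    def get(self, key: str) -> Optional[str]:
        for row_key, row_value in reversed(self._rows):
            if row_key == key:
                return row_value
        return None


class ApiServerSketch:
    """Uses only the StateBackend interface, so any backend can be plugged in."""

    def __init__(self, backend: StateBackend) -> None:
        self._backend = backend

    def save(self, kind: str, name: str, spec: str) -> None:
        self._backend.put(f"/{kind}/{name}", spec)

    def load(self, kind: str, name: str) -> Optional[str]:
        return self._backend.get(f"/{kind}/{name}")


for backend in (DictBackend(), RowBackend()):
    server = ApiServerSketch(backend)
    server.save("configmaps", "app-config", "mode=prod")
    print(type(backend).__name__, server.load("configmaps", "app-config"))
```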

Control plane and node interaction

The control plane manages what runs on all of the cluster's nodes. The control plane schedules workloads and manages the workloads' lifecycle, scaling, and upgrades. The control plane also manages network and storage resources for those workloads. The control plane and nodes communicate with each other using Kubernetes APIs.
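As a rough illustration of scheduling, the following sketch places a Pod on the node with the most free CPU that can still fit the request. This heuristic is an assumption chosen for brevity; the real Kubernetes scheduler filters and scores nodes on many more criteria. The node names and CPU values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_millicores_free: int
    pods: List[str] = field(default_factory=list)


def schedule(pod_name: str, cpu_request: int, nodes: List[Node]) -> Optional[str]:
    """Bind the Pod to the fitting node with the most free CPU, or return None."""
    candidates = [node for node in nodes if node.cpu_millicores_free >= cpu_request]
    if not candidates:
        return None  # The Pod stays pending until capacity is available.
    chosen = max(candidates, key=lambda node: node.cpu_millicores_free)
    chosen.cpu_millicores_free -= cpu_request
    chosen.pods.append(pod_name)
    return chosen.name


nodes = [Node("node-a", 700), Node("node-b", 1500)]
print(schedule("web", 1000, nodes))    # node-b
print(schedule("batch", 800, nodes))   # None
print(schedule("cache", 400, nodes))   # node-a
```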

Control plane interactions with Artifact Registry

When you create or update a cluster, GKE pulls container images for the Kubernetes system software running on the control plane and nodes from Artifact Registry repositories in the pkg.dev or the gcr.io domain. An outage affecting these registries might cause the following actions to fail:

New cluster creation
Cluster version upgrades

Disruptions to workloads might occur even without your intervention, depending on the specific nature and duration of the outage.

If the Artifact Registry repository outage is regional, we might redirect requests to a zone or region that isn't affected by the outage.

To check the status of Google Cloud services, go to the Google Cloud status dashboard.

Best practice:

Deploy across multiple regions to keep your applications available during regional outages.

About the nodes

Nodes are the worker machines that run your containerized applications and other workloads. The individual machines are Compute Engine virtual machines (VMs) that GKE creates. The control plane manages and receives updates on each node's self-reported status.
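The self-reported status can be pictured as a heartbeat that the control plane checks against a grace period. The following is a simplified sketch: the 60-second grace period is an illustrative value, not a GKE setting, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStatus:
    name: str
    last_heartbeat: float  # Seconds on any shared clock.


def classify_nodes(statuses: List[NodeStatus], now: float,
                   grace_period_seconds: float = 60.0) -> Dict[str, str]:
    """Mark a node Ready if it reported recently; otherwise its state is Unknown."""
    return {
        status.name: "Ready" if now - status.last_heartbeat <= grace_period_seconds else "Unknown"
        for status in statuses
    }


statuses = [NodeStatus("node-a", 100.0), NodeStatus("node-b", 20.0)]
print(classify_nodes(statuses, now=110.0))
```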

A node runs the services necessary to support the containers that make up your cluster's workloads. These include the container runtime and the Kubernetes node agent (kubelet), which communicates with the control plane and is responsible for starting and running containers scheduled on the node.

GKE also runs several system containers as per-node agents, called DaemonSets, which provide functionality such as log collection and intra-cluster network connectivity.

Best practice:

Write the logs of containerized applications to stdout so that your platform can collect and handle the application logs.
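A minimal sketch of this practice in Python sends one JSON object per line to stdout by using the standard `logging` module. The field names (`severity`, `message`, `logger`) are illustrative choices; check which fields your logging agent expects.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        })


def build_logger(name: str = "app", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Avoid duplicate lines from the root logger.
    return logger


logger = build_logger()
logger.info("order processed")
```

Because the application only writes to stdout, it needs no log files, rotation, or shipping logic of its own.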

Node management varies based on the cluster mode of operation, as follows:

Node component	Autopilot mode	Standard mode
Lifecycle	Fully managed by GKE, including: automatic upgrades; automatic repairs; health checks; automatic node scaling; node creation and deletion; resizing; labelling for workload separation.	GKE manages the following: automatic upgrades; automatic repairs; health checks. You can manage the following: node automatic scaling configuration; manual version upgrades; configuration changes (such as labels and resizing); node pool creation and deletion.
Visibility	View nodes using `kubectl`. The underlying Compute Engine VMs are not visible or accessible in the gcloud CLI or the Google Cloud console.	View nodes using `kubectl`, the gcloud CLI, and the Google Cloud console. View and access underlying Compute Engine VMs.
Connectivity	No direct connection to the underlying VMs.	Connect to underlying VMs using SSH.
Node operating system (OS)	Managed by GKE. All nodes use Container-Optimized OS with containerd (`cos_containerd`).	Choose an operating system for your nodes.
Machine hardware selection	Request compute classes in Pods based on use case. GKE manages machine configuration, scheduling, quantity, and lifecycle.	Choose and configure Compute Engine machine types when creating node pools. Configure settings for sizing, scaling, quantity, scheduling, and location based on need.
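As a sketch of requesting a compute class in Autopilot mode, the following Python builds a Pod manifest with a node selector. The selector key `cloud.google.com/compute-class` and the `Scale-Out` class name are taken from the GKE compute class documentation as best recalled, so verify them before use. The Pod name, image, and resource requests are hypothetical.

```python
import json


def pod_with_compute_class(name: str, image: str, compute_class: str) -> dict:
    """Return a Pod manifest that asks for a specific compute class."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumed selector key for Autopilot compute classes.
            "nodeSelector": {"cloud.google.com/compute-class": compute_class},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}},
                }
            ],
        },
    }


manifest = pod_with_compute_class("web", "nginx:1.27", "Scale-Out")
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON manifests, so you can save the printed output to a file and apply it. In Standard mode, you would instead choose the machine type when you create the node pool.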