Autopilot overview

Autopilot is a new mode of operation in Google Kubernetes Engine (GKE) that is designed to reduce the operational cost of managing clusters, optimize your clusters for production, and yield higher workload availability. The mode of operation refers to the level of flexibility, responsibility, and control that you have over your cluster. In addition to the benefits of a fully managed control plane and node automations, GKE offers two modes of operation:

  • Autopilot: GKE provisions and manages the cluster's underlying infrastructure, including nodes and node pools, giving you an optimized cluster with a hands-off experience.
  • Standard: You manage the cluster's underlying infrastructure, giving you node configuration flexibility.

With Autopilot, you no longer have to monitor the health of your nodes or calculate the amount of compute capacity that your workloads require. Autopilot supports most Kubernetes APIs and tools, as well as the wider Kubernetes ecosystem. You stay within GKE without having to interact with the Compute Engine APIs, CLIs, or UI because, unlike in Standard mode, the nodes are not accessible through Compute Engine. You pay only for the CPU, memory, and storage that your Pods request while they are running.

Autopilot clusters are pre-configured with an optimized cluster configuration that is ready for production workloads. This streamlined configuration follows GKE best practices and recommendations for cluster and workload setup and security. Some of these built-in settings (detailed in the Autopilot and Standard comparison table) are immutable and other optional settings can be turned on or off.

Autopilot comes with an SLA that covers both the control plane and your Pods. Because the underlying infrastructure is abstracted away, you can focus on the Kubernetes API and your deployments. Autopilot uses the resource requirements that you define in your PodSpec to provision the resources for the deployment, such as CPU, memory, and persistent disks.
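As a minimal sketch of how those resource requirements are declared, the following Deployment sets CPU, memory, and ephemeral storage requests on its container. The name `hello-app`, the image, and the request values are illustrative assumptions, not recommended settings.

```yaml
# Hypothetical Deployment; the name, image, and request values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: example.com/hello-app:1.0  # placeholder image
        resources:
          requests:
            cpu: 500m               # half a vCPU
            memory: 1Gi
            ephemeral-storage: 1Gi
```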

Here are some reasons why you might want to use the Standard mode of operation instead of Autopilot:

  • You require a higher level of control over your cluster configuration.
  • Your clusters must run workloads that do not meet Autopilot constraints.

Scaling

Autopilot automatically scales the cluster's resources based on your Pod specifications, so that you can focus on your Pods. To automatically increase or decrease the number of Pods, you can implement horizontal Pod autoscaling using the standard Kubernetes CPU or memory metrics, or using custom metrics through Cloud Monitoring.
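A horizontal Pod autoscaler based on CPU utilization might look like the following sketch. It assumes a Deployment named `hello-app` already exists and that the cluster serves the `autoscaling/v2` API; the replica bounds and the 60% target are placeholder values.

```yaml
# Hypothetical HorizontalPodAutoscaler; target name and thresholds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-app        # assumed existing Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # percent of the CPU request, averaged across Pods
```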

Allowable resource ranges

Autopilot lets you request CPU, memory, and ephemeral storage resources for your workloads. The allowed ranges depend on whether you want to run your Pods on the default general-purpose compute platform, or on a compute class. For information about the default container resource requests and the allowed resource ranges, refer to Resource requests in Autopilot.

Workload limitations and restrictions in Autopilot

Autopilot supports most workloads that run your applications. In order for GKE to offer management of the nodes and provide you with a more streamlined operational experience, there are a few restrictions and limitations when compared to GKE Standard. Some of these limitations are security best practices, while others allow Autopilot clusters to be safely managed. Workload limitations apply to all Pods, including those launched by Deployments, DaemonSets, ReplicaSets, ReplicationControllers, StatefulSets, Jobs, and CronJobs.

Host options restrictions

The hostPort and hostNetwork settings are not permitted because node management is handled by GKE. Using hostPath volumes in write mode is prohibited, while using hostPath volumes in read-only mode is allowed only for paths with the /var/log/ prefix. Using host namespaces in workloads is prohibited.
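For illustration, a Pod that mounts a hostPath volume under /var/log in read-only mode could be written as follows. The Pod name, container name, image, and mount path are assumptions made for this sketch.

```yaml
# Hypothetical Pod that reads node logs through a read-only hostPath mount.
apiVersion: v1
kind: Pod
metadata:
  name: log-reader
spec:
  containers:
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "ls /host-logs && sleep 3600"]
    volumeMounts:
    - name: varlog
      mountPath: /host-logs
      readOnly: true        # write access to hostPath volumes is prohibited
  volumes:
  - name: varlog
    hostPath:
      path: /var/log        # only /var/log/ path prefixes are allowed
```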

Linux workload limitations

Autopilot supports only the following Linux capabilities for workloads:

"SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER",
"FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE", "SYS_CHROOT", "SETFCAP"

In GKE version 1.21 and later, the "SYS_PTRACE" capability is also supported for workloads.
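Capabilities are requested per container through `securityContext`. The sketch below drops all capabilities and adds back only NET_BIND_SERVICE, which appears in the supported list above; the Pod name and image are placeholders.

```yaml
# Hypothetical Pod that requests a single supported Linux capability.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: example.com/web:1.0   # placeholder image
    securityContext:
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]  # lets the process bind to ports below 1024
```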

Node selectors and node affinity

Zonal affinity topologies are supported. Node affinity and node selectors are limited for use only with the following keys: topology.kubernetes.io/region, topology.kubernetes.io/zone, failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone, cloud.google.com/gke-os-distribution, kubernetes.io/os, and kubernetes.io/arch. Not all values of OS and arch are supported in Autopilot.
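As an example of a permitted key, the following sketch pins a Pod to one zone with a node selector. The zone value, Pod name, and image are placeholders; substitute a zone in your cluster's region.

```yaml
# Hypothetical Pod scheduled into a specific zone by node selector.
apiVersion: v1
kind: Pod
metadata:
  name: zonal-pod
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-central1-a   # placeholder zone
  containers:
  - name: app
    image: example.com/app:1.0                   # placeholder image
```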

You can also use node selectors and node affinity for the following purposes:

No privileged Pods

Privileged mode for containers in workloads is mainly used to make changes to nodes, like changing kubelet or networking settings. With Autopilot clusters, node changes aren't allowed, so these types of Pods are also not allowed. This restriction might impact some admin workloads.

Pod affinity and anti-affinity

Although GKE manages your nodes for you in Autopilot, you retain the ability to schedule your Pods. Autopilot supports Pod affinity, so that you can co-locate Pods for network efficiency. For example, you can use Pod affinity to deploy frontend Pods on nodes with backend Pods. Pod affinity is limited for use only with the following keys: topology.kubernetes.io/region, topology.kubernetes.io/zone, failure-domain.beta.kubernetes.io/region, and failure-domain.beta.kubernetes.io/zone.

Autopilot also supports anti-affinity, so that you can spread Pods across nodes to avoid single points of failure. For example, you can use Pod anti-affinity to prevent frontend Pods from co-locating with backend Pods.
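A Pod affinity rule that uses one of the permitted topology keys might look like this sketch. It places a frontend Pod in the same zone as Pods labeled `app: backend`; the labels, names, and image are assumptions for illustration.

```yaml
# Hypothetical frontend Pod co-located by zone with backend Pods.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: backend                        # assumed label on backend Pods
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: frontend
    image: example.com/frontend:1.0             # placeholder image
```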

Tolerations supported only for workload separation

Tolerations are supported only for workload separation. Taints are automatically added by node auto-provisioning as needed.
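The following fragment is a sketch of what a workload separation request could look like in a Pod spec. It assumes that a toleration is paired with a node selector on the same key and value; the key `group` and value `servers` are hypothetical, so confirm the exact form against the workload separation documentation.

```yaml
# Hypothetical Pod spec fragment for workload separation.
# Assumption: the toleration and node selector use the same key and value.
spec:
  tolerations:
  - key: group
    operator: Equal
    value: servers
    effect: NoSchedule
  nodeSelector:
    group: servers
```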

Security limitations in Autopilot

Container isolation

Autopilot enforces a hardened configuration for your Pods that provides enhanced security isolation and helps limit the impact of container escape vulnerabilities on your cluster:

  • The container runtime default seccomp profile is applied, by default, to all Pods in your cluster.
  • The CAP_NET_RAW container permission is dropped for all containers. The CAP_NET_RAW permission is not typically used and was the subject of multiple container escape vulnerabilities. The lack of CAP_NET_RAW might cause the use of ping to fail inside your container.
  • Workload Identity is enforced and prevents Pod access to the underlying Compute Engine service account and other sensitive node metadata.
  • Services with spec.externalIPs set are blocked to protect against CVE-2020-8554. These services are rarely used.
  • The following StorageTypes are allowed. Other StorageTypes are blocked because they require privileges over the node:

    "configMap", "csi", "downwardAPI", "emptyDir", "gcePersistentDisk", "hostPath",
    "nfs", "persistentVolumeClaim", "projected", "secret"
    

Pod security policies

Autopilot enforces settings that provide enhanced isolation for your containers. Kubernetes PodSecurityPolicy is not supported on Autopilot clusters. In GKE versions older than 1.21, OPA Gatekeeper and Policy Controller are also not supported.

Security boundaries in Autopilot

At the Kubernetes layer, the GKE Autopilot mode provides the Kubernetes API but removes permissions to use some highly privileged Kubernetes primitives, like privileged Pods, with the goal of limiting the ability to access, modify, or directly control the node virtual machine (VM).

These restrictions are put in place for GKE Autopilot mode to limit workloads from having low-level access to the node VM, in order to allow Google Cloud to offer full management of nodes, and a Pod-level SLA.

Our intent is to prevent unintended access to the node virtual machine. We accept submissions to that effect through the Google Vulnerability Reward Program (VRP) and will reward reports at the discretion of the Google VRP reward panel.

By design, privileged users, like cluster administrators, have full control of any GKE cluster. As a security best practice, we recommend that you avoid granting powerful GKE/Kubernetes privileges widely and instead use namespace admin delegation wherever possible as described in our multi-tenancy guidance.

Workloads on Autopilot have the same security as workloads in GKE Standard mode, where single-tenant VMs are provisioned in the user's project for their exclusive use. As in Standard mode, Autopilot workloads within a cluster might run together on the same VM, sharing a kernel that is security-hardened.

Because the shared kernel represents a single security boundary, if you require strong isolation, for example for high-risk or untrusted workloads, GKE recommends that you run those workloads on GKE Standard clusters using GKE Sandbox to provide multi-layer security protection.

Other limitations in Autopilot

Certificate signing requests

You cannot create certificate signing requests within Autopilot.

External monitoring tools

Most external monitoring tools require access that is restricted in Autopilot. Solutions from several Google Cloud partners are available for use on Autopilot, but not all are supported. Custom monitoring tools cannot be installed on Autopilot clusters.

External services

External IP Services are not permitted on Autopilot clusters. To give a Service an external IP, you can use a LoadBalancer type of Service or use an Ingress to add the Service to an external IP shared among several services.
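A LoadBalancer Service is declared with `type: LoadBalancer`, as in the sketch below. The Service name, the `app: hello-app` selector, and the port numbers are placeholders that assume a matching set of Pods listening on port 8080.

```yaml
# Hypothetical LoadBalancer Service; selector and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: hello-app
spec:
  type: LoadBalancer
  selector:
    app: hello-app      # assumed label on the backing Pods
  ports:
  - port: 80            # port exposed by the load balancer
    targetPort: 8080    # port the container listens on
```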

Managed namespaces

The kube-system namespace is managed, meaning that no resources in this namespace can be altered and no new resources can be created in it.

No changes to nodes

You can't make changes to Autopilot nodes. For example, you can't change the underlying machine type, even if your workloads have specific compute requirements.

No conversion

Converting Standard clusters to Autopilot mode and converting Autopilot clusters to Standard mode are not supported.

No direct external inbound connections for private clusters

Autopilot clusters with private nodes do not have external IPs and cannot accept inbound connections directly. If you deploy services on a NodePort, you cannot access those services from outside the VPC, such as from the internet. To expose applications externally in Autopilot clusters, use Services. For more information, see Exposing applications using services.

No Pod bursting

For Standard clusters, Pods can be configured to burst into unused capacity on the node. For Autopilot clusters, because all Pods have limits set equal to their requests, resource bursting is not possible. Ensure that the resource requests in your Pod specification are adequate for your workload and that the workload does not rely on bursting.

No SSH to Nodes

Since you're no longer provisioning or managing the nodes in Autopilot, there's no SSH access to nodes. GKE handles all operational aspects of the nodes, including node health and all Kubernetes components running on the nodes.

You can still connect remotely to your running containers using the Kubernetes exec functionality to execute commands in your containers for debugging, including connecting to an interactive shell, for example with kubectl exec -it deploy/YOUR_DEPLOYMENT -- sh.

Resource limits

In an Autopilot cluster, each Pod is treated as a Guaranteed QoS Class Pod, with limits that are equal to requests. Autopilot automatically sets resource limits equal to requests if you do not have resource limits specified. If you do specify resource limits, your limits will be overridden and set to be equal to the requests.
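To make the override concrete, the sketch below shows a container `resources` block as it might be submitted, followed by the values that would apply under the rule described above. The quantities are placeholders.

```yaml
# As submitted (hypothetical values): limits are higher than requests.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 2Gi
---
# As applied under the rule above: limits are set equal to requests.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 500m
    memory: 1Gi
```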

Serial port logging

Autopilot clusters require serial port logging to be enabled to debug and troubleshoot your nodes. If your Google Cloud organization has an organization policy that enforces the compute.disableSerialPortLogging constraint, new nodes might not provision.

Ask your organization policy administrator to remove this constraint in projects with Autopilot clusters.

Webhooks limitations

In GKE version 1.21 and later, you can create mutating dynamic admission webhooks. However, Autopilot modifies mutating webhook objects to add a namespace selector that excludes resources in managed namespaces (for example, kube-system) from being intercepted. Additionally, webhooks that specify one or more of the following resources (or any of their sub-resources) in their rules are rejected:

- group: ""
  resource: nodes
- group: ""
  resource: persistentvolumes
- group: certificates.k8s.io
  resource: certificatesigningrequests
- group: authentication.k8s.io
  resource: tokenreviews

You cannot use the * token, which represents all values, in the resources or groups field to allow the preceding resources.
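A mutating webhook that stays within these rules names its resources explicitly and avoids the restricted ones. The following is a minimal sketch that intercepts only Pod creation; the webhook name, the backing Service, its namespace, and its path are all hypothetical.

```yaml
# Hypothetical MutatingWebhookConfiguration that targets only Pods.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: example-pod-mutator
webhooks:
- name: pod-mutator.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      namespace: webhooks      # placeholder namespace
      name: pod-mutator        # placeholder Service that serves the webhook
      path: /mutate
    # caBundle omitted here; supply the CA for the webhook's serving certificate.
  rules:
  - apiGroups: [""]            # core API group, named explicitly rather than "*"
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]        # not one of the rejected resources
```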

User impersonation limitation

GKE version 1.22.4-gke.1501 and later support user impersonation for all user-defined users and groups. System users and groups such as the kube-apiserver user and the system:masters group cannot be impersonated.

No Google Cloud Marketplace applications

You can't install apps from Cloud Marketplace in Autopilot clusters.

Troubleshooting

For troubleshooting steps, refer to Troubleshooting Autopilot clusters.

What's next