Move-in ready Kubernetes security with GKE Autopilot

March 7, 2024

Greg Castle

Security Engineer, GKE Security

Creating and managing the security of Kubernetes clusters is a lot like building or renovating a house. Both require trade-offs across many areas as you balance security, usability, and maintainability.

For homeowners, these choices include utility and aesthetic options, such as installing floors, fixtures, benchtops, and tiles. They also include security decisions: what types of doors, locks, lights, cameras, and sensors should you install? How should they be connected, monitored, and maintained? Who do you call when there’s a problem?

Kubernetes clusters are similar: Each cluster is like a house you’re constructing. The initial security decisions you make determine how well you can detect and respond to attacks. Wouldn't it be nicer if all of those decisions were made by experts who stuck around and upgraded your house when the technology improved?

This is where GKE Autopilot comes in. We use Google Cloud’s deep Kubernetes security expertise to configure your clusters to be move-in ready for your production workloads. Autopilot is a great example of Google Cloud’s shared fate operating model, where we work to be proactive partners in helping you to achieve your desired security outcomes on our platform. With Google Cloud’s security tools we give you the means to run an entire city of Kubernetes clusters more securely.

The work we do to configure cluster-level security depends on which mode you choose for the cluster. In Standard mode, Google Cloud handles all of the Kubernetes security configuration, but leaves many node configuration decisions and in-cluster policy configuration up to you. In Autopilot mode, we fully configure nodes, node pools, and in-cluster policy for you according to security best practices, allowing you to focus on workload-specific security.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_yJl1kbz.max-1000x1000.png

Say goodbye to node security responsibilities

By fully managing the nodes in Autopilot mode, Google Cloud handles the complexity of the node security surface, while still letting you use the flexible Kubernetes API to fine-tune the configuration.

This approach can help solve security challenges, including:

Keeping nodes patched, balancing speed of patching with availability
Preventing overprivileged containers that create risk of node breakouts
Preventing unauthorized access and changes to nodes via SSH, privileged volume types, webhooks, and the certificates API
Protecting privileged namespaces such as kube-system from unauthorized access
Allowlisting privileges required for common security tools from our Autopilot partner workloads list

The benefits of this shift in responsibility extend beyond stopping insecure practices. It also makes configuring some important security features much easier. On Autopilot, enabling GKE Sandbox is simple because we manage the nodes. When you set the runtimeClassName field in your Pod spec, Autopilot figures out the correct node configuration to run the sandboxed container for you. Additionally, GKE Security Posture vulnerability scanning is on by default.
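As a rough illustration of the GKE Sandbox opt-in described above, the sketch below builds a Pod manifest that requests the gVisor runtime class and prints it as JSON, which kubectl accepts alongside YAML. The helper name `sandboxed_pod`, the Pod name, and the image are hypothetical placeholders, not part of any Google tooling.

```python
import json


def sandboxed_pod(name: str, image: str) -> dict:
    """Build a minimal Pod manifest that requests the gVisor runtime class.

    Hypothetical helper for illustration; name and image are placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # GKE Sandbox is requested per Pod through this field.
            "runtimeClassName": "gvisor",
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Emit JSON that could be saved to a file and applied with kubectl.
    print(json.dumps(sandboxed_pod("sandboxed-demo", "nginx"), indent=2))
```

The only sandbox-specific line is the `runtimeClassName` entry. The manifest contains no node pool or node configuration, which is the work the paragraph above says Autopilot handles for you.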

By taking on the responsibility for node security, we continue to make usability improvements while also tightening host security over time. We add new defenses and increasingly sophisticated detection as technology improves, without you needing to migrate node settings or configuration.

Built-in cluster policy

In-cluster policy is responsible for making sure Kubernetes API objects like Pods and Services are well-configured, don’t create security risks, and are compliant with your organization's security best practices. In particular, you need to avoid pods that have the ability to “break out” to the node and access privileged agents or interfere with other workloads.

Typically this work is done by installing tooling like Policy Controller or a third-party product like Gatekeeper. However, even managed policy introduces some overhead:

Deciding which tooling and vendor you prefer
Identifying which controls are appropriate for your organization's containers
Creating exemptions for privileged security and monitoring tooling
Creating self-serve exemptions for developers needing to debug containers

Autopilot removes that work by installing policies that meet security best practices but allow the majority of workloads to run without modification. We work with a vetted set of third-party security partners to ensure their tools and monitoring work out of the box without customer effort. We have built in some self-service security exemptions that keep workloads protected by default but still allow developers to debug problems.

Autopilot's built-in policy implements 93% of the Kubernetes Baseline security standard and 60% of the Restricted standard. While most users will find this to be a good balance of security and usability, if you have additional security or compliance requirements, you can address those using an additional policy tool as mentioned above.

The benefits of this policy aren't just more compliance checkboxes. From January 2022 to December 2023, the default Autopilot configuration protected clusters against 62% of the container breakout vulnerabilities reported through Google’s kCTF VRP and kernelCTF.
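To make the idea of in-cluster policy more concrete, here is a minimal sketch of the kind of check such a policy performs. It covers only a small subset of the Baseline Pod Security Standard (host namespaces, hostPath volumes, and privileged containers). It is not Autopilot's actual policy implementation, and the function name `baseline_violations` is hypothetical.

```python
def baseline_violations(pod: dict) -> list[str]:
    """Return human-readable problems for a Pod manifest given as a dict.

    Illustrative subset of Baseline-style checks only; a real policy tool
    covers many more fields.
    """
    spec = pod.get("spec", {})
    problems = []

    # Sharing host namespaces gives a Pod visibility into the node.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(field):
            problems.append(f"spec.{field} must not be true")

    # hostPath volumes expose the node filesystem to the container.
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name')!r} uses hostPath")

    # Privileged containers are a common route to node breakout.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            problems.append(f"container {container.get('name')!r} is privileged")

    return problems
```

A Pod with no flagged fields returns an empty list, while one that sets `hostNetwork: true`, mounts a hostPath volume, and runs a privileged container returns three entries.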

GKE security configuration done right

Autopilot helps prevent insecure configuration at the GKE API layer and simplifies policy concerns by standardizing the configuration. This is the building code that reduces the risk of electrical fires.

Our goal with GKE has always been to build in security by default, and over the years we’ve added always-on security in multiple parts of the product. Some highlights include enabling Shielded Nodes and auto-upgrades by default to protect against node tampering and to keep clusters patched.

As we introduced those changes, we retained the option to disable some of these security features on GKE Standard, in order to maintain backwards compatibility. This adds overhead for cluster admins: to keep clusters secure at scale, you'd need to configure Organization Policy Service or Terraform validation for each feature.

With Autopilot, we started fresh and removed all of those less secure options. Strong security features including Workload Identity, auto-upgrades, and Shielded Nodes are always on and can't be turned off. If you ensure that all new clusters are Autopilot clusters, your security policy management is simplified. Additionally, moving security feature configuration into the workload and out of the node pool API provides a consolidated surface on which to enforce policy.
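The difference between per-feature validation on Standard and a single check on Autopilot can be sketched as follows. The function inspects a cluster description loaded as a dict, for example from the JSON output of `gcloud container clusters describe`. The field names (`autopilot.enabled`, `shieldedNodes.enabled`, `workloadIdentityConfig.workloadPool`, `nodePools[].management.autoUpgrade`) are assumed to mirror the GKE API's Cluster resource and should be verified against your own output. The function name is hypothetical.

```python
def missing_security_features(cluster: dict) -> list[str]:
    """List security features that appear to be disabled on a cluster.

    Assumes the dict follows the shape of the GKE API Cluster resource;
    verify the field names against real output before relying on this.
    """
    # On Autopilot these features are always on, so one check is enough.
    if cluster.get("autopilot", {}).get("enabled"):
        return []

    # On Standard, each feature needs its own check.
    missing = []
    if not cluster.get("shieldedNodes", {}).get("enabled"):
        missing.append("Shielded Nodes")
    if not cluster.get("workloadIdentityConfig", {}).get("workloadPool"):
        missing.append("Workload Identity")
    for pool in cluster.get("nodePools", []):
        if not pool.get("management", {}).get("autoUpgrade"):
            missing.append(f"auto-upgrade on node pool {pool.get('name')!r}")
    return missing
```

Every Standard-mode branch is one more rule to write, test, and keep current, which is the overhead that standardizing on Autopilot removes.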

Try it out today

Autopilot is the default choice for all new GKE clusters, and gives you move-in ready security. It sets up a baseline of in-cluster security policy for you, and allows common, trusted security tools. It simplifies enforcing cluster security practices by moving responsibility for node security to Google, protecting the node and system components, and removing legacy security options. Try it out by creating a cluster today.