With the speed of development in Kubernetes, there are often new security features for you to use. This page guides you through implementing our current guidance for hardening your Google Kubernetes Engine (GKE) cluster.
This guide prioritizes high-value security mitigations that require customer action at cluster creation time. Less critical features, secure-by-default settings, and settings that can be enabled after cluster creation are mentioned later in the document. For a general overview of security topics, read the Security Overview.
Many of these recommendations, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.
Where the recommendations below relate to a CIS GKE Benchmark Recommendation, this is specified.
Upgrade your GKE infrastructure in a timely fashion
CIS GKE Benchmark Recommendation: 6.5.3. Ensure Node Auto-Upgrade is enabled for GKE nodes
Keeping the version of Kubernetes up to date is one of the simplest things you can do to improve your security. Kubernetes frequently introduces new security features and provides security patches.
See the GKE security bulletins for information on security patches.
In Google Kubernetes Engine, the control planes are patched and upgraded for you automatically. Node auto-upgrade also automatically upgrades nodes in your cluster.
If you choose to disable node auto-upgrade, we recommend upgrading monthly on your own schedule. Older clusters should opt in to node auto-upgrade and closely follow the GKE security bulletins for critical patches.
To learn more, see Auto-upgrading nodes.
Restrict network access to the control plane and nodes
CIS GKE Benchmark Recommendations: 6.6.2. Prefer VPC-native clusters, 6.6.3. Ensure Master Authorized Networks is Enabled, 6.6.4. Ensure clusters are created with Private Endpoint Enabled and Public Access Disabled, and 6.6.5. Ensure clusters are created with Private Nodes
You should limit exposure of your cluster control plane and nodes to the internet. These settings can only be set at cluster creation time.
By default the GKE cluster control plane and nodes have internet routable addresses that can be accessed from any IP address.
For the GKE cluster control plane, see Creating a private cluster. There are three different flavors of private clusters that can deliver network level protection:
- Public endpoint access disabled: This is the most secure option as it prevents all internet access to both control planes and nodes. This is a good choice if you have configured your on-premises network to connect to Google Cloud using Cloud Interconnect and Cloud VPN. Those technologies effectively connect your company network to your cloud VPC.
- Public endpoint access enabled, authorized networks enabled (recommended): This option provides restricted access to the control plane from source IP addresses that you define. This is a good choice if you don't have existing VPN infrastructure, or if you have remote users or branch offices that connect over the public internet instead of through the corporate VPN and Cloud Interconnect or Cloud VPN.
- Public endpoint access enabled, authorized networks disabled: This is the default and allows anyone on the internet to make network connections to the control plane.
To disable direct internet access to nodes, specify the gcloud CLI option --enable-private-nodes at cluster creation.
This tells GKE to provision nodes with internal IP addresses, which means the nodes aren't directly reachable over the public internet.
We recommend clusters at least use authorized networks and private nodes. This ensures the control plane is reachable by:
- The allowed CIDRs in authorized networks.
- Nodes within your cluster's VPC.
- Google's internal production jobs that manage your control plane.
That corresponds to the following gcloud flags at cluster creation time:
--enable-ip-alias
--enable-private-nodes
--enable-master-authorized-networks
Group authentication
CIS GKE Benchmark Recommendation: 6.8.3. Consider managing Kubernetes RBAC users with Google Groups for RBAC
You should use groups to manage your users. Using groups allows identities to be controlled using your identity management system and identity administrators. Adjusting the group membership removes the need to update your RBAC configuration whenever anyone is added to or removed from the group.
To manage user permissions using Google Groups, you must enable Google Groups for RBAC on your cluster. This allows you to manage users with the same permissions easily, while allowing your identity administrators to manage users centrally and consistently.
See Google Groups for RBAC for instructions on enabling Google Groups for RBAC.
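As a minimal sketch of what group-based RBAC can look like, the following RoleBinding grants the built-in view ClusterRole to a Google Group within one namespace. It assumes Google Groups for RBAC is already enabled on the cluster and that the group is set up as that feature requires. The binding name, the namespace team-a, and the group address are hypothetical placeholders.

```yaml
# Sketch only: names, namespace, and group address are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-view
  namespace: team-a
subjects:
# Bind to the group rather than to individual users, so membership
# changes are handled by your identity administrators.
- kind: Group
  name: dev-team@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view   # built-in read-only role
  apiGroup: rbac.authorization.k8s.io
```

With a binding like this, adding someone to or removing someone from the group changes their access without any edit to the RBAC configuration.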
Container node choices
The following sections describe secure node configuration choices.
Enable Shielded GKE Nodes
CIS GKE Benchmark Recommendation: 6.5.5. Ensure Shielded GKE Nodes are enabled
Shielded GKE Nodes provide strong, verifiable node identity and integrity to increase the security of GKE nodes and should be enabled on all GKE clusters.
You can enable Shielded GKE Nodes at cluster creation or update. Shielded GKE Nodes should be enabled with secure boot. Secure boot should not be used if you need third-party unsigned kernel modules. For instructions on how to enable Shielded GKE Nodes, and how to enable secure boot with Shielded GKE Nodes, see Using Shielded GKE Nodes.
Choose a hardened node image with the containerd runtime
The Container-Optimized OS with containerd (cos_containerd) image is a variant of the Container-Optimized OS image with containerd as the main container runtime directly integrated with Kubernetes.
containerd is the core runtime component of Docker and has been designed to deliver core container functionality for the Kubernetes Container Runtime Interface (CRI). It is significantly less complex than the full Docker daemon, and therefore has a smaller attack surface.
To use the cos_containerd image in your cluster, see Containerd images.
The cos_containerd image is the preferred image for GKE because it has been custom built, optimized, and hardened specifically for running containers.
Enable Workload Identity
CIS GKE Benchmark Recommendation: 6.2.2. Prefer using dedicated Google Cloud Service Accounts and Workload Identity
Workload Identity is the recommended way to authenticate to Google APIs. It replaces the previous practices of using the node service account or exporting service account keys into secrets as described in Authenticating to Google Cloud with Service Accounts.
Workload Identity also replaces the need to use Metadata Concealment and as such, the two approaches are incompatible. The sensitive metadata protected by Metadata Concealment is also protected by Workload Identity.
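On the Kubernetes side, Workload Identity is typically wired up by annotating a Kubernetes service account with the Google service account it should act as. The following is a sketch under assumptions: the service account name app-ksa and the namespace team-a are hypothetical, GSA_NAME and PROJECT_ID are placeholders, and the cluster is assumed to already have Workload Identity enabled.

```yaml
# Sketch only: app-ksa and team-a are hypothetical names.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa
  namespace: team-a
  annotations:
    # Links this Kubernetes service account to a Google service account.
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```

The annotation alone is not sufficient. The Google service account also needs an IAM policy binding that grants the roles/iam.workloadIdentityUser role to this Kubernetes service account, and Pods must run with serviceAccountName set to the annotated account.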
Harden workload isolation with GKE Sandbox
CIS GKE Benchmark Recommendation: 6.10.4. Consider GKE Sandbox for hardening workload isolation, especially for untrusted workloads
GKE Sandbox provides an extra layer of security to prevent malicious code from affecting the host kernel on your cluster nodes.
You can run containers in a sandboxed environment to mitigate against most container escape attacks, also called local privilege escalation attacks. For past container escape vulnerabilities, refer to the security bulletins. This type of attack lets an attacker gain access to the host VM of the container, and therefore gain access to other containers on the same VM. A sandbox such as GKE Sandbox can help limit the impact of these attacks.
You should consider sandboxing a workload in situations such as:
- The workload runs untrusted code.
- You want to limit the impact if an attacker compromises a container in the workload.
Learn how to use GKE Sandbox in Harden workload isolation with GKE Sandbox.
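For illustration, a Pod opts in to the sandbox through its runtime class. This sketch assumes the cluster has a node pool with GKE Sandbox enabled. The Pod name is hypothetical and IMAGE is a placeholder for your container image.

```yaml
# Sketch only: sandboxed-app is a hypothetical name.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  # Requests the gVisor-based sandbox runtime used by GKE Sandbox.
  runtimeClassName: gvisor
  containers:
  - name: app
    image: IMAGE
```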
Enable security bulletin notifications
When security bulletins are available that are relevant to your cluster, GKE publishes notifications about those events as messages to Pub/Sub topics that you configure. You can receive these notifications on a Pub/Sub subscription, integrate with third-party services, and filter for the notification types you want to receive.
For more information about receiving security bulletins using GKE cluster notifications, see Cluster notifications.
Permissions
Use least privilege Google service accounts
CIS GKE Benchmark Recommendation: 6.2.1. Prefer not running GKE clusters using the Compute Engine default service account
Each GKE node has an Identity and Access Management (IAM) Service Account associated with it. By default, nodes are given the Compute Engine default service account, which you can find by navigating to the IAM section of the Google Cloud console. This account has broad access by default, making it useful to a wide variety of applications, but it has more permissions than are required to run your Kubernetes Engine cluster. You should create and use a minimally privileged service account for your nodes to use instead of the Compute Engine default service account.
With the launch of Workload Identity, we suggest a more limited use case for the node service account. We expect the node service account to be used by system daemons responsible for logging, monitoring and similar tasks. Workloads in Pods should instead be provisioned Google identities with Workload Identity.
GKE requires the service account to have, at minimum, the permissions granted by the roles/container.nodeServiceAccount predefined IAM role.
The following commands create an IAM service account with the minimum permissions required to operate GKE. You can also use the service account for resources in other projects. For instructions, refer to Enabling service account impersonation across projects.
gcloud
Create a service account:
gcloud iam service-accounts create SA_NAME \
  --display-name="DISPLAY_NAME"
Add the roles/container.nodeServiceAccount role to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/container.nodeServiceAccount
Replace the following:
- SA_NAME: the name of the new service account.
- DISPLAY_NAME: the display name for the new service account, which makes the account easier to identify.
- PROJECT_ID: the project ID of the project in which you want to create the new service account.
Config Connector
Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.
To create the service account, download the following resource as service-account.yaml.
Replace the following:
- SA_NAME: the name of the new service account.
- DISPLAY_NAME: the display name for the new service account, which makes the account easier to identify.
Then, run:
kubectl apply -f service-account.yaml
Apply the roles/container.nodeServiceAccount role to the service account. Download the following resource as policy-least-privilege.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
kubectl apply -f policy-least-privilege.yaml
If you use private images in Container Registry, you also need to grant access to those:
gsutil
gsutil iam ch \
serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com:objectViewer \
gs://BUCKET_NAME
The bucket that stores your images has the name BUCKET_NAME of the form:
- artifacts.PROJECT_ID.appspot.com for images pushed to a registry in the host gcr.io, or
- STORAGE_REGION.artifacts.PROJECT_ID.appspot.com
Replace the following:
- PROJECT_ID: your Google Cloud console project ID.
- STORAGE_REGION: the location of the storage bucket:
  - us for registries in the host us.gcr.io
  - eu for registries in the host eu.gcr.io
  - asia for registries in the host asia.gcr.io
Refer to the gsutil iam documentation for more information about the command.
Config Connector
Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.
Apply the storage.objectViewer role to your service account. Download the following resource as policy-object-viewer.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
kubectl apply -f policy-object-viewer.yaml
If you want another human user to be able to create new clusters or node pools with this service account, you must grant them the Service Account User role on this service account:
gcloud
gcloud iam service-accounts add-iam-policy-binding \
  SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --member=user:USER \
  --role=roles/iam.serviceAccountUser
Config Connector
Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.
Apply the iam.serviceAccountUser role to your service account. Download the following resource as policy-service-account-user.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
kubectl apply -f policy-service-account-user.yaml
If your cluster already exists, you can now create a new node pool with this new service account:
gcloud container node-pools create NODE_POOL_NAME \
  --service-account=SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --cluster=CLUSTER_NAME
If you need your GKE cluster to have access to other Google Cloud services, you should create an additional service account and grant your workloads access to the service account using Workload Identity.
Restrict access to cluster API discovery
By default, Kubernetes bootstraps clusters with a permissive set of discovery ClusterRoleBindings which give broad access to information about a cluster's APIs, including those of CustomResourceDefinitions.
Users should be aware that the system:authenticated Group included in the subjects of the system:discovery and system:basic-user ClusterRoleBindings can include any authenticated user (including any user with a Google account), and does not represent a meaningful level of security for clusters on GKE.
Those wishing to harden their cluster's discovery APIs should consider one or more of the following:
- Configure Authorized networks to restrict access to set IP ranges.
- Set up a private cluster to restrict access to a VPC.
If neither of these options is suitable for your GKE use case, you should treat all API discovery information (namely the schema of CustomResources, APIService definitions, and discovery information hosted by extension API servers) as publicly disclosed.
Both of these options allow access to the API server IP address from Cloud Run and Cloud Functions. This access is being removed, so do not rely on these services to communicate with your API server. For more information, refer to the Google Cloud blog post.
Use namespaces and RBAC to restrict access to cluster resources
CIS GKE Benchmark Recommendation: 5.6.1. Create administrative boundaries between resources using namespaces
Give teams least-privilege access to Kubernetes by creating separate namespaces or clusters for each team and environment. Assign cost centers and appropriate labels to each namespace for accountability and chargeback. Only give developers the level of access to their namespace that they need to deploy and manage their application, especially in production. Map out the tasks that your users need to undertake against the cluster and define the permissions that they require to do each task.
For more information about creating namespaces, see the Kubernetes documentation.
IAM and Role-based access control (RBAC) work together, and an entity must have sufficient permissions at either level to work with resources in your cluster.
Assign the appropriate IAM roles for GKE to groups and users to provide permissions at the project level and use RBAC to grant permissions on a cluster and namespace level. To learn more, see Access control.
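The namespace-level side of this can be sketched as a Namespace with a cost-center label, a Role limited to the tasks a developer needs, and a RoleBinding that grants it. Every name here (team-a-prod, cc-1234, app-deployer, developer@example.com) is a hypothetical placeholder, and the rule set is only an example of mapping one task (managing Deployments) to permissions.

```yaml
# Sketch only: all names and the rule set are illustrative assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-prod
  labels:
    cost-center: cc-1234   # label used for accountability and chargeback
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a-prod
rules:
# Only what is needed to deploy and manage the application.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: team-a-prod
subjects:
- kind: User
  name: developer@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io
```

Because the Role and RoleBinding are namespaced, the permissions do not extend to other teams' namespaces.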
You can use IAM and RBAC permissions together with namespaces to restrict user interactions with cluster resources on Google Cloud console. For more information, see Enable access and view cluster resources by namespace.
Restrict traffic among Pods with a network policy
CIS GKE Benchmark Recommendation: 6.6.7. Ensure Network Policy is Enabled and set as appropriate
By default, all Pods in a cluster can communicate with each other. You should control Pod to Pod communication as needed for your workloads.
Restricting network access to services makes it much more difficult for attackers to move laterally within your cluster, and also offers services some protection against accidental or deliberate denial of service. Two recommended ways to control traffic are:
- Use Istio. See Installing Istio on Google Kubernetes Engine if you're interested in load balancing, service authorization, throttling, quota, metrics and more.
- Use Kubernetes network policies. See Creating a cluster network policy. Choose this if you're looking for the basic access control functionality exposed by Kubernetes. To implement common approaches for restricting traffic using network policies, follow the implementation guide from the Anthos Security Blueprints. Also, the Kubernetes documentation has an excellent walkthrough for a simple nginx deployment. Consider using network policy logging to verify that your network policies are working as expected.
Istio and network policy may be used together if there is a need to do so.
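A common starting point with Kubernetes network policies is to deny all ingress traffic in a namespace and then allow only the flows a workload needs. The following sketch assumes network policy enforcement is enabled on the cluster. The namespace, the app labels, and port 8080 are hypothetical.

```yaml
# Sketch only: namespace, labels, and port are illustrative assumptions.
# 1. Deny all ingress to every Pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a-prod
spec:
  podSelector: {}        # empty selector matches all Pods in the namespace
  policyTypes:
  - Ingress
---
# 2. Allow only frontend Pods to reach backend Pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: team-a-prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

Network policies are additive, so the second policy opens one path through the default deny rather than replacing it.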
Secret management
CIS GKE Benchmark Recommendation: 6.3.1. Consider encrypting Kubernetes Secrets using keys managed in Cloud KMS
You should provide an additional layer of protection for sensitive data, such as secrets, stored in etcd. To do this you need to configure a secrets manager that is integrated with GKE clusters. Some solutions will work both in GKE and in Anthos clusters on VMware, and so may be more desirable if you are running workloads across multiple environments. If you choose to use an external secrets manager such as HashiCorp Vault, you'll want to have that set up before you create your cluster.
You have several options for secret management.
- You can use Kubernetes secrets natively in GKE. Optionally, you can encrypt these at the application-layer with a key you manage, using Application-layer secrets encryption.
- You can use a secrets manager such as HashiCorp Vault. When run in a hardened HA mode, this will provide a consistent, production-ready way to manage secrets. You can authenticate to HashiCorp Vault using either a Kubernetes service account or a Google Cloud service account. To learn more about using GKE with Vault, see Running and connecting to HashiCorp Vault on Kubernetes.
GKE VMs are encrypted at the storage layer by default, which includes etcd.
Use admission controllers to enforce policy
CIS GKE Benchmark Recommendation: 6.10.3. Ensure Pod Security Policy is Enabled and set as appropriate
Admission controllers are plugins that govern and enforce how the cluster is used. They must be enabled to use some of the more advanced security features of Kubernetes and are an important part of the defense-in-depth approach to hardening your cluster.
By default, Pods in Kubernetes can operate with capabilities beyond what they require. You should constrain the Pod's capabilities to only those required for that workload.
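As an illustration of what constraining a Pod looks like at the workload level, the following sketch drops all Linux capabilities and disables privilege escalation. The Pod name is hypothetical, IMAGE is a placeholder, and the exact settings a workload can tolerate depend on the application. Admission controls can then be used to require settings like these across the cluster.

```yaml
# Sketch only: restricted-app is a hypothetical name; tune settings per workload.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app
spec:
  securityContext:
    runAsNonRoot: true           # refuse to start containers that run as root
    seccompProfile:
      type: RuntimeDefault       # use the container runtime's default seccomp profile
  containers:
  - name: app
    image: IMAGE
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]            # add back only capabilities the workload requires
```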
Kubernetes supports numerous controls for restricting your Pods to execute with only explicitly granted capabilities. The two most popular controls are Gatekeeper and Pod Security Policies.
Gatekeeper provides a powerful means to enforce and validate security on GKE clusters using declarative policies. To learn how to use Gatekeeper to perform declarative controls on your GKE cluster, see Applying Pod security policies using Gatekeeper.
Pod Security Policy allows you to set smart defaults for your Pods, and enforce controls you want to enable across your fleet. The policies you define should be specific to the needs of your application. The restricted-psp.yaml example policy is a good starting point.
To learn more about Pod Security Policy, see Using PodSecurityPolicies.
If you are using a NetworkPolicy, and you have a Pod that is subject to a PodSecurityPolicy, create an RBAC Role or ClusterRole that has permission to use the PodSecurityPolicy. Then bind the Role or ClusterRole to the Pod's service account. Granting permissions to user accounts is not sufficient in this case. For more information, see Authorizing policies.
Restrict the ability for workloads to self-modify
Certain Kubernetes workloads, especially system workloads, have permission to self-modify. For example, some workloads vertically autoscale themselves. While convenient, this can allow an attacker who has already compromised a node to escalate further in the cluster. For example, an attacker could have a workload on the node change itself to run as a more privileged service account that exists in the same namespace.
Ideally, workloads should not be granted the permission to modify themselves in the first place. When self-modification is necessary, you can limit permissions by applying Gatekeeper or Policy Controller constraints, such as NoUpdateServiceAccount from the open source Gatekeeper library, which provides several useful security policies.
When you deploy policies, it is usually necessary to allow the controllers that manage the cluster lifecycle to bypass the policies. This is necessary so that the controllers can make changes to the cluster, such as applying cluster upgrades. For example, if you deploy the NoUpdateServiceAccount policy on GKE, you must set the following parameters in the Constraint:
parameters:
  allowedGroups:
  - system:masters
  allowedUsers:
  - system:addon-manager
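In context, those parameters sit under the spec of a Constraint. The following is a sketch under assumptions: it presumes the NoUpdateServiceAccount ConstraintTemplate from the Gatekeeper library is already installed, and the metadata name and the match block (namespace and kinds) are illustrative choices rather than required values. Only the parameters come from the guidance above.

```yaml
# Sketch only: metadata.name and spec.match are illustrative assumptions.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: NoUpdateServiceAccount
metadata:
  name: no-update-kube-system-service-account
spec:
  match:
    namespaces: ["kube-system"]
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment", "DaemonSet", "StatefulSet", "ReplicaSet"]
  parameters:
    # Let cluster lifecycle controllers bypass the policy.
    allowedGroups:
    - system:masters
    allowedUsers:
    - system:addon-manager
```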
Monitor your cluster configuration
You should audit your cluster configurations for deviations from your defined settings.
Many of the recommendations covered in this hardening guide, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.
Secure defaults
The following sections describe options that are securely configured by default in new clusters. You should verify that preexisting clusters are configured securely.
Protect node metadata
CIS GKE Benchmark Recommendations: 6.4.1. Ensure legacy Compute Engine instance metadata APIs are Disabled and 6.4.2. Ensure the GKE Metadata Server is Enabled
The v0.1 and v1beta1 Compute Engine metadata server endpoints were deprecated and shut down on September 30, 2020. These endpoints did not enforce metadata query headers.
For the shutdown schedule, refer to v0.1 and v1beta1 metadata server endpoints deprecation.
Some practical attacks against Kubernetes rely on access to the VM's metadata server to extract credentials. These attacks are blocked if you are using Workload Identity or Metadata Concealment.
Leave legacy client authentication methods disabled
CIS GKE Benchmark Recommendations: 6.8.1. Ensure Basic Authentication using static passwords is Disabled and 6.8.2. Ensure authentication using Client Certificates is Disabled
There are several methods of authenticating to the Kubernetes API server. In GKE, the supported methods are service account bearer tokens, OAuth tokens, and x509 client certificates. GKE manages authentication with gcloud for you using the OAuth token method, setting up the Kubernetes configuration, getting an access token, and keeping it up to date.
Prior to GKE's integration with OAuth, a one-time generated x509 certificate or a static password were the only available authentication methods. These methods are no longer recommended and should be disabled. They present a wider attack surface for cluster compromise and have been disabled by default since GKE version 1.12. If you are using legacy authentication methods, we recommend that you turn them off. Authentication with a static password is deprecated and has been removed in GKE versions 1.19 and later.
Existing clusters should move to OAuth. If a long-lived credential is needed by a system external to the cluster, we recommend that you create a Google service account or a Kubernetes service account with the necessary privileges and export the key.
To update an existing cluster and remove the static password, see Disabling authentication with a static password.
Currently, there is no way to remove the pre-issued client certificate from an existing cluster, but it has no permissions if RBAC is enabled and ABAC is disabled.
Leave Cloud Logging enabled
CIS GKE Benchmark Recommendation: 6.7.1. Ensure Stackdriver Kubernetes Logging and Monitoring is Enabled
To reduce operational overhead and to maintain a consolidated view of your logs, implement a logging strategy that is consistent wherever your clusters are deployed. Anthos clusters are integrated with Cloud Logging by default and that should remain configured.
All GKE clusters have Kubernetes audit logging enabled by default, which keeps a chronological record of calls that have been made to the Kubernetes API server. Kubernetes audit log entries are useful for investigating suspicious API requests, for collecting statistics, or for creating monitoring alerts for unwanted API calls.
GKE clusters integrate Kubernetes Audit Logging with Cloud Audit Logs and Cloud Logging. Logs can be routed from Cloud Logging to your own logging systems.
Leave the Kubernetes web UI (Dashboard) disabled
CIS GKE Benchmark Recommendation: 6.10.1. Ensure Kubernetes web UI is Disabled
You should not enable the Kubernetes web UI (Dashboard) when running on GKE.
The Kubernetes web UI (Dashboard) is backed by a highly privileged Kubernetes Service Account. The Google Cloud console provides much of the same functionality, so you don't need these permissions.
To disable the Kubernetes web UI:
gcloud container clusters update CLUSTER_NAME \
  --update-addons=KubernetesDashboard=DISABLED
Leave ABAC disabled
CIS GKE Benchmark Recommendation: 6.8.4. Ensure Legacy Authorization (ABAC) is Disabled
You should disable Attribute-Based Access Control (ABAC), and instead use Role-Based Access Control (RBAC) in GKE.
By default, ABAC is disabled for clusters created using GKE version 1.8 and later. In Kubernetes, RBAC is used to grant permissions to resources at the cluster and namespace level. RBAC allows you to define roles with rules containing a set of permissions. RBAC has significant security advantages over ABAC.
If you're still relying on ABAC, first review the Prerequisites for using RBAC. If you upgraded your cluster from an older version and are using ABAC, you should update your access controls configuration:
gcloud container clusters update CLUSTER_NAME \
  --no-enable-legacy-authorization
To create a new cluster with the above recommendation:
gcloud container clusters create CLUSTER_NAME \
  --no-enable-legacy-authorization
Leave the DenyServiceExternalIPs admission controller enabled
Do not disable the DenyServiceExternalIPs admission controller.
The DenyServiceExternalIPs admission controller blocks Services from using ExternalIPs and mitigates a known security vulnerability.
The DenyServiceExternalIPs admission controller is enabled by default on new clusters created on GKE versions 1.21 and later. For clusters upgrading to GKE versions 1.21 and later, you can enable the admission controller using the following command:
gcloud beta container clusters update CLUSTER_NAME \
--no-enable-service-externalips
What's next
- Learn more about GKE security in the Security Overview.
- Make sure you understand the GKE shared responsibility model.
- Understand how to apply the CIS GKE Benchmark to your cluster.
- Learn more about access control in GKE.
- Read the GKE network overview.
- Read the GKE multi-tenancy overview.