With the speed of development in Kubernetes, there are often new security features for you to use. This page guides you through implementing our current guidance for hardening your Google Kubernetes Engine (GKE) cluster.
This guide prioritizes high-value security mitigations that require customer action at cluster creation time. Less critical features, secure-by-default settings, and those that can be enabled post-creation time are mentioned later in the document. For a general overview of security topics, read the Security Overview.
Many of these recommendations, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.
Upgrade your GKE infrastructure in a timely fashion (default 2019-11-11)
Keeping the version of Kubernetes up to date is one of the simplest things you can do to improve your security. Kubernetes frequently introduces new security features and provides security patches.
See the GKE security bulletins for information on security patches.
In Google Kubernetes Engine, the masters are patched and upgraded for you automatically. Node auto-upgrade also automatically upgrades nodes in your cluster.
If you choose to disable node auto-upgrade, we recommend upgrading monthly on your own schedule. Older clusters should opt in to node auto-upgrade and closely follow the GKE security bulletins for critical patches.
To learn more, see Auto-upgrading nodes.
Restrict network access to the control plane and nodes
You should limit exposure of your cluster control plane and nodes to the internet. These settings can only be set at cluster creation time.
By default the GKE cluster control plane and nodes have internet routable addresses that can be accessed from any IP address.
For the GKE cluster control plane, see Creating a private cluster. There are three different flavors of private clusters that can deliver network level protection:
- Public endpoint access disabled: This is the most secure option as it prevents all internet access to both masters and nodes. This is a good choice if you have configured your on-premises network to connect to Google Cloud using Cloud Interconnect and Cloud VPN. Those technologies effectively connect your company network to your cloud VPC.
- Public endpoint access enabled, master authorized networks enabled (recommended): This option gives the control plane a public IP address, but installs a customer-configurable firewall in front of it that lets you restrict which IP addresses can talk to the control plane. This is a good choice if you don't have existing VPN infrastructure, or if you have road warriors or branch offices that connect over the public internet instead of through the corporate VPN and Cloud Interconnect or Cloud VPN.
- Public endpoint access enabled, master authorized networks disabled: This is the default and allows anyone on the internet to make network connections to the control plane.
To disable direct internet access to nodes, specify the gcloud tool option --enable-private-nodes at cluster creation.
This tells GKE to provision nodes with RFC 1918 private IP addresses, which means the nodes aren't directly reachable over the public internet.
We recommend clusters at least use master authorized networks and private nodes. This ensures the control plane is reachable by:
- The whitelisted CIDRs in master authorized networks.
- Nodes within your cluster's VPC.
- Google's internal production jobs that manage your master.
That corresponds to the following gcloud flags at cluster creation time:
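The recommendation above (private nodes plus master authorized networks) can be sketched as one cluster-creation command. This is a minimal sketch rather than a definitive invocation: the cluster name and both CIDR ranges are placeholder assumptions, and --enable-ip-alias and --master-ipv4-cidr are included on the assumption that your gcloud release requires them alongside --enable-private-nodes. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace them with your own.
CLUSTER_NAME="example-cluster"
MASTER_CIDR="172.16.0.0/28"        # private /28 range reserved for the control plane
AUTHORIZED_CIDR="203.0.113.0/24"   # documentation range standing in for your office network

# Assemble the command as a string so it can be reviewed before it is run.
CREATE_CMD="gcloud container clusters create ${CLUSTER_NAME} \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=${MASTER_CIDR} \
  --enable-master-authorized-networks \
  --master-authorized-networks=${AUTHORIZED_CIDR}"

# Print the command; paste it into a terminal once the values look right.
echo "${CREATE_CMD}"
```

Only the CIDR ranges listed in --master-authorized-networks can reach the public endpoint, so keep that list as narrow as your operations allow.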
Group authentication (Beta)
This setting can only be enabled at cluster creation time.
You should use groups to manage your users. Using groups allows identities to be controlled using your Identity management system and Identity administrators. Adjusting the group membership negates the need to update your RBAC configuration whenever anyone is added or removed from the group.
To manage user permissions using Google Groups, you must enable Google Groups for GKE when creating your cluster. This allows you to manage users with the same permissions easily, while allowing your identity administrators to manage users centrally and consistently.
To enable Google Groups for GKE, create a Google Group to manage user access, and specify the corresponding gcloud flag at cluster creation time.
Container node choices
The following sections describe secure node configuration choices.
Enable shielded GKE nodes (Beta)
Shielded GKE Nodes provide strong, verifiable node identity and integrity to increase the security of GKE nodes and should be enabled on all GKE clusters.
To enable Shielded GKE Nodes, specify the gcloud option --enable-shielded-nodes at cluster creation or update. Shielded GKE Nodes should be enabled with secure boot. Secure boot should not be used if you need third-party unsigned kernel modules. To enable secure boot, specify the gcloud option --shielded-secure-boot at cluster creation.
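As a rough sketch, the two options named above can be combined in one creation command. The cluster name is a placeholder assumption, and depending on your gcloud release the command may need to be run from the beta command group while the feature is in Beta. The script prints the command instead of running it.

```shell
#!/bin/sh
# Placeholder value (assumption): replace with your own cluster name.
CLUSTER_NAME="example-cluster"

# Shielded GKE Nodes with secure boot, using the two options named in the text.
# Leave out --shielded-secure-boot if you load third-party unsigned kernel modules.
SHIELDED_CMD="gcloud container clusters create ${CLUSTER_NAME} \
  --enable-shielded-nodes \
  --shielded-secure-boot"

echo "${SHIELDED_CMD}"
```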
Choose a hardened node image with the containerd runtime (Beta)
The Container-Optimized OS with Containerd (cos_containerd) image is a variant of the Container-Optimized OS image with containerd as the main container runtime directly integrated with Kubernetes.
containerd is the core runtime component of Docker and has been designed to deliver core container functionality for the Kubernetes Container Runtime Interface (CRI). It is significantly less complex than the full Docker daemon, and therefore has a smaller attack surface.
To use the cos_containerd image in your cluster, specify the gcloud option --image-type=cos_containerd at cluster creation or upgrade time.
cos_containerd is the preferred image for GKE as it has been custom built, optimized, and hardened specifically for running containers.
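A minimal sketch of selecting the image at cluster creation follows. The cluster name is a placeholder assumption; the only option taken from the text is --image-type=cos_containerd. The script prints the command for review.

```shell
#!/bin/sh
# Placeholder value (assumption): replace with your own cluster name.
CLUSTER_NAME="example-cluster"

# Create the cluster with the containerd variant of Container-Optimized OS.
IMAGE_CMD="gcloud container clusters create ${CLUSTER_NAME} \
  --image-type=cos_containerd"

echo "${IMAGE_CMD}"
```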
Enable Workload Identity (Beta)
Workload Identity is the recommended way to authenticate to Google APIs. It replaces the previous practices of using the node service account or exporting service account keys into secrets as described in Authenticating to Google Cloud Platform with Service Accounts.
Workload Identity also replaces the need to use Metadata Concealment and as such, the two approaches are incompatible. The sensitive metadata protected by Metadata Concealment is also protected by Workload Identity.
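The general shape of a Workload Identity setup can be sketched in three steps: enable it on the cluster, allow a Kubernetes service account to impersonate a Google service account, and annotate the Kubernetes service account. Every name below is a hypothetical placeholder, and the --workload-pool flag is an assumption about your gcloud release (older releases may use a different flag name), so check the Workload Identity documentation before relying on it. The script only prints the commands.

```shell
#!/bin/sh
# Hypothetical placeholder names (assumptions): replace with your own.
PROJECT_ID="example-project"
CLUSTER_NAME="example-cluster"
GSA_NAME="app-gsa"         # Google service account used by the workload
K8S_NAMESPACE="default"
KSA_NAME="app-ksa"         # Kubernetes service account used by the Pods

GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# 1. Enable Workload Identity when creating the cluster.
ENABLE_CMD="gcloud container clusters create ${CLUSTER_NAME} \
  --workload-pool=${PROJECT_ID}.svc.id.goog"

# 2. Let the Kubernetes service account act as the Google service account.
BIND_CMD="gcloud iam service-accounts add-iam-policy-binding ${GSA_EMAIL} \
  --role=roles/iam.workloadIdentityUser \
  --member=\"serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/${KSA_NAME}]\""

# 3. Annotate the Kubernetes service account with the Google service account.
ANNOTATE_CMD="kubectl annotate serviceaccount ${KSA_NAME} \
  --namespace=${K8S_NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_EMAIL}"

echo "${ENABLE_CMD}"
echo "${BIND_CMD}"
echo "${ANNOTATE_CMD}"
```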
Use least privilege Google service accounts
Each GKE node has an IAM Service Account associated with it. By default, nodes are given the Compute Engine default service account, which you can find by navigating to the IAM section of the Cloud Console. This account has broad access by default, making it useful to a wide variety of applications, but it has more permissions than are required to run your Kubernetes Engine cluster. You should create and use a minimally privileged service account to run your GKE cluster instead of using the Compute Engine default service account.
With the launch of Workload Identity, we suggest a more limited use case for the node service account. We expect the node service account to be used by system daemons responsible for logging, monitoring and similar tasks. Workloads in Pods should instead be provisioned Google identities with Workload Identity.
The following commands create an IAM service account with the minimum permissions required to operate GKE:
gcloud iam service-accounts create [SA_NAME] \
  --display-name=[SA_NAME]

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/logging.logWriter

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/monitoring.metricWriter

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/monitoring.viewer
Alternatively, to create the service account with Config Connector, download the following resource as service-account.yaml. Replace [SA_NAME] with the name you want to use for the service account.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMServiceAccount
metadata:
  name: [SA_NAME]
spec:
  displayName: [SA_NAME]
kubectl apply -f service-account.yaml
Apply the logging.logWriter role to the service account. Download the following resource as policy-logging.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-logging
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/logging.logWriter
  resourceRef:
    kind: Project
    name: [PROJECT_ID]
kubectl apply -f policy-logging.yaml
Apply the monitoring.metricWriter role. Download the following resource as policy-metrics-writer.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-metrics-writer
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/monitoring.metricWriter
  resourceRef:
    kind: Project
    name: [PROJECT_ID]
kubectl apply -f policy-metrics-writer.yaml
Apply the monitoring.viewer role. Download the following resource as policy-monitoring.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-monitoring
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/monitoring.viewer
  resourceRef:
    kind: Project
    name: [PROJECT_ID]
kubectl apply -f policy-monitoring.yaml
If you use private images in Google Container Registry, you also need to grant access to those:
gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/storage.objectViewer
Apply the storage.objectViewer role to your service account. Download the following resource as policy-object-viewer.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-object-viewer
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/storage.objectViewer
  resourceRef:
    kind: Project
    name: [PROJECT_ID]
kubectl apply -f policy-object-viewer.yaml
If you want another human user to be able to create new clusters or node pools with this service account, you must grant them the Service Account User role on this service account:
gcloud iam service-accounts add-iam-policy-binding \
  [SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com \
  --member=user:[USER] \
  --role=roles/iam.serviceAccountUser
Apply the iam.serviceAccountUser role to your service account. Download the following resource as policy-service-account-user.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.
apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-service-account-user
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/iam.serviceAccountUser
  resourceRef:
    kind: Project
    name: [PROJECT_ID]
kubectl apply -f policy-service-account-user.yaml
If your cluster already exists, you can now create a new node pool with this new service account:
gcloud container node-pools create [NODE_POOL] \
  --service-account=[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com \
  --cluster=[CLUSTER_NAME]
If you need your GKE cluster to have access to other Google Cloud services, you should create an additional service account and grant your workloads access to the service account using Workload Identity.
Restrict cluster discovery RBAC permissions
By default, Kubernetes bootstraps clusters with a permissive set of discovery ClusterRoleBindings which give broad access to information about a cluster's APIs, including those of CustomResourceDefinitions.
Users should be aware that the system:authenticated Group included in the subjects of these discovery ClusterRoleBindings can include any authenticated user (including any user with a Google account), and does not represent a meaningful level of security for clusters on GKE.
Those wishing to harden their cluster's discovery APIs should consider one or more of the following:
- Configure Authorized networks to restrict access to set IP ranges.
- Set up a private cluster to restrict access to a VPC.
- Curate the subjects of the default discovery ClusterRoleBindings, such as system:basic-user. For example, rather than the Kubernetes default of allowing access to system:(un)authenticated, consider only allowing access to the system:serviceaccounts Group plus other known Users and Groups.
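The last option can be sketched as a curated replacement for the system:basic-user ClusterRoleBinding. This is an illustration under stated assumptions, not a vetted configuration: the admin group name is hypothetical, and the autoupdate annotation is set to "false" on the assumption that the API server would otherwise restore the default subjects at startup. Try it on a non-production cluster first, because users outside the listed subjects lose access to the discovery information this role grants.

```shell
#!/bin/sh
# Write a curated binding to a file for review before applying it.
# The group cluster-admins@example.com is a hypothetical placeholder.
cat > basic-user-binding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:basic-user
  annotations:
    # Assumption: prevents default subjects from being restored automatically.
    rbac.authorization.kubernetes.io/autoupdate: "false"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:basic-user
subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: cluster-admins@example.com
  apiGroup: rbac.authorization.k8s.io
EOF

# Review the current subjects first, then apply the curated binding:
#   kubectl get clusterrolebinding system:basic-user -o yaml
#   kubectl apply -f basic-user-binding.yaml
echo "Wrote basic-user-binding.yaml"
```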
Use Namespaces and RBAC to restrict access to cluster resources
Give teams least-privilege access to Kubernetes by creating separate namespaces or clusters for each team and environment. Assign cost centers and appropriate labels to each namespace for accountability and chargeback. Only give developers the level of access to their namespace that they need to deploy and manage their application, especially in production. Map out the tasks that your users need to undertake against the cluster and define the permissions that they require to do each task.
Assign the appropriate Cloud IAM roles for GKE to groups and users to provide permissions at the project level and use RBAC to grant permissions on a cluster and namespace level. To learn more, see Access control.
For more information, refer to Preparing a Kubernetes Engine Environment for Production.
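As an illustration of namespace-scoped least privilege, the sketch below writes a Role that can manage Deployments and read Pods in one namespace, plus a RoleBinding for a group. The namespace team-a, the role name, and the group team-a-devs@example.com are hypothetical placeholders; the verbs are an assumption about what a typical deployer needs, so trim them to the tasks you mapped out.

```shell
#!/bin/sh
# Write a namespaced Role and RoleBinding to a file for review.
# All names below are hypothetical placeholders.
cat > team-a-rbac.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: team-a
subjects:
- kind: Group
  name: team-a-devs@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-deployer
EOF

# Create the namespace, then apply the manifest:
#   kubectl create namespace team-a
#   kubectl apply -f team-a-rbac.yaml
echo "Wrote team-a-rbac.yaml"
```

Because both objects are namespaced, members of the group get no access to other teams' namespaces unless a separate binding grants it.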
Restrict traffic among Pods with a Network Policy
By default, all Pods in a cluster can communicate with each other. You should control Pod to Pod communication as needed for your workloads.
Restricting network access to services makes it much more difficult for attackers to move laterally within your cluster, and also offers services some protection against accidental or deliberate denial of service. Two recommended ways to control traffic are:
- Use Istio. Choose this if you're interested in load balancing, service authorization, throttling, quota, metrics and more.
- Use Kubernetes Network Policies. See Setting a Cluster Network Policy. Choose this if you're looking for the basic access control functionality exposed by Kubernetes. The Kubernetes documentation has an excellent walkthrough for a simple nginx deployment.
Istio and Network Policy may be used together if there is a need to do so.
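For the Network Policy option, a common starting point is to deny all ingress in a namespace and then allow only the flows you need. The sketch below assumes network policy enforcement is enabled on the cluster; the namespace, the app labels, and port 8080 are hypothetical placeholders.

```shell
#!/bin/sh
# Write two NetworkPolicies to a file for review.
# Namespace, labels, and port are hypothetical placeholders.
cat > network-policies.yaml <<'EOF'
# Deny all ingress to every Pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow only frontend Pods to reach backend Pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
EOF

# Apply with:
#   kubectl apply -f network-policies.yaml
echo "Wrote network-policies.yaml"
```

Policies are additive, so the second policy opens one path through the default deny without weakening it for other Pods.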
Secret management
You should provide an additional layer of protection for sensitive data, such as secrets, stored in etcd. To do this you need to configure a secrets manager that is integrated with GKE clusters. Some solutions will work both in GKE and in GKE On-Prem, and so may be more desirable if you are running workloads across multiple environments. If you choose to use an external secrets manager such as HashiCorp Vault, you'll want to have that set up before you create your cluster.
You have several options for secret management.
- You can use Kubernetes secrets natively in GKE. Optionally, you can encrypt these at the application-layer with a key you manage, using Application-layer secrets encryption.
- You can use a secrets manager such as HashiCorp Vault. When run in a hardened HA mode, this will provide a consistent, production-ready way to manage secrets. You can authenticate to HashiCorp Vault using either a Kubernetes service account or a Google Cloud service account. To learn more about using GKE with Vault, see Running and connecting to HashiCorp Vault on Kubernetes.
GKE VMs are encrypted at the storage layer by default, which includes etcd.
Use admission controllers to enforce policy
Admission controllers are plugins that govern and enforce how the cluster is used. They must be enabled to use some of the more advanced security features of Kubernetes and are an important part of the defense-in-depth approach to hardening your cluster.
By default, Pods in Kubernetes can operate with capabilities beyond what they require. You should constrain the Pod's capabilities to only those required for that workload.
Kubernetes offers controls for restricting your Pods to execute with only explicitly granted capabilities. Pod Security Policy allows you to set smart defaults for your Pods, and enforce controls you want to enable across your fleet. The policies you define should be specific to the needs of your application. The restricted-psp.yaml example policy is a good starting point.
To learn more about Pod Security Policy, see Using PodSecurityPolicies.
If you are using a NetworkPolicy, and you have a Pod that is subject to a PodSecurityPolicy, create an RBAC Role or ClusterRole that has permission to use the PodSecurityPolicy. Then bind the Role or ClusterRole to the Pod's service account. Granting permissions to user accounts is not sufficient in this case. For more information, see Authorizing policies.
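The binding described above can be sketched as a Role that grants the use verb on one PodSecurityPolicy, bound to the Pod's service account. The policy name restricted, the namespace, and the service account name are hypothetical placeholders, and the sketch assumes a cluster version that still serves the PodSecurityPolicy API.

```shell
#!/bin/sh
# Write a Role and RoleBinding that let a service account use one policy.
# Policy, namespace, and service account names are hypothetical placeholders.
cat > psp-rbac.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-restricted-user
  namespace: team-a
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["restricted"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-restricted-user-binding
  namespace: team-a
subjects:
- kind: ServiceAccount
  name: app-ksa
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-restricted-user
EOF

# Apply with:
#   kubectl apply -f psp-rbac.yaml
echo "Wrote psp-rbac.yaml"
```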
Monitor your cluster configuration
You should audit your cluster configurations for deviations from your defined settings.
Many of the recommendations covered in this hardening guide, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.
The following sections describe options that are securely configured by default in new clusters. You should verify that preexisting clusters are configured securely.
Protect node metadata (default for 1.12+)
Compute Engine's instance metadata server exposes legacy endpoints, such as /v1beta1/, which do not enforce metadata query headers. These APIs have been disabled by default for new 1.12+ clusters. If you have upgraded clusters from older versions, you should disable these legacy APIs.
To learn more, see Protecting Cluster Metadata.
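For clusters upgraded from older versions, one way to sketch the fix is to create a replacement node pool with the legacy endpoints turned off and then migrate workloads to it. The cluster and node pool names are placeholder assumptions, and the disable-legacy-endpoints metadata key should be checked against the Protecting Cluster Metadata page for your version. The script prints the command for review.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own.
CLUSTER_NAME="example-cluster"
NODE_POOL="hardened-pool"

# New node pool whose nodes do not serve the legacy metadata endpoints.
METADATA_CMD="gcloud container node-pools create ${NODE_POOL} \
  --cluster=${CLUSTER_NAME} \
  --metadata disable-legacy-endpoints=true"

echo "${METADATA_CMD}"
```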
Leave legacy client authentication methods disabled (default 1.12+)
There are several methods of authenticating to the Kubernetes API server.
In GKE, the supported methods are service account bearer tokens, OAuth tokens, x509 client certificates, and static passwords. GKE manages authentication with gcloud for you using the OAuth token method, setting up the Kubernetes configuration, getting an access token, and keeping it up to date.
Prior to GKE's integration with Google OAuth, the pre-provisioned x509 certificate and static password were the only available authentication methods. They are no longer recommended and are disabled by default on new clusters running version 1.12 or later.
Existing clusters should move to OAuth. If a long-lived credential is needed by a system external to the cluster we recommend you create a Google service account or a Kubernetes service account with the necessary privileges and export the key.
To update an existing cluster and remove the static password:
gcloud container clusters update [CLUSTER_NAME] \
  --no-enable-basic-auth
Currently, there is no way to remove the pre-issued client certificate from an existing cluster, but it has no permissions if RBAC is enabled and ABAC is disabled.
Leave Stackdriver logging enabled (default)
To reduce operational overhead and to maintain a consolidated view of your logs, implement a logging strategy that is consistent wherever your clusters are deployed. Anthos clusters are integrated with Stackdriver by default and that should remain configured.
All GKE clusters have Kubernetes audit logging enabled by default, which keeps a chronological record of calls that have been made to the Kubernetes API server. Kubernetes audit log entries are useful for investigating suspicious API requests, for collecting statistics, or for creating monitoring alerts for unwanted API calls.
Leave the Kubernetes web UI (Dashboard) disabled (default for 1.10+)
You should not enable the Kubernetes web UI (Dashboard) when running on GKE.
The Kubernetes Web UI (Dashboard) is backed by a highly privileged Kubernetes Service Account. The Cloud Console provides much of the same functionality, so you don't need these permissions.
To disable the Kubernetes Web UI:
gcloud container clusters update [CLUSTER_NAME] \
  --update-addons=KubernetesDashboard=DISABLED
Leave ABAC disabled (default for 1.10+)
You should disable Attribute-Based Access Control (ABAC), and instead use Role-Based Access Control (RBAC) in GKE.
In Kubernetes, RBAC is used to grant permissions to resources at the cluster and namespace level. RBAC allows you to define roles with rules containing a set of permissions. RBAC has significant security advantages and is now stable in Kubernetes, so it’s time to disable ABAC.
If you're still relying on ABAC, first review the Prerequisites for using RBAC. If you upgraded your cluster from an older version and are using ABAC, you should update your access controls configuration:
gcloud container clusters update [CLUSTER_NAME] \
  --no-enable-legacy-authorization
To create a new cluster with the above recommendation:
gcloud container clusters create [CLUSTER_NAME] \
  --no-enable-legacy-authorization