Hardening your cluster's security

With the speed of development in Kubernetes, there are often new security features for you to use. This page guides you through implementing our current guidance for hardening your Google Kubernetes Engine (GKE) cluster.

This guide prioritizes high-value security mitigations that require customer action at cluster creation time. Less critical features, secure-by-default settings, and settings that can be enabled after cluster creation are covered later in the document. For a general overview of security topics, read the Security Overview.

Many of these recommendations, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.

Upgrade your GKE infrastructure in a timely fashion (default since 2019-11-11)

Keeping the version of Kubernetes up to date is one of the simplest things you can do to improve your security. Kubernetes frequently introduces new security features and provides security patches.

See the GKE security bulletins for information on security patches.

In Google Kubernetes Engine, the masters are patched and upgraded for you automatically. Node auto-upgrade also automatically upgrades nodes in your cluster.

If you choose to disable node auto-upgrade, we recommend upgrading monthly on your own schedule. Older clusters should opt in to node auto-upgrade and closely follow the GKE security bulletins for critical patches.

To learn more, see Auto-upgrading nodes.

Restrict network access to the control plane and nodes

You should limit exposure of your cluster control plane and nodes to the internet. These settings can only be set at cluster creation time.

By default, the GKE cluster control plane and nodes have internet-routable addresses that can be accessed from any IP address.

For the GKE cluster control plane, see Creating a private cluster. There are three different flavors of private clusters that can deliver network level protection:

  • Public endpoint access disabled: This is the most secure option as it prevents all internet access to both masters and nodes. This is a good choice if you have configured your on-premises network to connect to Google Cloud using Cloud Interconnect and Cloud VPN. Those technologies effectively connect your company network to your cloud VPC.
  • Public endpoint access enabled, master authorized networks enabled (recommended): This option gives the control plane a public IP address, but installs a customer configurable firewall in front that allows you to restrict which IP addresses can talk to the control plane. This is a good choice if: you don't have existing VPN infrastructure; or have road warriors or branch offices that connect over the public internet instead of the corporate VPN and Cloud Interconnect/Cloud VPN.
  • Public endpoint access enabled, master authorized networks disabled: This is the default and allows anyone on the internet to make network connections to the control plane.

To disable direct internet access to nodes, specify the gcloud tool option --enable-private-nodes at cluster creation.

This tells GKE to provision nodes with RFC 1918 private IP addresses, which means the nodes aren't directly reachable over the public internet.

We recommend that clusters use at least master authorized networks and private nodes. This ensures the control plane is reachable only by:

  • The whitelisted CIDRs in master authorized networks.
  • Nodes within your cluster's VPC.
  • Google's internal production jobs that manage your master.

That corresponds to the following gcloud flags at cluster creation time:

  • --enable-ip-alias
  • --enable-private-nodes
  • --enable-master-authorized-networks
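
Putting these flags together, cluster creation might look like the following sketch. The authorized network 203.0.113.0/24 and the control plane range 172.16.0.0/28 are illustrative placeholders; --master-ipv4-cidr is required when private nodes are enabled.

gcloud container clusters create [CLUSTER_NAME] \
  --enable-ip-alias \
  --enable-private-nodes \
  --enable-master-authorized-networks \
  --master-authorized-networks=203.0.113.0/24 \
  --master-ipv4-cidr=172.16.0.0/28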

Group authentication (Beta)

This setting can only be enabled at cluster creation time.

You should use groups to manage your users. Using groups allows identities to be controlled through your identity management system and its administrators. Adjusting group membership removes the need to update your RBAC configuration whenever anyone is added to or removed from the group.

To manage user permissions using Google Groups, you must enable Google Groups for GKE when creating your cluster. This allows you to manage users with the same permissions easily, while allowing your identity administrators to manage users centrally and consistently.

To enable Google Groups for GKE, create a Google Group, gke-security-groups, to manage user access, and specify the gcloud flag --security-group at cluster creation time.
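
For example, assuming your organization's domain is example.com (a placeholder), cluster creation might look like this; the feature is beta, so the beta command group is used:

gcloud beta container clusters create [CLUSTER_NAME] \
  --security-group="gke-security-groups@example.com"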

Container node choices

The following sections describe secure node configuration choices.

Enable shielded GKE nodes (Beta)

Shielded GKE Nodes provide strong, verifiable node identity and integrity to increase the security of GKE nodes and should be enabled on all GKE clusters.

To enable Shielded GKE Nodes, specify the gcloud option --enable-shielded-nodes at cluster creation or update. Shielded GKE Nodes should be enabled together with secure boot; however, secure boot cannot be used if you need third-party unsigned kernel modules. To enable secure boot, specify the gcloud flag --shielded-secure-boot at cluster creation.
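
As a sketch, both settings can be combined at cluster creation; Shielded GKE Nodes is beta, so the beta command group is used here:

gcloud beta container clusters create [CLUSTER_NAME] \
  --enable-shielded-nodes \
  --shielded-secure-boot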

Choose a hardened node image with the containerd runtime (Beta)

The Container-Optimized OS with Containerd (cos_containerd) image is a variant of the Container-Optimized OS image with containerd as the main container runtime directly integrated with Kubernetes.

containerd is the core runtime component of Docker and has been designed to deliver core container functionality for the Kubernetes Container Runtime Interface (CRI). It is significantly less complex than the full Docker daemon, and therefore has a smaller attack surface.

To use the cos_containerd image in your cluster, specify the gcloud flag --image-type=cos_containerd at cluster creation or upgrade time.

cos_containerd is the preferred image for GKE as it has been custom built, optimized, and hardened specifically for running containers.
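
A minimal example of selecting this image at cluster creation:

gcloud container clusters create [CLUSTER_NAME] \
  --image-type=cos_containerd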

Enable Workload Identity (Beta)

Workload Identity is the recommended way to authenticate to Google APIs. It replaces the previous practices of using the node service account or exporting service account keys into secrets as described in Authenticating to Google Cloud Platform with Service Accounts.

Workload Identity also replaces the need to use Metadata Concealment and as such, the two approaches are incompatible. The sensitive metadata protected by Metadata Concealment is also protected by Workload Identity.
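
As a hedged sketch, at the time of writing Workload Identity (Beta) is enabled at cluster creation by naming an identity namespace after your project; check the Workload Identity documentation for the current flag name:

gcloud beta container clusters create [CLUSTER_NAME] \
  --identity-namespace=[PROJECT_ID].svc.id.goog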

Permissions

Use least privilege Google service accounts

Each GKE node has a Cloud Identity and Access Management (Cloud IAM) Service Account associated with it. By default, nodes are given the Compute Engine default service account, which you can find by navigating to the Cloud IAM section of the Cloud Console. This account has broad access by default, making it useful to a wide variety of applications, but it has more permissions than are required to run your Kubernetes Engine cluster. You should create and use a minimally privileged service account to run your GKE cluster instead of using the Compute Engine default service account.

With the launch of Workload Identity, we suggest a more limited use case for the node service account. We expect the node service account to be used by system daemons responsible for logging, monitoring and similar tasks. Workloads in Pods should instead be provisioned Google identities with Workload Identity.

GKE requires, at a minimum, the service account to have the monitoring.viewer, monitoring.metricWriter, and logging.logWriter roles. Read more about monitoring roles and logging roles.

The following commands create a Cloud IAM service account with the minimum permissions required to operate GKE:

gcloud

gcloud iam service-accounts create [SA_NAME] \
  --display-name=[SA_NAME]

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/logging.logWriter

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/monitoring.metricWriter

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/monitoring.viewer

Config Connector

Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.

  1. To create the service account, download the following resource as service-account.yaml. Replace [SA_NAME] with the name you want to use for the service account.

    apiVersion: iam.cnrm.cloud.google.com/v1alpha1
    kind: IAMServiceAccount
    metadata:
      name: [SA_NAME]
    spec:
      displayName: [SA_NAME]

    Then, run:

    kubectl apply -f service-account.yaml

  2. Apply the logging.logWriter role to the service account. Download the following resource as policy-logging.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.

    apiVersion: iam.cnrm.cloud.google.com/v1alpha1
    kind: IAMPolicyMember
    metadata:
      name: policy-logging
    spec:
      member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
      role: roles/logging.logWriter
      resourceRef:
        kind: Project
        name: [PROJECT_ID]

    Then, run:

    kubectl apply -f policy-logging.yaml

  3. Apply the monitoring.metricWriter role. Download the following resource as policy-metrics-writer.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.

    apiVersion: iam.cnrm.cloud.google.com/v1alpha1
    kind: IAMPolicyMember
    metadata:
      name: policy-metrics-writer
    spec:
      member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
      role: roles/monitoring.metricWriter
      resourceRef:
        kind: Project
        name: [PROJECT_ID]

    Then, run:

    kubectl apply -f policy-metrics-writer.yaml

  4. Apply the monitoring.viewer role. Download the following resource as policy-monitoring.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.

    apiVersion: iam.cnrm.cloud.google.com/v1alpha1
    kind: IAMPolicyMember
    metadata:
      name: policy-monitoring
    spec:
      member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
      role: roles/monitoring.viewer
      resourceRef:
        kind: Project
        name: [PROJECT_ID]

    Then, run:

    kubectl apply -f policy-monitoring.yaml

If you use private images in Google Container Registry, you also need to grant access to those:

gcloud

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member "serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
  --role roles/storage.objectViewer

Config Connector

Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.

Apply the storage.objectViewer role to your service account. Download the following resource as policy-object-viewer.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.

apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-object-viewer
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/storage.objectViewer
  resourceRef:
    kind: Project
    name: [PROJECT_ID]

Then, run:

kubectl apply -f policy-object-viewer.yaml

If you want another human user to be able to create new clusters or node pools with this service account, you must grant them the Service Account User role on this service account:

gcloud

gcloud iam service-accounts add-iam-policy-binding \
  [SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com \
  --member=user:[USER] \
  --role=roles/iam.serviceAccountUser

Config Connector

Note: This step requires Config Connector. Follow the installation instructions to install Config Connector on your cluster.

Apply the iam.serviceAccountUser role to your service account. Download the following resource as policy-service-account-user.yaml. Replace [SA_NAME] and [PROJECT_ID] with your own information.

apiVersion: iam.cnrm.cloud.google.com/v1alpha1
kind: IAMPolicyMember
metadata:
  name: policy-service-account-user
spec:
  member: serviceAccount:[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
  role: roles/iam.serviceAccountUser
  resourceRef:
    kind: Project
    name: [PROJECT_ID]

Then, run:

kubectl apply -f policy-service-account-user.yaml

If your cluster already exists, you can now create a new node pool with this new service account:

gcloud container node-pools create [NODE_POOL] \
  --service-account=[SA_NAME]@[PROJECT_ID].iam.gserviceaccount.com \
  --cluster=[CLUSTER_NAME]

If you need your GKE cluster to have access to other Google Cloud services, you should create an additional service account and grant your workloads access to the service account using Workload Identity.
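
For example, to let Pods that run as a given Kubernetes service account act as that additional Google service account, bind the two with the roles/iam.workloadIdentityUser role. [NAMESPACE] and [KSA_NAME] below are placeholders for your Pod's namespace and Kubernetes service account:

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:[PROJECT_ID].svc.id.goog[[NAMESPACE]/[KSA_NAME]]" \
  [GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com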

Restrict cluster discovery RBAC permissions

By default, Kubernetes bootstraps clusters with a permissive set of discovery ClusterRoleBindings which give broad access to information about a cluster's APIs, including those of CustomResourceDefinitions.

Users should be aware that the system:authenticated Group included in the subjects of the system:discovery and system:basic-user ClusterRoleBindings can include any authenticated user (including any user with a Google account), and does not represent a meaningful level of security for clusters on GKE.

Those wishing to harden their cluster's discovery APIs should consider one or more of the following:

  • Configure Authorized networks to restrict access to set IP ranges.
  • Set up a private cluster to restrict access to a VPC.
  • Curate the subjects of the default system:discovery and system:basic-user ClusterRoleBindings, for example, rather than the Kubernetes default of allowing access to system:(un)authenticated, consider only allowing access to the system:serviceaccounts Group plus other known Users and Groups (see the sketch after this list).
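
As a sketch of the third option, you can replace the subjects of the system:discovery ClusterRoleBinding. The autoupdate annotation prevents the API server from reconciling the binding back to its default, and the single system:serviceaccounts subject is an illustrative minimal choice, not a requirement:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:discovery
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "false"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:discovery
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts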

Use Namespaces and RBAC to restrict access to cluster resources

Give teams least-privilege access to Kubernetes by creating separate namespaces or clusters for each team and environment. Assign cost centers and appropriate labels to each namespace for accountability and chargeback. Only give developers the level of access to their namespace that they need to deploy and manage their application, especially in production. Map out the tasks that your users need to undertake against the cluster and define the permissions that they require to do each task.

Cloud IAM and Role-based access control (RBAC) work together, and an entity must have sufficient permissions at either level to work with resources in your cluster.

Assign the appropriate Cloud IAM roles for GKE to groups and users to provide permissions at the project level and use RBAC to grant permissions on a cluster and namespace level. To learn more, see Access control.
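
A minimal sketch of such a namespace-scoped grant, assuming a hypothetical dev namespace and a dev-team@example.com group; adjust the resources and verbs to the tasks you mapped out:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: app-deployer
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: app-deployer-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-deployer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: dev-team@example.com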

For more information, refer to Preparing a Kubernetes Engine Environment for Production.

Restrict traffic among Pods with a Network Policy

By default, all Pods in a cluster can communicate with each other. You should control Pod to Pod communication as needed for your workloads.

Restricting network access to services makes it much more difficult for attackers to move laterally within your cluster, and also offers services some protection against accidental or deliberate denial of service. Two recommended ways to control traffic are:

  1. Use Istio. See Installing Istio on GKE. Choose this if you're interested in load balancing, service authorization, throttling, quota, metrics, and more.
  2. Use Kubernetes Network Policies. See Setting a Cluster Network Policy. Choose this if you're looking for the basic access control functionality exposed by Kubernetes. The Kubernetes documentation has an excellent walkthrough for a simple nginx deployment.

Istio and Network Policy may be used together if there is a need to do so.
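
If you choose Network Policies, a common starting point is a default-deny ingress policy per namespace, after which you allow only the flows your workloads need. A minimal sketch:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: [NAMESPACE]
spec:
  podSelector: {}
  policyTypes:
  - Ingress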

Secret management

You should provide an additional layer of protection for sensitive data, such as secrets, stored in etcd. To do this you need to configure a secrets manager that is integrated with GKE clusters. Some solutions will work both in GKE and in GKE On-Prem, and so may be more desirable if you are running workloads across multiple environments. If you choose to use an external secrets manager such as HashiCorp Vault, you'll want to have that set up before you create your cluster.

You have several options for secret management.

  • You can use Kubernetes secrets natively in GKE. Optionally, you can encrypt these at the application layer with a key you manage, using Application-layer secrets encryption.
  • You can use a secrets manager such as HashiCorp Vault. When run in a hardened HA mode, this will provide a consistent, production-ready way to manage secrets. You can authenticate to HashiCorp Vault using either a Kubernetes service account or a Google Cloud service account. To learn more about using GKE with Vault, see Running and connecting to HashiCorp Vault on Kubernetes.

GKE VMs are encrypted at the storage layer by default, which includes etcd.
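
As an example of the application-layer secrets encryption option above, you can point a new cluster at a Cloud KMS key. The key path below is a placeholder, and the feature was beta at the time of writing:

gcloud beta container clusters create [CLUSTER_NAME] \
  --database-encryption-key projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]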

Use admission controllers to enforce policy

Admission controllers are plugins that govern and enforce how the cluster is used. They must be enabled to use some of the more advanced security features of Kubernetes and are an important part of the defense-in-depth approach to hardening your cluster.

By default, Pods in Kubernetes can operate with capabilities beyond what they require. You should constrain the Pod's capabilities to only those required for that workload.

Kubernetes offers controls for restricting your Pods to execute with only explicitly granted capabilities. Pod Security Policy allows you to set smart defaults for your Pods, and enforce controls you want to enable across your fleet. The policies you define should be specific to the needs of your application. The restricted-psp.yaml example policy is a good starting point.

To learn more about Pod Security Policy, see Using PodSecurityPolicies.
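
Note that PodSecurityPolicy enforcement must also be turned on for the cluster; at the time of writing this is a beta flag:

gcloud beta container clusters update [CLUSTER_NAME] \
  --enable-pod-security-policy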

If you are using a NetworkPolicy and you have a Pod that is subject to a PodSecurityPolicy, create an RBAC Role or ClusterRole that has permission to use the PodSecurityPolicy. Then bind the Role or ClusterRole to the Pod's service account. Granting permissions to user accounts is not sufficient in this case. For more information, see Authorizing policies.
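
A minimal sketch of such a grant, assuming the restricted-psp policy name from the example above and placeholder namespace and service account names:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: [NAMESPACE]
  name: restricted-psp-user
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["restricted-psp"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: [NAMESPACE]
  name: restricted-psp-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: restricted-psp-user
subjects:
- kind: ServiceAccount
  name: [SERVICE_ACCOUNT_NAME]
  namespace: [NAMESPACE]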

Monitor your cluster configuration

You should audit your cluster configurations for deviations from your defined settings.

Many of the recommendations covered in this hardening guide, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.

Secure defaults

The following sections describe options that are securely configured by default in new clusters. You should verify that preexisting clusters are configured securely.

Protect node metadata (default for 1.12+)

Compute Engine's instance metadata server exposes legacy /0.1/ and /v1beta1/ endpoints, which do not enforce metadata query headers. These APIs have been disabled by default for new 1.12+ clusters. If you have upgraded clusters from older versions, you should disable these legacy APIs manually.
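
For example, to create a node pool with the legacy endpoints disabled:

gcloud container node-pools create [POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --metadata disable-legacy-endpoints=true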

Some practical attacks against Kubernetes rely on access to the VM's metadata server to extract credentials. These attacks are blocked if you are using Workload Identity or Metadata Concealment.

Compute Engine v1beta1 and v0.1 metadata server endpoints are deprecated and scheduled for shutdown. Ensure that you update all requests to use the v1 endpoint.

To learn more, see Protecting Cluster Metadata.

Leave legacy client authentication methods disabled (default 1.12+)

There are several methods of authenticating to the Kubernetes API server.

In GKE, the supported methods are service account bearer tokens, OAuth tokens, x509 client certificates, and static passwords. GKE manages authentication with gcloud for you using the OAuth token method, setting up the Kubernetes configuration, getting an access token, and keeping it up to date.

Prior to GKE's integration with Google OAuth, a pre-provisioned x509 certificate or a static password were the only available authentication methods. These are no longer recommended and have been disabled by default on new clusters since version 1.12.

Existing clusters should move to OAuth. If a long-lived credential is needed by a system external to the cluster, we recommend creating a Google service account or a Kubernetes service account with the necessary privileges and exporting the key.

To update an existing cluster and remove the static password:

gcloud container clusters update [CLUSTER_NAME] \
  --no-enable-basic-auth

Currently, there is no way to remove the pre-issued client certificate from an existing cluster, but it has no permissions if RBAC is enabled and ABAC is disabled.
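
For new clusters, you can also prevent the client certificate from being issued in the first place:

gcloud container clusters create [CLUSTER_NAME] \
  --no-issue-client-certificate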

Leave Stackdriver logging enabled (default)

To reduce operational overhead and to maintain a consolidated view of your logs, implement a logging strategy that is consistent wherever your clusters are deployed. Anthos clusters are integrated with Stackdriver by default, and this integration should remain enabled.

All GKE clusters have Kubernetes audit logging enabled by default, which keeps a chronological record of calls that have been made to the Kubernetes API server. Kubernetes audit log entries are useful for investigating suspicious API requests, for collecting statistics, or for creating monitoring alerts for unwanted API calls.

GKE clusters integrate Kubernetes Audit Logging with Cloud Audit Logs and Stackdriver Logging. Logs can be exported from Stackdriver to your own logging systems if desired.
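
As a quick check, you can inspect which logging service a cluster is configured to use; this sketch assumes the loggingService field name in the cluster resource:

gcloud container clusters describe [CLUSTER_NAME] \
  --format="value(loggingService)"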

Leave the Kubernetes web UI (Dashboard) disabled (default for 1.10+)

You should not enable the Kubernetes web UI (Dashboard) when running on GKE.

The Kubernetes Web UI (Dashboard) is backed by a highly privileged Kubernetes Service Account. The Cloud Console provides much of the same functionality, so you don't need these permissions.

To disable the Kubernetes Web UI:

gcloud container clusters update [CLUSTER_NAME] \
    --update-addons=KubernetesDashboard=DISABLED

Leave ABAC disabled (default for 1.10+)

You should disable Attribute-Based Access Control (ABAC), and instead use Role-Based Access Control (RBAC) in GKE.

In Kubernetes, RBAC is used to grant permissions to resources at the cluster and namespace level. RBAC allows you to define roles with rules containing a set of permissions. RBAC has significant security advantages and is now stable in Kubernetes, so it’s time to disable ABAC.

If you're still relying on ABAC, first review the Prerequisites for using RBAC. If you upgraded your cluster from an older version and are using ABAC, you should update your access controls configuration:

gcloud container clusters update [CLUSTER_NAME] \
    --no-enable-legacy-authorization

To create a new cluster with the above recommendation:

gcloud container clusters create [CLUSTER_NAME] \
    --no-enable-legacy-authorization
