Isolate your workloads in dedicated node pools

Standard

This page shows you how to reduce the risk of privilege escalation attacks in your cluster by configuring Google Kubernetes Engine (GKE) to schedule your workloads on a separate, dedicated node pool, away from privileged GKE-managed workloads. Use this approach only if you can't use GKE Sandbox, which is the recommended approach for node isolation and also provides other hardening benefits for your workloads.

This page is for Security specialists who require a layer of isolation on workloads but can't use GKE Sandbox. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

This page applies to Standard clusters without node auto-provisioning. To separate workloads in Autopilot clusters and in Standard clusters with node auto-provisioning enabled, refer to Configure workload separation in GKE.

Overview

GKE clusters use privileged GKE-managed workloads to enable specific cluster functionality and features, such as metrics gathering. These workloads are given special permissions to run correctly in the cluster.

Workloads that you deploy to your nodes might be compromised by a malicious entity. If these workloads run alongside privileged GKE-managed workloads, an attacker who breaks out of a compromised container can use the credentials of the privileged workload on the node to escalate privileges in your cluster.

Prevent container breakouts

Hardening your applications should be your primary defense. GKE has multiple features that you can use to harden your clusters and Pods. In most cases, we strongly recommend using GKE Sandbox to isolate your workloads. GKE Sandbox is based on the gVisor open source project and implements the Linux kernel API in user space. Each Pod runs with its own dedicated user-space kernel, which sandboxes applications to prevent access to privileged system calls in the host kernel. Workloads running in GKE Sandbox are automatically scheduled on separate nodes, isolated from other workloads.

You should also follow the recommendations in Harden your cluster's security.

Avoid privilege escalation attacks

If you can't use GKE Sandbox, and you want an extra layer of isolation in addition to other hardening measures, you can use node taints and node affinity to schedule your workloads on a dedicated node pool. A node taint tells GKE to avoid scheduling workloads without a corresponding toleration (such as GKE-managed workloads) on those nodes. The node affinity on your own workloads tells GKE to schedule your Pods on the dedicated nodes.

Limitations of node isolation

Attackers can still initiate Denial-of-Service (DoS) attacks from the compromised node.
Compromised nodes can still read many resources, including all Pods and namespaces in the cluster.
Compromised nodes can access Secrets and credentials used by every Pod running on that node.
Using a separate node pool to isolate your workloads can impact your cost efficiency, autoscaling, and resource utilization.
Compromised nodes can still bypass egress network policies.
Some GKE-managed workloads must run on every node in your cluster, and are configured to tolerate all taints.
If you deploy DaemonSets that have elevated permissions and can tolerate any taint, those Pods may be a pathway for privilege escalation from a compromised node.
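To review which DaemonSets in your cluster might tolerate the taint on a dedicated node pool, you can list each DaemonSet together with the tolerations in its Pod template. This is a minimal sketch that relies only on kubectl custom columns and assumes that kubectl is configured for your cluster; it prints the tolerations for manual inspection rather than filtering them.

```shell
# List every DaemonSet with the tolerations from its Pod template.
# A toleration that sets operator to Exists without a key tolerates every taint.
kubectl get daemonsets --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TOLERATIONS:.spec.template.spec.tolerations'
```

GKE-managed DaemonSets that must run on every node are expected to appear with such tolerations, so focus your review on the DaemonSets that you deploy yourself.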

How node isolation works

To implement node isolation for your workloads, you must do the following:

Taint and label a node pool for your workloads.
Update your workloads with the corresponding toleration and node affinity rule.

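This guide assumes that you start with one node pool in your cluster. Using node affinity in addition to node taints isn't mandatory, but we recommend it because it gives you greater control over scheduling: a taint only keeps Pods without a matching toleration off the dedicated nodes, and doesn't stop your tolerating Pods from being scheduled on other nodes. Node affinity requires your Pods to run on the dedicated nodes.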
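To confirm your starting configuration, you can list the node pools in your cluster. CLUSTER_NAME is a placeholder for your cluster's name, and the command assumes that you've set a default location for the gcloud CLI as described in Before you begin.

```shell
# List the node pools in the cluster; at this point you typically see a single pool.
gcloud container node-pools list --cluster=CLUSTER_NAME
```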

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Choose a specific name for the node taint and the node label that you want to use for the dedicated node pools.
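As a sketch of the gcloud CLI setup described in these steps, the following commands set a default location and fetch credentials so that kubectl can reach your cluster. REGION, ZONE, and CLUSTER_NAME are placeholders; run only the location command that matches your cluster type.

```shell
# Set a default region if you use regional clusters (for example, us-central1).
gcloud config set compute/region REGION

# Or set a default zone if you primarily use zonal clusters.
gcloud config set compute/zone ZONE

# Fetch cluster credentials so that later kubectl commands target this cluster.
gcloud container clusters get-credentials CLUSTER_NAME
```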

Taint and label a node pool for your workloads

Best practice: to prevent the kubelet from modifying node labels that you use for workload isolation, prefix your label keys with node-restriction.kubernetes.io/.

Create a new node pool for your workloads and apply a node taint and a node label. When you apply a taint or a label at the node pool level, any new nodes, such as those created by autoscaling, automatically get the specified taints and labels.

You can also add node taints and node labels to existing node pools. If you use the NoExecute effect, GKE evicts any Pods running on those nodes that don't have a toleration for the new taint.
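To taint and label an existing node pool instead of creating a new one, you can use the gcloud container node-pools update command. This sketch runs the taint and label updates as separate commands and uses the same placeholders as the node-pools create command in this section. These flags replace the node pool's existing user-specified taints and labels rather than adding to them, so include any existing values that you want to keep; confirm this behavior in the gcloud reference for your CLI version before you run the commands.

```shell
# Apply the taint to all nodes in an existing node pool.
# With the NoExecute effect, running Pods without a matching toleration are evicted.
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-taints=TAINT_KEY=TAINT_VALUE:TAINT_EFFECT

# Apply the isolation label, using the node-restriction.kubernetes.io/ prefix.
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-labels=node-restriction.kubernetes.io/LABEL_KEY=LABEL_VALUE
```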

For workload isolation, always use the node-restriction.kubernetes.io/ prefix for your node labels and for the corresponding selectors in your Pod manifests. This prefix prevents an attacker from using the node's credential to set or modify the labels that use this prefix. For more information, see Node isolation/restriction in the Kubernetes documentation.

To add a taint and a label to a new node pool, run the following command:

gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-taints=TAINT_KEY=TAINT_VALUE:TAINT_EFFECT \
    --node-labels=node-restriction.kubernetes.io/LABEL_KEY=LABEL_VALUE

Replace the following:

POOL_NAME: the name of the new node pool for your workloads.
CLUSTER_NAME: the name of your GKE cluster.
TAINT_KEY=TAINT_VALUE: a key-value pair associated with a scheduling TAINT_EFFECT. For example, workloadType=untrusted.
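TAINT_EFFECT: one of the following effect values: NoSchedule, PreferNoSchedule, or NoExecute. PreferNoSchedule is only a soft preference and doesn't guarantee isolation. NoSchedule prevents new Pods without a matching toleration from being scheduled on the nodes, and NoExecute also evicts running Pods that don't tolerate the taint, so NoExecute provides a stronger guarantee than NoSchedule.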
node-restriction.kubernetes.io/LABEL_KEY=LABEL_VALUE: key-value pairs for the node labels, which correspond to the selectors that you specify in your workload manifests. The node-restriction.kubernetes.io/ prefix prevents the node credentials from being used to set these key-value pairs on nodes.
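For example, the following sketch creates a dedicated node pool that uses the workloadType=untrusted:NoExecute taint and a matching prefixed label, and then checks the result with kubectl. The node pool name untrusted-pool and the cluster name my-cluster are hypothetical, and the kubectl command assumes that you've fetched credentials for the cluster.

```shell
# Create the dedicated node pool with a NoExecute taint and a prefixed label.
gcloud container node-pools create untrusted-pool \
    --cluster=my-cluster \
    --node-taints=workloadType=untrusted:NoExecute \
    --node-labels=node-restriction.kubernetes.io/workloadType=untrusted

# Confirm that the new nodes carry the label and show their taints.
kubectl get nodes \
    -l node-restriction.kubernetes.io/workloadType=untrusted \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
```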

Add a toleration and a node affinity rule to your workloads

After you taint the dedicated node pool, only workloads that have a toleration matching the taint can schedule on it. Add the toleration to the Pod specification of your workloads so that those Pods can schedule on the tainted node pool.

If you labeled the dedicated node pool, you can also add a node affinity rule so that GKE schedules your workloads only on that node pool.

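The following example adds a toleration for the TAINT_KEY=TAINT_VALUE taint and a node affinity rule for the node-restriction.kubernetes.io/LABEL_KEY=LABEL_VALUE node label. For example, for the workloadType=untrusted:NoExecute taint, set TAINT_KEY to workloadType and TAINT_VALUE to untrusted.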

kind: Deployment
apiVersion: apps/v1
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
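    spec:
      # Lets these Pods tolerate the dedicated node pool's taint. Because no
      # effect is set, the toleration matches this key and value with any effect.
      tolerations:
      - key: TAINT_KEY
        operator: Equal
        value: TAINT_VALUE
      # Requires these Pods to run only on nodes that carry the isolation label.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-restriction.kubernetes.io/LABEL_KEY
                operator: In
                values:
                - "LABEL_VALUE"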
      containers:
      - name: sleep
        image: ubuntu
        command: ["/bin/sleep", "inf"]
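To check that the toleration and node affinity rule work as intended, you can deploy the manifest and compare the nodes that your Pods run on with the nodes in the dedicated pool. This sketch assumes that you saved the manifest as my-app.yaml, a hypothetical file name, and that kubectl is configured for your cluster.

```shell
# Deploy the example workload.
kubectl apply -f my-app.yaml

# Show the node that each my-app Pod is running on (NODE column).
kubectl get pods -l app=my-app -o wide

# List the dedicated nodes; every node shown for my-app should appear here.
kubectl get nodes -l node-restriction.kubernetes.io/LABEL_KEY=LABEL_VALUE
```

If the Pods stay Pending, check that the toleration matches the taint key and value exactly and that at least one node in the dedicated pool has the label.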