Isolate workloads in dedicated node pools
This page shows you how to reduce the risk of privilege escalation attacks in your cluster by configuring GKE on Azure to schedule your workloads on a separate, dedicated node pool away from privileged managed workloads.
Overview
GKE on Azure clusters use privileged workloads that we manage to enable specific cluster functionality and features, such as metrics gathering. These workloads are given special permissions to run correctly in the cluster.
Workloads that you deploy to your nodes might be compromised by a malicious entity. If these workloads run alongside privileged system workloads, an attacker who breaks out of a compromised container can use the credentials of the privileged workload on the node to escalate privileges in your cluster.
Preventing container breakouts
Your primary defense should be your applications. GKE on Azure has multiple features that you can use to harden your clusters and Pods. In most cases, we strongly recommend using Policy Controller and kernel security features to harden your workloads. For more security recommendations, see the Security overview.
Avoiding privilege escalation attacks
If you want an extra layer of isolation in addition to other hardening measures, you can use node taints and node affinity to schedule your workloads on a dedicated node pool.
A node taint tells GKE on Azure to avoid scheduling workloads without a corresponding toleration (such as GKE on Azure-managed workloads) on those nodes. The node affinity on your own workloads tells GKE on Azure to schedule your Pods on the dedicated nodes.
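The following Python sketch models how a taint and a toleration interact, to make the scheduling rule above concrete. It is a simplified illustration of the standard Kubernetes matching rules, not the scheduler's implementation; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only valid with operator "Exists"
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key with "Exists" matches every taint key.
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(tolerations, taints) -> bool:
    """PreferNoSchedule is a soft preference, so only hard taints block a Pod."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


dedicated_pool = [Taint("workloadType", "untrusted", "NoExecute")]
own_pod = [Toleration(key="workloadType", operator="Equal", value="untrusted")]

print(can_schedule(own_pod, dedicated_pool))  # True: the toleration matches
print(can_schedule([], dedicated_pool))       # False: no toleration at all
```

The sketch covers only the taint side. Node affinity is a separate check that steers your Pods toward the labeled nodes.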
Limitations of node isolation
- Attackers can still initiate Denial-of-Service (DoS) attacks from the compromised node.
- Compromised nodes can still read many resources, including all Pods and namespaces in the cluster.
- Compromised nodes can access Secrets and credentials used by every Pod running on that node.
- Using a separate node pool to isolate your workloads can affect your cost efficiency, autoscaling, and resource utilization.
- Compromised nodes can still bypass egress network policies.
- Some GKE on Azure-managed workloads must run on every node in your cluster, and are configured to tolerate all taints.
- If you deploy DaemonSets that have elevated permissions and can tolerate any taint, those Pods might be a pathway for privilege escalation from a compromised node.
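To illustrate the last limitation, the following Python sketch flags workloads whose Pod template tolerates every taint, that is, a toleration with operator Exists and no key or effect. It assumes that you have already loaded your manifests into dictionaries (for example, from JSON); the function names and the sample workloads are hypothetical.

```python
def tolerates_every_taint(toleration):
    """True for a toleration with operator Exists and no key or effect."""
    return (
        toleration.get("operator") == "Exists"
        and not toleration.get("key")
        and not toleration.get("effect")
    )


def find_tolerate_all(workloads):
    """Return the names of workloads whose Pod template tolerates any taint."""
    flagged = []
    for workload in workloads:
        pod_spec = workload["spec"]["template"]["spec"]
        tolerations = pod_spec.get("tolerations", [])
        if any(tolerates_every_taint(t) for t in tolerations):
            flagged.append(workload["metadata"]["name"])
    return flagged


# Hypothetical sample manifests, reduced to the fields that matter here.
sample = [
    {
        "kind": "DaemonSet",
        "metadata": {"name": "log-agent"},
        "spec": {"template": {"spec": {"tolerations": [{"operator": "Exists"}]}}},
    },
    {
        "kind": "Deployment",
        "metadata": {"name": "my-app"},
        "spec": {"template": {"spec": {"tolerations": [
            {"key": "workloadType", "operator": "Equal", "value": "untrusted"}
        ]}}},
    },
]

print(find_tolerate_all(sample))  # ['log-agent']
```

A flagged workload is not necessarily a problem, but it is worth checking whether it also runs with elevated permissions.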
How node isolation works
To implement node isolation for your workloads, you must do the following:
- Taint and label a node pool for your workloads.
- Update your workloads with the corresponding toleration and node affinity rule.
This guide assumes that you start with one node pool in your cluster. Using node affinity in addition to node taints isn't mandatory, but we recommend it because you benefit from greater control over scheduling.
Before you begin
To perform the steps on this page, first complete the following:
- Create a cluster.
- Create a node pool.
Choose a name for the node taint and the node label that you want to use for the dedicated node pools. For this example, we use workloadType=untrusted.
Taint and label a node pool for your workloads
Create a new node pool for your workloads and apply a node taint and a node label. When you apply a taint or a label at the node pool level, any new nodes, such as those created by autoscaling, will automatically get the specified taints and labels.
You can also add node taints and node labels to existing node pools. If you use the NoExecute effect, GKE on Azure evicts any Pods running on those nodes that don't have a toleration for the new taint.
To add a taint and a label to a new node pool, run the following command:
gcloud container azure node-pools create POOL_NAME \
--cluster CLUSTER_NAME \
--node-taints TAINT_KEY=TAINT_VALUE:TAINT_EFFECT \
--node-labels LABEL_KEY=LABEL_VALUE
Replace the following:
- POOL_NAME: the name of the new node pool for your workloads.
- CLUSTER_NAME: the name of your GKE on Azure cluster.
- TAINT_KEY=TAINT_VALUE: a key-value pair associated with a scheduling TAINT_EFFECT. For example, workloadType=untrusted.
- TAINT_EFFECT: one of the following effect values: NoSchedule, PreferNoSchedule, or NoExecute. NoExecute provides a better eviction guarantee than NoSchedule.
- LABEL_KEY=LABEL_VALUE: key-value pairs for the node labels, which correspond to the selectors that you specify in your workload manifests.
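Before you run the command, you might want to check that the taint string has the KEY=VALUE:EFFECT shape shown above. The following Python sketch does that check. The parse_taint helper is hypothetical and is not part of the gcloud CLI, and it is stricter than Kubernetes itself because it requires a value to be present.

```python
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_taint(spec):
    """Split a KEY=VALUE:EFFECT string and validate the effect."""
    key_value, colon, effect = spec.rpartition(":")
    if not colon:
        raise ValueError(f"missing ':EFFECT' in {spec!r}")
    key, equals, value = key_value.partition("=")
    if not key or not equals:
        raise ValueError(f"expected KEY=VALUE before ':' in {spec!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown effect {effect!r}")
    return key, value, effect


print(parse_taint("workloadType=untrusted:NoExecute"))
# ('workloadType', 'untrusted', 'NoExecute')
```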
Add a toleration and a node affinity rule to your workloads
After you taint the dedicated node pool, no workloads can schedule on it unless they have a toleration corresponding to the taint you added. Add the toleration to the specification for your workloads to let those Pods schedule on your tainted node pool.
If you labeled the dedicated node pool, you can also add a node affinity rule to tell GKE on Azure to schedule your workloads only on that node pool.
The following example adds a toleration for the workloadType=untrusted:NoExecute taint and a node affinity rule for the workloadType=untrusted node label.
kind: Deployment
apiVersion: apps/v1
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      tolerations:
      - key: TAINT_KEY
        operator: Equal
        value: TAINT_VALUE
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: LABEL_KEY
                operator: In
                values:
                - "LABEL_VALUE"
      containers:
      - name: sleep
        image: ubuntu
        command: ["/bin/sleep", "inf"]
Replace the following:
- TAINT_KEY: the taint key that you applied to your dedicated node pool.
- TAINT_VALUE: the taint value that you applied to your dedicated node pool.
- LABEL_KEY: the node label key that you applied to your dedicated node pool.
- LABEL_VALUE: the node label value that you applied to your dedicated node pool.
When you update your Deployment with kubectl apply, GKE on Azure recreates the affected Pods. The node affinity rule forces the Pods onto the dedicated node pool that you created. The toleration allows only those Pods to be placed on the nodes.
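If you manage many workloads, you might prefer to generate the toleration and node affinity fields instead of editing each manifest by hand. The following Python sketch builds the same two fields as the example manifest. The isolation_fields helper is hypothetical, and the output is meant to be merged into spec.template.spec of your own workload.

```python
import json


def isolation_fields(taint_key, taint_value, label_key, label_value):
    """Build the tolerations and affinity fields for a Pod spec."""
    return {
        "tolerations": [
            # No effect is set, so this tolerates the taint for any effect.
            {"key": taint_key, "operator": "Equal", "value": taint_value}
        ],
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": label_key,
                                    "operator": "In",
                                    "values": [label_value],
                                }
                            ]
                        }
                    ]
                }
            }
        },
    }


fields = isolation_fields("workloadType", "untrusted", "workloadType", "untrusted")
print(json.dumps(fields, indent=2))
```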
Verify that the separation works
To verify that the scheduling works correctly, run the following command and check whether your workloads are on the dedicated node pool:
kubectl get pods -o=wide
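To automate this check, you can compare each Pod's node against the nodes that carry your label. The following Python sketch assumes that you saved the output of kubectl get pods -o json and kubectl get nodes -l workloadType=untrusted -o json and parsed each into a dictionary. The pods_off_pool function and the sample names are hypothetical.

```python
def pods_off_pool(pods, labeled_nodes):
    """Return names of Pods that are not running on one of the labeled nodes.

    Pods that are not scheduled yet have no nodeName and are also reported.
    """
    allowed = {node["metadata"]["name"] for node in labeled_nodes.get("items", [])}
    return [
        pod["metadata"]["name"]
        for pod in pods.get("items", [])
        if pod["spec"].get("nodeName") not in allowed
    ]


# Hypothetical sample data, reduced to the fields that the check reads.
nodes = {"items": [{"metadata": {"name": "node-a"}}]}
pods = {
    "items": [
        {"metadata": {"name": "my-app-1"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "my-app-2"}, "spec": {"nodeName": "node-b"}},
    ]
}

print(pods_off_pool(pods, nodes))  # ['my-app-2']
```

An empty list means that every Pod in the output runs on the dedicated node pool.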
Recommendations and best practices
After setting up node isolation, we recommend that you do the following:
- Restrict specific node pools to GKE on Azure-managed workloads only by adding the components.gke.io/gke-managed-components taint. Adding this taint prevents your own Pods from scheduling on those nodes, improving the isolation.
- When creating new node pools, prevent most GKE on Azure-managed workloads from running on those nodes by adding your own taint to those node pools.
- Whenever you deploy new workloads to your cluster, such as when installing third-party tooling, audit the permissions that the Pods require. When possible, avoid deploying workloads that use elevated permissions to shared nodes.