Running untrusted workloads with GKE Sandbox

This page describes how to use GKE Sandbox to protect the host kernel on your nodes when containers in the Pod execute unknown or untrusted code. For example, multi-tenant clusters, such as those run by software-as-a-service (SaaS) providers, often execute unknown code submitted by their users.

GKE Sandbox uses gVisor, an open source project. This topic broadly discusses gVisor, but you can learn more details by reading the official gVisor documentation.

Overview

GKE Sandbox provides an extra layer of security to prevent untrusted code from affecting the host kernel on your cluster nodes. Before discussing how GKE Sandbox works, it's useful to understand the nature of the potential risks it helps mitigate.

A container runtime such as Docker or containerd provides some degree of isolation between the container's processes and the kernel running on the node. However, the container runtime often runs as a privileged user on the node and has access to most system calls into the host kernel.

Potential threats

Multi-tenant clusters and clusters whose containers run untrusted workloads are more exposed to security vulnerabilities than other clusters. Examples include SaaS providers, web-hosting providers, or other organizations that allow their users to upload and run code. A flaw in the container runtime or in the host kernel could allow a process running within a container to "escape" the container and affect the node's kernel, potentially bringing down the node.

The potential also exists for a malicious tenant to gain access to and exfiltrate another tenant's data in memory or on disk, by exploiting such a defect.

Finally, an untrusted workload could potentially access other Google Cloud Platform services or cluster metadata.

How GKE Sandbox mitigates these threats

gVisor is a userspace re-implementation of the Linux kernel API that does not need elevated privileges. In conjunction with a container runtime such as containerd, the userspace kernel re-implements the majority of system calls and services them on behalf of the host kernel. Direct access to the host kernel is limited. See the gVisor architecture guide for detailed information about how this works. From the container's point of view, gVisor is nearly transparent, and does not require any changes to the containerized application.

When you enable GKE Sandbox on a node pool, a sandbox is created for each Pod running on a node in that node pool. In addition, nodes running sandboxed Pods are prevented from accessing other GCP services or cluster metadata.

Each sandbox uses its own userspace kernel. With this in mind, you can make decisions about how to group your containers into Pods, based on the level of isolation you require and the characteristics of your applications.

GKE Sandbox is an especially good fit for the following types of applications. See Limitations for more information to help you decide which applications to sandbox.

  • Untrusted or third-party applications written in languages such as Rust, Java, Python, PHP, Node.js, or Go
  • Web server front-ends, caches, or proxies
  • Applications processing external media or data using CPUs
  • Machine-learning workloads using CPUs
  • CPU-intensive or memory-intensive applications

Additional security recommendations

When using GKE Sandbox, we also recommend the following:

  • Unless your nodes use only a single vCPU, we recommend that you disable Hyper-Threading to mitigate Microarchitectural Data Sampling (MDS) vulnerabilities announced by Intel. For more information, see the security bulletin.

  • We strongly recommend that you specify resource limits on all containers running in a sandbox. This protects against a defective or malicious application starving the node of resources and negatively affecting other applications or system processes running on the node.
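For example, a minimal sketch of a sandboxed Pod with CPU and memory requests and limits set on its only container might look like the following (the name, image, and values are placeholders, not recommendations):

# sandboxed-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: httpd
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi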

Limitations

GKE Sandbox works well with many applications, but not all. This section provides more information about the current limitations of GKE Sandbox.

Node pool configuration

  • You cannot enable GKE Sandbox on the default node pool.
  • When using GKE Sandbox, your cluster must have at least two node pools. You must always have at least one node pool where GKE Sandbox is disabled. This node pool must contain at least one node, even if all your workloads are sandboxed.

Access to cluster metadata

  • Nodes running sandboxed Pods are prevented from accessing cluster metadata at the level of the operating system on the node.
  • You can run regular Pods on a node with GKE Sandbox enabled. However, by default those regular Pods cannot access GCP services or cluster metadata. Use Workload Identity to grant Pods access to GCP services.
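As a rough sketch, granting a Pod access to GCP services with Workload Identity involves annotating a Kubernetes service account with the Google service account it should impersonate, and referencing that service account from the Pod spec. The manifest below assumes that Workload Identity is already enabled on the cluster and that the IAM binding between the two accounts exists; all bracketed names are placeholders:

# workload-identity-sketch.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: [KSA_NAME]
  annotations:
    # Google service account that this Kubernetes service account impersonates
    iam.gke.io/gcp-service-account: [GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
---
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-example
spec:
  runtimeClassName: gvisor
  serviceAccountName: [KSA_NAME]
  containers:
  - name: app
    image: httpd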

Incompatible features

It is not currently possible to use GKE Sandbox along with the following Kubernetes features:

  • Accelerators such as GPUs or TPUs
  • Istio
  • Monitoring statistics at the level of the Pod or container
  • hostPath storage
  • Per-container PID namespace
  • CPU and memory limits are only applied for Guaranteed Pods and Burstable Pods, and only when CPU and memory limits are specified for all containers running in the Pod.
  • Pods using PodSecurityPolicies that specify host namespaces, such as hostNetwork, hostPID, or hostIPC
  • Pods using PodSecurityPolicy settings such as privileged mode
  • VolumeDevices
  • Port forwarding (kubectl port-forward)
  • Linux kernel security modules such as seccomp, AppArmor, or SELinux
  • Sysctl, NoNewPrivileges, bidirectional MountPropagation, FSGroup, and ProcMount settings
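As an illustration, the following hypothetical Pod manifest combines several of these incompatible settings (host namespaces, privileged mode, and hostPath storage) and is not expected to work with GKE Sandbox:

# incompatible-with-sandbox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: incompatible-example
spec:
  runtimeClassName: gvisor
  hostNetwork: true            # host namespace: incompatible
  hostPID: true                # host namespace: incompatible
  containers:
  - name: app
    image: httpd
    securityContext:
      privileged: true         # privileged mode: incompatible
    volumeMounts:
    - name: host-logs
      mountPath: /host-logs
  volumes:
  - name: host-logs
    hostPath:                  # hostPath storage: incompatible
      path: /var/log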

Workload characteristics

Imposing an additional layer of indirection for accessing the node's kernel comes with performance trade-offs. GKE Sandbox provides the most tangible benefit on large multi-tenant clusters where isolation is important. Keep the following guidelines in mind when testing your workloads with GKE Sandbox.

System calls

Workloads that generate a large volume of low-overhead system calls, such as a large number of small I/O operations, may require more system resources when running in a sandbox. You might need to use more powerful nodes or add nodes to your cluster.

Direct access to hardware or virtualization

If your workload needs any of the following, GKE Sandbox might not be a good fit because it prevents direct access to the host kernel on the node:

  • Direct access to the node's hardware
  • Kernel-level virtualization features
  • Privileged containers

Enabling GKE Sandbox

You can enable GKE Sandbox on a new cluster or an existing cluster.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update
  • GKE Sandbox requires GKE v1.12.7-gke.17 or higher, or v1.13.5-gke.15 or higher, for the cluster master and nodes.
  • Ensure that the gcloud command is version 243.0.0 or higher.
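For example, you can check both versions with commands like the following (this assumes your default project and zone or region are already set):

# Check the Cloud SDK version (should report 243.0.0 or higher)
gcloud version

# List the valid master and node versions available in your zone or region
gcloud container get-server-config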

On a new cluster

To enable GKE Sandbox, you configure a node pool. The default node pool (the first node pool in your cluster, created when the cluster is created) cannot use GKE Sandbox. To enable GKE Sandbox during cluster creation, you must add a second node pool when you create the cluster.

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click Create cluster.

  3. Choose the Standard cluster template or choose an appropriate template for your workload.

  4. Optional but recommended: Enable Stackdriver Logging and Stackdriver Monitoring, so that gVisor messages are logged.

  5. Click Add node pool.

  6. Configure the node pool according to your requirements. Click More node pool options for the node pool. Configure these settings:

    • For the node version, select v1.12.6-gke.8 or higher.
    • For the node image, select Container-Optimized OS with Containerd (cos_containerd) (beta).
    • Select Enable sandbox with gVisor (beta).
    • If the nodes in the node pool use more than a single vCPU, click Add Label. Set the key to cloud.google.com/gke-smt-disabled and the value to true. Next, follow the instructions for disabling Hyper-Threading in the security bulletin.

    Configure other node pool settings as required.

  7. Save the node pool settings and continue configuring your cluster.

gcloud

GKE Sandbox can't be enabled for the default node pool, and it isn't possible to create additional node pools at the same time as you create a new cluster using the gcloud command. Instead, create your cluster as you normally would. It is optional but recommended that you enable Stackdriver Logging and Stackdriver Monitoring by adding the flag --enable-stackdriver-kubernetes, so that gVisor messages are logged.
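For example, such a cluster might be created with a command along these lines (bracketed values are placeholders; only --enable-stackdriver-kubernetes is specific to the logging recommendation):

gcloud container clusters create [CLUSTER_NAME] \
  --cluster-version=[CLUSTER_VERSION] \
  --enable-stackdriver-kubernetes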

Next, use the gcloud beta container node-pools create command, and set the --sandbox flag to type=gvisor. Replace values in square brackets with your own, and remember to specify a node version of v1.12.6-gke.8 or higher.

gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --node-version=[NODE_VERSION] \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --enable-autoupgrade

The gvisor RuntimeClass is instantiated during node creation, before any workloads are scheduled onto the node. You can check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s
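You can also confirm that the new nodes carry the sandbox node label that GKE Sandbox uses for scheduling (see Running a regular Pod along with sandboxed Pods below), for example:

kubectl get nodes -l sandbox.gke.io/runtime=gvisor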

On an existing cluster

You can enable GKE Sandbox on an existing cluster by adding a new node pool and enabling the feature for that node pool, or by modifying an existing non-default node pool.

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click the cluster's Edit button, which looks like a pencil.

  3. If necessary, add an additional node pool by clicking Add node pool. To edit an existing node pool, click the node pool's Edit button. Do not enable Sandbox with gVisor (beta) on the default node pool.

  4. Enable Sandbox with gVisor (beta), then click Done.

  5. If necessary, make additional configuration changes to the cluster, then click Save.

gcloud

To create a new node pool with GKE Sandbox enabled, use a command like the following:

gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --enable-autoupgrade

To enable GKE Sandbox on an existing node pool, use a command like the following. Do not enable --sandbox type=gvisor on the default node pool.

gcloud beta container node-pools update [NODE_POOL_NAME] \
  --sandbox type=gvisor

The gvisor RuntimeClass is instantiated during node creation, before any workloads are scheduled onto the node. You can check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s

Optional: Enable Stackdriver Logging and Stackdriver Monitoring

It is optional but recommended that you enable Stackdriver Logging and Stackdriver Monitoring on the cluster, so that gVisor messages are logged. You must use Google Cloud Platform Console to enable these features on an existing cluster.

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click the cluster's Edit button, which looks like a pencil.

  3. Enable Stackdriver Logging and Stackdriver Monitoring.

  4. If necessary, make additional configuration changes to the cluster, then click Save.

Working with GKE Sandbox

Running an application in a sandbox

To force a Deployment to run on a node with GKE Sandbox enabled, set its spec.template.spec.runtimeClassName to gvisor, as shown by this manifest for a Deployment:

# httpd.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd

To create the Deployment, use the kubectl create command:

kubectl create -f httpd.yaml

The Pod is deployed to a node in a node pool with GKE Sandbox enabled. To verify this, use the kubectl get pods command to find the node where the Pod is deployed:

kubectl get pods

NAME                    READY   STATUS    RESTARTS   AGE
httpd-db5899bc9-dk7lk   1/1     Running   0          24s
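The default output doesn't show the node name. To see which node each Pod was scheduled on, you can use the wide output format:

kubectl get pods -o wide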

Find the name of the Pod in the output, then run the following command to check its value for RuntimeClass:

kubectl get pods [NAME-OF-POD] -o jsonpath='{.spec.runtimeClassName}'

gvisor

Alternatively, you can list the RuntimeClass of each Pod, and look for the ones where it is set to gvisor:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

[NAME-OF-POD]: gvisor

This method of verifying that the Pod is running in a sandbox is trustworthy because it does not rely on any data within the sandbox itself. Anything reported from within the sandbox is untrustworthy, because it could be defective or malicious.

Running a regular Pod along with sandboxed Pods

After enabling GKE Sandbox on a node pool, you can run trusted applications on those nodes without using a sandbox by using node taints and tolerations. These Pods are referred to as "regular Pods" to distinguish them from sandboxed Pods.

Regular Pods, just like sandboxed Pods, are prevented from accessing other GCP services or cluster metadata. This prevention is part of the node's configuration. If your regular Pods or sandboxed Pods require access to GCP services, use Workload Identity.

GKE Sandbox adds the following label and taint to nodes that can run sandboxed Pods:

labels:
  sandbox.gke.io/runtime: gvisor
taints:
- effect: NoSchedule
  key: sandbox.gke.io/runtime
  value: gvisor

In addition to any node affinity and toleration settings in your Pod manifest, GKE Sandbox applies the following node affinity and toleration to all Pods with RuntimeClass set to gvisor:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sandbox.gke.io/runtime
          operator: In
          values:
          - gvisor
tolerations:
  - effect: NoSchedule
    key: sandbox.gke.io/runtime
    operator: Equal
    value: gvisor
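You can see the toleration that was injected into a sandboxed Pod by inspecting its spec. For example, the following command prints the Pod's tolerations, which should include the sandbox.gke.io/runtime entry alongside the default tolerations Kubernetes adds to every Pod:

kubectl get pod [NAME-OF-POD] -o jsonpath='{.spec.tolerations}'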

To schedule a regular Pod on a node with GKE Sandbox enabled, manually apply the node affinity and toleration above in your Pod manifest.

  • If your Pod can run on nodes with GKE Sandbox enabled, add the toleration.
  • If your Pod must run on nodes with GKE Sandbox enabled, add both the node affinity and toleration.

For example, the following manifest modifies the manifest used in Running an application in a sandbox so that it runs as a regular Pod on a node with sandboxed Pods, by removing the runtimeClassName and adding both the node affinity and toleration above.

# httpd-no-sandbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-no-sandbox
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: sandbox.gke.io/runtime
                operator: In
                values:
                - gvisor
      tolerations:
        - effect: NoSchedule
          key: sandbox.gke.io/runtime
          operator: Equal
          value: gvisor

First, verify that the httpd-no-sandbox Deployment is not running in a sandbox:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

httpd-db5899bc9-dk7lk: gvisor
httpd-no-sandbox-5bf87996c6-cfmmd:

The httpd Deployment created earlier is running in a sandbox, because its runtimeClass is gvisor. The httpd-no-sandbox Deployment has no value for runtimeClass, so it is not running in a sandbox.

Next, verify that the non-sandboxed Deployment is running on a node with GKE Sandbox by running the following command:

kubectl get pod -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.nodeName}\n{end}'

The name of the node pool is embedded in the value of nodeName. Verify that the Pod is running on a node in a node pool with GKE Sandbox enabled.
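To double-check, you can also inspect the node's labels and confirm that the sandbox node label is present (replace [NODE_NAME] with the value of nodeName from the previous output):

kubectl get node [NODE_NAME] --show-labels | grep sandbox.gke.io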

Verifying metadata protection

To verify that cluster metadata is protected from Pods running on nodes with GKE Sandbox enabled, you can run a test:

  1. Create a sandboxed Deployment from the following manifest, using kubectl apply -f. It uses the fedora image, which includes the curl command. The Pod runs the /bin/sleep command so that it keeps running for 10000 seconds.

    # sandbox-metadata-test.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fedora
      labels:
        app: fedora
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fedora
      template:
        metadata:
          labels:
            app: fedora
        spec:
          runtimeClassName: gvisor
          containers:
          - name: fedora
            image: fedora
            command: ["/bin/sleep","10000"]
    
  2. Get the name of the Pod using kubectl get pods, then use kubectl exec to connect to the Pod interactively.

    kubectl exec -it [POD-NAME] /bin/sh
    

    You are connected to a container running in the Pod, in a /bin/sh session.

  3. Within the interactive session, attempt to access a URL that returns cluster metadata:

    curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
    

    The command hangs and eventually times out, because the packets are silently dropped.

  4. Press Ctrl+C to terminate the curl command, and type exit to disconnect from the Pod.

  5. Remove the runtimeClassName line from the YAML manifest and redeploy the Pod using kubectl apply -f [FILENAME]. The sandboxed Pod is terminated and recreated on a node without GKE Sandbox.

  6. Get the new Pod name, connect to it using kubectl exec, and run the curl command again. This time, results are returned. This example output is truncated.

    ALLOCATE_NODE_CIDRS: "true"
    API_SERVER_TEST_LOG_LEVEL: --v=3
    AUTOSCALER_ENV_VARS: kube_reserved=cpu=60m,memory=960Mi,ephemeral-storage=41Gi;...
    ...
    

    Type exit to disconnect from the Pod.

  7. Remove the deployment:

    kubectl delete deployment fedora
    

Disabling GKE Sandbox

It isn't currently possible to update an existing node pool to disable GKE Sandbox. To stop using GKE Sandbox, first delete any previously-sandboxed Pods. Otherwise, after GKE Sandbox is disabled, those Pods run as regular Pods if no available nodes have GKE Sandbox enabled. Then do one of the following:

  • Delete the node pool where GKE Sandbox was enabled, or
  • Resize that node pool to zero nodes, or
  • Recreate the Pods without specifying a value for runtimeClassName.
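For example, the first two options could be carried out with commands along these lines (bracketed names are placeholders):

# Delete the node pool where GKE Sandbox was enabled
gcloud container node-pools delete [NODE_POOL_NAME] --cluster=[CLUSTER_NAME]

# Or resize that node pool to zero nodes
gcloud container clusters resize [CLUSTER_NAME] \
  --node-pool=[NODE_POOL_NAME] \
  --num-nodes=0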
