Harden workload isolation with GKE Sandbox


This page describes how to use GKE Sandbox to protect the host kernel on your nodes when containers in the Pod execute unknown or untrusted code, or need extra isolation from the node.

Enabling GKE Sandbox

You can enable GKE Sandbox on a new cluster or an existing cluster.

Before you begin

Before you start, make sure you have performed the following tasks:

Set up default gcloud settings using one of the following methods:

  • Using gcloud init, if you want to be walked through setting defaults.
  • Using gcloud config, to individually set your project ID, zone, and region.

Using gcloud init

If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.

  1. Run gcloud init and follow the directions:

    gcloud init

    If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

    gcloud init --console-only
  2. Follow the instructions to authorize gcloud to use your Google Cloud account.
  3. Create a new configuration or select an existing one.
  4. Choose a Google Cloud project.
  5. Choose a default Compute Engine zone for zonal clusters or a region for regional or Autopilot clusters.

Using gcloud config

  • Set your default project ID:
    gcloud config set project PROJECT_ID
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone COMPUTE_ZONE
  • If you are working with Autopilot or regional clusters, set your default compute region:
    gcloud config set compute/region COMPUTE_REGION
  • Update gcloud to the latest version:
    gcloud components update
  • GKE Sandbox requires GKE version 1.13.5-gke.15 or later for the cluster control plane and nodes.
  • Ensure that the gcloud command is version 243.0.0 or later. You can check the installed version as shown below.
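
For example, the following command prints the installed gcloud and component versions; the exact output depends on your installation:

gcloud version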

On a new cluster

To enable GKE Sandbox, you configure a node pool. The default node pool (the first node pool in your cluster, created when the cluster is created) cannot use GKE Sandbox. To enable GKE Sandbox during cluster creation, you must add a second node pool when you create the cluster.

Console

To view your clusters, visit the Google Kubernetes Engine menu in Cloud Console.

  1. Go to the Google Kubernetes Engine page in Cloud Console.

    Go to Google Kubernetes Engine

  2. Click Create.

  3. Optional but recommended: From the navigation pane, under Cluster, click Features and enable Cloud Operations for GKE, so that gVisor messages are logged.

  4. Click Add Node Pool.

  5. From the navigation pane, under Node Pools, expand the new node pool and click Nodes.

  6. Configure the following settings for the node pool:

    1. From the Image type drop-down list, select Container-Optimized OS with Containerd (cos_containerd).
    2. Under Machine Configuration, select a Series and Machine type.

  7. From the navigation pane, under the name of the node pool you are configuring, click Security and select the Enable sandbox with gVisor checkbox.

  8. Continue to configure the cluster and node pools as desired.

  9. Click Create.

gcloud

GKE Sandbox can't be enabled for the default node pool, and it isn't possible to create additional node pools at the same time as you create a new cluster using the gcloud command. Instead, create your cluster as you normally would. It is optional but recommended that you enable Stackdriver Logging and Stackdriver Monitoring by adding the --enable-stackdriver-kubernetes flag, so that gVisor messages are logged.
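
For example, a basic cluster with logging and monitoring enabled might be created as follows; cluster-name and compute-zone are placeholders, and any other flags you normally use still apply:

gcloud container clusters create cluster-name \
  --zone=compute-zone \
  --enable-stackdriver-kubernetes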

Next, use the gcloud container node-pools create command, and set the --sandbox flag to type=gvisor. Replace the placeholder values, such as node-pool-name and cluster-name, with your own.

gcloud container node-pools create node-pool-name \
  --cluster=cluster-name \
  --node-version=node-version \
  --machine-type=machine-type \
  --image-type=cos_containerd \
  --sandbox type=gvisor

Prior to 1.18.4-gke.1300, the gvisor RuntimeClass is instantiated during node creation. Before any workloads are scheduled onto the node, check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s

If you are running a version earlier than 1.17.9-gke.1500, or a 1.18 version earlier than 1.18.6-gke.600, you must also wait for gvisor.config.common-webhooks.networking.gke.io to be instantiated. To check, use the following command:

kubectl get mutatingwebhookconfiguration gvisor.config.common-webhooks.networking.gke.io
NAME                                              CREATED AT
gvisor.config.common-webhooks.networking.gke.io   2020-04-06T17:07:17Z
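
If you create node pools from a script, one way to wait for the RuntimeClass is a simple polling loop; the same pattern works for the webhook. This is a sketch, not a required step; adjust the interval and add a timeout to suit your workflow:

# Poll until the gvisor RuntimeClass exists.
until kubectl get runtimeclass gvisor >/dev/null 2>&1; do
  echo "Waiting for the gvisor RuntimeClass..."
  sleep 5
done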

On an existing cluster

You can enable GKE Sandbox on an existing cluster by adding a new node pool and enabling the feature for that node pool.

Console

To create a new node pool with GKE Sandbox enabled:

  1. Go to the Google Kubernetes Engine page in Cloud Console.

    Go to Google Kubernetes Engine

  2. Click the name of the cluster you want to modify.

  3. Click Add Node Pool.

  4. Configure the Node pool details page as desired.

  5. From the navigation pane, click Nodes and configure the following settings:

    1. From the Image type drop-down list, select Container-Optimized OS with Containerd (cos_containerd).
    2. Under Machine Configuration, select a Series and Machine type.

  6. From the navigation pane, click Security and select the Enable sandbox with gVisor checkbox.

  7. Click Create.

gcloud

To create a new node pool with GKE Sandbox enabled, use a command like the following:

gcloud container node-pools create node-pool-name \
  --cluster=cluster-name \
  --machine-type=machine-type \
  --image-type=cos_containerd \
  --sandbox type=gvisor
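
To confirm that the sandbox setting took effect on the node pool, you can describe it. The config.sandboxConfig.type field path below reflects the node pool API and is an assumption; check the full describe output if the filtered form prints nothing:

gcloud container node-pools describe node-pool-name \
  --cluster=cluster-name \
  --format="value(config.sandboxConfig.type)"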

Prior to 1.18.4-gke.1300, the gvisor RuntimeClass is instantiated during node creation. Before any workloads are scheduled onto the node, check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s

If you are running a version earlier than 1.17.9-gke.1500, or a 1.18 version earlier than 1.18.6-gke.600, you must also wait for gvisor.config.common-webhooks.networking.gke.io to be instantiated. To check, use the following command:

kubectl get mutatingwebhookconfiguration gvisor.config.common-webhooks.networking.gke.io
NAME                                              CREATED AT
gvisor.config.common-webhooks.networking.gke.io   2020-04-06T17:07:17Z

Optional: Enable Cloud Operations for GKE

It is optional but recommended that you enable Cloud Operations for GKE on the cluster, so that gVisor messages are logged. Cloud Operations for GKE is enabled by default for new clusters.

You can use Google Cloud Console to enable these features on an existing cluster.

  1. Go to the Google Kubernetes Engine page in Cloud Console.

    Go to Google Kubernetes Engine

  2. Click the name of the cluster you want to modify.

  3. Under Features, in the Cloud Operations for GKE field, click Edit Cloud Operations for GKE.

  4. Select the Enable Cloud Operations for GKE checkbox.

  5. From the drop-down list, select System and workload logging and monitoring.

  6. Click Save Changes.
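
Alternatively, you can enable these features on an existing cluster from the command line with the same flag mentioned earlier; this is a sketch, and cluster-name is a placeholder:

gcloud container clusters update cluster-name \
  --enable-stackdriver-kubernetes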

Working with GKE Sandbox

Running an application in a sandbox

To force a Deployment to run on a node with GKE Sandbox enabled, set its spec.template.spec.runtimeClassName to gvisor, as shown by this manifest for a Deployment:

# httpd.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd

To create the Deployment, use the kubectl create command:

kubectl create -f httpd.yaml

The Pod is deployed to a node in a node pool with GKE Sandbox enabled. To verify the deployment, use the following command to find the Pod:

kubectl get pods

The output is similar to:

NAME                    READY   STATUS    RESTARTS   AGE
httpd-db5899bc9-dk7lk   1/1     Running   0          24s

From the output, find the name of the Pod, and then run the following command to check its value for RuntimeClass:

kubectl get pods pod-name -o jsonpath='{.spec.runtimeClassName}'

The output is:

gvisor

Alternatively, you can list the RuntimeClass of each Pod, and look for the ones where it is set to gvisor:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

The output is similar to:

pod-name: gvisor

This method of verifying that the Pod is running in a sandbox is trustworthy because it does not rely on any data within the sandbox itself. Anything reported from within the sandbox is untrustworthy, because it could be defective or malicious.
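
As an informal cross-check only, tools run inside a gVisor sandbox see gVisor's own kernel messages rather than the host's; for example, dmesg in a sandboxed container typically reports gVisor starting (assuming the container image includes the dmesg utility). Because this evidence comes from inside the sandbox, treat it as a hint rather than proof:

kubectl exec pod-name -- dmesg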

Running a regular Pod along with sandboxed Pods

After enabling GKE Sandbox on a node pool, you can run trusted applications on those nodes without using a sandbox by using node taints and tolerations. These Pods are referred to as "regular Pods" to distinguish them from sandboxed Pods.

Regular Pods, just like sandboxed Pods, are prevented from accessing other Google Cloud services or cluster metadata. This prevention is part of the node's configuration. If your regular Pods or sandboxed Pods require access to Google Cloud services, use Workload Identity.

GKE Sandbox adds the following label and taint to nodes that can run sandboxed Pods:

labels:
  sandbox.gke.io/runtime: gvisor
taints:
- effect: NoSchedule
  key: sandbox.gke.io/runtime
  value: gvisor
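
You can inspect this label and taint on your nodes with standard kubectl commands; node-name below is a placeholder:

# List nodes that carry the sandbox label, showing its value in an extra column.
kubectl get nodes -l sandbox.gke.io/runtime=gvisor -L sandbox.gke.io/runtime

# Show the taints on a specific node.
kubectl get node node-name -o jsonpath='{.spec.taints}'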

In addition to any node affinity and toleration settings in your Pod manifest, GKE Sandbox applies the following node affinity and toleration to all Pods with RuntimeClass set to gvisor:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sandbox.gke.io/runtime
          operator: In
          values:
          - gvisor
tolerations:
  - effect: NoSchedule
    key: sandbox.gke.io/runtime
    operator: Equal
    value: gvisor

To schedule a regular Pod on a node with GKE Sandbox enabled, manually apply the node affinity and toleration above in your Pod manifest.

  • If your Pod can run on nodes with GKE Sandbox enabled, add the toleration.
  • If your Pod must run on nodes with GKE Sandbox enabled, add both the node affinity and toleration.

For example, the following manifest modifies the manifest used in Running an application in a sandbox so that it runs as a regular Pod on a node with sandboxed Pods, by removing the runtimeClassName and adding both the node affinity and toleration above.

# httpd-no-sandbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-no-sandbox
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: sandbox.gke.io/runtime
                operator: In
                values:
                - gvisor
      tolerations:
        - effect: NoSchedule
          key: sandbox.gke.io/runtime
          operator: Equal
          value: gvisor

First, verify that the Deployment is not running in a sandbox:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

The output is similar to:

httpd-db5899bc9-dk7lk: gvisor
httpd-no-sandbox-5bf87996c6-cfmmd:

The httpd Deployment created earlier is running in a sandbox, because its runtimeClassName is gvisor. The httpd-no-sandbox Deployment has no value for runtimeClassName, so it is not running in a sandbox.

Next, verify that the non-sandboxed Deployment is running on a node with GKE Sandbox by running the following command:

kubectl get pod -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.nodeName}\n{end}'

The name of the node pool is embedded in the value of nodeName. Verify that the Pod is running on a node in a node pool with GKE Sandbox enabled.
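
For example, you can list each node together with its node pool and sandbox label; cloud.google.com/gke-nodepool is the standard GKE node pool label:

kubectl get nodes -L cloud.google.com/gke-nodepool -L sandbox.gke.io/runtime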

Verifying metadata protection

To validate the assertion that metadata is protected from nodes that can run sandboxed Pods, you can run a test:

  1. Create a sandboxed Deployment from the following manifest, using kubectl apply -f. It uses the fedora image, which includes the curl command. The Pod runs the /bin/sleep command to ensure that the Deployment runs for 10000 seconds.

    # sandbox-metadata-test.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fedora
      labels:
        app: fedora
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fedora
      template:
        metadata:
          labels:
            app: fedora
        spec:
          runtimeClassName: gvisor
          containers:
          - name: fedora
            image: fedora
            command: ["/bin/sleep","10000"]
    
  2. Get the name of the Pod using kubectl get pods, then use kubectl exec to connect to the Pod interactively.

    kubectl exec -it pod-name -- /bin/sh
    

    You are connected to a container running in the Pod, in a /bin/sh session.

  3. Within the interactive session, attempt to access a URL that returns cluster metadata:

    curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
    

    The command hangs and eventually times out, because the packets are silently dropped.

  4. Press Ctrl+C to terminate the curl command, and type exit to disconnect from the Pod.

  5. Remove the runtimeClassName line from the YAML manifest and redeploy the Pod using kubectl apply -f filename. The sandboxed Pod is terminated and recreated on a node without GKE Sandbox.

  6. Get the new Pod name, connect to it using kubectl exec, and run the curl command again. This time, results are returned. This example output is truncated.

    ALLOCATE_NODE_CIDRS: "true"
    API_SERVER_TEST_LOG_LEVEL: --v=3
    AUTOSCALER_ENV_VARS: kube_reserved=cpu=60m,memory=960Mi,ephemeral-storage=41Gi;...
    ...
    

    Type exit to disconnect from the Pod.

  7. Remove the deployment:

    kubectl delete deployment fedora
    

Disabling GKE Sandbox

It isn't currently possible to update a node pool to disable GKE Sandbox. To disable GKE Sandbox on an existing node pool, you can do one of the following:

What's next