Running untrusted workloads with GKE Sandbox

This page describes how to use GKE Sandbox to protect the host kernel on your nodes when containers in the Pod execute unknown or untrusted code. For example, multi-tenant clusters, such as those run by software-as-a-service (SaaS) providers, often execute unknown code submitted by their users.

GKE Sandbox uses gVisor, an open source project. This topic broadly discusses gVisor, but you can learn more details by reading the official gVisor documentation.

Overview

GKE Sandbox provides an extra layer of security to prevent untrusted code from affecting the host kernel on your cluster nodes. Before discussing how GKE Sandbox works, it's useful to understand the nature of the potential risks it helps mitigate.

A container runtime such as Docker or containerd provides some degree of isolation between the container's processes and the kernel running on the node. However, the container runtime often runs as a privileged user on the node and has access to most system calls into the host kernel.

Potential threats

Multi-tenant clusters and clusters whose containers run untrusted workloads are more exposed to security vulnerabilities than other clusters. Examples include SaaS providers, web-hosting providers, or other organizations that allow their users to upload and run code. A flaw in the container runtime or in the host kernel could allow a process running within a container to "escape" the container and affect the node's kernel, potentially bringing down the node.

The potential also exists for a malicious tenant to gain access to and exfiltrate another tenant's data in memory or on disk, by exploiting such a defect.

Finally, an untrusted workload could potentially access other Google Cloud Platform services or cluster metadata.

How GKE Sandbox mitigates these threats

gVisor is a userspace re-implementation of the Linux kernel API that does not need elevated privileges. In conjunction with a container runtime such as containerd, the userspace kernel re-implements the majority of system calls and services them on behalf of the host kernel. Direct access to the host kernel is limited. See the gVisor architecture guide for detailed information about how this works. From the container's point of view, gVisor is nearly transparent, and does not require any changes to the containerized application.

When you enable GKE Sandbox on a node pool, a sandbox is created for each Pod running on a node in that node pool. In addition, nodes running sandboxed Pods are prevented from accessing other GCP services or cluster metadata.

Each sandbox uses its own userspace kernel. With this in mind, you can make decisions about how to group your containers into Pods, based on the level of isolation you require and the characteristics of your applications.

GKE Sandbox is an especially good fit for the following types of applications. See Limitations for more information to help you decide which applications to sandbox.

  • Untrusted or third-party applications written in languages such as Rust, Java, Python, PHP, Node.js, or Go
  • Web server front-ends, caches, or proxies
  • Applications processing external media or data using CPUs
  • Machine-learning workloads using CPUs
  • CPU-intensive or memory-intensive applications

Additional security recommendations

When using GKE Sandbox, we also recommend the following:

  • Unless your nodes use only a single vCPU, we recommend that you disable Hyper-Threading to mitigate Microarchitectural Data Sampling (MDS) vulnerabilities announced by Intel. For more information, see the security bulletin.

  • We strongly recommend that you specify resource limits on all containers running in a sandbox. This protects against a defective or malicious application starving the node of resources and negatively affecting other applications or system processes running on the node.
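For example, a minimal sketch of a sandboxed Pod with CPU and memory requests and limits set on its only container might look like the following (the name, image, and values are placeholders, not recommendations):

# sandboxed-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: httpd
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi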

Limitations

GKE Sandbox works well with many applications, but not all. This section provides more information about the current limitations of GKE Sandbox.

Node pool configuration

  • You cannot enable GKE Sandbox on the default node pool.
  • When using GKE Sandbox, your cluster must have at least two node pools. You must always have at least one node pool where GKE Sandbox is disabled. This node pool must contain at least one node, even if all your workloads are sandboxed.

Access to cluster metadata

  • Nodes running sandboxed Pods are prevented from accessing cluster metadata at the level of the operating system on the node.
  • You can run regular Pods on a node with GKE Sandbox enabled. However, by default those regular Pods cannot access GCP services or cluster metadata. Use Workload Identity to grant Pods access to GCP services.
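As a rough sketch, granting a Pod access to GCP services with Workload Identity involves annotating a Kubernetes service account with the Google service account it should impersonate, and referencing that service account from the Pod spec. The manifest below assumes that Workload Identity is already enabled on the cluster and that the IAM binding between the two accounts exists; all bracketed names are placeholders:

# workload-identity-sketch.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: [KSA_NAME]
  annotations:
    # Google service account that this Kubernetes service account impersonates
    iam.gke.io/gcp-service-account: [GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
---
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-example
spec:
  runtimeClassName: gvisor
  serviceAccountName: [KSA_NAME]
  containers:
  - name: app
    image: httpd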

Incompatible features

It is not currently possible to use GKE Sandbox along with the following Kubernetes features:

  • Accelerators such as GPUs or TPUs
  • Istio
  • Monitoring statistics at the level of the Pod or container
  • hostPath storage
  • Per-container PID namespace
  • CPU and memory limits are only applied for Guaranteed Pods and Burstable Pods, and only when CPU and memory limits are specified for all containers running in the Pod.
  • Pods using PodSecurityPolicies that specify host namespaces, such as hostNetwork, hostPID, or hostIPC
  • Pods using PodSecurityPolicy settings such as privileged mode
  • VolumeDevices
  • Port forwarding (kubectl port-forward)
  • Linux kernel security modules such as seccomp, AppArmor, or SELinux
  • Sysctl, NoNewPrivileges, bidirectional MountPropagation, FSGroup, and ProcMount settings
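As an illustration, the following hypothetical Pod manifest combines several of these incompatible settings (host namespaces, privileged mode, and hostPath storage) and is not expected to work with GKE Sandbox:

# incompatible-with-sandbox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: incompatible-example
spec:
  runtimeClassName: gvisor
  hostNetwork: true            # host namespace: incompatible
  hostPID: true                # host namespace: incompatible
  containers:
  - name: app
    image: httpd
    securityContext:
      privileged: true         # privileged mode: incompatible
    volumeMounts:
    - name: host-logs
      mountPath: /host-logs
  volumes:
  - name: host-logs
    hostPath:                  # hostPath storage: incompatible
      path: /var/log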

Workload characteristics

Imposing an additional layer of indirection for accessing the node's kernel comes with performance trade-offs. GKE Sandbox provides the most tangible benefit on large multi-tenant clusters where isolation is important. Keep the following guidelines in mind when testing your workloads with GKE Sandbox.

System calls

Workloads that generate a large volume of low-overhead system calls, such as a large number of small I/O operations, may require more system resources when running in a sandbox. You might need to use more powerful nodes or add nodes to your cluster.

Direct access to hardware or virtualization

If your workload needs any of the following, GKE Sandbox might not be a good fit because it prevents direct access to the host kernel on the node:

  • Direct access to the node's hardware
  • Kernel-level virtualization features
  • Privileged containers

Enabling GKE Sandbox

You can enable GKE Sandbox on a new cluster or an existing cluster.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update
  • GKE Sandbox requires GKE v1.12.7-gke.17 or higher, or v1.13.5-gke.15 or higher, for the cluster master and nodes.
  • Ensure that the gcloud command is version 243.0.0 or higher.
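For example, you can check both versions with commands like the following (this assumes your default project and zone or region are already set):

# Check the Cloud SDK version (should report 243.0.0 or higher)
gcloud version

# List the valid master and node versions available in your zone or region
gcloud container get-server-config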

On a new cluster

To enable GKE Sandbox, you configure a node pool. The default node pool (the first node pool in your cluster, created when the cluster is created) cannot use GKE Sandbox. To enable GKE Sandbox during cluster creation, you must add a second node pool when you create the cluster.

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click Create cluster.

  3. Choose the Standard cluster template or choose an appropriate template for your workload.

  4. Optional but recommended: Enable Stackdriver Logging and Stackdriver Monitoring, so that gVisor messages are logged.

  5. Click Add node pool.

  6. Configure the node pool according to your requirements. Click More node pool options for the node pool. Configure these settings:

    • For the node version, select v1.12.6-gke.8 or higher.
    • For the node image, select Container-Optimized OS with Containerd (cos_containerd) (beta).
    • Select Enable sandbox with gVisor (beta).
    • If the nodes in the node pool use more than a single vCPU, click Add Label. Set the key to cloud.google.com/gke-smt-disabled and the value to true. Next, follow the instructions for disabling Hyper-Threading in the security bulletin.

    Configure other node pool settings as required.

  7. Save the node pool settings and continue configuring your cluster.

gcloud

GKE Sandbox can't be enabled for the default node pool, and it isn't possible to create additional node pools at the same time as you create a new cluster using the gcloud command. Instead, create your cluster as you normally would. It is optional but recommended that you enable Stackdriver Logging and Stackdriver Monitoring by adding the flag --enable-stackdriver-kubernetes, so that gVisor messages are logged.
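For example, such a cluster might be created with a command along these lines (bracketed values are placeholders; only --enable-stackdriver-kubernetes is specific to the logging recommendation):

gcloud container clusters create [CLUSTER_NAME] \
  --cluster-version=[CLUSTER_VERSION] \
  --enable-stackdriver-kubernetes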

Next, use the gcloud beta container node-pools create command, and set the --sandbox flag to type=gvisor. Replace values in square brackets with your own, and remember to specify a node version of v1.12.6-gke.8 or higher.

gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --node-version=[NODE_VERSION] \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --enable-autoupgrade

The gvisor RuntimeClass is instantiated during node creation, before any workloads are scheduled onto the node. You can check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s
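You can also confirm that the new nodes carry the sandbox node label that GKE Sandbox uses for scheduling (see Running a regular Pod along with sandboxed Pods below), for example:

kubectl get nodes -l sandbox.gke.io/runtime=gvisor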

On an existing cluster

You can enable GKE Sandbox on an existing cluster by adding a new node pool and enabling the feature for that node pool, or by modifying an existing non-default node pool.

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click the cluster's Edit button, which looks like a pencil.

  3. If necessary, add an additional node pool by clicking Add node pool. To edit an existing node pool, click the node pool's Edit button. Do not enable Sandbox with gVisor (beta) on the default node pool.

  4. Enable Sandbox with gVisor (beta), then click Done.

  5. If necessary, make additional configuration changes to the cluster, then click Save.

gcloud

To create a new node pool with GKE Sandbox enabled, use a command like the following:

gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --enable-autoupgrade

To enable GKE Sandbox on an existing node pool, use a command like the following. Do not enable --sandbox type=gvisor on the default node pool.

gcloud beta container node-pools update [NODE_POOL_NAME] \
  --sandbox type=gvisor

The gvisor RuntimeClass is instantiated during node creation, before any workloads are scheduled onto the node. You can check for the existence of the gvisor RuntimeClass using the following command:

kubectl get runtimeclasses
NAME     AGE
gvisor   19s

Optional: Enable Stackdriver Logging and Stackdriver Monitoring

It is optional but recommended that you enable Stackdriver Logging and Stackdriver Monitoring on the cluster, so that gVisor messages are logged. You must use Google Cloud Platform Console to enable these features on an existing cluster.

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Click the cluster's Edit button, which looks like a pencil.

  3. Enable Stackdriver Logging and Stackdriver Monitoring.

  4. If necessary, make additional configuration changes to the cluster, then click Save.

Working with GKE Sandbox

Running an application in a sandbox

To force a Deployment to run on a node with GKE Sandbox enabled, set its spec.template.spec.runtimeClassName to gvisor, as shown by this manifest for a Deployment:

# httpd.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd

To create the Deployment, use the kubectl create command:

kubectl create -f httpd.yaml

The Pod is deployed to a node in a node pool with GKE Sandbox enabled. To verify this, use the kubectl get pods command to find the node where the Pod is deployed:

kubectl get pods

NAME                    READY   STATUS    RESTARTS   AGE
httpd-db5899bc9-dk7lk   1/1     Running   0          24s
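The default output doesn't show the node name. To see which node each Pod was scheduled on, you can use the wide output format:

kubectl get pods -o wide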

Find the name of the Pod in the output, then run the following command to check its value for RuntimeClass:

kubectl get pods [NAME-OF-POD] -o jsonpath='{.spec.runtimeClassName}'

gvisor

Alternatively, you can list the RuntimeClass of each Pod, and look for the ones where it is set to gvisor:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

[NAME-OF-POD]: gvisor

This method of verifying that the Pod is running in a sandbox is trustworthy because it does not rely on any data within the sandbox itself. Anything reported from within the sandbox is untrustworthy, because it could be defective or malicious.

Running a regular Pod along with sandboxed Pods

After enabling GKE Sandbox on a node pool, you can run trusted applications on those nodes without using a sandbox by using node taints and tolerations. These Pods are referred to as "regular Pods" to distinguish them from sandboxed Pods.

Regular Pods, just like sandboxed Pods, are prevented from accessing other GCP services or cluster metadata. This prevention is part of the node's configuration. If your regular Pods or sandboxed Pods require access to GCP services, use Workload Identity.

GKE Sandbox adds the following label and taint to nodes that can run sandboxed Pods:

labels:
  sandbox.gke.io/runtime: gvisor
taints:
- effect: NoSchedule
  key: sandbox.gke.io/runtime
  value: gvisor

In addition to any node affinity and toleration settings in your Pod manifest, GKE Sandbox applies the following node affinity and toleration to all Pods with RuntimeClass set to gvisor:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sandbox.gke.io/runtime
          operator: In
          values:
          - gvisor
tolerations:
  - effect: NoSchedule
    key: sandbox.gke.io/runtime
    operator: Equal
    value: gvisor
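You can see the toleration that was injected into a sandboxed Pod by inspecting its spec. For example, the following command prints the Pod's tolerations, which should include the sandbox.gke.io/runtime entry alongside the default tolerations Kubernetes adds to every Pod:

kubectl get pod [NAME-OF-POD] -o jsonpath='{.spec.tolerations}'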

To schedule a regular Pod on a node with GKE Sandbox enabled, manually apply the node affinity and toleration above in your Pod manifest.

  • If your Pod can run on nodes with GKE Sandbox enabled, add the toleration.
  • If your Pod must run on nodes with GKE Sandbox enabled, add both the node affinity and toleration.

For example, the following manifest modifies the manifest used in Running an application in a sandbox so that it runs as a regular Pod on a node with sandboxed Pods, by removing the runtimeClassName and adding both the node affinity and toleration above.

# httpd-no-sandbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-no-sandbox
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: sandbox.gke.io/runtime
                operator: In
                values:
                - gvisor
      tolerations:
        - effect: NoSchedule
          key: sandbox.gke.io/runtime
          operator: Equal
          value: gvisor

First, verify that the httpd-no-sandbox Deployment is not running in a sandbox:

kubectl get pods -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.runtimeClassName}\n{end}'

httpd-db5899bc9-dk7lk: gvisor
httpd-no-sandbox-5bf87996c6-cfmmd:

The httpd Deployment created earlier is running in a sandbox, because its runtimeClass is gvisor. The httpd-no-sandbox Deployment has no value for runtimeClass, so it is not running in a sandbox.

Next, verify that the non-sandboxed Deployment is running on a node with GKE Sandbox by running the following command:

kubectl get pod -o jsonpath=$'{range .items[*]}{.metadata.name}: {.spec.nodeName}\n{end}'

The name of the node pool is embedded in the value of nodeName. Verify that the Pod is running on a node in a node pool with GKE Sandbox enabled.
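To double-check, you can also inspect the node's labels and confirm that the sandbox node label is present (replace [NODE_NAME] with the value of nodeName from the previous output):

kubectl get node [NODE_NAME] --show-labels | grep sandbox.gke.io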

Verifying metadata protection

To verify that cluster metadata is protected from Pods running on nodes with GKE Sandbox enabled, you can run a test:

  1. Create a sandboxed Deployment from the following manifest, using kubectl apply -f. It uses the fedora image, which includes the curl command. The Pod runs the /bin/sleep command so that it keeps running for 10000 seconds.

    # sandbox-metadata-test.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fedora
      labels:
        app: fedora
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fedora
      template:
        metadata:
          labels:
            app: fedora
        spec:
          runtimeClassName: gvisor
          containers:
          - name: fedora
            image: fedora
            command: ["/bin/sleep","10000"]
    
  2. Get the name of the Pod using kubectl get pods, then use kubectl exec to connect to the Pod interactively.

    kubectl exec -it [POD-NAME] /bin/sh
    

    You are connected to a container running in the Pod, in a /bin/sh session.

  3. Within the interactive session, attempt to access a URL that returns cluster metadata:

    curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
    

    The command hangs and eventually times out, because the packets are silently dropped.

  4. Press Ctrl+C to terminate the curl command, and type exit to disconnect from the Pod.

  5. Remove the runtimeClassName line from the YAML manifest and redeploy the Pod using kubectl apply -f [FILENAME]. The sandboxed Pod is terminated and recreated on a node without GKE Sandbox.

  6. Get the new Pod name, connect to it using kubectl exec, and run the curl command again. This time, results are returned. This example output is truncated.

    ALLOCATE_NODE_CIDRS: "true"
    API_SERVER_TEST_LOG_LEVEL: --v=3
    AUTOSCALER_ENV_VARS: kube_reserved=cpu=60m,memory=960Mi,ephemeral-storage=41Gi;...
    ...
    

    Type exit to disconnect from the Pod.

  7. Remove the deployment:

    kubectl delete deployment fedora
    

Disabling GKE Sandbox

It isn't currently possible to update an existing node pool to disable GKE Sandbox. To stop using GKE Sandbox, first delete any previously-sandboxed Pods. Otherwise, after GKE Sandbox is disabled, those Pods run as regular Pods if no available nodes have GKE Sandbox enabled. Then do one of the following:

  • Delete the node pool where GKE Sandbox was enabled, or
  • Resize that node pool to zero nodes, or
  • Recreate the Pods without specifying a value for runtimeClassName.
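For example, the first two options could be carried out with commands along these lines (bracketed names are placeholders):

# Delete the node pool where GKE Sandbox was enabled
gcloud container node-pools delete [NODE_POOL_NAME] --cluster=[CLUSTER_NAME]

# Or resize that node pool to zero nodes
gcloud container clusters resize [CLUSTER_NAME] \
  --node-pool=[NODE_POOL_NAME] \
  --num-nodes=0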
