GKE Sandbox

This page describes how GKE Sandbox protects the host kernel on your nodes when containers in a Pod execute unknown or untrusted code. For example, multi-tenant clusters, such as those run by software-as-a-service (SaaS) providers, often execute unknown code submitted by their users.

GKE Sandbox uses gVisor, an open source project. This topic broadly discusses gVisor, but you can learn more details by reading the official gVisor documentation.

Overview

GKE Sandbox provides an extra layer of security to prevent untrusted code from affecting the host kernel on your cluster nodes. Before discussing how GKE Sandbox works, it's useful to understand the nature of the potential risks it helps mitigate.

A container runtime such as Docker or containerd provides some degree of isolation between the container's processes and the kernel running on the node. However, the container runtime often runs as a privileged user on the node and has access to most system calls into the host kernel.

Potential threats

Multi-tenant clusters and clusters whose containers run untrusted workloads are more exposed to security vulnerabilities than other clusters. Examples include SaaS providers, web-hosting providers, or other organizations that allow their users to upload and run code. A flaw in the container runtime or in the host kernel could allow a process running within a container to "escape" the container and affect the node's kernel, potentially bringing down the node.

The potential also exists for a malicious tenant to gain access to and exfiltrate another tenant's data in memory or on disk, by exploiting such a defect.

Finally, an untrusted workload could potentially access other Google Cloud services or cluster metadata.

How GKE Sandbox mitigates these threats

gVisor is a userspace re-implementation of the Linux kernel API that does not need elevated privileges. In conjunction with a container runtime such as containerd, the userspace kernel re-implements the majority of system calls and services them on behalf of the host kernel. Direct access to the host kernel is limited. See the gVisor architecture guide for detailed information about how this works. From the container's point of view, gVisor is nearly transparent and does not require any changes to the containerized application.

When you enable GKE Sandbox on a node pool, a sandbox is created for each Pod running on a node in that node pool. In addition, nodes running sandboxed Pods are prevented from accessing other Google Cloud services or cluster metadata.

Each sandbox uses its own userspace kernel. With this in mind, you can make decisions about how to group your containers into Pods, based on the level of isolation you require and the characteristics of your applications.
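
As a minimal sketch of that grouping decision, the manifest below places two containers in one Pod so that they share a single sandbox. It assumes the gvisor RuntimeClass name that GKE Sandbox is generally documented to use; the Pod name, container names, and images are hypothetical placeholders, not values from this page.

```yaml
# Hypothetical Pod: both containers share one sandbox (one userspace kernel).
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-app                  # hypothetical name
spec:
  runtimeClassName: gvisor            # assumption: RuntimeClass that requests the sandbox
  containers:
  - name: web                         # hypothetical container
    image: example.com/tenant-a/web:1.0       # placeholder image
  - name: worker                      # hypothetical container
    image: example.com/tenant-a/worker:1.0    # placeholder image
```

Containers that must be isolated from each other would instead go in separate Pods, so that each gets its own sandbox.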

GKE Sandbox is an especially good fit for the following types of applications. See Limitations for more information to help you decide which applications to sandbox.

  • Untrusted or third-party applications using runtimes such as Rust, Java, Python, PHP, Node.js, or Golang
  • Web server front-ends, caches, or proxies
  • Applications processing external media or data using CPUs
  • Machine-learning workloads using CPUs
  • CPU-intensive or memory-intensive applications

Additional security recommendations

When using GKE Sandbox, we also recommend the following:

  • Specify resource limits on all containers running in a sandbox. This protects against the risk of a defective or malicious application starving the node of resources and negatively impacting other applications or system processes running on the node.
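
The recommendation above might look like the following sketch. The resources field is standard Kubernetes; the gvisor RuntimeClass name, the Pod name, the image, and the specific CPU and memory values are assumptions chosen only for illustration.

```yaml
# Hypothetical sandboxed Pod with explicit requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-with-limits         # hypothetical name
spec:
  runtimeClassName: gvisor            # assumption: RuntimeClass that requests the sandbox
  containers:
  - name: app                         # hypothetical container
    image: example.com/app:1.0        # placeholder image
    resources:
      requests:                       # what the scheduler reserves for the container
        cpu: "250m"
        memory: "256Mi"
      limits:                         # hard caps enforced at runtime
        cpu: "500m"                   # illustrative values; size for your workload
        memory: "512Mi"
```

Setting a limit on every container in the Pod keeps a single runaway container from consuming resources that other workloads on the node need.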

Limitations

GKE Sandbox works well with many applications, but not all. This section provides more information about the current limitations of GKE Sandbox.

Node pool configuration

  • You cannot enable GKE Sandbox on the default node pool.
  • When using GKE Sandbox, your cluster must have at least two node pools. You must always have at least one node pool where GKE Sandbox is disabled. This node pool must contain at least one node, even if all your workloads are sandboxed.

Access to cluster metadata

  • Nodes running sandboxed Pods are prevented from accessing cluster metadata at the level of the operating system on the node.
  • You can run regular Pods on a node with GKE Sandbox enabled. However, by default those regular Pods cannot access Google Cloud services or cluster metadata. Use Workload Identity to grant Pods access to Google Cloud services.
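
As a rough sketch of the Workload Identity setup mentioned above, a Kubernetes service account can be annotated with the Google service account it should act as, and the Pod then runs under that Kubernetes service account. The iam.gke.io/gcp-service-account annotation is the one Workload Identity is generally documented to use; all names and the PROJECT_ID placeholder are hypothetical. The IAM binding between the two accounts and the scheduling of the Pod onto the sandbox-enabled node pool are not shown.

```yaml
# Hypothetical Kubernetes service account mapped to a Google service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa                       # hypothetical name
  namespace: default
  annotations:
    # assumption: replace with your own Google service account email
    iam.gke.io/gcp-service-account: app-gsa@PROJECT_ID.iam.gserviceaccount.com
---
# Hypothetical regular (non-sandboxed) Pod that uses the service account above.
apiVersion: v1
kind: Pod
metadata:
  name: regular-pod                   # hypothetical name
  namespace: default
spec:
  serviceAccountName: app-ksa         # ties the Pod to the annotated service account
  containers:
  - name: app                         # hypothetical container
    image: example.com/app:1.0        # placeholder image
```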

Hyper-Threading is disabled

gVisor nodes have Hyper-Threading disabled by default to mitigate Microarchitectural Data Sampling (MDS) vulnerabilities announced by Intel. To enable Hyper-Threading for a node pool:

  1. Create a new node pool in your cluster with the node label cloud.google.com/gke-smt-disabled=false:

    gcloud container node-pools create smt-enabled --cluster=[CLUSTER_NAME] \
      --node-labels=cloud.google.com/gke-smt-disabled=false \
      --image-type=cos_containerd \
      --sandbox type=gvisor
    
  2. Deploy the DaemonSet to the node pool. The DaemonSet will only run on nodes with the cloud.google.com/gke-smt-disabled=false label. It will enable Hyper-Threading and then reboot the node.

    kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/enable-smt.yaml
    
  3. After the node reboots, ensure that the DaemonSet Pods are in the Running state.

    kubectl get pods --selector=name=enable-smt -n kube-system
    
  4. You should get a response similar to:

    NAME               READY     STATUS    RESTARTS   AGE
    enable-smt-2xnnc   1/1       Running   0          6m
    
  5. Check that the message "SMT has been enabled" appears in the logs of the Pods, replacing enable-smt-2xnnc with the name of your Pod.

    kubectl logs enable-smt-2xnnc enable-smt -n kube-system
    

Capabilities

By default, the container is prevented from opening raw sockets, to reduce the potential for malicious attacks. Certain network-related tools, such as ping and tcpdump, create raw sockets as part of their core functionality. To enable raw sockets, you must explicitly add the NET_RAW capability to the container's security context:

spec:
  containers:
  - name: my-container
    securityContext:
      capabilities:
        add: ["NET_RAW"]

Incompatible features

It is not currently possible to use GKE Sandbox along with the following Kubernetes features:

Workload characteristics

Imposing an additional layer of indirection for accessing the node's kernel comes with performance trade-offs. GKE Sandbox provides the most tangible benefit on large multi-tenant clusters where isolation is important. Keep the following guidelines in mind when testing your workloads with GKE Sandbox.

System calls

Workloads that generate a large volume of low-overhead system calls, such as a large number of small IO operations, may require more system resources when running in a sandbox, so you may need to use more powerful nodes or add additional nodes to your cluster.

Direct access to hardware or virtualization

If your workload needs any of the following, GKE Sandbox might not be a good fit because it prevents direct access to the host kernel on the node:

  • Direct access to the node's hardware
  • Kernel-level virtualization features
  • Privileged containers

