You can let Google Kubernetes Engine (GKE) workloads manage resources, like CPU and memory, for child processes by using the Linux cgroups API. This document shows you how to provide containers with read-write access to the cgroups API without running those containers in privileged mode.
When to use writable cgroups
By default, Kubernetes provides all Linux containers with read-only access to
the cgroups API by mounting the /sys/fs/cgroup file system in each container.
You can optionally let GKE mount this file system in read-write
mode in specific Pods to let root processes manage and constrain resources for
child processes.
These writable cgroups help to improve reliability in applications like
Ray that run
system processes and user code in the same container. By writing to the
/sys/fs/cgroup file system, Ray can reserve portions of a container's resources
for critical processes. You can use writable cgroups to improve reliability in
these applications without the security risk of using privileged mode for the
containers.
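As a rough illustration of what a root process can do once /sys/fs/cgroup is writable, the following shell sketch splits a container's cgroup into two child cgroups and caps the memory of one of them. It is not how Ray implements resource reservation. The split_cgroup function, the child cgroup names, and the CGROUP_ROOT variable are hypothetical, and the sketch assumes cgroup v2 with the memory controller available to the container.

```shell
#!/bin/sh
# Illustrative sketch only; not taken from Ray or GKE.
# CGROUP_ROOT is a hypothetical override; in a container it is /sys/fs/cgroup.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# split_cgroup SYSTEM_NAME USER_NAME USER_MEMORY_MAX_BYTES
# Creates two child cgroups and limits the memory of the second one.
split_cgroup() {
  system_dir="$CGROUP_ROOT/$1"
  user_dir="$CGROUP_ROOT/$2"
  mkdir -p "$system_dir" "$user_dir" || return 1

  # cgroup v2 does not let a non-root cgroup hold processes and enable
  # controllers for its children at the same time, so move the existing
  # processes into the system child first. A process that exits in the
  # meantime makes its write fail, which is ignored here.
  for pid in $(cat "$CGROUP_ROOT/cgroup.procs"); do
    echo "$pid" 2>/dev/null > "$system_dir/cgroup.procs" || true
  done

  # Let the children use the memory controller, then cap the user child.
  echo "+memory" > "$CGROUP_ROOT/cgroup.subtree_control" || return 1
  echo "$3" > "$user_dir/memory.max"
}
```

For example, split_cgroup system user 52428800 would cap the user child at 50 MiB. A process joins that child when its PID is written to the child's cgroup.procs file.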
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
- Ensure that you have an Autopilot or Standard cluster running version 1.34 or later. To create a new cluster, see Create an Autopilot cluster.
- Ensure that your cluster uses cgroup v2. For more information, see Migrate nodes to Linux cgroup v2.
Enable writable cgroups for your nodes
Enable writable cgroups on your node pools by customizing the containerd configuration. You can apply this configuration to your entire cluster or to specific node pools in Standard clusters.
In your containerd configuration file, add a writableCgroups section and set
the enabled field to true. For more information, see
Customize containerd configuration in GKE nodes.
    writableCgroups:
      enabled: true
Specify the updated configuration file when you create or update a cluster or a node pool.
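As a sketch of the steps above, the following shell snippet writes the writableCgroups setting to a file and shows, in a comment, how the file might be passed to an existing node pool. The file name, the write_containerd_config helper, and the placeholder values are hypothetical. The --containerd-config-from-file flag is assumed from the Customize containerd configuration in GKE nodes page, so confirm it there before you rely on it.

```shell
#!/bin/sh
# write_containerd_config PATH
# Hypothetical helper: writes a containerd configuration file that contains
# only the writableCgroups section described above.
write_containerd_config() {
  cat > "$1" <<'EOF'
writableCgroups:
  enabled: true
EOF
}

# Example use (not run here). NODE_POOL_NAME, CLUSTER_NAME, and LOCATION are
# placeholders, and the flag name is an assumption to verify in the GKE docs:
#
#   write_containerd_config containerd-config.yaml
#   gcloud container node-pools update NODE_POOL_NAME \
#       --cluster=CLUSTER_NAME \
#       --location=LOCATION \
#       --containerd-config-from-file=containerd-config.yaml
```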
Use writable cgroups in workloads
After you enable writable cgroups for your cluster or node pools, configure your workloads to meet all of the following requirements:
- Select a node that has writable cgroups enabled.
- Enable writable cgroups for one or more containers in the Pod.
- Use the Guaranteed Quality of Service (QoS) class by meeting one of the following conditions:
  - For workloads that specify resources at the Pod level, set equal values for resources.requests and resources.limits in the Pod specification.
  - For workloads that specify resources for each container, set equal values for resources.requests and resources.limits in the specification of every container in the Pod, including init containers.
To configure these requirements, follow these steps:
1. To select nodes that have writable cgroups enabled, add the node.gke.io/enable-writable-cgroups: "true" label to the spec.nodeSelector field in your Pod specification:
       node.gke.io/enable-writable-cgroups: "true"
2. To enable writable cgroups for your workload, add one of the following annotations to the metadata.annotations field in your Pod specification:
   - Enable for the entire Pod:
         node.gke.io/enable-writable-cgroups: "true"
   - Enable for a specific container in the Pod:
         node.gke.io/enable-writable-cgroups.CONTAINER_NAME: "true"
     Replace CONTAINER_NAME with the name of the container.
3. To configure the Guaranteed QoS class for your Pod, specify equal CPU and memory requests and limits for every container in the Pod or for the entire Pod, as in the following example:
       resources:
         requests:
           cpu: "100m"
           memory: "100Mi"
         limits:
           cpu: "100m"
           memory: "100Mi"
   You must specify equal requests and limits for every container, even if you enable writable cgroups only for one of the containers in the Pod.
Your final Pod specification should be similar to the following examples.
This example enables writable cgroups for all containers in the Pod:
    apiVersion: v1
    kind: Pod
    metadata:
      name: writable-cgroups-pod
      annotations:
        node.gke.io/enable-writable-cgroups: "true"
    spec:
      nodeSelector:
        node.gke.io/enable-writable-cgroups: "true"
      containers:
      - name: container
        image: busybox:stable
        command: ["/bin/sh", "-c"]
        args:
        - |
          trap 'echo "Caught SIGTERM, exiting..."; exit 0' TERM
          echo "Waiting for termination signal..."
          while true; do sleep 1; done
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "100m"
            memory: "100Mi"
This example enables writable cgroups for a specific container in a multi-container Pod:
    apiVersion: v1
    kind: Pod
    metadata:
      name: writable-cgroups-per-container
      annotations:
        node.gke.io/enable-writable-cgroups.busybox-container: "true"
    spec:
      nodeSelector:
        node.gke.io/enable-writable-cgroups: "true"
      containers:
      - name: busybox-container
        image: busybox:stable
        command: ["/bin/sh", "-c"]
        args:
        - |
          trap 'echo "Caught SIGTERM, exiting..."; exit 0' TERM
          echo "Waiting for termination signal..."
          while true; do sleep 1; done
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "100m"
            memory: "100Mi"
      - name: container-disabled
        image: busybox:stable
        command: ["/bin/sh", "-c"]
        args:
        - |
          trap 'echo "Caught SIGTERM, exiting..."; exit 0' TERM
          echo "Waiting for termination signal..."
          while true; do sleep 1; done
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "100m"
            memory: "100Mi"
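After you apply a manifest like the preceding examples, you can check that Kubernetes assigned the Guaranteed QoS class by reading the Pod's status.qosClass field. The following sketch wraps that check in a hypothetical helper, require_guaranteed, so that a script can stop early when the class is wrong. POD_NAME is a placeholder.

```shell
#!/bin/sh
# require_guaranteed QOS_CLASS
# Hypothetical helper: succeeds only if the given QoS class is Guaranteed.
require_guaranteed() {
  if [ "$1" = "Guaranteed" ]; then
    echo "QoS class is Guaranteed"
  else
    echo "QoS class is '$1', expected Guaranteed" >&2
    return 1
  fi
}

# Example use (not run here); POD_NAME is a placeholder:
#
#   qos="$(kubectl get pod POD_NAME -o jsonpath='{.status.qosClass}')"
#   require_guaranteed "$qos"
```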
Verify that the cgroup file system is writable
To verify the permissions on the /sys/fs/cgroup file system for a Pod or a
container, follow these steps:
1. Identify a Pod that you want to check. You can use one of the sample Pods from the Use writable cgroups in workloads section.
2. Create a shell session in the Pod:
       kubectl exec -it POD_NAME -- /bin/sh
   Replace POD_NAME with the name of the Pod.
3. Describe the mounted cgroup file system:
       mount | grep cgroup
   The output is similar to the following:
       cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
   In this output, rw indicates that the file system is writable. If you see ro in the output, the file system is read-only.
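The same check can be scripted. The following sketch reads mount output on standard input and prints rw or ro for the /sys/fs/cgroup mount. The cgroup_mount_mode helper is hypothetical and assumes the output format shown in the preceding steps.

```shell
#!/bin/sh
# cgroup_mount_mode
# Hypothetical helper: reads `mount` output on stdin and prints "rw" or "ro"
# for the file system mounted at /sys/fs/cgroup. Prints nothing if that
# mount point is not present in the input.
cgroup_mount_mode() {
  awk '$3 == "/sys/fs/cgroup" {
    opts = $NF                  # last field, for example "(rw,nosuid,...)"
    gsub(/[()]/, "", opts)      # drop the surrounding parentheses
    n = split(opts, parts, ",")
    for (i = 1; i <= n; i++) {
      if (parts[i] == "rw" || parts[i] == "ro") {
        print parts[i]
        exit
      }
    }
  }'
}

# Example use (not run here); POD_NAME is a placeholder:
#
#   kubectl exec POD_NAME -- mount | cgroup_mount_mode
```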