Customizing node system configuration

Standard

This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration using a configuration file called a node system configuration.

Overview

You can customize your node configuration by using various methods. For example, you can specify parameters such as the machine type and minimum CPU platform when you create a node pool.

A node system configuration is a configuration file that lets you adjust a limited set of system settings. You can use a node system configuration to specify custom settings for the Kubernetes node agent (kubelet) and low-level Linux kernel configurations (sysctl) in your node pools.

You can also customize your containerd container runtime on your GKE nodes by using a different file called a runtime configuration file. For instructions, see Customize containerd configuration in GKE nodes.

You can also use DaemonSets to customize nodes, such as in Automatically bootstrapping GKE nodes with DaemonSets.

Using a node system configuration

You can customize your node system configuration by using any of the following methods:

Configuration file: available in Standard mode. You use a YAML file that contains the kubelet and Linux kernel configuration parameters. The steps on this page show you how to create and use a configuration file.
ComputeClass: available in Autopilot mode and Standard mode. You specify the node system configuration in your GKE ComputeClass specification. Compute classes let you define sets of node attributes for GKE to use when it scales your cluster up. Available in GKE version 1.32.1-gke.1729000 and later. For details, see About compute classes in GKE.

To use a node system configuration file, do the following:

Create a configuration file. This file contains your kubelet and sysctl configurations.
Add the configuration when you create a cluster, or when you create or update a node pool.

Creating a configuration file

Write your node system configuration file in YAML. The following example shows you how to add configurations for the kubelet and sysctl options:

kubeletConfig:
  cpuManagerPolicy: static
  allowedUnsafeSysctls:
  - 'kernel.shm*'
  - 'kernel.msg*'
  - 'kernel.sem'
  - 'fs.mqueue*'
  - 'net.*'
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
    net.ipv4.tcp_rmem: '4096 87380 6291456'

In this example:

cpuManagerPolicy: static configures the kubelet to use the static CPU management policy.
net.core.somaxconn: '2048' limits the socket listen() backlog to 2,048 pending connections.
net.ipv4.tcp_rmem: '4096 87380 6291456' sets the minimum, default, and maximum value of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes respectively.

If you want to add configurations solely for the kubelet or sysctl, only include that section in your configuration file. For example, to add a kubelet configuration, create the following file:

kubeletConfig:
  cpuManagerPolicy: static

For a complete list of the fields that you can add to your configuration file, see the Kubelet configuration options and Sysctl configuration options sections.

Adding the configuration to a node pool

After you have created the configuration file, add the --system-config-from-file flag by using the Google Cloud CLI. You can add this flag when you create a cluster, or when you create or update a node pool. You cannot add a node system configuration with the Google Cloud console.

Create a cluster with the node system configuration

You can add a node system configuration during cluster creation with the gcloud CLI or Terraform. The following instructions apply the node system configuration to the default node pool:

gcloud CLI

gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

CLUSTER_NAME: the name for your cluster
LOCATION: the compute zone or region of the cluster
SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

After you have applied a node system configuration, the default node pool of the cluster uses the settings that you defined.

Terraform

To create a regional cluster with a customized node system configuration by using Terraform, refer to the following example:

resource "google_container_cluster" "default" {
  name     = "gke-standard-regional-cluster"
  location = "us-central1"

  initial_node_count = 1

  node_config {
    # Kubelet configuration
    kubelet_config {
      cpu_manager_policy = "static"
    }

    linux_node_config {
      # Sysctl configuration
      sysctls = {
        "net.core.netdev_max_backlog" = "10000"
      }

      # Linux cgroup mode configuration
      cgroup_mode = "CGROUP_MODE_V2"

      # Linux huge page configuration
      hugepages_config {
        hugepage_size_2m = "1024"
      }
    }
  }
}

For more information about using Terraform, see Terraform support for GKE.

Create a new node pool with the node system configuration

You can add a node system configuration when you use the gcloud CLI or Terraform to create a new node pool. You can also update the node system configuration of an existing node pool.

The following instructions apply the node system configuration to a new node pool:

gcloud CLI

gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

POOL_NAME: the name for your node pool
CLUSTER_NAME: the name of the cluster that you want to add a node pool to
LOCATION: the compute zone or region of the cluster
SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

Terraform

To create a node pool with a customized node system configuration by using Terraform, refer to the following example:

resource "google_container_node_pool" "default" {
  name    = "gke-standard-regional-node-pool"
  cluster = google_container_cluster.default.name

  node_config {
    # Kubelet configuration
    kubelet_config {
      cpu_manager_policy = "static"
    }

    linux_node_config {
      # Sysctl configuration
      sysctls = {
        "net.core.netdev_max_backlog" = "10000"
      }

      # Linux cgroup mode configuration
      cgroup_mode = "CGROUP_MODE_V2"

      # Linux huge page configuration
      hugepages_config {
        hugepage_size_2m = "1024"
      }
    }
  }
}

For more information about using Terraform, see Terraform support for GKE.

Update the node system configuration of an existing node pool

Run the following command:

gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

POOL_NAME: the name of the node pool that you want to update
CLUSTER_NAME: the name of the cluster that you want to update
LOCATION: the compute zone or region of the cluster
SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

This change requires recreating the nodes, which can disrupt your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies. For more information about node updates, see Planning for node update disruptions.

Editing a node system configuration

To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.

Editing by creating a node pool

To edit a node system configuration by creating a node pool:

Create a configuration file with the configuration that you want.
Add the configuration to a new node pool.
Migrate your workloads to the new node pool.
Delete the old node pool.

Editing by updating an existing node pool

To edit the node system configuration of an existing node pool, follow the instructions in Update the node system configuration of an existing node pool. Updating a node system configuration overrides the node pool's existing system configuration with the new one, which requires recreating the nodes. Any parameters that you omit during an update are reset to their respective defaults.

If you want to reset the node system configuration back to the defaults, update your configuration file with empty values for the kubelet and sysctl. For example:

kubeletConfig: {}
linuxConfig:
  sysctl: {}
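
The override behavior described in this section can be pictured with a short Python sketch. It is an illustration only, not GKE's implementation: the DEFAULTS dictionary holds a few default values copied from the Kubelet configuration options table on this page, and effective_kubelet_config is a hypothetical helper name.

```python
# Illustrative model only; GKE applies the configuration on the server side.
# A few kubelet defaults copied from the options table on this page.
DEFAULTS = {
    "cpuManagerPolicy": "none",
    "cpuCFSQuota": True,
    "containerLogMaxSize": "10Mi",
    "containerLogMaxFiles": 5,
}


def effective_kubelet_config(new_config):
    """Return the settings in effect after an update (hypothetical helper).

    The update replaces the previous configuration instead of merging with
    it, so any parameter missing from new_config falls back to its default.
    """
    return {**DEFAULTS, **new_config}


# The node pool is first configured with two custom settings.
first = effective_kubelet_config(
    {"cpuManagerPolicy": "static", "containerLogMaxFiles": 8}
)

# A later update mentions only cpuCFSQuota; the earlier settings are not kept.
second = effective_kubelet_config({"cpuCFSQuota": False})

print(first["cpuManagerPolicy"], first["containerLogMaxFiles"])
print(second["cpuManagerPolicy"], second["containerLogMaxFiles"])
```

In this model the second update leaves cpuManagerPolicy and containerLogMaxFiles at their defaults, which is why a configuration file used for an update should list every setting that you want to keep.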

Deleting a node system configuration

To remove a node system configuration:

Create a node pool without a node system configuration.
Migrate your workloads to the new node pool.
Delete the node pool that has the old node system configuration.

Kubelet configuration options

The following table shows you the kubelet options that you can modify.

Kubelet config settings	Restrictions	Default setting	Description
allowedUnsafeSysctls	List of `sysctl` names or groups. Allowed `sysctl` groups: `kernel.shm*`, `kernel.msg*`, `kernel.sem`, `fs.mqueue.*`, and `net.*`. Example: `[kernel.msg*, net.ipv4.route.min_pmtu]`.	`none`	This setting defines an allowlist of unsafe `sysctl` names or `sysctl` groups that can be set on Pods. Available on GKE versions 1.32.0-gke.1448000 or later.
containerLogMaxSize	Value must be a positive number followed by a unit suffix, between `10Mi` and `500Mi`, inclusive. Valid units are `Ki, Mi, Gi`.	`10Mi`	This setting controls the containerLogMaxSize setting of the container log rotation policy, which lets you configure the maximum size for each log file. The default value is `10Mi`.
containerLogMaxFiles	Value must be an integer between `2` and `10`, inclusive.	`5`	This setting controls the containerLogMaxFiles setting of the container log rotation policy, which lets you configure the maximum number of log files allowed for each container. The default value is `5`. The total log size `(container_log_max_size*container_log_max_files)` per container cannot exceed 1 percent of the total storage of the node.
cpuCFSQuota	Value must be `true` or `false`	`true`	This setting enforces the Pod's CPU limit. Setting this value to `false` means that the CPU limits for Pods are ignored. Ignoring CPU limits might be desirable in certain scenarios where Pods are sensitive to CPU limits. The risk of disabling `cpuCFSQuota` is that a rogue Pod can consume more CPU resources than intended.
cpuCFSQuotaPeriod	Value must be a duration of time	`"100ms"`	This setting sets the CPU CFS quota period value, `cpu.cfs_period_us`, which specifies the period of how often a cgroup's access to CPU resources should be reallocated. This option lets you tune the CPU throttling behavior.
imageGcLowThresholdPercent	Value must be an integer between 10 and 85, inclusive, and lower than `imageGcHighThresholdPercent`	`80`	`imageGcLowThresholdPercent` is the percent of disk usage below which image garbage collection never runs, that is, the lowest disk usage to garbage collect to. The percent is calculated by dividing this field value by 100. When specified, the value must be less than `imageGcHighThresholdPercent`.
imageGcHighThresholdPercent	Value must be integer between 10 and 85, inclusive, and higher than `imageGcLowThresholdPercent`	`85`	`imageGcHighThresholdPercent` is the percent of disk usage above which image garbage collection is run. Highest disk usage to garbage collect to. The percent is calculated by dividing this field value by 100. When specified, the value must be greater than `imageGcLowThresholdPercent`.
imageMinimumGcAge	Value must be a duration of time not greater than '2m'. Valid time units are `"ns", "us" (or "µs"), "ms", "s", "m", "h"`	`2m`	`imageMinimumGcAge` is the minimum age for an unused image before it is garbage collected.
imageMaximumGcAge	Value must be a duration of time	`0s`	`imageMaximumGcAge` is the maximum age an image can be unused before it is garbage collected. The default of `0s` disables this field, meaning that images aren't garbage collected based on being unused for too long. When specified, the value must be greater than `imageMinimumGcAge`. Available on GKE versions 1.30.7-gke.1076000, 1.31.3-gke.1023000, or later.
insecureKubeletReadonlyPortEnabled	Value must be a boolean value (`true` or `false`)	`true`	This setting controls the insecure kubelet read-only port `10255` on every new node pool in your cluster. Set the value to `false` to disable the port. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level.
podPidsLimit	Value must be between 1024 and 4194304	`none`	This setting sets the maximum number of process IDs (PIDs) that each Pod can use.
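
Some of the restrictions in the preceding table interact: the log rotation settings share a storage budget, and the two image garbage collection thresholds must be ordered. The following Python sketch checks a kubelet configuration against those rules before you apply it. It is a minimal illustration under stated assumptions, not a GKE tool: parse_size and check_kubelet_config are hypothetical helper names, the node storage size is supplied by the caller, and GKE's own validation remains authoritative.

```python
import re

# Binary unit suffixes accepted by containerLogMaxSize.
UNIT_BYTES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_size(value):
    """Convert a quantity such as '10Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_BYTES[unit]


def check_kubelet_config(config, node_storage_bytes):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []

    # Log rotation: fall back to the documented defaults when unset.
    max_size = parse_size(config.get("containerLogMaxSize", "10Mi"))
    max_files = config.get("containerLogMaxFiles", 5)
    if not parse_size("10Mi") <= max_size <= parse_size("500Mi"):
        problems.append("containerLogMaxSize must be between 10Mi and 500Mi")
    if not 2 <= max_files <= 10:
        problems.append("containerLogMaxFiles must be between 2 and 10")
    # Size times file count may not exceed 1 percent of the node storage.
    if max_size * max_files * 100 > node_storage_bytes:
        problems.append("total container log size exceeds 1% of node storage")

    # Image garbage collection thresholds.
    low = config.get("imageGcLowThresholdPercent", 80)
    high = config.get("imageGcHighThresholdPercent", 85)
    if not (10 <= low <= 85 and 10 <= high <= 85):
        problems.append("image GC thresholds must be between 10 and 85")
    if low >= high:
        problems.append("imageGcLowThresholdPercent must be lower than "
                        "imageGcHighThresholdPercent")

    return problems


# Example: a node with a 100 GiB disk has a 1 GiB (1024 MiB) log budget.
disk = 100 * 1024**3
print(check_kubelet_config(
    {"containerLogMaxSize": "100Mi", "containerLogMaxFiles": 10}, disk))
print(check_kubelet_config(
    {"containerLogMaxSize": "200Mi", "containerLogMaxFiles": 8}, disk))
```

The first configuration uses 1000 MiB of the 1024 MiB budget and passes. The second requests 1600 MiB and is reported as exceeding the 1 percent limit.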

Resource Managers

Kubernetes offers a suite of Resource Managers. You can configure these Resource Managers to coordinate and optimize the alignment of node resources for Pods configured with specific requirements for CPUs, devices, and memory (hugepages) resources. For more information, see Node Resource Managers.

With GKE, you can configure the following settings for these Resource Managers. You can configure these settings independently of each other, however, we recommend using these settings together to align resource management. You can use the Topology Manager settings together with the CPU Manager and Memory Manager settings to align CPU and memory with other requested resources in the Pod spec.

Kubelet config settings	Restrictions	Default setting	Description
cpuManagerPolicy:	Value must be `none` or `static`	`none`	This setting controls the kubelet's CPU Manager policy. The default value is `none`, which is the default CPU affinity scheme, providing no affinity beyond what the OS scheduler does automatically. Setting this value to `static` allows Pods that are both in the `Guaranteed` QoS class and have integer CPU requests to be assigned exclusive CPUs.
memoryManager: policy:	Value must be `None` or `Static`	`None`	This setting controls the kubelet's Memory Manager policy. With the default value of `None`, Kubernetes acts the same as if the Memory Manager is not present. For details, see None policy. If you set this value to `Static`, the Memory Manager policy sends topology hints that depend on the type of Pod. For details, see Static policy. This setting is supported for clusters with the control plane running GKE version 1.32.3-gke.1785000 or later.
topologyManager: policy: scope:	Value must be one of the supported settings for each of the respective fields	`topologyManager.policy` default is `none`; `topologyManager.scope` default is `container`	These settings control the kubelet's Topology Manager policy, which coordinates the set of components responsible for performance optimizations related to CPU isolation, memory, and device locality. You can set the policy and scope settings independently of each other. For more information about these settings, see Topology manager scopes and policies. The following GKE resources support this setting: Clusters with the control plane running GKE version 1.32.3-gke.1785000 or later. For clusters with the control plane and nodes running 1.33.0-gke.1712000 or later, the Topology Manager also receives information about GPU topology. Nodes with the following machine types: A2, A3, G2, A4, C4A.

The following example shows a node system configuration where all three Resource Manager policies are configured:

kubeletConfig:
  cpuManagerPolicy: static
  memoryManager:
    policy: Static
  topologyManager:
    policy: best-effort
    scope: pod

Sysctl configuration options

To tune the performance of your system, you can modify the following kernel attributes:

kernel.shmmni
kernel.shmmax
kernel.shmall
net.core.busy_poll
net.core.busy_read
net.core.netdev_max_backlog
net.core.rmem_max
net.core.rmem_default
net.core.wmem_default
net.core.wmem_max
net.core.optmem_max
net.core.somaxconn
net.ipv4.tcp_rmem
net.ipv4.tcp_wmem
net.ipv4.tcp_tw_reuse
net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.default.disable_ipv6
net.netfilter.nf_conntrack_acct - Available on GKE versions 1.32.0-gke.1448000 or later.
net.netfilter.nf_conntrack_max - Available on GKE versions 1.32.0-gke.1448000 or later.
net.netfilter.nf_conntrack_buckets - Available on GKE versions 1.32.0-gke.1448000 or later.
net.netfilter.nf_conntrack_tcp_timeout_close_wait - Available on GKE versions 1.32.0-gke.1448000 or later.
net.netfilter.nf_conntrack_tcp_timeout_established - Available on GKE versions 1.32.0-gke.1448000 or later.
net.netfilter.nf_conntrack_tcp_timeout_time_wait - Available on GKE versions 1.32.0-gke.1448000 or later.
vm.max_map_count

Different Linux namespaces might have unique values for a given sysctl, while others are global for the entire node. Updating sysctl options by using a node system configuration ensures that the sysctl is applied globally on the node and in each namespace, resulting in each Pod having identical sysctl values in each Linux namespace.
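
Because only the attributes in the preceding list can be set, it can help to check a planned sysctl section against the list before you apply it. The following Python sketch does that. It is an illustration only: unsupported_sysctls is a hypothetical helper, the set is copied from the list on this page, and the sketch doesn't check the GKE version requirements of the net.netfilter entries.

```python
# Kernel attributes copied from the list of supported sysctls on this page.
SUPPORTED_SYSCTLS = frozenset({
    "kernel.shmmni",
    "kernel.shmmax",
    "kernel.shmall",
    "net.core.busy_poll",
    "net.core.busy_read",
    "net.core.netdev_max_backlog",
    "net.core.rmem_max",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.core.wmem_max",
    "net.core.optmem_max",
    "net.core.somaxconn",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv6.conf.all.disable_ipv6",
    "net.ipv6.conf.default.disable_ipv6",
    "net.netfilter.nf_conntrack_acct",
    "net.netfilter.nf_conntrack_max",
    "net.netfilter.nf_conntrack_buckets",
    "net.netfilter.nf_conntrack_tcp_timeout_close_wait",
    "net.netfilter.nf_conntrack_tcp_timeout_established",
    "net.netfilter.nf_conntrack_tcp_timeout_time_wait",
    "vm.max_map_count",
})


def unsupported_sysctls(sysctl_config):
    """Return the names in sysctl_config that are not in the supported list."""
    return sorted(name for name in sysctl_config
                  if name not in SUPPORTED_SYSCTLS)


# The mapping mirrors the linuxConfig.sysctl section of a configuration file.
planned = {
    "net.core.somaxconn": "2048",
    "net.ipv4.tcp_rmem": "4096 87380 6291456",
    "vm.swappiness": "10",  # not in the list above
}
print(unsupported_sysctls(planned))
```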

Linux cgroup mode configuration options

The kubelet and the container runtime use Linux kernel cgroups for resource management, such as limiting how much CPU or memory each container in a Pod can access. There are two versions of the cgroup subsystem in the kernel: cgroupv1 and cgroupv2. Kubernetes support for cgroupv2 was introduced as alpha in Kubernetes version 1.18, beta in 1.22, and GA in 1.25. For more details, refer to the Kubernetes cgroups v2 documentation.

Node system configuration lets you customize the cgroup configuration of your node pools. You can use cgroupv1 or cgroupv2. GKE uses cgroupv2 for new Standard node pools running version 1.26 and later, and cgroupv1 for versions earlier than 1.26. For node pools created with node auto-provisioning, the cgroup configuration depends on the initial cluster version, not the node pool version. cgroupv1 is not supported on Arm machines.

You can use node system configuration to change the setting for a node pool to use cgroupv1 or cgroupv2 explicitly. Just upgrading an existing node pool to 1.26 doesn't change the setting to cgroupv2, as existing node pools created running a version earlier than 1.26—without a customized cgroup configuration—continue to use cgroupv1 unless you explicitly specify otherwise.

For example, to configure your node pool to use cgroupv2, use a node system configuration file such as:

linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'

The supported cgroupMode options are:

CGROUP_MODE_V1: Use cgroupv1 on the node pool.
CGROUP_MODE_V2: Use cgroupv2 on the node pool.
CGROUP_MODE_UNSPECIFIED: Use the default GKE cgroup configuration.
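
The interaction between the cgroupMode setting and the version-based default can be summarized in a few lines of Python. This is a simplified sketch of the rules described in this section, not GKE's implementation: effective_cgroup_version is a hypothetical helper, the version is passed as a (major, minor) tuple, and the sketch doesn't model node auto-provisioning (where the default follows the initial cluster version) or the lack of cgroupv1 support on Arm machines.

```python
def effective_cgroup_version(created_version, cgroup_mode="CGROUP_MODE_UNSPECIFIED"):
    """Return 'v1' or 'v2' for a Standard node pool (simplified model).

    created_version: (major, minor) of the GKE version that the node pool
    was created with, not the version it was later upgraded to.
    cgroup_mode: the cgroupMode value from the node system configuration.
    """
    if cgroup_mode == "CGROUP_MODE_V1":
        return "v1"
    if cgroup_mode == "CGROUP_MODE_V2":
        return "v2"
    if cgroup_mode != "CGROUP_MODE_UNSPECIFIED":
        raise ValueError(f"unknown cgroupMode: {cgroup_mode!r}")
    # No explicit setting: the default depends on the creation version.
    return "v2" if created_version >= (1, 26) else "v1"


# A pool created on 1.25 stays on cgroupv1 after an upgrade unless the
# configuration sets the mode explicitly.
print(effective_cgroup_version((1, 25)))
print(effective_cgroup_version((1, 25), "CGROUP_MODE_V2"))
print(effective_cgroup_version((1, 27)))
```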

To use cgroupv2, the following requirements and limitations apply:

For a node pool running a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or newer. Alternatively, use gcloud beta with version 395.0.0 or newer.
Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
You must use the Container-Optimized OS with containerd node image.
If any of your workloads depend on reading the cgroup filesystem (/sys/fs/cgroup/...), ensure they are compatible with the cgroupv2 API.
Ensure that any monitoring or third-party tools are compatible with cgroupv2.
If you run Java workloads, we recommend that you use JDK versions that fully support cgroupv2, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.

Verify cgroup configuration

When you add a node system configuration, GKE must recreate the nodes to implement the changes. After you've added the configuration to a node pool and the nodes have been recreated, you can verify the new configuration.

You can verify the cgroup configuration for nodes in a node pool with gcloud CLI or the kubectl command-line tool:

gcloud CLI

Check the cgroup configuration for a node pool:

gcloud container node-pools describe POOL_NAME \
    --format='value(Config.effectiveCgroupMode)'

Replace POOL_NAME with the name of your node pool.

The potential output is one of the following:

EFFECTIVE_CGROUP_MODE_V1: the nodes use cgroupv1
EFFECTIVE_CGROUP_MODE_V2: the nodes use cgroupv2

The output only shows the new cgroup configuration after the nodes in the node pool have been recreated. The output is empty for Windows server node pools, which don't support cgroup.

kubectl

To verify the cgroup configuration for nodes in this node pool with kubectl, pick a node and connect to it using the following instructions:

Create an interactive shell with any node in the node pool. Replace mynode in the command with the name of any node in the node pool.
Identify the cgroup version on Linux nodes.
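
For workloads that read the cgroup filesystem themselves, or as a check to run from a shell on the node, the cgroup version can also be detected in code. The following Python sketch is a heuristic, not an official GKE or Kubernetes tool: it relies on the fact that the root of a cgroupv2 hierarchy contains a cgroup.controllers file, and it treats any other existing cgroup directory as cgroupv1. The function name detect_cgroup_version is hypothetical.

```python
from pathlib import Path


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup version from the layout of the cgroup filesystem.

    Returns 'v2' when the unified hierarchy marker file is present, 'v1'
    when the directory exists without it, and 'unknown' otherwise (for
    example, on a non-Linux machine).
    """
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "v2"
    if root.is_dir():
        return "v1"
    return "unknown"


# Run this on the node, or in a container that mounts the node's cgroup
# filesystem, to see which version is in effect.
print(detect_cgroup_version())
```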

Linux huge page configuration options

You can use the node system configuration file to enable the Linux kernel huge pages feature on your nodes.

Kubernetes supports huge pages on nodes as a type of resource, similar to CPU or memory. Use the following parameters to instruct your Kubernetes nodes to pre-allocate huge pages for consumption by Pods. To manage your Pods' consumption of huge pages, see Manage HugePages.

To pre-allocate huge pages for your nodes, specify the amounts and sizes. For example, to configure your nodes to allocate three 1-gigabyte-sized huge pages, and 1024 2-megabyte-sized huge pages, use a node system configuration such as the following:

linuxConfig:
  hugepageConfig:
    hugepage_size2m: 1024
    hugepage_size1g: 3

To use huge pages, the following limitations and requirements apply:

To ensure that the node is not fully occupied by huge pages, the total size of the allocated huge pages can't exceed 60% of the total memory on machines with less than 30 GB of memory, or 80% on machines with more than 30 GB of memory. For example, on an e2-standard-2 machine with 8 GB of memory, you can't allocate more than 4.8 GB for huge pages, and on a c4a-standard-8 machine with 32 GB of memory, huge pages can't exceed 25.6 GB.
1 GB huge pages are only available on A3, C2D, C3, C3D, C4, C4A, C4D, CT5E, CT5LP, CT6E, H3, M2, M3, or Z3 machine types.
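
The memory limit above can be worked through in code. The following Python sketch computes the huge page budget for a node and checks whether a hugepageConfig fits within it. It is a minimal illustration under stated assumptions: the helper names are hypothetical, sizes are treated as binary units (GiB and MiB), a node with exactly 30 GB is placed in the 80% tier because the text doesn't specify that case, and the machine type restriction for 1 GB pages is not checked.

```python
GIB = 1024**3
MIB = 1024**2


def hugepage_limit_bytes(node_memory_bytes):
    """Return the maximum memory that huge pages may occupy on a node."""
    # 60% below 30 GB, 80% otherwise (exactly 30 GB is an assumption).
    percent = 60 if node_memory_bytes < 30 * GIB else 80
    return node_memory_bytes * percent // 100


def requested_hugepage_bytes(pages_2m=0, pages_1g=0):
    """Return the memory requested by hugepage_size2m and hugepage_size1g."""
    return pages_2m * 2 * MIB + pages_1g * GIB


def hugepages_fit(node_memory_bytes, pages_2m=0, pages_1g=0):
    """Check a huge page request against the node's budget."""
    requested = requested_hugepage_bytes(pages_2m, pages_1g)
    return requested <= hugepage_limit_bytes(node_memory_bytes)


# The example configuration requests 1024 x 2 MiB + 3 x 1 GiB = 5 GiB.
print(requested_hugepage_bytes(pages_2m=1024, pages_1g=3) / GIB)

# 8 GB node: the budget is 4.8 GB, so the 5 GiB request doesn't fit.
print(hugepages_fit(8 * GIB, pages_2m=1024, pages_1g=3))

# 32 GB node: the budget is 25.6 GB, so the same request fits.
print(hugepages_fit(32 * GIB, pages_2m=1024, pages_1g=3))
```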
