Customizing node system configuration


This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration using a configuration file called a node system configuration.

Overview

You can customize your node configuration by using various methods. For example, you can specify parameters such as the machine type and minimum CPU platform when you create a node pool.

A node system configuration is a configuration file that provides a way to adjust a limited set of system settings. You can use a node system configuration to specify custom settings for the Kubernetes node agent (kubelet) and low-level Linux kernel configurations (sysctl) in your node pools.

You can also customize your containerd container runtime on your GKE nodes by using a different file called a runtime configuration file. For instructions, see Customize containerd configuration in GKE nodes.

You can also use DaemonSets to customize nodes, such as in Automatically bootstrapping GKE nodes with DaemonSets.

Using a node system configuration

To use a node system configuration:

  1. Create a configuration file. This file contains your kubelet and sysctl configurations.
  2. Add the configuration when you create a cluster, or when you create or update a node pool.

Creating a configuration file

Write your node system configuration file in YAML. The following example shows you how to add configurations for the kubelet and sysctl options:

kubeletConfig:
  cpuManagerPolicy: static
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
    net.ipv4.tcp_rmem: '4096 87380 6291456'

In this example:

  • cpuManagerPolicy: static configures the kubelet to use the static CPU management policy.
  • net.core.somaxconn: '2048' limits the socket listen() backlog to 2,048 queued connections.
  • net.ipv4.tcp_rmem: '4096 87380 6291456' sets the minimum, default, and maximum value of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes respectively.
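After nodes run with this configuration, you can read the effective values back from a shell on a node. This is a sketch that uses the standard Linux procfs paths behind these two sysctls:

```shell
# Read the effective sysctl values from procfs on the node.
cat /proc/sys/net/core/somaxconn    # 2048 after the configuration is applied
cat /proc/sys/net/ipv4/tcp_rmem     # 4096 87380 6291456 (min, default, max)
```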

If you want to add configurations solely for the kubelet or sysctl, only include that section in your configuration file. For example, to add a kubelet configuration, create the following file:

kubeletConfig:
  cpuManagerPolicy: static
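
Similarly, to adjust only sysctl settings, include only the linuxConfig section. For example, this hypothetical file sets a single sysctl and nothing else:

```yaml
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
```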

For a complete list of the fields that you can add to your configuration file, see the Kubelet configuration options and Sysctl configuration options sections.

Adding the configuration to a node pool

After you have created the configuration file, add the --system-config-from-file flag by using the Google Cloud CLI. You can add this flag when you create a cluster, or when you create or update a node pool. You cannot add a node system configuration with the Google Cloud console.

To add a node system configuration, run the following command:

Create cluster

gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

  • CLUSTER_NAME: the name for your cluster
  • LOCATION: the compute zone or region of the cluster
  • SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

After you have applied a node system configuration, the default node pool of the cluster uses the settings that you defined.

Create node pool

gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

  • POOL_NAME: the name for your node pool
  • CLUSTER_NAME: the name of the cluster that you want to add a node pool to
  • LOCATION: the compute zone or region of the cluster
  • SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

Update node pool

gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH

Replace the following:

  • POOL_NAME: the name of the node pool that you want to update
  • CLUSTER_NAME: the name of the cluster that you want to update
  • LOCATION: the compute zone or region of the cluster
  • SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations

This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies. To learn more about node updates, see Planning for node update disruptions.

Editing a node system configuration

To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.

Editing by creating a node pool

To edit a node system configuration by creating a node pool:

  1. Create a configuration file with the configuration that you want.
  2. Add the configuration to a new node pool.
  3. Migrate your workloads to the new node pool.
  4. Delete the old node pool.

Editing by updating an existing node pool

To edit the node system configuration of an existing node pool, follow the instructions in the Update node pool tab for adding the configuration to a node pool. Updating a node system configuration overrides the node pool's system configuration with the new configuration, which requires recreating the nodes. If you omit any parameters during an update, they are set to their respective defaults.

If you want to reset the node system configuration back to the defaults, update your configuration file with empty values for the kubelet and sysctl. For example:

kubeletConfig: {}
linuxConfig:
  sysctl: {}

Deleting a node system configuration

To remove a node system configuration:

  1. Create a node pool.
  2. Migrate your workloads to the new node pool.
  3. Delete the node pool that has the old node system configuration.

Kubelet configuration options

The following table shows you the kubelet options that you can modify.

cpuManagerPolicy
  Restrictions: value must be none or static
  Default: none
  Description: Controls the kubelet's CPU Manager policy. The default value, none, is the default CPU affinity scheme, which provides no affinity beyond what the OS scheduler does automatically. Setting this value to static allows Pods in the Guaranteed QoS class with integer CPU requests to be assigned exclusive use of CPUs.

cpuCFSQuota
  Restrictions: value must be true or false
  Default: true
  Description: Enforces the Pod's CPU limit. Setting this value to false means that the CPU limits for Pods are ignored. Ignoring CPU limits might be desirable in certain scenarios where Pods are sensitive to CPU limits; the risk of disabling cpuCFSQuota is that a rogue Pod can consume more CPU resources than intended.

cpuCFSQuotaPeriod
  Restrictions: value must be a duration of time
  Default: "100ms"
  Description: Sets the CPU CFS quota period value, cpu.cfs_period_us, which specifies how often a cgroup's access to CPU resources should be reallocated. This option lets you tune CPU throttling behavior.

insecureKubeletReadonlyPortEnabled
  Restrictions: value must be a boolean (true or false)
  Default: true
  Description: Controls whether the insecure kubelet read-only port 10255 is enabled on every new node pool in your cluster; set it to false to disable the port. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level.

podPidsLimit
  Restrictions: value must be between 1024 and 4194304
  Default: none
  Description: Sets the maximum number of process IDs (PIDs) that each Pod can use.
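As an illustration, a node system configuration that sets several of these kubelet options at once might look like the following; the values shown are illustrative, not recommendations:

```yaml
kubeletConfig:
  cpuManagerPolicy: static
  cpuCFSQuota: true
  cpuCFSQuotaPeriod: '100ms'
  podPidsLimit: 2048
  insecureKubeletReadonlyPortEnabled: false
```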

Sysctl configuration options

To tune the performance of your system, you can modify Linux kernel attributes by using sysctl settings.

Some sysctls are namespaced, so different Linux namespaces might have unique values for a given sysctl, while others are global for the entire node. Updating sysctl options by using a node system configuration ensures that the sysctl is applied globally, both on the node and in each namespace, so that each Pod has identical sysctl values in every Linux namespace.

Linux cgroup mode configuration options

The kubelet and the container runtime use Linux kernel cgroups for resource management, such as limiting how much CPU or memory each container in a Pod can access. There are two versions of the cgroup subsystem in the kernel: cgroupv1 and cgroupv2. Kubernetes support for cgroupv2 was introduced as alpha in Kubernetes version 1.18, beta in 1.22, and GA in 1.25. For more details, refer to the Kubernetes cgroups v2 documentation.

Node system configuration lets you customize the cgroup configuration of your node pools. You can use cgroupv1 or cgroupv2. GKE uses cgroupv2 for new Standard node pools running version 1.26 and later, and cgroupv1 for versions earlier than 1.26. For node pools created with node auto-provisioning, the cgroup configuration depends on the initial cluster version, not the node pool version.

You can use a node system configuration to change the setting for a node pool to use cgroupv1 or cgroupv2 explicitly. Upgrading an existing node pool to 1.26 doesn't change the setting to cgroupv2: existing node pools that were created running a version earlier than 1.26 without a customized cgroup configuration continue to use cgroupv1 unless you explicitly specify otherwise.

For example, to configure your node pool to use cgroupv2, use a node system configuration file such as:

linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'

The supported cgroupMode options are:

  • CGROUP_MODE_V1: Use cgroupv1 on the node pool.
  • CGROUP_MODE_V2: Use cgroupv2 on the node pool.
  • CGROUP_MODE_UNSPECIFIED: Use the default GKE cgroup configuration.

To use cgroupv2, the following requirements and limitations apply:

  • For a node pool running a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or newer. Alternatively, use gcloud beta with version 395.0.0 or newer.
  • Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
  • You must use the Container-Optimized OS with containerd node image.
  • If any of your workloads depend on reading the cgroup filesystem (/sys/fs/cgroup/...), ensure they are compatible with the cgroupv2 API.
    • Ensure any monitoring or third-party tools are compatible with cgroupv2.
  • If you use JDK (Java workload), we recommend that you use versions which fully support cgroupv2, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.

Verify cgroup configuration

When you add a node system configuration, GKE must recreate the nodes to implement the changes. After you've added the configuration to a node pool and the nodes have been recreated, you can verify the new configuration.

You can verify the cgroup configuration for nodes in a node pool with the gcloud CLI or the kubectl command-line tool:

gcloud CLI

Check the cgroup configuration for a node pool:

gcloud container node-pools describe POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --format='value(Config.effectiveCgroupMode)'

Replace the following:

  • POOL_NAME: the name of your node pool
  • CLUSTER_NAME: the name of the cluster
  • LOCATION: the compute zone or region of the cluster

The potential output is one of the following:

  • EFFECTIVE_CGROUP_MODE_V1: the nodes use cgroupv1
  • EFFECTIVE_CGROUP_MODE_V2: the nodes use cgroupv2

The output only shows the new cgroup configuration after the nodes in the node pool have been recreated. The output is empty for Windows Server node pools, which don't support cgroups.

kubectl

To verify the cgroup configuration for nodes in this node pool with kubectl, pick a node and connect to it using the following instructions:

  1. Create an interactive shell with any node in the node pool. Replace mynode in the command with the name of any node in the node pool.
  2. Identify the cgroup version on Linux nodes.
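
For step 1, one way to get a shell is `kubectl debug node/mynode -it --image=ubuntu`, where the node name and image are placeholders. For step 2, inspecting the filesystem type of the cgroup mount distinguishes the two versions:

```shell
# Print the filesystem type mounted at /sys/fs/cgroup:
# "cgroup2fs" indicates cgroupv2; "tmpfs" indicates cgroupv1.
stat -fc %T /sys/fs/cgroup/
```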

Linux huge page configuration options

You can use the node system configuration file to configure the Linux kernel's huge pages feature.

Kubernetes supports huge pages on nodes as a type of resource, similar to CPU or memory. Use the following parameters to instruct your Kubernetes nodes to pre-allocate huge pages for consumption by Pods. To manage your Pods' consumption of huge pages, see Manage HugePages.

To pre-allocate huge pages for your nodes, specify the amounts and sizes. For example, to configure your nodes to allocate three 1-gigabyte-sized huge pages, and 1024 2-megabyte-sized huge pages, use a node system configuration such as the following:

linuxConfig:
  hugepageConfig:
    hugepage_size2m: 1024
    hugepage_size1g: 3

To use huge pages, the following limitations and requirements apply:

  • To ensure that the node is not fully occupied by huge pages, the total size of the allocated huge pages can't exceed 60% of the node's total memory. For example, on an e2-standard-2 machine, which has 8 GB of memory, you can't allocate more than 4.8 GB for huge pages.
  • 1-gigabyte-sized huge pages are only available on the A3, C2D, C3, C3A, C3D, C4, CT5E, CT5L, CT5LP, CT6E, H3, M2, M3, or Z3 machine types.
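
As a quick sanity check of the 60% rule, you can compute the totals yourself. This sketch uses the e2-standard-2 figures from above (8 GB of memory) and the earlier example allocation of 1024 2-megabyte pages plus three 1-gigabyte pages:

```shell
# Total huge page memory requested: 1024 x 2 MiB pages + 3 x 1024 MiB pages.
total=$(( 1024 * 2 + 3 * 1024 ))
# 60% of an e2-standard-2 node's 8 GiB of memory, in MiB.
limit=$(( 8 * 1024 * 60 / 100 ))
echo "requested=${total} MiB, limit=${limit} MiB"
```

Here the example allocation (5,120 MiB) would exceed the roughly 4.8 GiB ceiling on this machine type, so that particular allocation needs a machine with more memory.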

What's next