This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration using a configuration file called a node system configuration.
Overview
You can customize your node configuration by using various methods. For example, you can specify parameters such as the machine type and minimum CPU platform when you create a node pool.
A node system configuration is a configuration file that provides a way to adjust a limited set of system settings. You can use a node system configuration to specify custom settings for the Kubernetes node agent (`kubelet`) and low-level Linux kernel configurations (`sysctl`) in your node pools.
You can also customize your containerd container runtime on your GKE nodes by using a different file called a runtime configuration file. For instructions, see Customize containerd configuration in GKE nodes.
You can also use DaemonSets to customize nodes, such as in Automatically bootstrapping GKE nodes with DaemonSets.
Using a node system configuration
To use a node system configuration:
- Create a configuration file. This file contains your `kubelet` and `sysctl` configurations.
- Add the configuration when you create a cluster, or when you create or update a node pool.
Creating a configuration file
Write your node system configuration file in YAML. The following example shows you how to add configurations for the `kubelet` and `sysctl` options:
kubeletConfig:
cpuManagerPolicy: static
linuxConfig:
sysctl:
net.core.somaxconn: '2048'
net.ipv4.tcp_rmem: '4096 87380 6291456'
In this example:
- `cpuManagerPolicy: static` configures the `kubelet` to use the static CPU management policy.
- `net.core.somaxconn: '2048'` limits the socket `listen()` backlog to 2,048 pending connections.
- `net.ipv4.tcp_rmem: '4096 87380 6291456'` sets the minimum, default, and maximum values of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes respectively.
If you want to add configurations solely for the `kubelet` or `sysctl`, include only that section in your configuration file. For example, to add a `kubelet` configuration, create the following file:
kubeletConfig:
cpuManagerPolicy: static
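Similarly, a configuration file that sets only `sysctl` options contains just the `linuxConfig` section, for example reusing the `somaxconn` value shown earlier:

```
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
```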
For a complete list of the fields that you can add to your configuration file, see the Kubelet configuration options and Sysctl configuration options sections.
Adding the configuration to a node pool
After you have created the configuration file, add the `--system-config-from-file` flag by using the Google Cloud CLI. You can add this flag when you create a cluster, or when you create or update a node pool. You cannot add a node system configuration with the Google Cloud console.
To add a node system configuration, run one of the following commands:
Create cluster
gcloud container clusters create CLUSTER_NAME \
--location=LOCATION \
--system-config-from-file=SYSTEM_CONFIG_PATH
Replace the following:

- `CLUSTER_NAME`: the name for your cluster
- `LOCATION`: the compute zone or region of the cluster
- `SYSTEM_CONFIG_PATH`: the path to the file that contains your `kubelet` and `sysctl` configurations
After you have applied a node system configuration, the default node pool of the cluster uses the settings that you defined.
Create node pool
gcloud container node-pools create POOL_NAME \
--cluster CLUSTER_NAME \
--location=LOCATION \
--system-config-from-file=SYSTEM_CONFIG_PATH
Replace the following:

- `POOL_NAME`: the name for your node pool
- `CLUSTER_NAME`: the name of the cluster that you want to add a node pool to
- `LOCATION`: the compute zone or region of the cluster
- `SYSTEM_CONFIG_PATH`: the path to the file that contains your `kubelet` and `sysctl` configurations
Update node pool
gcloud container node-pools update POOL_NAME \
--cluster=CLUSTER_NAME \
--location=LOCATION \
--system-config-from-file=SYSTEM_CONFIG_PATH
Replace the following:

- `POOL_NAME`: the name of the node pool that you want to update
- `CLUSTER_NAME`: the name of the cluster that you want to update
- `LOCATION`: the compute zone or region of the cluster
- `SYSTEM_CONFIG_PATH`: the path to the file that contains your `kubelet` and `sysctl` configurations
This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.
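After the nodes have been recreated, you can spot-check the effective `kubelet` settings on a node. One way, shown here as a sketch, is to read the kubelet's `configz` endpoint through the API server proxy; `NODE_NAME` is a placeholder for any node in the updated pool:

```
# Print the running kubelet configuration for one node as JSON.
kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/configz"
```

Fields in the output, such as `cpuManagerPolicy`, should match the values from your node system configuration file.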
Editing a node system configuration
To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.
Editing by creating a node pool
To edit a node system configuration by creating a node pool:
- Create a configuration file with the configuration that you want.
- Add the configuration to a new node pool.
- Migrate your workloads to the new node pool.
- Delete the old node pool.
Editing by updating an existing node pool
To edit the node system configuration of an existing node pool, follow the instructions in the Update node pool tab for adding the configuration to a node pool. Updating a node system configuration overrides the node pool's system configuration with the new configuration, which requires recreating the nodes. If you omit any parameters during an update, they are set to their respective defaults.
If you want to reset the node system configuration back to the defaults, update your configuration file with empty values for `kubelet` and `sysctl`. For example:
kubeletConfig: {}
linuxConfig:
sysctl: {}
Deleting a node system configuration
To remove a node system configuration:
- Create a new node pool without a node system configuration.
- Migrate your workloads to the new node pool.
- Delete the node pool that has the old node system configuration.
Kubelet configuration options
The following table shows you the `kubelet` options that you can modify.

| Kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| `cpuManagerPolicy` | Value must be `none` or `static` | `none` | This setting controls the kubelet's CPU Manager Policy. The default value is `none`, which is the default CPU affinity scheme, providing no affinity beyond what the OS scheduler does automatically. Setting this value to `static` allows Pods in the Guaranteed QoS class with integer CPU requests to be assigned exclusive use of CPUs. |
| `cpuCFSQuota` | Value must be `true` or `false` | `true` | This setting enforces the Pod's CPU limit. Setting this value to `false` means that the CPU limits for Pods are ignored. Ignoring CPU limits might be desirable in certain scenarios where Pods are sensitive to CPU limits. The risk of disabling `cpuCFSQuota` is that a rogue Pod can consume more CPU resources than intended. |
| `cpuCFSQuotaPeriod` | Value must be a duration of time | `"100ms"` | This setting sets the CPU CFS quota period value, `cpu.cfs_period_us`, which specifies how often a cgroup's access to CPU resources should be reallocated. This option lets you tune the CPU throttling behavior. |
| `insecureKubeletReadonlyPortEnabled` | Value must be a boolean (`true` or `false`) | `true` | Set this value to `false` to disable the insecure kubelet read-only port `10255` on every new node pool in your cluster. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level. |
| `podPidsLimit` | Value must be between 1024 and 4194304 | none | This setting sets the maximum number of process IDs (PIDs) that each Pod can use. |
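For example, a node system configuration file that combines several of these `kubelet` options could look like the following; the specific values are illustrative, not recommendations:

```
kubeletConfig:
  cpuManagerPolicy: static
  cpuCFSQuota: true
  cpuCFSQuotaPeriod: '100ms'
  podPidsLimit: 4096
```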
Sysctl configuration options
To tune the performance of your system, you can modify the following kernel attributes:

- `kernel.shmmni`
- `kernel.shmmax`
- `kernel.shmall`
- `net.core.busy_poll`
- `net.core.busy_read`
- `net.core.netdev_max_backlog`
- `net.core.rmem_max`
- `net.core.wmem_default`
- `net.core.wmem_max`
- `net.core.optmem_max`
- `net.core.somaxconn`
- `net.ipv4.tcp_rmem`
- `net.ipv4.tcp_wmem`
- `net.ipv4.tcp_tw_reuse`
- `net.ipv6.conf.all.disable_ipv6`
- `net.ipv6.conf.default.disable_ipv6`
- `vm.max_map_count`
Different Linux namespaces might have unique values for a given `sysctl`, while others are global for the entire node. Updating `sysctl` options by using a node system configuration ensures that the `sysctl` is applied globally on the node and in each namespace, resulting in each Pod having identical `sysctl` values in each Linux namespace.
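As a quick check, you can read a namespaced `sysctl` from inside a Pod running on the configured node pool. This sketch assumes the `somaxconn` example from earlier and a hypothetical Pod named `web-0` scheduled on one of those nodes:

```
# Read the value from the Pod's network namespace; with the example
# configuration above, this should print 2048.
kubectl exec web-0 -- cat /proc/sys/net/core/somaxconn
```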
Linux cgroup mode configuration options
The kubelet and the container runtime use Linux kernel cgroups for resource management, such as limiting how much CPU or memory each container in a Pod can access. There are two versions of the cgroup subsystem in the kernel: `cgroupv1` and `cgroupv2`.

Kubernetes support for `cgroupv2` was introduced as alpha in Kubernetes version 1.18, beta in 1.22, and GA in 1.25. For more details, refer to the Kubernetes cgroups v2 documentation.
Node system configuration lets you customize the cgroup configuration of your node pools. You can use `cgroupv1` or `cgroupv2`. GKE uses `cgroupv2` for new Standard node pools running version 1.26 and later, and `cgroupv1` for versions earlier than 1.26. For node pools created with node auto-provisioning, the cgroup configuration depends on the initial cluster version, not the node pool version.

You can use a node system configuration to change the setting for a node pool to use `cgroupv1` or `cgroupv2` explicitly. Upgrading an existing node pool to 1.26 doesn't change the setting to `cgroupv2` on its own: existing node pools that were created running a version earlier than 1.26, and that don't have a customized cgroup configuration, continue to use `cgroupv1` unless you explicitly specify otherwise.

For example, to configure your node pool to use `cgroupv2`, use a node system configuration file such as:
linuxConfig:
cgroupMode: 'CGROUP_MODE_V2'
The supported `cgroupMode` options are:

- `CGROUP_MODE_V1`: Use `cgroupv1` on the node pool.
- `CGROUP_MODE_V2`: Use `cgroupv2` on the node pool.
- `CGROUP_MODE_UNSPECIFIED`: Use the default GKE cgroup configuration.
To use `cgroupv2`, the following requirements and limitations apply:

- For a node pool running a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or later. Alternatively, use gcloud beta with version 395.0.0 or later.
- Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
- You must use the Container-Optimized OS with containerd node image.
- If any of your workloads depend on reading the cgroup filesystem (`/sys/fs/cgroup/...`), ensure that they are compatible with the `cgroupv2` API.
- Ensure that any monitoring or third-party tools are compatible with `cgroupv2`.
- If you use the JDK (Java workloads), we recommend that you use versions that fully support `cgroupv2`, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.
Verify cgroup configuration
When you add a node system configuration, GKE must recreate the nodes to implement the changes. After you've added the configuration to a node pool and the nodes have been recreated, you can verify the new configuration.
You can verify the cgroup configuration for nodes in a node pool with the gcloud CLI or the `kubectl` command-line tool:
gcloud CLI
Check the cgroup configuration for a node pool:
gcloud container node-pools describe POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --format='value(Config.effectiveCgroupMode)'

Replace the following:

- `POOL_NAME`: the name of your node pool
- `CLUSTER_NAME`: the name of your cluster
- `LOCATION`: the compute zone or region of the cluster
The potential output is one of the following:

- `EFFECTIVE_CGROUP_MODE_V1`: the nodes use `cgroupv1`
- `EFFECTIVE_CGROUP_MODE_V2`: the nodes use `cgroupv2`

The output only shows the new cgroup configuration after the nodes in the node pool have been recreated. The output is empty for Windows Server node pools, which don't support cgroups.
kubectl
To verify the cgroup configuration for nodes in this node pool with `kubectl`, pick a node and connect to it using the following instructions:

- Create an interactive shell with any node in the node pool. Replace `mynode` in the command with the name of any node in the node pool.
- Identify the cgroup version on Linux nodes, as shown in the sketch after this list.
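A minimal sketch of those two steps, assuming a node named `mynode` and using `kubectl debug` as one common way to get a shell on a node:

```
# Start an interactive shell on the node (mynode is a placeholder).
kubectl debug node/mynode -it --image=ubuntu

# From that shell, check which filesystem is mounted at /sys/fs/cgroup:
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" indicates cgroupv2; "tmpfs" indicates cgroupv1.
```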
Linux huge page configuration options
You can use the node system configuration file to enable the Linux kernel huge pages feature.
Kubernetes supports huge pages on nodes as a type of resource, similar to CPU or memory. Use the following parameters to instruct your Kubernetes nodes to pre-allocate huge pages for consumption by Pods. To manage your Pods' consumption of huge pages, see Manage HugePages.
To pre-allocate huge pages for your nodes, specify the amounts and sizes. For example, to configure your nodes to allocate three 1-gigabyte-sized huge pages, and 1024 2-megabyte-sized huge pages, use a node system configuration such as the following:
linuxConfig:
hugepageConfig:
hugepage_size2m: 1024
hugepage_size1g: 3
To use huge pages, the following limitations and requirements apply:

- To ensure that the node is not fully occupied by huge pages, the total size of the allocated huge pages can't exceed 60% of the node's total memory. For example, on an e2-standard-2 machine, which has 8 GB of memory, you can't allocate more than 4.8 GB for huge pages.
- 1-gigabyte-sized huge pages are only available on the A3, C2D, C3, C3A, C3D, C4, CT5E, CT5L, CT5LP, CT6E, H3, M2, M3, or Z3 machine types.
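For context, a Pod consumes the pre-allocated pages by requesting a `hugepages-<size>` resource, and its huge page requests and limits must be equal. The following is a minimal, hypothetical sketch; the Pod name and image are placeholders:

```
apiVersion: v1
kind: Pod
metadata:
  name: hugepage-example   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        memory: 128Mi
        hugepages-2Mi: 100Mi
      limits:
        memory: 128Mi
        hugepages-2Mi: 100Mi
    volumeMounts:
    - name: hugepage
      mountPath: /hugepages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```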