Understand the ClusterCIDRConfig custom resource

Overview

ClusterCIDRConfig is a custom CIDR allocator resource that lets you dynamically allocate more IP address ranges for Pods.

IP Address Management (IPAM) enables efficient use of IP subnets and avoids overlapping address ranges, which prevents network conflicts and outages. Kubernetes assigns a Pod CIDR to each node, and the Pods running on that node take their IP addresses from it.

The current Kubernetes NodeIPAM has the following limitations:

  • All Pod CIDRs are allocated from one cluster CIDR. You have to specify the entire IP address range accounting for the largest cluster at the time of cluster creation. This limitation can waste IP addresses.

  • If you increase the cluster size, it is difficult to add more IP addresses.

  • The cluster CIDR is one large range. It may be difficult to find a contiguous block of IP addresses that satisfies the needs of the cluster.

  • Each node gets a fixed-size IP range within a cluster. If nodes differ in size and capacity, you can't allocate a larger Pod range to a node with more capacity and a smaller range to nodes with less capacity. This wastes IP addresses, and in a large cluster the waste is compounded across all the nodes.
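To make the last limitation concrete, the following sketch uses Python's standard ipaddress module to estimate how many addresses go unused when every node receives the same fixed-size range. The function name, the cluster CIDR, the mask size, and the per-node Pod counts are all made-up illustration values, not recommendations or part of any product API.

```python
import ipaddress

def fixed_size_waste(cluster_cidr: str, per_node_mask: int, pods_per_node: list) -> dict:
    """Estimate unused Pod IPs when every node gets a range of the same size."""
    network = ipaddress.ip_network(cluster_cidr)
    # Addresses in one per-node range, e.g. 2^(32-24) = 256 for IPv4 /24.
    ips_per_node = 2 ** (network.max_prefixlen - per_node_mask)
    # Number of per-node ranges that fit in the cluster CIDR.
    max_nodes = 2 ** (per_node_mask - network.prefixlen)
    if len(pods_per_node) > max_nodes:
        raise ValueError("not enough node ranges in the cluster CIDR")
    allocated = ips_per_node * len(pods_per_node)
    return {
        "ips_per_node": ips_per_node,
        "max_nodes": max_nodes,
        "allocated": allocated,
        "unused": allocated - sum(pods_per_node),
    }

# Hypothetical cluster: one large node and three small ones, all given a /24.
report = fixed_size_waste("10.0.0.0/16", 24, [200, 30, 30, 30])
print(report)
```

In this made-up case, four nodes reserve 1024 addresses but run only 290 Pods, so most of the reserved space sits idle on the small nodes.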

With ClusterCIDRConfig, you can avoid assigning one large CIDR block to a cluster, match your cluster size to the scale of your Pods, and therefore conserve IP addresses. You can save IP addresses by using ClusterCIDRConfigs with different combinations of CIDR and perNodeMaskSize. The ClusterCIDRConfig resource supports the following:

  • Multiple discontiguous IP CIDR blocks for cluster CIDR at a more granular level

  • Node affinity of CIDR blocks

  • Different block sizes allocated to nodes
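As a rough illustration of these capabilities, the sketch below models a ClusterCIDRConfig as a plain Python dataclass and carves each block into per-node ranges. The class and field names (CidrConfig, cidr, per_node_mask_size, node_selector) and the label values are hypothetical stand-ins chosen for this example; they are not the resource's actual schema.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class CidrConfig:
    """Hypothetical stand-in for a ClusterCIDRConfig; not the real schema."""
    cidr: str
    per_node_mask_size: int
    node_selector: dict = field(default_factory=dict)

    def node_ranges(self):
        """Split the block into per-node Pod CIDRs of the configured size."""
        net = ipaddress.ip_network(self.cidr)
        return list(net.subnets(new_prefix=self.per_node_mask_size))

# Two discontiguous blocks, each tied to different nodes and block sizes.
configs = [
    CidrConfig("10.10.0.0/24", 26, {"pool": "small"}),
    CidrConfig("172.16.0.0/22", 24, {"pool": "large"}),
]

for cfg in configs:
    ranges = cfg.node_ranges()
    print(cfg.node_selector, len(ranges), "node ranges of",
          ranges[0].num_addresses, "addresses each")
```

Here the "small" nodes each receive 64 addresses while the "large" nodes each receive 256, and neither block has to be adjacent to the other.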

GKE on Bare Metal uses the ClusterCIDRConfig functionality in the following features:

Cluster.spec.clusterNetwork.pods.cidrBlocks is an optional field and isn't defined by default. You must define it if any of the features from the preceding list doesn't have it defined. For example, it is required when clusters are created in IPv4 island mode, where it is used as a native routing CIDR.

The following table describes how the Cluster.spec.clusterNetwork.pods.cidrBlocks field and ClusterCIDRConfig are used in the different network modes.

| Network mode | ClusterCIDRConfig value |
| --- | --- |
| IPv4 Island (default) | Required field. Specify Cluster.spec.clusterNetwork.pods.cidrBlocks. |
| IPv4 Flat (default) | Cluster.spec.clusterNetwork.pods.cidrBlocks is completely ignored and can be omitted. You must explicitly define ClusterCIDRConfigs (per-node, per-nodepool, and/or per-cluster). |
| Dual-stack (IPv4 Island, IPv6 Flat) | Specify the IPv4 CIDR. Do not specify an IPv6 CIDR in Cluster.spec.clusterNetwork.pods.cidrBlocks. Specify ClusterCIDRConfigs with both IPv4 and IPv6 CIDRs. The IPv4 CIDR configured in all the ClusterCIDRConfigs must be the same as the IPv4 CIDR from Cluster.spec.clusterNetwork.pods.cidrBlocks, including the PerNodeMask value for IPv4. For more information on ClusterCIDRConfig and examples of using it, see Examples: Dualstack (IPv4 island, IPv6 Flat). |
| Dual-stack (Flat IPv4, Flat IPv6) | Cluster.spec.clusterNetwork.pods.cidrBlocks is completely ignored and can be omitted. You must explicitly define ClusterCIDRConfigs (per-node, per-nodepool, and/or per-cluster) with both IPv4 and IPv6 CIDRs. |

Configuring the ClusterCIDRConfig custom CIDR allocator resource

ClusterCIDRConfig

When you configure the ClusterCIDRConfig custom CIDR allocator resource, consider the following points:

  • Pod CIDR assignment from a particular ClusterCIDRConfig to a node is based on label selectors. This is similar to the nodeSelector mechanism used for scheduling Pods on a node.

  • You must configure the ClusterCIDRConfig during the cluster creation process in the cluster configuration YAML file. Once you specify ClusterCIDRConfigs, you cannot modify the values later.

  • You can specify multiple ClusterCIDRConfigs with overlapping CIDRs.

  • If no matching ClusterCIDRConfig is found for a node, the node remains in a NotReady state until a ClusterCIDRConfig with matching labels is created.

  • If the best-matching ClusterCIDRConfig has no more CIDRs available for allocation, the next-best match is chosen and the Pod CIDRs are allocated from its available CIDRs.

  • In the dual-stack model, if you want to assign dual-stack Pod CIDRs to the nodes, do the following:

    • Configure both IPv4 and IPv6 CIDRs in the ClusterCIDRConfig.

    • If multiple ClusterCIDRConfigs are configured, ensure that all of them have DualStack CIDRs.

    • Ensure that both IPv4 and IPv6 CIDRs configured have an equal number of allocatable IP addresses per node.

    That is, 32 - spec.IPv4.PerNodeMaskSize must equal 128 - spec.IPv6.PerNodeMaskSize. For example:

    spec.IPv4.PerNodeMaskSize = 24

    spec.IPv6.PerNodeMaskSize = 120

    Here, 32 - 24 == 128 - 120, because both differences are 8.

  • The nodeSelector of more than one ClusterCIDRConfig can match the labels of the same node.
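The dual-stack requirement in the list above, that IPv4 and IPv6 ranges provide the same number of allocatable addresses per node, reduces to a one-line arithmetic check. The following is a minimal sketch; the function name is hypothetical and the check is only the equality stated in the text, not a reproduction of any validation the product performs.

```python
def same_addresses_per_node(ipv4_mask_size: int, ipv6_mask_size: int) -> bool:
    """Return True if both per-node masks leave the same number of host bits."""
    ipv4_host_bits = 32 - ipv4_mask_size    # IPv4 addresses are 32 bits long
    ipv6_host_bits = 128 - ipv6_mask_size   # IPv6 addresses are 128 bits long
    return ipv4_host_bits == ipv6_host_bits

print(same_addresses_per_node(24, 120))  # 8 host bits on both sides
print(same_addresses_per_node(24, 64))   # 8 host bits versus 64
```

The first call corresponds to the example values above and returns True; the second pairs a /24 with a /64 and returns False.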

ClusterCIDRConfig assignment rules

The following tie-breaking rules determine which ClusterCIDRConfig is used to assign Pod CIDRs to a node. The rules are applied in the given order, and the next rule is applied only if the preceding rule does not break the tie.

  1. Pick the ClusterCIDRConfig whose NodeSelector matches the most labels on the Node. For example, {'node.kubernetes.io/instance-type':'medium', 'rack': 'rack1'} (Match Count: 2) is picked before {'node.kubernetes.io/instance-type': 'medium'} (Match Count: 1).

  2. Pick the ClusterCIDRConfig with the fewest allocatable Pod CIDRs. For example, {CIDR: "10.0.0.0/16", PerNodeMaskSize: "16"} (1 possible Pod CIDR) is picked before {CIDR: "192.168.0.0/20", PerNodeMaskSize: "22"} (4 possible Pod CIDRs).

  3. Pick the ClusterCIDRConfig whose PerNodeMaskSize has the fewest IP addresses. For example, 27 (2^(32-27) = 32 IP addresses) is picked before 25 (2^(32-25) = 128 IP addresses).

  4. Pick the ClusterCIDRConfig whose matching NodeSelector label has a lower alphanumeric value. For example, {'kubernetes.io/hostname': 'node-1'} is chosen over {'node.kubernetes.io/instance-type':'medium'}.

  5. Pick the ClusterCIDRConfig whose CIDR IP has a lower value. Irrespective of whether the config is an IPv4 config or a DualStack config, only the IPv4 CIDRs are compared. For example, {CIDR: "10.0.0.0/16"} is picked over {CIDR: "192.168.0.0/16"}.
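The five rules above can be read as a single ordered sort key. The sketch below expresses them that way in Python; it is an illustration, not the allocator's actual implementation. The CidrConfig class and its field names are hypothetical, and several behaviors are assumptions: a selector matches only when every one of its labels is present on the node, "allocatable Pod CIDRs" is taken as the total number of per-node ranges in the block (as in the rule 2 example), and rule 4 compares the sorted (key, value) label pairs as strings.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class CidrConfig:
    """Hypothetical stand-in for a ClusterCIDRConfig; not the real schema."""
    name: str
    ipv4_cidr: str
    per_node_mask_size: int
    node_selector: dict = field(default_factory=dict)

def matches(cfg, node_labels):
    # Assumption: every selector label must be present on the node.
    return all(node_labels.get(k) == v for k, v in cfg.node_selector.items())

def sort_key(cfg):
    net = ipaddress.ip_network(cfg.ipv4_cidr)
    return (
        -len(cfg.node_selector),                         # rule 1: most matched labels
        2 ** (cfg.per_node_mask_size - net.prefixlen),   # rule 2: fewest Pod CIDRs
        2 ** (32 - cfg.per_node_mask_size),              # rule 3: fewest IPs per node
        sorted(cfg.node_selector.items()),               # rule 4: lower label value
        int(net.network_address),                        # rule 5: lower IPv4 CIDR
    )

def pick_config(configs, node_labels):
    """Return the preferred matching config, or None if nothing matches."""
    candidates = [c for c in configs if matches(c, node_labels)]
    return min(candidates, key=sort_key) if candidates else None

node = {"node.kubernetes.io/instance-type": "medium", "rack": "rack1"}
configs = [
    CidrConfig("one-label", "10.0.0.0/16", 24,
               {"node.kubernetes.io/instance-type": "medium"}),
    CidrConfig("two-labels", "192.168.0.0/16", 24,
               {"node.kubernetes.io/instance-type": "medium", "rack": "rack1"}),
    CidrConfig("other-rack", "10.1.0.0/16", 24, {"rack": "rack2"}),
]
print(pick_config(configs, node).name)  # two-labels
```

Because Python compares tuples element by element, a later rule only matters when all earlier elements are equal, which mirrors the "next rule only if the tie is not broken" ordering. Returning None for a node with no match corresponds to the case in which the node stays NotReady.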

Configuration examples

This section lists configuration examples for Cluster and ClusterCIDRConfig for all the networking modes.