Using GKE Dataplane V2

This page explains how to enable GKE Dataplane V2 for Google Kubernetes Engine (GKE) clusters.

New Autopilot clusters have GKE Dataplane V2 enabled in versions 1.22.7-gke.1500 and later, and in versions 1.23.4-gke.1500 and later.

Creating a GKE cluster with GKE Dataplane V2

You can enable GKE Dataplane V2 when you create new clusters with GKE version 1.20.6-gke.700 and later by using the gcloud CLI or the Kubernetes Engine API. You can also enable GKE Dataplane V2 in Preview when you create new clusters with GKE version 1.17.9 and later.

Console

To create a new cluster with GKE Dataplane V2, perform the following tasks:

  1. Go to the Google Kubernetes Engine page in the Cloud console.

  2. Click Create.

  3. Click Configure to configure a Standard cluster.

  4. In the Networking section, select the Enable Dataplane V2 checkbox. The Enable Kubernetes Network Policy option is disabled when you select Enable Dataplane V2 because network policy enforcement is built into GKE Dataplane V2.

  5. Click Create.

gcloud

To create a new cluster with GKE Dataplane V2, use the following command:

gcloud container clusters create CLUSTER_NAME \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --release-channel CHANNEL_NAME \
    --region COMPUTE_REGION

Replace the following:

  • CLUSTER_NAME: the name of your new cluster.
  • CHANNEL_NAME: a release channel that includes GKE version 1.20.6-gke.700 or later. If you prefer not to use a release channel, you can also use the --version flag instead of --release-channel, specifying version 1.20.6-gke.700 or later.
  • COMPUTE_REGION: the Compute Engine region for the new cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
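
The choice between --region and --zone described above can be wrapped in a small helper. The following is a minimal sketch and not part of the gcloud CLI: the function name dataplane_v2_create_cmd is hypothetical, and it assumes that a location ending in a hyphen plus a single lowercase letter (such as us-central1-a) is a zone. It only prints the command so that you can review it before you run it.

```shell
# Hypothetical helper: print (do not run) a cluster create command with
# GKE Dataplane V2 enabled, choosing --zone or --region from the location.
dataplane_v2_create_cmd() {
  cluster_name=$1
  channel_name=$2
  location=$3

  # Assumption: zone names end in "-<letter>" (for example us-central1-a),
  # while region names do not (for example us-central1).
  case $location in
    *-[a-z]) location_flag="--zone" ;;
    *)       location_flag="--region" ;;
  esac

  printf 'gcloud container clusters create %s --enable-dataplane-v2 --enable-ip-alias --release-channel %s %s %s\n' \
    "$cluster_name" "$channel_name" "$location_flag" "$location"
}
```

For example, dataplane_v2_create_cmd my-cluster regular us-central1 prints a command that ends in --region us-central1, and passing us-central1-a instead produces --zone us-central1-a.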

API

To create a new cluster with GKE Dataplane V2, specify the datapathProvider field in the networkConfig object in your cluster create request.

The following JSON snippet shows the configuration needed to enable GKE Dataplane V2:

"cluster":{
   "initialClusterVersion":"VERSION",
   "ipAllocationPolicy":{
      "useIpAliases":true
   },
   "networkConfig":{
      "datapathProvider":"ADVANCED_DATAPATH"
   },
   "releaseChannel":{
      "channel":"CHANNEL_NAME"
   }
}

Replace the following:

  • VERSION: your cluster version, which must be GKE 1.20.6-gke.700 or later.
  • CHANNEL_NAME: a release channel that includes GKE version 1.20.6-gke.700 or later.
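
If you script the API call, you might generate the request body instead of editing it by hand. The following is a minimal sketch under stated assumptions: the function name dataplane_v2_request_body is hypothetical, the "name" field is added for illustration and does not appear in the snippet above, and the values are inserted verbatim without JSON escaping.

```shell
# Hypothetical helper: print a cluster create request body that enables
# GKE Dataplane V2. Values are inserted verbatim, so they must not contain
# double quotes or backslashes.
dataplane_v2_request_body() {
  cluster_name=$1
  version=$2
  channel_name=$3
  cat <<EOF
{
  "cluster": {
    "name": "$cluster_name",
    "initialClusterVersion": "$version",
    "ipAllocationPolicy": { "useIpAliases": true },
    "networkConfig": { "datapathProvider": "ADVANCED_DATAPATH" },
    "releaseChannel": { "channel": "$channel_name" }
  }
}
EOF
}
```

For example, dataplane_v2_request_body my-cluster 1.20.6-gke.700 REGULAR writes a body that you can save to a file and review before you send it in a cluster create request.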

Troubleshooting issues with GKE Dataplane V2

  1. Check the state of the system Pods:

    kubectl -n kube-system get pods -l k8s-app=cilium -o wide
    

    If GKE Dataplane V2 is running, the output includes Pods with the prefix anetd-. anetd is the networking controller for GKE Dataplane V2.

  2. If the issue is with services or network policy enforcement, check the anetd events and Pod logs:

    kubectl -n kube-system get events --field-selector involvedObject.name=anetd
    kubectl -n kube-system logs -l k8s-app=cilium
    
  3. If Pod creation is failing, check the kubelet logs for clues. You can do this in GKE using ssh:

    gcloud compute ssh NODE -- sudo journalctl -u kubelet
    

    Replace NODE with the name of the VM instance.
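
The first troubleshooting step can be turned into a scripted check. The following is a minimal sketch: the function name anetd_pods_healthy is hypothetical, and it assumes the default kubectl column layout in which NAME is the first column and STATUS is the third.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and succeed
# only if at least one anetd- Pod exists and every anetd- Pod is Running.
anetd_pods_healthy() {
  awk '
    $1 ~ /^anetd-/ {
      total++
      if ($3 == "Running") running++
    }
    END {
      printf "%d of %d anetd Pods Running\n", running, total
      if (total > 0 && running == total) exit 0
      exit 1
    }
  '
}
```

You might use it as kubectl -n kube-system get pods -l k8s-app=cilium -o wide | anetd_pods_healthy. A non-zero exit status means that no anetd- Pods were found or that at least one is not Running.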

Known issues

Network Policy port ranges do not take effect

If you specify an endPort field in a Network Policy on a cluster that has GKE Dataplane V2 enabled, the field does not take effect.

Starting in GKE 1.22, the Kubernetes Network Policy API lets you specify a range of ports where the Network Policy is enforced. This API is supported in clusters with Calico Network Policy but is not supported in clusters with GKE Dataplane V2.

You can verify the behavior of your NetworkPolicy objects by reading them back after writing them to the API server. If the object still contains the endPort field, the feature is enforced. If the endPort field is missing, the feature is not enforced. In all cases, the object stored in the API server is the source of truth for the Network Policy.
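
The read-back check described above can be scripted. The following is a minimal sketch: the function name stored_policy_has_end_port is hypothetical, and it assumes YAML input as produced by kubectl get networkpolicy with -o yaml. It matches only lines where endPort is a YAML key, so a copy of the original manifest stored in the kubectl.kubernetes.io/last-applied-configuration annotation is not counted.

```shell
# Hypothetical helper: read a NetworkPolicy in YAML form on stdin and report
# whether the stored object still contains an endPort field.
stored_policy_has_end_port() {
  # Match "endPort:" only as a YAML key, optionally as the first key of a
  # list item ("- endPort: ..."). JSON embedded in annotations is ignored.
  if grep -Eq '^[[:space:]]*(-[[:space:]]+)?endPort:'; then
    echo "endPort present in stored object"
  else
    echo "endPort missing from stored object"
    return 1
  fi
}
```

You might use it as kubectl get networkpolicy POLICY_NAME -o yaml | stored_policy_has_end_port, where POLICY_NAME is a placeholder for the name of your NetworkPolicy object.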

For more information, see KEP-2079: Network Policy to support Port Ranges.

Pods display "failed to allocate for range 0: no IP addresses available in range set" error message

Affected GKE versions: 1.18 and later

GKE clusters running node pools that use containerd and have GKE Dataplane V2 enabled might experience IP address leak issues and exhaust all the Pod IP addresses on a node. A Pod scheduled on an affected node displays an error message similar to the following:

failed to allocate for range 0: no IP addresses available in range set: 10.48.131.1-10.48.131.62

For more information about the issue, see containerd issue #5768.

To fix this issue, upgrade your cluster to one of the following GKE versions:

  • 1.23.4-gke.1600 or later.
  • 1.22.8-gke.200 or later.
  • 1.21.11-gke.1100 or later.
  • 1.20.15-gke.5200 or later.
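
To check a version string against the list above, you could use a helper like the following. This is a minimal sketch: the function name has_ip_leak_fix is hypothetical, and it assumes that each listed version is the minimum for its own minor release, that minor versions later than 1.23 already include the fix, and that minor versions earlier than 1.20 do not.

```shell
# Hypothetical helper: succeed if a GKE version such as "1.22.8-gke.200"
# (an optional leading "v" is accepted) is at or above the fixed version
# listed for its minor release.
has_ip_leak_fix() {
  awk -v ver="$1" 'BEGIN {
    # Fixed versions from the list above, keyed by minor version.
    fix[20] = "1.20.15-gke.5200"
    fix[21] = "1.21.11-gke.1100"
    fix[22] = "1.22.8-gke.200"
    fix[23] = "1.23.4-gke.1600"

    sub(/^v/, "", ver)
    sub(/-gke\./, ".", ver)       # "1.22.8-gke.200" becomes "1.22.8.200"
    split(ver, v, ".")
    minor = v[2] + 0

    # Assumption: newer minors include the fix, older minors do not.
    if ((v[1] + 0) > 1 || minor > 23) exit 0
    if (minor < 20) exit 1

    min = fix[minor]
    sub(/-gke\./, ".", min)
    split(min, m, ".")
    # Compare the patch number, then the GKE build number, numerically.
    for (i = 3; i <= 4; i++) {
      if ((v[i] + 0) > (m[i] + 0)) exit 0
      if ((v[i] + 0) < (m[i] + 0)) exit 1
    }
    exit 0
  }'
}
```

For example, has_ip_leak_fix 1.22.8-gke.200 succeeds and has_ip_leak_fix 1.22.8-gke.199 fails. The leading "v" is accepted so that kubelet versions as reported by kubectl get nodes can be passed in unchanged.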

Workarounds

You can mitigate this issue by deleting the leaked Pod IP addresses for the node.

To delete the leaked Pod IP addresses, get authentication credentials for the cluster and perform the following steps:

  1. Save the following commands as a shell script named cleanup.sh:

    for hash in $(sudo find /var/lib/cni/networks/gke-pod-network -iregex '/var/lib/cni/networks/gke-pod-network/[0-9].*' -exec head -n1 {} \;); do if [ -z "$(sudo ctr -n k8s.io c ls | grep "$hash" | awk '{print $1}')" ]; then sudo grep -ilr "$hash" /var/lib/cni/networks/gke-pod-network; fi; done | sudo xargs rm
    
    sudo systemctl restart kubelet containerd;
    
  2. Run the script on all cluster nodes that could be impacted:

    for node in `kubectl get nodes -o wide | grep Ready | awk '{print $1}' | sort -u`; do gcloud compute ssh --zone "ZONE" --project "PROJECT" $node --command "$(cat cleanup.sh)"; done
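
The grep Ready filter in the loop above also matches nodes whose status is NotReady. If you want to limit the loop to nodes that report Ready, a stricter filter is one option. The following is a minimal sketch: the function name ready_node_names is hypothetical, and it assumes the default kubectl get nodes layout in which NAME is the first column and STATUS is the second.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS starts with "Ready". This keeps statuses such
# as "Ready,SchedulingDisabled" and excludes "NotReady".
ready_node_names() {
  awk '$2 ~ /^Ready/ { print $1 }' | sort -u
}
```

You might use it as kubectl get nodes -o wide | ready_node_names and loop over the resulting names in place of the grep and awk pipeline.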
    

Network Policy drops a connection due to incorrect connection tracking lookup

When a client Pod connects to itself through a Service or the virtual IP address of an internal TCP/UDP load balancer, the reply packet is not identified as part of an existing connection because of an incorrect conntrack lookup in the dataplane. As a result, a Network Policy that restricts ingress traffic for the Pod is incorrectly enforced on the packet.

The impact of this issue depends on the number of configured Pods for the Service. For example, if the Service has 1 backend Pod, the connection always fails. If the Service has 2 backend Pods, the connection fails 50% of the time.
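
The two examples above are consistent with a failure rate of 1 in N for a Service with N backend Pods. The following is a minimal sketch of that estimate: the function name self_connection_failure_percent is hypothetical, and it assumes that each backend Pod is equally likely to be selected and that a connection fails only when the client Pod is selected as its own backend.

```shell
# Hypothetical helper: estimate the percentage of self-connections that fail
# for a Service with the given number of backend Pods.
self_connection_failure_percent() {
  backends=$1
  # Assumption: backends are chosen uniformly, so the client Pod is picked
  # as its own backend for 1 out of every N connections.
  awk -v n="$backends" 'BEGIN {
    if (n < 1) exit 1
    printf "%.1f\n", 100 / n
  }'
}
```

For example, self_connection_failure_percent 2 prints 50.0, and self_connection_failure_percent 4 prints 25.0.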

Workarounds

You can mitigate this issue by setting the port in the Service manifest to the same value as the containerPort of the backend Pods.

What's next