GKE Dataplane V2


This page gives an overview of what GKE Dataplane V2 does and how it works.

Before you read this page, you should understand networking inside GKE clusters.

Overview of GKE Dataplane V2

GKE Dataplane V2 is a dataplane for GKE and Anthos clusters that is optimized for Kubernetes networking. GKE Dataplane V2 provides:

  • A consistent networking experience across GKE and all Anthos clusters environments. For the environments that support GKE Dataplane V2, see Availability of GKE Dataplane V2.
  • Real-time visibility of network activity.
  • Simpler architecture that makes it easier to manage and troubleshoot clusters.

GKE Dataplane V2 is enabled for all new Autopilot clusters in versions 1.22.7-gke.1500 and later, 1.23.4-gke.1500 and later, and all versions 1.24 and later.
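The version rule above can be checked mechanically. The following is a minimal sketch, assuming version strings of the form `MAJOR.MINOR.PATCH-gke.BUILD`; the function names are hypothetical and not part of any GKE tooling.

```python
import re

# Assumed version format: MAJOR.MINOR.PATCH-gke.BUILD, for example 1.22.7-gke.1500.
_GKE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")

# Minimum (patch, build) for each minor release, taken from the paragraph above.
_MINIMUMS = {(1, 22): (7, 1500), (1, 23): (4, 1500)}


def parse_gke_version(version: str) -> tuple[int, int, int, int]:
    """Split a GKE version string into (major, minor, patch, build)."""
    match = _GKE_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a GKE version string: {version!r}")
    major, minor, patch, build = (int(part) for part in match.groups())
    return major, minor, patch, build


def autopilot_uses_dataplane_v2(version: str) -> bool:
    """Return True if a new Autopilot cluster at this version gets GKE Dataplane V2."""
    major, minor, patch, build = parse_gke_version(version)
    if (major, minor) >= (1, 24):
        return True  # all versions 1.24 and later
    minimum = _MINIMUMS.get((major, minor))
    return minimum is not None and (patch, build) >= minimum
```

For example, `autopilot_uses_dataplane_v2("1.23.3-gke.2000")` is false because the patch release is earlier than 1.23.4, regardless of the build number.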

How GKE Dataplane V2 works

GKE Dataplane V2 is implemented using eBPF. As packets arrive at a GKE node, eBPF programs installed in the kernel decide how to route and process the packets. Unlike packet processing with iptables, eBPF programs can use Kubernetes-specific metadata in the packet. This lets GKE Dataplane V2 process network packets in the kernel more efficiently and report annotated actions back to user space for logging.

[Diagram: the path of a packet through a node that uses GKE Dataplane V2]

GKE deploys the GKE Dataplane V2 controller as a DaemonSet named anetd to each node in the cluster. anetd interprets Kubernetes objects and programs network topologies in eBPF. The anetd Pods run in the kube-system namespace.
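Because anetd runs as a DaemonSet, every node should have one running anetd Pod. The following sketch shows one way to check that from a Pod listing. It assumes the input is the parsed JSON output of `kubectl get pods -n kube-system -o json` and that the DaemonSet's Pods have names starting with `anetd-`; the function name, node names, and sample data are hypothetical.

```python
def nodes_missing_anetd(pod_list: dict, node_names: list[str]) -> list[str]:
    """Return the nodes that have no Running anetd Pod in kube-system."""
    covered = set()
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("namespace") != "kube-system":
            continue
        # Assumption: DaemonSet Pods are named <daemonset-name>-<suffix>.
        if not meta.get("name", "").startswith("anetd-"):
            continue
        if pod.get("status", {}).get("phase") == "Running":
            covered.add(pod.get("spec", {}).get("nodeName"))
    return sorted(name for name in node_names if name not in covered)


# Hypothetical sample data that mimics the shape of a Kubernetes Pod list.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "anetd-abcde", "namespace": "kube-system"},
            "spec": {"nodeName": "node-a"},
            "status": {"phase": "Running"},
        },
        {
            "metadata": {"name": "anetd-fghij", "namespace": "kube-system"},
            "spec": {"nodeName": "node-b"},
            "status": {"phase": "Pending"},
        },
    ]
}
```

With the sample data, a cluster with nodes `node-a`, `node-b`, and `node-c` reports `node-b` (Pod not yet running) and `node-c` (no Pod at all) as missing.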

GKE Dataplane V2 and NetworkPolicy

GKE Dataplane V2 is implemented using Cilium. The legacy dataplane for GKE is implemented using Calico.

Both of these technologies manage Kubernetes NetworkPolicy. Cilium uses eBPF, whereas the Calico Container Network Interface (CNI) uses iptables functionality in the Linux kernel.

Advantages of GKE Dataplane V2

Scalability

GKE Dataplane V2 has different scalability characteristics than the legacy dataplane.

In GKE versions where GKE Dataplane V2 does not use kube-proxy and does not rely on iptables for Service routing, GKE removes some iptables-related bottlenecks, such as those tied to the number of Services.

GKE Dataplane V2 relies on eBPF maps that are currently limited to 64,000 endpoints across all services.

Security

Kubernetes NetworkPolicy is always on in clusters with GKE Dataplane V2. You don't have to install and manage third-party software add-ons such as Calico to enforce network policy.
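Because enforcement is always on, a standard NetworkPolicy object takes effect without any add-on. As an illustration, the following sketch builds a default-deny ingress policy with the standard `networking.k8s.io/v1` API and prints it as JSON. The helper name, the policy name, and the `demo` namespace are hypothetical.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress to Pods in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every Pod in the namespace.
            "podSelector": {},
            # Ingress is listed with no ingress rules, so no ingress is allowed.
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    # "demo" is a placeholder namespace.
    print(json.dumps(default_deny_ingress("demo"), indent=2))
```

kubectl accepts JSON manifests, so the printed output can be piped to `kubectl apply -f -`.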

Operations

When you create a cluster with GKE Dataplane V2, network policy logging is built in. Configure the logging CRD on your cluster to see when connections are allowed and denied by your Pods.
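As a rough illustration of what configuring the logging CRD involves, the following sketch builds a NetworkLogging object as a Python dictionary. The API version, the object name `default`, and the `allow`/`deny` fields with `log` and `delegate` settings are assumptions based on the GKE network policy logging documentation as recalled here; verify them against the documentation for your cluster version before use. The helper name is hypothetical.

```python
import json


def network_logging_manifest(log_allowed: bool, log_denied: bool) -> dict:
    """Build a NetworkLogging object (field names are assumptions, see above)."""
    return {
        "apiVersion": "networking.gke.io/v1alpha1",  # assumed API version
        "kind": "NetworkLogging",
        "metadata": {"name": "default"},  # assumed required object name
        "spec": {
            "cluster": {
                "allow": {"log": log_allowed, "delegate": False},
                "deny": {"log": log_denied, "delegate": False},
            }
        },
    }


if __name__ == "__main__":
    # Log denied connections only.
    print(json.dumps(network_logging_manifest(False, True), indent=2))
```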

Consistency

GKE Dataplane V2 is available, and provides the same features, on GKE and in other Anthos clusters environments. For more information, see Availability of GKE Dataplane V2.

GKE Dataplane V2 technical specifications

GKE Dataplane V2 supports clusters with the following specifications:

Specification                                 GKE      Anthos clusters on VMware   Anthos on bare metal
Number of nodes per cluster                   5000*    500                         500
Number of Pods per cluster                    50,000   15,000                      27,500
Number of LoadBalancer services per cluster   750      500                         1,000

GKE Dataplane V2 maintains a service map to keep track of which services refer to which Pods as their backends. The total number of Pod backends, summed across all services, must fit into the service map, which can contain up to 64,000 entries. If this limit is exceeded, your cluster might not work as intended.
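The arithmetic behind this limit is a simple sum. The following sketch follows the description above, where each Pod backend of each service counts as one entry. The function name, service names, and backend counts are made up for illustration.

```python
# Capacity of the service map, from the paragraph above.
SERVICE_MAP_CAPACITY = 64_000


def service_map_usage(backends_per_service: dict[str, int]) -> tuple[int, bool]:
    """Return (total entries, whether the total fits into the service map)."""
    total = sum(backends_per_service.values())
    return total, total <= SERVICE_MAP_CAPACITY


# Hypothetical cluster: service name -> number of Pod backends.
EXAMPLE_SERVICES = {"frontend": 300, "api": 1_200, "cache": 62_000}
```

With the example data, the total is 63,500 entries, which fits. Adding a fourth service with 600 backends brings the total to 64,100, which exceeds the limit.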

* Starting in Kubernetes version 1.23, the limit of 500 nodes per GKE Dataplane V2 cluster has been raised to 5000, subject to the following conditions:

  • The cluster must be a private cluster or a public cluster that uses Private Service Connect. To check if your cluster uses Private Service Connect, see Public clusters with Private Service Connect.
  • The cluster must be a regional cluster.
  • Only clusters that were created with GKE version 1.23 or later have the raised 5000-node limit. Clusters that were created with earlier GKE versions might require lifting a cluster size quota. Contact support for assistance.
  • Clusters that use Cilium CRDs (CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy) cannot scale to 5000 nodes.
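The conditions above can be read as a checklist. The following sketch encodes them, assuming a hypothetical `ClusterProfile` record that you fill in by hand; it does not query any GKE API. A non-empty result does not necessarily mean the cluster is capped at 500 nodes, because clusters created before version 1.23 might only need a quota change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hand-filled description of a cluster (hypothetical type)."""
    created_with: tuple   # (major, minor) GKE version at creation, such as (1, 23)
    regional: bool        # True for regional clusters
    private_or_psc: bool  # private cluster, or public with Private Service Connect
    uses_cilium_crds: bool


def unmet_conditions(cluster: ClusterProfile) -> list[str]:
    """List the 5000-node conditions that the cluster does not satisfy."""
    reasons = []
    if not cluster.private_or_psc:
        reasons.append("not private and not using Private Service Connect")
    if not cluster.regional:
        reasons.append("not a regional cluster")
    if cluster.created_with < (1, 23):
        reasons.append("created with a GKE version earlier than 1.23")
    if cluster.uses_cilium_crds:
        reasons.append("uses Cilium CRDs")
    return reasons
```

An empty list means the cluster meets every condition listed above.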

The number of LoadBalancer services supported in Anthos clusters on VMware depends on the load balancer mode being used. 500 LoadBalancer services are supported on Anthos clusters on VMware when using bundled load balancing mode (Seesaw) and 250 are supported when using integrated load balancing mode with F5. See Scalability for more information.

Limitations

The following limitations apply in GKE, Anthos clusters on VMware, and all other environments:

  • GKE Dataplane V2 can only be enabled when creating a new cluster. Existing clusters cannot be upgraded to use GKE Dataplane V2.
  • If you enable GKE Dataplane V2 with NodeLocal DNSCache, you cannot configure Pods with dnsPolicy: ClusterFirstWithHostNet, or your Pods will experience DNS resolution errors. This limitation was lifted starting with 1.20.12-gke.500 (Stable).
  • Starting in GKE version 1.21.5-gke.1300, GKE Dataplane V2 does not support CiliumNetworkPolicy or CiliumClusterwideNetworkPolicy CRD APIs.
  • Manually created internal TCP/UDP load balancers associated with a Service of type NodePort are not supported.
  • There is a known issue with multi-cluster Services with multiple (TCP/UDP) ports on GKE Dataplane V2. For more information, see MCS Services with multiple ports.
  • GKE Dataplane V2 uses Cilium instead of kube-proxy to implement Kubernetes Services. kube-proxy is maintained and developed by the Kubernetes community, so new features for Services are more likely to be implemented in kube-proxy before they are implemented in Cilium for GKE Dataplane V2. One example of a Services feature that was first implemented in kube-proxy is KEP-1669: Proxy Terminating Endpoints.
  • For NodePort Services on clusters with GKE Dataplane V2 that run version 1.25 or earlier and use default SNAT with PUPI ranges, you must add the PUPI range of the Pods to nonMasqueradeCIDRs in the ip-masq-agent ConfigMap to avoid connectivity issues.
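For the last limitation, the change amounts to listing the Pod PUPI range under nonMasqueradeCIDRs. The following sketch builds such a ConfigMap as a Python dictionary. It assumes the agent reads its settings from a `config` key in a ConfigMap named `ip-masq-agent` in the `kube-system` namespace and accepts JSON syntax for that file, which matches the ip-masq-agent documentation as recalled here. The helper name is hypothetical, and `203.0.113.0/24` is a placeholder documentation range, not a real PUPI range.

```python
import ipaddress
import json


def ip_masq_configmap(non_masquerade_cidrs: list[str]) -> dict:
    """Build an ip-masq-agent ConfigMap that lists the given CIDR ranges."""
    # ip_network() raises ValueError for malformed CIDRs or set host bits.
    cidrs = [str(ipaddress.ip_network(cidr)) for cidr in non_masquerade_cidrs]
    config = {"nonMasqueradeCIDRs": cidrs}
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "ip-masq-agent", "namespace": "kube-system"},
        # Assumption: the agent reads the "config" key as YAML or JSON.
        "data": {"config": json.dumps(config, indent=2)},
    }


if __name__ == "__main__":
    # Keep the ranges you already exempt and append the Pod PUPI range.
    existing = ["10.0.0.0/8"]
    pupi_range = "203.0.113.0/24"  # placeholder, replace with your own range
    print(json.dumps(ip_masq_configmap(existing + [pupi_range]), indent=2))
```

If a ConfigMap already exists in your cluster, merge the new range into its current list instead of replacing the whole object.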

GKE Dataplane V2 and kube-proxy

GKE Dataplane V2 does not use kube-proxy except on Windows Server node pools on GKE versions 1.25 and earlier.

Network policy enforcement without GKE Dataplane V2

See Using network policy enforcement for instructions to enable network policy enforcement in clusters that don't use GKE Dataplane V2.

What's next