LoadBalancer Service concepts


This page provides a general overview of how Google Kubernetes Engine (GKE) creates and manages Google Cloud load balancers when you apply a Kubernetes LoadBalancer Service manifest. It describes the different types of load balancers and how settings such as externalTrafficPolicy and GKE subsetting for L4 internal load balancers determine how the load balancers are configured.

Before reading this page, you should be familiar with GKE networking concepts.

Overview

When you create a LoadBalancer Service, GKE configures a Google Cloud pass-through load balancer whose characteristics depend on parameters of your Service manifest.

Choose a LoadBalancer Service

When choosing which LoadBalancer Service configuration to use, consider the following aspects:

  • The type of IP address of the LoadBalancer. Your load balancer can have an internal or an external IP address.
  • The number and type of nodes the LoadBalancer supports.

After you determine your network architecture requirements, use the following decision tree to determine which LoadBalancer Service to choose for your network configuration.

Figure: LoadBalancer Service decision tree. A LoadBalancer Service in GKE can have an internal or external address.

External versus internal load balancing

When you create a LoadBalancer Service in GKE, you specify whether the load balancer has an internal or external address:

  • If your clients are located in the same VPC network or in a network connected to the cluster's VPC network, then use an internal LoadBalancer Service. Internal LoadBalancer Services are implemented by using internal passthrough Network Load Balancers. Clients located in the same VPC network or in a network connected to the cluster's VPC network can access the Service by using the load balancer's IP address.

    To create an internal LoadBalancer Service, include one of the following annotations in the metadata.annotations[] of the Service manifest (see the example manifest after this list):

    • networking.gke.io/load-balancer-type: "Internal" (GKE 1.17 and later)
    • cloud.google.com/load-balancer-type: "Internal" (versions earlier than 1.17)
  • If your clients are located outside your VPC network, then use an external LoadBalancer Service. External LoadBalancer Services are implemented by using external passthrough Network Load Balancers, which are accessible on the internet (including from Google Cloud VMs with internet access). Depending on the Service configuration, GKE creates either a backend service-based or a target pool-based external passthrough Network Load Balancer, as described in Node grouping.
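
The following is a minimal sketch of an internal LoadBalancer Service manifest that uses the internal load balancer annotation described earlier. The Service name, selector labels, and port numbers are placeholders for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ilb-service                                     # placeholder name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"    # GKE 1.17 and later
spec:
  type: LoadBalancer
  selector:
    app: example-app         # placeholder label; must match your serving Pods
  ports:
  - protocol: TCP
    port: 80                 # port exposed by the load balancer's forwarding rule
    targetPort: 8080         # port the serving Pods listen on
```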

Effect of externalTrafficPolicy

You can set externalTrafficPolicy to Local or Cluster to define how packets are routed to nodes with ready and serving Pods. Consider the following scenarios when defining the externalTrafficPolicy:

  • Use externalTrafficPolicy: Local to preserve the original client IP addresses or if you want to minimize disruptions when the number of nodes without serving Pods in the cluster changes.

  • Use externalTrafficPolicy: Cluster if the overall number of nodes without serving Pods in your cluster remains consistent, but the number of nodes with serving Pods changes. This option does not preserve the original client IP addresses.

For more information about how externalTrafficPolicy affects packet routing within the nodes, see packet processing.
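
For illustration, the following sketch shows an external LoadBalancer Service (no internal load balancer annotation) with externalTrafficPolicy set explicitly. The Service name, selector labels, and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-lb-service      # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserves client IPs; use Cluster (the default) if that is not required
  selector:
    app: example-app             # placeholder label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```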

GKE subsetting

GKE subsetting for L4 internal load balancers (GKE subsetting) is a cluster-wide configuration option that improves the scalability of internal passthrough Network Load Balancers by more efficiently grouping node endpoints for the load balancer backends.

The following diagram shows two Services in a zonal cluster with three nodes and GKE subsetting enabled. Each Service has two Pods. GKE creates one GCE_VM_IP network endpoint group (NEG) for each Service. Endpoints in each NEG are the nodes with the serving Pods for the respective Service.

You can enable GKE subsetting when you create a cluster or by editing an existing cluster. Once enabled, you cannot disable GKE subsetting. For more information, see GKE subsetting.

GKE subsetting requires:

  • GKE version 1.18.19-gke.1400 or later, and
  • The HttpLoadBalancing add-on enabled for the cluster. This add-on is enabled by default. It allows the cluster to manage load balancers that use backend services.

Node count consideration when enabling GKE subsetting

As a best practice, if you need to create internal LoadBalancer Services, you should enable GKE subsetting. GKE subsetting allows you to support more nodes in your cluster:

  • If your cluster has GKE subsetting disabled, you should not create more than 250 total nodes (among all node pools). If you create more than 250 total nodes in the cluster, internal LoadBalancer Services might experience uneven traffic distribution or complete loss of connectivity.

  • If your cluster has GKE subsetting enabled, you can use either externalTrafficPolicy: Local or externalTrafficPolicy: Cluster, as long as the number of unique nodes with at least one serving Pod does not exceed 250. Nodes without any serving Pod don't count toward this limit. If you need more than 250 nodes with at least one serving Pod, you must use externalTrafficPolicy: Cluster.

Internal passthrough Network Load Balancers created by GKE can only distribute packets to 250 or fewer backend node VMs. This limitation exists because GKE does not use load balancer backend subsetting, and an internal passthrough Network Load Balancer is limited to distributing packets to 250 or fewer backends when load balancer backend subsetting is disabled.

Node grouping

The Service manifest annotations and, for internal LoadBalancer Services, the status of GKE subsetting determine the resulting Google Cloud load balancer and the type of backends. Backends for Google Cloud pass-through load balancers identify the network interface (NIC) of the GKE node, not a particular node or Pod IP address. The type of load balancer and backends determine how nodes are grouped into GCE_VM_IP NEGs, instance groups, or target pools.

The following list summarizes, for each type of GKE LoadBalancer Service, the resulting Google Cloud load balancer and the node grouping method:

  • Internal LoadBalancer Service created in a cluster with GKE subsetting enabled1

    Resulting Google Cloud load balancer: an internal passthrough Network Load Balancer whose backend service uses GCE_VM_IP network endpoint group (NEG) backends.

    Node grouping method: node VMs are grouped zonally into GCE_VM_IP NEGs on a per-Service basis according to the externalTrafficPolicy of the Service and the number of nodes in the cluster. The externalTrafficPolicy of the Service also controls which nodes pass the load balancer health check and the packet processing.

  • Internal LoadBalancer Service created in a cluster with GKE subsetting disabled

    Resulting Google Cloud load balancer: an internal passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends.

    Node grouping method: all node VMs are placed into zonal unmanaged instance groups, which GKE uses as backends for the internal passthrough Network Load Balancer's backend service. The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation.

  • External LoadBalancer Service with the cloud.google.com/l4-rbs: "enabled" annotation2

    Resulting Google Cloud load balancer: a backend service-based external passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends.

    Node grouping method: all node VMs are placed into zonal unmanaged instance groups, which GKE uses as backends for the external passthrough Network Load Balancer's backend service. The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation.

  • External LoadBalancer Service without the cloud.google.com/l4-rbs: "enabled" annotation3

    Resulting Google Cloud load balancer: a target pool-based external passthrough Network Load Balancer whose target pool contains all nodes of the cluster.

    Node grouping method: the target pool is a legacy API that does not rely on instance groups. All nodes have direct membership in the target pool. The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing.

1 Only the internal passthrough Network Load Balancers created after enabling GKE subsetting use GCE_VM_IP NEGs. Any internal LoadBalancer Services created before enabling GKE subsetting continue to use unmanaged instance group backends. For examples and configuration guidance, see Creating internal LoadBalancer Services.

2 GKE does not automatically migrate existing external LoadBalancer Services from target pool-based external passthrough Network Load Balancers to backend service-based external passthrough Network Load Balancers. To create an external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer, you must include the cloud.google.com/l4-rbs: "enabled" annotation in the Service manifest at the time of creation.

3 Removing the cloud.google.com/l4-rbs: "enabled" annotation from an existing external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer does not cause GKE to create a target pool-based external passthrough Network Load Balancer. To create an external LoadBalancer Service powered by a target pool-based external passthrough Network Load Balancer, you must omit the cloud.google.com/l4-rbs: "enabled" annotation from the Service manifest at the time of creation.
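
As a reference, the following sketch shows an external LoadBalancer Service manifest that requests a backend service-based external passthrough Network Load Balancer at creation time. The Service name, selector labels, and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rbs-service                      # placeholder name
  annotations:
    cloud.google.com/l4-rbs: "enabled"   # must be present when the Service is created
spec:
  type: LoadBalancer
  selector:
    app: example-app                     # placeholder label
  ports:
  - protocol: TCP
    port: 443
    targetPort: 8443
```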

Node membership in GCE_VM_IP NEG backends

When GKE subsetting is enabled for a cluster, GKE creates a unique GCE_VM_IP NEG in each zone for each internal LoadBalancer Service. Unlike instance groups, nodes can be members of more than one load-balanced GCE_VM_IP NEG. The externalTrafficPolicy of the Service and the number of nodes in the cluster determine which nodes are added as endpoints to the Service's GCE_VM_IP NEG(s).

The cluster's control plane adds nodes as endpoints to the GCE_VM_IP NEGs according to the value of the Service's externalTrafficPolicy and the number of nodes in the cluster, as summarized in the following list:

  • externalTrafficPolicy: Cluster with 1 to 25 nodes in the cluster: GKE uses all nodes in the cluster as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service.

  • externalTrafficPolicy: Cluster with more than 25 nodes in the cluster: GKE uses a random subset of up to 25 nodes as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service.

  • externalTrafficPolicy: Local with any number of nodes1: GKE only uses nodes that have at least one of the Service's serving Pods as endpoints for the Service's NEG(s).

1 Limited to 250 nodes with serving Pods for internal LoadBalancer Services. More than 250 nodes can be present in the cluster, but an internal passthrough Network Load Balancer only distributes packets to 250 or fewer backend VMs when load balancer backend subsetting is disabled. Even with GKE subsetting enabled, GKE never configures internal passthrough Network Load Balancers with load balancer backend subsetting. For details about this limit, see Maximum number of VM instances per internal backend service.

Single load-balanced instance group limitation

The Compute Engine API prohibits VMs from being members of more than one load-balanced instance group. GKE nodes are subject to this constraint.

When using unmanaged instance group backends, GKE creates or updates unmanaged instance groups containing all nodes from all node pools in each zone the cluster uses. These unmanaged instance groups are used for:

  • Internal passthrough Network Load Balancers created for internal LoadBalancer Services when GKE subsetting is disabled.
  • Backend service-based external passthrough Network Load Balancers created for external LoadBalancer Services with the cloud.google.com/l4-rbs: "enabled" annotation.
  • External Application Load Balancers created by the GKE Ingress controller for external GKE Ingress resources that don't use container-native load balancing.

Because node VMs can't be members of more than one load-balanced instance group, GKE can't create and manage internal passthrough Network Load Balancers, backend service-based external passthrough Network Load Balancers, and external Application Load Balancers created for GKE Ingress resources if either of the following is true:

  • Outside of GKE, you created at least one backend service-based load balancer, and you used the cluster's managed instance groups as backends for the load balancer's backend service.
  • Outside of GKE, you created a custom unmanaged instance group that contains some or all of the cluster's nodes, then attached that custom unmanaged instance group to a backend service for a load balancer.

To work around this limitation, you can instruct GKE to use NEG backends where possible:

  • Enable GKE subsetting. As a result, new internal LoadBalancer Services use GCE_VM_IP NEGs instead.
  • Configure external GKE Ingress resources to use container-native load balancing (see the example manifest after this list). For more information, see GKE container-native load balancing.
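
The following is a minimal sketch of a Service configured for container-native load balancing so that a GKE Ingress uses NEG backends instead of instance groups. It assumes the cloud.google.com/neg annotation is used to request NEG backends; the Service name, selector labels, and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-backend                         # placeholder name
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # request NEG (container-native) backends for Ingress
spec:
  type: ClusterIP                               # the Ingress-created load balancer reaches Pods directly
  selector:
    app: example-app                            # placeholder label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```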

Load balancer health checks

All GKE LoadBalancer Services require a load balancer health check. The load balancer's health check is implemented outside of the cluster and is different from a readiness or liveness probe.

The externalTrafficPolicy of the Service defines how the load balancer's health check operates. In all cases, the load balancer's health check probers send packets to the kube-proxy software running on each node. The load balancer's health check is a proxy for information that kube-proxy gathers, such as whether a Pod exists, is running, and has passed its readiness probe. Health checks for LoadBalancer Services are not routed to serving Pods; the load balancer uses the health check results to determine which nodes receive new TCP connections.

The following list describes the health check behavior for each value of externalTrafficPolicy:

  • externalTrafficPolicy: Cluster

    Which nodes pass the health check: all nodes of the cluster pass the health check, even if a node has no serving Pods. If one or more serving Pods exist on a node, that node passes the load balancer's health check even if the serving Pods are terminating or are failing readiness probes.

    What port is used: the load balancer health check port must be TCP port 10256. It cannot be customized.

  • externalTrafficPolicy: Local

    Which nodes pass the health check: only nodes with at least one ready, non-terminating serving Pod pass the load balancer's health check. Nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail the load balancer's health check.

    During state transitions, a node still passes the load balancer's health check until the load balancer's unhealthy threshold is reached. The transition state occurs when all serving Pods on a node begin to fail readiness probes or when all serving Pods on a node are terminating. How the packet is processed in this situation depends on the GKE version. For additional details, see the next section, Packet processing.

    What port is used: the health check port is TCP port 10256 unless you specify a custom health check port.
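
For example, with externalTrafficPolicy: Local, one way to specify a custom health check port is the Kubernetes spec.healthCheckNodePort field (assumed here). The following is a minimal sketch; the Service name, selector labels, ports, and the health check port value (which must fall within the cluster's node port range) are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: local-policy-service     # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  healthCheckNodePort: 32000     # placeholder custom health check port within the node port range
  selector:
    app: example-app             # placeholder label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```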

Packet processing

The following sections detail how the load balancer and cluster nodes work together to route packets received for LoadBalancer Services.

Pass-through load balancing

The Google Cloud pass-through load balancer routes packets to the nic0 interface of the GKE cluster's nodes. Each load-balanced packet received by a node has the following characteristics:

  • The packet's destination IP address matches the load balancer's forwarding rule IP address.
  • The protocol and destination port of the packet match both of these:
    • a protocol and port specified in spec.ports[] of the Service manifest
    • a protocol and port configured on the load balancer's forwarding rule

Destination Network Address Translation on nodes

After the node receives the packet, the node performs additional packet processing. In GKE clusters without GKE Dataplane V2 enabled, nodes use iptables to process load-balanced packets. In GKE clusters with GKE Dataplane V2 enabled, nodes use eBPF instead. The node-level packet processing always includes the following actions (see the example manifest after this list):

  • The node performs Destination Network Address Translation (DNAT) on the packet, setting its destination IP address to a serving Pod IP address.
  • The node changes the packet's destination port to the targetPort of the corresponding Service's spec.ports[].
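
To relate these actions to the Service manifest, the following sketch (with placeholder name, labels, and ports) shows the fields involved: protocol and port correspond to the load balancer's forwarding rule, and targetPort is the destination port after DNAT to a serving Pod IP address:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dnat-example             # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: example-app             # placeholder label
  ports:
  - protocol: TCP                # matches the protocol of the load balancer's forwarding rule
    port: 80                     # destination port of the load-balanced packet arriving at the node
    targetPort: 8080             # destination port after DNAT to a serving Pod IP address
```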

Source Network Address Translation on nodes

The externalTrafficPolicy determines whether the node-level packet processing also performs source network address translation (SNAT), and it determines the path the packet takes from the node to the Pod:

  • externalTrafficPolicy: Cluster

    Node SNAT behavior: the node changes the source IP address of load-balanced packets to match the IP address of the node that received the packet from the load balancer.

    Routing behavior: the node routes packets to any serving Pod. The serving Pod might or might not be on the same node. If the node that receives the packets from the load balancer lacks a ready and serving Pod, the node routes the packets to a different node that does contain a ready and serving Pod. Response packets from the Pod are routed from its node back to the node that received the request packets from the load balancer. That first node then sends the response packets to the original client using Direct Server Return.

  • externalTrafficPolicy: Local

    Node SNAT behavior: the node does not change the source IP address of load-balanced packets.

    Routing behavior: in most situations, the node routes the packet to a serving Pod running on the node that received the packet from the load balancer. That node sends response packets to the original client using Direct Server Return. This is the primary intent of this traffic policy.

    In some situations, a node receives packets from the load balancer even though the node lacks a ready, non-terminating serving Pod for the Service. This situation occurs when the load balancer's health check has not yet reached its failure threshold, but a previously ready and serving Pod is no longer ready or is terminating (for example, during a rolling update). How the packets are processed in this situation depends on the GKE version, whether the cluster uses GKE Dataplane V2, and the value of externalTrafficPolicy:

  • In GKE 1.26 and later without GKE Dataplane V2, and in GKE versions 1.26.4-gke.500 and later with GKE Dataplane V2, Proxy Terminating Endpoints is enabled. Packets are routed to a terminating Pod as a last resort if one of the following conditions is met:
    • All serving Pods are terminating and the externalTrafficPolicy is Cluster.
    • All serving Pods on the node are terminating and the externalTrafficPolicy is Local.
  • For all other GKE versions, the packet is answered by the node's kernel with a TCP reset.

Pricing and quotas

Network pricing applies to packets processed by a load balancer. For more information, see Cloud Load Balancing and forwarding rules pricing. You can also estimate billing charges using the Google Cloud pricing calculator.

The number of forwarding rules you can create is controlled by load balancer quotas.

What's next