About LoadBalancer Services


This page provides a general overview of how Google Kubernetes Engine (GKE) creates and manages Google Cloud load balancers when you apply a Kubernetes LoadBalancer Service manifest. It describes the types of LoadBalancer Services and their configuration parameters, and provides best practice recommendations.

Before reading this page, ensure that you're familiar with GKE networking concepts.

Overview

When you create a LoadBalancer Service, GKE configures a Google Cloud pass-through load balancer whose characteristics depend on parameters of your Service manifest.

Customize your LoadBalancer Service for a network

When choosing which LoadBalancer Service configuration to use, consider the following aspects:

Figure: LoadBalancer Service decision tree

Type of load balancer – Internal or External

When you create a LoadBalancer Service in GKE, you specify whether the load balancer has an internal or external address:

  • External LoadBalancer Services are implemented by using external passthrough Network Load Balancers. Clients located outside your VPC network and Google Cloud VMs with internet access can access an external LoadBalancer Service.

When you create a LoadBalancer Service and don't specify any custom settings, it defaults to this configuration.

    As a best practice, when creating an external LoadBalancer Service, include the cloud.google.com/l4-rbs: "enabled" annotation in the Service manifest. Including this annotation in the Service manifest creates a backend service-based external passthrough Network Load Balancer.

    LoadBalancer Service manifests that omit the cloud.google.com/l4-rbs: "enabled" annotation create a target pool-based external passthrough Network Load Balancer. Using target pool-based external passthrough Network Load Balancers is no longer recommended.

  • Internal LoadBalancer Services are implemented by using internal passthrough Network Load Balancers. Clients located in the same VPC network or in a network connected to the cluster's VPC network can access an internal LoadBalancer Service.

    To create an internal LoadBalancer Service:

    • As a best practice, ensure that GKE subsetting is enabled so that GKE can efficiently group nodes using GCE_VM_IP network endpoint groups (NEGs). GKE subsetting isn't required, but is strongly recommended.

    • Include the networking.gke.io/load-balancer-type: "Internal" annotation in the Service manifest.
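The two configurations above can be sketched as minimal Service manifests. The Service names, selector labels, and ports below (`my-external-service`, `app: my-app`, and so on) are placeholders, not values from this page:

```yaml
# External LoadBalancer Service: the recommended annotation creates a
# backend service-based external passthrough Network Load Balancer.
apiVersion: v1
kind: Service
metadata:
  name: my-external-service   # placeholder name
  annotations:
    cloud.google.com/l4-rbs: "enabled"
spec:
  type: LoadBalancer
  selector:
    app: my-app               # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
---
# Internal LoadBalancer Service: creates an internal passthrough
# Network Load Balancer.
apiVersion: v1
kind: Service
metadata:
  name: my-internal-service   # placeholder name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: my-app               # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```

Applying either manifest causes GKE to provision the corresponding load balancer.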

Effect of externalTrafficPolicy

The externalTrafficPolicy parameter controls the following:

  • Which nodes receive packets from the load balancer
  • Whether packets might be routed between nodes in the cluster, after the load balancer delivers the packets to a node
  • Whether the original client IP address is preserved or lost

The externalTrafficPolicy can be either Local or Cluster:

  • Use externalTrafficPolicy: Local to ensure that packets are only delivered to a node with at least one serving, ready, non-terminating Pod, preserving the original client source IP address. This option is best for workloads with a relatively constant number of nodes with serving Pods, even if the overall number of nodes in the cluster varies. This option is required to support weighted load balancing.

  • Use externalTrafficPolicy: Cluster in situations where the overall number of nodes in your cluster is relatively constant, but the number of nodes with serving Pods varies. This option doesn't preserve original client source IP addresses, and can add latency because packets might be routed to a serving Pod on another node after being delivered to a node from the load balancer. This option is incompatible with weighted load balancing.

For more information about how externalTrafficPolicy affects packet routing within the nodes, see packet processing.
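The policy is set directly in the Service spec. A minimal sketch, with placeholder name, selector, and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service              # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # or Cluster (the default)
  selector:
    app: my-app                 # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```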

Weighted load balancing

External LoadBalancer Services support weighted load balancing, which allows nodes with more serving Pods to receive a larger proportion of new connections compared to nodes with fewer serving Pods.

To use weighted load balancing, you must meet all of the following requirements:

  • Your GKE cluster must use version 1.31.0-gke.1506000 or later.

  • The HttpLoadBalancing add-on must be enabled for your cluster. This add-on is enabled by default. It allows the cluster to manage load balancers which use backend services.

  • You must include the cloud.google.com/l4-rbs: "enabled" annotation in the LoadBalancer Service manifest so that GKE creates a backend service-based external passthrough Network Load Balancer. Target pool-based external passthrough Network Load Balancers don't support weighted load balancing.

  • You must include the networking.gke.io/weighted-load-balancing: pods-per-node annotation in the LoadBalancer Service manifest to enable the weighted load balancing feature.

  • The LoadBalancer Service manifest must use externalTrafficPolicy: Local. GKE doesn't prevent you from using externalTrafficPolicy: Cluster, but externalTrafficPolicy: Cluster effectively disables weighted load balancing because packets might be routed to a different node after the load balancer delivers them.
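Putting these requirements together, a weighted load balancing manifest might look like the following sketch (name, selector, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-weighted-service     # placeholder name
  annotations:
    cloud.google.com/l4-rbs: "enabled"                       # backend service-based NLB
    networking.gke.io/weighted-load-balancing: pods-per-node # enable weighting
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # required for weighting to take effect
  selector:
    app: my-app                 # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```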

For more information about weighted load balancing from the perspective of the load balancer, see Weighted load balancing in the backend service-based external passthrough Network Load Balancer.

Special considerations for internal LoadBalancer Services

This section describes the GKE subsetting option, which is unique to internal LoadBalancer Services, and how GKE subsetting interacts with the externalTrafficPolicy to influence the maximum number of load-balanced nodes.

GKE subsetting

Best practice:

Enable GKE subsetting to improve the scalability of internal LoadBalancer Services.

GKE subsetting, also called GKE subsetting for Layer 4 internal load balancers, is a cluster-wide configuration option that improves the scalability of internal passthrough Network Load Balancers by more efficiently grouping node endpoints into GCE_VM_IP network endpoint groups (NEGs). The NEGs are used as the backends of the load balancer.

The following diagram shows two Services in a zonal cluster with three nodes. The cluster has GKE subsetting enabled. Each Service has two Pods. GKE creates one GCE_VM_IP NEG for each Service. Endpoints in each NEG are the nodes with the serving Pods for the respective Service.

Figure: GKE subsetting for two Services on a zonal cluster

You can enable GKE subsetting when you create a cluster or by updating an existing cluster. Once enabled, you cannot disable GKE subsetting. GKE subsetting requires:

  • GKE version 1.18.19-gke.1400 or later, and
  • The HttpLoadBalancing add-on enabled for the cluster. This add-on is enabled by default. It allows the cluster to manage load balancers which use backend services.

Node count

A cluster with GKE subsetting disabled can experience problems with internal LoadBalancer Services if the cluster has more than 250 total nodes (among all node pools). This happens because internal passthrough Network Load Balancers created by GKE can only distribute packets to 250 or fewer backend node VMs. This limitation exists for the following two reasons:

  • GKE doesn't use load balancer backend subsetting.
  • An internal passthrough Network Load Balancer is limited to distributing packets to 250 or fewer backends when load balancer backend subsetting is disabled.

A cluster with GKE subsetting enabled supports internal LoadBalancer Services even with more than 250 total nodes:

  • An internal LoadBalancer Service using externalTrafficPolicy: Local in a cluster that has GKE subsetting enabled supports up to 250 nodes with serving Pods backing this Service.

  • An internal LoadBalancer Service using externalTrafficPolicy: Cluster in a cluster that has GKE subsetting enabled doesn't impose any limitation on the number of nodes with serving Pods, because GKE configures no more than 25 node endpoints in GCE_VM_IP NEGs. For more information, see Node membership in GCE_VM_IP NEG backends.

Node grouping

The Service manifest annotations and, for internal LoadBalancer Services, the status of GKE subsetting determine the resulting Google Cloud load balancer and the type of backends. Backends for Google Cloud pass-through load balancers identify the network interface (NIC) of the GKE node, not a particular node or Pod IP address. The type of load balancer and backends determine how nodes are grouped into GCE_VM_IP NEGs, instance groups, or target pools.

The following list describes, for each kind of GKE LoadBalancer Service, the resulting Google Cloud load balancer and the node grouping method.

  • Internal LoadBalancer Service created in a cluster with GKE subsetting enabled1 – An internal passthrough Network Load Balancer whose backend service uses GCE_VM_IP network endpoint group (NEG) backends.

Node VMs are grouped zonally into GCE_VM_IP NEGs on a per-service basis according to the externalTrafficPolicy of the Service and the number of nodes in the cluster.

The externalTrafficPolicy of the Service also controls which nodes pass the load balancer health check and packet processing.

  • Internal LoadBalancer Service created in a cluster with GKE subsetting disabled – An internal passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends.

All node VMs are placed into zonal unmanaged instance groups which GKE uses as backends for the internal passthrough Network Load Balancer's backend service.

The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing.

The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation.

  • External LoadBalancer Service with the cloud.google.com/l4-rbs: "enabled" annotation2 – A backend service-based external passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends.

All node VMs are placed into zonal unmanaged instance groups which GKE uses as backends for the external passthrough Network Load Balancer's backend service.

The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing.

The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation.

  • External LoadBalancer Service without the cloud.google.com/l4-rbs: "enabled" annotation3 – A target pool-based external passthrough Network Load Balancer whose target pool contains all nodes of the cluster.

The target pool is a legacy API which does not rely on instance groups. All nodes have direct membership in the target pool.

The externalTrafficPolicy of the Service controls which nodes pass the load balancer health check and the packet processing.

1 Only the internal passthrough Network Load Balancers created after enabling GKE subsetting use GCE_VM_IP NEGs. Any internal LoadBalancer Services created before enabling GKE subsetting continue to use unmanaged instance group backends. For examples and configuration guidance, see Creating internal LoadBalancer Services.

2 GKE does not automatically migrate existing external LoadBalancer Services from target pool-based external passthrough Network Load Balancers to backend service-based external passthrough Network Load Balancers. To create an external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer, you must include the cloud.google.com/l4-rbs: "enabled" annotation in the Service manifest at the time of creation.

3 Removing the cloud.google.com/l4-rbs: "enabled" annotation from an existing external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer does not cause GKE to create a target pool-based external passthrough Network Load Balancer. To create an external LoadBalancer Service powered by a target pool-based external passthrough Network Load Balancer, you must omit the cloud.google.com/l4-rbs: "enabled" annotation from the Service manifest at the time of creation.

Node membership in GCE_VM_IP NEG backends

When GKE subsetting is enabled for a cluster, GKE creates a unique GCE_VM_IP NEG in each zone for each internal LoadBalancer Service. Unlike instance groups, nodes can be members of more than one load-balanced GCE_VM_IP NEG. The externalTrafficPolicy of the Service and the number of nodes in the cluster determine which nodes are added as endpoints to the Service's GCE_VM_IP NEG(s).

The cluster's control plane adds nodes as endpoints to the GCE_VM_IP NEGs according to the value of the Service's externalTrafficPolicy and the number of nodes in the cluster, as summarized in the following list:

  • externalTrafficPolicy: Cluster, 1 to 25 nodes in the cluster – GKE uses all nodes in the cluster as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service.

  • externalTrafficPolicy: Cluster, more than 25 nodes in the cluster – GKE uses a random subset of up to 25 nodes as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service.

  • externalTrafficPolicy: Local, any number of nodes1 – GKE only uses nodes that have at least one of the Service's serving Pods as endpoints for the Service's NEG(s).

1 Limited to 250 nodes with serving Pods for internal LoadBalancer Services. More than 250 nodes can be present in the cluster, but internal passthrough Network Load Balancers only distribute to 250 backend VMs when internal passthrough Network Load Balancer backend subsetting is disabled. Even with GKE subsetting enabled, GKE never configures internal passthrough Network Load Balancers with internal passthrough Network Load Balancer backend subsetting. For details about this limit, see Maximum number of VM instances per internal backend service.

Single load-balanced instance group limitation

The Compute Engine API prohibits VMs from being members of more than one load-balanced instance group. GKE nodes are subject to this constraint.

When using unmanaged instance group backends, GKE creates or updates unmanaged instance groups containing all nodes from all node pools in each zone the cluster uses. These unmanaged instance groups are used for:

  • Internal passthrough Network Load Balancers created for internal LoadBalancer Services when GKE subsetting is disabled.
  • Backend service-based external passthrough Network Load Balancers created for external LoadBalancer Services with the cloud.google.com/l4-rbs: "enabled" annotation.
  • External Application Load Balancers created for external GKE Ingress resources, using the GKE Ingress controller, but not using container-native load balancing.

Because node VMs can't be members of more than one load-balanced instance group, GKE can't create and manage internal passthrough Network Load Balancers, backend service-based external passthrough Network Load Balancers, or external Application Load Balancers for GKE Ingress resources if either of the following is true:

  • Outside of GKE, you created at least one backend service-based load balancer, and you used the cluster's managed instance groups as backends for the load balancer's backend service.
  • Outside of GKE, you created a custom unmanaged instance group that contains some or all of the cluster's nodes, then attached that custom unmanaged instance group to a backend service for a load balancer.

To work around this limitation, you can instruct GKE to use NEG backends where possible:

  • Enable GKE subsetting. As a result, new internal LoadBalancer Services use GCE_VM_IP NEGs instead.
  • Configure external GKE Ingress resources to use container native load balancing. For more information, see GKE container-native load balancing.
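For the second workaround, container-native load balancing is requested with the cloud.google.com/neg annotation on the Service that the Ingress routes to. A minimal sketch, with placeholder name and selector:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend-service    # placeholder name
  annotations:
    cloud.google.com/neg: '{"ingress": true}'  # request NEG backends for Ingress
spec:
  type: ClusterIP             # container-native Ingress doesn't need NodePort
  selector:
    app: my-app               # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```

With this annotation, the Ingress load balancer sends traffic to Pod IP endpoints in NEGs instead of to node instance groups, so the single load-balanced instance group limitation doesn't apply.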

Load balancer health checks

All GKE LoadBalancer Services implement a load balancer health check. The load balancer health check system operates outside of the cluster and is different from a Pod readiness, liveness, or startup probe.

Load balancer health check packets are answered by either the kube-proxy (legacy dataplane) or cilium-agent (GKE Dataplane V2) software running on each node. Load balancer health checks for LoadBalancer Services cannot be answered by Pods.

The externalTrafficPolicy of the Service determines which nodes pass the load balancer health check:

  • Cluster – All nodes of the cluster pass the health check, including nodes without serving Pods. If at least one serving Pod exists on a node, that node passes the load balancer health check regardless of the state of its Pods.

    The load balancer health check port must be TCP port 10256. It cannot be customized.

  • Local – The load balancer health check considers a node healthy if at least one ready, non-terminating serving Pod exists on the node, regardless of the state of any other Pods. Nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail the load balancer health check.

    During state transitions, a node still passes the load balancer health check until the load balancer health check unhealthy threshold has been reached. The transition state occurs when all serving Pods on a node begin to fail readiness probes or when all serving Pods on a node are terminating. How the packet is processed in this situation depends on the GKE version. For additional details, see the next section, Packet processing.

    The health check port is TCP port 10256 unless you specify a custom health check port.

When weighted load balancing is enabled, the kube-proxy or cilium-agent software includes a response header in its answer to the load balancer health check. This response header defines a weight that is proportional to the number of serving, ready, and non-terminating Pods on the node. The load balancer routes new connections to serving Pods based on this weight.
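As a sketch of the custom health check port mentioned above: with externalTrafficPolicy: Local, Kubernetes lets you pin the port through the Service's spec.healthCheckNodePort field. The name, selector, and port values below are placeholders; healthCheckNodePort must fall within the cluster's node port range:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service            # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  healthCheckNodePort: 32000  # placeholder; must be in the node port range
  selector:
    app: my-app               # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```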

Packet processing

The following sections detail how the load balancer and cluster nodes work together to route packets received for LoadBalancer Services.

Pass-through load balancing

Passthrough Network Load Balancers route packets to the nic0 interface of the GKE cluster's nodes. Each load-balanced packet received on a node has the following characteristics:

  • The packet's destination IP address matches the load balancer's forwarding rule IP address.
  • The protocol and destination port of the packet match both of these:
    • a protocol and port specified in spec.ports[] of the Service manifest
    • a protocol and port configured on the load balancer's forwarding rule

Destination Network Address Translation on nodes

After the node receives the packet, the node performs additional packet processing. In GKE clusters that use the legacy dataplane, nodes use iptables to process load-balanced packets. In GKE clusters with GKE Dataplane V2 enabled, nodes use eBPF instead. The node-level packet processing always includes the following actions:

  • The node performs Destination Network Address Translation (DNAT) on the packet, setting its destination IP address to a serving Pod IP address.
  • The node changes the packet's destination port to the targetPort of the corresponding Service's spec.ports[].
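To make the DNAT step concrete, consider the following sketch (placeholder name and selector). A packet arriving at the forwarding rule's IP address on TCP port 80 has its destination rewritten by the node to a serving Pod's IP address on port 8080:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service     # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: my-app        # placeholder selector
  ports:
  - protocol: TCP
    port: 80           # matched against the packet's destination port
    targetPort: 8080   # destination port after DNAT on the node
```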

Source Network Address Translation on nodes

The externalTrafficPolicy determines whether the node-level packet processing also performs source network address translation (SNAT) as well as the path the packet takes from node to Pod:

  • Cluster – The node changes the source IP address of load-balanced packets to match the IP address of the node which received the packets from the load balancer.

    The node routes packets to any serving Pod. The serving Pod might or might not be on the same node.

    If the node that receives the packets from the load balancer lacks a ready and serving Pod, the node routes the packets to a different node which does contain a ready and serving Pod. Response packets from the Pod are routed from its node back to the node which received the request packets from the load balancer. That first node then sends the response packets to the original client using Direct Server Return.

  • Local – The node does not change the source IP address of load-balanced packets.

    In most situations, the node routes the packet to a serving Pod running on the node which received the packet from the load balancer. That node sends response packets to the original client using Direct Server Return. This is the primary intent of this type of traffic policy.

In some situations, a node receives packets from the load balancer even though the node lacks a ready, non-terminating serving Pod for the Service. This situation is encountered when the load balancer's health check has not yet reached its failure threshold, but a previously ready and serving Pod is no longer ready or is terminating (for example, when doing a rolling update). How the packets are processed in this situation depends on the GKE version, whether the cluster uses GKE Dataplane V2, and the value of externalTrafficPolicy:

  • In GKE 1.26 and later without GKE Dataplane V2, and in GKE 1.26.4-gke.500 and later with GKE Dataplane V2, Proxy Terminating Endpoints is enabled. Packets are routed to a terminating Pod as a last resort if either of the following conditions is met:
    • All serving Pods are terminating and the externalTrafficPolicy is Cluster.
    • All serving Pods on the node are terminating and the externalTrafficPolicy is Local.
  • For all other GKE versions, the packet is answered by the node's kernel with a TCP reset.

Pricing and quotas

Network pricing applies to packets processed by a load balancer. For more information, see Cloud Load Balancing and forwarding rules pricing. You can also estimate billing charges using the Google Cloud pricing calculator.

The number of forwarding rules you can create is controlled by load balancer quotas.

What's next