This page provides a general overview of how Google Kubernetes Engine (GKE) creates and manages Google Cloud load balancers when you apply a Kubernetes LoadBalancer Service manifest. It describes LoadBalancer Service types and configuration parameters, and provides best practice recommendations.
Before reading this page, ensure that you're familiar with GKE networking concepts.
Overview
When you create a LoadBalancer Service, GKE configures a Google Cloud pass-through load balancer whose characteristics depend on parameters of your Service manifest.
Customize your LoadBalancer Service for a Network
When choosing which LoadBalancer Service configuration to use, consider the following aspects:
Type of load balancer – Internal or External
When you create a LoadBalancer Service in GKE, you specify whether the load balancer has an internal or external address:
External LoadBalancer Services are implemented by using external passthrough Network Load Balancers. Clients located outside your VPC network and Google Cloud VMs with internet access can access an external LoadBalancer Service.
When you create a LoadBalancer Service and don't specify any custom settings, GKE defaults to this configuration.
As a best practice, when creating an external LoadBalancer Service, include the `cloud.google.com/l4-rbs: "enabled"` annotation in the Service manifest. Including this annotation in the Service manifest creates a backend service-based external passthrough Network Load Balancer. LoadBalancer Service manifests that omit the `cloud.google.com/l4-rbs: "enabled"` annotation create a target pool-based external passthrough Network Load Balancer. Using target pool-based external passthrough Network Load Balancers is no longer recommended.

Internal LoadBalancer Services are implemented by using internal passthrough Network Load Balancers. Clients located in the same VPC network or in a network connected to the cluster's VPC network can access an internal LoadBalancer Service.
To create an internal LoadBalancer Service:

- As a best practice, ensure that GKE subsetting is enabled so that GKE can efficiently group nodes using `GCE_VM_IP` network endpoint groups (NEGs). GKE subsetting isn't required, but is strongly recommended.
- Include the `networking.gke.io/load-balancer-type: "Internal"` annotation in the Service manifest.
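Putting these steps together, a minimal internal LoadBalancer Service manifest might look like the following sketch (the Service name, selector, and port values are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-internal-service          # illustrative name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: my-app                      # illustrative selector
  ports:
  - port: 80                         # port on the load balancer
    protocol: TCP
    targetPort: 8080                 # container port on the serving Pods
```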
Effect of externalTrafficPolicy
The `externalTrafficPolicy` parameter controls the following:

- Which nodes receive packets from the load balancer
- Whether packets might be routed between nodes in the cluster, after the load balancer delivers the packets to a node
- Whether the original client IP address is preserved or lost

The `externalTrafficPolicy` can be either `Local` or `Cluster`:
- Use `externalTrafficPolicy: Local` to ensure that packets are only delivered to a node with at least one serving, ready, non-terminating Pod, preserving the original client source IP address. This option is best for workloads with a relatively constant number of nodes with serving Pods, even if the overall number of nodes in the cluster varies. This option is required to support weighted load balancing.
- Use `externalTrafficPolicy: Cluster` in situations where the overall number of nodes in your cluster is relatively constant, but the number of nodes with serving Pods varies. This option doesn't preserve original client source IP addresses, and can add latency because packets might be routed to a serving Pod on another node after being delivered to a node from the load balancer. This option is incompatible with weighted load balancing.
For more information about how `externalTrafficPolicy` affects packet routing within the nodes, see Packet processing.
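In a manifest, `externalTrafficPolicy` is a field of the Service spec. A minimal sketch (the Service name, selector, and port values are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                 # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # or Cluster
  selector:
    app: my-app                    # illustrative selector
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
```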
Weighted load balancing
External LoadBalancer Services support weighted load balancing, which allows nodes with more serving Pods to receive a larger proportion of new connections compared to nodes with fewer serving Pods.
To use weighted load balancing, you must meet all of the following requirements:

- Your GKE cluster must use version 1.31.0-gke.1506000 or later.
- The `HttpLoadBalancing` add-on must be enabled for your cluster. This add-on is enabled by default. It allows the cluster to manage load balancers that use backend services.
- You must include the `cloud.google.com/l4-rbs: "enabled"` annotation in the LoadBalancer Service manifest so that GKE creates a backend service-based external passthrough Network Load Balancer. Target pool-based external passthrough Network Load Balancers don't support weighted load balancing.
- You must include the `networking.gke.io/weighted-load-balancing: pods-per-node` annotation in the LoadBalancer Service manifest to enable the weighted load balancing feature.
- The LoadBalancer Service manifest must use `externalTrafficPolicy: Local`. GKE doesn't prevent you from using `externalTrafficPolicy: Cluster`, but `externalTrafficPolicy: Cluster` effectively disables weighted load balancing because the packet might be routed, after the load balancer, to a different node.
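The requirements above can be combined in a single manifest. A sketch of an external LoadBalancer Service with weighted load balancing enabled (the Service name, selector, and port values are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-weighted-service        # illustrative name
  annotations:
    cloud.google.com/l4-rbs: "enabled"
    networking.gke.io/weighted-load-balancing: pods-per-node
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # required for weighted load balancing
  selector:
    app: my-app                    # illustrative selector
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
```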
For more information about weighted load balancing from the perspective of the load balancer, see Weighted load balancing in the backend service-based external passthrough Network Load Balancer.
Special considerations for internal LoadBalancer Services
This section describes the GKE subsetting option, which is unique to internal LoadBalancer Services, and how GKE subsetting interacts with `externalTrafficPolicy` to influence the maximum number of load-balanced nodes.
GKE subsetting
Enable GKE subsetting to improve the scalability of internal LoadBalancer Services.
GKE subsetting, also called GKE subsetting for Layer 4 internal load balancers, is a cluster-wide configuration option that improves the scalability of internal passthrough Network Load Balancers by more efficiently grouping node endpoints into `GCE_VM_IP` network endpoint groups (NEGs). The NEGs are used as the backends of the load balancer.
The following diagram shows two Services in a zonal cluster with three nodes. The cluster has GKE subsetting enabled. Each Service has two Pods. GKE creates one `GCE_VM_IP` NEG for each Service. Endpoints in each NEG are the nodes with the serving Pods for the respective Service.
You can enable GKE subsetting when you create a cluster or by updating an existing cluster. Once enabled, you cannot disable GKE subsetting. GKE subsetting requires:
- GKE version 1.18.19-gke.1400 or later, and
- The `HttpLoadBalancing` add-on enabled for the cluster. This add-on is enabled by default. It allows the cluster to manage load balancers that use backend services.
Node count
A cluster with GKE subsetting disabled can experience problems with internal LoadBalancer Services if the cluster has more than 250 total nodes (among all node pools). This happens because internal passthrough Network Load Balancers created by GKE can only distribute packets to 250 or fewer backend node VMs. This limitation exists for the following two reasons:
- GKE doesn't use load balancer backend subsetting.
- An internal passthrough Network Load Balancer is limited to distributing packets to 250 or fewer backends when load balancer backend subsetting is disabled.
A cluster with GKE subsetting enabled supports internal LoadBalancer Services even when the cluster has more than 250 total nodes:
- An internal LoadBalancer Service using `externalTrafficPolicy: Local` in a cluster that has GKE subsetting enabled supports up to 250 nodes with serving Pods backing this Service.
- An internal LoadBalancer Service using `externalTrafficPolicy: Cluster` in a cluster that has GKE subsetting enabled doesn't impose any limitation on the number of nodes with serving Pods, because GKE configures no more than 25 node endpoints in `GCE_VM_IP` NEGs. For more information, see Node membership in `GCE_VM_IP` NEG backends.
Session affinity and traffic distribution
Session affinity lets you control how the load balancer assigns a request from a client to a backend and ensures that all subsequent requests from the client are routed back to that same backend.
When you use an internal passthrough Network Load Balancer with session affinity set to `CLIENT_IP`, you might see uneven traffic distribution to your backends. This is because the load balancer always sends traffic from a given client IP address to the same backend. If you have a small number of clients with high traffic volume, this can overload some backends while leaving others underutilized.
For more information, see Session affinity options.
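In a Service manifest, client IP session affinity is requested with the standard Kubernetes `sessionAffinity` field, which GKE translates to the load balancer's `CLIENT_IP` affinity setting. A sketch (the Service name, selector, and port values are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-affinity-service        # illustrative name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP        # maps to CLIENT_IP affinity on the load balancer
  selector:
    app: my-app                    # illustrative selector
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
```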
Node grouping
The Service manifest annotations and, for internal LoadBalancer Services, the status of GKE subsetting determine the resulting Google Cloud load balancer and the type of backends. Backends for Google Cloud pass-through load balancers identify the network interface (NIC) of the GKE node, not a particular node or Pod IP address.

The type of load balancer and backends determine how nodes are grouped into `GCE_VM_IP` NEGs, instance groups, or target pools.
| GKE LoadBalancer Service | Resulting Google Cloud load balancer | Node grouping method |
|---|---|---|
| Internal LoadBalancer Service created in a cluster with GKE subsetting enabled¹ | An internal passthrough Network Load Balancer whose backend service uses `GCE_VM_IP` network endpoint group (NEG) backends | Node VMs are grouped zonally into `GCE_VM_IP` NEGs, which GKE uses as backends for the internal passthrough Network Load Balancer's backend service. The `externalTrafficPolicy` of the Service and the number of nodes in the cluster determine which nodes are added as endpoints. For details, see Node membership in `GCE_VM_IP` NEG backends. |
| Internal LoadBalancer Service created in a cluster with GKE subsetting disabled | An internal passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends | All node VMs are placed into zonal unmanaged instance groups, which GKE uses as backends for the internal passthrough Network Load Balancer's backend service. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation. |
| External LoadBalancer Service with the `cloud.google.com/l4-rbs: "enabled"` annotation² | A backend service-based external passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends | All node VMs are placed into zonal unmanaged instance groups, which GKE uses as backends for the external passthrough Network Load Balancer's backend service. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation. |
| External LoadBalancer Service without the `cloud.google.com/l4-rbs: "enabled"` annotation³ | A target pool-based external passthrough Network Load Balancer whose target pool contains all nodes of the cluster | The target pool is a legacy API that does not rely on instance groups. All nodes have direct membership in the target pool. |
¹ Only the internal passthrough Network Load Balancers created after enabling GKE subsetting use `GCE_VM_IP` NEGs. Any internal LoadBalancer Services created before enabling GKE subsetting continue to use unmanaged instance group backends. For examples and configuration guidance, see Creating internal LoadBalancer Services.

² GKE does not automatically migrate existing external LoadBalancer Services from target pool-based external passthrough Network Load Balancers to backend service-based external passthrough Network Load Balancers. To create an external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer, you must include the `cloud.google.com/l4-rbs: "enabled"` annotation in the Service manifest at the time of creation.

³ Removing the `cloud.google.com/l4-rbs: "enabled"` annotation from an existing external LoadBalancer Service powered by a backend service-based external passthrough Network Load Balancer does not cause GKE to create a target pool-based external passthrough Network Load Balancer. To create an external LoadBalancer Service powered by a target pool-based external passthrough Network Load Balancer, you must omit the `cloud.google.com/l4-rbs: "enabled"` annotation from the Service manifest at the time of creation.
Node membership in `GCE_VM_IP` NEG backends
When GKE subsetting is enabled for a cluster, GKE creates a unique `GCE_VM_IP` NEG in each zone for each internal LoadBalancer Service. Unlike instance groups, nodes can be members of more than one load-balanced `GCE_VM_IP` NEG. The `externalTrafficPolicy` of the Service and the number of nodes in the cluster determine which nodes are added as endpoints to the Service's `GCE_VM_IP` NEG(s).

The cluster's control plane adds nodes as endpoints to the `GCE_VM_IP` NEGs according to the value of the Service's `externalTrafficPolicy` and the number of nodes in the cluster, as summarized in the following table.
| `externalTrafficPolicy` | Number of nodes in the cluster | Endpoint membership |
|---|---|---|
| `Cluster` | 1 to 25 nodes | GKE uses all nodes in the cluster as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service. |
| `Cluster` | More than 25 nodes | GKE uses a random subset of up to 25 nodes as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service. |
| `Local` | Any number of nodes¹ | GKE only uses nodes that have at least one of the Service's serving Pods as endpoints for the Service's NEG(s). |
¹ Limited to 250 nodes with serving Pods for internal LoadBalancer Services. More than 250 nodes can be present in the cluster, but internal passthrough Network Load Balancers only distribute to 250 backend VMs when internal passthrough Network Load Balancer backend subsetting is disabled. Even with GKE subsetting enabled, GKE never configures internal passthrough Network Load Balancers with internal passthrough Network Load Balancer backend subsetting. For details about this limit, see Maximum number of VM instances per internal backend service.
Single load-balanced instance group limitation
The Compute Engine API prohibits VMs from being members of more than one load-balanced instance group. GKE nodes are subject to this constraint.
When using unmanaged instance group backends, GKE creates or updates unmanaged instance groups containing all nodes from all node pools in each zone the cluster uses. These unmanaged instance groups are used for:
- Internal passthrough Network Load Balancers created for internal LoadBalancer Services when GKE subsetting is disabled.
- Backend service-based external passthrough Network Load Balancers created for external LoadBalancer Services with the `cloud.google.com/l4-rbs: "enabled"` annotation.
- External Application Load Balancers created for external GKE Ingress resources, using the GKE Ingress controller, but not using container-native load balancing.
Because node VMs can't be members of more than one load-balanced instance group, GKE can't create and manage internal passthrough Network Load Balancers, backend service-based external passthrough Network Load Balancers, and external Application Load Balancers created for GKE Ingress resources if either of the following is true:
- Outside of GKE, you created at least one backend service based load balancer, and you used the cluster's managed instance groups as backends for the load balancer's backend service.
- Outside of GKE, you created a custom unmanaged instance group that contains some or all of the cluster's nodes, then attached that custom unmanaged instance group to a backend service for a load balancer.
To work around this limitation, you can instruct GKE to use NEG backends where possible:
- Enable GKE subsetting. As a result, new internal LoadBalancer Services use `GCE_VM_IP` NEGs instead.
- Configure external GKE Ingress resources to use container-native load balancing. For more information, see GKE container-native load balancing.
Load balancer health checks
All GKE LoadBalancer Services implement a load balancer health check. The load balancer health check system operates outside of the cluster and is different from a Pod readiness, liveness, or startup probe.
Load balancer health check packets are answered by either the `kube-proxy` (legacy dataplane) or `cilium-agent` (GKE Dataplane V2) software running on each node. Load balancer health checks for LoadBalancer Services cannot be answered by Pods.
The `externalTrafficPolicy` of the Service determines which nodes pass the load balancer health check:
| `externalTrafficPolicy` | Which nodes pass the health check | What port is used |
|---|---|---|
| `Cluster` | All nodes of the cluster pass the health check, including nodes without serving Pods. If at least one serving Pod exists on a node, that node passes the load balancer health check regardless of the state of its Pods. | The load balancer health check port must be TCP port 10256. It cannot be customized. |
| `Local` | The load balancer health check considers a node healthy if at least one ready, non-terminating serving Pod exists on the node, regardless of the state of any other Pods. Nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail the load balancer health check. During state transitions, a node still passes the load balancer health check until the load balancer health check unhealthy threshold has been reached. The transition state occurs when all serving Pods on a node begin to fail readiness probes or when all serving Pods on a node are terminating. How the packet is processed in this situation depends on the GKE version. For additional details, see Packet processing. | The Kubernetes control plane assigns the health check port from the node port range unless you specify a custom health check port. |
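With `externalTrafficPolicy: Local`, a custom health check port is set with the standard Kubernetes `healthCheckNodePort` field. A sketch (the Service name, selector, and port values are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                 # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # healthCheckNodePort requires Local
  healthCheckNodePort: 32000       # illustrative; must fall in the node port range
  selector:
    app: my-app                    # illustrative selector
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
```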
When weighted load balancing is enabled, the `kube-proxy` or `cilium-agent` software includes a response header in its answer to the load balancer health check. This response header defines a weight that is proportional to the number of serving, ready, and non-terminating Pods on the node. The load balancer routes new connections to serving Pods based on this weight.
Packet processing
The following sections detail how the load balancer and cluster nodes work together to route packets received for LoadBalancer Services.
Pass-through load balancing
Passthrough Network Load Balancers route packets to the `nic0` interface of the GKE cluster's nodes. Each load-balanced packet received on a node has the following characteristics:

- The packet's destination IP address matches the load balancer's forwarding rule IP address.
- The protocol and destination port of the packet match both of the following:
  - a protocol and port specified in `spec.ports[]` of the Service manifest
  - a protocol and port configured on the load balancer's forwarding rule
Destination Network Address Translation on nodes
After the node receives the packet, the node performs additional packet processing. In GKE clusters that use the legacy dataplane, nodes use `iptables` to process load-balanced packets. In GKE clusters with GKE Dataplane V2 enabled, nodes use eBPF instead. The node-level packet processing always includes the following actions:

- The node performs Destination Network Address Translation (DNAT) on the packet, setting its destination IP address to a serving Pod IP address.
- The node changes the packet's destination port to the `targetPort` of the corresponding Service's `spec.ports[]`.
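For example, with the following `spec.ports[]` entry (values illustrative), a packet arriving at the forwarding rule IP address on TCP port 80 is rewritten so that its destination becomes a serving Pod IP address on TCP port 8080:

```yaml
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80            # port on the load balancer's forwarding rule
    targetPort: 8080    # container port; DNAT rewrites the packet to this port
```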
Source Network Address Translation on nodes
The `externalTrafficPolicy` determines whether the node-level packet processing also performs source network address translation (SNAT), as well as the path the packet takes from node to Pod:
| `externalTrafficPolicy` | Node SNAT behavior | Routing behavior |
|---|---|---|
| `Cluster` | The node changes the source IP address of load-balanced packets to match the IP address of the node that received the packet from the load balancer. | The node routes packets to any serving Pod. The serving Pod might or might not be on the same node. If the node that receives the packets from the load balancer lacks a ready and serving Pod, the node routes the packets to a different node that does contain a ready and serving Pod. Response packets from the Pod are routed from its node back to the node that received the request packets from the load balancer. That first node then sends the response packets to the original client using Direct Server Return. |
| `Local` | The node does not change the source IP address of load-balanced packets. | In most situations, the node routes the packet to a serving Pod running on the node that received the packet from the load balancer. That node sends response packets to the original client using Direct Server Return. This is the primary intent of this type of traffic policy. In some situations, a node receives packets from the load balancer even though the node lacks a ready, non-terminating serving Pod for the Service. This situation is encountered when the load balancer's health check has not yet reached its failure threshold, but a previously ready and serving Pod is no longer ready or is terminating (for example, when doing a rolling update). How the packets are processed in this situation depends on the GKE version and whether the cluster uses GKE Dataplane V2. |
Pricing and quotas
Network pricing applies to packets processed by a load balancer. For more information, see Cloud Load Balancing and forwarding rules pricing. You can also estimate billing charges using the Google Cloud pricing calculator.
The number of forwarding rules you can create is controlled by load balancer quotas:
- Internal passthrough Network Load Balancers use the per-project backend services quota, the per-project health checks quota, and the Internal passthrough Network Load Balancer forwarding rules per Virtual Private Cloud network quota.
- Backend service-based external passthrough Network Load Balancers use the per-project backend services quota, the per-project health checks quota, and the per-project external passthrough Network Load Balancer forwarding rules quota.
- Target pool-based external passthrough Network Load Balancers use the per-project target pools quota, the per-project health checks quota, and the per-project external passthrough Network Load Balancer forwarding rules quota.
What's next
- Learn about GKE LoadBalancer Service parameters.
- Learn about Kubernetes Services.