This page provides a general overview of how Google Kubernetes Engine (GKE) creates
and manages Google Cloud load balancers when you apply a Kubernetes LoadBalancer
Service manifest. It describes the different types of load balancers and how
settings like the externalTrafficPolicy
and GKE subsetting for
L4 internal load balancers determine how the load balancers are configured.
Before reading this page, you should be familiar with GKE networking concepts.
Overview
When you create a LoadBalancer Service, GKE configures a Google Cloud pass-through load balancer whose characteristics depend on parameters of your Service manifest.
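For example, applying a manifest like the following creates an external pass-through load balancer for the matching Pods. This is a minimal sketch; the Service name, selector, and ports are illustrative placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb        # illustrative name
spec:
  type: LoadBalancer      # tells GKE to create a Google Cloud pass-through load balancer
  selector:
    app: example          # illustrative Pod label
  ports:
  - name: http
    protocol: TCP
    port: 80              # port served by the load balancer's forwarding rule
    targetPort: 8080      # port the serving Pods listen on
```

GKE publishes the assigned load balancer IP address in the Service's status.loadBalancer.ingress[] field.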
Choose a LoadBalancer Service
When choosing which LoadBalancer Service configuration to use, consider the following aspects:
- The type of IP address of the LoadBalancer. Your load balancer can have an internal or an external IP address.
- The number and type of nodes the LoadBalancer supports.
After you determine your network architecture requirements, use the following sections to determine which LoadBalancer Service to choose for your network configuration.
External versus internal load balancing
When you create a LoadBalancer Service in GKE, you specify whether the load balancer has an internal or external address:
If your clients are located in the same VPC network or in a network connected to the cluster's VPC network, then use an internal LoadBalancer Service. Internal LoadBalancer Services are implemented by using internal passthrough Network Load Balancers. Clients located in the same VPC network or in a network connected to the cluster's VPC network can access the Service by using the load balancer's IP address.
To create an internal LoadBalancer Service, include one of the following annotations in the metadata.annotations[] field of the Service manifest:
- networking.gke.io/load-balancer-type: "Internal" (GKE 1.17 and later)
- cloud.google.com/load-balancer-type: "Internal" (versions earlier than 1.17)
If your clients are located outside your VPC network, then use an external LoadBalancer Service. You can use one of the following types of external passthrough Network Load Balancers, which are accessible on the internet (including from Google Cloud VMs with internet access):
- (Recommended) Create a backend service-based external passthrough Network Load Balancer by including the cloud.google.com/l4-rbs: "enabled" annotation in metadata.annotations[] of the manifest, as shown in the example manifests after this list.
- Create a target pool-based external passthrough Network Load Balancer by omitting the cloud.google.com/l4-rbs: "enabled" annotation.
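The following manifest fragments sketch where each annotation goes. The Service names, selectors, and ports are illustrative placeholders:

```yaml
# Internal LoadBalancer Service (GKE 1.17 and later).
apiVersion: v1
kind: Service
metadata:
  name: internal-lb-example            # illustrative name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: example                       # illustrative Pod label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
---
# External LoadBalancer Service backed by a backend service-based
# external passthrough Network Load Balancer.
apiVersion: v1
kind: Service
metadata:
  name: external-rbs-example           # illustrative name
  annotations:
    cloud.google.com/l4-rbs: "enabled"
spec:
  type: LoadBalancer
  selector:
    app: example                       # illustrative Pod label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```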
Effect of externalTrafficPolicy
You can set externalTrafficPolicy to Local or Cluster to define how packets are routed to nodes with ready and serving Pods. Consider the following scenarios when defining the externalTrafficPolicy:
- Use externalTrafficPolicy: Local to preserve the original client IP addresses or if you want to minimize disruptions when the number of nodes without serving Pods in the cluster changes.
- Use externalTrafficPolicy: Cluster if the overall number of nodes without serving Pods in your cluster remains consistent, but the number of nodes with serving Pods changes. This option does not preserve the original client IP addresses.
For more information about how externalTrafficPolicy
affects packet routing within the nodes, see packet processing.
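For example, the following sketch preserves client IP addresses by setting externalTrafficPolicy: Local; the name, selector, and ports are illustrative placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: client-ip-lb-example   # illustrative name
spec:
  type: LoadBalancer
  # Packets are delivered only to nodes running serving Pods,
  # and the original client source IP address is preserved.
  externalTrafficPolicy: Local
  selector:
    app: example               # illustrative Pod label
  ports:
  - protocol: TCP
    port: 443
    targetPort: 8443
```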
GKE subsetting
The GKE subsetting for L4 internal load balancers cluster-wide configuration option, or GKE subsetting, improves the scalability of internal passthrough Network Load Balancers by more efficiently grouping node endpoints for the load balancer backends.
The following diagram shows two Services in a zonal cluster with three nodes and
GKE subsetting enabled. Each Service has two Pods.
GKE creates one GCE_VM_IP
network endpoint group (NEG) for each
Service. Endpoints in each NEG are the nodes with the serving Pods for the
respective Service.
You can enable GKE subsetting when you create a cluster or by editing an existing cluster. Once enabled, you cannot disable GKE subsetting. For more information, see GKE subsetting.
GKE subsetting requires:
- GKE version 1.18.19-gke.1400 or later.
- The HttpLoadBalancing add-on enabled for the cluster. This add-on is enabled by default. It allows the cluster to manage load balancers which use backend services.
Node count consideration when enabling GKE subsetting
As a best practice, if you need to create internal LoadBalancer Services, you should enable GKE subsetting. GKE subsetting allows you to support more nodes in your cluster:
- If your cluster has GKE subsetting disabled, you should not create more than 250 total nodes (among all node pools). If you create more than 250 total nodes in the cluster, internal LoadBalancer Services might experience uneven traffic distribution or complete loss of connectivity.
- If your cluster has GKE subsetting enabled, you can use either externalTrafficPolicy: Local or externalTrafficPolicy: Cluster, as long as the number of unique nodes with at least one serving Pod is not higher than 250. Nodes without any serving Pod are not relevant. If you need more than 250 nodes with at least one serving Pod, you must use externalTrafficPolicy: Cluster.
Internal passthrough Network Load Balancers created by GKE can only distribute packets to 250 or fewer backend node VMs. This limitation exists because GKE does not use load balancer backend subsetting, and an internal passthrough Network Load Balancer is limited to distributing packets to 250 or fewer backends when load balancer backend subsetting is disabled.
Node grouping
The Service manifest annotations and, for internal LoadBalancer Services, the status of GKE subsetting
determine the resulting Google Cloud load balancer and the type of
backends. Backends for Google Cloud pass-through load balancers identify
the network interface (NIC) of the GKE node, not a particular node or Pod IP
address. The type of load balancer and backends determine how nodes are grouped
into GCE_VM_IP
NEGs, instance groups, or target pools.
GKE LoadBalancer Service | Resulting Google Cloud load balancer | Node grouping method |
---|---|---|
Internal LoadBalancer Service created in a cluster with GKE subsetting enabled1 | An internal passthrough Network Load Balancer whose backend service uses GCE_VM_IP network endpoint group (NEG) backends | Node VMs are grouped zonally into GCE_VM_IP NEGs, one per Service in each zone. The externalTrafficPolicy of the Service and the number of nodes in the cluster determine which nodes are added as endpoints to the Service's NEG(s), as described in Node membership in GCE_VM_IP NEG backends. |
Internal LoadBalancer Service created in a cluster with GKE subsetting disabled | An internal passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends | All node VMs are placed into zonal unmanaged instance groups which GKE uses as backends for the internal passthrough Network Load Balancer's backend service. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation. |
External LoadBalancer Service with the cloud.google.com/l4-rbs: "enabled" annotation2 | A backend service-based external passthrough Network Load Balancer whose backend service uses zonal unmanaged instance group backends | All node VMs are placed into zonal unmanaged instance groups which GKE uses as backends for the external passthrough Network Load Balancer's backend service. The same unmanaged instance groups are used for other load balancer backend services created in the cluster because of the single load-balanced instance group limitation. |
External LoadBalancer Service without the cloud.google.com/l4-rbs: "enabled" annotation3 | A target pool-based external passthrough Network Load Balancer whose target pool contains all nodes of the cluster | The target pool is a legacy API which does not rely on instance groups. All nodes have direct membership in the target pool. |
1 Only the internal passthrough Network Load Balancers created after enabling
GKE subsetting use GCE_VM_IP
NEGs. Any
internal LoadBalancer Services created before enabling GKE
subsetting continue to use unmanaged instance group backends. For examples
and configuration guidance, see
Creating
internal LoadBalancer Services.
2 GKE does not automatically migrate existing
external LoadBalancer Services from target pool-based external passthrough Network Load Balancers to
backend service-based external passthrough Network Load Balancers. To create an external LoadBalancer
Service powered by a backend service-based external passthrough Network Load Balancer, you must
include the cloud.google.com/l4-rbs: "enabled"
annotation in the
Service manifest at the time of creation.
3 Removing the cloud.google.com/l4-rbs: "enabled"
annotation from an existing external LoadBalancer Service powered by a backend
service-based external passthrough Network Load Balancer does not cause GKE to create a
target pool-based external passthrough Network Load Balancer. To create an external LoadBalancer
Service powered by a target pool-based external passthrough Network Load Balancer, you must
omit the cloud.google.com/l4-rbs: "enabled"
annotation from the
Service manifest at the time of creation.
Node membership in GCE_VM_IP NEG backends
When GKE subsetting is enabled for a cluster, GKE
creates a unique GCE_VM_IP
NEG in each zone for each internal LoadBalancer
Service. Unlike instance groups, nodes can be members of more than one
load-balanced GCE_VM_IP
NEG. The externalTrafficPolicy
of the Service and
the number of nodes in the cluster determine which nodes are added as endpoints
to the Service's GCE_VM_IP
NEG(s).
The cluster's control plane adds nodes as endpoints to the GCE_VM_IP
NEGs
according to the value of the Service's externalTrafficPolicy
and the number
of nodes in the cluster, as summarized in the following table.
externalTrafficPolicy | Number of nodes in the cluster | Endpoint membership |
---|---|---|
Cluster | 1 to 25 nodes | GKE uses all nodes in the cluster as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service. |
Cluster | more than 25 nodes | GKE uses a random subset of 25 nodes as endpoints for the Service's NEG(s), even if a node does not contain a serving Pod for the Service. |
Local | any number of nodes1 | GKE only uses nodes which have at least one of the Service's serving Pods as endpoints for the Service's NEG(s). |
1 Limited to 250 nodes with serving Pods for internal LoadBalancer Services. More than 250 nodes can be present in the cluster, but internal passthrough Network Load Balancers only distribute to 250 backend VMs when internal passthrough Network Load Balancer backend subsetting is disabled. Even with GKE subsetting enabled, GKE never configures internal passthrough Network Load Balancers with internal passthrough Network Load Balancer backend subsetting. For details about this limit, see Maximum number of VM instances per internal backend service.
Single load-balanced instance group limitation
The Compute Engine API prohibits VMs from being members of more than one load-balanced instance group. GKE nodes are subject to this constraint.
When using unmanaged instance group backends, GKE creates or updates unmanaged instance groups containing all nodes from all node pools in each zone the cluster uses. These unmanaged instance groups are used for:
- Internal passthrough Network Load Balancers created for internal LoadBalancer Services when GKE subsetting is disabled.
- Backend service-based external passthrough Network Load Balancers created for external LoadBalancer Services with the cloud.google.com/l4-rbs: "enabled" annotation.
- External Application Load Balancers created for external GKE Ingress resources, using the GKE Ingress controller, but not using container-native load balancing.
Because node VMs can't be members of more than one load-balanced instance group, GKE can't create and manage internal passthrough Network Load Balancers, backend service-based external passthrough Network Load Balancers, and external Application Load Balancers created for GKE Ingress resources if either of the following is true:
- Outside of GKE, you created at least one backend service-based load balancer, and you used the cluster's managed instance groups as backends for the load balancer's backend service.
- Outside of GKE, you created a custom unmanaged instance group that contains some or all of the cluster's nodes, and then attached that custom unmanaged instance group to a backend service for a load balancer.
To work around this limitation, you can instruct GKE to use NEG backends where possible:
- Enable GKE subsetting. As a result, new internal LoadBalancer Services use GCE_VM_IP NEGs instead.
- Configure external GKE Ingress resources to use container-native load balancing, as shown in the example after this list. For more information, see GKE container-native load balancing.
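As an illustration of the second workaround, a Service used by GKE Ingress can request NEG (container-native) backends with the cloud.google.com/neg annotation so that the Ingress-created load balancer does not use instance groups. The Service details below are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-backend-example        # illustrative name
  annotations:
    # Requests container-native (NEG) backends for load balancers
    # created by GKE Ingress instead of unmanaged instance groups.
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: example                       # illustrative Pod label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```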
Load balancer health checks
All GKE LoadBalancer Services require a load balancer health check. The load balancer's health check is implemented outside of the cluster and is different from a readiness or liveness probe.
The externalTrafficPolicy
of the Service defines how the load balancer's
health check operates. In all cases, the load balancer's health check probers
send packets to the kube-proxy
software running on each node. The load
balancer's health check is a proxy for information that the kube-proxy
gathers, such as whether a Pod exists, is running, and has passed its readiness
probe. Health checks for LoadBalancer Services cannot be routed to serving
Pods. The load balancer's health check is designed to direct new TCP
connections to nodes.
The following table describes the health check behavior:
externalTrafficPolicy | Which nodes pass the health check | What port is used |
---|---|---|
Cluster | All nodes of the cluster pass the health check even if the node has no serving Pods. If one or more serving Pods exist on a node, that node passes the load balancer's health check even if the serving Pods are terminating or are failing readiness probes. | The load balancer health check port must be TCP port 10256. It cannot be customized. |
Local | Only the nodes with at least one ready, non-terminating serving Pod pass the load balancer's health check. Nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail the load balancer's health check. During state transitions, a node still passes the load balancer's health check until the load balancer's unhealthy threshold is reached. The transition state occurs when all serving Pods on a node begin to fail readiness probes or when all serving Pods on a node are terminating. How the packet is processed in this situation depends on the GKE version. For additional details, see the next section, Packet processing. | The health check port is TCP port 10256 unless you specify a custom health check port. |
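In standard Kubernetes, the Service field used to set a custom health check port for externalTrafficPolicy: Local is spec.healthCheckNodePort; if you omit it, a port is allocated automatically from the node port range. A minimal sketch with illustrative values:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: local-policy-example   # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # Optional: pin the health check node port instead of letting
  # Kubernetes allocate one automatically.
  healthCheckNodePort: 32256   # illustrative value within the node port range
  selector:
    app: example               # illustrative Pod label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```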
Packet processing
The following sections detail how the load balancer and cluster nodes work together to route packets received for LoadBalancer Services.
Pass-through load balancing
The Google Cloud pass-through load balancer routes packets to the nic0
interface of the GKE cluster's nodes. Each load-balanced packet
received by a node has the following characteristics:
- The packet's destination IP address matches the load balancer's forwarding rule IP address.
- The protocol and destination port of the packet match both of these:
  - a protocol and port specified in spec.ports[] of the Service manifest
  - a protocol and port configured on the load balancer's forwarding rule
Destination Network Address Translation on nodes
After the node receives the packet, the node performs additional packet
processing. In GKE clusters without GKE Dataplane V2 enabled,
nodes use iptables
to process load-balanced packets. In GKE
clusters with GKE Dataplane V2
enabled, nodes use
eBPF instead. The node-level packet
processing always includes the following actions:
- The node performs Destination Network Address Translation (DNAT) on the packet, setting its destination IP address to a serving Pod IP address.
- The node changes the packet's destination port to the targetPort of the corresponding Service's spec.ports[].
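For example, with the following portion of a Service manifest (values are illustrative), the forwarding rule accepts TCP traffic on port 80, and node-level DNAT rewrites the destination to a serving Pod IP address and port 8080:

```yaml
spec:
  type: LoadBalancer
  ports:
  - name: http
    protocol: TCP       # protocol configured on the forwarding rule
    port: 80            # destination port clients use; matches the forwarding rule
    targetPort: 8080    # destination port after DNAT; the port serving Pods listen on
```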
Source Network Address Translation on nodes
The externalTrafficPolicy
determines whether the node-level packet processing
also performs source network address translation (SNAT) as well as the path the
packet takes from node to Pod:
externalTrafficPolicy | Node SNAT behavior | Routing behavior |
---|---|---|
Cluster | The node changes the source IP address of load-balanced packets to match the IP address of the node which received it from the load balancer. | The node routes packets to any serving Pod. The serving Pod might or might not be on the same node. If the node that receives the packets from the load balancer lacks a ready and serving Pod, the node routes the packets to a different node which does contain a ready and serving Pod. Response packets from the Pod are routed from its node back to the node which received the request packets from the load balancer. That first node then sends the response packets to the original client using Direct Server Return. |
Local | The node does not change the source IP address of load-balanced packets. | In most situations, the node routes the packet to a serving Pod running on the node which received the packet from the load balancer. That node sends response packets to the original client using Direct Server Return. This is the primary intent of this type of traffic policy. In some situations, a node receives packets from the load balancer even though the node lacks a ready, non-terminating serving Pod for the Service. This situation is encountered when the load balancer's health check has not yet reached its failure threshold, but a previously ready and serving Pod is no longer ready or is terminating (for example, when doing a rolling update). How the packets are processed in this situation depends on the GKE version and whether the cluster uses GKE Dataplane V2. |
Pricing and quotas
Network pricing applies to packets processed by a load balancer. For more information, see Cloud Load Balancing and forwarding rules pricing. You can also estimate billing charges using the Google Cloud pricing calculator.
The number of forwarding rules you can create is controlled by load balancer quotas:
- Internal passthrough Network Load Balancers use the per-project backend services quota, the per-project health checks quota, and the Internal passthrough Network Load Balancer forwarding rules per Virtual Private Cloud network quota.
- Backend service-based external passthrough Network Load Balancers use the per-project backend services quota, the per-project health checks quota, and the per-project external passthrough Network Load Balancer forwarding rules quota.
- Target pool-based external passthrough Network Load Balancers use the per-project target pools quota, the per-project health checks quota, and the per-project external passthrough Network Load Balancer forwarding rules quota.