Traffic distribution for internal passthrough Network Load Balancers

This page explains concepts about how internal passthrough Network Load Balancers distribute traffic.

Backend selection and connection tracking

Backend selection and connection tracking work together to balance multiple connections across different backends and to route all packets for each connection to the same backend. This is accomplished with a two-part strategy. First, a backend is selected using consistent hashing. Then, this selection is recorded in a connection tracking table.

The following steps capture the backend selection and connection tracking process.

1. Check for a connection tracking table entry to use a previously selected backend

For an existing connection, the load balancer uses the connection tracking table to identify the previously selected backend for that connection.

The load balancer attempts to match each load-balanced packet with an entry in its connection tracking table using the following process:

If the packet is a TCP packet with the SYN flag:
- If the load balancer's connection tracking mode is PER_CONNECTION, continue to the Identify eligible backends step. In PER_CONNECTION tracking mode, a TCP packet with the SYN flag always represents a new connection, regardless of the configured session affinity.
- If the load balancer's connection tracking mode is PER_SESSION and the session affinity is either NONE or CLIENT_IP_PORT_PROTO, continue to the Identify eligible backends step. In PER_SESSION tracking mode, a TCP packet with the SYN flag represents a new connection only when using one of the 5-tuple session affinity options (NONE or CLIENT_IP_PORT_PROTO).
For all other packets, the load balancer checks if the packet matches an existing connection tracking table entry. The connection tuple (a set of packet characteristics) used to compare the packet to existing connection tracking table entries depends on the connection tracking mode and session affinity you configured. For information about which connection tuple is used for connection tracking, see the table in the Connection tracking mode section.
- If the packet matches a connection tracking table entry, the load balancer sends the packet to the previously selected backend.
- If the packet doesn't match a connection tracking table entry, continue to the Identify eligible backends step.
For information about how long a connection tracking table entry persists and under what conditions it persists, see the Create a connection tracking table entry step.

2. Select an eligible backend for a new connection

For a new connection, the load balancer uses the consistent hashing algorithm to select a backend from among the eligible backends.

The following steps outline the process to select an eligible backend for a new connection and then record that connection in a connection tracking table.

2.1 Identify eligible backends

This step models which backends are candidates to receive new connections, taking into consideration health and failover policy configuration:

No failover policy

The set of eligible backends depends only on health checks:

When at least one backend is healthy, eligible backends are all healthy backends.
When all backends are unhealthy, eligible backends are all backends.

Failover policy configured

The set of eligible backends depends on health checks and failover policy configuration:

When at least one backend is healthy, eligible backends are defined as follows, using the first condition that is true from this ordered list:
- If there are no healthy primary backends, eligible backends are all healthy failover backends.
- If there are no healthy failover backends, eligible backends are all healthy primary backends.
- If the failover ratio is set to 0.0 (the default value), eligible backends are all healthy primary backends.
- If the ratio of the number of healthy primary backends compared to the total number of primary backends is greater than or equal to the configured failover ratio, eligible backends are all healthy primary backends.
- The eligible backends are all healthy failover backends.
When there are no healthy backends, eligible backends are defined as follows:
- If the load balancer's failover policy is configured to drop new connections when no backends are healthy, the set of eligible backends is empty. The load balancer drops the packets for new connections.
- If the load balancer's failover policy is not configured to drop new connections when no backends are healthy, health checks aren't relevant. Eligible backends are all primary backends.

2.2 Adjust eligible backends for zonal affinity

This step is skipped if any of the following is true:

Zonal affinity is disabled
The client is incompatible with zonal affinity
A zonal match with the zone of a compatible client doesn't happen

If zonal affinity is enabled, a client is compatible with zonal affinity, and a zonal match happens, new connections from the client are routed to an adjusted set of eligible backends. For more information, see the following:

2.3 Select an eligible backend

The load balancer uses consistent hashing to select an eligible backend. The load balancer maintains hashes of eligible backends, mapped to a unit circle. When processing a packet for a connection that's not in the connection tracking table, the load balancer computes a hash of the packet characteristics and maps that hash to the same unit circle, selecting an eligible backend on the circle's circumference. The set of packet characteristics used to calculate the packet hash is defined by the session affinity setting.

If a session affinity isn't explicitly configured, the NONE session affinity is the default.
The load balancer assigns new connections to eligible backends in a way that is as consistent as possible even if the number of eligible backends changes. The following benefits of consistent hashing show how the load balancer selects eligible backends for possible new connections that do not have connection tracking table entries:
- The load balancer selects the same backend for all possible new connections that have identical packet characteristics, as defined by session affinity, if the set of eligible backends does not change.
- When a new eligible backend is added, approximately 1/N possible new connections map to the newly added backend. In this situation, N is the count of eligible backends after the new backend is added.
- When an eligible backend is removed, approximately 1/N possible new connections map to one of the N-1 remaining backends. In this situation, N is the count of backends before the eligible backend is removed.

2.4 Create a connection tracking table entry

After selecting a backend, the load balancer creates a connection tracking table entry. The connection tracking table entry maps packet characteristics to the selected backend. The packet header fields used for this mapping depend on the connection tracking mode and session affinity you configured.

The load balancer removes connection tracking table entries according to the following rules:

A connection tracking table entry is removed after the connection has been idle. Unless you configure a custom idle timeout, the load balancer uses a default idle timeout of 600 seconds. For more information, see Idle timeout.
Connection tracking table entries are not removed when a TCP connection is closed with a FIN or RST packet. Any new TCP connection always carries the SYN flag and is processed as described in the Check for a connection tracking table entry step.
If a failover policy is configured and the connection draining on failover and failback setting is disabled, the load balancer removes all entries in the connection tracking table when the eligible backends switch from primary to failover backends (failover), or switch from failover to primary backends (failback). For more information, see Connection draining on failover and failback.
Entries in the connection tracking table can be removed if a backend becomes unhealthy. This behavior depends on the connection tracking mode, the protocol, and the connection persistence on unhealthy backends setting. For more information, see Connection persistence on unhealthy backends.
Entries in the connection tracking table are removed after the connection draining timeout that occurs following an event like deleting a backend VM or removing a backend VM from an instance group or NEG. For more information, see Enable connection draining.

Session affinity

Session affinity controls the distribution of new connections from clients to the load balancer's backends. Internal passthrough Network Load Balancers use session affinity to select a backend from a set of eligible backends as described in the Identify eligible backends and Select an eligible backend steps in the Backend selection and connection tracking section. You configure session affinity on the backend service, not on each backend instance group or NEG.

Internal passthrough Network Load Balancers support the following session affinity settings. Each session affinity setting uses consistent hashing to select an eligible backend. The session affinity setting determines which fields from the IP header and Layer 4 headers are used to calculate the hash.

Hash method for backend selection	Session affinity setting
5-tuple hash (consists of source IP address, source port, protocol, destination IP address, and destination port) for non-fragmented packets that include port information such as TCP packets and non-fragmented UDP packets OR 3-tuple hash (consists of source IP address, destination IP address, and protocol) for fragmented UDP packets and packets of all other protocols	`NONE`¹
5-tuple hash (consists of source IP address, source port, protocol, destination IP address, and destination port) for non-fragmented packets that include port information such as TCP packets and non-fragmented UDP packets OR 3-tuple hash (consists of source IP address, destination IP address, and protocol) for fragmented UDP packets and packets of all other protocols	`CLIENT_IP_PORT_PROTO`
3-tuple hash (consists of source IP address, destination IP address, and protocol)	`CLIENT_IP_PROTO`
2-tuple hash (consists of source IP address and destination IP address)	`CLIENT_IP`
1-tuple hash (consists of source IP only)	`CLIENT_IP_NO_DESTINATION`²

¹ A session affinity setting of NONE does not mean that there is no session affinity. It means that no session affinity option is explicitly configured.

Hashing is always performed to select a backend. And a session affinity setting of NONE means that the load balancer uses a 5-tuple hash or a 3-tuple hash to select backends—functionally the same behavior as when CLIENT_IP_PORT_PROTO is set.

² CLIENT_IP_NO_DESTINATION is a one-tuple hash based on just the source IP address of each received packet. This setting can be useful in situations where you need the same backend VM to process all packets from a client, based solely on the source IP address of the packet, without respect to the packet destination IP address. These situations usually arise when an internal passthrough Network Load Balancer is a next hop for a static route. For details, see Session affinity and next hop internal passthrough Network Load Balancer.

To learn how the different session affinity settings affect the backend selection and connection tracking methods, see the table in the Connection tracking mode section.

Session affinity and next hop internal passthrough Network Load Balancer

When an internal passthrough Network Load Balancer load balancer is a next hop of a static route, the destination IP address is not limited to the forwarding rule IP address of the load balancer. Instead, the destination IP address of the packet can be any IP address that fits within the destination range of the static route.

Selecting an eligible backend depends on calculating a hash of packet characteristics. Except for the CLIENT_IP_NO_DESTINATION session affinity (1-tuple hash), the hash depends, in part, on the packet destination IP address.

The load balancer selects the same backend for all possible new connections that have identical packet characteristics, as defined by session affinity, if the set of eligible backends does not change. If you need the same backend VM to process all packets from a client, based solely on the source IP address, regardless of destination IP addresses, use the CLIENT_IP_NO_DESTINATION session affinity.

Connection tracking policy

This section describes the settings that control the connection tracking behavior of internal passthrough Network Load Balancers. A connection tracking policy includes the following settings:

Connection tracking mode
Connection persistence on unhealthy backends
Idle timeout

Connection tracking mode

The load balancer's connection tracking table maps connection tuples to previously selected backends in a hash table. The set of packet characteristics that compose each connection tuple is determined by the connection tracking mode and session affinity.

Internal passthrough Network Load Balancers track connections for all protocols that they support.

The connection tracking mode refers to the granularity of the each connection tuple in the load balancer's connection tracking table. The connection tuple can be 5-tuple or 3-tuple (PER_CONNECTION mode), or it can match the session affinity setting (PER_SESSION mode).

PER_CONNECTION. This is the default connection tracking mode. This connection tracking mode uses a 5-tuple hash or a 3-tuple hash. Non-fragmented packets that include port information such as TCP packets and non-fragmented UDP packets are tracked with 5-tuple hashes. All other packets are tracked with 3-tuple hashes.
PER_SESSION. This connection tracking mode uses a hash consisting of the same packet characteristics used by the session affinity hash. Depending on the chosen session affinity, PER_SESSION can result in connections that more frequently match an existing connection tracking table entry, reducing the frequency that a backend needs to be selected by the session affinity hash.

The following table summarizes how connection tracking mode and session affinity work together to route all packets for each connection to the same backend.

Backend selection using session affinity		Connection tracking mode
Session affinity setting	Hash method for backend selection	`PER_CONNECTION` (default)	`PER_SESSION`
`NONE`	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash
`CLIENT_IP_NO_DESTINATION`	All protocols: 1-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	All protocols: 1-tuple hash
`CLIENT_IP`	All protocols: 2-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	All protocols: 2-tuple hash
`CLIENT_IP_PROTO`	All protocols: 3-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	All protocols: 3-tuple hash
`CLIENT_IP_PORT_PROTO`	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash	TCP and unfragmented UDP: 5-tuple hash Fragmented UDP and all other protocols: 3-tuple hash

To learn how to change the connection tracking mode, see Configure a connection tracking policy.

Connection persistence on unhealthy backends

The connection persistence settings control whether an existing connection persists on a selected backend VM or endpoint after that backend becomes unhealthy, as long as the backend remains in the load balancer's configured backend group (in an instance group or a NEG).

The following connection persistence options are available:

DEFAULT_FOR_PROTOCOL (default)
NEVER_PERSIST
ALWAYS_PERSIST

The following table summarizes connection persistence options and how connections persist for different protocols, session affinity options, and tracking modes.

Connection persistence on unhealthy backends option	Connection tracking mode
Connection persistence on unhealthy backends option	`PER_CONNECTION`	`PER_SESSION`
`DEFAULT_FOR_PROTOCOL`	TCP: connections persist on unhealthy backends (all session affinities) UDP: connections never persist on unhealthy backends	TCP: connections persist on unhealthy backends if session affinity is `NONE` or `CLIENT_IP_PORT_PROTO` UDP: connections never persist on unhealthy backends
`NEVER_PERSIST`	TCP, UDP: connections never persist on unhealthy backends
`ALWAYS_PERSIST`	TCP, UDP: connections persist on unhealthy backends (all session affinities) This option should only be used for advanced use cases.	Configuration not possible

To learn how to change connection persistence behavior, see Configure a connection tracking policy.

Note: When a backend becomes unhealthy, the load balancer removes existing connection tracking table entries according to the packet protocol and the selected connection persistence on unhealthy backends option. Subsequent packets for the connections whose entries were removed are treated as new connections. The load balancer creates a replacement connection tracking table entry—after selecting an eligible backend— for each connection. Replacement connection tracking table entries remain valid according to the same conditions as any other connection tracking table entry. If the unhealthy backend returns to being healthy, that event alone does not cause the replacement connection tracking table entries to be removed, unless the event triggers a failback and connection draining on failover and failback is disabled.

Idle timeout

A connection tracking table entry is removed after the connection has been idle for a certain period of time. Unless you configure a custom idle timeout, the load balancer uses a default idle timeout value of 600 seconds (10 minutes).

The following table shows the minimum and maximum configurable idle timeout values for different combinations of session affinity and connection tracking mode settings.

Session affinity	Connection tracking mode	Default idle timeout	Minimum configurable idle timeout	Maximum configurable idle timeout
Any connection tuple	`PER_CONNECTION`	600 seconds	60 seconds	600 seconds
1-tuple (`CLIENT_IP_NO_DESTINATION`) 2-tuple (`CLIENT_IP`) 3-tuple (`CLIENT_IP_PROTO`)	`PER_SESSION`	600 seconds	60 seconds	57,600 seconds
5-tuple (`NONE`, `CLIENT_IP_PORT_PROTO`)	`PER_SESSION`	600 seconds	60 seconds	600 seconds

To learn how to change the idle timeout value, see Configure a connection tracking policy.

Connections for single-client deployments

When testing connections to the IP address of an internal passthrough Network Load Balancer that only has one client, you should keep the following in mind:

If the client VM is not a VM being load balanced—that is, it is not also a backend VM, new connections are delivered to the load balancer's healthy backend VMs. However, because all session affinity options rely on at least the client system's IP address, connections from the same client might be distributed to the same backend VM more frequently than you might expect.

Practically, this means that you cannot accurately monitor traffic distribution through an internal passthrough Network Load Balancer by connecting to it from a single client. The number of clients needed to monitor traffic distribution varies depending on the load balancer type, the type of traffic, and the number of healthy backends.
If the client VM is also a backend VM of the load balancer, connections sent to the IP address of the load balancer's forwarding rule are always answered by the same backend VM (which is also the client VM). This happens regardless of whether the backend VM is healthy. It happens for all traffic sent to the load balancer's IP address, not just traffic on the protocol and ports specified in the load balancer's internal forwarding rule.

For more information, see sending requests from load-balanced VMs.

Connection draining

Connection draining provides a configurable amount of additional time for established connections to persist in the load balancer's connection tracking table when one of the following actions takes place:

A virtual machine (VM) instance is removed from a backend instance group (this includes abandoning an instance in a backend managed instance group)
A VM is stopped or deleted (this includes automatic actions like rolling updates or scaling down a backend managed instance group)
An endpoint is removed from a backend network endpoint group (NEG)

By default, connection draining for the aforementioned actions is disabled. For information about how connection draining is triggered and how to enable connection draining, see Enabling connection draining.

UDP fragmentation

Internal passthrough Network Load Balancers can process both fragmented and unfragmented UDP packets. If your application uses fragmented UDP packets, keep the following in mind:

UDP packets might become fragmented before reaching a Google Cloud VPC network.
Google Cloud VPC networks forward UDP fragments as they arrive (without waiting for all fragments to arrive).
Non-Google Cloud networks and on-premises network equipment might forward UDP fragments as they arrive, delay fragmented UDP packets until all fragments have arrived, or discard fragmented UDP packets. For details, see the documentation for the network provider or network equipment.

If you expect fragmented UDP packets and need to route them to the same backends, use the following forwarding rule and backend service configuration parameters:

Forwarding rule configuration: Use only one UDP forwarding rule per load-balanced IP address, and configure the forwarding rule to accept traffic on all ports. This ensures that all fragments arrive at the same forwarding rule. Even though the fragmented packets (other than the first fragment) lack a destination port, configuring the forwarding rule to process traffic for all ports also configures it to receive UDP fragments that have no port information. To configure all ports, either use the Google Cloud CLI to set --ports=ALL or use the API to set allPorts to True.
Backend service configuration: Set the backend service's session affinity to CLIENT_IP (2-tuple hash) or CLIENT_IP_PROTO (3-tuple hash) so that the same backend is selected for UDP packets that include port information and UDP fragments (other than the first fragment) that lack port information. Set the backend service's connection tracking mode to PER_SESSION so that the connection tracking table entries are built by using the same 2-tuple or 3-tuple hashes.

Failover

An internal passthrough Network Load Balancer lets you designate some backends as failover backends. These backends are only used when the number of healthy VMs in the primary backend instance groups has fallen below a configurable threshold. By default, if all primary and failover VMs are unhealthy, as a last resort Google Cloud distributes new connections only among all the primary VMs.

When you add a backend to an internal passthrough Network Load Balancer's backend service, by default that backend is a primary backend. You can designate a backend to be a failover backend when you add it to the load balancer's backend service, or by editing the backend service later.

For more information about how failover is used for backend selection and connection tracking, see the Identify eligible backends and Create a connection tracking table entry steps in the Backend selection and connection tracking section.

For more information about how failover works, see Failover for internal passthrough Network Load Balancers.

What's next

To configure and test an internal passthrough Network Load Balancer that uses failover, see Configure failover for internal passthrough Network Load Balancers.
To configure and test an internal passthrough Network Load Balancer, see Set up an internal passthrough Network Load Balancer.