Backend service-based external TCP/UDP Network Load Balancing overview

Google Cloud external TCP/UDP Network Load Balancing (referred to in this document as Network Load Balancing) is a regional, pass-through load balancer. A network load balancer distributes external traffic among virtual machine (VM) instances in the same region.

You can configure a network load balancer for TCP, UDP, ESP, and ICMP traffic. Support for ESP and ICMP is in Preview.

A network load balancer can receive traffic from:

  • Any client on the internet
  • Google Cloud VMs with external IPs
  • Google Cloud VMs that have internet access through Cloud NAT or instance-based NAT

Network Load Balancing has the following characteristics:

  • Network Load Balancing is a managed service.
  • Network Load Balancing is implemented by using Andromeda virtual networking and Google Maglev.
  • Network load balancers are not proxies.
    • Load-balanced packets are received by backend VMs with the packet's source and destination IP addresses, protocol, and, if the protocol is port-based, the source and destination ports unchanged.
    • Load-balanced connections are terminated by the backend VMs.
    • Responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return.

Backend service-based network load balancers have the following characteristics:

  • Managed instance group backends. Backend service-based network load balancers support using managed instance groups (MIGs) as backends. Managed instance groups automate certain aspects of backend management and provide better scalability and reliability as compared to unmanaged instance groups.
  • Fine-grained traffic distribution control. A backend service configuration contains a set of values, such as health checks, session affinity, connection tracking, connection draining, and failover policies. Most of these settings have default values that let you get started quickly.
  • Health checks. Backend service-based network load balancers use health checks that match the type of traffic (TCP, SSL, HTTP, HTTPS, or HTTP/2) that they are distributing.

Architecture

The following diagram illustrates the components of a network load balancer:

Diagram: Network Load Balancing with a regional backend service

The load balancer is made up of several configuration components. A single load balancer can have the following:

  • One or more regional external IP addresses
  • One or more regional external forwarding rules
  • One regional external backend service
  • One or more backend instance groups
  • One regional health check associated with the backend service

Additionally, you must create firewall rules that allow your load balancing traffic and health check probes to reach the backend VMs.

IP address

A network load balancer requires at least one forwarding rule. The forwarding rule references a single regional external IP address. Regional external IP addresses are accessible anywhere on the internet, but they come from a pool unique to each Google Cloud region.

When you create a forwarding rule, you can specify the name or IP address of an existing reserved regional external IP address. If you don't specify an IP address, the forwarding rule references an ephemeral regional external IP address. Use a reserved IP address if you need to keep the address associated with your project for reuse after you delete a forwarding rule or if you need multiple forwarding rules to reference the same IP address.

Network Load Balancing supports both Standard Tier and Premium Tier regional external IP addresses. Both the IP address and the forwarding rule must use the same network tier.

For steps to reserve an IP address, see External IP addresses.
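
For example, a reserved Premium Tier address in us-central1 might be created with a command like the following; the address name, region, and network tier are illustrative placeholders rather than requirements:

    gcloud compute addresses create nlb-ip \
        --region=us-central1 \
        --network-tier=PREMIUM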

Forwarding rule

A regional external forwarding rule specifies the protocol and ports on which the load balancer accepts traffic. Because network load balancers are not proxies, they pass traffic to backends using the same protocol and, if the packet carries port information, the same ports. The forwarding rule, in combination with the IP address, forms the frontend of the load balancer.

The load balancer preserves the source IP addresses of incoming packets. The destination IP address for incoming packets is the IP address associated with the load balancer's forwarding rule.

Incoming traffic is matched to a forwarding rule, which is a combination of a particular IP address, protocol, and, if the protocol is port-based, a single port, a range of ports, or all ports. The forwarding rule then directs traffic to the load balancer's backend service.

A network load balancer requires at least one forwarding rule. You can define multiple forwarding rules for the same load balancer as described in Multiple forwarding rules.
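
As a sketch, a forwarding rule that accepts TCP traffic on port 80 and directs it to a regional backend service might look like the following. The resource names, region, and port are placeholders, and the reserved address and backend service are assumed to already exist:

    gcloud compute forwarding-rules create nlb-forwarding-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=TCP \
        --ports=80 \
        --address=nlb-ip \
        --backend-service=nlb-backend-service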

Forwarding rule protocols

Network Load Balancing supports the following protocol options for each forwarding rule: TCP, UDP, and L3_DEFAULT (Preview).

Use the TCP and UDP options to configure TCP or UDP load balancing. The L3_DEFAULT protocol option enables a network load balancer to load-balance TCP, UDP, ESP, and ICMP traffic.

In addition to supporting protocols other than TCP and UDP, L3_DEFAULT makes it possible for a single forwarding rule to serve multiple protocols. For example, IPSec services typically handle some combination of ESP and UDP-based IKE and NAT-T traffic. The L3_DEFAULT option allows a single forwarding rule to be configured to process all of those protocols.

Forwarding rules that use the TCP or UDP protocol can reference a backend service whose protocol is either the same as the forwarding rule's protocol or UNSPECIFIED (Preview). L3_DEFAULT forwarding rules can only reference a backend service with the protocol UNSPECIFIED.

The following table summarizes how to use these settings for different protocols.

Traffic to be load-balanced    Forwarding rule protocol    Backend service protocol
TCP                            TCP                         TCP or UNSPECIFIED
TCP                            L3_DEFAULT                  UNSPECIFIED
UDP                            UDP                         UDP or UNSPECIFIED
UDP                            L3_DEFAULT                  UNSPECIFIED
ESP                            L3_DEFAULT                  UNSPECIFIED
ICMP (Echo Request only)       L3_DEFAULT                  UNSPECIFIED
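
For example, a single L3_DEFAULT forwarding rule paired with an UNSPECIFIED backend service could front an IPsec gateway that receives ESP plus UDP-based IKE and NAT-T traffic. This is a sketch only; the resource names are placeholders and an existing regional health check named nlb-health-check is assumed:

    gcloud compute backend-services create ipsec-backend-service \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --protocol=UNSPECIFIED \
        --health-checks=nlb-health-check \
        --health-checks-region=us-central1

    gcloud compute forwarding-rules create ipsec-forwarding-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=L3_DEFAULT \
        --ports=ALL \
        --address=nlb-ip \
        --backend-service=ipsec-backend-service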

Multiple forwarding rules

You can configure multiple regional external forwarding rules for the same network load balancer. Each forwarding rule can have its own regional external IP address, or multiple forwarding rules can share the same regional external IP address.

Configuring multiple regional external forwarding rules can be useful for these use cases:

  • You need to configure more than one external IP address for the same backend service.
  • You need to configure different protocols and non-overlapping ports or port ranges for the same external IP address.

For a given IP address, an L3_DEFAULT forwarding rule can co-exist with forwarding rules with other protocols (TCP or UDP), but not with another L3_DEFAULT forwarding rule.

A packet arriving at the load balancer's IP address matches an L3_DEFAULT forwarding rule only if a more specific forwarding rule is not available (for example, for TCP or UDP traffic). More specifically, a packet arriving at a given IP address, protocol, and port matches an L3_DEFAULT forwarding rule if and only if no other forwarding rule for that IP address matches the packet's protocol and destination port.

When using multiple forwarding rules, make sure that the software running on your backend VMs binds to all of the external IP addresses of the load balancer's forwarding rules.
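
For example, a service that answers on both TCP port 53 and UDP port 53 from a single IP address could use two forwarding rules that share one reserved address. In this sketch, both rules reference the same backend service, which is therefore assumed to use the UNSPECIFIED protocol; all names are placeholders:

    gcloud compute forwarding-rules create dns-tcp-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=TCP \
        --ports=53 \
        --address=nlb-ip \
        --backend-service=dns-backend-service

    gcloud compute forwarding-rules create dns-udp-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=UDP \
        --ports=53 \
        --address=nlb-ip \
        --backend-service=dns-backend-service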

Regional backend service

Each network load balancer has one regional backend service that defines the behavior of the load balancer and how traffic is distributed to its backends. The name of the backend service is the name of the network load balancer shown in the Google Cloud Console.

Each backend service defines the following backend parameters:

  • Protocol. A backend service accepts traffic on the IP address and ports (if configured) specified by one or more regional external forwarding rules. The backend service passes packets to backend VMs while preserving the packet's source and destination IP addresses, protocol, and, if the protocol is port-based, the source and destination ports.

    Backend services used with network load balancers support the following protocol options: TCP, UDP, or UNSPECIFIED (Preview).

    Backend services with the UNSPECIFIED protocol can be used with any forwarding rule regardless of the forwarding rule protocol. Backend services with a specific protocol (TCP or UDP) can only be referenced by forwarding rules with the same protocol (TCP or UDP). Forwarding rules with the L3_DEFAULT protocol can only refer to backend services with the UNSPECIFIED protocol.

    See Forwarding rule protocol specification for a table with possible forwarding rule and backend service protocol combinations.

  • Traffic distribution. A backend service allows traffic to be distributed according to configurable session affinity and connection tracking policies. The backend service can also be configured to enable connection draining and to designate failover backends for the load balancer.

  • Health check. A backend service must have an associated regional health check.

Each backend service operates in a single region and distributes traffic to the first network interface (nic0) of backend VMs. Backends must be instance groups in the same region as the backend service (and forwarding rule). The backends can be zonal unmanaged instance groups, zonal managed instance groups, or regional managed instance groups.

Backend service-based network load balancers support instance groups whose member instances use any VPC network in the same region, as long as the VPC network is in the same project as the backend service. (All VMs within a given instance group must use the same VPC network.)
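
Putting these parameters together, a regional backend service and its backend might be created roughly as follows. The health check, backend service, and instance group names are placeholders, and the regional managed instance group is assumed to already exist:

    gcloud compute health-checks create tcp nlb-health-check \
        --region=us-central1 \
        --port=80

    gcloud compute backend-services create nlb-backend-service \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --protocol=TCP \
        --health-checks=nlb-health-check \
        --health-checks-region=us-central1

    gcloud compute backend-services add-backend nlb-backend-service \
        --region=us-central1 \
        --instance-group=nlb-mig \
        --instance-group-region=us-central1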

Backend instance groups

A network load balancer distributes connections among backend VMs contained within managed or unmanaged instance groups. Instance groups can be regional or zonal in scope.

  • Regional managed instance groups. Use regional managed instance groups if you can deploy your software by using instance templates. Regional managed instance groups automatically distribute traffic among multiple zones, providing the best option to avoid potential issues in any given zone.

    An example deployment using a regional managed instance group is shown here. The instance group has an instance template that defines how instances should be provisioned, and the group deploys instances across three zones of the us-central1 region.

    Diagram: Network Load Balancing with a regional managed instance group
  • Zonal managed or unmanaged instance groups. Use zonal instance groups in different zones (in the same region) to protect against potential issues in any given zone.

    An example deployment using zonal instance groups is shown here. This load balancer provides availability across two zones.

    Diagram: Network Load Balancing with zonal instance groups

Health checks

Network Load Balancing uses regional health checks to determine which instances can receive new connections. Each network load balancer's backend service must be associated with a regional health check. Load balancers use health check status to determine how to route new connections to backend instances.

For more details about how Google Cloud health checks work, see How health checks work.

Network Load Balancing supports the following types of regional health checks: TCP, SSL, HTTP, HTTPS, and HTTP/2.

Health checks for other protocol traffic

Google Cloud does not offer any protocol-specific health checks beyond the ones listed here. When you use Network Load Balancing to load-balance a protocol other than TCP, you must still run a TCP-based service on your backend VMs to provide the required health check information.

For example, if you are load-balancing UDP traffic, client requests are load balanced by using the UDP protocol, and you must run a TCP service to provide information to Google Cloud health check probers. To achieve this, you can run a simple HTTP server on each backend VM that returns an HTTP 200 response to health check probers. You should use your own logic running on the backend VM to ensure that the HTTP server returns 200 only if the UDP service is properly configured and running.
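
One possible implementation of this pattern is sketched below. It assumes the UDP service listens on port 5000, that python3 (3.7 or later) and the ss utility are available on the backend VM, and that the load balancer uses a regional HTTP health check that probes /health on port 80; all names, ports, and paths are illustrative:

    # On each backend VM: serve /health with HTTP 200 only while the UDP
    # service has a listening socket; otherwise the path returns 404 and the
    # health check marks the VM unhealthy.
    sudo mkdir -p /var/www/health
    sudo python3 -m http.server 80 --directory /var/www/health &

    while sleep 5; do
      if ss -uln | grep -q ':5000 '; then
        sudo touch /var/www/health/health    # GET /health now returns 200
      else
        sudo rm -f /var/www/health/health    # GET /health now returns 404
      fi
    done

    # Matching regional HTTP health check (names are placeholders):
    gcloud compute health-checks create http udp-service-health-check \
        --region=us-central1 \
        --port=80 \
        --request-path=/health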

Firewall rules

Because Network Load Balancing is a pass-through load balancer, you control access to the load balancer's backends using Google Cloud firewall rules. You must create ingress allow firewall rules or an ingress allow hierarchical firewall policy to permit health checks and the traffic that you're load balancing.

Forwarding rules and ingress allow firewall rules or hierarchical firewall policies work together in the following way: a forwarding rule specifies the protocol and, if defined, port requirements that a packet must meet to be forwarded to a backend VM. Ingress allow firewall rules control whether the forwarded packets are delivered to the VM or dropped. All VPC networks have an implied deny ingress firewall rule that blocks incoming packets from any source. The Google Cloud default VPC network includes a limited set of pre-populated ingress allow firewall rules.

  • To accept traffic from any IP address on the internet, you must create an ingress allow firewall rule with the 0.0.0.0/0 source range. To only allow traffic from certain IP address ranges, use more restrictive source ranges.

  • As a security best practice, your ingress allow firewall rules should only permit the IP protocols and ports that you need. Restricting the protocol (and, if possible, port) configuration is especially important when using forwarding rules whose protocol is set to L3_DEFAULT. L3_DEFAULT forwarding rules forward packets for all supported IP protocols (on all ports if the protocol and packet have port information).

  • Network Load Balancing uses Google Cloud health checks. Therefore, you must always allow traffic from the health check IP address ranges. These ingress allow firewall rules can be made specific to the protocol and ports of the load balancer's health check.
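
For example, the following ingress allow rules admit load-balanced TCP traffic from any client and health check probes from the documented Google Cloud health check ranges (130.211.0.0/22 and 35.191.0.0/16). The network name, target tag, and port are placeholders:

    gcloud compute firewall-rules create allow-lb-traffic \
        --network=default \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=0.0.0.0/0 \
        --target-tags=nlb-backends

    gcloud compute firewall-rules create allow-health-checks \
        --network=default \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=nlb-backends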

Return path

Network Load Balancing uses special routes outside of your VPC network to direct incoming requests and health check probes to each backend VM.

The load balancer preserves the source IP addresses of packets. Responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return.

Shared VPC architecture

Except for the IP address, all of the components of a network load balancer must exist in the same project. The following table summarizes Shared VPC components for Network Load Balancing:

IP address           A regional external IP address must be defined in either the
                     same project as the load balancer or the Shared VPC host
                     project.

Forwarding rule      A regional external forwarding rule must be defined in the
                     same project as the instances in the backend service.

Backend components   The regional backend service must be defined in the same
                     project and the same region where the instances in the
                     backend instance group exist. Health checks associated with
                     the backend service must be defined in the same project and
                     the same region as the backend service.

Traffic distribution

The way that a network load balancer distributes new connections depends on whether you have configured failover:

  • If you haven't configured failover, a network load balancer distributes new connections to its healthy backend VMs if at least one backend VM is healthy. When all backend VMs are unhealthy, the load balancer distributes new connections among all backends as a last resort. In this situation, the load balancer routes each new connection to an unhealthy backend VM.
  • If you have configured failover, a network load balancer distributes new connections among healthy backend VMs in its active pool, according to a failover policy that you configure. When all backend VMs are unhealthy, you can choose from one of the following behaviors:
    • (Default) The load balancer distributes traffic to only the primary VMs. This is done as a last resort. The backup VMs are excluded from this last-resort distribution of connections.
    • The load balancer drops traffic.

For details about how connections are distributed, see the next section Backend selection and connection tracking.

For details about how failover works, see the Failover section.

Backend selection and connection tracking

Network Load Balancing uses configurable backend selection and connection tracking algorithms to determine how traffic is distributed to backend VMs.

Network Load Balancing uses the following algorithm to distribute packets among backend VMs (in its active pool, if you have configured failover):

  1. If the load balancer has an entry in its connection tracking table that matches the characteristics of an incoming packet, the packet is considered part of a previously established connection and is sent to the backend VM that the load balancer previously selected and recorded in that entry.
  2. If the load balancer receives a packet for which it has no connection tracking entry, the load balancer does the following:

    1. The load balancer selects a backend. The load balancer calculates a hash based on the configured session affinity. It uses this hash to select a backend from among the ones that are currently healthy (unless all backends are unhealthy, in which case all backends are considered as long as the failover policy hasn't been configured to drop traffic in this situation). The default session affinity, NONE, uses the following hash algorithms:

      • For TCP and unfragmented UDP packets, a 5-tuple hash of the packet's source IP address, source port, destination IP address, destination port, and the protocol
      • For fragmented UDP packets and all other protocols, a 3-tuple hash of the packet's source IP address, destination IP address, and the protocol

      Backend selection can be customized by using a hash algorithm that uses fewer pieces of information. For all the supported options, see session affinity options.

    2. The load balancer adds an entry to its connection tracking table. This entry records the selected backend for the packet's connection so that all future packets from this connection are sent to the same backend. Whether connection tracking is used depends on the protocol:

      • TCP packets. Connection tracking is always enabled, and cannot be turned off. By default, connection tracking is 5-tuple, but it can be configured to be less than 5-tuple. When it is 5-tuple, TCP SYN packets are treated differently. Unlike non-SYN packets, they discard any matching connection tracking entry and always select a new backend.

        The default 5-tuple connection tracking is used when:

        • tracking mode is PER_CONNECTION (all session affinities), or,
        • tracking mode is PER_SESSION and the session affinity is NONE, or,
        • tracking mode is PER_SESSION and the session affinity is CLIENT_IP_PORT_PROTO.
      • UDP and ESP packets. Connection tracking is enabled only if session affinity is set to something other than NONE.

      • ICMP packets. Connection tracking cannot be used.

      For additional details about when connection tracking is enabled, and what tracking method is used when connection tracking is enabled, see connection tracking mode.

      In addition, note the following:

      • An entry in the connection tracking table expires 60 seconds after the load balancer processes the last packet that matched the entry. This 60-second idle timeout value is not configurable.
      • Depending on the protocol, the load balancer might remove connection tracking table entries when backends become unhealthy. For details and how to customize this behavior, see Connection persistence on unhealthy backends.

Session affinity options

Session affinity controls the distribution of new connections from clients to the load balancer's backend VMs. Session affinity is specified for the entire regional external backend service, not on a per backend instance group basis.

Network Load Balancing supports the following session affinity options:

  • None (NONE). 5-tuple hash of source IP address, source port, protocol, destination IP address, and destination port
  • Client IP, Destination IP (CLIENT_IP). 2-tuple hash of source IP address and destination IP address
  • Client IP, Destination IP, Protocol (CLIENT_IP_PROTO). 3-tuple hash of source IP address, destination IP address, and protocol
  • Client IP, Client Port, Destination IP, Destination Port, Protocol (CLIENT_IP_PORT_PROTO). 5-tuple hash of source IP address, source port, protocol, destination IP address, and destination port

To learn how these session affinity options affect the backend selection and connection tracking methods, see the table in the Connection tracking mode section.
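
Session affinity is a property of the backend service, so switching it is a single update; for example (the backend service name and region are placeholders):

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --session-affinity=CLIENT_IP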

Connection tracking mode

Whether connection tracking is enabled depends only on the protocol of the load-balanced traffic and the session affinity settings. Tracking mode specifies the connection tracking algorithm to be used when connection tracking is enabled. There are two tracking modes: PER_CONNECTION (default) and PER_SESSION.

  • PER_CONNECTION (default). This is the default tracking mode. With this connection tracking mode, TCP traffic is always tracked per 5-tuple, regardless of the session affinity setting. For UDP and ESP traffic, connection tracking is enabled when the selected session affinity is not NONE. UDP and ESP packets are tracked using the tracking methods described in this table.

  • PER_SESSION. If session affinity is CLIENT_IP or CLIENT_IP_PROTO, configuring this mode results in 2-tuple and 3-tuple connection tracking, respectively, for all protocols (except ICMP which is not connection-trackable). For other session affinity settings, PER_SESSION mode behaves identically to PER_CONNECTION mode.

To learn how these tracking modes work with different session affinity settings for each protocol, see the following table.

Default: No session affinity (NONE)

  • Hash method for backend selection: TCP and unfragmented UDP: 5-tuple hash. Fragmented UDP and all other protocols: 3-tuple hash.
  • Connection tracking, PER_CONNECTION mode (default): TCP: 5-tuple connection tracking. All other protocols: connection tracking off.
  • Connection tracking, PER_SESSION mode: TCP: 5-tuple connection tracking. All other protocols: connection tracking off.

Client IP, Destination IP (CLIENT_IP)

  • Hash method for backend selection: All protocols: 2-tuple hash.
  • Connection tracking, PER_CONNECTION mode (default): TCP and unfragmented UDP: 5-tuple connection tracking. Fragmented UDP and ESP: 3-tuple connection tracking. All other protocols: connection tracking off.
  • Connection tracking, PER_SESSION mode: TCP, UDP, and ESP: 2-tuple connection tracking. All other protocols: connection tracking off.

Client IP, Destination IP, Protocol (CLIENT_IP_PROTO)

  • Hash method for backend selection: All protocols: 3-tuple hash.
  • Connection tracking, PER_CONNECTION mode (default): TCP and unfragmented UDP: 5-tuple connection tracking. Fragmented UDP and ESP: 3-tuple connection tracking. All other protocols: connection tracking off.
  • Connection tracking, PER_SESSION mode: TCP, UDP, and ESP: 3-tuple connection tracking. All other protocols: connection tracking off.

Client IP, Client Port, Destination IP, Destination Port, Protocol (CLIENT_IP_PORT_PROTO)

  • Hash method for backend selection: TCP and unfragmented UDP: 5-tuple hash. Fragmented UDP and all other protocols: 3-tuple hash.
  • Connection tracking, PER_CONNECTION mode (default): TCP and unfragmented UDP: 5-tuple connection tracking. Fragmented UDP and ESP: 3-tuple connection tracking. All other protocols: connection tracking off.
  • Connection tracking, PER_SESSION mode: TCP and unfragmented UDP: 5-tuple connection tracking. Fragmented UDP and ESP: 3-tuple connection tracking. All other protocols: connection tracking off.

To learn how to change the connection tracking mode, see Configure a connection tracking policy.
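
As a sketch, the tracking mode is set as part of the backend service's connection tracking policy, alongside the connection persistence setting described in the next section. This assumes your gcloud version exposes these connection tracking policy flags; the names and values are illustrative:

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --session-affinity=CLIENT_IP \
        --tracking-mode=PER_SESSION \
        --connection-persistence-on-unhealthy-backends=NEVER_PERSIST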

Connection persistence on unhealthy backends

The connection persistence settings control whether an existing connection persists on a selected backend after that backend becomes unhealthy (as long as the backend remains in the load balancer's configured backend instance group).

The behavior described in this section does not apply to cases where you remove a backend VM from its instance group, or remove the instance group from the backend service. In such cases, established connections only persist as described in connection draining.

The following connection persistence options are available:

  • DEFAULT_FOR_PROTOCOL (default)
  • NEVER_PERSIST
  • ALWAYS_PERSIST

The following table summarizes connection persistence options and how connections persist for different protocols, session affinity options, and tracking modes.

DEFAULT_FOR_PROTOCOL (default)

  • With PER_CONNECTION tracking mode: TCP: connections persist on unhealthy backends (all session affinities). All other protocols: connections never persist on unhealthy backends.
  • With PER_SESSION tracking mode: TCP: connections persist on unhealthy backends if session affinity is NONE or CLIENT_IP_PORT_PROTO. All other protocols: connections never persist on unhealthy backends.

NEVER_PERSIST

  • With either tracking mode: All protocols: connections never persist on unhealthy backends.

ALWAYS_PERSIST

  • With PER_CONNECTION tracking mode: TCP: connections persist on unhealthy backends (all session affinities). ESP and UDP: connections persist on unhealthy backends if session affinity is not NONE. ICMP: not applicable, because ICMP is not connection-trackable. This option should only be used for advanced use cases.
  • With PER_SESSION tracking mode: configuration not possible.

TCP connection persistence behavior on unhealthy backends

Whenever a TCP connection with 5-tuple tracking persists on an unhealthy backend:

  • If the unhealthy backend continues to respond to packets, the connection continues until it is reset or closed (by either the unhealthy backend or the client).
  • If the unhealthy backend sends a TCP reset (RST) packet or does not respond to packets, then the client might retry with a new connection, letting the load balancer select a different, healthy backend. TCP SYN packets always select a new, healthy backend.

To learn how to change connection persistence behavior, see Configure a connection tracking policy.

Connection draining

Connection draining is a process applied to established connections when:

  • a backend VM is removed from an instance group,
  • a managed instance group removes a backend VM (through replacement, abandonment, rolling upgrades, or scaling down), or
  • an instance group is removed from a backend service.

By default, connection draining is disabled. When disabled, established connections are terminated as quickly as possible. When connection draining is enabled, established connections are allowed to persist for a configurable timeout, after which the backend VM instance is terminated.

For more details about how connection draining is triggered and how to enable connection draining, see Enabling connection draining.
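
Connection draining is enabled by setting a draining timeout on the backend service; for example (the 300-second value and resource names are illustrative):

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --connection-draining-timeout=300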

UDP fragmentation

Network Load Balancing processes both fragmented and unfragmented UDP packets. Unfragmented packets are handled normally in all configurations. If your application uses fragmented UDP packets, keep the following in mind:

  • UDP packets may become fragmented before reaching a Google Cloud VPC network.
  • Google Cloud VPC networks forward UDP fragments as they arrive (without waiting for all fragments to arrive).
  • Non-Google Cloud networks and on-premises network equipment might forward UDP fragments as they arrive, delay fragmented UDP packets until all fragments have arrived, or discard fragmented UDP packets. Refer to the other network provider or network equipment documentation for details.

If you expect fragmented UDP packets, do the following (a combined configuration sketch appears after this list):

  • Use only one UDP forwarding rule per load-balanced IP address, and configure the forwarding rule to accept traffic on all ports. This ensures that all fragments arrive at the same forwarding rule even if they don't have the same destination port. To configure all ports, either set --ports=ALL by using the gcloud command-line tool, or set allPorts to True by using the API.

  • Use one of the following approaches to configure the backend service:

    • Disable session affinity and connection tracking. Set session affinity to NONE. The load balancer uses a 5-tuple hash to select a backend for unfragmented packets (which have port information), and a 3-tuple hash for fragmented packets (which lack port information). In this setup, fragmented and unfragmented UDP packets from the same client might be forwarded to different backends.
    • Enable 2- or 3-tuple session affinity and connection tracking. Set session affinity to CLIENT_IP or CLIENT_IP_PROTO and connection tracking mode to PER_SESSION. In this setup, fragmented and unfragmented UDP packets from the same client are forwarded to the same backend (without using any port information).
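
Combining these recommendations, a configuration for fragmented UDP traffic that uses the second approach might look roughly like the following. The resource names are placeholders, and the --tracking-mode flag is assumed to be available in your gcloud version:

    gcloud compute backend-services update udp-backend-service \
        --region=us-central1 \
        --session-affinity=CLIENT_IP \
        --tracking-mode=PER_SESSION

    gcloud compute forwarding-rules create udp-all-ports-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=UDP \
        --ports=ALL \
        --address=nlb-ip \
        --backend-service=udp-backend-service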

Using target instances as backends

If you're using target instances as backends for the network load balancer and you expect fragmented UDP packets, use only one UDP forwarding rule per load-balanced IP address, and configure the forwarding rule to accept traffic on all ports. This ensures that all fragments arrive at the same forwarding rule even if they don't have the same destination port. To configure all ports, either set --ports=ALL using gcloud, or set allPorts to True using the API.

Failover

You can configure a network load balancer to distribute connections among virtual machine (VM) instances in primary backend instance groups, and then switch, if needed, to using failover backend instance groups. Failover provides yet another method of increasing availability, while also giving you greater control over how to manage your workload when your primary backend VMs aren't healthy.

By default, when you add a backend to a network load balancer's backend service, that backend is a primary backend. You can designate a backend to be a failover backend when you add it to the load balancer's backend service, or by editing the backend service later.
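
As a sketch, you mark an instance group as a failover backend with the --failover flag when adding it, and you tune the failover behavior with failover policy flags on the backend service. These flags are assumed to be available in your gcloud version, and all names and values are illustrative:

    gcloud compute backend-services add-backend nlb-backend-service \
        --region=us-central1 \
        --instance-group=nlb-failover-mig \
        --instance-group-zone=us-central1-b \
        --failover

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --failover-ratio=0.5 \
        --drop-traffic-if-unhealthy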

For more details about how failover works, see Failover overview for Network Load Balancing.

Limitations

  • Network endpoint groups (NEGs) are not supported as backends for network load balancers.
  • Backend service-based network load balancers are not supported with Google Kubernetes Engine.
  • You cannot use the Google Cloud Console to do the following tasks:

    • Create or modify a network load balancer whose forwarding rule uses the L3_DEFAULT protocol.
    • Create or modify a network load balancer whose backend service protocol is set to UNSPECIFIED.
    • Configure a connection tracking policy for a backend service.

    Use either the gcloud command-line tool or the REST API instead.

What's next