Backend service-based external TCP/UDP Network Load Balancing overview

Google Cloud external TCP/UDP Network Load Balancing (referred to in this overview as Network Load Balancing) is a regional, pass-through load balancer. A network load balancer distributes TCP or UDP traffic among virtual machine (VM) instances in the same region.

A network load balancer can receive traffic from:

  • Any client on the internet
  • Google Cloud VMs with external IPs
  • Google Cloud VMs that have internet access through Cloud NAT or instance-based NAT

Network Load Balancing has the following characteristics:

  • Network Load Balancing is a managed service.
  • Network Load Balancing is implemented by using Andromeda virtual networking and Google Maglev.
  • The network load balancers are not proxies.
    • Load-balanced packets are received by backend VMs with their source IP unchanged.
    • Load-balanced connections are terminated by the backend VMs.
    • Responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return.

This page describes how Network Load Balancing works with a backend service instead of a target pool. Previously, the only option for a network load balancer was the target pool-based network load balancer. Compared to target pools, backend services give you more fine-grained control over how your load balancer behaves.

  • A backend service configuration contains a set of values, such as health checks, session affinity, connection draining timeouts, and failover policies. Most of these settings have default values that allow for easy configuration to get you started quickly.
  • Backend service-based network load balancers use health checks that match the type of traffic (TCP, SSL, HTTP, HTTPS, or HTTP/2) they are distributing. Target pool-based load balancers only support legacy HTTP health checks.
  • Backend service-based network load balancers support using managed instance groups as backends. Managed instance groups automate certain aspects of backend management and provide better scalability and reliability as compared to unmanaged instance groups. Target pool-based network load balancers support only unmanaged instance groups.

Architecture

The following diagram illustrates the components of a network load balancer:

Figure: External TCP/UDP network load balancer with a regional backend service

The load balancer is made up of several configuration components. A single load balancer can have the following:

  • One or more regional external IP addresses
  • One or more regional external forwarding rules
  • One regional external backend service
  • One or more backend instance groups
  • Health check associated with the backend service

Additionally, you must create firewall rules to allow health check probes to the backend VMs.

IP address

A network load balancer requires at least one forwarding rule. The forwarding rule references a single regional external IP address. Regional external IP addresses are accessible anywhere on the internet, but they come from a pool unique to each Google Cloud region.

When you create a forwarding rule, you can specify the name or IP address of an existing reserved regional external IP address. If you don't specify an IP address, the forwarding rule references an ephemeral regional external IP address. Use a reserved IP address if you need to keep the address associated with your project for reuse after you delete a forwarding rule or if you need multiple forwarding rules to reference the same IP address.

Network Load Balancing supports both Standard Tier and Premium Tier regional external IP addresses. Both the IP address and the forwarding rule must use the same network tier.
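
For example, you might reserve a Premium Tier regional external IP address as shown in the following sketch. The address name, region, and network tier are placeholders, not values prescribed by this page.

    gcloud compute addresses create nlb-address \
        --region=us-central1 \
        --network-tier=PREMIUM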

For steps to reserve an IP address, see External IP addresses.

Forwarding rule

A regional external forwarding rule specifies the protocol and ports on which the load balancer accepts traffic. Because network load balancers are not proxies, they pass traffic to backends on the same protocol and port. The forwarding rule in combination with the IP address forms the frontend of the load balancer.

The load balancer preserves the source IP addresses of incoming packets. The destination IP address for incoming packets is the IP address associated with the load balancer's forwarding rule.

Incoming traffic is matched to a forwarding rule, which is a combination of a particular IP address, protocol, and port(s) or range of ports. The forwarding rule then directs traffic to the load balancer's backend service.

A network load balancer requires at least one forwarding rule. You can define multiple forwarding rules for the same load balancer as described in the next section.
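
A minimal sketch of a forwarding rule for a backend service-based network load balancer follows. The resource names, region, and port are illustrative, and the command assumes that the reserved address and the regional backend service already exist.

    gcloud compute forwarding-rules create nlb-forwarding-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=TCP \
        --ports=80 \
        --address=nlb-address \
        --backend-service=nlb-backend-service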

Multiple forwarding rules

You can configure multiple regional external forwarding rules for the same network load balancer. Optionally, each forwarding rule can have a different regional external IP address, or multiple forwarding rules can have the same regional external IP address.

Configuring multiple regional external forwarding rules can be useful for these use cases:

  • You need to configure more than one external IP address for the same backend service.
  • You need to configure different protocols, ports, or port ranges by using the same external IP address.

When using multiple forwarding rules, make sure that you configure the software running on your backend VMs so that it binds to all of the external IP addresses of the load balancer's forwarding rules.
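
Continuing the illustrative example above, a second forwarding rule might reuse the same reserved address for a different port; all names and values are placeholders.

    gcloud compute forwarding-rules create nlb-forwarding-rule-443 \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=TCP \
        --ports=443 \
        --address=nlb-address \
        --backend-service=nlb-backend-service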

Regional backend service

Each network load balancer has one regional backend service that defines the behavior of the load balancer and how traffic is distributed to its backends. The name of the backend service is the name of the network load balancer shown in the Google Cloud Console.

Each backend service defines the following backend parameters:

  • Protocol. A backend service accepts either TCP or UDP traffic, but not both, on the IP address and the ports specified by one or more regional external forwarding rules. The backend service allows traffic to be delivered to backend VMs on the same IP address and ports to which traffic was sent. The backend service and all associated forwarding rules must use the same protocol.

  • Traffic distribution. A backend service allows traffic to be distributed according to a configurable session affinity. The backend service can also be configured to enable connection draining and designate failover backends for the load balancer.

  • Health check. A backend service must have an associated regional health check.

Each backend service operates in a single region and distributes traffic to the first network interface (nic0) of backend VMs. Backends must be instance groups in the same region as the backend service (and forwarding rule). The backends can be zonal unmanaged instance groups, zonal managed instance groups, or regional managed instance groups.

Backend service-based network load balancers support instance groups whose member instances use any VPC network in the same region, as long as the VPC network is in the same project as the backend service. (All VMs within a given instance group must use the same VPC network.)
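
As an illustrative sketch, the following commands create a regional external backend service for TCP traffic and attach a regional managed instance group to it. The resource names and region are placeholders, and the commands assume that the regional health check and the instance group already exist.

    gcloud compute backend-services create nlb-backend-service \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --protocol=TCP \
        --health-checks=nlb-health-check \
        --health-checks-region=us-central1

    gcloud compute backend-services add-backend nlb-backend-service \
        --region=us-central1 \
        --instance-group=nlb-instance-group \
        --instance-group-region=us-central1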

Backend instance groups

An external TCP/UDP load balancer distributes connections among backend VMs contained within managed or unmanaged instance groups.

Instance groups can be regional or zonal in scope. The external TCP/UDP load balancer is highly available by design. There are no special steps needed to make the load balancer highly available because the mechanism doesn't rely on a single device or VM instance. You only need to make sure that your backend VM instances are deployed to multiple zones so that the load balancer can work around potential issues in any given zone.

  • Regional managed instance groups. Use regional managed instance groups if you can deploy your software by using instance templates. Regional managed instance groups automatically distribute VM instances among multiple zones, providing the best option to avoid potential issues in any given zone.

    An example deployment using a regional managed instance group is shown here, and a gcloud sketch for creating such a group appears after this list. The instance group has an instance template that defines how instances should be provisioned, and the group deploys instances within three zones of the us-central1 region.

    Figure: Network Load Balancing with a regional managed instance group
  • Zonal managed or unmanaged instance groups. Use zonal instance groups in different zones (in the same region) to protect against potential issues in any given zone.

    An example deployment using zonal instance groups is shown here. This load balancer provides availability across two zones.

    Figure: Network Load Balancing with zonal instance groups
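
The following sketch creates a regional managed instance group similar to the one in the first example. The template name, machine type, image, group name, size, and region are all placeholders; adapt them to your own deployment.

    gcloud compute instance-templates create nlb-template \
        --machine-type=e2-small \
        --image-family=debian-11 \
        --image-project=debian-cloud

    gcloud compute instance-groups managed create nlb-instance-group \
        --region=us-central1 \
        --template=nlb-template \
        --size=3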

Health checks

Network Load Balancing uses regional health checks to determine which instances can receive new connections. Each network load balancer's backend service must be associated with a regional health check. Load balancers use health check status to determine how to route new connections to backend instances.

For more details about how Google Cloud health checks work, see How health checks work.

Network Load Balancing supports the following types of health checks:

  • TCP health checks
  • SSL health checks
  • HTTP health checks
  • HTTPS health checks
  • HTTP/2 health checks
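
For example, a regional TCP health check for backends serving on port 80 might be created as follows; the health check name, region, and port are placeholders.

    gcloud compute health-checks create tcp nlb-health-check \
        --region=us-central1 \
        --port=80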

Health checks and UDP traffic

Google Cloud does not offer a health check that uses the UDP protocol. When you use Network Load Balancing with UDP traffic, you must run a TCP-based service on your backend VMs to provide health check information.

In this configuration, client requests are load balanced by using the UDP protocol, and a TCP service is used to provide information to Google Cloud health check probers. For example, you can run a simple HTTP server on each backend VM that returns an HTTP 200 response to health check probers. In this example, you should use your own logic running on the backend VM to ensure that the HTTP server returns 200 only if the UDP service is properly configured and running.
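
As a hypothetical sketch, suppose each backend VM runs a small HTTP service on port 8080 whose /health endpoint returns 200 only while the UDP service is running. A regional HTTP health check could then probe that endpoint; the name, region, port, and request path here are assumptions for this example.

    gcloud compute health-checks create http udp-companion-health-check \
        --region=us-central1 \
        --port=8080 \
        --request-path=/health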

Firewall rules

Because Network Load Balancing is a pass-through load balancer, you control access to the load balancer's backends using Google Cloud firewall rules. To accept traffic from any IP address on the internet, you must create an ingress allow firewall rule for the relevant protocol and ports using the 0.0.0.0/0 source range. To allow traffic only from certain IP address ranges, use more restrictive source ranges.

Additionally, because Network Load Balancing uses Google Cloud health checks, you must always allow traffic from the health check IP address ranges. These ingress allow firewall rules can be made specific to the protocol and ports of the load balancer's health check.
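
The following sketch shows one possible pair of ingress allow rules for a TCP load balancer listening on port 80: one for client traffic and one for health check probes. The rule names, network, and port are placeholders; the source ranges shown for health check probes are the ones documented for Google Cloud health checks, so verify the current ranges in the health checks documentation.

    gcloud compute firewall-rules create allow-nlb-traffic \
        --network=default \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=0.0.0.0/0

    gcloud compute firewall-rules create allow-nlb-health-checks \
        --network=default \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=35.191.0.0/16,209.85.152.0/22,209.85.204.0/22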

Return path

Network Load Balancing uses special routes outside of your VPC network to direct incoming requests and health check probes to each backend VM.

The load balancer preserves the source IP addresses of packets. Responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return.

Shared VPC architecture

Except for the IP address, all of the components of a network load balancer must exist in the same project. The following list summarizes the Shared VPC requirements for the components of Network Load Balancing:

  • IP address: A regional external IP address must be defined in either the same project as the load balancer or in the Shared VPC host project.
  • Forwarding rule: A regional external forwarding rule must be defined in the same project as the instances in the backend service.
  • Backend components: The regional backend service must be defined in the same project and the same region as the instances in the backend instance group. Health checks associated with the backend service must be defined in the same project and the same region as the backend service.

Traffic distribution

The way that a network load balancer distributes new connections depends on whether you have configured failover:

  • If you haven't configured failover, a network load balancer distributes new connections to its healthy backend VMs if at least one backend VM is healthy. When all backend VMs are unhealthy, the load balancer distributes new connections among all backends as a last resort. In this situation, the load balancer routes each new connection to an unhealthy backend VM.

  • If you have configured failover, a network load balancer distributes new connections among VMs in its active pool, according to a failover policy that you configure. When all backend VMs are unhealthy, you can choose from one of the following behaviors:

    • (Default) As a last resort, the load balancer distributes traffic to the primary VMs only. The backup VMs are excluded from this last-resort distribution of connections.
    • The load balancer drops traffic.

Connection tracking and consistent hashing

Network Load Balancing uses a connection tracking table and a configurable consistent hashing algorithm to determine how traffic is distributed to backend VMs.

If the load balancer has an entry in its connection tracking table for an incoming packet that is part of a previously established connection, the packet is sent to the backend VM recorded in that table entry.

When the load balancer receives a packet for which it has no connection tracking entry, the load balancer does the following:

  • If no session affinity has been configured, the load balancer creates a 5-tuple hash of the packet's source IP address, source port, destination IP address, destination port, and protocol. It uses this hash to select a backend that is currently healthy.
  • If you have configured a session affinity option, the load balancer still creates a hash, but from fewer pieces of information, as described in session affinity.
  • If the packet is a TCP packet, or if the packet is a UDP packet where session affinity is set to something other than NONE, the load balancer records the selected backend in its connection tracking table. Connection tracking table entries expire after 60 seconds if they are not used.

Persistent connection behavior

  • TCP traffic. For TCP packets, the health check state of a backend VM only controls the distribution of packets for new connections. As long as a backend VM remains a member of its instance group, and as long as that instance group remains configured as a backend for the load balancer, packets that are part of the same TCP connection are sent to the previously-selected backend VM, even if that VM's health check state has changed to unhealthy. If the unhealthy backend responds to packets, the connection will not be interrupted. If the unhealthy backend refuses packets or does not respond to them, then the client can retry with a new connection, and a different, healthy backend can be selected for the retried connection.

    If you remove a backend VM from its instance group, or if you remove the instance group from the backend service, established connections only persist as described in connection draining.

  • UDP traffic. Because UDP has no notion of a session, by default UDP packets are processed without a connection tracking table. If session affinity is set to any value other than NONE, UDP packets are tracked like TCP connections; however, unlike TCP connections, UDP connections are never persisted on unhealthy backends.

The following list summarizes when the load balancer uses connection tracking and whether connections persist on unhealthy backends for each protocol:

  • TCP: Connection tracking is always on. Connections persist on unhealthy backends.
  • UDP: Connection tracking is off by default; it is turned on when session affinity is set to anything other than NONE. Connections do not persist on unhealthy backends.

Session affinity options

Session affinity controls the distribution of new connections from clients to the load balancer's backend VMs. For example, you can direct new connections from the same client to the same backend VM, subject to the concepts discussed in the connection tracking and consistent hashing section.

Network Load Balancing supports the following session affinity options, which you specify for the entire regional external backend service, not on a per backend instance group basis.

  • None (NONE)
    • Consistent hashing method: 5-tuple hash
    • Connection tracking: 5-tuple tracking for TCP only
    • Notes: For TCP, this is effectively the same as Client IP, Client Port, Destination IP, Destination Port, Protocol (5-tuple hash). For UDP, connection tracking is disabled by default.
  • Client IP, Destination IP (CLIENT_IP)
    • Consistent hashing method: 2-tuple hash of the packet's source IP address and destination IP address
    • Connection tracking: 5-tuple tracking for TCP and UDP
    • Notes: Use this option when you need all connections from the same source IP address to be served by the same backend VM.
  • Client IP, Destination IP, Protocol (CLIENT_IP_PROTO)
    • Consistent hashing method: 3-tuple hash of the packet's source IP address, destination IP address, and protocol
    • Connection tracking: 5-tuple tracking for TCP and UDP
  • Client IP, Client Port, Destination IP, Destination Port, Protocol (CLIENT_IP_PORT_PROTO)
    • Consistent hashing method: 5-tuple hash
    • Connection tracking: 5-tuple tracking for TCP and UDP
    • Notes: For TCP, this is equivalent to NONE. For UDP, this option enables connection tracking.
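
For example, you might switch an existing backend service to Client IP, Destination IP affinity as follows; the backend service name and region are placeholders.

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --session-affinity=CLIENT_IP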

Connection draining

Connection draining is a process applied to established TCP sessions when you remove a backend VM from an instance group, or when a managed instance group removes a backend VM (through replacement, abandonment, rolling upgrades, or scaling down).

By default, connection draining is enabled. When enabled, it allows established TCP connections to persist until the VM no longer exists. If you disable connection draining, established TCP connections are terminated as quickly as possible.
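
As an illustrative sketch, a connection draining timeout can be set on the backend service; the 300-second value and the resource names here are placeholders, not recommendations.

    gcloud compute backend-services update nlb-backend-service \
        --region=us-central1 \
        --connection-draining-timeout=300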

For more details about how connection draining is triggered and how to enable connection draining, see Enabling connection draining.

Failover

You can configure an external TCP/UDP load balancer to distribute connections among virtual machine (VM) instances in primary backend instance groups, and then switch, if needed, to using failover backend instance groups. Failover provides yet another method of increasing availability, while also giving you greater control over how to manage your workload when your primary backend VMs aren't healthy.

By default, when you add a backend to a network load balancer's backend service, that backend is a primary backend. You can designate a backend to be a failover backend when you add it to the load balancer's backend service, or by editing the backend service later.
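
For example, an instance group can be added as a failover backend by including the --failover flag; the group name, zone, and backend service name in this sketch are placeholders.

    gcloud compute backend-services add-backend nlb-backend-service \
        --region=us-central1 \
        --instance-group=nlb-failover-group \
        --instance-group-zone=us-central1-c \
        --failover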

For more details on how failover works, see Failover overview for Network Load Balancing.

UDP fragmentation

If you are load balancing UDP packets, be aware of the following:

  • Unfragmented packets are handled normally in all configurations.
  • UDP packets may become fragmented before reaching Google Cloud. Non-Google Cloud networks might delay fragmented UDP packets because they wait for all fragments to arrive, or they discard fragmented packets altogether. Google Cloud networks forward UDP fragments as they arrive.

If you expect fragmented UDP packets, do the following:

  • Use only one UDP forwarding rule per load-balanced IP address, and configure the forwarding rule to accept traffic on all ports. This ensures that all fragments arrive at the same forwarding rule even if they don't have the same destination port. To configure all ports, either set --ports=ALL using gcloud, or set allPorts to True using the API.
  • Set session affinity to None (NONE). This indicates that maintaining affinity is not required, so the load balancer uses a 5-tuple hash to select a backend for unfragmented packets but a 3-tuple hash for fragmented packets.

With these settings, UDP fragments from the same packet are forwarded to the same instance for reassembly.
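
The following sketch combines both recommendations for an illustrative UDP deployment: a single forwarding rule that accepts all ports on one address, and a backend service with session affinity set to NONE. All names, the region, and the address are placeholders.

    gcloud compute forwarding-rules create udp-forwarding-rule \
        --region=us-central1 \
        --load-balancing-scheme=EXTERNAL \
        --ip-protocol=UDP \
        --ports=ALL \
        --address=udp-address \
        --backend-service=udp-backend-service

    gcloud compute backend-services update udp-backend-service \
        --region=us-central1 \
        --session-affinity=NONE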

Using target instances as backends

If you're using target instances as backends for the network load balancer and you expect fragmented UDP packets, use only one UDP forwarding rule per load-balanced IP address, and configure the forwarding rule to accept traffic on all ports. This ensures that all fragments arrive at the same forwarding rule even if they don't have the same destination port. To configure all ports, either set --ports=ALL using gcloud, or set allPorts to True using the API.

Limitations

  • Network endpoint groups (NEGs) are not supported as backends for network load balancers.
  • Backend service-based network load balancers are not supported with Google Kubernetes Engine.
  • For backend services associated with network load balancers, the output of the gcloud compute backend-services get-health command returns only the internal IP addresses of the backends, not the external IP address of the forwarding rule assigned to the instances.

What's next