Internal TCP/UDP Load Balancing overview

Google Cloud Internal TCP/UDP Load Balancing is a regional load balancer that is built on the Andromeda network virtualization stack.

Internal TCP/UDP Load Balancing distributes traffic among internal virtual machine (VM) instances in the same region in a Virtual Private Cloud (VPC) network. It enables you to run and scale your services behind an internal IP address that is accessible only to systems in the same VPC network or systems connected to your VPC network.

An Internal TCP/UDP Load Balancing service has a frontend (the forwarding rule) and a backend (the backend service). You can use either instance groups or GCE_VM_IP zonal NEGs as backends on the backend service. This example shows instance group backends.

High-level internal TCP/UDP load balancer example.

For information about how the Google Cloud load balancers differ from each other, see Choosing a load balancer and Load balancer features.

Use cases

Use Internal TCP/UDP Load Balancing in the following circumstances:

  • You need a high-performance, pass-through Layer 4 load balancer for TCP or UDP traffic.
  • If serving traffic through TLS (SSL), it is acceptable to have SSL traffic terminated by your backends instead of by the load balancer. The internal TCP/UDP load balancer cannot terminate SSL traffic.

  • You need to forward the original packets unproxied. For example, if you need the client source IP address to be preserved.

  • You have an existing setup that uses a pass-through load balancer, and you want to migrate it without changes.

The internal TCP/UDP load balancers address many use cases. This section provides a few high-level examples.

Access examples

You can access an internal TCP/UDP load balancer in your VPC network from a connected network by using the following:

  • VPC Network Peering
  • Cloud VPN and Cloud Interconnect

For detailed examples, see Internal TCP/UDP Load Balancing and connected networks.

Three-tier web service example

You can use Internal TCP/UDP Load Balancing in conjunction with other load balancers. For example, if you incorporate external HTTP(S) load balancers, the external HTTP(S) load balancer is the web tier and relies on services behind the internal TCP/UDP load balancer.

The following diagram depicts an example of a three-tier configuration that uses external HTTP(S) load balancers and internal TCP/UDP load balancers:

Three-tier web app with HTTP(S) Load Balancing and Internal TCP/UDP Load Balancing.

Three-tier web service with global access example

If you enable global access, your web-tier VMs can be in another region, as shown in the following diagram.

This multi-tier application example shows the following:

  • A globally-available internet-facing web tier that load balances traffic with HTTP(S) Load Balancing.
  • An internal backend load-balanced database tier in the us-east1 region that is accessed by the global web tier.
  • A client VM that is part of the web tier in the europe-west1 region that accesses the internal load-balanced database tier located in us-east1.
Three-tier web app with HTTP(S) Load Balancing, global access, and Internal TCP/UDP Load Balancing.

Using internal TCP/UDP load balancers as next hops

You can use an internal TCP/UDP load balancer as the next gateway to which packets are forwarded along the path to their final destination. To do this, you set the load balancer as the next hop in a custom static route.

An internal TCP/UDP load balancer deployed as a next hop in a custom route processes all traffic regardless of the protocol (TCP, UDP, or ICMP).

Here is a sample architecture using an internal TCP/UDP load balancer as the next hop to a NAT gateway. You can route traffic to your firewall or gateway virtual appliance backends through an internal TCP/UDP load balancer.

NAT use case.

Additional use cases include:

  • Hub and spoke: exchanging next-hop routes by using VPC Network Peering. You can configure a hub-and-spoke topology with your next-hop firewall virtual appliances located in the hub VPC network. Routes that use the load balancer as a next hop in the hub VPC network are usable in each spoke network.
  • Load balancing to multiple NICs on the backend VMs.

For more information about these use cases, see Internal TCP/UDP load balancers as next hops.
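As a minimal sketch of this next-hop pattern, the following custom static route sends traffic for a destination range to an internal TCP/UDP load balancer's forwarding rule. The network name, route name, forwarding rule name, region, and destination range are placeholder assumptions, not values from this document:

    gcloud compute routes create example-route-to-appliances \
        --network=example-vpc \
        --destination-range=10.50.0.0/16 \
        --next-hop-ilb=example-fr \
        --next-hop-ilb-region=us-central1

Traffic whose destination falls within 10.50.0.0/16 is then delivered to the load balancer's backends, regardless of whether it is TCP, UDP, or ICMP.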

How Internal TCP/UDP Load Balancing works

An internal TCP/UDP load balancer has the following characteristics:

  • It's a managed service.
  • It's not a proxy.
  • It's implemented in virtual networking.

Unlike a proxy load balancer, an internal TCP/UDP load balancer doesn't terminate connections from clients and then open new connections to backends. Instead, an internal TCP/UDP load balancer routes original connections directly from clients to the healthy backends, without any interruption.

  • There's no intermediate device or single point of failure.
  • Client requests to the load balancer's IP address go directly to the healthy backend VMs.
  • Responses from the healthy backend VMs go directly to the clients, not back through the load balancer. TCP responses use direct server return. For more information, see TCP and UDP request and return packets.

The load balancer monitors VM health by using health check probes. For more information, see the Health check section.

The Google Cloud Linux guest environment, Windows guest environment, or an equivalent process configures each backend VM with the IP address of the load balancer. For VMs created from Google Cloud images, the Guest agent (formerly, the Windows Guest Environment or Linux Guest Environment) installs the local route for the load balancer's IP address. Google Kubernetes Engine instances based on Container-Optimized OS implement this by using iptables instead.

Google Cloud virtual networking manages traffic delivery and scaling as appropriate.

Protocols, scheme, and scope

Each internal TCP/UDP load balancer supports:

  • One backend service with load balancing scheme INTERNAL and the TCP or the UDP protocol (but not both)
  • Backend VMs specified as either instance groups or GCE_VM_IP zonal NEGs (but not a combination of both)
  • One or more forwarding rules, each using either the TCP or UDP protocol, matching the backend service's protocol
  • Each forwarding rule with its own unique IP address or multiple forwarding rules that share a common IP address
  • Each forwarding rule with up to five ports or all ports
  • If global access is enabled, clients in any region
  • If global access is disabled, clients in the same region as the load balancer

An internal TCP/UDP load balancer doesn't support:

Client access

The client VM must be in the same network or in a VPC network connected by using VPC Network Peering. You can enable global access to allow client VM instances from any region to access your internal TCP/UDP load balancer.

The following summarizes client access.

Global access disabled:
  • Clients must be in the same region as the load balancer. They must also be in the same VPC network as the load balancer or in a VPC network that is connected to the load balancer's VPC network by using VPC Network Peering.
  • On-premises clients can access the load balancer through Cloud VPN tunnels or Cloud Interconnect attachments (VLANs). These tunnels or attachments must be in the same region as the load balancer.

Global access enabled:
  • Clients can be in any region. They still must be in the same VPC network as the load balancer or in a VPC network that's connected to the load balancer's VPC network by using VPC Network Peering.
  • On-premises clients can access the load balancer through Cloud VPN tunnels or Cloud Interconnect attachments (VLANs). These tunnels or attachments can be in any region.

IP addresses for request and return packets

When a client system sends a TCP or UDP packet to an internal TCP/UDP load balancer, the packet's source and destination are as follows:

  • Source: the client's primary internal IP address or the IP address from one of the client's alias IP ranges.
  • Destination: the IP address of the load balancer's forwarding rule.

Therefore, packets arrive at the load balancer's backend VMs with the destination IP address of the load balancer itself. Because this type of load balancer is not a proxy, this is expected behavior. Accordingly, the software running on the load balancer's backend VMs must do the following:

  • Listening on (bound to) the load balancer's IP address or any IP address (0.0.0.0 or ::)
  • Listening on (bound to) a port that's included in the load balancer's forwarding rule

Return packets are sent directly from the load balancer's backend VMs to the client. The return packet's source and destination IP addresses depend on the protocol:

  • TCP is connection-oriented, and internal TCP/UDP load balancers use direct server return. This means that response packets are sent from the IP address of the load balancer's forwarding rule.
  • In contrast, UDP is connectionless. By default, return packets are sent from the primary internal IP address of the backend instance's network interface. However, you can change this behavior. For example, configuring a UDP server to bind to the forwarding rule's IP address causes response packets to be sent from the forwarding rule's IP address.

The following summarizes the source and destination of response packets:

  • TCP traffic: the source is the IP address of the load balancer's forwarding rule, and the destination is the requesting packet's source.
  • UDP traffic: the source depends on the UDP server software, and the destination is the requesting packet's source.

Architecture

An internal TCP/UDP load balancer with multiple backends distributes connections among all of those backends. For information about the distribution method and its configuration options, see traffic distribution.

You can use either instance groups or zonal NEGs, but not a combination of both, as backends for an internal TCP/UDP load balancer:

  • If you choose instance groups, you can use unmanaged instance groups, managed zonal instance groups, managed regional instance groups, or a combination of these instance group types.
  • If you choose zonal NEGs, you must use GCE_VM_IP zonal NEGs.

High availability describes how to design an internal load balancer that is not dependent on a single zone.

Instances that participate as backend VMs for internal TCP/UDP load balancers must be running the appropriate Linux or Windows guest environment or other processes that provide equivalent functionality. This guest environment must be able to contact the metadata server (metadata.google.internal, 169.254.169.254) to read instance metadata so that it can generate local routes to accept traffic sent to the load balancer's internal IP address.

This diagram illustrates traffic distribution among VMs located in two separate instance groups. Traffic sent from the client instance to the IP address of the load balancer (10.10.10.9) is distributed among backend VMs in either instance group. Responses sent from any of the serving backend VMs are delivered directly to the client VM.

You can use Internal TCP/UDP Load Balancing with either a custom mode or auto mode VPC network. You can also create internal TCP/UDP load balancers with an existing legacy network.

An internal TCP/UDP load balancer consists of the following Google Cloud components.

Internal IP address
  Purpose: This is the address for the load balancer.
  Requirements: The internal IP address must be in the same subnet as the internal forwarding rule. The subnet must be in the same region and VPC network as the backend service.

Internal forwarding rule
  Purpose: An internal forwarding rule in combination with the internal IP address is the frontend of the load balancer. It defines the protocol and ports that the load balancer accepts, and it directs traffic to a regional internal backend service.
  Requirements: Forwarding rules for internal TCP/UDP load balancers must do the following:
  • Have a load-balancing-scheme of INTERNAL.
  • Use an ip-protocol of either TCP or UDP, matching the protocol of the backend service.
  • Reference a subnet in the same VPC network and region as the backend service.

Regional internal backend service
  Purpose: The regional internal backend service defines the protocol used to communicate with the backends, and it specifies a health check. Backends can be unmanaged instance groups, managed zonal instance groups, managed regional instance groups, or zonal NEGs with GCE_VM_IP endpoints.
  Requirements: The backend service must do the following:
  • Have a load-balancing-scheme of INTERNAL.
  • Use a protocol of either TCP or UDP, matching the ip-protocol of the forwarding rule.
  • Have an associated health check.
  • Have an associated region. The forwarding rule and all backends must be in the same region as the backend service.
  • Be associated with a single VPC network. When not specified, the network is inferred from the network used by each backend VM's default network interface (nic0).
  Although the backend service is not tied to a specific subnet, the forwarding rule's subnet must be in the same VPC network as the backend service.

Health check
  Purpose: Every backend service must have an associated health check. The health check defines the parameters under which Google Cloud considers the backends that it manages to be eligible to receive traffic. Only healthy backend VMs receive traffic sent from client VMs to the IP address of the load balancer.
  Requirements: Even though the forwarding rule and backend service can use either TCP or UDP, Google Cloud does not have a health check for UDP traffic. For more information, see health checks and UDP traffic.
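As a rough sketch of how these components fit together, the following commands create a regional TCP health check, a regional internal backend service, and an internal forwarding rule. All names, the region, and the subnet are placeholder assumptions, not values from this document:

    # Health check used by the backend service
    gcloud compute health-checks create tcp example-hc \
        --region=us-central1 \
        --port=80

    # Regional internal backend service (load-balancing-scheme INTERNAL)
    gcloud compute backend-services create example-bes \
        --load-balancing-scheme=internal \
        --protocol=TCP \
        --region=us-central1 \
        --health-checks=example-hc \
        --health-checks-region=us-central1

    # Internal forwarding rule: the frontend (IP address, protocol, and ports)
    gcloud compute forwarding-rules create example-fr \
        --load-balancing-scheme=internal \
        --network=example-vpc \
        --subnet=example-subnet \
        --region=us-central1 \
        --ip-protocol=TCP \
        --ports=80 \
        --backend-service=example-bes

Backends (instance groups or GCE_VM_IP zonal NEGs) are then attached to the backend service with gcloud compute backend-services add-backend.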

Internal IP address

Internal TCP/UDP Load Balancing uses an internal IPv4 address from the primary IP range of the subnet that you select when you create the internal forwarding rule. The IP address can't be from a secondary IP range of the subnet.

You specify the IP address for an internal TCP/UDP load balancer when you create the forwarding rule. You can choose to receive an ephemeral IP address or use a reserved IP address.
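For example, reserving an internal address might look like the following sketch; the name, region, subnet, and address value are placeholder assumptions, not values from this document:

    # Reserve a specific internal IPv4 address from the subnet's primary range
    gcloud compute addresses create example-ilb-ip \
        --region=us-central1 \
        --subnet=example-subnet \
        --addresses=10.10.10.9

You can then pass the reserved address, by name or literal IP, to the forwarding rule's --address flag. If you omit --address, the forwarding rule receives an ephemeral address from the subnet's primary range.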

Firewall rules

Your internal TCP/UDP load balancer requires the following firewall rules:

  • An ingress allow firewall rule that permits traffic from the Google Cloud health check probe ranges (130.211.0.0/22 and 35.191.0.0/16) to reach your backend VMs.
  • An ingress allow firewall rule that permits traffic from clients to your backend VMs on the load balancer's protocol and ports.

The example in Configuring firewall rules demonstrates how to create both.
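A minimal sketch of the two rules, assuming a VPC network named example-vpc, backend VMs tagged allow-lb-backend, TCP port 80, and a client range of 10.0.0.0/8 (all placeholders):

    # Allow Google Cloud health check probes to reach the backend VMs
    gcloud compute firewall-rules create example-allow-health-checks \
        --network=example-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=allow-lb-backend

    # Allow traffic from internal clients to the backend VMs
    gcloud compute firewall-rules create example-allow-clients \
        --network=example-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=10.0.0.0/8 \
        --target-tags=allow-lb-backend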

Forwarding rules

A forwarding rule specifies the protocol and ports on which the load balancer accepts traffic. Because internal TCP/UDP load balancers are not proxies, they pass traffic to backends on the same protocol and port.

An internal TCP/UDP load balancer requires at least one internal forwarding rule. You can define multiple forwarding rules for the same load balancer.

The forwarding rule must reference a specific subnet in the same VPC network and region as the load balancer's backend components. This requirement has the following implications:

  • The subnet that you specify for the forwarding rule doesn't need to be the same as any of the subnets used by backend VMs; however, the subnet must be in the same region as the forwarding rule.
  • When you create an internal forwarding rule, Google Cloud chooses an available regional internal IP address from the primary IP address range of the subnet that you select. Alternatively, you can specify an internal IP address in the subnet's primary IP range.

Forwarding rules and global access

An internal TCP/UDP load balancer's forwarding rules are regional, even when global access is enabled. After you enable global access, the regional internal forwarding rule's allowGlobalAccess flag is set to true.
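For example, you can enable global access on an existing forwarding rule with a command like the following; the rule name and region are placeholders:

    gcloud compute forwarding-rules update example-fr \
        --region=us-central1 \
        --allow-global-access

You can also set --allow-global-access when you create the forwarding rule. Running gcloud compute forwarding-rules describe shows the resulting allowGlobalAccess value.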

Forwarding rules and port specifications

When you create an internal forwarding rule, you must choose one of the following port specifications:

  • Specify at least one and up to five ports, by number.
  • Specify ALL to forward traffic on all ports.

An internal forwarding rule that supports either all TCP ports or all UDP ports allows backend VMs to run multiple applications, each on its own port. Traffic sent to a given port is delivered to the corresponding application, and all applications use the same IP address.

When you need to forward traffic on more than five specific ports, combine firewall rules with forwarding rules. When you create the forwarding rule, specify all ports, and then create ingress allow firewall rules that only permit traffic to the desired ports. Apply the firewall rules to the backend VMs.

You cannot change the specified ports or the internal IP address of an internal forwarding rule after you create it. Instead, you must delete and recreate the forwarding rule.
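A sketch of the all-ports pattern described earlier in this section, assuming placeholder names, a TCP service on ports 8080 through 8090, and a client range of 10.0.0.0/8:

    # Forwarding rule that accepts traffic on all TCP ports
    gcloud compute forwarding-rules create example-fr-all-ports \
        --load-balancing-scheme=internal \
        --network=example-vpc \
        --subnet=example-subnet \
        --region=us-central1 \
        --ip-protocol=TCP \
        --ports=ALL \
        --backend-service=example-bes

    # Ingress firewall rule that only lets the desired ports reach the backends
    gcloud compute firewall-rules create example-allow-selected-ports \
        --network=example-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:8080-8090 \
        --source-ranges=10.0.0.0/8 \
        --target-tags=allow-lb-backend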

Multiple forwarding rules for a single backend service

You can configure multiple internal forwarding rules that all reference the same internal backend service. An internal TCP/UDP load balancer requires at least one internal forwarding rule.

Configuring multiple forwarding rules for the same backend service lets you do the following, using either TCP or UDP (not both):

  • Assign multiple IP addresses to the load balancer. You can create multiple forwarding rules, each using a unique IP address. Each forwarding rule can specify all ports or a set of up to five ports.

  • Assign a specific set of ports, using the same IP address, to the load balancer. You can create multiple forwarding rules sharing the same IP address, where each forwarding rule uses a specific set of up to five ports. This is an alternative to configuring a single forwarding rule that specifies all ports.

For more information about scenarios involving two or more internal forwarding rules that share a common internal IP address, see Multiple forwarding rules with the same IP address.

When using multiple internal forwarding rules, make sure that you configure the software running on your backend VMs to bind to all of the forwarding rule IP addresses or to any address (0.0.0.0 for IPv4). The destination IP address for a packet delivered through the load balancer is the internal IP address associated with the corresponding internal forwarding rule. For more information, see TCP and UDP request and return packets.
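For example, two forwarding rules can split ports across a shared internal IP address. This sketch assumes that the shared address is reserved with --purpose=SHARED_LOADBALANCER_VIP; all names, ports, the region, and the subnet are placeholders:

    # Reserve an internal address that multiple forwarding rules can share
    gcloud compute addresses create example-shared-ip \
        --region=us-central1 \
        --subnet=example-subnet \
        --purpose=SHARED_LOADBALANCER_VIP

    # First forwarding rule: ports 80 and 443
    gcloud compute forwarding-rules create example-fr-web \
        --load-balancing-scheme=internal \
        --network=example-vpc \
        --subnet=example-subnet \
        --region=us-central1 \
        --ip-protocol=TCP \
        --ports=80,443 \
        --address=example-shared-ip \
        --backend-service=example-bes

    # Second forwarding rule: same IP address, port 8443
    gcloud compute forwarding-rules create example-fr-admin \
        --load-balancing-scheme=internal \
        --network=example-vpc \
        --subnet=example-subnet \
        --region=us-central1 \
        --ip-protocol=TCP \
        --ports=8443 \
        --address=example-shared-ip \
        --backend-service=example-bes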

Backend service

Each internal TCP/UDP load balancer has one regional internal backend service that defines backend parameters and behavior. The name of the backend service is the name of the internal TCP/UDP load balancer shown in the Google Cloud Console.

Each backend service defines the following backend parameters:

  • Protocol. A backend service accepts either TCP or UDP traffic, but not both, on the ports specified by one or more internal forwarding rules. The backend service allows traffic to be delivered to backend VMs on the same ports to which traffic was sent. The backend service protocol must match the forwarding rule's protocol.

  • Traffic distribution. A backend service allows traffic to be distributed according to a configurable session affinity.

  • Health check. A backend service must have an associated health check.

Each backend service operates in a single region and distributes traffic for backend VMs in a single VPC network:

  • Regionality. Backends are either instance groups or zonal NEGs with GCE_VM_IP endpoints, in the same region as the backend service and forwarding rule. The instance group backends can be unmanaged instance groups, zonal managed instance groups, or regional managed instance groups. The zonal NEG backends can only use GCE_VM_IP endpoints.

  • VPC network. All backend VMs must have a network interface in the VPC network associated with the backend service. You can either explicitly specify a backend service's network or use an implied network. In either case, every internal forwarding rule's subnet must be in the backend service's VPC network.

Backend services and network interfaces

Each backend service operates in a single VPC network and Google Cloud region. The VPC network can be implied or explicitly specified with the --network flag in the gcloud compute backend-services create command:

  • When explicitly specified, the backend service's --network flag identifies the VPC network, and traffic is load balanced to the network interface that each backend VM has in that network. Each backend VM must have a network interface in the specified VPC network. In this case, the network interface identifiers (nic0 through nic7) can be different among backend VMs. There are additional points to consider depending on the type of backend:

    Instance group backends

    • Different backend VMs in the same unmanaged instance group might use different interface identifiers if each VM has an interface in the specified VPC network.
    • The interface identifier doesn't need to be the same among all backend instance groups—it might be nic0 for backend VMs in one backend instance group and nic2 for backend VMs in another backend instance group.

    Zonal NEG backends

    • Different endpoints in the same GCE_VM_IP zonal NEG might use different interface identifiers.
    • If you specify both a VM name and an IP address when adding an endpoint to the zonal NEG, Google Cloud validates that the IP address is the primary internal IP address of the VM's NIC in the NEG's selected VPC network. If validation fails, the error message indicates that the endpoint doesn't match the primary IP address of the VM's NIC in the NEG's network.
    • If you do not specify IP addresses when adding endpoints to the zonal NEG, Google Cloud selects the primary internal IP address of the NIC in the NEG's selected VPC network.
  • If you don't include the --network flag when you create the backend service, the backend service chooses a network based on the network of the initial (or only) network interface used by all backend VMs. This means that nic0 must be in the same VPC network for all VMs in all backend instance groups.
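For example, specifying the network explicitly might look like the following sketch; the names and region are placeholders:

    gcloud compute backend-services create example-bes \
        --load-balancing-scheme=internal \
        --protocol=TCP \
        --region=us-central1 \
        --health-checks=example-hc \
        --health-checks-region=us-central1 \
        --network=example-vpc

Traffic is then delivered to whichever network interface (nic0 through nic7) each backend VM has in example-vpc. If you omit --network, the network of each backend VM's nic0 is used, as described above.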

Health check

The load balancer's backend service must be associated with a global or regional health check. Special routes outside of the VPC network facilitate communication between health check systems and the backends.

You can use an existing health check or define a new one. The internal TCP/UDP load balancers use health check status to determine how to route new connections, as described in Traffic distribution.

You can use any of the following health check protocols; the protocol of the health check does not have to match the protocol of the load balancer:

  • HTTP, HTTPS, or HTTP/2. If your backend VMs serve traffic by using HTTP, HTTPS, or HTTP/2, it's best to use a health check that matches that protocol because HTTP-based health checks offer options appropriate to that protocol. Serving HTTP-type traffic through an internal TCP/UDP load balancer means that the load balancer's protocol is TCP.
  • SSL or TCP. If your backend VMs do not serve HTTP-type traffic, you should use either an SSL or TCP health check.

Regardless of the type of health check that you create, Google Cloud sends health check probes to the IP address of the internal TCP/UDP load balancer's forwarding rule, on the network interface in the VPC network selected by the load balancer's backend service. This simulates how load-balanced traffic is delivered. Software running on your backend VMs must respond to both load-balanced traffic and health check probes sent to the load balancer's IP address. For more information, see Destination for probe packets.

Health checks and UDP traffic

Google Cloud does not offer a health check that uses the UDP protocol. When you use Internal TCP/UDP Load Balancing with UDP traffic, you must run a TCP-based service on your backend VMs to provide health check information.

In this configuration, client requests are load balanced by using the UDP protocol, and a TCP service is used to provide information to Google Cloud health check probers. For example, you can run a simple HTTP server on each backend VM that returns an HTTP 200 response to Google Cloud. In this example, you should use your own logic running on the backend VM to ensure that the HTTP server returns 200 only if the UDP service is properly configured and running.
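A sketch of this pattern, assuming a UDP service and a separate HTTP health endpoint listening on port 8080 on each backend VM (all names and ports are placeholders):

    # HTTP health check that probes the TCP-based health endpoint
    gcloud compute health-checks create http example-udp-hc \
        --region=us-central1 \
        --port=8080 \
        --request-path=/health

    # UDP backend service that uses the TCP-based (HTTP) health check
    gcloud compute backend-services create example-udp-bes \
        --load-balancing-scheme=internal \
        --protocol=UDP \
        --region=us-central1 \
        --health-checks=example-udp-hc \
        --health-checks-region=us-central1

The /health endpoint should return HTTP 200 only when the UDP service itself is configured and running, as described above.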

High availability architecture

The internal TCP/UDP load balancer is highly available by design. There are no special steps to make the load balancer highly available because the mechanism doesn't rely on a single device or VM instance.

To ensure that your backend VM instances are deployed to multiple zones, follow these deployment recommendations:

  • Use regional managed instance groups if you can deploy your software by using instance templates. Regional managed instance groups automatically distribute VM instances among multiple zones, providing the best option to avoid potential issues in any given zone.

  • If you use zonal managed instance groups or unmanaged instance groups, use multiple instance groups in different zones (in the same region) for the same backend service. Using multiple zones protects against potential issues in any given zone.
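For example, a regional managed instance group created from an instance template spreads its VMs across zones in the region and can then be attached to the backend service. The template, group, and backend service names are placeholders:

    # Regional managed instance group spread across zones in us-central1
    gcloud compute instance-groups managed create example-rmig \
        --region=us-central1 \
        --template=example-template \
        --size=3

    # Attach the regional group to the internal backend service
    gcloud compute backend-services add-backend example-bes \
        --region=us-central1 \
        --instance-group=example-rmig \
        --instance-group-region=us-central1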

Shared VPC architecture

The following summarizes the component requirements for Internal TCP/UDP Load Balancing used with a Shared VPC network. For an example, see creating an internal TCP/UDP load balancer on the Provisioning Shared VPC page.

IP address
  An internal IP address must be defined in the same project as the backend VMs.
  For the load balancer to be available in a Shared VPC network, the internal IP address must be defined in the same service project where the backend VMs are located, and it must reference a subnet in the desired Shared VPC network in the host project. The address itself comes from the primary IP range of the referenced subnet.
  If you create an internal IP address in a service project and the IP address subnet is in the service project's VPC network, your internal TCP/UDP load balancer is local to the service project. It's not local to any Shared VPC host project.

Forwarding rule
  An internal forwarding rule must be defined in the same project as the backend VMs.
  For the load balancer to be available in a Shared VPC network, the internal forwarding rule must be defined in the same service project where the backend VMs are located, and it must reference the same subnet (in the Shared VPC network) that the associated internal IP address references.
  If you create an internal forwarding rule in a service project and the forwarding rule's subnet is in the service project's VPC network, your internal TCP/UDP load balancer is local to the service project. It's not local to any Shared VPC host project.

Backend components
  In a Shared VPC scenario, backend VMs are located in a service project. A regional internal backend service and health check must be defined in that service project.

Traffic distribution

The way that an internal TCP/UDP load balancer distributes new connections depends on whether you have configured failover:

  • If you haven't configured failover, an internal TCP/UDP load balancer distributes new connections to its healthy backend VMs if at least one backend VM is healthy. When all backend VMs are unhealthy, the load balancer distributes new connections among all backends as a last resort. In this situation, the load balancer routes each new connection to an unhealthy backend VM.

  • If you have configured failover, an internal TCP/UDP load balancer distributes new connections among VMs in its active pool, according to a failover policy that you configure. When all backend VMs are unhealthy, you can choose from one of the following behaviors:

    • (Default) The load balancer distributes traffic to only the primary VMs. This is done as a last resort. The backup VMs are excluded from this last-resort distribution of connections.
    • The load balancer is configured to drop traffic.

The method for distributing new connections depends on the load balancer's session affinity setting.

The health check state controls the distribution of new connections. An established TCP session persists on an unhealthy backend VM if the unhealthy backend VM is still handling the connection.

Session affinity options

Session affinity controls the distribution of new connections from clients to the load balancer's backend VMs. You set session affinity when your backend VMs need to keep track of state information for their clients. This is a common requirement for applications that need to maintain state, including web applications.

Session affinity works on a best-effort basis.

The internal TCP/UDP load balancers support the following session affinity options, which you specify for the entire internal backend service, not per backend instance group:

None (default)
  Operates in the same way as Client IP, protocol, and port.
  Protocols supported: TCP and UDP

Client IP
  Connections from the same source and destination IP addresses go to the same instance, based on a 2-tuple hash of the following:
  • Source address in the IP packet
  • Destination address in the IP packet
  Protocols supported: TCP and UDP

Client IP and protocol
  Connections from the same source and destination IP addresses and the same protocol go to the same instance, based on a 3-tuple hash of the following:
  • Source address in the IP packet
  • Destination address in the IP packet
  • Protocol (TCP or UDP)
  Protocols supported: TCP and UDP

Client IP, protocol, and port
  Packets are sent to backends based on a 5-tuple hash of the following:
  • The packet's source IP address
  • The packet's source port (if present)
  • The packet's destination IP address
  • The packet's destination port (if present)
  • The protocol
  Source address, destination address, and protocol are obtained from the IP packet's header. Source port and destination port, if present, are obtained from the Layer 4 header. For example, TCP segments and non-fragmented UDP datagrams always include a source port and a destination port, while fragmented UDP datagrams don't include port information. When port information is not present, the 5-tuple hash is effectively a 3-tuple hash.
  Protocols supported: TCP only

The destination IP address is the IP address of the load balancer's forwarding rule, unless packets are delivered to the load balancer because of a custom static route. If an internal TCP/UDP load balancer is a next hop for a route, see the next section in this article, Session affinity and next hop internal TCP/UDP load balancer.
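You configure session affinity on the internal backend service, for example (the name, region, and chosen value are placeholders):

    # Accepted values include NONE, CLIENT_IP, CLIENT_IP_PROTO, and CLIENT_IP_PORT_PROTO
    gcloud compute backend-services update example-bes \
        --region=us-central1 \
        --session-affinity=CLIENT_IP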

Session affinity and next hop internal TCP/UDP load balancer

Regardless of the session affinity option that you choose, Google Cloud uses the packet's destination. When sending a packet directly to the load balancer, the packet's destination matches the IP address of the load balancer's forwarding rule.

However, when you use an internal TCP/UDP load balancer as a next hop for a custom static route, the packet's destination is most likely not the IP address of the load balancer's forwarding rule. For packets whose destination is within the route's destination range, the route directs them to the load balancer.

To use an internal TCP/UDP load balancer as the next hop for a custom static route, see Internal TCP/UDP load balancers as next hops.

Session affinity and health check state

Changing health states of backend VMs can cause a loss of session affinity. For example, if a backend VM becomes unhealthy, and there is at least one other healthy backend VM, an internal TCP/UDP load balancer does not distribute new connections to the unhealthy VM. If a client had session affinity with that unhealthy VM, it is directed to the other healthy backend VM instead, losing its session affinity.

Testing connections from a single client

When testing connections to the IP address of an internal TCP/UDP load balancer from a single client system, you should keep the following in mind:

  • If the client system is not a VM being load balanced (that is, it is not a backend VM), new connections are delivered to the load balancer's healthy backend VMs. However, because all session affinity options rely on at least the client system's IP address, connections from the same client might be distributed to the same backend VM more frequently than you might expect.

    Practically, this means that you cannot accurately monitor traffic distribution through an internal TCP/UDP load balancer by connecting to it from a single client. The number of clients needed to monitor traffic distribution varies depending on the load balancer type, the type of traffic, and the number of healthy backends.

  • If the client VM is a backend VM of the load balancer, connections sent to the IP address of the load balancer's forwarding rule are always answered by the client/backend VM. This happens regardless of whether the backend VM is healthy. It happens for all traffic sent to the load balancer's IP address, not just traffic on the protocol and ports specified in the load balancer's internal forwarding rule.

    For more information, see sending requests from load-balanced VMs.

Failover

Internal TCP/UDP Load Balancing lets you designate some backends as failover backends. These backends are used only when the number of healthy VMs in the primary backend instance groups has fallen below a configurable threshold. By default, if all primary and failover VMs are unhealthy, Google Cloud distributes new connections among all the primary VMs only, as a last resort.

When you add a backend to an internal TCP/UDP load balancer's backend service, by default that backend is a primary backend. You can designate a backend to be a failover backend when you add it to the load balancer's backend service, or by editing the backend service later.
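For example, you might designate an instance group as a failover backend when you add it to the backend service; the names, zone, and region are placeholders:

    gcloud compute backend-services add-backend example-bes \
        --region=us-central1 \
        --instance-group=example-failover-ig \
        --instance-group-zone=us-central1-c \
        --failover

The failover policy itself, such as the failover ratio and whether to drop traffic when all VMs are unhealthy, is configured on the backend service.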

For a detailed conceptual overview of failover in Internal TCP/UDP Load Balancing, see Failover for Internal TCP/UDP Load Balancing overview.

Subsetting

Subsetting for internal TCP/UDP load balancers lets you scale your internal TCP/UDP load balancer to support a larger number of backend VM instances per internal backend service.

For information about how subsetting affects this limit, see the "Backend services" section of Load balancing resource quotas and limits.

By default, subsetting is disabled, which limits the backend service to distributing to up to 250 backend instances or endpoints. If your backend service needs to support more than 250 backends, you can enable subsetting. When subsetting is enabled, a subset of backend instances is selected for each client connection.

The following diagram shows a scaled-down model of the difference between these two modes of operation.

Comparing an internal TCP/UDP load balancer without and with subsetting.

Without subsetting, the complete set of healthy backends is better utilized, and new client connections are distributed among all healthy backends according to traffic distribution. Subsetting imposes load balancing restrictions but allows the load balancer to support more than 250 backends.

For configuration instructions, see Subsetting.
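As a sketch only, subsetting is enabled on the backend service; this assumes the --subsetting-policy flag and its CONSISTENT_HASH_SUBSETTING value, with a placeholder name and region:

    gcloud compute backend-services update example-bes \
        --region=us-central1 \
        --subsetting-policy=CONSISTENT_HASH_SUBSETTING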

Caveats related to subsetting

  • When subsetting is enabled, not all backends will receive traffic from a given sender even when the number of backends is small.
  • See the quotas page for the maximum number of backend instances when subsetting is enabled.
  • Only 5-tuple session affinity is supported with subsetting.
  • Packet mirroring is not supported with subsetting.
  • Enabling and then disabling subsetting breaks existing connections.
  • When using subsetting through Cloud VPN or Cloud Interconnect, if there aren't enough VPN gateways (fewer than 10) on the Google Cloud side, you are likely to reach only a subset of the load balancer's backends. Provision enough VPN gateways on the Google Cloud side to utilize all of the backend pools. If traffic for a single TCP connection is rerouted to a different VPN gateway, you'll experience connection resets.

Limits

For information about quotas and limits, see Load balancing resource quotas.

What's next