Internal Load Balancing enables you to run and scale your services behind a private load balancing IP address that is accessible only to instances internal to your Virtual Private Cloud (VPC).
For a quick introduction to Internal Load Balancing, see Internal Load Balancing in 5 mins.
Google Cloud Platform (GCP) offers Internal Load Balancing for your TCP/UDP-based traffic. Internal Load Balancing enables you to run and scale your services behind a private load balancing IP address that is accessible only to your internal virtual machine instances.
Use Internal Load Balancing to configure an Internal Load Balancing IP address to act as the frontend to your private backend instances. You do not need a public IP for your load balanced service. Your internal client requests stay internal to your VPC network and region, likely resulting in lowered latency since all your load-balanced traffic will stay within Google’s network. Overall, your configuration becomes simpler.
Internal Load Balancing works with auto mode VPC networks, custom mode VPC networks, and legacy networks. Internal Load Balancing can also be implemented with regional managed instance groups. This allows you to autoscale across a region, making your service immune to zonal failures.
The rest of this user guide walks you through the features and configuration of Internal Load Balancing for TCP/UDP.
About Internal Load Balancing
Internal Load Balancing supports use cases such as traditional 3-tier web services, where your web tier uses external load balancers such as HTTP(S) or TCP/UDP (network) Load Balancing, and the instances running your application tier or backend databases are deployed behind Internal Load Balancing.
With Internal Load Balancing, you can:
- Load balance TCP/UDP traffic to an internal IP address
  - You can configure the Internal Load Balancing IP from within your VPC network.
- Load balance across instances in a region
  - You can instantiate instances in multiple availability zones within the same region.
- Configure health checking for your backends
  - Backend instances are health checked by GCP health checking systems.
  - You can configure a TCP, SSL (TLS), HTTP, or HTTPS health check.
- Get all the benefits of a fully managed load balancing service that scales as needed to handle client traffic.
  - The highly available load balancer is not a choke point.
Internal Load Balancing works with auto mode networks, custom mode networks, and legacy networks.
Internal Load Balancing can be implemented in a variety of ways, such as a proxy.
In the traditional proxy model of Internal Load Balancing, as shown below on the left, you configure an internal IP on a load balancing device or instance(s) and your client instance connects to this IP. Traffic coming to the IP is terminated at the load balancer. The load balancer selects a backend and establishes a new connection to it. In effect, there are two connections: Client<->Load Balancer and Load Balancer<->Backend.
GCP Internal Load Balancing distributes client instance requests to the backends using a different approach, as shown on the right. It uses lightweight load balancing built on top of the Andromeda network virtualization stack to provide software-defined load balancing that delivers traffic directly from the client instance to a backend instance.
Internal Load Balancing is not based on a device or a VM instance. Instead, it is a software-defined, fully distributed load balancing solution.
Deploying Internal Load Balancing with GCP clients
When your clients are in the VPC network, Internal Load Balancing distributes internal client traffic across backends that run private services. Your client instances, the Internal Load Balancing IP address, and the backend instances are all configured on your VPC network.
In the illustration, the client instance in Subnet 1 connects to the internal Load Balancing IP address (10.240.0.200). Traffic from that instance is load balanced to a backend instance (10.240.0.2) in Subnet 2.
Deploying Internal Load Balancing with clients across VPN or Interconnect
When your clients connect across a VPN or Interconnect, they can be located in your on-premises network, in another cloud provider's network, or in another GCP network. Traffic from your clients reaches the internal load balancer in the VPC network through Cloud VPN. This traffic is then internally load balanced to a healthy backend instance belonging to the backend service that the forwarding rule points to.
The VPN tunnel and the internal load balancer must be in the same region.
The above deployment includes instances in Subnets 2 and 3 that serve content for a shopping cart application. These instances are part of a GCP network.
In the illustration, the client instance with the IP address 192.168.1.1 resides in your on-premises network and connects to the Internal Load Balancing IP address (10.240.0.200) in GCP using VPN. Traffic from the client is then load balanced to a backend instance (10.240.0.2) in Subnet 2.
Multiple VPN tunnels or Interconnects
When you configure multiple VPN tunnels between your non-GCP and GCP networks, your non-GCP clients access Internal Load Balancing over these tunnels as shown below.
In the illustration, the client instance with the IP address 192.168.1.1 resides in your on-premises network and connects to the Internal Load Balancing IP address (10.240.0.200) in GCP over VPN. Traffic can be ECMP hashed and load balanced across multiple tunnels.
As long as the number of tunnels does not change, the load balancer sends all traffic for a given session to the same backend. To send all traffic from a given client to the same backend, use session affinity.
If the number of tunnels changes, the on-premises network may choose a different tunnel for a session. The load balancer will attempt to map to the same backend, but a small percentage of connections may get reset.
Deploying Internal Load Balancing
With Internal Load Balancing, you configure a private RFC 1918 address as your load balancing IP address and configure backend instance groups to handle requests coming to this load balancing IP address from client instances.
The backend instance groups can be zonal or regional, which enables you to configure instance groups in line with your availability requirements.
Client instances that originate traffic to the Internal Load Balancing IP must belong to the same VPC network and region as the load balancing IP address and the backends, but they can be in different subnets.
An Internal Load Balancing deployment example is shown below:
In the above example, you have a VPC network called my-internal-app comprised of a single subnet A (10.10.0.0/16) in the us-central region. You have two backend instance groups to provide availability across two zones. The load balancing IP address (that is, the forwarding rule IP address) 10.10.10.1 is selected from the same VPC network. An instance, 10.10.100.2, in my-internal-app sends a request to the load balancing IP address 10.10.10.1. This request gets load balanced to an instance in one of the two instance groups, IG1 or IG2.
Configuration details for the above deployments are described below.
Load balancing IP address
With Internal Load Balancing, the load balancing IP address is a private RFC 1918 address.
You can assign the IP address of an internal load balancer (that is, the forwarding rule IP) in one of the following ways:
You select your Internal Load Balancing IP
You can specify an unallocated IP address from the region the forwarding rule is associated with as the load balancing IP address. This IP address can be from any subnet in that region that is part of the overall VPC network. If you are configuring Internal Load Balancing in a legacy network, then you can use any unused IP address in the network.
You need to manually determine which IP addresses are already in use by listing all existing instance IP addresses and other forwarding rule IP addresses for the VPC network/subnet.
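The manual check described above can be sketched as follows. This is a minimal illustration and not a GCP API: the function name `is_available` is hypothetical, the subnet and addresses are taken from the deployment example in this guide, and in practice the in-use list would come from listing your instances and forwarding rules. Excluding only the network and broadcast addresses is a simplifying assumption; a real subnet may reserve additional addresses.

```python
import ipaddress

def is_available(candidate, subnet_cidr, in_use):
    """Return True if candidate is inside the subnet and not already taken."""
    ip = ipaddress.ip_address(candidate)
    subnet = ipaddress.ip_network(subnet_cidr)
    taken = {ipaddress.ip_address(addr) for addr in in_use}

    if ip not in subnet:
        return False
    # Simplifying assumption: only the network and broadcast addresses
    # are treated as reserved.
    if ip in (subnet.network_address, subnet.broadcast_address):
        return False
    return ip not in taken

# Addresses from the deployment example: one client instance is in use.
in_use = ["10.10.100.2"]
print(is_available("10.10.10.1", "10.10.0.0/16", in_use))
```

The same check applies whether the candidate will be used as an ephemeral address or reserved as a static internal address.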
You can select an Internal Load Balancing IP by specifying an ephemeral internal IP or you can reserve a static internal IP address that remains reserved to the project until you remove it.
Once you select and specify the internal IP address for your forwarding rule, it remains allocated as long as the forwarding rule exists. When the forwarding rule is deleted, an ephemeral IP address returns to the available pool of IP addresses for the VPC network and may get allocated to an instance or another forwarding rule. A static internal IP address returns to the project, where it is available for you to assign to another resource.
Load balancing auto-allocates your frontend IP address
You can have the IP address be allocated automatically by creating a forwarding rule without specifying an IP address. In this case, GCP will assign an unallocated internal IP address to the forwarding rule from the VPC network/subnet that it is associated with. The IP address will remain allocated only as long as the forwarding rule exists.
Service discovery (internal DNS names for Internal load balancers)
When you create the forwarding rule for an Internal load balancer, you can assign the forwarding rule a "service label." GCP uses this service label to construct a fully qualified domain name (FQDN) for the Internal load balancing service, and GCP automatically installs this name in the GCP internal DNS system. Instances can use this FQDN instead of the load balancer's IP address to send traffic to the backend instances.
The FQDN is of the following format:
- [SERVICE_LABEL] is the service label in the following format:
  - Can be composed of the following valid characters: a-z (lowercase letters), 0-9 (numeric characters), and - (dash)
  - MUST start with a lowercase letter
  - MUST end with a lowercase letter or a number
  - Can be up to 63 characters
- [PROJECT_ID] is your project ID. If you have a project ID in the form of example.com:project-id, [PROJECT_ID] becomes project-id.example.com. Otherwise, it is your normal project ID.
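The naming rules above can be checked before you create a forwarding rule. The following is a minimal sketch rather than GCP's own validation: the function names are hypothetical, and the character set is assumed to be lowercase letters, digits, and dashes.

```python
import re

# Assumed rules: starts with a lowercase letter, ends with a lowercase
# letter or digit, uses only a-z, 0-9, and "-", and is at most 63 characters.
_LABEL_RE = re.compile(r"[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?")

def is_valid_service_label(label: str) -> bool:
    """Return True if label satisfies the assumed service label rules."""
    return _LABEL_RE.fullmatch(label) is not None

def dns_project_id(project_id: str) -> str:
    """Rewrite 'example.com:project-id' as 'project-id.example.com'.

    Project IDs without a domain prefix are returned unchanged.
    """
    if ":" in project_id:
        domain, _, name = project_id.partition(":")
        return f"{name}.{domain}"
    return project_id

print(is_valid_service_label("my-service"))
print(dns_project_id("example.com:project-id"))
```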
Specifying a service label when creating a forwarding rule
To specify a service label, use the --service-label flag when you create the forwarding rule:
gcloud beta compute forwarding-rules create [FORWARDING_RULE] \
    --load-balancing-scheme internal \
    --service-label [SERVICE_LABEL] \
    [...other options...]
Finding an existing service label and FQDN
To look up the FQDN of a forwarding rule, use the describe command:
gcloud beta compute forwarding-rules describe [FORWARDING_RULE]
Look for the serviceLabel and serviceName fields to find the service label and FQDN, respectively.
Internal Load Balancing selection algorithm
The backend instance for a client is selected using a hashing algorithm that takes instance health into consideration.
By default, the load balancer directs connections from a client to a backend instance using a 5-tuple hash, which uses the following five parameters for hashing:
- Client source IP address
- Client port
- Destination IP address (the load balancing IP address)
- Destination port
- Protocol (either TCP or UDP).
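The selection described above can be sketched as follows. This is only an illustration: this guide does not specify the actual hash function, so SHA-256 with a modulo is used here as a stand-in, and the function and backend names are hypothetical.

```python
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends, healthy):
    """Pick a backend for a connection from a hash of its 5-tuple.

    Only backends that are currently healthy are candidates. hashlib is
    used instead of the built-in hash() so the result is stable across runs.
    """
    candidates = [b for b in backends if b in healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")

    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]

backends = ["ig1-vm1", "ig1-vm2", "ig2-vm1", "ig2-vm2"]
healthy = {"ig1-vm1", "ig2-vm1", "ig2-vm2"}
print(pick_backend("10.10.100.2", 40000, "10.10.10.1", 80, "TCP", backends, healthy))
```

Because the key includes the client port, two connections from the same client can land on different backends. A plain modulo also remaps many connections when the candidate list changes, which is one reason this should be read as a conceptual model only.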
If you wish the load balancer to direct all traffic from a client to a specific backend instance, then use one of the following options for session affinity:
- Hash based on 3-tuple (Client IP, Dest IP, Protocol)
- Hash based on 2-tuple (Client IP, Dest IP)
The Session affinity section provides more details on these options.
Session affinity
As described in the previous section on the selection algorithm, by default connections from a client are load balanced across all backend instances using a 5-tuple hash. The 5-tuple hash uses the client source IP, client port, destination IP (the load balancing IP address), destination port, and the protocol (either TCP or UDP).
However, in many cases, such as when web applications store state locally on the instance, you want all traffic from a client instance to be load balanced to the same backend instance. Without this capability, the traffic might fail or be serviced sub-optimally.
With Internal Load Balancing, you can enable all traffic from a client to stick to a specific backend instance by enabling the affinity feature.
You can enable the following types of affinity:
- Hash based on 3-tuple (client_ip_proto) (Client IP, Dest IP, Protocol)
  - Use this affinity if you want all traffic from a client to be directed to the same backend instance based on a hash of the above three parameters.
- Hash based on 2-tuple (client_ip) (Client IP, Dest IP)
  - Use this affinity if you want all traffic from a client, irrespective of the protocol, to be directed to the same backend instance based on a hash of the above two parameters.
If you enable 3-tuple or 2-tuple affinity, traffic from a given client is load balanced to the same backend, but overall the traffic may not be as evenly distributed as with the default 5-tuple hash. In general, a given connection stays on the same backend instance as long as that instance is healthy.
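The difference between the affinity types comes down to which fields feed the hash. The sketch below illustrates that idea under stated assumptions: the mode name "NONE" for the default 5-tuple hash, the function names, and the SHA-256 stand-in hash are all hypothetical and not taken from GCP.

```python
import hashlib

# Fields that feed the hash under each mode. "NONE" is a hypothetical
# label for the default 5-tuple behavior.
AFFINITY_FIELDS = {
    "NONE": ("src_ip", "src_port", "dst_ip", "dst_port", "protocol"),
    "client_ip_proto": ("src_ip", "dst_ip", "protocol"),
    "client_ip": ("src_ip", "dst_ip"),
}

def affinity_key(conn: dict, mode: str) -> tuple:
    """Return the tuple of connection fields hashed for the given mode."""
    return tuple(conn[field] for field in AFFINITY_FIELDS[mode])

def pick_backend_with_affinity(conn: dict, mode: str, backends: list) -> str:
    """Map a connection to a backend using only the fields for this mode."""
    key = "|".join(str(v) for v in affinity_key(conn, mode)).encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

backends = ["vm1", "vm2", "vm3", "vm4"]
first = {"src_ip": "10.10.100.2", "src_port": 40000,
         "dst_ip": "10.10.10.1", "dst_port": 80, "protocol": "TCP"}
second = dict(first, src_port=40001, dst_port=443)  # new connection, same client

# With client_ip_proto, both connections hash the same three fields.
print(pick_backend_with_affinity(first, "client_ip_proto", backends)
      == pick_backend_with_affinity(second, "client_ip_proto", backends))
```

Fewer fields in the key means fewer distinct hash inputs per client, which is why affinity can make the distribution less even than the 5-tuple default.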
Health checks determine which instances can receive new connections. You can configure a TCP, SSL, HTTP, or HTTPS health check for determining the health of your instances.
- If the service running on your backend instances is based on HTTP, use an HTTP health check.
- If the service running on your backend instances is based on HTTPS, use an HTTPS health check.
- If the service running on your backend instances uses SSL, use an SSL health check.
- Unless you have an explicit reason to use a different kind of health check, use a TCP health check.
For Internal Load Balancing, the health check probes to your load balanced instances come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16. Your firewall rules must allow these connections.
The section Configure a firewall rule to allow Internal Load Balancing covers this step.
See the Health Checks page for more details about health checks.
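When debugging firewall rules, it can help to confirm whether a connection seen on a backend came from a health check prober. The sketch below assumes the probe source ranges are 130.211.0.0/22 and 35.191.0.0/16, the ranges GCP publishes for health check probes. The constant and function names are hypothetical.

```python
import ipaddress

# Assumed health check probe source ranges (see the note above).
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]

def is_health_check_source(source_ip: str) -> bool:
    """Return True if source_ip falls inside an assumed probe range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in network for network in HEALTH_CHECK_RANGES)

print(is_health_check_source("35.191.200.1"))
print(is_health_check_source("10.10.100.2"))
```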
Internal Load Balancing is a fully managed Google Cloud Platform service, so you do not need any configuration to ensure high availability of the load balancer itself. It is a fully distributed service which will scale as needed to handle client traffic. Limits are described in the Limits section.
You can configure multiple instance groups to deliver high availability for your service. Instance groups are zonal in scope, so you can configure instance groups in multiple zones to guard against instance failures in a single zone.
When you configure more than one instance group, the instances in all these instance groups are treated as one pool of instances, and Internal Load Balancing distributes your user requests across the healthy instances in this pool using the hash algorithm described in the Internal Load Balancing selection algorithm section.
In effect, you can think of your deployment as logically comprised of one large pool that spans one or more zones in the region where you have deployed Internal Load Balancing.
In the above diagram, assuming all instances are healthy, the client instances are load balanced to a pool composed of instances 1, 2, 3, and 4.
Limits
A maximum of 50 internal load balancer forwarding rules is allowed per network.
A maximum of 250 backends is allowed per internal load balancer forwarding rule.
Normal load balancing pricing applies.
Q: What protocols are supported for Internal Load Balancing?
- Internal Load Balancing is supported for TCP and UDP traffic.
Q: Can I use target pools with Internal Load Balancing?