Internal Application Load Balancer overview

This document introduces the concepts that you need to understand to configure internal Application Load Balancers.

A Google Cloud internal Application Load Balancer is a proxy-based layer 7 load balancer that enables you to run and scale your services behind a single internal IP address. The internal Application Load Balancer distributes HTTP and HTTPS traffic to backends hosted on a variety of Google Cloud platforms such as Compute Engine, Google Kubernetes Engine (GKE), and Cloud Run. For details, see Use cases.

Modes of operation

You can configure an internal Application Load Balancer in the following modes:

Cross-region internal Application Load Balancer. This is a multi-region load balancer that is implemented as a managed service based on the open-source Envoy proxy. The cross-region mode enables you to load balance traffic to backend services that are globally distributed, including traffic management that ensures traffic is directed to the closest backend. This load balancer also enables high availability. Placing backends in multiple regions helps avoid failures in a single region. If one region's backends are down, traffic can fail over to another region.

Regional internal Application Load Balancer. This is a regional load balancer that is implemented as a managed service based on the open-source Envoy proxy. Regional mode ensures that all clients and backends are from a specified region, which helps when you need regional compliance. This load balancer is enabled with rich traffic control capabilities based on HTTP(S) parameters. After the load balancer is configured, it automatically allocates Envoy proxies to meet your traffic needs.

The following table describes the important differences between regional and cross-region modes:

Feature	Cross-region internal Application Load Balancer	Regional internal Application Load Balancer
Virtual IP address (VIP) of the load balancer.	Allocated from a subnet in a specific Google Cloud region. VIP addresses from multiple regions can share the same global backend service. You can configure DNS-based global load balancing by using DNS routing policies to route client requests to the closest VIP address.	Allocated from a subnet in a specific Google Cloud region.
Client access	Always globally accessible. Clients from any Google Cloud region in a VPC can send traffic to the load balancer.	Not globally accessible by default. You can optionally enable global access.
Load balanced backends	Global backends. Load balancer can send traffic to backends in any region.	Regional backends. Load balancer can only send traffic to backends that are in the same region as the proxy of the load balancer.
High availability and failover	Automatic failover to healthy backends in the same or different regions.	Automatic failover to healthy backends in the same region.

Identify the mode

Cloud console

In the Google Cloud console, go to the Load balancing page.

On the Load Balancers tab, you can see the load balancer type, protocol, and region. If the region is blank, then the load balancer is in the cross-region mode. The following table summarizes how to identify the mode of the load balancer.

Load balancer mode	Load balancer type	Access type	Region
Regional internal Application Load Balancer	Application	Internal	Specifies a region
Cross-region internal Application Load Balancer	Application	Internal	(blank)

gcloud

To determine the mode of a load balancer, run the following command:

gcloud compute forwarding-rules describe FORWARDING_RULE_NAME

In the command output, check the load balancing scheme, region, and network tier. The following table summarizes how to identify the mode of the load balancer.

Load balancer mode	Load balancing scheme	Forwarding rule
Cross-region internal Application Load Balancer	INTERNAL_MANAGED	Global
Regional internal Application Load Balancer	INTERNAL_MANAGED	Regional
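
The table above can be expressed as a small check. The following is a minimal Python sketch, assuming that you have already loaded the forwarding rule description as a dictionary (for example, from JSON output of the command above) and that a regional forwarding rule carries a `region` field while a global one does not. The function name `identify_mode` and the sample values are hypothetical, and the sketch does not inspect the target proxy, so it assumes that the rule belongs to an Application Load Balancer.

```python
from typing import Optional


def identify_mode(forwarding_rule: dict) -> Optional[str]:
    """Classify an internal Application Load Balancer from its forwarding rule.

    Returns None when the load balancing scheme is not INTERNAL_MANAGED.
    """
    if forwarding_rule.get("loadBalancingScheme") != "INTERNAL_MANAGED":
        return None
    # Assumption: only regional forwarding rules have a non-empty "region" field.
    if forwarding_rule.get("region"):
        return "Regional internal Application Load Balancer"
    return "Cross-region internal Application Load Balancer"


# Hypothetical sample inputs.
regional_rule = {"loadBalancingScheme": "INTERNAL_MANAGED", "region": "us-west1"}
global_rule = {"loadBalancingScheme": "INTERNAL_MANAGED"}
print(identify_mode(regional_rule))
print(identify_mode(global_rule))
```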

Architecture and resources

The following diagram shows the Google Cloud resources required for internal Application Load Balancers in each mode:

Cross-region

This diagram shows the components of a cross-region internal Application Load Balancer deployment in Premium Tier within the same VPC network. Each global forwarding rule uses a regional IP address that the clients use to connect.

Regional

This diagram shows the components of a regional internal Application Load Balancer deployment in Premium Tier.

Each internal Application Load Balancer uses the following Google Cloud configuration resources.

Proxy-only subnet

In the previous diagram, the proxy-only subnet provides a set of IP addresses that Google uses to run Envoy proxies on your behalf. You must create a proxy-only subnet in each region of a VPC network where you use internal Application Load Balancers.

The following table describes the differences between proxy-only subnets in the regional and cross-region modes:

Load balancer mode	Value of the proxy-only subnet `--purpose` flag
Cross-region internal Application Load Balancer	`GLOBAL_MANAGED_PROXY`. Regional and cross-region load balancers cannot share the same subnets. The cross-region Envoy-based load balancer must have a proxy-only subnet in each region where the load balancer is configured. Cross-region load balancer proxies in the same region and network share the same proxy-only subnet.
Regional internal Application Load Balancer	`REGIONAL_MANAGED_PROXY`. Regional and cross-region load balancers cannot share the same subnets. All the regional Envoy-based load balancers in a region and VPC network share the same proxy-only subnet.

Further:

Proxy-only subnets are only used for Envoy proxies, not your backends.
Backend VMs or endpoints of all internal Application Load Balancers in a region and VPC network receive connections from the proxy-only subnet.
The virtual IP address of an internal Application Load Balancer is not located in the proxy-only subnet. The load balancer's IP address is defined by its internal managed forwarding rule, which is described below.
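
The last point can be checked before you reserve an address. The following is a minimal Python sketch that uses the standard `ipaddress` module; the function name and the example ranges are hypothetical and are not taken from any real deployment.

```python
import ipaddress


def vip_outside_proxy_only_subnet(vip: str, proxy_only_cidr: str) -> bool:
    """Return True if the load balancer's VIP is not inside the proxy-only subnet."""
    return ipaddress.ip_address(vip) not in ipaddress.ip_network(proxy_only_cidr)


# Hypothetical addresses: a proxy-only subnet and two candidate VIPs.
print(vip_outside_proxy_only_subnet("10.1.2.99", "10.129.0.0/23"))   # True
print(vip_outside_proxy_only_subnet("10.129.0.10", "10.129.0.0/23"))  # False
```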

Forwarding rule and IP address

Forwarding rules route traffic by IP address, port, and protocol to a load balancing configuration that consists of a target proxy and a backend service.

Each forwarding rule references a single regional IP address that you can use in DNS records for your application. You can either reserve a static IP address that you can use or let Cloud Load Balancing assign one for you. We recommend that you reserve a static IP address; otherwise, you must update your DNS record with the newly assigned ephemeral IP address whenever you delete a forwarding rule and create a new one.

Clients use the IP address and port to connect to the load balancer's Envoy proxies—the forwarding rule's IP address is the IP address of the load balancer (sometimes called a virtual IP address or VIP). Clients connecting to a load balancer must use HTTP version 1.1 or later. For the complete list of supported protocols, see Load balancer features.

The internal IP address associated with the forwarding rule can come from a subnet in the same network and region as your backends.

Each forwarding rule for an Application Load Balancer can reference a single port from 1-65535. To support multiple ports, you must configure multiple forwarding rules. You can configure multiple forwarding rules to use the same internal IP address (VIP) and to reference the same target HTTP(S) proxy as long as the overall combination of IP address, port, and protocol is unique for each forwarding rule. This way, you can use a single load balancer with a shared URL map as a proxy for multiple applications.
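
The uniqueness requirement can be sketched as a simple pre-flight check. The following Python example is only an illustration: the `ForwardingRule` tuple, the `find_conflicts` function, and all rule names and addresses are hypothetical, and the check models only the constraints stated above (one port per rule in the range 1-65535, and a unique combination of IP address, port, and protocol).

```python
from collections import Counter
from typing import List, NamedTuple


class ForwardingRule(NamedTuple):
    name: str
    ip_address: str
    port: int
    protocol: str


def find_conflicts(rules: List[ForwardingRule]) -> List[str]:
    """Return the names of rules whose (IP, port, protocol) combination is reused."""
    for rule in rules:
        if not 1 <= rule.port <= 65535:
            raise ValueError(f"{rule.name}: port {rule.port} is outside 1-65535")
    counts = Counter((r.ip_address, r.port, r.protocol) for r in rules)
    return [r.name for r in rules if counts[(r.ip_address, r.port, r.protocol)] > 1]


# Two rules share one VIP on different ports; a third rule repeats port 443.
rules = [
    ForwardingRule("fr-http", "10.1.2.99", 80, "TCP"),
    ForwardingRule("fr-https", "10.1.2.99", 443, "TCP"),
    ForwardingRule("fr-https-dup", "10.1.2.99", 443, "TCP"),
]
print(find_conflicts(rules))  # ['fr-https', 'fr-https-dup']
```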

The following table shows the differences between forwarding rules in the regional and cross-region modes:

Load balancer mode	Forwarding rule, IP address, and proxy-only subnet `--purpose`	Routing from the client to the load balancer's frontend
Cross-region internal Application Load Balancer	Global forwarding rule; regional IP addresses; load balancing scheme: `INTERNAL_MANAGED`; proxy-only subnet `--purpose`: `GLOBAL_MANAGED_PROXY`; IP address `--purpose`: `SHARED_LOADBALANCER_VIP`	Global access is enabled by default to allow clients from any region in a VPC to access your load balancer. Backends can be in multiple regions.
Regional internal Application Load Balancer	Regional forwarding rule; regional IP address; load balancing scheme: `INTERNAL_MANAGED`; proxy-only subnet `--purpose`: `REGIONAL_MANAGED_PROXY`; IP address `--purpose`: `SHARED_LOADBALANCER_VIP`	You can enable global access to allow clients from any region in a VPC to access your load balancer. Backends must also be in the same region as the load balancer.

Target proxy

A target HTTP(S) proxy terminates HTTP(S) connections from clients. The HTTP(S) proxy consults the URL map to determine how to route traffic to backends. A target HTTPS proxy uses an SSL certificate to authenticate itself to clients.

The load balancer preserves the Host header of the original client request. The load balancer also appends two IP addresses to the X-Forwarded-For header:

The IP address of the client that connects to the load balancer
The IP address of the load balancer's forwarding rule

If there is no X-Forwarded-For header on the incoming request, these two IP addresses are the entire header value. If the request already has an X-Forwarded-For header, its existing content, such as the IP addresses recorded by proxies on the way to the load balancer, is preserved before the two IP addresses. The load balancer does not verify any IP addresses that precede the last two IP addresses in this header.

If you are running a proxy as the backend server, this proxy typically appends more information to the X-Forwarded-For header, and your software might need to take that into account. The proxied requests from the load balancer come from an IP address in the proxy-only subnet, and your proxy on the backend instance might record this address as well as the backend instance's own IP address.
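
Backend software that reads this header can rely only on the last two entries. The following is a minimal Python sketch of that parsing, assuming the backend receives the header exactly as the load balancer sent it (that is, no proxy on the backend has appended further entries). The type and function names and the sample addresses are hypothetical.

```python
from typing import List, NamedTuple


class ForwardedFor(NamedTuple):
    unverified: List[str]   # entries that preceded the load balancer; not verified
    client_ip: str          # client that connected to the load balancer
    load_balancer_ip: str   # IP address of the load balancer's forwarding rule


def parse_x_forwarded_for(header_value: str) -> ForwardedFor:
    """Split an X-Forwarded-For value into untrusted entries and the last two IPs."""
    entries = [e.strip() for e in header_value.split(",") if e.strip()]
    if len(entries) < 2:
        raise ValueError("expected at least the two addresses appended by the load balancer")
    return ForwardedFor(entries[:-2], entries[-2], entries[-1])


parsed = parse_x_forwarded_for("203.0.113.7, 10.1.2.50, 10.1.2.99")
print(parsed.client_ip)         # 10.1.2.50
print(parsed.load_balancer_ip)  # 10.1.2.99
print(parsed.unverified)        # ['203.0.113.7']
```

If a proxy on the backend appends its own entries, the two addresses added by the load balancer are no longer the last two, and the indexes in this sketch would need to be adjusted.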

Depending on the type of traffic your application needs to handle, you can configure a load balancer with either a target HTTP proxy or a target HTTPS proxy.

The following table shows the target proxy APIs required by internal Application Load Balancers in each mode:

Target proxy	Cross-region internal Application Load Balancer	Regional internal Application Load Balancer
HTTP	Global `targetHttpProxies`	`regionTargetHttpProxies`
HTTPS	Global `targetHttpsProxies`	`regionTargetHttpsProxies`

SSL certificates

Internal Application Load Balancers using target HTTPS proxies require private keys and SSL certificates as part of the load balancer configuration.

The following table specifies the type of SSL certificates required by internal Application Load Balancers in each mode:

Load balancer mode	SSL certificate type
Regional internal Application Load Balancer	Compute Engine regional SSL certificates. Certificate Manager regional self-managed certificates and Google-managed certificates. The following types of Google-managed certificates are supported with Certificate Manager: (1) Regional Google-managed certificates with per-project DNS authorization (for more information, see Deploy a regional Google-managed certificate); (2) Regional Google-managed certificates with private Certificate Authority Service (for more information, see Deploy a regional Google-managed certificate with CA Service). Google-managed certificates with load balancer authorization are not supported.
Cross-region internal Application Load Balancer	Certificate Manager self-managed certificates and Google-managed certificates. The following types of Google-managed certificates are supported with Certificate Manager: (1) DNS Authorization with Public DNS (for more information, see Create a Google-managed certificate with DNS authorization); (2) Private Certificate Authority Service (for more information, see Create a Google-managed certificate issued by your CA Service instance). Google-managed certificates with load balancer authorization are not supported. Compute Engine SSL certificates are not supported.

URL maps

The target HTTP(S) proxy uses URL maps to make a routing determination based on HTTP attributes (such as the request path, cookies, or headers). Based on the routing decision, the proxy forwards client requests to specific backend services. The URL map can specify additional actions to take such as rewriting headers, sending redirects to clients, and configuring timeout policies (among others).

The following table specifies the type of URL map required by internal Application Load Balancers in each mode:

Load balancer mode	URL map type
Cross-region internal Application Load Balancer	Global URL maps
Regional internal Application Load Balancer	Region URL maps

Backend service

A backend service distributes requests to healthy backends: instance groups containing Compute Engine VMs, Cloud Run, or NEGs containing GKE containers.

Backend services support the HTTP, HTTPS, or HTTP/2 protocols. HTTP/2 is only supported over TLS. Clients and backends do not need to use the same request protocol. For example, clients can send requests to the load balancer by using HTTP/2, and the load balancer can forward these requests to backends by using HTTP/1.1.

The following table specifies the backend service type required by internal Application Load Balancers in each mode:

Load balancer mode	Backend service type
Cross-region internal Application Load Balancer	Global backendServices
Regional internal Application Load Balancer	Regional backendServices

The following table specifies the backend features supported by internal Application Load Balancers in each mode:

Load balancer mode	Supported backends on a backend service
	Instance groups	Zonal NEGs	Internet NEGs	Serverless NEGs	Hybrid NEGs	Private Service Connect NEGs
Cross-region internal Application Load Balancer		²		Cloud Run
Regional internal Application Load Balancer		¹		Cloud Run

¹ Use GCE_VM_IP_PORT type endpoints with GKE: Use standalone zonal NEGs or use Ingress

² Use GCE_VM_IP_PORT type endpoints with GKE: Use standalone zonal NEGs

For more information, see Backend services overview.

Backends and VPC networks

All backends must be located in the same VPC network. Placing backends in different VPC networks, even those connected using VPC Network Peering, is not supported. For details about how client systems in peered VPC networks can access load balancers, see Internal Application Load Balancers and connected networks.

Backend subsetting

Backend subsetting is an optional feature supported by regional internal Application Load Balancers that improves performance and scalability by assigning a subset of backends to each of the proxy instances.

By default, backend subsetting is disabled. For information about enabling this feature, see Backend subsetting for internal Application Load Balancer.

Health checks

Each backend service specifies a health check that periodically monitors the backends' readiness to receive a connection from the load balancer. This reduces the risk that requests might be sent to backends that can't service the request. Health checks do not check if the application itself is working.

For the health check probes to succeed, you must create an ingress allow firewall rule that allows health check probes to reach your backend instances. Typically, health check probes originate from Google's centralized health checking mechanism. However, for hybrid NEGs, health checks originate from the proxy-only subnet instead. For details, see distributed Envoy health checks in the Hybrid NEGs overview.

Health check protocol

Although it is not required and not always possible, it is a best practice to use a health check whose protocol matches the protocol of the backend service. For example, an HTTP/2 health check most accurately tests HTTP/2 connectivity to backends. In contrast, internal Application Load Balancers that use hybrid NEG backends do not support gRPC health checks. For the list of supported health check protocols, see Load balancing features.

The following table specifies the scope of health checks supported by internal Application Load Balancers in each mode:

Load balancer mode	Health check type
Cross-region internal Application Load Balancer	Global
Regional internal Application Load Balancer	Regional

Firewall rules

An internal Application Load Balancer requires the following firewall rules:

An ingress allow rule that permits traffic from Google's central health check ranges.
- 35.191.0.0/16
- 130.211.0.0/22
An ingress allow rule that permits traffic from the proxy-only subnet.

There are certain exceptions to the firewall rule requirements for these ranges:

Allowlisting Google's health check probe ranges isn't required for hybrid NEGs. However, if you're using a combination of hybrid and zonal NEGs in a single backend service, you need to allowlist the Google health check probe ranges for the zonal NEGs.
For regional internet NEGs, health checks are optional. Traffic from load balancers using regional internet NEGs originates from the proxy-only subnet and is then NAT-translated (by using Cloud NAT) to either manual or auto-allocated NAT IP addresses. This traffic includes both health check probes and user requests from the load balancer to the backends. For details, see Regional NEGs: Use Cloud NAT to egress.
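
The rules and the hybrid NEG exception above can be summarized as a short helper. The following is a minimal Python sketch, not a firewall API call: the function name, its parameter names, and the example proxy-only range are hypothetical. It returns the source ranges that an ingress allow rule would need to cover.

```python
import ipaddress
from typing import List

# Google's central health check ranges, as listed above.
HEALTH_CHECK_RANGES = ["35.191.0.0/16", "130.211.0.0/22"]


def required_ingress_sources(proxy_only_cidr: str, only_hybrid_negs: bool = False) -> List[str]:
    """List the source ranges that ingress allow rules must permit.

    only_hybrid_negs: True when every backend is a hybrid NEG, in which case
    the central health check ranges do not need to be allowlisted.
    """
    # Normalize and validate the proxy-only range.
    proxy_range = str(ipaddress.ip_network(proxy_only_cidr))
    sources = [] if only_hybrid_negs else list(HEALTH_CHECK_RANGES)
    sources.append(proxy_range)
    return sources


print(required_ingress_sources("10.129.0.0/23"))
print(required_ingress_sources("10.129.0.0/23", only_hybrid_negs=True))
```

A backend service that mixes hybrid and zonal NEGs should use the default (`only_hybrid_negs=False`), because the zonal NEGs still need the health check ranges.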

Client access

Clients can be in the same network or in a VPC network connected by using VPC Network Peering.

For cross-region internal Application Load Balancers, global access is enabled by default. Clients from any region in a VPC can access your load balancer.

For regional internal Application Load Balancers, clients must be in the same region as the load balancer by default. You can enable global access to allow clients from any region in a VPC to access your load balancer.

The following table summarizes client access for regional internal Application Load Balancers:

Global access disabled	Global access enabled
Clients must be in the same region as the load balancer. They also must be in the same VPC network as the load balancer or in a VPC network that is connected to the load balancer's VPC network by using VPC Network Peering.	Clients can be in any region. They still must be in the same VPC network as the load balancer or in a VPC network that's connected to the load balancer's VPC network by using VPC Network Peering.
On-premises clients can access the load balancer through Cloud VPN tunnels or VLAN attachments. These tunnels or attachments must be in the same region as the load balancer.	On-premises clients can access the load balancer through Cloud VPN tunnels or VLAN attachments. These tunnels or attachments can be in any region.
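
The region rules in this section reduce to a small decision function. The following is a minimal Python sketch under the assumption that the client is already in the load balancer's VPC network or in a connected network; the function name, the mode strings, and the region names are hypothetical. For on-premises clients, `client_region` stands for the region of the Cloud VPN tunnel or VLAN attachment.

```python
def client_can_reach(mode: str, client_region: str, lb_region: str,
                     global_access: bool = False) -> bool:
    """Return True if the client's region permits access to the load balancer."""
    if mode == "cross-region":
        # Global access is always enabled for cross-region load balancers.
        return True
    if mode == "regional":
        return global_access or client_region == lb_region
    raise ValueError(f"unknown mode: {mode!r}")


print(client_can_reach("regional", "europe-west1", "us-west1"))                      # False
print(client_can_reach("regional", "europe-west1", "us-west1", global_access=True))  # True
print(client_can_reach("cross-region", "europe-west1", "us-west1"))                  # True
```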

Shared VPC architectures

Internal Application Load Balancers support networks that use Shared VPC. Shared VPC lets organizations connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network. If you're not already familiar with Shared VPC, read the Shared VPC overview documentation.

There are many ways to configure an internal Application Load Balancer within a Shared VPC network. Regardless of the type of deployment, all the components of the load balancer must be in the same organization.

Subnets and IP address	Frontend components	Backend components
Create the required network and subnets (including the proxy-only subnet) in the Shared VPC host project. The load balancer's internal IP address can be defined in either the host project or a service project, but it must use a subnet in the desired Shared VPC network in the host project. The address itself comes from the primary IP range of the referenced subnet.	The regional internal IP address, the forwarding rule, the target HTTP(S) proxy, and the associated URL map must be defined in the same project. This project can be the host project or a service project.	You can do one of the following: (1) Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in the same service project as the frontend components. (2) Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in as many service projects as required. A single URL map can reference backend services across different projects. This type of deployment is known as cross-project service referencing. In both cases, each backend service must be defined in the same project as the backends it references. Health checks associated with backend services must be defined in the same project as the backend service as well.

While you can create all the load balancing components and backends in the Shared VPC host project, this type of deployment does not separate network administration and service development responsibilities.

All load balancer components and backends in a service project

The following architecture diagram shows a standard Shared VPC deployment where all load balancer components and backends are in a service project. This deployment type is supported by all Application Load Balancers.

The load balancer uses IP addresses and subnets from the host project. Clients can access an internal Application Load Balancer if they are in the same Shared VPC network and region as the load balancer. Clients can be located in the host project, in an attached service project, or in any connected network.

Internal Application Load Balancer on Shared VPC network

Serverless backends in a Shared VPC environment

For an internal Application Load Balancer that is using a serverless NEG backend, the backing Cloud Run service must be in the same service project as the backend service and the serverless NEG. The load balancer's frontend components (forwarding rule, target proxy, URL map) can be created in the host project, in the same service project as the backend components, or in any other service project in the same Shared VPC environment.

Cross-project service referencing

In this model, the load balancer's frontend and URL map are in a host or service project. The load balancer's backend services and backends can be distributed across projects in the Shared VPC environment. Cross-project backend services can be referenced in a single URL map. This is referred to as cross-project service referencing.

Cross-project service referencing allows organizations to configure one central load balancer and route traffic to hundreds of services distributed across multiple projects. You can centrally manage all traffic routing rules and policies in one URL map. You can also associate the load balancer with a single set of hostnames and SSL certificates. As a result, you can reduce the number of load balancers needed to deploy your application, and lower management overhead, operational costs, and quota requirements.

By having different projects for each of your functional teams, you can also achieve separation of roles within your organization. Service owners can focus on building services in service projects, while network teams can provision and maintain load balancers in another project, and both can be connected by using cross-project service referencing.

Service owners can maintain autonomy over the exposure of their services and control which users can access their services by using the load balancer. This is achieved by a special IAM role called the Compute Load Balancer Services User role (roles/compute.loadBalancerServiceUser).

To learn how to configure Shared VPC for an internal Application Load Balancer, with and without cross-project service referencing, see Set up an internal Application Load Balancer with Shared VPC.

Known limitations with cross-project service referencing

You can't reference a cross-project backend service if the backend service has regional internet NEG backends. All other backend types are supported.
Google Cloud does not differentiate between resources (for example, backend services) using the same name across multiple projects. Therefore, when you're using cross-project service referencing, we recommend that you use unique backend service names across projects within your organization.

Example 1: Load balancer frontend and backend in different service projects

Here is an example of a deployment where the load balancer's frontend and URL map are created in service project A and the URL map references a backend service in service project B.

In this case, Network Admins or Load Balancer Admins in service project A will require access to backend services in service project B. Service project B admins grant the compute.loadBalancerServiceUser IAM role to Load Balancer Admins in service project A who want to reference the backend service in service project B.

Load balancer frontend and URL map in service project — Load balancer frontend and backend in different service projects

Example 2: Load balancer frontend in the host project and backends in service projects

In this type of deployment, the load balancer's frontend and URL map are created in the host project and the backend services (and backends) are created in service projects.

In this case, Network Admins or Load Balancer Admins in the host project will require access to backend services in the service project. Service project admins grant the compute.loadBalancerServiceUser IAM role to Load Balancer Admins in the host project who want to reference the backend service in the service project.

Load balancer frontend and URL map in host project

Timeouts and retries

Internal Application Load Balancers support the following types of timeouts:

Timeout type and description	Default values	Supports custom values
Backend service timeout. A request and response timeout. Represents the maximum amount of time that can elapse from when the load balancer sends the first byte of the HTTP request to your backend to when your backend returns the last byte of the HTTP response. If the entire HTTP response has not been returned to the load balancer within the request or response timeout, the remaining response data is dropped.	For serverless NEGs on a backend service: 60 minutes. For all other backend types on a backend service: 30 seconds.
Client HTTP keepalive timeout. The maximum amount of time that the TCP connection between a client and the load balancer's managed Envoy proxy can be idle. (The same TCP connection might be used for multiple HTTP requests.)	10 minutes (600 seconds)
Backend HTTP keepalive timeout. The maximum amount of time that the TCP connection between the load balancer's managed Envoy proxy and a backend can be idle. (The same TCP connection might be used for multiple HTTP requests.)	10 minutes (600 seconds)

Backend service timeout

The configurable backend service timeout represents the maximum amount of time that the load balancer waits for your backend to process an HTTP request and return the corresponding HTTP response. Except for serverless NEGs, the default value for the backend service timeout is 30 seconds.

For example, if you want to download a 500-MB file, and the value of the backend service timeout is 90 seconds, the load balancer expects the backend to deliver the entire 500-MB file within 90 seconds. It is possible to configure the backend service timeout to be insufficient for the backend to send its complete HTTP response. In this situation, if the load balancer has at least received HTTP response headers from the backend, the load balancer returns the complete response headers and as much of the response body as it could obtain within the backend service timeout.
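
To make the example concrete, the following Python sketch computes the average throughput that the backend must sustain for the whole response to fit within the timeout. It is only arithmetic on the figures above; the function name is hypothetical, and the calculation ignores request processing time before the first byte is sent.

```python
def min_throughput_mb_per_s(response_size_mb: float, timeout_sec: float) -> float:
    """Average throughput needed to deliver the whole response before the timeout."""
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be positive")
    return response_size_mb / timeout_sec


# 500-MB file with a 90-second backend service timeout.
print(round(min_throughput_mb_per_s(500, 90), 2))  # 5.56
# The same file with the 30-second default.
print(round(min_throughput_mb_per_s(500, 30), 2))  # 16.67
```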

Set the backend service timeout to the longest amount of time that you expect your backend to need in order to process an HTTP request and return its entire response. Increase the timeout if the software running on your backend needs more time than the current value allows.

The backend service timeout accepts values between 1 and 2,147,483,647 seconds; however, larger values are not practical configuration options. Google Cloud does not guarantee that an underlying TCP connection can remain open for the entirety of the value of the backend service timeout. Client systems must implement retry logic instead of relying on a TCP connection to be open for long periods of time.
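The client-side retry logic mentioned above might look like the following minimal sketch. It assumes that a dropped TCP connection surfaces as a Python `ConnectionError` and that the request is safe to repeat. The function name, the attempt count, and the backoff values are illustrative and are not prescribed by Google Cloud.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(send: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying with exponential backoff on connection errors.

    `send` is a hypothetical zero-argument callable that performs one request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            # Give up after the final attempt; otherwise wait and try again.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real delays.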

For WebSocket connections used with internal Application Load Balancers, active WebSocket connections don't follow the backend service timeout. Idle WebSocket connections are closed after the backend service timeout.

Google Cloud periodically restarts or changes the number of serving Envoy software tasks. The longer the backend service timeout value, the more likely it is that Envoy task restarts or replacements will terminate TCP connections.

To configure the backend service timeout, use one of the following methods:

Google Cloud console: Modify the Timeout field of the load balancer's backend service.
Google Cloud CLI: Use the gcloud compute backend-services update command to modify the --timeout parameter of the backend service resource.
API: Modify the timeoutSec parameter for the regionBackendServices resource.
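As an illustration of the CLI and API methods, the following sketch builds the command line and the request body without running anything. It assumes a regional backend service; the backend service name and region are placeholders, and the helper names are hypothetical.

```python
MAX_TIMEOUT_S = 2_147_483_647  # upper bound stated for the backend service timeout


def _check_timeout(timeout_s: int) -> None:
    if not 1 <= timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_S} seconds")


def build_timeout_update_command(backend_service: str,
                                 timeout_s: int,
                                 region: str) -> list:
    """Return the gcloud argument list that sets --timeout on a regional backend service."""
    _check_timeout(timeout_s)
    return [
        "gcloud", "compute", "backend-services", "update", backend_service,
        f"--timeout={timeout_s}",
        f"--region={region}",
    ]


def build_timeout_patch_body(timeout_s: int) -> dict:
    """Return a request body that changes only the timeoutSec field."""
    _check_timeout(timeout_s)
    return {"timeoutSec": timeout_s}


# Placeholder names; substitute your own backend service and region.
print(" ".join(build_timeout_update_command("my-backend-service", 90, "us-west1")))
```

The argument list can be passed to `subprocess.run` in an environment where the gcloud CLI is installed and authenticated.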

Client HTTP keepalive timeout

The client HTTP keepalive timeout represents the maximum amount of time that a TCP connection can be idle between the (downstream) client and an Envoy proxy. The client HTTP keepalive timeout value is fixed at 600 seconds.

An HTTP keepalive timeout is also called a TCP idle timeout.

The load balancer's client HTTP keepalive timeout should be greater than the HTTP keepalive (TCP idle) timeout used by downstream clients or proxies. If a downstream client has a greater HTTP keepalive (TCP idle) timeout than the load balancer's client HTTP keepalive timeout, it's possible for a race condition to occur. From the perspective of a downstream client, an established TCP connection is permitted to be idle for longer than permitted by the load balancer. This means that the downstream client can send packets after the load balancer considers the TCP connection to be closed. When that happens, the load balancer responds with a TCP reset (RST) packet.
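The race condition can be made concrete with a small model of what happens when a client tries to reuse an idle connection. This is a simplified sketch, not the load balancer's implementation; the function name and the outcome labels are hypothetical.

```python
LB_CLIENT_KEEPALIVE_S = 600  # fixed client HTTP keepalive timeout of the load balancer


def reuse_outcome(idle_s: float,
                  client_idle_timeout_s: float,
                  lb_idle_timeout_s: float = LB_CLIENT_KEEPALIVE_S) -> str:
    """Classify an attempt to reuse a connection that has been idle for idle_s seconds."""
    if idle_s >= client_idle_timeout_s:
        # The client already gave up on the connection and opens a new one.
        return "client_closed"
    if idle_s >= lb_idle_timeout_s:
        # The client still trusts a connection that the load balancer has closed.
        return "rst_risk"
    return "reused"


print(reuse_outcome(650, client_idle_timeout_s=900))  # rst_risk
print(reuse_outcome(650, client_idle_timeout_s=300))  # client_closed
```

Keeping the client's idle timeout below 600 seconds removes the `rst_risk` branch in this model.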

Backend HTTP keepalive timeout

Internal Application Load Balancers are proxies that use a first TCP connection between the (downstream) client and an Envoy proxy, and a second TCP connection between the Envoy proxy and your backends.

The load balancer's secondary TCP connections might not get closed after each request; they can stay open to handle multiple HTTP requests and responses. The backend HTTP keepalive timeout defines the TCP idle timeout between the load balancer and your backends. The backend HTTP keepalive timeout does not apply to WebSockets.

The backend keepalive timeout is fixed at 10 minutes (600 seconds) and cannot be changed. The load balancer's backend keepalive timeout should be less than the keepalive timeout used by software running on your backends. This avoids a race condition where the operating system of your backends might close TCP connections with a TCP reset (RST). Because the backend keepalive timeout for the load balancer is not configurable, you must configure your backend software so that its HTTP keepalive (TCP idle) timeout value is greater than 600 seconds.

The following table lists the changes necessary to modify keepalive timeout values for common web server software.

Web server software	Parameter	Default setting	Recommended setting
Apache	KeepAliveTimeout	`KeepAliveTimeout 5`	`KeepAliveTimeout 620`
nginx	keepalive_timeout	`keepalive_timeout 75s;`	`keepalive_timeout 620s;`
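A configuration check along the following lines can catch backend keepalive values that are too low. It is a minimal sketch that handles only the single-value, seconds-based forms shown in the table; other units or extra parameters that the web servers accept are rejected rather than interpreted. The function names are hypothetical.

```python
import re

LB_BACKEND_KEEPALIVE_S = 600  # fixed backend keepalive timeout of the load balancer

# Matches "KeepAliveTimeout 620" (Apache) and "keepalive_timeout 620s;" (nginx).
_DIRECTIVE = re.compile(r"^\s*(KeepAliveTimeout|keepalive_timeout)\s+(\d+)s?\s*;?\s*$")


def keepalive_seconds(directive: str) -> int:
    """Extract the timeout in seconds from one configuration line."""
    match = _DIRECTIVE.match(directive)
    if match is None:
        raise ValueError(f"unsupported directive: {directive!r}")
    return int(match.group(2))


def is_safe_backend_keepalive(directive: str) -> bool:
    """True if the backend keeps idle connections open longer than the load balancer."""
    return keepalive_seconds(directive) > LB_BACKEND_KEEPALIVE_S


print(is_safe_backend_keepalive("keepalive_timeout 75s;"))   # False
print(is_safe_backend_keepalive("keepalive_timeout 620s;"))  # True
```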

Retries

To configure retries, you can use a retry policy in the URL map. The default number of retries (numRetries) is 1. The default timeout for each try (perTryTimeout) is 30 seconds with a maximum configurable perTryTimeout of 24 hours.

Without a retry policy, unsuccessful requests that have no HTTP body (for example, GET requests) that result in HTTP 502, 503, or 504 responses are retried once.

HTTP POST requests are not retried.

Retried requests only generate one log entry for the final response.

For more information, see Internal Application Load Balancer logging and monitoring.
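The default behavior when no retry policy is configured can be summarized as a decision function. This sketch encodes only the rules stated in this section; the function and constant names are hypothetical, and the load balancer's actual logic may consider other factors.

```python
RETRYABLE_STATUSES = frozenset({502, 503, 504})
DEFAULT_RETRIES = 1  # without a retry policy, eligible requests are retried once


def retried_by_default(method: str,
                       has_body: bool,
                       status: int,
                       retries_so_far: int) -> bool:
    """Return True if the request would be retried when no retry policy is set."""
    if method.upper() == "POST":
        return False  # HTTP POST requests are not retried
    if has_body:
        return False  # only requests without an HTTP body are retried
    if status not in RETRYABLE_STATUSES:
        return False
    return retries_so_far < DEFAULT_RETRIES


print(retried_by_default("GET", False, 503, 0))   # True
print(retried_by_default("POST", False, 503, 0))  # False
```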

Accessing connected networks

You can access an internal Application Load Balancer in your VPC network from a connected network by using the following:

VPC Network Peering
Cloud VPN and Cloud Interconnect

For detailed examples, see Internal Application Load Balancers and connected networks.

Failover

If a backend becomes unhealthy, traffic is automatically redirected to healthy backends.

The following table describes the failover behavior in each mode:

Load balancer mode	Failover behavior	Behavior when all backends are unhealthy
Cross-region internal Application Load Balancer	Automatic failover to healthy backends in the same region or other regions. Traffic is distributed among healthy backends spanning multiple regions based on the configured traffic distribution.	Returns HTTP 503
Regional internal Application Load Balancer	Automatic failover to healthy backends in the same region. Envoy proxy sends traffic to healthy backends in a region based on the configured traffic distribution.	Returns HTTP 503
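The failover behavior in the table can be modeled with a short sketch. It simplifies the configured traffic distribution to "prefer healthy backends in the load balancer's region, otherwise use healthy backends elsewhere", which is an assumption made for illustration. The `Backend` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backend:
    name: str
    region: str
    healthy: bool


def eligible_backends(backends: List[Backend],
                      lb_region: str,
                      cross_region: bool) -> List[str]:
    """Return the names of backends that can receive traffic.

    An empty list corresponds to the load balancer returning HTTP 503.
    """
    healthy = [b for b in backends if b.healthy]
    local = [b.name for b in healthy if b.region == lb_region]
    if local:
        return local
    if cross_region:
        # Only the cross-region mode can fail over to other regions.
        return [b.name for b in healthy if b.region != lb_region]
    return []


pool = [
    Backend("a-1", "RegionA", healthy=False),
    Backend("b-1", "RegionB", healthy=True),
]
print(eligible_backends(pool, "RegionA", cross_region=True))   # ['b-1']
print(eligible_backends(pool, "RegionA", cross_region=False))  # []
```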

High availability and cross-region failover

You can set up a cross-region internal Application Load Balancer in multiple regions to get the following benefits:

If the cross-region internal Application Load Balancer in a region fails, the DNS routing policies route traffic to a cross-region internal Application Load Balancer in another region.

The high availability deployment example shows the following:
- A cross-region internal Application Load Balancer with frontend virtual IP address (VIP) in the RegionA and RegionB regions in your VPC network. Your clients are located in the RegionA region.
- You can make the load balancer accessible by using frontend VIPs from two regions, and use DNS routing policies to return the optimal VIP to your clients. Use Geolocation routing policies if you want your clients to use the VIP that is geographically closest.
- DNS routing policies can detect whether a VIP is not responding during a regional outage, and return the next-best VIP to your clients, so that your application stays up even during regional outages.

Cross-region internal Application Load Balancer with high availability deployment.

If backends in a particular region are down, the cross-region internal Application Load Balancer gracefully fails over traffic to the backends in another region.

The cross-region failover deployment example shows the following:
- A cross-region internal Application Load Balancer with a frontend VIP address in the RegionA region of your VPC network. Your clients are also located in the RegionA region.
- A global backend service that references the backends in the RegionA and RegionB Google Cloud regions.
- When the backends in the RegionA region are down, traffic fails over to the RegionB region.

Cross-region internal Application Load Balancer with a cross-region failover deployment.
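The DNS-based VIP selection in the high availability example can be sketched as follows. The sketch assumes that the VIPs are already ordered by geographic proximity to the client and that health status is known. The IP addresses are placeholders and the function name is hypothetical; actual DNS routing policies are configured in Cloud DNS rather than in application code.

```python
from typing import List, Optional, Set, Tuple


def pick_vip(vips_by_proximity: List[Tuple[str, str]],
             responding: Set[str]) -> Optional[str]:
    """Return the closest VIP that is responding, or None if none respond.

    vips_by_proximity holds (region, vip) pairs, closest region first.
    """
    for _region, vip in vips_by_proximity:
        if vip in responding:
            return vip
    return None


# Placeholder addresses for a client located in RegionA.
vips = [("RegionA", "10.1.2.99"), ("RegionB", "10.2.3.99")]

print(pick_vip(vips, responding={"10.1.2.99", "10.2.3.99"}))  # 10.1.2.99
print(pick_vip(vips, responding={"10.2.3.99"}))               # 10.2.3.99
```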

WebSocket support

Google Cloud HTTP(S)-based load balancers have built-in support for the WebSocket protocol when you use HTTP or HTTPS as the protocol to the backend. The load balancer does not need any configuration to proxy WebSocket connections.

The WebSocket protocol provides a full-duplex communication channel between clients and servers. An HTTP(S) request initiates the channel. For detailed information about the protocol, see RFC 6455.

When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.
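The handshake that the load balancer looks for is defined in RFC 6455. The following sketch shows what an Upgrade request and a successful Upgrade response look like at the header level. It illustrates the protocol rather than the load balancer's internal checks, and the function names are hypothetical.

```python
import base64
import hashlib
from typing import Mapping

# Fixed GUID that RFC 6455 appends to the client's key.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def _lower_keys(headers: Mapping[str, str]) -> dict:
    return {name.lower(): value for name, value in headers.items()}


def is_upgrade_request(headers: Mapping[str, str]) -> bool:
    """True if the request headers ask to switch to the WebSocket protocol."""
    h = _lower_keys(headers)
    tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    return h.get("upgrade", "").lower() == "websocket" and "upgrade" in tokens


def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_successful_upgrade(status: int,
                          response_headers: Mapping[str, str],
                          sec_websocket_key: str) -> bool:
    """True if the response is 101 with the accept value that matches the key."""
    h = _lower_keys(response_headers)
    return status == 101 and h.get("sec-websocket-accept") == expected_accept(sec_websocket_key)


# Sample key from RFC 6455, section 1.3.
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```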

Session affinity for WebSockets works the same as for any other request. For information, see Session affinity.

gRPC support

gRPC is an open-source framework for remote procedure calls. It is based on the HTTP/2 standard. Use cases for gRPC include the following:

Low-latency, highly scalable, distributed systems
Developing mobile clients that communicate with a cloud server
Designing new protocols that must be accurate, efficient, and language-independent
Layered design to enable extension, authentication, and logging

To use gRPC with your Google Cloud applications, you must proxy requests end-to-end over HTTP/2. To do this:

Configure an HTTPS load balancer.
Enable HTTP/2 as the protocol from the load balancer to the backends.

The load balancer negotiates HTTP/2 with clients as part of the SSL handshake by using the ALPN TLS extension.
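On the client side, ALPN negotiation means advertising `h2` during the TLS handshake. The following sketch uses Python's standard `ssl` module to build such a context. It is not a gRPC client (gRPC libraries handle this step themselves), and the function name is hypothetical.

```python
import ssl


def http2_client_context() -> ssl.SSLContext:
    """Create a TLS client context that offers HTTP/2 first, then HTTP/1.1, through ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


context = http2_client_context()
# After wrapping a socket and completing the handshake, the wrapped socket's
# selected_alpn_protocol() method reports which protocol the server chose.
```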

The load balancer may still negotiate HTTPS with some clients or accept insecure HTTP requests on a load balancer that is configured to use HTTP/2 between the load balancer and the backend instances. Those HTTP or HTTPS requests are transformed by the load balancer to proxy the requests over HTTP/2 to the backend instances.

You must enable TLS on your backends. For more information, see Encryption from the load balancer to the backends.

TLS support

By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, 1.2, and 1.3 when terminating client SSL requests.

When the internal Application Load Balancer uses HTTPS as a backend service protocol, it can negotiate TLS 1.0, 1.1, 1.2, or 1.3 to the backend.

Limitations

There's no guarantee that a request from a client in one zone of the region is sent to a backend that's in the same zone as the client. Session affinity doesn't reduce communication between zones.
Internal Application Load Balancers aren't compatible with the following features:
- Cloud CDN
- Google Cloud Armor
- Cloud Storage buckets
- Google-managed SSL certificates
An internal Application Load Balancer supports HTTP/2 only over TLS.
Clients connecting to an internal Application Load Balancer must use HTTP version 1.1 or later. HTTP 1.0 is not supported.
Google Cloud doesn't warn you if your proxy-only subnet runs out of IP addresses.
The internal forwarding rule that your internal Application Load Balancer uses must have exactly one port.
Internal Application Load Balancers don't support Cloud Trace.
When using an internal Application Load Balancer with Cloud Run in a Shared VPC environment, standalone VPC networks in service projects can send traffic to any other Cloud Run services deployed in any other service projects within the same Shared VPC environment. This is a known issue and this form of access will be blocked in the future.
Google Cloud doesn't guarantee that an underlying TCP connection can remain open for the entirety of the value of the backend service timeout. Client systems must implement retry logic instead of relying on a TCP connection to be open for long periods of time.
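Because no warning is issued when a proxy-only subnet runs out of IP addresses, you might track its capacity yourself. The following sketch computes the size of a subnet from its CIDR range and compares it with a usage count. The CIDR range, the usage count, and the 80% threshold are placeholders, the function names are hypothetical, and the calculation doesn't account for any addresses that Google Cloud reserves within the range.

```python
import ipaddress


def proxy_subnet_size(cidr: str) -> int:
    """Total number of addresses in the subnet's CIDR range."""
    return ipaddress.ip_network(cidr).num_addresses


def is_nearly_exhausted(cidr: str, in_use: int, threshold: float = 0.8) -> bool:
    """True if the share of addresses in use has reached the threshold."""
    return in_use / proxy_subnet_size(cidr) >= threshold


print(proxy_subnet_size("10.129.0.0/26"))        # 64
print(is_nearly_exhausted("10.129.0.0/26", 52))  # True
```

The `in_use` value has to come from your own inventory or monitoring data.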

What's next

To configure load balancing on a Shared VPC setup, see Set up an internal Application Load Balancer for Shared VPC.
To configure load balancing for your services running in GKE pods, see Deploying GKE Gateways, Container-native load balancing with standalone NEGs, and the Attaching an internal Application Load Balancer to standalone NEGs section.
To configure a regional internal Application Load Balancer with Private Service Connect, see Configuring Private Service Connect with consumer HTTP(S) service controls.
To manage the proxy-only subnet resource, see Proxy-only subnets.
To configure backend subsetting on regional internal Application Load Balancers, see Backend subsetting.
To inject custom logic into the load balancing data path, configure Service Extensions callouts.