This document introduces the concepts that you need to understand to configure Google Cloud external HTTP(S) Load Balancing.
For information about how the Google Cloud load balancers differ from each other, see the following documents:
An external HTTP(S) load balancer is composed of several components. The following diagram illustrates the architecture of a complete external HTTP(S) load balancer.
The following sections describe how the components work together to make up each type of load balancer. For a detailed description of each component, see Components later in this topic.
HTTP Load Balancing
A complete HTTP load balancer is structured as follows:
- A forwarding rule directs incoming requests to a target HTTP proxy.
- The target HTTP proxy checks each request against a URL map to determine the appropriate backend service for the request.
- The backend service directs each request to an appropriate backend based on serving capacity, zone, and instance health of its attached backends. The health of each backend instance is verified by using an HTTP health check, an HTTPS health check, or an HTTP/2 health check. If the backend service is configured to use an HTTPS or HTTP/2 health check, the request is encrypted on its way to the backend instance.
- Sessions between the load balancer and the instance can use the HTTP, HTTPS, or HTTP/2 protocol. If you use HTTPS or HTTP/2, each backend instance must have an SSL certificate.
HTTPS Load Balancing
An HTTPS load balancer has the same basic structure as an HTTP load balancer, but differs in the following ways:
- An HTTPS load balancer uses a target HTTPS proxy instead of a target HTTP proxy.
- An HTTPS load balancer requires at least one signed SSL certificate installed on the target HTTPS proxy for the load balancer. You can use self-managed or Google-managed SSL certificates.
- The client SSL session terminates at the load balancer.
- HTTPS load balancers support the QUIC transport layer protocol.
Source IP addresses
The source IP address for packets, as seen by each backend virtual machine (VM) instance or container, is an IP address from these ranges:
The source IP address for actual load-balanced traffic comes from the same ranges as the health check probe IP ranges.
The source IP address for traffic, as seen by the backends, is not the Google Cloud external IP address of the load balancer. In other words, there are two HTTP, SSL, or TCP sessions:
Session 1, from original client to the load balancer (GFE):
- Source IP address: the original client (or external IP address if the client is behind NAT).
- Destination IP address: your load balancer's IP address.
Session 2, from the load balancer (GFE) to the backend VM or container:
- Source IP address: an IP address in one of these ranges:
  You cannot predict the actual source address.
- Destination IP address: the internal IP address of the backend VM or container in the Virtual Private Cloud (VPC) network.
Client communications with the load balancer
- Clients can communicate with the load balancer by using the HTTP 1.1 or HTTP/2 protocol.
- When HTTPS is used, modern clients default to HTTP/2. This is controlled on the client, not on the HTTPS load balancer.
- You cannot disable HTTP/2 by making a configuration change on the load balancer. However, you can configure some clients to use HTTP 1.1 instead of HTTP/2. For example, with curl, use the
- HTTPS load balancers do not support client certificate-based authentication, also known as mutual TLS authentication.
The external HTTP(S) load balancers are reverse proxy load balancers. The load balancer terminates incoming connections, and then opens new connections from the load balancer to the backends. The reverse proxy functionality is provided by the Google Front Ends (GFEs).
The firewall rules that you set block traffic from the GFEs to the backends, but do not block incoming traffic to the GFEs.
The external HTTP(S) load balancers have a number of open ports to support other Google services that run on the same architecture. If you run a security or port scan against the external IP address of a Google Cloud external HTTP(S) load balancer, additional ports appear to be open.
This does not affect external HTTP(S) load balancers. External forwarding rules, which are used in the definition of an external HTTP(S) load balancer, can only reference TCP ports 80, 8080, and 443. Traffic with a different TCP destination port is not forwarded to the load balancer's backend.
The following are components of external HTTP(S) load balancers.
Forwarding rules and addresses
Forwarding rules route traffic by IP address, port, and protocol to a load balancing configuration consisting of a target proxy, URL map, and one or more backend services.
Each forwarding rule provides a single IP address that can be used in DNS records for your application. No DNS-based load balancing is required. You can either specify the IP address to be used or let Cloud Load Balancing assign one for you.
- The forwarding rule for an HTTP load balancer can only reference TCP ports 80 and 8080.
- The forwarding rule for an HTTPS load balancer can only reference TCP port 443.
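The port restrictions above can be expressed as a small lookup. This is only an illustrative sketch in Python; `ALLOWED_PORTS` and `port_is_allowed` are hypothetical names, not part of any Google Cloud API.

```python
# Ports that a forwarding rule may reference, per the list above.
ALLOWED_PORTS = {
    "HTTP": {80, 8080},
    "HTTPS": {443},
}


def port_is_allowed(protocol: str, port: int) -> bool:
    """Return True if a forwarding rule for `protocol` may reference `port`."""
    return port in ALLOWED_PORTS.get(protocol.upper(), set())


print(port_is_allowed("HTTP", 8080))   # True
print(port_is_allowed("HTTPS", 8080))  # False
```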
The type of forwarding rule required by external HTTP(S) load balancers depends on which Network Service Tier the load balancer is in.
- The external HTTP(S) load balancers in the Premium Tier use global external forwarding rules.
- The external HTTP(S) load balancers in the Standard Tier use regional external forwarding rules.
Target proxies terminate HTTP(S) connections from clients. One or more forwarding rules direct traffic to the target proxy, and the target proxy consults the URL map to determine how to route traffic to backends.
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Forwarded-For: <unverified IP(s)>, <immediate client IP>, <global forwarding rule external IP>, <proxies running in Google Cloud> (requests only)
The X-Forwarded-For (XFF) header contains a comma-separated list of IP addresses representing proxies through which the request has passed. Each proxy can append the IP address of its client to the list. Because of this, the number of IP addresses in the XFF header can vary. A Google Cloud external HTTP(S) load balancer adds two IP addresses to the header: the IP address of the requesting client and the external IP address of the load balancer's forwarding rule, in that order.
Therefore, the IP address that immediately precedes the Google Cloud load balancer's IP address is the IP address of the system that contacts the load balancer. The system might be a client, or it might be another proxy server, outside Google Cloud, that forwards requests on behalf of a client.
When a proxy server outside Google Cloud contacts the Google Cloud external HTTP(S) load balancer on behalf of a client, the load balancer might not receive the client IP address of the system that contacts that outside proxy. The outside proxy might not append the IP address of its client to the XFF header. If all outside proxies append a client IP address to the XFF header, the first IP address in the list is the IP address of the original client.
If the backend VMs of an external HTTP(S) load balancer serve as internal proxies, those might add more client IP addresses to the XFF header. In this situation, the IP address of the external HTTP(S) load balancer's forwarding rule might not be the last IP address in the list.
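The ordering described above suggests how a backend might read the header. The following Python sketch is illustrative only: the function name `client_ip_from_xff` is hypothetical, and the code assumes that nothing (for example, an internal proxy on the backend VMs) has appended entries after the two addresses added by the load balancer.

```python
from typing import Optional


def client_ip_from_xff(header_value: str) -> Optional[str]:
    """Return the address of the system that contacted the load balancer.

    Assumes the last two entries are the ones the load balancer appended:
    <immediate client IP>, <forwarding rule external IP>.
    """
    entries = [part.strip() for part in header_value.split(",") if part.strip()]
    if len(entries) < 2:
        # Not enough entries to contain both load balancer additions.
        return None
    return entries[-2]


# One unverified entry supplied upstream, then the two entries added by the
# load balancer (all addresses are reserved documentation examples).
print(client_ip_from_xff("198.51.100.7, 203.0.113.10, 192.0.2.1"))
```

Entries to the left of the second-to-last position are unverified and can be forged by the sender, so treat them as hints rather than as trusted client addresses.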
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only)
Contains parameters for Stackdriver Trace.
You can create custom request headers if the default headers do not meet your needs. For more information about this feature, see Creating user-defined request headers.
Do not rely on the proxy to preserve the case of request or response header names. For example, a Server: Apache/1.0 response header may appear at the
URL maps define matching patterns for URL-based routing of requests to the appropriate backend services. A default service is defined to handle any requests that do not match a specified host rule or path matching rule. In some situations, such as the cross-region load balancing example, you might not define any URL rules and rely only on the default service. For content-based routing of traffic, the URL map allows you to divide your traffic by examining the URL components to send requests to different sets of backends.
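The routing decision can be pictured with a deliberately simplified model. In this Python sketch, the `UrlMap` class, its fields, and the backend service names are all hypothetical, and matching is reduced to exact host names plus longest path prefix. Real URL maps support richer host and path patterns, so treat this as an illustration of the fallback to the default service rather than of the actual matching rules.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UrlMap:
    default_service: str
    # host name -> {path prefix -> backend service name}
    host_rules: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def route(self, host: str, path: str) -> str:
        """Return the backend service name for a request."""
        path_rules = self.host_rules.get(host)
        if path_rules is None:
            # No host rule matched: use the default service.
            return self.default_service
        # Prefer the longest matching path prefix.
        best = ""
        for prefix in path_rules:
            if path.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        return path_rules[best] if best else self.default_service


url_map = UrlMap(
    default_service="web-backend",
    host_rules={"example.com": {"/video/": "video-backend"}},
)
print(url_map.route("example.com", "/video/intro.mp4"))  # video-backend
print(url_map.route("example.com", "/index.html"))       # web-backend
```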
If you are using HTTPS Load Balancing, you must install one or more SSL certificates on the target HTTPS proxy. You can have up to fifteen (15) SSL certificates installed. They are used by target HTTPS proxies to authenticate communications between the load balancer and the client. These can be self-managed or Google-managed SSL certificates. Google-managed SSL certificates each support (in beta) up to 100 domains.
If you are using HTTPS or HTTP/2 from the load balancer to the backends, you must install SSL certificates on each VM instance. To install SSL certificates on a VM instance, use the instructions in your application documentation. These certificates can be self-signed.
For external HTTP(S) Load Balancing, Google encrypts traffic between the load balancer and backend instances. SSL certificate resources are not required on individual VM instances. If you need SSL certificates on a VM instance because of services you have running there, install the certificates as described in your application documentation.
SSL policies give you the ability to control the features of SSL that your HTTPS load balancer negotiates with HTTPS clients.
By default, HTTPS Load Balancing uses a set of SSL features that provides good security and wide compatibility. Some applications require more control over which SSL versions and ciphers are used for their HTTPS or SSL connections. You can define SSL policies that control the features of SSL that your load balancer negotiates and associate an SSL policy with your target HTTPS proxy.
Geographic control over where TLS is terminated
The HTTPS load balancer terminates TLS in locations that are distributed globally, so as to minimize latency between clients and the load balancer. If you require geographic control over where TLS is terminated, you should use Google Cloud Network Load Balancing instead, and terminate TLS on backends that are located in regions appropriate to your needs.
Backend services provide configuration information to the load balancer. An external HTTP(S) load balancer must have at least one backend service and can have multiple backend services.
Load balancers use the information in a backend service to direct incoming traffic to one or more attached backends.
The backends of a backend service can be either instance groups or network endpoint groups (NEGs), but not a combination of both. When you add a backend instance group or NEG, you specify a balancing mode, which defines a method for distributing requests and a target capacity. For more information, see Load distribution algorithm.
HTTP(S) Load Balancing supports Cloud Load Balancing Autoscaler, which allows users to perform autoscaling on the instance groups in a backend service. For more information, see Scaling based on HTTP(S) Load Balancing serving capacity.
You can enable connection draining on backend services to ensure minimal interruption to your users when an instance that is serving traffic is terminated, removed manually, or removed by an autoscaler. To learn more, see Enabling connection draining.
Behavior of the load balancer in different Network Service Tiers
HTTP(S) Load Balancing is a global service when the Premium Network Service Tier is used. You may have more than one backend service in a region, and you may create backend services in more than one region, all serviced by the same global load balancer. Traffic is allocated to backend services as follows:
- When a user request comes in, the load balancing service determines the approximate origin of the request from the source IP address.
- The load balancing service knows the locations of the instances owned by the backend service, their overall capacity, and their overall current usage.
- If the closest instances to the user have available capacity, the request is forwarded to that closest set of instances.
- Incoming requests to the given region are distributed evenly across all available backend services and instances in that region. However, at very small loads, the distribution may appear to be uneven.
- If there are no healthy instances with available capacity in a given region, the load balancer instead sends the request to the next closest region with available capacity.
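The allocation steps above can be sketched as a closest-region-with-capacity search. This Python example is a rough model under stated assumptions: the `Region` fields (including `distance_km` as a stand-in for "closeness") and the region figures are hypothetical, and the load balancer's real measure of proximity and capacity is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    distance_km: float   # hypothetical proxy for closeness to the user
    healthy: bool        # whether the region has healthy instances
    capacity_rps: float  # overall capacity of the instances in the region
    current_rps: float   # overall current usage

    def has_available_capacity(self) -> bool:
        return self.healthy and self.current_rps < self.capacity_rps


def pick_region(regions: List[Region]) -> Optional[str]:
    """Return the closest region that is healthy and below capacity."""
    for region in sorted(regions, key=lambda r: r.distance_km):
        if region.has_available_capacity():
            return region.name
    return None


regions = [
    Region("us-east1", 100, True, 100, 100),     # closest, but full
    Region("us-central1", 900, True, 100, 40),   # next closest, has room
    Region("europe-west1", 6000, True, 100, 0),
]
print(pick_region(regions))  # us-central1
```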
HTTP(S) Load Balancing is a regional service when the Standard Network Service Tier is used. Its backend instance groups or NEGs must all be located in the region used by the load balancer's external IP address and forwarding rule.
Each backend service also specifies which health check is performed against each available instance. For the health check probes to function correctly, you must create a firewall rule that allows traffic from the health check probe IP ranges to reach your instances.
For more information about health checks, see Creating health checks.
Protocol to the backends
When you configure a backend service for the external HTTP(S) load balancer, you set the protocol that the backend service uses to communicate with the backends. You can choose HTTP, HTTPS, or HTTP/2. The load balancer uses only the protocol that you specify. The load balancer does not fall back to one of the other protocols if it is unable to negotiate a connection to the backend with the specified protocol.
If you use HTTP/2, you must use TLS. HTTP/2 without encryption is not supported.
Although it is not required, it is a best practice to use a health check whose protocol matches the protocol of the backend service. For example, an HTTP/2 health check most accurately tests HTTP/2 connectivity to backends.
Using gRPC with your Google Cloud applications
gRPC is an open-source framework for remote procedure calls. It is based on the HTTP/2 standard. Use cases for gRPC include the following:
- Low latency, highly scalable, distributed systems
- Developing mobile clients that communicate with a cloud server
- Designing new protocols that must be accurate, efficient, and language independent
- Layered design to enable extension, authentication, and logging
To use gRPC with your Google Cloud applications, you must proxy requests end-to-end over HTTP/2. To do this with an external HTTP(S) load balancer:
- Configure an HTTPS load balancer.
- Enable HTTP/2 as the protocol from the load balancer to the backends.
The load balancer negotiates HTTP/2 with clients as part of the SSL handshake by using the ALPN TLS extension.
The load balancer may still negotiate HTTPS with some clients or accept insecure HTTP requests on an external HTTP(S) load balancer that is configured to use HTTP/2 between the load balancer and the backend instances. Those HTTP or HTTPS requests are transformed by the load balancer to proxy the requests over HTTP/2 to the backend instances.
If you want to configure an external HTTP(S) load balancer by using HTTP/2 with Google Kubernetes Engine Ingress or by using gRPC and HTTP/2 with Ingress, see HTTP/2 for load balancing with Ingress.
For information about troubleshooting problems with HTTP/2, see Troubleshooting issues with HTTP/2 to the backends.
For information about HTTP/2 limitations, see HTTP/2 limitations.
You must create a firewall rule that allows traffic from 130.211.0.0/22 and 35.191.0.0/16 to reach your instances. These are IP address ranges that the load balancer uses to connect to backend instances. This rule allows traffic from both the load balancer and the health checker. The rule must allow traffic on the port that your global forwarding rule has been configured to use, and your health checker should be configured to use the same port. If your health checker uses a different port, you must create another firewall rule for that port.
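When you debug such a firewall rule, it can help to check whether a logged source address belongs to the load balancer. The Python sketch below assumes that the relevant ranges are 130.211.0.0/22 and 35.191.0.0/16. That is an assumption made for this example, so confirm the ranges against the current Google Cloud documentation before relying on them. The names `ASSUMED_LB_RANGES` and `is_from_load_balancer` are hypothetical.

```python
import ipaddress

# Assumption: source ranges used by the load balancer and health checker.
ASSUMED_LB_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_from_load_balancer(source_ip: str) -> bool:
    """Return True if source_ip falls inside one of the assumed ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ASSUMED_LB_RANGES)


print(is_from_load_balancer("35.191.200.1"))  # True
print(is_from_load_balancer("203.0.113.9"))   # False
```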
Firewall rules block and allow traffic at the instance level, not at the edges of the network. They cannot prevent traffic from reaching the load balancer itself.
If you need to determine external IP addresses at a particular time, use the instructions in the Compute Engine FAQ.
Google Cloud uses special routes not defined in your VPC network for health checks. For more information, see Load balancer return paths.
Load distribution algorithm
External HTTP(S) Load Balancing supports two balancing modes for backends:
- RATE for instance group backends or NEGs
- UTILIZATION for instance group backends
The backends of a backend service can be either instance groups or NEGs, but not a combination of both. When you add a backend instance group or NEG, you specify a balancing mode, which defines a method for distributing requests and a target capacity. For instance group backends, you can use either the UTILIZATION or the RATE balancing mode. For NEGs, you must use RATE.
When you use the RATE balancing mode, you must specify a target maximum number of requests (queries) per second (RPS, QPS). This target is used to define when an instance or endpoint is at capacity. The target can be exceeded if all backends are at or above capacity.
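The relationship between the RATE target and "at capacity" can be illustrated as follows. This Python sketch is a simplified model, not the load balancer's actual algorithm; the function names and backend names are hypothetical.

```python
from typing import Dict, List


def backends_below_target(current_rps: Dict[str, float],
                          max_rps: Dict[str, float]) -> List[str]:
    """Backends whose current request rate is under their RATE target."""
    return [name for name, rps in current_rps.items() if rps < max_rps[name]]


def candidate_backends(current_rps: Dict[str, float],
                       max_rps: Dict[str, float]) -> List[str]:
    """Prefer backends below target; if none remain, the target is exceeded."""
    below = backends_below_target(current_rps, max_rps)
    return below if below else list(current_rps)


targets = {"ig-a": 100.0, "ig-b": 100.0}
print(candidate_backends({"ig-a": 100.0, "ig-b": 60.0}, targets))   # ['ig-b']
print(candidate_backends({"ig-a": 120.0, "ig-b": 100.0}, targets))  # ['ig-a', 'ig-b']
```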
When an external HTTP(S) load balancer is in Premium Tier, requests sent to the load balancer are delivered to backend instance groups or NEGs in the region closest to the user, if a backend in that region has available capacity. (Available capacity is configured by the load balancer's balancing mode.)
When an external HTTP(S) load balancer is in Standard Tier, its backend instance groups or NEGs must all be located in the region used by the load balancer's external IP address and forwarding rule.
After a region is selected:
An external HTTP(S) load balancer tries to balance requests as evenly as possible within the zones of a region, subject to session affinity. When you configure multiple NEGs or zonal instance groups in the same region or one or more regional managed instance groups, the external HTTP(S) load balancer behaves this way.
Within a zone, an external HTTP(S) load balancer tries to balance requests by using a round-robin algorithm, subject to session affinity.
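As a reminder of what round-robin selection means, the following Python sketch cycles through a fixed list of instances. It ignores session affinity, health, and capacity, and the instance names are hypothetical.

```python
import itertools
from typing import Iterator, Sequence


def round_robin(instances: Sequence[str]) -> Iterator[str]:
    """Yield instances in order, starting over after the last one."""
    return itertools.cycle(instances)


picker = round_robin(["vm-a", "vm-b", "vm-c"])
print([next(picker) for _ in range(4)])  # ['vm-a', 'vm-b', 'vm-c', 'vm-a']
```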
For specific examples of the load distribution algorithm, see How HTTP(S) Load Balancing works.
Session affinity provides a best-effort attempt to send requests from a particular client to the same backend for as long as the backend is healthy and has the capacity, according to the configured balancing mode.
Google Cloud HTTP(S) Load Balancing offers three types of session affinity:
- NONE. Session affinity is not set for the load balancer.
- Client IP affinity sends requests from the same client IP address to the same backend.
- Generated cookie affinity sets a client cookie when the first request is made, and then sends requests with that cookie to the same backend.
When you use session affinity, we recommend the RATE balancing mode rather than UTILIZATION. Session affinity works best if you set the balancing mode to requests per second (RPS).
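To make the idea of client IP affinity concrete, the sketch below maps a client address to a backend with a stable hash. This is one possible scheme chosen for illustration, not the mechanism that the load balancer uses (which is not described here); the function name `pick_backend` and the backend names are hypothetical.

```python
import hashlib
from typing import List, Optional


def pick_backend(client_ip: str, healthy_backends: List[str]) -> Optional[str]:
    """Map a client IP to a backend deterministically (illustrative only)."""
    if not healthy_backends:
        return None
    # A cryptographic hash gives the same result across processes, unlike
    # Python's built-in hash() for strings.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


backends = ["vm-a", "vm-b", "vm-c"]
first = pick_backend("203.0.113.10", backends)
second = pick_backend("203.0.113.10", backends)
print(first == second)  # True
```

With this simple modulo scheme, a change in the set of healthy backends can remap many clients, which is one reason affinity is described as best-effort.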
WebSocket proxy support
HTTP(S) Load Balancing has native support for the WebSocket protocol. Backends that use the WebSocket protocol to communicate with clients can use the external HTTP(S) load balancer as a front end for scale and availability. The load balancer does not need any additional configuration to proxy WebSocket connections.
The WebSocket protocol, which is defined in RFC 6455, provides a full-duplex communication channel between clients and servers. The channel is initiated from an HTTP(S) request.
When HTTP(S) Load Balancing recognizes a WebSocket Upgrade request from an HTTP(S) client and the request is followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend does not return a successful Upgrade response, the load balancer closes the connection.
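The handshake condition can be sketched using the header and status values defined in RFC 6455 (an `Upgrade: websocket` request with `Connection: Upgrade`, answered by a 101 response). How the load balancer performs this check internally is not specified here, so the following Python functions are hypothetical and only model the rule.

```python
from typing import Dict, Mapping


def _lowercase_names(headers: Mapping[str, str]) -> Dict[str, str]:
    # Header names are case-insensitive, so normalize them before lookup.
    return {name.lower(): value for name, value in headers.items()}


def is_upgrade_request(headers: Mapping[str, str]) -> bool:
    h = _lowercase_names(headers)
    tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    return h.get("upgrade", "").strip().lower() == "websocket" and "upgrade" in tokens


def is_successful_upgrade_response(status: int, headers: Mapping[str, str]) -> bool:
    h = _lowercase_names(headers)
    return status == 101 and h.get("upgrade", "").strip().lower() == "websocket"


def should_proxy_bidirectionally(request_headers: Mapping[str, str],
                                 response_status: int,
                                 response_headers: Mapping[str, str]) -> bool:
    """True when both halves of the WebSocket handshake succeeded."""
    return (is_upgrade_request(request_headers)
            and is_successful_upgrade_response(response_status, response_headers))


request = {"Upgrade": "websocket", "Connection": "keep-alive, Upgrade"}
print(should_proxy_bidirectionally(request, 101, {"Upgrade": "websocket"}))  # True
print(should_proxy_bidirectionally(request, 200, {}))                        # False
```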
The timeout for a WebSocket connection depends on the configurable response timeout of the load balancer, which is 30 seconds by default. This timeout is applied to WebSocket connections regardless of whether they are in use. For more information about the response timeout and how to configure it, see Timeouts and retries.
If you have configured either client IP or generated cookie session affinity for your external HTTP(S) load balancer, all WebSocket connections from a client are sent to the same backend instance, if the instance continues to pass health checks and has capacity.
The WebSocket protocol is supported with Ingress.
QUIC protocol support for HTTPS Load Balancing
HTTPS Load Balancing supports the QUIC protocol in connections between the load balancer and the clients. QUIC is a transport layer protocol that provides congestion control similar to TCP and security equivalent to SSL/TLS for HTTP/2, with improved performance. QUIC allows faster client connection initiation, eliminates head-of-line blocking in multiplexed streams, and supports connection migration when a client's IP address changes.
QUIC affects connections between clients and the load balancer, not connections between the load balancer and its backends.
The target proxy's QUIC override setting allows you to enable one of the following:
- Negotiate QUIC for a load balancer when possible.
- Always disable QUIC for a load balancer.
If you do not specify a value for the QUIC override setting, you allow Google to manage when QUIC is used. Google enables QUIC only when the QUIC override flag in the gcloud command-line tool is set to ENABLE or the quicOverride flag in the REST API is set to ENABLE.
For information about enabling and disabling QUIC support, see Target proxies. You can enable or disable QUIC support in the frontend configuration section of the Google Cloud Console, by using the gcloud command-line tool, or by using the REST API.
How QUIC is negotiated
When you enable QUIC, the load balancer can advertise its QUIC capability to clients, allowing clients that support QUIC to attempt to establish QUIC connections with the HTTPS load balancer. Properly implemented clients always fall back to HTTPS or HTTP/2 when they cannot establish a QUIC connection. Because of this fallback, enabling or disabling QUIC in the load balancer does not disrupt the load balancer's ability to connect to clients.
When you have QUIC enabled in your HTTPS load balancer, some circumstances can cause your client to fall back to HTTPS or HTTP/2 instead of negotiating QUIC. These include the following:
- When a client supports versions of QUIC that are not compatible with the QUIC versions supported by the HTTPS load balancer.
- When the load balancer detects that UDP traffic is blocked or rate-limited in a way that would prevent QUIC from working.
- If QUIC is temporarily disabled for HTTPS load balancers in response to bugs, vulnerabilities, or other concerns.
When a connection falls back to HTTPS or HTTP/2 because of these circumstances, we do not count this as a failure of the load balancer.
Ensure that the previously described behaviors are acceptable for your workloads before you enable QUIC.
Your HTTP(S) Load Balancing service can be configured and updated through the following interfaces:
The gcloud command-line tool: a command-line tool included in the Cloud SDK. The HTTP(S) Load Balancing documentation calls on this tool frequently to accomplish tasks. For a complete overview of the tool, see the gcloud Tool Guide. You can find commands related to load balancing in the gcloud compute command group.
You can also get detailed help for any gcloud command by using the --help flag:
gcloud compute http-health-checks create --help
The Cloud Console: Load balancing tasks can be accomplished by using the Cloud Console.
The REST API: All load balancing tasks can be accomplished by using the Cloud Load Balancing API. The API reference docs describe the resources and methods available to you.
By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, and 1.2 when terminating client SSL requests. You can use SSL policies to change this default behavior and control how the load balancer negotiates SSL with clients.
When the load balancer uses HTTPS as a backend service protocol, it can negotiate TLS 1.0, 1.1, or 1.2 to the backend.
Timeouts and retries
HTTP(S) Load Balancing has two distinct types of timeouts:
A configurable HTTP response timeout, which represents the amount of time the load balancer waits for your backend to return a complete HTTP response. The default value for the response timeout is 30 seconds. Consider increasing this timeout under either of these circumstances:
- You expect a backend to take longer to return HTTP responses.
- The connection is upgraded to a WebSocket.
For WebSocket traffic sent through a Google Cloud external HTTP(S) load balancer, the backend service timeout is interpreted as the maximum amount of time that a WebSocket connection can remain open, whether idle or not. For more information, see Backend service settings.
A TCP session timeout, whose value is fixed at 10 minutes (600 seconds). This session timeout is sometimes called a keepalive or idle timeout, and its value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend. This timeout does not apply to WebSockets.
This table illustrates changes necessary to modify keepalive timeouts for common web server software:
|Web server software|Parameter|Default setting|Recommended setting|
|---|---|---|---|
|Apache|KeepAliveTimeout|KeepAliveTimeout 5|KeepAliveTimeout 620|
|nginx|keepalive_timeout|keepalive_timeout 75s;|keepalive_timeout 620s;|
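The reasoning behind the recommended values is that the backend's keepalive timeout must be longer than the load balancer's fixed 600-second TCP session timeout. A minimal Python check, with hypothetical names, makes the comparison explicit:

```python
# Fixed TCP session (keepalive) timeout of the load balancer, in seconds.
LB_TCP_SESSION_TIMEOUT_S = 600


def keepalive_is_safe(backend_keepalive_s: int) -> bool:
    """True if the backend keeps idle connections open longer than the load balancer."""
    return backend_keepalive_s > LB_TCP_SESSION_TIMEOUT_S


print(keepalive_is_safe(5))    # Apache default: False
print(keepalive_is_safe(75))   # nginx default: False
print(keepalive_is_safe(620))  # recommended setting: True
```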
HTTP(S) Load Balancing retries failed GET requests in certain circumstances, such as when the response timeout is exhausted. It does not retry failed POST requests. Retries are limited to two attempts. Retried requests only generate one log entry for the final response.
For more information, see HTTP(S) Load Balancing logging and monitoring.
Illegal request and response handling
The external HTTP(S) load balancer blocks both client requests and backend responses from reaching the backend or the client, respectively, for a number of reasons. Some reasons are strictly for HTTP/1.1 compliance and others are to avoid unexpected data being passed to or from the backends.
The load balancer blocks the following for HTTP/1.1 compliance:
- It cannot parse the first line of the request.
- A header is missing the
- Headers or the first line contain invalid characters.
- The content length is not a valid number, or there are multiple content length headers.
- There are multiple transfer encoding keys, or there are unrecognized transfer encoding values.
- There's a non-chunked body and no content length specified.
- Body chunks are unparseable. This is the only case where some data reaches the backend. The load balancer closes the connections to the client and backend when it receives an unparseable chunk.
The load balancer blocks the request if any of the following are true:
- The total size of request headers and the request URL exceeds the limit for the maximum request header size for external HTTP(S) Load Balancing.
- The request method does not allow a body, but the request has one.
- The request contains an Upgrade header, and the Upgrade header is not used to enable WebSocket connections.
- The HTTP version is unknown.
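A few of the request checks described above can be modeled in code. The Python sketch below covers only the HTTP version, Content-Length, and Upgrade conditions. The set of known versions and the function name `rejection_reason` are assumptions made for this example, and the load balancer's real parser is not described here.

```python
import re
from typing import List, Optional, Tuple

Header = Tuple[str, str]

# Assumption: version strings treated as known in this sketch.
KNOWN_HTTP_VERSIONS = {"HTTP/1.0", "HTTP/1.1", "HTTP/2"}


def rejection_reason(version: str, headers: List[Header]) -> Optional[str]:
    """Return why a request would be blocked, or None if these checks pass."""
    if version not in KNOWN_HTTP_VERSIONS:
        return "unknown HTTP version"
    lengths = [value for name, value in headers if name.lower() == "content-length"]
    if len(lengths) > 1:
        return "multiple content length headers"
    if lengths and not re.fullmatch(r"[0-9]+", lengths[0].strip()):
        return "content length is not a valid number"
    for name, value in headers:
        if name.lower() == "upgrade" and value.strip().lower() != "websocket":
            return "Upgrade header not used for WebSocket"
    return None


print(rejection_reason("HTTP/1.1", [("Content-Length", "12")]))  # None
print(rejection_reason("HTTP/1.1", [("Upgrade", "h2c")]))
```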
The load balancer blocks the backend's response if any of the following are true:
- The total size of response headers exceeds the limit for maximum response header size for external HTTP(S) Load Balancing.
- The HTTP version is unknown.