Setting Up HTTP(S) Load Balancing

Google Cloud Platform (GCP) HTTP(S) load balancing provides global load balancing for HTTP(S) requests destined for your instances. You can configure URL rules that route some URLs to one set of instances and route other URLs to other instances. Requests are always routed to the instance group that is closest to the user, provided that group has enough capacity and is appropriate for the request. If the closest group does not have enough capacity, the request is sent to the closest group that does have capacity.

HTTP requests can be load balanced on port 80 or port 8080. HTTPS requests can be load balanced on port 443.

The load balancer acts as an HTTP/2 to HTTP/1.1 translation layer, which means that the web servers always see and respond to HTTP/1.1 requests, but that requests from the browser can be HTTP/1.0, HTTP/1.1, or HTTP/2.

HTTP(S) load balancing does not support the WebSocket protocol. To load balance WebSocket traffic, use Network load balancing instead.

Before you begin

HTTP(S) load balancing uses instance groups to organize instances. Make sure you are familiar with instance groups before you use load balancing.

Example configurations

If you want to jump right in and build a working load balancer for testing, the following guides demonstrate two different scenarios using the HTTP(S) load balancing service. These scenarios provide a practical context for HTTP(S) load balancing and demonstrate how you might set up load balancing for your specific needs.

The rest of this page digs into more detail about how load balancers are constructed and how they work.

Creating a cross-region load balancer

[Figure: cross-region load balancing]

You can use a global IP address that can intelligently route users based on proximity. For example, if you set up instances in North America, Europe, and Asia, users around the world will be automatically sent to the backends closest to them, assuming those instances have enough capacity. If the closest instances do not have enough capacity, cross-region load balancing automatically forwards users to the next closest region.

Get started with cross-region load balancing


Creating a content-based load balancer

[Figure: content-based load balancing]

Content-based or content-aware load balancing uses HTTP(S) load balancing to distribute traffic to different instances based on the incoming HTTP(S) URL. For example, you can set up some instances to handle your video content and another set to handle everything else. You can configure your load balancer to direct traffic for example.com/video to the video servers and example.com/ to the default servers.

Get started with content-based load balancing


Content-based and cross-region load balancing can work together by using multiple backend services and multiple regions. You can build on the scenarios above to create a configuration that meets your needs.

Fundamentals

Overview

An HTTP(S) load balancer is composed of several components. The following diagram illustrates the architecture of a complete HTTP(S) load balancer:

[Diagram: cross-region load balancing]

The following sections describe how each component works together to make up each type of load balancer. For a detailed description of each component, see Components below.

HTTP load balancing

A complete HTTP load balancer is structured as follows:

  1. A global forwarding rule directs incoming requests to a target HTTP proxy.
  2. The target HTTP proxy checks each request against a URL map to determine the appropriate backend service for the request.
  3. The backend service directs each request to an appropriate backend based on serving capacity, zone, and instance health of its attached backends. The health of each backend instance is verified using either an HTTP health check or an HTTPS health check. If the backend service is configured to use an HTTPS health check, requests are encrypted on their way to the backend instances.

In addition, you must create a firewall rule that allows traffic from 130.211.0.0/22 to reach your instances. The rule should enable traffic on the port your global forwarding rule has been configured to use (either 80 or 8080).
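
As a concrete illustration, the commands below sketch these components end to end with the gcloud tool. The resource names (web-group, http-basic-check, web-backend-service, web-map, http-lb-proxy, http-rule) and the zone us-central1-b are placeholders, the sketch assumes the instance group web-group already exists, and depending on your Cloud SDK version some commands may also require a --global flag.

    # Firewall rule that lets load balancer traffic (130.211.0.0/22) reach port 80
    gcloud compute firewall-rules create allow-lb-http \
        --source-ranges 130.211.0.0/22 \
        --allow tcp:80

    # Health check used to verify the health of backend instances
    gcloud compute http-health-checks create http-basic-check

    # Backend service that uses the health check
    gcloud compute backend-services create web-backend-service \
        --protocol HTTP \
        --http-health-checks http-basic-check

    # Attach the existing instance group as a backend
    gcloud compute backend-services add-backend web-backend-service \
        --instance-group web-group \
        --instance-group-zone us-central1-b

    # URL map that sends all requests to the backend service
    gcloud compute url-maps create web-map \
        --default-service web-backend-service

    # Target HTTP proxy that checks requests against the URL map
    gcloud compute target-http-proxies create http-lb-proxy \
        --url-map web-map

    # Global forwarding rule that directs incoming requests to the proxy
    gcloud compute forwarding-rules create http-rule \
        --global \
        --target-http-proxy http-lb-proxy \
        --port-range 80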

HTTPS load balancing

An HTTPS load balancer shares the same basic structure as an HTTP load balancer (described above), but differs in the following ways:

  • Uses a target HTTPS proxy instead of a target HTTP proxy
  • Requires a signed SSL certificate for the load balancer
  • Requires a firewall rule that enables traffic from 130.211.0.0/22 on port 443 to reach your instances
  • The client SSL session terminates at the load balancer. Sessions between the load balancer and the instance can either be HTTPS (recommended) or HTTP. If HTTPS, each instance must have a certificate.
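
The HTTPS-specific pieces might look like the following sketch, which assumes a URL map named web-map already exists and that the signed certificate and private key are in cert.pem and key.pem. The resource names are illustrative, and older Cloud SDK releases spell the certificate flag --ssl-certificate (singular).

    # Firewall rule that lets load balancer traffic reach port 443
    gcloud compute firewall-rules create allow-lb-https \
        --source-ranges 130.211.0.0/22 \
        --allow tcp:443

    # Upload the signed certificate and private key as an SSL certificate resource
    gcloud compute ssl-certificates create www-ssl-cert \
        --certificate cert.pem \
        --private-key key.pem

    # Target HTTPS proxy that uses the certificate and the existing URL map
    gcloud compute target-https-proxies create https-lb-proxy \
        --url-map web-map \
        --ssl-certificates www-ssl-cert

    # Global forwarding rule for HTTPS traffic on port 443
    gcloud compute forwarding-rules create https-rule \
        --global \
        --target-https-proxy https-lb-proxy \
        --port-range 443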

Components

Global forwarding rules and addresses

Global forwarding rules route traffic by IP address, port, and protocol to a load balancing configuration consisting of a target proxy, URL map, and one or more backend services.

Each global forwarding rule provides a single global IP address that can be used in DNS records for your application. No DNS-based load balancing is required. You can either specify the IP address to be used or let Google Compute Engine assign one for you.
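
For example, you might reserve a static global address up front so that its value can be placed in DNS, then reference it when creating the forwarding rule. The names lb-ip, http-lb-proxy, and http-rule are placeholders for this sketch.

    # Reserve a static global IP address for the load balancer
    gcloud compute addresses create lb-ip --global

    # Look up the reserved address; this is the value to put in your DNS records
    LB_IP=$(gcloud compute addresses describe lb-ip --global --format='value(address)')

    # Reference the reserved address when creating the global forwarding rule
    gcloud compute forwarding-rules create http-rule \
        --global \
        --address "$LB_IP" \
        --target-http-proxy http-lb-proxy \
        --port-range 80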

Target proxies

Target proxies terminate HTTP(S) connections from clients. They are referenced by one or more global forwarding rules and route incoming requests to a URL map.

The proxies set HTTP request/response headers as follows:

  • Via: 1.1 google (requests and responses)
  • X-Forwarded-Proto: [http | https] (requests only)
  • X-Forwarded-For: <client IP(s)>, <global forwarding rule external IP> (requests only)
    The <client IP(s)> portion can be a comma-separated list of IP addresses, depending on the X-Forwarded-For entries appended by intermediaries the request passes through. The first element in that list is the original client address.
  • X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only)
    Parameters for Stackdriver Trace.
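
One quick, hedged way to see the proxy at work from the outside is to check the Via header on a response returned through the load balancer (replace LB_IP with your forwarding rule's IP address):

    # A response served through the load balancer should carry "Via: 1.1 google"
    curl -sI http://LB_IP/ | grep -i '^via'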

URL maps

URL maps define matching patterns for URL-based routing of requests to the appropriate backend services. A default service is defined to handle any requests that do not match a specified host rule or path matching rule. In some situations, such as the cross-region load balancing example, you might not define any URL rules and rely only on the default service. For content-based routing of traffic, the URL map allows you to divide your traffic by examining the URL components to send requests to different sets of backends.
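
As a sketch of content-based routing, the following command adds a path matcher to an existing URL map so that requests for example.com/video go to a separate backend service. The map and service names (web-map, web-backend-service, video-backend-service) are placeholders and both backend services are assumed to already exist.

    # Route example.com/video and /video/* to the video backend service;
    # everything else falls through to the default service
    gcloud compute url-maps add-path-matcher web-map \
        --default-service web-backend-service \
        --path-matcher-name video-matcher \
        --new-hosts example.com \
        --path-rules '/video=video-backend-service,/video/*=video-backend-service'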

SSL certificates

SSL certificates are used by target HTTPS proxies to securely route incoming HTTPS requests to backend services defined in a URL map.

Backend services

Backend services direct incoming traffic to one or more attached backends. Each backend is composed of an instance group and additional serving capacity metadata. Backend serving capacity can be based on CPU or requests per second (RPS).

Each backend service also specifies which health checks will be performed against the available instances.

HTTP(S) load balancing supports Compute Engine Autoscaler, which allows users to perform autoscaling on the instance groups in a backend service. For more information, see Scaling Based on HTTP load balancing serving capacity.

You can enable connection draining on backend services to ensure minimal interruption to your users when an instance that is serving traffic is terminated, removed manually, or removed by an autoscaler. To learn more about connection draining, read the Enabling Connection Draining documentation.
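
For instance, connection draining can be enabled on an existing backend service by setting a drain timeout; the 300-second value and the service name below are illustrative.

    # Give in-flight requests up to 300 seconds to finish before an instance
    # is removed from the backend service
    gcloud compute backend-services update web-backend-service \
        --connection-draining-timeout 300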

Load distribution algorithm

HTTP(S) load balancing provides two methods of determining instance load. Within the backend service object, the balancingMode property selects between the requests per second (RPS) and CPU utilization modes. Both modes allow a maximum value to be specified; the HTTP load balancer will try to ensure that load remains under the limit, but short bursts above the limit can occur during failover or load spike events.
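
A sketch of both modes when attaching backends, using illustrative group names, zones, and limits:

    # RATE mode: cap each instance at roughly 100 requests per second
    gcloud compute backend-services add-backend web-backend-service \
        --instance-group web-group-us \
        --instance-group-zone us-central1-b \
        --balancing-mode RATE \
        --max-rate-per-instance 100

    # UTILIZATION mode: target at most 80% CPU utilization per instance
    gcloud compute backend-services add-backend web-backend-service \
        --instance-group web-group-eu \
        --instance-group-zone europe-west1-b \
        --balancing-mode UTILIZATION \
        --max-utilization 0.8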

Incoming requests are sent to the region closest to the user that has remaining capacity. If more than one zone is configured with backends in a region, the traffic is distributed across the instance groups in each zone according to each group's capacity. Within the zone, the requests are spread evenly over the instances using a round-robin algorithm. Round-robin distribution can be overridden by configuring session affinity.

Session affinity

Session affinity sends all requests from the same client to the same virtual machine instance, as long as the instance stays healthy and has capacity.

GCP HTTP(S) Load Balancing offers two types of session affinity:

  • Client IP affinity, which directs requests from the same client IP address to the same instance.
  • Generated cookie affinity, which sets a cookie on the first request and directs subsequent requests that carry the cookie to the same instance.
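
As a sketch, session affinity is set on the backend service; the service name below is a placeholder, and gcloud compute backend-services update --help lists the exact values supported by your Cloud SDK version.

    # Send requests from the same client IP to the same instance
    gcloud compute backend-services update web-backend-service \
        --session-affinity CLIENT_IP

    # Or use a cookie generated by the load balancer instead
    gcloud compute backend-services update web-backend-service \
        --session-affinity GENERATED_COOKIE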

Interfaces

Your HTTP(S) load balancing service can be configured and updated through the following interfaces:

  • The gcloud tool: gcloud is a command-line tool included in the Cloud SDK. The HTTP(S) load balancing documentation calls on this tool frequently to accomplish tasks. For a complete overview of gcloud documentation, see the gcloud Tool Guide. You can find commands related to load balancing in the gcloud compute and gcloud preview command groups.

    You can also get detailed help for any gcloud command by using the --help flag:

    gcloud compute http-health-checks create --help
    
  • The Google Cloud Platform Console: Load balancing tasks can be accomplished through the Google Cloud Platform Console.

  • The REST API: All load balancing tasks can be accomplished using the Google Compute Engine API. The API reference docs describe the resources and methods available to you.

TLS support

An HTTPS target proxy accepts only TLS 1.0 and later when terminating client SSL requests. When the backend protocol is HTTPS, it likewise uses only TLS 1.0 and later for connections to the backends.
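
One way to verify this boundary from a client is to probe the frontend with openssl, forcing specific protocol versions; LB_IP is a placeholder, and the -ssl3 option is only available in OpenSSL builds that still include SSLv3 support.

    # A TLS 1.0 handshake should be accepted by the HTTPS target proxy
    openssl s_client -connect LB_IP:443 -tls1 < /dev/null

    # An SSLv3 handshake should be refused
    openssl s_client -connect LB_IP:443 -ssl3 < /dev/null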

Illegal request handling

The HTTP(S) load balancer blocks client requests from reaching the backend for a number of reasons: some strictly for HTTP/1.1 compliance and others to avoid unexpected data being passed to the backends.

The load balancer blocks requests for HTTP/1.1 compliance when any of the following are true:

  • The first line of the request cannot be parsed.
  • A header is missing the : delimiter.
  • Headers or the first line contain invalid characters.
  • The content length is not a valid number, or there are multiple content length headers.
  • There are multiple transfer encoding keys, or there are unrecognized transfer encoding values.
  • There is a non-chunked body and no content length is specified.
  • Body chunks are unparseable. This is the only case where some data reaches the backend; the load balancer closes the connections to the client and the backend when it receives an unparseable chunk.

The load balancer also blocks the request if any of the following are true:

  • The combination of request URL and headers is longer than about 15KB.
  • The request method does not allow a body, but the request has one.
  • The request contains an upgrade header.
  • The HTTP version is unknown.
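
For example, both of the following requests should be rejected by the load balancer before they reach a backend; LB_IP is a placeholder for your load balancer's address.

    # Request with an Upgrade header: expect the load balancer to refuse it
    curl -i -H 'Upgrade: websocket' -H 'Connection: Upgrade' http://LB_IP/

    # GET with a body: the method does not allow one, so expect a client error
    curl -i -X GET --data 'payload' http://LB_IP/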

Logging

Each HTTP(S) request is logged temporarily via Stackdriver Logging. If you have been accepted into the Alpha testing phase, logging is automatic and does not need to be enabled.

How to view logs

To view logs, go to the Logs Viewer in the Cloud Platform Console.

HTTP(S) logs are indexed first by forwarding rule, then by URL map.

  • To see all logs, in the first pull-down menu select Load Balancing > All forwarding rules.
  • To see logs for just one forwarding rule, select a single forwarding rule name from the list.
  • To see logs for just one URL map used by a forwarding rule, select Load Balancing and choose the forwarding rule and URL map of interest.
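
If you prefer the command line, the same entries can typically be read with the Cloud SDK's logging commands; this sketch assumes that HTTP(S) load balancer entries use the http_load_balancer resource type.

    # Read the ten most recent HTTP(S) load balancer log entries
    gcloud beta logging read 'resource.type="http_load_balancer"' --limit 10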

Log fields of type boolean typically only appear if they have a value of true. If a boolean field has a value of false, that field is omitted from the log.

UTF-8 encoding is enforced for log fields. Characters that are not valid UTF-8 are replaced with question marks.

What is logged

HTTP(S) load balancing log entries contain information useful for monitoring and debugging your HTTP(S) traffic. Log entries contain the following types of information:

  • General information shown in most GCP logs, such as severity, project ID, project number, timestamp, and so on.
  • HttpRequest log fields.
  • A statusDetails field inside the structPayload. This field holds a string that explains why the load balancer returned the HTTP status that it did. The tables below explain these strings further.

statusDetail HTTP success messages

  • response_from_cache: The HTTP request was served from cache.
  • response_from_cache_validated: The return code was set from a cached entry that was validated by a backend.
  • response_sent_by_backend: The HTTP request was proxied successfully to the backend.

statusDetail HTTP failure messages

  • aborted_request_due_to_backend_early_response: A request with a body was aborted because the backend sent an early response with an error code. The response was forwarded to the client and the request was terminated.
  • backend_503_propagated_as_error: The backend sent a 503 that the load balancer could not recover from with retries.
  • backend_connection_closed_after_partial_response_sent: The backend connection closed unexpectedly after a partial response had been sent to the client.
  • backend_connection_closed_before_data_sent_to_client: The backend unexpectedly closed its connection to the load balancer before the response was proxied to the client.
  • backend_early_response_with_non_error_status: The backend sent a non-error response (1XX or 2XX) to an HTTP POST/PUT request before receiving the whole request body.
  • backend_response_corrupted: The HTTP response body sent by the backend has invalid chunked transfer-encoding or is otherwise corrupted.
  • backend_timedout_after_partial_response: The backend connection timed out after a partial response was sent to the client.
  • backend_timeout: The backend timed out while generating a response.
  • body_not_allowed: The client sent an HTTP request with a body, but the HTTP method used does not allow a body.
  • body_without_content_type: The client sent a request with a body, but the request headers are missing a required Content-Type header.
  • cache_lookup_failed_after_partial_response: The load balancer failed to serve a full response from cache due to an internal error.
  • client_disconnected_after_partial_response: The connection to the client was broken after the load balancer sent a partial response.
  • client_disconnected_before_any_response: The connection to the client was broken before the load balancer sent any response.
  • client_timed_out: The load balancer idled out the client connection due to lack of progress while proxying either the request or the response.
  • connection_terminated: The load balancer closed an idle client connection.
  • error_uncompressing_gzipped_body: There was an error uncompressing a gzipped HTTP response.
  • failed_to_connect_to_backend: The load balancer failed to connect to the backend.
  • failed_to_pick_backend: The load balancer failed to pick a healthy backend to handle the request.
  • headers_too_long: The request headers were larger than the maximum allowed.
  • http_version_not_supported: The HTTP version is not supported. Only HTTP 0.9, 1.0, 1.1, and 2.0 are currently supported.
  • http2_connection_inadequate_security: An HTTP/2 connection was terminated because the client connection does not use TLS version 1.2 or above, or the negotiated cipher suite does not meet the security requirements of the HTTP/2 specification.
  • http2_server_push_canceled_invalid_response_code: The load balancer canceled the HTTP/2 server push because the backend returned an invalid response code.
  • http2_unsupported_version: An HTTP/2 connection was terminated because the client attempted to use an unsupported version of the protocol.
  • internal_error: Internal error at the load balancer.
  • invalid_http2_client_header_format: The HTTP/2 headers from the client are invalid.
  • malformed_chunked_body: The request body was improperly chunk encoded.
  • required_body_but_no_content_length: The HTTP request requires a body, but the request headers did not include a Content-Length or Transfer-Encoding: chunked header.
  • secure_url_rejected: A request with an https:// URL was received over a plaintext HTTP/1.1 connection.
  • unsupported_method: The client supplied an unsupported HTTP request method.
  • upgrade_header_rejected: The client HTTP request contained the Upgrade header and was refused.
  • uri_too_long: The HTTP request URI was longer than the maximum allowed length.
  • user_not_authenticated: The user was not authenticated.
  • websocket_handshake_failed: The WebSocket handshake failed.

Notes and Restrictions

  • HTTP(S) load balancing does not support the HTTP/1.1 100 Continue response. This might affect multipart POST requests.
  • If your load balanced instances are running a public operating system image supplied by Compute Engine, then firewall rules in the operating system are configured automatically to allow load balanced traffic. If you are using a custom image, you have to configure the operating system firewall manually (a minimal sketch follows this list). This is separate from the GCP firewall rule that must be created as part of configuring an HTTP(S) load balancer.
  • Load balancing does not keep instances in sync. You must set up your own mechanisms, such as using Deployment Manager, for ensuring that your instances have consistent configurations and data.
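
As a minimal sketch for the custom-image case, assuming a Linux instance whose firewall is managed with iptables and a backend serving on port 80:

    # Allow load balancer and health check traffic (130.211.0.0/22) to reach port 80
    sudo iptables -A INPUT -p tcp -s 130.211.0.0/22 --dport 80 -j ACCEPT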

Troubleshooting

  • Traffic from the load balancer to your instances has an IP address in the range of 130.211.0.0/22. When viewing logs on your load balanced instances, you will not see the source address of the original client. Instead, you will see source addresses from this range.
