Setting Up HTTP(S) Load Balancing

Google Cloud Platform (GCP) HTTP(S) load balancing provides global load balancing for HTTP(S) requests destined for your instances. You can configure URL rules that route some URLs to one set of instances and route other URLs to other instances. Requests are always routed to the instance group that is closest to the user, provided that group has enough capacity and is appropriate for the request. If the closest group does not have enough capacity, the request is sent to the closest group that does have capacity.

HTTP(S) load balancing supports both IPv4 and IPv6 addresses for client traffic. Client IPv6 requests are terminated at the global load balancing layer, then proxied over IPv4 to your backends.

HTTP requests can be load balanced based on port 80 or port 8080. HTTPS requests can be load balanced on port 443.

The load balancer acts as an HTTP/2 to HTTP/1.1 translation layer, which means that the web servers always see and respond to HTTP/1.1 requests, but that requests from the browser can be HTTP/1.0, HTTP/1.1, or HTTP/2. HTTP/2 server push is not supported.

Before you begin

HTTP(S) load balancing uses instance groups to organize instances. Make sure you are familiar with instance groups before you use load balancing.

Example configurations

If you want to jump right in and build a working load balancer for testing, the following guides demonstrate two different scenarios using the HTTP(S) load balancing service. These scenarios provide a practical context for HTTP(S) load balancing and demonstrate how you might set up load balancing for your specific needs.

The rest of this page digs into more detail about how load balancers are constructed and how they work.

Creating a cross-region load balancer

Figure: Representation of cross-region load balancing

You can use a global IP address that can intelligently route users based on proximity. For example, if you set up instances in North America, Europe, and Asia, users around the world will be automatically sent to the backends closest to them, assuming those instances have enough capacity. If the closest instances do not have enough capacity, cross-region load balancing automatically forwards users to the next closest region.

Get started with cross-region load balancing

Creating a content-based load balancer

Figure: Representation of content-based load balancing

Content-based or content-aware load balancing uses HTTP(S) load balancing to distribute traffic to different instances based on the incoming HTTP(S) URL. For example, you can set up some instances to handle your video content and another set to handle everything else. You can then configure your load balancer to direct traffic for video URLs to the video servers and all other traffic to the default servers.

Get started with content-based load balancing

You can also use HTTP(S) load balancing with Google Cloud Storage buckets. Once you have your content-based load balancer set up, you can add a Cloud Storage bucket to your load balancer.

Content-based and cross-region load balancing can work together by using multiple backend services and multiple regions. You can build on the scenarios above to create a load balancing configuration that meets your needs.



An HTTP(S) load balancer is composed of several components. The following diagram illustrates the architecture of a complete HTTP(S) load balancer:

Figure: Cross-region load balancing diagram

The following sections describe how the components work together to make up each type of load balancer. For a detailed description of each component, see Components below.

HTTP load balancing

A complete HTTP load balancer is structured as follows:

  1. A global forwarding rule directs incoming requests to a target HTTP proxy.
  2. The target HTTP proxy checks each request against a URL map to determine the appropriate backend service for the request.
  3. The backend service directs each request to an appropriate backend based on serving capacity, zone, and instance health of its attached backends. The health of each backend instance is verified using either an HTTP health check or an HTTPS health check. If the backend service is configured to use the latter, the request will be encrypted on its way to the backend instance.
  4. Sessions between the load balancer and the instance can either be HTTPS or HTTP. If you use HTTPS, each instance in the backend services must have an SSL certificate.

HTTPS load balancing

An HTTPS load balancer has the same basic structure as an HTTP load balancer (described above), but differs in the following ways:

  • An HTTPS load balancer uses a target HTTPS proxy instead of a target HTTP proxy.
  • An HTTPS load balancer requires at least one signed SSL certificate installed on the target HTTPS proxy for the load balancer.
  • The client SSL session terminates at the load balancer.
  • HTTPS load balancers support the QUIC transport layer protocol.


Global forwarding rules and addresses

Global forwarding rules route traffic by IP address, port, and protocol to a load balancing configuration consisting of a target proxy, URL map, and one or more backend services.

Each global forwarding rule provides a single global IP address that can be used in DNS records for your application. No DNS-based load balancing is required. You can either specify the IP address to be used or let Google Compute Engine assign one for you.

Target proxies

Target proxies terminate HTTP(S) connections from clients. They are referenced by one or more global forwarding rules and route incoming requests to a URL map.

The proxies set HTTP request/response headers as follows:

  • Via: 1.1 google (requests and responses)
  • X-Forwarded-Proto: [http | https] (requests only)
  • X-Forwarded-For: <unverified IP(s)>, <immediate client IP>, <global forwarding rule external IP>, <proxies running in GCP> (requests only)
    A comma-separated list of IP addresses appended by the intermediaries the request traveled through. If you are running proxies inside GCP that append data to the X-Forwarded-For header, then your software must take into account the existence and number of those proxies. Only the <immediate client IP> and <global forwarding rule external IP> entries are provided by the load balancer. All other entries in the list are passed along without verification. The <immediate client IP> entry is the client that connected directly to the load balancer. The <global forwarding rule external IP> entry is the external IP address of the load balancer's forwarding rule. If there are more entries than that, then the first entry in the list is the address of the original client. Other entries before the <immediate client IP> entry represent other proxies that forwarded the request along to the load balancer.
  • X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only)
    Parameters for Stackdriver Trace.
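
The X-Forwarded-For layout described above can be read from the right-hand side of the list. The following is a minimal sketch, not an official parser: the function name, the ForwardedFor type, and the gcp_proxy_count parameter are hypothetical, and the caller is assumed to know how many of its own GCP proxies append entries after the load balancer.

```python
from typing import List, NamedTuple


class ForwardedFor(NamedTuple):
    unverified: List[str]     # entries passed along without verification
    immediate_client: str     # client that connected directly to the load balancer
    forwarding_rule_ip: str   # external IP address of the global forwarding rule
    gcp_proxies: List[str]    # entries appended by your own proxies inside GCP


def parse_x_forwarded_for(header: str, gcp_proxy_count: int = 0) -> ForwardedFor:
    """Split an X-Forwarded-For value using the ordering described in the text.

    gcp_proxy_count is an assumption supplied by the caller: the number of
    proxies running inside GCP that append to the header after the load balancer.
    """
    entries = [part.strip() for part in header.split(",") if part.strip()]
    if gcp_proxy_count < 0 or len(entries) < gcp_proxy_count + 2:
        raise ValueError("header has fewer entries than expected")
    # Index of the <global forwarding rule external IP> entry.
    lb_index = len(entries) - gcp_proxy_count - 1
    return ForwardedFor(
        unverified=entries[: lb_index - 1],
        immediate_client=entries[lb_index - 1],
        forwarding_rule_ip=entries[lb_index],
        gcp_proxies=entries[lb_index + 1:],
    )
```

Only immediate_client and forwarding_rule_ip come from the load balancer itself. Treat everything in unverified as client-supplied data.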

You can create custom request headers if the default headers do not meet your needs. For more information on this feature, see User-defined request headers.

URL maps

URL maps define matching patterns for URL-based routing of requests to the appropriate backend services. A default service is defined to handle any requests that do not match a specified host rule or path matching rule. In some situations, such as the cross-region load balancing example, you might not define any URL rules and rely only on the default service. For content-based routing of traffic, the URL map allows you to divide your traffic by examining the URL components to send requests to different sets of backends.
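
As a rough illustration of the routing decision a URL map expresses, the sketch below models host rules, path rules, and a default service with plain Python data. It is a deliberately simplified model: the dictionary layout, the service names, and the longest-prefix rule are assumptions made for this example and do not reproduce the actual URL map resource schema or its full matching semantics.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a URL map: host -> list of (path prefix, service).
UrlMap = Dict[str, object]

EXAMPLE_URL_MAP: UrlMap = {
    "default_service": "web-backend-service",
    "host_rules": {
        "example.com": [
            ("/video/", "video-backend-service"),
            ("/video/live/", "live-backend-service"),
        ],
    },
}


def select_backend_service(url_map: UrlMap, host: str, path: str) -> str:
    """Return the service for a request, falling back to the default service."""
    path_rules: List[Tuple[str, str]] = url_map["host_rules"].get(host, [])
    best_prefix, best_service = "", None
    for prefix, service in path_rules:
        # Assumption: the longest matching path prefix wins.
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_service = prefix, service
    if best_service is None:
        return url_map["default_service"]
    return best_service
```

With this model, a request for example.com/video/clip.mp4 goes to the video service, while any unmatched host or path falls through to the default service. The cross-region example corresponds to a map with no host rules at all.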

SSL certificates

If you are using HTTPS load balancing, you must install one or more SSL certificates on the target HTTPS proxy. You can have up to ten (10) SSL certificates installed. They are used by target HTTPS proxies to authenticate communications between the load balancer and the client.

For more information on installing SSL certificates, see SSL Certificates.

If you are using HTTPS from the load balancer to the backends, you must install SSL certificates on each VM instance. To install SSL certificates on a VM instance, use the instructions in your application documentation.

SSL policies

SSL policies give you the ability to control the features of SSL that your HTTPS load balancer negotiates with HTTPS clients.

By default, HTTPS load balancing uses a set of SSL features that provides good security and wide compatibility. Some applications require more control over which SSL versions and ciphers are used for their HTTPS or SSL connections. You can define SSL policies that control the features of SSL that your load balancer negotiates and associate an SSL policy with your target HTTPS proxy.

Backend services

Backend services direct incoming traffic to one or more attached backends. Each backend is composed of an instance group and additional serving capacity metadata. Backend serving capacity can be based on CPU or requests per second (RPS).

Each backend service also specifies which health checks will be performed against the available instances.

HTTP(S) load balancing supports Compute Engine Autoscaler, which allows users to perform autoscaling on the instance groups in a backend service. For more information, see Scaling Based on HTTP load balancing serving capacity.

You can enable connection draining on backend services to ensure minimal interruption to your users when an instance that is serving traffic is terminated, removed manually, or removed by an autoscaler. To learn more about connection draining, read the Enabling Connection Draining documentation.

Backend buckets

Backend buckets direct incoming traffic to Google Cloud Storage buckets. See Adding a Cloud Storage bucket to content-based load balancing for an example of adding buckets to an existing load balancer setup.

Firewall rules

You must create a firewall rule that allows traffic from the load balancer's source IP address ranges to reach your instances. These are the IP address ranges that the load balancer uses to connect to backend instances. This rule allows traffic from both the load balancer and the health checker. The rule must allow traffic on the port your global forwarding rule has been configured to use, and your health checker should be configured to use the same port. If your health checker uses a different port, then you must create another firewall rule for that port.

Note that firewall rules block and allow traffic at the instance level, not at the edges of the network. They cannot prevent traffic from reaching the load balancer itself.

If you need to determine external IP addresses at a particular time, use the instructions in the Google Compute Engine FAQ.

Load distribution algorithm

HTTP(S) load balancing provides two methods of determining instance load. Within the backend service object, the balancingMode property selects between the requests per second (RPS) and CPU utilization modes. Both modes allow a maximum value to be specified; the HTTP(S) load balancer tries to keep load under the limit, but short bursts above the limit can occur during failover or load spike events.

Incoming requests are sent to the region closest to the user, provided that region has available capacity. If more than one zone is configured with backends in a region, the traffic is distributed across the instance groups in each zone according to each group's capacity. Within the zone, the requests are spread evenly over the instances using a round-robin algorithm. Round-robin distribution can be overridden by configuring session affinity.
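
Two of the steps above can be sketched in a few lines: choosing the closest region that still has capacity, and spreading requests over instances in round-robin order. This is an illustrative model only. The function names, the notion of "free capacity" as a simple integer, and the region ordering are assumptions, and the sketch leaves out the capacity-weighted split across zones and session affinity.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Optional


def pick_region(regions_by_distance: List[str],
                free_capacity: Dict[str, int]) -> Optional[str]:
    """Return the nearest region with spare capacity, or None if all are full.

    regions_by_distance must be ordered nearest-first for the user in question.
    """
    for region in regions_by_distance:
        if free_capacity.get(region, 0) > 0:
            return region
    return None


def round_robin(instances: List[str]) -> Iterator[str]:
    """Yield instances in a repeating, evenly spread order."""
    return cycle(instances)
```

For example, with regions ordered as europe-west1, us-central1, asia-east1 and no free capacity in europe-west1, pick_region returns us-central1, and successive calls to next() on round_robin(["vm-a", "vm-b", "vm-c"]) return vm-a, vm-b, vm-c, vm-a, and so on.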

Session affinity

Session affinity sends all requests from the same client to the same virtual machine instance as long as the instance stays healthy and has capacity.

GCP HTTP(S) Load Balancing offers two types of session affinity: client IP affinity and generated cookie affinity.

WebSocket proxy support

The HTTP(S) load balancer has native support for the WebSocket protocol. Backends that use WebSocket to communicate with clients can use the HTTP(S) load balancer as a front end, for scale and availability. The load balancer does not need any additional configuration to proxy WebSocket connections.

The WebSocket protocol, which is defined in RFC 6455, provides a full-duplex communication channel between clients and servers. The channel is initiated from an HTTP(S) request.

When the HTTP(S) load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client and the request is followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend does not return a successful Upgrade response, the load balancer closes the connection.

The timeout for a WebSocket connection depends on the configurable response timeout of the load balancer, which is 30 seconds by default. This timeout is applied to WebSocket connections regardless of whether they are in use or not. For more information about the response timeout and how to configure it, refer to Timeouts and retries.

If you have configured either client IP or generated cookie session affinity for your HTTP(S) load balancer, all WebSocket connections from a client are sent to the same backend instance, provided the instance continues to pass health checks and has capacity.

QUIC protocol support for HTTPS load balancing

HTTPS load balancing supports the QUIC protocol in connections between the load balancer and the clients. QUIC is a transport layer protocol that provides congestion control similar to TCP and security equivalent to SSL/TLS for HTTP/2, with improved performance. QUIC allows faster client connection initiation, eliminates head-of-line blocking in multiplexed streams, and supports connection migration when a client's IP address changes.

QUIC affects connections between clients and the load balancer, not connections between the load balancer and backends.

The target proxy's QUIC override setting allows you to enable one of the following:

  • When possible, negotiate QUIC for a load balancer OR
  • Always disable QUIC for a load balancer.

If you specify no QUIC override, you allow Google to manage when QUIC is used. When no override is specified, Google does not enable QUIC. For information on enabling and disabling QUIC support, see Target Proxies.

How QUIC is Negotiated

When you enable QUIC, the load balancer can advertise its QUIC capability to clients, allowing clients that support QUIC to attempt to establish QUIC connections with the HTTPS load balancer. Properly implemented clients always fall back to HTTPS or HTTP/2 when they cannot establish a QUIC connection. Because of this fallback, enabling or disabling QUIC in the load balancer does not disrupt the load balancer’s ability to connect to clients.

When you have QUIC enabled in your HTTPS load balancer, some circumstances can cause your client to fall back to HTTPS or HTTP/2 instead of negotiating QUIC. These include:

  • When a client supports versions of QUIC that are not compatible with the QUIC versions supported by the HTTPS load balancer.
  • When the load balancer detects that UDP traffic is blocked or rate limited in a way that would prevent QUIC from working.
  • When QUIC is temporarily disabled for HTTPS load balancers in response to bugs, vulnerabilities, or other concerns.

When a connection falls back to HTTPS or HTTP/2 because of these circumstances, we do not count this as a failure of the load balancer.

Ensure that the above described behaviors are acceptable for your workloads before you enable QUIC.


Your HTTP(S) load balancing service can be configured and updated through the following interfaces:

  • The gcloud command-line tool: a command-line tool included in the Cloud SDK. The HTTP(S) load balancing documentation calls on this tool frequently to accomplish tasks. For a complete overview of the tool, see the gcloud Tool Guide. You can find commands related to load balancing in the gcloud compute command group.

    You can also get detailed help for any gcloud command by using the --help flag:

    gcloud compute http-health-checks create --help
  • The Google Cloud Platform Console: Load balancing tasks can be accomplished through the Google Cloud Platform Console.

  • The REST API: All load balancing tasks can be accomplished using the Google Compute Engine API. The API reference docs describe the resources and methods available to you.

TLS support

By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, and 1.2 when terminating client SSL requests. You can use SSL policies to change this default behavior and control how the load balancer negotiates SSL with clients.

When the load balancer uses HTTPS as a backend service protocol, it can negotiate TLS 1.0, 1.1, or 1.2 to the backend.

Timeouts and retries

HTTP(S) load balancing has two distinct types of timeouts:

  • A configurable response timeout, which represents the amount of time the load balancer will wait for your backend to return a complete response. It is not an idle (keepalive) timeout. This timeout is configurable by modifying the timeout setting for your backend service. The default value is 30 seconds. Consider increasing this timeout under these circumstances:

    • If you expect a backend to take longer to return HTTP responses, or
    • If the connection is upgraded to a WebSocket.
  • A TCP session timeout, whose value is fixed at 10 minutes (600 seconds). This session timeout is sometimes called a keepalive or idle timeout, and its value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend. This timeout does not apply to WebSockets.

This table illustrates changes necessary to modify keepalive timeouts for common web server software:

Web Server Software | Parameter | Default Setting | Recommended Setting
Apache | KeepAliveTimeout | KeepAliveTimeout 5 | KeepAliveTimeout 620
nginx | keepalive_timeout | keepalive_timeout 75s; | keepalive_timeout 620s;

HTTP(S) load balancing retries failed GET requests in certain circumstances, such as when the response timeout is exhausted. It does not retry failed POST requests. Retries are limited to two attempts. Retried requests only generate one log entry for the final response. Refer to Logging for more information.
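
The retry rules above can be modeled roughly as follows. This is a sketch under stated assumptions, not the load balancer's actual logic: it reads "two attempts" as at most two retries after the initial try, treats any 5XX status as a retryable failure, and uses a hypothetical send callable to stand in for the backend.

```python
from typing import Callable, Tuple

MAX_RETRIES = 2  # assumption: two retries after the first try


def proxy_with_retries(method: str, send: Callable[[], int]) -> Tuple[int, int]:
    """Call send() until it succeeds or retries run out.

    Returns (final_status, attempts). Only GET requests are retried;
    POST (and, in this sketch, every other method) gets a single attempt.
    """
    retries_allowed = MAX_RETRIES if method.upper() == "GET" else 0
    attempts = 0
    while True:
        attempts += 1
        status = send()
        failed = status >= 500  # hypothetical failure test for illustration
        if not failed or attempts > retries_allowed:
            return status, attempts
```

Because only the final response is logged, a GET that succeeds on its second attempt would appear as a single successful log entry.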

Illegal request handling

The HTTP(S) load balancer blocks client requests from reaching the backend for a number of reasons: some strictly for HTTP/1.1 compliance and others to avoid unexpected data being passed to the backends.

The load balancer blocks the following for HTTP/1.1 compliance:

  • It cannot parse the first line of the request.
  • A header is missing the : delimiter.
  • Headers or the first line contain invalid characters.
  • The content length is not a valid number, or there are multiple content length headers.
  • There are multiple transfer encoding keys, or there are unrecognized transfer encoding values.
  • There's a non-chunked body and no content length specified.
  • Body chunks are unparseable. This is the only case where some data will make it to the backend. The load balancer will close the connections to client and backend when it receives an unparseable chunk.

The load balancer also blocks the request if any of the following are true:

  • The combination of request URL and headers is longer than about 15KB.
  • The request method does not allow a body, but the request has one.
  • The request contains an Upgrade header that is not used to enable a WebSocket connection.
  • The HTTP version is unknown.
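
To make a few of these checks concrete, the sketch below applies a subset of them to an already-split request. It is illustrative only and does not reproduce the load balancer's parser. The function name, the set of methods treated as not allowing a body, the accepted version strings, and the exact byte limit (the text says only "about 15KB") are all assumptions.

```python
from typing import List, Optional

MAX_URL_AND_HEADERS_BYTES = 15 * 1024     # assumption: "about 15KB"
KNOWN_VERSIONS = {"HTTP/1.0", "HTTP/1.1"}  # assumption for this sketch
BODYLESS_METHODS = {"GET", "HEAD"}         # assumption for this sketch


def rejection_reason(request_line: str, header_lines: List[str],
                     body: bytes = b"") -> Optional[str]:
    """Return why a request would be blocked, or None if it passes these checks."""
    parts = request_line.split(" ")
    if len(parts) != 3:
        return "cannot parse the first line of the request"
    method, url, version = parts
    if version not in KNOWN_VERSIONS:
        return "unknown HTTP version"
    content_lengths = []
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            return "header is missing the ':' delimiter"
        if name.strip().lower() == "content-length":
            content_lengths.append(value.strip())
    if len(content_lengths) > 1:
        return "multiple content length headers"
    if content_lengths:
        value = content_lengths[0]
        if not (value.isascii() and value.isdigit()):
            return "content length is not a valid number"
    size = len(url.encode()) + sum(len(h.encode()) for h in header_lines)
    if size > MAX_URL_AND_HEADERS_BYTES:
        return "request URL and headers are too long"
    if body and method in BODYLESS_METHODS:
        return "method does not allow a body"
    return None
```

A well-formed request such as "GET /index.html HTTP/1.1" with a "Host: example.com" header returns None, while a header line without a colon is rejected before any later check runs.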


Each HTTP(S) request is logged temporarily via Stackdriver Logging. If you have been accepted into the Alpha testing phase, logging is automatic and does not need to be enabled.

How to view logs

To view logs, go to the Logs Viewer.

HTTP(S) logs are indexed first by forwarding rule, then by URL map.

  • To see all logs, in the first pull-down menu select Cloud HTTP Load Balancer > All forwarding_rule_name.
  • To see logs for just one forwarding rule, select a single forwarding rule name from the list.
  • To see logs for just one URL map used by a forwarding rule, highlight a forwarding rule and choose the URL map of interest.

Log fields of type boolean typically only appear if they have a value of true. If a boolean field has a value of false, that field is omitted from the log.

UTF-8 encoding is enforced for log fields. Characters that are not UTF-8 characters are replaced with question marks.

You can configure export of Stackdriver logs-based metrics for Cloud HTTP(S) Load Balancer resource logs (resource.type=http_load_balancer). The metrics created are based on the "Google Cloud HTTP Load Balancing Rule (Logs-based Metrics)" resource (l7_lb_rule), which is available under Stackdriver monitoring dashboards instead of under the https_lb_rule resource.

What is logged

HTTP(S) load balancing log entries contain information useful for monitoring and debugging your HTTP(S) traffic. Log entries contain the following types of information:

  • General information shown in most GCP logs, such as severity, project ID, project number, timestamp, and so on.
  • HttpRequest log fields. However, HttpRequest.protocol is not populated for HTTP(S) Load Balancing Stackdriver logs.
  • The trace ID and span ID are logged from the X-Cloud-Trace-Context sent to the backend. The trace field is formatted as "projects/[PROJECT_ID]/traces/[TRACE_ID]", and the spanId field is the span ID value, but formatted as a 16-character hexadecimal string.
  • A statusDetails field inside the structPayload. This field holds a string that explains why the load balancer returned the HTTP status that it did. The tables below contain further explanations of these log strings.
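
The relationship between the X-Cloud-Trace-Context header and the trace and spanId log fields can be sketched as below. The function name and project ID are hypothetical, and the sketch assumes the span ID in the header is a decimal integer, which is what makes the 16-character hexadecimal reformatting necessary.

```python
from typing import Dict


def trace_log_fields(header: str, project_id: str) -> Dict[str, str]:
    """Derive the trace and spanId log fields from an X-Cloud-Trace-Context value.

    Expected header shape: <trace-id>/<span-id>;<trace-options>
    """
    trace_and_span = header.split(";", 1)[0]   # drop ";<trace-options>" if present
    trace_id, _, span_id = trace_and_span.partition("/")
    fields = {"trace": f"projects/{project_id}/traces/{trace_id}"}
    if span_id:
        # Assumption: the header carries the span ID in decimal.
        fields["spanId"] = format(int(span_id), "016x")
    return fields
```

For a header carrying span ID 255, the resulting spanId field is 00000000000000ff.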

statusDetail HTTP success messages

statusDetails (successful) | Meaning | Common Accompanying Response Codes
byte_range_caching | The HTTP request was served using byte range caching. | Any cachable response code is possible.
response_from_cache | The HTTP request was served from cache. | Any cachable response code is possible.
response_from_cache_validated | The return code was set from a cached entry that was validated by a backend. | Any cachable response code is possible.
response_sent_by_backend | The HTTP request was proxied successfully to the backend. | Returned from VM backend - any response code is possible.

statusDetail HTTP failure messages

statusDetails (failure) | Meaning | Common Accompanying Response Codes
aborted_request_due_to_backend_early_response | A request with body was aborted due to backend sending an early response with an error code. The response was forwarded to the client. The request was terminated. | 4XX or 5XX
backend_503_propagated_as_error | The backend sent a 503 that the load balancer could not recover from with retries. | 503
backend_connection_closed_after_partial_response_sent | The backend connection closed unexpectedly after a partial response had been sent to the client. | Returned from VM backend - any response code is possible. A 0 indicates the backend did not send full response headers.
backend_connection_closed_before_data_sent_to_client | The backend unexpectedly closed its connection to the load balancer before the response was proxied to the client. This can happen if the load balancer is sending traffic to another entity, such as a third party load balancer running on a VM instance, that has a TCP timeout that is shorter than the HTTP(S) load balancer's 10 minute (600 second) timeout. Manually setting the TCP timeout (keepalive) on the target service to greater than 600 seconds may fix this problem. | 502
backend_early_response_with_non_error_status | The backend sent a non-error response (1XX or 2XX) to a request before receiving the whole request body. | 502
backend_response_corrupted | The HTTP response body sent by the backend has invalid chunked transfer-encoding or is otherwise corrupted. | Any response code possible depending on the nature of the corruption. Often 502.
backend_timeout | The backend timed out while generating a response. | 502
body_not_allowed | The client sent an HTTP request with a body, but the HTTP method used does not allow a body. | 400
byte_range_caching_aborted | The load balancer aborted the response due to the origin server sending a byte range response inconsistent with the partial response already sent to the client. This can be caused by some VM instances returning different ETag or Last-Modified headers than other instances for the same resource. | 2XX
cache_lookup_failed_after_partial_response | The load balancer failed to serve a full response from cache due to an internal error. | 2XX
client_disconnected_after_partial_response | The connection to the client was broken after the load balancer sent a partial response. | Returned from the VM backend - any response code is possible.
client_disconnected_before_any_response | The connection to the client was broken before the load balancer sent any response. | 0
client_timed_out | The load balancer idled out the client connection due to lack of progress while proxying either the request or response. | 0
error_uncompressing_gzipped_body | There was an error uncompressing a gzipped HTTP response. | 503
failed_to_connect_to_backend | The load balancer failed to connect to the backend. | 502
failed_to_pick_backend | The load balancer failed to pick a healthy backend to handle the request. | 502
headers_too_long | The request headers were larger than the maximum allowed. | 413
http_version_not_supported | HTTP version not supported. Currently only HTTP 0.9, 1.0, 1.1, and 2.0 are supported. | 400
http2_server_push_canceled_invalid_response_code | The load balancer canceled the HTTP/2 server push because the backend returned an invalid response code. Can only happen when using http2 to the backend. Client will receive a RST_STREAM containing INTERNAL_ERROR. |
internal_error | Internal error at the load balancer. | 400
invalid_http2_client_header_format | The HTTP/2 headers from client are invalid. | 400
malformed_chunked_body | The request body was improperly chunk encoded. | 411
required_body_but_no_content_length | The HTTP request requires a body but the request headers did not include a content length or transfer-encoding chunked header. | 400 or 403
secure_url_rejected | A request with a https:// URL was received over a plaintext HTTP/1.1 connection. | 400
unsupported_method | The client supplied an unsupported HTTP request method. | 400
upgrade_header_rejected | The client HTTP request contained the Upgrade header and was refused. | 400
uri_too_long | The HTTP request URI was longer than the maximum allowed length. | 414
websocket_closed | The websocket connection was closed. |
websocket_handshake_failed | The websocket handshake failed. | 501

Logging for IP blacklist/whitelist

The following log entries in the Logs Viewer are for HTTP(S) IP blacklist/whitelist logging and include the following structure in jsonPayload. HTTP request details appear in the httpRequest message.

  • status_details (string) - a textual description of the response code
  • enforced_security_policy - the security policy rule that was actually enforced
    • outcome (string) - ACCEPT, DENY or UNKNOWN_OUTCOME
    • configured_action (string) - same as outcome
    • name (string) - name of the security policy
    • priority (number) - matching rule priority
  • preview_security_policy - populated if request hit a rule configured for preview (present only when a preview rule would have taken priority over the enforced rule)
    • outcome (string) - ACCEPT, DENY or UNKNOWN_OUTCOME
    • configured_action (string) - same as outcome
    • name (string) - name of the security policy
    • priority (number) - matching rule priority
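
Once log entries have been exported, the structure above is straightforward to filter. The sketch below assumes each entry is already available as a Python dictionary with the jsonPayload layout just described. The function name is hypothetical, and no Stackdriver client library is involved.

```python
from typing import Any, Dict, Iterable, List


def denied_requests(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the entries whose enforced security policy outcome is DENY."""
    denied = []
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        policy = payload.get("enforced_security_policy", {})
        if policy.get("outcome") == "DENY":
            denied.append(entry)
    return denied
```

The same pattern applies to preview_security_policy if you want to see which requests a preview rule would have affected.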

You can interact with the HTTP(S) Load Balancer logs using the Stackdriver logging API. The logging API provides ways to interactively filter logs that have specific fields set, and export matching logs to Stackdriver Console, Google Cloud Storage, BigQuery, or Cloud Pub/Sub. For more information on the Stackdriver logging API, see Viewing Logs.


HTTP(S) Load Balancing exports monitoring data to Stackdriver. Monitoring metrics can be used to evaluate an HTTP(S) load balancer's configuration, usage, and performance; troubleshoot problems; and improve resource utilization and user experience.

In addition to the predefined dashboards in Stackdriver, you can create custom dashboards, set up alerts, and query the metrics through the Stackdriver monitoring API.

Viewing Stackdriver monitoring dashboards

  1. Go to Stackdriver in the Google Cloud Platform Console.
  2. Select Resources > Google Cloud Load Balancers.
  3. Click the name of your load balancer.

In the left pane, you can see various details for this HTTP(S) load balancer. In the right pane you can see timeseries graphs. Click the Breakdowns link to see specific breakdowns.

Defining Stackdriver alerts

You can define Stackdriver alerts over various HTTP(S) Load Balancing metrics:

  1. Go to Stackdriver in the Google Cloud Platform Console.
  2. Select Alerting > Create a Policy.
  3. Click Add Condition and select condition type.
  4. Select metrics and filters. For metrics, the resource type is HTTP(S) Load Balancer.
  5. Click Save Condition.
  6. Enter policy name and click Save Policy.

Defining Stackdriver custom dashboards

You can create custom Stackdriver dashboards over HTTP(S) Load Balancing metrics:

  1. Go to Stackdriver in the Google Cloud Platform Console.
  2. Select Dashboards > Create Dashboard.
  3. Click Add Chart.
  4. Give the chart a title.
  5. Select metrics and filters. For metrics, the resource type is HTTP(S) Load Balancer.
  6. Click Save.

Metric reporting frequency and retention

Metrics for the HTTP(S) load balancers are exported to Stackdriver in 1-minute granularity batches. Monitoring data is retained for six (6) weeks. The dashboard provides data analysis in default intervals of 1H, 6H, 1D, 1W, and 6W. You can manually request analysis in any interval from 6W to 1 minute.

Monitoring metrics for HTTP(S) load balancers

The following metrics for HTTP(S) load balancers are reported into Stackdriver:

Metric | Description
Request count | The number of requests served by the HTTP(S) load balancer
Request bytes count | The number of bytes sent as requests from users to the HTTP(S) load balancer
Response bytes count | The number of bytes sent as responses from the HTTP(S) load balancer to users
Total latencies | A distribution of the latency measured from the time the first byte of the request was received by the load balancer proxy until the proxy receives an ACK from the requesting client on the last response byte. Total latencies are measured by request/response, so pauses between requests on the same connection using `Connection: keep-alive` do not affect the measurement. This measurement is typically reduced to 95th percentile in Stackdriver views.
    Example: a load balancer has 1 request per second from the UK, all with 100ms latency, and 9 requests per second from the US, all with 50ms latency. Over a certain minute there were 60 requests from the UK and 540 requests from the US. Monitoring metrics preserves the distribution over all dimensions. You can request the following:
      • median overall latency (300/600) - 50ms
      • median UK latency (30/60) - 100ms
      • 95th percentile overall latency (570/600) - 100ms
      • and so on...
Frontend RTT(*) | A distribution of the smoothed RTT measured for each connection between client and proxy (measured by the proxy's TCP stack). Typically reduced to 95th percentile in Stackdriver views.
Backend latencies(*) | A distribution of the latency measured from when the first request byte was sent by the proxy to the backend, until the proxy received from the backend the last byte of the response. Typically reduced to 95th percentile in Stackdriver views.
Response code class fraction | Percentage of total HTTP(S) load balancer responses that are in each response code class (2XX, 4XX, ...). In Stackdriver, this value is only available via default dashboards. It is not available for custom dashboards. You can set alerts for it via the API.

(*) The sum of Frontend RTT and Backend latencies is not guaranteed to be less than or equal to Total latencies. Although RTT is polled over the socket from proxy to client at the time the HTTP response is ACKed, some of these measurements rely on kernel reporting, and the kernel is not guaranteed to have an RTT measurement for the given HTTP response. The end result is a smoothed RTT value that is also affected by previous HTTP responses, SYN/ACKs, and SSL handshakes that do not affect the actual timing of the current HTTP request.
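
The arithmetic in the Total latencies example can be checked with a short sketch. It assumes raw per-request samples and a nearest-rank percentile; Stackdriver stores a distribution rather than raw samples, so this only illustrates the numbers in the example, not how the service computes them. The function name `percentile` is illustrative.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100) in sorted order."""
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# One minute of traffic from the example:
# 60 UK requests at 100 ms and 540 US requests at 50 ms.
uk = [100] * 60
us = [50] * 540
overall = uk + us

median_overall = percentile(overall, 50)  # rank 300 of 600
median_uk = percentile(uk, 50)            # rank 30 of 60
p95_overall = percentile(overall, 95)     # rank 570 of 600

print(median_overall, median_uk, p95_overall)
```

Because 540 of the 600 requests take 50 ms, any rank up to 540 falls in the US group, while rank 570 falls among the 60 slower UK requests.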

Filtering dimensions for HTTP(S) load balancer metrics

Metrics are aggregated for each HTTP(S) load balancer. You can filter aggregated metrics by the following dimensions(*):

Property: Description
BACKEND SCOPE: The GCP scope (region or zone) of the backend service instance group that served the connection.

If no instance group was available or if the request was served by another entity, you will see one of the following values instead of the region or zone of the backend service instance group.
  • FRONTEND_5XX - Because of an internal error or lack of healthy backends, the proxy couldn't assign an instance group to the request. Instead, the proxy returned a 5xx response to the requestor.
  • SERVED_FROM_BACKEND_BUCKET - The request was handled by a backend bucket, not a backend service instance group.
  • SERVED_FROM_CACHE - The request was handled by a proxy cache, so no backend was assigned.
BACKEND ZONE: If the instance group was a zonal instance group, the GCP zone of the instance group that served the user request. (Examples: us-central1-a, europe-west1-b, asia-east1-c)
BACKEND REGION: If the instance group was a regional instance group, the GCP region of the instance group that served the user request. (Examples: us-central1, europe-west1, asia-east1)
PROXY CONTINENT: Continent of the HTTP(S) proxy that terminated the user HTTP(S) connection. (Examples: America, Europe, Asia)
INSTANCE GROUP: The name of the instance group that served the user request.

If no instance group was available or if the request was served by another entity, you will see one of the following values instead of an instance group.
  • FRONTEND_5XX - An internal error occurred before the proxy could select a backend. The proxy returned a 5xx response to the client.
  • INVALID_BACKEND - The proxy could not find a healthy backend to assign the request to, so it returned a 5xx response to the requestor.
  • SERVED_FROM_BACKEND_BUCKET - The request was handled by a backend bucket, not a backend service instance group.
  • SERVED_FROM_CACHE - The request was handled by a proxy cache, so no backend was assigned.
BACKEND SERVICE: The name of the backend service that served the user request.
MATCHED URL RULE: The URL map path rule that matched the prefix of the user HTTP(S) request (up to 50 characters).
FORWARDING RULE: The name of the forwarding rule used by the client to send the request.

(*) Currently the "Response code class fraction" metric is available per entire load balancer only, with no further breakdowns available.
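
When aggregating by the INSTANCE GROUP dimension, it can help to keep the placeholder values listed above apart from real instance group names. The following is a minimal sketch under stated assumptions: the `(value, request_count)` sample shape, the function name `split_by_origin`, and the instance group names are all hypothetical and do not reflect how Stackdriver returns data.

```python
# Placeholder values the INSTANCE GROUP dimension can hold instead of a group name.
PLACEHOLDER_VALUES = {
    "FRONTEND_5XX",
    "INVALID_BACKEND",
    "SERVED_FROM_BACKEND_BUCKET",
    "SERVED_FROM_CACHE",
}

def split_by_origin(samples):
    """Sum request counts per real instance group and per placeholder value.

    `samples` is a hypothetical list of (instance_group_value, request_count) pairs.
    """
    by_group, by_placeholder = {}, {}
    for value, count in samples:
        target = by_placeholder if value in PLACEHOLDER_VALUES else by_group
        target[value] = target.get(value, 0) + count
    return by_group, by_placeholder

samples = [
    ("ig-us-central1", 540),   # hypothetical instance group names
    ("ig-europe-west1", 60),
    ("SERVED_FROM_CACHE", 25),
    ("FRONTEND_5XX", 3),
    ("ig-us-central1", 10),
]
by_group, by_placeholder = split_by_origin(samples)
print(by_group)
print(by_placeholder)
```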

Notes and Restrictions

  • HTTP(S) load balancing supports the HTTP/1.1 100 Continue response.
  • If your load balanced instances are running a public operating system image supplied by Compute Engine, then firewall rules in the operating system will be configured automatically to allow load balanced traffic. If you are using a custom image, you have to configure the operating system firewall manually. This is separate from the GCP firewall rule that must be created as part of configuring an HTTP(S) load balancer.
  • Load balancing does not keep instances in sync. You must set up your own mechanisms, such as Deployment Manager, to ensure that your instances have consistent configurations and data.
  • The HTTP(S) load balancer does not support HTTP DELETE requests that include a body. Such requests receive the error message: Error 400 (Bad Request)!! Your client has issued a malformed or illegal request. Only DELETE requests without bodies are supported.


Load balanced traffic does not have a source address of the original client

Traffic from the load balancer to your instances comes from IP addresses in ranges that belong to the load balancer. When viewing logs on your load balanced instances, you will not see the source address of the original client. Instead, you will see source addresses from these ranges.

Getting a permission error when trying to view an object in my Cloud Storage bucket

In order to serve objects through load balancing, the Cloud Storage objects must be publicly accessible. Make sure to update the permissions of the objects being served so they are publicly readable.

URL doesn’t serve expected Cloud Storage object

The Cloud Storage object to serve is determined based on your URL map and the URL that you request. If the request path maps to a backend bucket in your URL map, the Cloud Storage object is determined by appending the full request path onto the Cloud Storage bucket that the URL map specifies.

For example, if you map /static/* to gs://[EXAMPLE_BUCKET], the request to https://<GCLB IP or Host>/static/path/to/content.jpg will try to serve gs://[EXAMPLE_BUCKET]/static/path/to/content.jpg. If that object doesn’t exist, you will get the following error message instead of the object:

The specified key does not exist.
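
The path mapping described above can be sketched as a small helper. This is an illustration only: the function name `object_for_url`, the bucket name, and the IP address are assumptions, and the sketch ignores details such as percent-encoding. The point it shows is that the matched prefix (here /static) is kept, not stripped, when the object name is built.

```python
from urllib.parse import urlsplit

def object_for_url(bucket, url):
    """Return the gs:// URI that a backend bucket would try to serve for `url`.

    The full request path is appended to the bucket name; query strings and
    fragments are not part of the object name in this sketch.
    """
    path = urlsplit(url).path
    if not path.startswith("/"):
        path = "/" + path
    return "gs://" + bucket + path

# Hypothetical bucket name and load balancer address.
uri = object_for_url("example-bucket", "https://203.0.113.10/static/path/to/content.jpg")
print(uri)  # gs://example-bucket/static/path/to/content.jpg
```

If the object was uploaded as gs://example-bucket/path/to/content.jpg (without the static/ prefix), the request above would not find it.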

Compression isn't working

HTTP(S) Load Balancing does not compress or decompress responses itself, but it can serve responses generated by your backend service that are compressed via tools such as gzip or DEFLATE.

If responses served by HTTP(S) Load Balancing are not compressed but should be, check that the web server software running on your instances is configured to compress responses. By default, some web server software will automatically disable compression for requests that include a Via header. The presence of a Via header indicates the request was forwarded by a proxy. Because it is a proxy, HTTP(S) Load Balancing adds a Via header to each request as required by the HTTP specification. To enable compression, you may have to override your web server's default configuration to tell it to compress responses even if the request had a Via header.

If you are using the nginx web server software, modify the nginx.conf configuration file to enable compression. The location of this file depends on where nginx is installed. In many Linux distributions, the file is stored at /etc/nginx/nginx.conf. To allow nginx compression to work with HTTP(S) load balancing, add the following two lines to the http section of nginx.conf:

gzip_proxied any;
gzip_vary on;

The first line enables compression even for requests forwarded by a proxy like HTTP(S) Load Balancing. The second line adds a Vary: Accept-Encoding header to responses. Vary: Accept-Encoding notifies caching proxies such as Cloud CDN that they should maintain separate cache entries for compressed and non-compressed variants of compressible resources.

After modifying nginx.conf, you need to restart nginx before it will use the new configuration. In many Linux distributions, nginx can be restarted by running sudo service nginx restart or /etc/init.d/nginx restart.
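
After the restart, one way to confirm the change is to inspect the headers of a response that was requested with Accept-Encoding: gzip. The sketch below assumes you have already collected the response headers into a dictionary by whatever client you use; the function name `compression_report` is hypothetical.

```python
def compression_report(response_headers):
    """Summarize whether a response looks compressed and cache-safe.

    `response_headers` is a dict of header name to value; names are matched
    case-insensitively.
    """
    headers = {name.lower(): value for name, value in response_headers.items()}
    encoding = headers.get("content-encoding", "").strip().lower()
    vary_tokens = [token.strip().lower() for token in headers.get("vary", "").split(",")]
    return {
        # gzip_proxied any: the response should carry a compressed encoding.
        "compressed": encoding in ("gzip", "deflate"),
        # gzip_vary on: the response should include Vary: Accept-Encoding.
        "varies_on_accept_encoding": "accept-encoding" in vary_tokens,
    }

report = compression_report({"Content-Encoding": "gzip", "Vary": "Accept-Encoding"})
print(report)
```

If "compressed" is still false, the web server may be skipping compression for another reason, such as the response content type or size.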
