Internal HTTP(S) load balancer overview

A Google Cloud internal HTTP(S) load balancer is a proxy-based, regional Layer 7 load balancer that enables you to run and scale your services behind an internal IP address.

Internal HTTP(S) load balancers distribute HTTP and HTTPS traffic to backends hosted on Compute Engine, Google Kubernetes Engine (GKE), and Cloud Run. The load balancer is accessible only in the chosen region of your Virtual Private Cloud (VPC) network on an internal IP address.

An internal HTTP(S) load balancer is a managed service based on the open source Envoy proxy. This enables rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been configured, it automatically allocates Envoy proxies to meet your traffic needs.

At a high level, an internal HTTP(S) load balancer consists of:

  • An internal IP address to which clients send traffic. Only clients that are located in the same region as the load balancer can access this IP address. Internal client requests stay internal to your network and region.
  • One or more backend services to which the load balancer forwards traffic. Backends can be Compute Engine VMs, groups of Compute Engine VMs (through instance groups), Cloud Run applications, or GKE nodes (through network endpoint groups [NEGs]). These backends must be located in the same region as the load balancer.
Internal services with Layer 7-based load balancing

For limitations specific to internal HTTP(S) load balancers, see the Limitations section.

For information about how the Google Cloud load balancers differ from each other, see the following documents:

Use cases

Internal HTTP(S) load balancers address many use cases. This section provides a few high-level examples. For additional examples, see traffic management use cases.

Three-tier web services

You can use internal HTTP(S) load balancers to support traditional three-tier web services. The following example shows how you can use three types of Google Cloud load balancers to scale three tiers. At each tier, the load balancer type depends on your traffic type.

The diagram shows how traffic moves through the tiers:

  1. An external HTTP(S) load balancer distributes traffic from the internet to a set of web frontend instance groups in various regions.
  2. These frontends send the HTTP(S) traffic to a set of regional, internal HTTP(S) load balancers (the subject of this overview).
  3. The internal HTTP(S) load balancers distribute the traffic to middleware instance groups.
  4. These middleware instance groups send the traffic to internal TCP/UDP load balancers, which load balance the traffic to data storage clusters.
Layer 7-based routing for internal tiers in a multi-tier app

Load balancing using path-based routing

One common use case is load balancing traffic among services. In this example, an internal client can request video and image content by using the same base URL, mygcpservice.internal, with the paths /video and /images.

The internal HTTP(S) load balancer's URL map specifies that requests to path /video should be sent to the video backend service, while requests to path /images should be sent to the images backend service. In the following example, the video and images backend services are served by using Compute Engine VMs, but they can also be served by using GKE pods.

When an internal client sends a request to the load balancer's internal IP address, the load balancer evaluates the request according to this logic and sends the request to the correct backend service.
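
As a sketch of how this routing could be configured with the gcloud CLI, a regional URL map with a path matcher might look like the following. The resource names (int-lb-url-map, video-backend-service, images-backend-service) and the region us-west1 are assumptions for illustration:

    # Create the regional URL map with a default backend service.
    gcloud compute url-maps create int-lb-url-map \
        --default-service=video-backend-service \
        --region=us-west1

    # Route /video and /images (and their subpaths) to their backend services.
    gcloud compute url-maps add-path-matcher int-lb-url-map \
        --path-matcher-name=media-paths \
        --default-service=video-backend-service \
        --path-rules='/video=video-backend-service,/video/*=video-backend-service,/images=images-backend-service,/images/*=images-backend-service' \
        --new-hosts=mygcpservice.internal \
        --region=us-west1

Requests whose paths match /video or /images are sent to the corresponding backend service; any other path falls through to the default service.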

The following diagram illustrates this use case.

Internal (micro) services with Layer 7-based load balancing

Modernizing legacy services

Internal HTTP(S) load balancers can be an effective tool for modernizing legacy applications.

One example of a legacy application is a large monolithic application that you cannot easily update. In this case, you can deploy an internal HTTP(S) load balancer in front of your legacy application. You can then use the load balancer's traffic control capabilities to direct a subset of traffic to new microservices that replace the functionality that your legacy application provides.

To begin, you would configure the load balancer's URL map to route all traffic to the legacy application by default. This maintains the existing behavior. As replacement services are developed, you would update the URL map to route portions of traffic to these replacement services.

Imagine that your legacy application contains some video processing functionality that is served when internal clients send requests to /video. You could break this video service out into a separate microservice as follows:

  1. Add an internal HTTP(S) load balancer in front of your legacy application.
  2. Create a replacement video processing microservice.
  3. Update the load balancer's URL map so that all requests to path /video are routed to the new microservice instead of to the legacy application.

As you develop additional replacement services, you would continue to update the URL map. Over time, fewer requests would be routed to the legacy application. Eventually, replacement services would exist for all the functionality that the legacy application provided. At this point, you could retire your legacy application.
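
For example, routing a portion of /video traffic to the new microservice can be expressed as a weighted split in the URL map and applied with gcloud compute url-maps import. This is a minimal sketch; the project ID, region, and the backend service names legacy-app-backend and video-microservice-backend are assumptions:

    # int-lb-url-map.yaml
    name: int-lb-url-map
    defaultService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/legacy-app-backend
    hostRules:
    - hosts:
      - '*'
      pathMatcher: matcher1
    pathMatchers:
    - name: matcher1
      defaultService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/legacy-app-backend
      routeRules:
      - priority: 1
        matchRules:
        - prefixMatch: /video
        routeAction:
          weightedBackendServices:
          # Send 10% of /video traffic to the new microservice, the rest to the monolith.
          - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/video-microservice-backend
            weight: 10
          - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/legacy-app-backend
            weight: 90

    # Apply the URL map.
    gcloud compute url-maps import int-lb-url-map \
        --source=int-lb-url-map.yaml \
        --region=us-west1

Raising the weight over time shifts more traffic to the microservice; setting it to 100 completes the migration of the /video path.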

Three-tier web services with global access

If you enable global access, your web-tier client VMs can be in another region.

This multi-tier application example shows the following:

  • A globally-available internet-facing web tier that load balances traffic with an external HTTP(S) load balancer.
  • An internal backend load-balanced database tier in the us-east1 region that is accessed by the global web tier.
  • A client VM that is part of the web tier in the europe-west1 region that accesses the internal load-balanced database tier located in us-east1.
Three-tier web app with an external HTTP(S) load balancer, global access, and an internal HTTP(S) load balancer

Load balancing with hybrid connectivity

Cloud Load Balancing supports load balancing traffic to endpoints that extend beyond Google Cloud, such as on-premises data centers and other public clouds, which you can reach by using hybrid connectivity.

The following diagram demonstrates a hybrid deployment with an internal HTTP(S) load balancer.

Hybrid connectivity with internal HTTP(S) load balancers

Private Service Connect

You can use an internal HTTP(S) load balancer to send requests to supported regional Google APIs and services. For more information, see Access Google APIs through backends.

Load balancing for GKE applications

If you are building applications in GKE, we recommend that you use the built-in GKE Gateway controller or the GKE Ingress controller, which deploys Google Cloud load balancers on behalf of GKE users. This is the same as the standalone load balancing architecture described on this page, except that its lifecycle is fully automated and controlled by GKE.

Related GKE documentation:

Architecture and resources

The following diagram shows the Google Cloud resources required for an internal HTTP(S) load balancer.

Internal HTTP(S) load balancer components

Each internal HTTP(S) load balancer uses these Google Cloud configuration resources:

Proxy-only subnet

In the diagram above, the proxy-only subnet provides a set of IP addresses that Google uses to run Envoy proxies on your behalf. You must create a proxy-only subnet in each region of a VPC network where you use internal HTTP(S) load balancers. All your internal HTTP(S) load balancers in a region and VPC network share the same proxy-only subnet because all internal HTTP(S) load balancers in the region and VPC network share a pool of Envoy proxies. Further:

  • Proxy-only subnets are only used for Envoy proxies, not your backends.
  • Backend VMs or endpoints of all internal HTTP(S) load balancers in a region and VPC network receive connections from the proxy-only subnet.
  • The IP address of an internal HTTP(S) load balancer is not located in the proxy-only subnet. The load balancer's IP address is defined by its internal managed forwarding rule, which is described below.
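
For reference, a proxy-only subnet can be created with the gcloud CLI. The network name, region, and IP range below are assumptions for illustration:

    gcloud compute networks subnets create proxy-only-subnet \
        --purpose=REGIONAL_MANAGED_PROXY \
        --role=ACTIVE \
        --region=us-west1 \
        --network=lb-network \
        --range=10.129.0.0/23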

Forwarding rule and IP address

An internal managed forwarding rule specifies an internal IP address, port, and regional target HTTP(S) proxy. Clients use the IP address and port to connect to the load balancer's Envoy proxies. The forwarding rule's IP address is the IP address of the load balancer (sometimes called a virtual IP address or VIP).

Clients connecting to an internal HTTP(S) load balancer must use HTTP version 1.1 or later. For the complete list of supported protocols, see Load balancer features.

The internal IP address associated with the forwarding rule can come from any subnet in the same network and region. Note the following conditions:

  • The IP address can (but does not need to) come from the same subnet as the backend instance groups.
  • The IP address must not come from the reserved proxy-only subnet that has its --purpose flag set to REGIONAL_MANAGED_PROXY.
  • If you want to share an internal IP address with multiple forwarding rules, set the IP address's --purpose flag to SHARED_LOADBALANCER_VIP.

Each forwarding rule that you use in an internal HTTP(S) load balancer can reference exactly one TCP port. For HTTP load balancers, use either port 80 or 8080; for HTTPS load balancers, use port 443.
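
As a sketch, reserving an internal IP address and creating the forwarding rule with the gcloud CLI might look like the following. The subnet, target proxy, and other resource names are assumptions, and the target HTTP proxy is assumed to already exist:

    # Reserve an internal IP address in a regular (non-proxy-only) subnet.
    gcloud compute addresses create int-lb-ip \
        --region=us-west1 \
        --subnet=backend-subnet

    # Create the forwarding rule that clients connect to on port 80.
    gcloud compute forwarding-rules create int-lb-forwarding-rule \
        --load-balancing-scheme=INTERNAL_MANAGED \
        --network=lb-network \
        --subnet=backend-subnet \
        --address=int-lb-ip \
        --ports=80 \
        --region=us-west1 \
        --target-http-proxy=int-lb-http-proxy \
        --target-http-proxy-region=us-west1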

Forwarding rules and global access

An internal HTTP(S) load balancer's forwarding rules are regional, even when global access is enabled. After you enable global access, the regional internal forwarding rule's allowGlobalAccess flag is set to true.
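
For example, global access can be enabled on an existing forwarding rule with the gcloud CLI (the rule name and region are assumptions):

    gcloud compute forwarding-rules update int-lb-forwarding-rule \
        --region=us-west1 \
        --allow-global-access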

Target proxy

A regional target HTTP(S) proxy terminates HTTP(S) connections from clients. The HTTP(S) proxy consults the URL map to determine how to route traffic to backends. A target HTTPS proxy uses an SSL certificate to authenticate itself to clients.

The load balancer preserves the Host header of the original client request. The load balancer also appends two IP addresses to the X-Forwarded-For header:

  • The IP address of the client that connects to the load balancer
  • The IP address of the load balancer's forwarding rule

If there is no X-Forwarded-For header on the incoming request, these two IP addresses are the entire header value. If the request does have an X-Forwarded-For header, other information, such as the IP addresses recorded by proxies on the way to the load balancer, is preserved before the two IP addresses. The load balancer does not verify any IP addresses that precede the last two IP addresses in this header.

If you are running a proxy as the backend server, this proxy typically appends more information to the X-Forwarded-For header, and your software might need to take that into account. The proxied requests from the load balancer come from an IP address in the proxy-only subnet, and your proxy on the backend instance might record this address as well as the backend instance's own IP address.
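
As an illustration with hypothetical addresses: if a client at 10.1.2.3 sends a request that already carries X-Forwarded-For: 203.0.113.7, and the forwarding rule's IP address is 10.4.5.6, a backend behind the load balancer sees the following header:

    X-Forwarded-For: 203.0.113.7,10.1.2.3,10.4.5.6

Only the last two addresses are appended by the load balancer.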

SSL certificates

Internal HTTP(S) load balancers using target HTTPS proxies require private keys and SSL certificates as part of the load balancer configuration. These load balancers use self-managed regional Compute Engine SSL certificates.

For more information about SSL certificates and Google Cloud proxy load balancers, see the SSL certificates overview.
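
As a sketch, a self-managed regional certificate can be created and attached to a regional target HTTPS proxy with the gcloud CLI. The file paths, certificate name, URL map name, and region are assumptions:

    gcloud compute ssl-certificates create int-lb-cert \
        --certificate=cert.pem \
        --private-key=key.pem \
        --region=us-west1

    gcloud compute target-https-proxies create int-lb-https-proxy \
        --url-map=int-lb-url-map \
        --url-map-region=us-west1 \
        --ssl-certificates=int-lb-cert \
        --ssl-certificates-region=us-west1 \
        --region=us-west1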

URL map

The HTTP(S) proxy uses a regional URL map to make a routing determination based on HTTP attributes (such as the request path, cookies, or headers). Based on the routing decision, the proxy forwards client requests to specific regional backend services. The URL map can specify additional actions to take such as rewriting headers, sending redirects to clients, and configuring timeout policies (among others).

Backend service

A regional backend service distributes requests to healthy backends: instance groups containing Compute Engine VMs, NEGs containing GKE containers, or Private Service Connect NEGs pointing to supported Google APIs and services.

Backend services support the HTTP, HTTPS, or HTTP/2 protocols. HTTP/2 is only supported over TLS. Clients and backends do not need to use the same request protocol. For example, clients can send requests to the load balancer by using HTTP/2, and the load balancer can forward these requests to backends by using HTTP/1.1.

One or more backends must be connected to the backend service. Because the scope of an internal HTTP(S) load balancer is regional, not global, clients and backend VMs or endpoints must all be in the same region. Backends can be instance groups or NEGs in any of the following configurations:

  • Managed instance groups (zonal or regional)
  • Unmanaged instance groups (zonal)
  • Network endpoint groups

You cannot use instance groups and NEGs on the same backend service.
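
For example, a regional backend service with an instance group backend might be created as follows. This is a minimal sketch; the health check, instance group, zone, and other names are assumptions, and the health check is created as shown in the Health check section:

    gcloud compute backend-services create video-backend-service \
        --load-balancing-scheme=INTERNAL_MANAGED \
        --protocol=HTTP \
        --health-checks=int-lb-health-check \
        --health-checks-region=us-west1 \
        --region=us-west1

    gcloud compute backend-services add-backend video-backend-service \
        --instance-group=video-ig \
        --instance-group-zone=us-west1-a \
        --balancing-mode=UTILIZATION \
        --max-utilization=0.8 \
        --region=us-west1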

Backends and VPC networks

All backends must be located in the same VPC network and region. Placing backends in different VPC networks, even those connected using VPC Network Peering, is not supported. For details about how client systems in peered VPC networks can access load balancers, see Internal HTTP(S) load balancers and connected networks.

Backend subsetting

Backend subsetting is an optional feature that improves performance and scalability by assigning a subset of backends to each of the proxy instances.

By default, backend subsetting is disabled. For information about enabling this feature, see Backend subsetting for internal HTTP(S) load balancer.

Health check

Each backend service specifies a health check that periodically monitors the backends' readiness to receive a connection from the load balancer. This reduces the risk that requests might be sent to backends that can't service the request. Health checks do not check if the application itself is working.
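
A regional HTTP health check for a backend service could be created with the gcloud CLI. The name, port, request path, and timing values are assumptions:

    gcloud compute health-checks create http int-lb-health-check \
        --region=us-west1 \
        --port=80 \
        --request-path=/healthz \
        --check-interval=10s \
        --timeout=5s \
        --healthy-threshold=2 \
        --unhealthy-threshold=3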

Firewall rules

An internal HTTP(S) load balancer requires the following firewall rules:

  • An ingress allow rule that permits traffic from Google's central health check ranges.
    • 35.191.0.0/16
    • 130.211.0.0/22
    Currently, health check probes for hybrid NEGs originate from Google's centralized health checking mechanism. If you cannot allow traffic that originates from the Google health check ranges to reach your hybrid endpoints and would prefer to have the health check probes originate from private IP addresses instead, speak to your Google account representative to get your project allowlisted for distributed Envoy health checks.
  • An ingress allow rule that permits traffic from the proxy-only subnet.
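
As a sketch, these two rules might be created as follows, assuming a proxy-only subnet range of 10.129.0.0/23 and backends tagged load-balanced-backend (both assumptions):

    # Allow Google's health check ranges to reach the backends.
    gcloud compute firewall-rules create fw-allow-health-check \
        --network=lb-network \
        --action=allow \
        --direction=ingress \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=load-balanced-backend \
        --rules=tcp

    # Allow proxied traffic from the proxy-only subnet to reach the backends.
    gcloud compute firewall-rules create fw-allow-proxy-only-subnet \
        --network=lb-network \
        --action=allow \
        --direction=ingress \
        --source-ranges=10.129.0.0/23 \
        --target-tags=load-balanced-backend \
        --rules=tcp:80,tcp:443,tcp:8080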

Client access

By default, clients must be in the same region as the load balancer. Clients can be in the same network or in a VPC network connected by using VPC Network Peering. You can enable global access to allow clients from any region to access your load balancer.

Internal HTTP(S) load balancer with global access

The following table summarizes client access.

Global access disabled:
  • Clients must be in the same region as the load balancer. They also must be in the same VPC network as the load balancer or in a VPC network that is connected to the load balancer's VPC network by using VPC Network Peering.
  • On-premises clients can access the load balancer through Cloud VPN tunnels or VLAN attachments. These tunnels or attachments must be in the same region as the load balancer.

Global access enabled:
  • Clients can be in any region. They still must be in the same VPC network as the load balancer or in a VPC network that's connected to the load balancer's VPC network by using VPC Network Peering.
  • On-premises clients can access the load balancer through Cloud VPN tunnels or VLAN attachments. These tunnels or attachments can be in any region.

Shared VPC architectures

Internal HTTP(S) load balancers support networks that use Shared VPC. Shared VPC lets organizations connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network. If you're not already familiar with Shared VPC, read the Shared VPC overview documentation.

There are many ways to configure an internal HTTP(S) load balancer within a Shared VPC network. Regardless of the type of deployment, all the components of the load balancer must be in the same organization.

Subnets and IP address: Create the required network and subnets (including the proxy-only subnet) in the Shared VPC host project. The load balancer's internal IP address can be defined in either the host project or a service project, but it must use a subnet in the desired Shared VPC network in the host project. The address itself comes from the primary IP range of the referenced subnet.

Frontend components: The regional internal IP address, the forwarding rule, the target HTTP(S) proxy, and the associated URL map must be defined in the same project. This project can be the host project or a service project.

Backend components: You can do one of the following:
  • Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in the same service project as the frontend components.
  • Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in as many service projects as required. A single URL map can reference backend services across different projects. This type of deployment is known as cross-project service referencing.

Each backend service must be defined in the same project as the backends it references. Health checks associated with backend services must also be defined in the same project as the backend service.

While you can create all the load balancing components and backends in the Shared VPC host project, this model does not separate network administration and service development responsibilities.

Clients can access an internal HTTP(S) load balancer if they are in the same Shared VPC network and region as the load balancer. Clients can be located in the host project, in an attached service project, or in any connected network.

Serverless backends in a Shared VPC environment

For an internal HTTP(S) load balancer that is using a serverless NEG backend, the backing Cloud Run service must be in the same service project as the backend service and the serverless NEG. The load balancer's frontend components (forwarding rule, target proxy, URL map) can be created in the host project, in the same service project as the backend components, or in any other service project in the same Shared VPC environment.

Cross-project service referencing

In this model, the load balancer's frontend and URL map are in a host or service project. The load balancer's backend services and backends can be distributed across projects in the Shared VPC environment. Cross-project backend services can be referenced in a single URL map. This is referred to as cross-project service referencing.

Cross-project service referencing allows organizations to configure one central load balancer and route traffic to hundreds of services distributed across multiple different projects. You can centrally manage all traffic routing rules and policies in one URL map. You can also associate the load balancer with a single set of hostnames and SSL certificates. You can therefore optimize the number of load balancers needed to deploy your application, and reduce management overhead, operational costs, and quota requirements.

By having different projects for each of your functional teams, you can also achieve separation of roles within your organization. Service owners can focus on building services in service projects, while network teams can provision and maintain load balancers in another project, and both can be connected by using cross-project service referencing.

Service owners can maintain autonomy over the exposure of their services and control which users can access their services by using the load balancer. This is achieved by a special IAM role called the Compute Load Balancer Services User role (roles/compute.loadBalancerServiceUser).

To learn how to configure Shared VPC for an internal HTTP(S) load balancer, with and without cross-project service referencing, see Set up an internal HTTP(S) load balancer with Shared VPC.

Cross-project service referencing can be used with instance groups, serverless NEGs, or any other supported backend types.

Example 1: Load balancer frontend and backend in different service projects

Here is an example of a deployment where the load balancer's frontend and URL map are created in service project A and the URL map references a backend service in service project B.

In this case, Network Admins or Load Balancer Admins in service project A will require access to backend services in service project B. Service project B admins grant the compute.loadBalancerServiceUser IAM role to Load Balancer Admins in service project A who want to reference the backend service in service project B.
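
As a sketch, such a grant could be made at the project level with the gcloud CLI; the project ID and member email are placeholders:

    gcloud projects add-iam-policy-binding SERVICE_PROJECT_B_ID \
        --member="user:LOAD_BALANCER_ADMIN_EMAIL" \
        --role="roles/compute.loadBalancerServiceUser"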

Load balancer frontend and backend in different service projects

Example 2: Load balancer frontend in the host project and backends in service projects

In this type of deployment, the load balancer's frontend and URL map are created in the host project and the backend services (and backends) are created in service projects.

In this case, Network Admins or Load Balancer Admins in the host project will require access to backend services in the service project. Service project admins grant the compute.loadBalancerServiceUser IAM role to Load Balancer Admins in the host project who want to reference the backend service in the service project.

Load balancer frontend and URL map in host project

All load balancer components and backends in a service project

In this model, all load balancer components and backends are in a service project. This deployment model is supported by all HTTP(S) load balancers.

The load balancer uses IP addresses and subnets from the host project.

Internal HTTP(S) load balancer on Shared VPC network

Timeouts and retries

Internal HTTP(S) load balancers have the following timeouts:
  • A configurable HTTP backend service timeout, which represents the amount of time the load balancer waits for your backend to return a complete HTTP response. The default value for the backend service timeout is 30 seconds. The full range of timeout values allowed is 1-2,147,483,647 seconds.

    For example, if you want to download a 500-MB file, and the value of the backend service timeout is the default value of 30 seconds, the load balancer expects the backend to deliver the entire 500-MB file within 30 seconds. It is possible to configure the backend service timeout to not be long enough for the backend to send its complete HTTP response. In this situation, if the load balancer has at least received HTTP response headers, the load balancer returns the complete response headers and as much of the response body as it could obtain within the backend service timeout.

    The backend service timeout should be set to the maximum possible time from the first byte of the request to the last byte of the response, for the interaction between Envoy and your backend. If you are using WebSockets, the backend service timeout should be set to the maximum duration of a WebSocket, idle or active.

    Consider increasing this timeout under any of these circumstances:

    • You expect a backend to take longer to return HTTP responses.
    • The connection is upgraded to a WebSocket.

    The backend service timeout you set is a best-effort goal. It does not guarantee that underlying TCP connections will stay open for the duration of that timeout.

    You can set the backend service timeout to whatever value you'd like; however, setting it to a value beyond one day (86,400 seconds) does not mean that the load balancer will keep a TCP connection running for that long. Google periodically restarts Envoy proxies for software updates and routine maintenance, and your backend service timeout does not override that. The longer you make your backend service timeout, the more likely it is that Google will terminate a TCP connection for Envoy maintenance. We recommend that you implement retry logic to reduce the impact of such events.

    The backend service timeout is not an HTTP idle (keepalive) timeout. It is possible that input and output (IO) from the backend is blocked due to a slow client (a browser with a slow connection, for example). This wait time isn't counted against the backend service timeout.

    To configure the backend service timeout, use one of the following methods:

    • Google Cloud console: Modify the Timeout field of the load balancer's backend service.
    • Google Cloud CLI: Use the gcloud compute backend-services update command to modify the --timeout parameter of the backend service resource (see the example after this list of timeouts).
    • API: Modify the timeoutSec parameter for the global or regional backend service resource.

  • An HTTP keepalive timeout, whose value is fixed at 10 minutes (600 seconds). This value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend. This timeout does not apply to WebSockets. This table illustrates changes necessary to modify keepalive timeouts for common web server software:
    Web server software   Parameter           Default setting            Recommended setting
    Apache                KeepAliveTimeout    KeepAliveTimeout 5         KeepAliveTimeout 620
    nginx                 keepalive_timeout   keepalive_timeout 75s;     keepalive_timeout 620s;
  • A Stream idle timeout, whose value is fixed at 5 minutes (300 seconds). This value is non-configurable. HTTP streams become idle after 5 minutes without activity.
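
As an example of the gcloud CLI method for the backend service timeout described earlier, the following sets the timeout to 120 seconds on an assumed backend service named video-backend-service:

    gcloud compute backend-services update video-backend-service \
        --region=us-west1 \
        --timeout=120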

Retries

Retries are configurable using a retry policy in the URL map. The default number of retries (numRetries) is 1. The default timeout for each try (perTryTimeout) is 30 seconds with a maximum configurable perTryTimeout of 24 hours.
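
A retry policy is expressed in the URL map's routeRules. The following fragment is a minimal sketch that could be applied with gcloud compute url-maps import; the backend service URL, path matcher name, and retry conditions are assumptions:

    pathMatchers:
    - name: matcher1
      defaultService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/video-backend-service
      routeRules:
      - priority: 1
        matchRules:
        - prefixMatch: /
        service: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-west1/backendServices/video-backend-service
        routeAction:
          retryPolicy:
            # Retry up to 3 times, waiting at most 5 seconds per attempt.
            numRetries: 3
            perTryTimeout:
              seconds: 5
            retryConditions:
            - 5xx
            - connect-failure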

Without a retry policy, unsuccessful requests that have no HTTP body (for example, GET requests) and that result in HTTP 502, 503, or 504 responses are retried once. HTTP POST requests are not retried.

Retried requests only generate one log entry for the final response.

For more information, see Internal HTTP(S) load balancer logging and monitoring.

Accessing connected networks

You can access an internal HTTP(S) load balancer in your VPC network from a connected network by using the following:

  • VPC Network Peering
  • Cloud VPN and Cloud Interconnect

For detailed examples, see Internal HTTP(S) load balancers and connected networks.

Failover

If a backend becomes unhealthy, traffic is automatically redirected to healthy backends within the same region. If all backends are unhealthy, the load balancer returns an HTTP 503 Service Unavailable response.

WebSocket support

Google Cloud HTTP(S)-based load balancers have native support for the WebSocket protocol when you use HTTP or HTTPS as the protocol to the backend. The load balancer does not need any configuration to proxy WebSocket connections.

The WebSocket protocol provides a full-duplex communication channel between clients and servers. An HTTP(S) request initiates the channel. For detailed information about the protocol, see RFC 6455.

When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.

The timeout for a WebSocket connection depends on the configurable backend service timeout of the load balancer, which is 30 seconds by default. This timeout applies to WebSocket connections regardless of whether they are in use.

Session affinity for WebSockets works the same as for any other request. For information, see Session affinity.

gRPC support

gRPC is an open-source framework for remote procedure calls. It is based on the HTTP/2 standard. Use cases for gRPC include the following:

  • Low-latency, highly scalable, distributed systems
  • Developing mobile clients that communicate with a cloud server
  • Designing new protocols that must be accurate, efficient, and language-independent
  • Layered design to enable extension, authentication, and logging

To use gRPC with your Google Cloud applications, you must proxy requests end-to-end over HTTP/2. To do this:

  1. Configure an HTTPS load balancer.
  2. Enable HTTP/2 as the protocol from the load balancer to the backends.
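
For step 2, the backend service protocol can be switched to HTTP/2 with the gcloud CLI (a sketch; the backend service name and region are assumptions):

    gcloud compute backend-services update grpc-backend-service \
        --region=us-west1 \
        --protocol=HTTP2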

The load balancer negotiates HTTP/2 with clients as part of the SSL handshake by using the ALPN TLS extension.

A load balancer that is configured to use HTTP/2 between itself and the backend instances can still negotiate HTTPS with some clients or accept insecure HTTP requests. The load balancer transforms those HTTP or HTTPS requests to proxy them over HTTP/2 to the backend instances.

You must enable TLS on your backends. For more information, see Encryption from the load balancer to the backends.

TLS support

By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, 1.2, and 1.3 when terminating client SSL requests.

When the internal HTTP(S) load balancer uses HTTPS as a backend service protocol, it can negotiate TLS 1.0, 1.1, 1.2, or 1.3 to the backend.

Limitations

  • There's no guarantee that a request from a client in one zone of the region is sent to a backend that's in the same zone as the client. Session affinity doesn't reduce communication between zones.

  • Internal HTTP(S) load balancers aren't compatible with the following features:

  • An internal HTTP(S) load balancer supports HTTP/2 only over TLS.

  • Clients connecting to an internal HTTP(S) load balancer must use HTTP version 1.1 or later. HTTP 1.0 is not supported.

  • Google Cloud doesn't warn you if your proxy-only subnet runs out of IP addresses.

  • The internal forwarding rule that your internal HTTP(S) load balancer uses must have exactly one port.

  • Internal HTTP(S) load balancers don't support Cloud Trace.

  • When using an internal HTTP(S) load balancer with Cloud Run in a Shared VPC environment, standalone VPC networks in service projects can send traffic to any other Cloud Run services deployed in any other service projects within the same Shared VPC environment. This is a known issue and this form of access will be blocked in the future.

What's next