Google Cloud Internal HTTP(S) Load Balancing is a proxy-based, regional Layer 7 load balancer that enables you to run and scale your services behind an internal IP address.
Internal HTTP(S) Load Balancing distributes HTTP and HTTPS traffic to backends hosted on Compute Engine and Google Kubernetes Engine (GKE). The load balancer is accessible only in the chosen region of your Virtual Private Cloud (VPC) network on an internal IP address.
Internal HTTP(S) Load Balancing is a managed service based on the open source Envoy proxy. This enables rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been configured, it automatically allocates Envoy proxies to meet your traffic needs.
At a high level, an internal HTTP(S) load balancer consists of:
- An internal IP address to which clients send traffic. Only clients that are located in the same region as the load balancer can access this IP address. Internal client requests stay internal to your network and region.
- One or more backend services to which the load balancer forwards traffic. Backends can be Compute Engine VMs, groups of Compute Engine VMs (through instance groups), or GKE nodes (through network endpoint groups [NEGs]). These backends must be located in the same region as the load balancer.
For limitations specific to Internal HTTP(S) Load Balancing, see the Limitations section.
For information about how the Google Cloud load balancers differ from each other, see the following documents:
Internal HTTP(S) Load Balancing addresses many use cases. This section provides a few high-level examples. For additional examples, see traffic management use cases.
Three-tier web services
You can use Internal HTTP(S) Load Balancing to support traditional three-tier web services. The following example shows how you can use three types of Google Cloud load balancers to scale three tiers. At each tier, the load balancer type depends on your traffic type:
Web tier: Traffic enters from the internet and is load balanced by using an external HTTP(S) load balancer.
Application tier: The application tier is scaled by using a regional internal HTTP(S) load balancer.
Database tier: The database tier is scaled by using an internal TCP/UDP load balancer.
The diagram shows how traffic moves through the tiers:
- An external HTTP(S) load balancer distributes traffic from the internet to a set of web frontend instance groups in various regions.
- These frontends send the HTTP(S) traffic to a set of regional, internal HTTP(S) load balancers (the subject of this overview).
- The internal HTTP(S) load balancers distribute the traffic to middleware instance groups.
- These middleware instance groups send the traffic to internal TCP/UDP load balancers, which load balance the traffic to data storage clusters.
Load balancing using path-based routing
One common use case is load balancing traffic among services. In this example, an internal client can request video and image content by using the same base URL, mygcpservice.internal, with the paths /video and /images.
The internal HTTP(S) load balancer's URL map specifies that requests to path /video should be sent to the video backend service, while requests to path /images should be sent to the images backend service. In the following example, the video and images backend services are served by using Compute Engine VMs, but they can also be served by using GKE pods.
When an internal client sends a request to the load balancer's internal IP address, the load balancer evaluates the request according to this logic and sends the request to the correct backend service.
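The matching logic can be sketched in Python as follows. This is a simplified model under stated assumptions, not the load balancer's implementation: the backend service names are hypothetical, a trailing `/*` is treated as a prefix match, and the longest matching pattern wins, with a default service used when nothing matches.

```python
# Simplified, hypothetical model of URL map path rules. The backend service
# names are illustrative; real URL maps are configured in Google Cloud.
from dataclasses import dataclass


@dataclass(frozen=True)
class PathRule:
    paths: tuple[str, ...]  # exact paths, or prefixes written as "/prefix/*"
    service: str            # backend service that receives matching requests


def _matches(pattern: str, path: str) -> bool:
    if pattern.endswith("/*"):
        # "/video/*" matches "/video/" and every path below it.
        return path.startswith(pattern[:-1])
    return path == pattern


def route(path: str, rules: list[PathRule], default_service: str) -> str:
    """Pick a backend service: the longest matching pattern wins, else the default."""
    best_service, best_len = default_service, -1
    for rule in rules:
        for pattern in rule.paths:
            if _matches(pattern, path) and len(pattern) > best_len:
                best_service, best_len = rule.service, len(pattern)
    return best_service


RULES = [
    PathRule(("/video", "/video/*"), "video-backend-service"),
    PathRule(("/images", "/images/*"), "images-backend-service"),
]
```

With these rules, `route("/video/intro.mp4", RULES, "default-backend-service")` returns `"video-backend-service"`, while a path such as `/videos` matches neither rule and falls through to the default service.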
The following diagram illustrates this use case.
Modernizing legacy services
Internal HTTP(S) Load Balancing can be an effective tool for modernizing legacy applications.
One example of a legacy application is a large monolithic application that you cannot easily update. In this case, you can deploy an internal HTTP(S) load balancer in front of your legacy application. You can then use the load balancer's traffic control capabilities to direct a subset of traffic to new microservices that replace the functionality that your legacy application provides.
To begin, you would configure the load balancer's URL map to route all traffic to the legacy application by default. This maintains the existing behavior. As replacement services are developed, you would update the URL map to route portions of traffic to these replacement services.
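As a rough illustration of routing a portion of traffic to a replacement service, the following Python sketch splits requests between a legacy application and a new microservice by weight. The service names, the weights, and the choice of hashing a per-request key are assumptions for illustration; the load balancer's own traffic-splitting behavior is configured in the URL map and may work differently.

```python
# Hypothetical sketch: send a configurable share of requests to a replacement
# service while the legacy application keeps receiving the rest.
import hashlib


def pick_service(request_key: str, weights: dict[str, int]) -> str:
    """Deterministically map a request key onto services in proportion to weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # A stable hash keeps the same key on the same service across requests.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for service, weight in sorted(weights.items()):
        if bucket < weight:
            return service
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")


# Phase 1: all traffic still goes to the legacy application (existing behavior).
PHASE_1 = {"legacy-app": 100, "video-microservice": 0}
# A later phase: roughly 10% of requests go to the replacement service.
PHASE_2 = {"legacy-app": 90, "video-microservice": 10}
```

Raising the replacement service's weight over time mirrors the gradual migration described in this section, until the legacy application receives no traffic for that functionality.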
Imagine that your legacy application contains some video processing functionality that is served when internal clients send requests to /video. You could break this video service out into a separate microservice as follows:
- Add Internal HTTP(S) Load Balancing in front of your legacy application.
- Create a replacement video processing microservice.
- Update the load balancer's URL map so that all requests to path /video are routed to the new microservice instead of to the legacy application.
As you develop additional replacement services, you would continue to update the URL map. Over time, fewer requests would be routed to the legacy application. Eventually, replacement services would exist for all the functionality that the legacy application provided. At this point, you could retire your legacy application.
Private Service Connect
You can use Internal HTTP(S) Load Balancing to send requests to supported regional Google APIs and services. See Private Service Connect for more information.
Load balancing for GKE applications
If you are building applications in GKE, we recommend that you use the built-in GKE Ingress controller, which deploys Google Cloud load balancers on behalf of GKE users. This is the same as the standalone load balancing architecture described on this page, except that its lifecycle is fully automated and controlled by GKE.
Related GKE documentation:
- Use Ingress for Internal HTTP(S) Load Balancing
- Configure Ingress for Internal HTTP(S) Load Balancing
Architecture and resources
The following diagram shows the Google Cloud resources required for an internal HTTP(S) load balancer.
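Each internal HTTP(S) load balancer uses these Google Cloud configuration resources:
- A proxy-only subnet
- An internal managed forwarding rule and its IP address
- A regional target HTTP(S) proxy
- SSL certificates (HTTPS load balancers only)
- A regional URL map
- One or more regional backend services and their backends
- A regional health check
- Firewall rules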
In the diagram above, the proxy-only subnet provides a set of IP addresses that Google uses to run Envoy proxies on your behalf. You must create a proxy-only subnet in each region of a VPC network where you use internal HTTP(S) load balancers. All internal HTTP(S) load balancers in a region and VPC network share the same proxy-only subnet because they share a pool of Envoy proxies. Further:
- Proxy-only subnets are only used for Envoy proxies, not your backends.
- Backend VMs or endpoints of all internal HTTP(S) load balancers in a region and VPC network receive connections from the proxy-only subnet.
- The IP address of an internal HTTP(S) load balancer is not located in the proxy-only subnet. The load balancer's IP address is defined by its internal managed forwarding rule, which is described below.
Forwarding rule and IP address
An internal managed forwarding rule specifies an internal IP address, port, and regional target HTTP(S) proxy. Clients use the IP address and port to connect to the load balancer's Envoy proxies. The forwarding rule's IP address is the IP address of the load balancer (sometimes called a virtual IP address or VIP).
Clients connecting to an internal HTTP(S) load balancer must use HTTP version 1.1 or later. For the complete list of supported protocols, see Load balancer features.
The internal IP address associated with the forwarding rule can come from any subnet in the same network and region whose --purpose flag is set to PRIVATE. Note that:
- The IP address can (but does not need to) come from the same subnet as the backend instance groups.
- The IP address must not come from the reserved proxy-only subnet, whose --purpose flag is set to a value other than PRIVATE.
Each forwarding rule that you use in an internal HTTP(S) load balancer can reference exactly one TCP port. For HTTP load balancers, use either port 80 or 8080; for HTTPS load balancers, use port 443.
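The address and port rules above can be expressed as a pre-flight check. In this hypothetical Python sketch, the `Subnet` model, the `PROXY_ONLY` purpose placeholder, the network and region names, and the CIDR ranges are all assumptions; it is not a Google Cloud API.

```python
# Hypothetical pre-flight check for an internal HTTP(S) load balancer's
# forwarding rule: port by protocol, and an IP from a PRIVATE subnet in the
# same network and region (never from the proxy-only subnet).
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    name: str
    network: str
    region: str
    cidr: str
    purpose: str  # "PRIVATE" for ordinary subnets; "PROXY_ONLY" is a placeholder


ALLOWED_PORTS = {"HTTP": {80, 8080}, "HTTPS": {443}}


def check_forwarding_rule(ip: str, port: int, protocol: str, network: str,
                          region: str, subnets: list[Subnet]) -> list[str]:
    """Return a list of problems; an empty list means the rule looks valid."""
    problems = []
    if port not in ALLOWED_PORTS.get(protocol, set()):
        problems.append(f"port {port} is not allowed for {protocol}")
    addr = ipaddress.ip_address(ip)
    home = [s for s in subnets if addr in ipaddress.ip_network(s.cidr)]
    if not home:
        problems.append(f"{ip} is not in any known subnet")
    for s in home:
        if s.network != network or s.region != region:
            problems.append(f"{ip} is in {s.name}, outside the load balancer's network or region")
        if s.purpose != "PRIVATE":
            problems.append(f"{ip} is in {s.name}, whose purpose is {s.purpose}, not PRIVATE")
    return problems


SUBNETS = [
    Subnet("backend-subnet", "lb-network", "us-west1", "10.1.2.0/24", "PRIVATE"),
    Subnet("proxy-only-subnet", "lb-network", "us-west1", "10.129.0.0/23", "PROXY_ONLY"),
]
```

For example, `check_forwarding_rule("10.1.2.99", 80, "HTTP", "lb-network", "us-west1", SUBNETS)` reports no problems, while an address inside the proxy-only range is rejected.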
A regional target HTTP(S) proxy terminates HTTP(S) connections from clients. The HTTP(S) proxy consults the URL map to determine how to route traffic to backends. A target HTTPS proxy uses an SSL certificate to authenticate itself to clients.
The load balancer preserves the Host header of the original client request. The load balancer also appends two IP addresses to the X-Forwarded-For header:
- The IP address of the client that connects to the load balancer
- The IP address of the load balancer's forwarding rule
If there is no X-Forwarded-For header on the incoming request, these two IP addresses are the entire header value. If the request does have an X-Forwarded-For header, other information, such as the IP addresses recorded by proxies on the way to the load balancer, is preserved before the two IP addresses. The load balancer does not verify any IP addresses that precede the last two IP addresses in this header.
If you are running a proxy as the backend server, this proxy typically appends more information to the X-Forwarded-For header, and your software might need to take that into account. The proxied requests from the load balancer come from an IP address in the proxy-only subnet, and your proxy on the backend instance might record this address as well as the backend instance's own IP address.
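A backend could separate the verified and unverified parts of the header roughly as follows. This Python sketch assumes that the last two entries were appended by the load balancer and that `extra_hops` trusted proxies on the backend side (for example, one that records the proxy-only subnet address) appended further entries after them; the function and field names are hypothetical.

```python
# Minimal sketch of reading X-Forwarded-For on a backend behind an internal
# HTTP(S) load balancer. Only the two entries the load balancer appended are
# treated as trustworthy; anything before them is passed through unverified.
from typing import NamedTuple


class ForwardedInfo(NamedTuple):
    client_ip: str          # client that connected to the load balancer
    load_balancer_ip: str   # the forwarding rule's IP address
    unverified: list[str]   # entries the load balancer did not verify


def parse_xff(header: str, extra_hops: int = 0) -> ForwardedInfo:
    entries = [e.strip() for e in header.split(",") if e.strip()]
    if extra_hops:
        # Drop entries appended by trusted proxies that sit after the load balancer.
        entries = entries[:-extra_hops]
    if len(entries) < 2:
        raise ValueError("expected at least the two addresses added by the load balancer")
    return ForwardedInfo(entries[-2], entries[-1], entries[:-2])
```

For a header of `"198.51.100.1, 203.0.113.7, 10.1.2.99"`, the client is `203.0.113.7`, the load balancer's address is `10.1.2.99`, and `198.51.100.1` is reported as unverified.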
Transport Layer Security (TLS) is an encryption protocol that uses certificates (commonly called SSL certificates) to protect network communications.
Google Cloud uses SSL certificates to provide privacy and security from a client to a load balancer. If you are using HTTPS-based load balancing, you must install one or more SSL certificates on the target HTTPS proxy.
For more information about SSL certificates, see the following:
- SSL certificates overview
- Serving multiple SSL certificates
- Self-managed certificates
- Google-managed certificates
- SSL certificates quotas on the load balancing quotas page
- Encryption from the load balancer to the backends
- Encryption in Transit in Google Cloud white paper
The HTTP(S) proxy uses a regional URL map to make a routing determination based on HTTP attributes (such as the request path, cookies, or headers). Based on the routing decision, the proxy forwards client requests to specific regional backend services. The URL map can specify additional actions to take such as rewriting headers, sending redirects to clients, and configuring timeout policies (among others).
A regional backend service distributes requests to healthy backends: instance groups containing Compute Engine VMs, NEGs containing GKE containers, or Private Service Connect NEGs pointing to supported Google APIs and services.
Backend services support the HTTP, HTTPS, or HTTP/2 protocols. HTTP/2 is only supported over TLS. Clients and backends do not need to use the same request protocol. For example, clients can send requests to the load balancer by using HTTP/2, and the load balancer can forward these requests to backends by using HTTP/1.1.
One or more backends must be connected to the backend service. Because the scope of an internal HTTP(S) load balancer is regional, not global, clients and backend VMs or endpoints must all be in the same region. Backends can be instance groups or NEGs in any of the following configurations:
- Managed instance groups (zonal or regional)
- Unmanaged instance groups (zonal)
- Network endpoint groups (zonal)
You cannot use instance groups and NEGs on the same backend service.
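The backend constraints in this section can be checked with a small validator. The `Backend` model and its `kind`, `scope`, and `region` strings in this Python sketch are hypothetical stand-ins for the real resources.

```python
# Hypothetical validator for a regional backend service: supported backend
# types only, every backend in the load balancer's region, and no mixing of
# instance groups with network endpoint groups (NEGs).
from dataclasses import dataclass

ALLOWED = {
    ("managed_instance_group", "zonal"),
    ("managed_instance_group", "regional"),
    ("unmanaged_instance_group", "zonal"),
    ("network_endpoint_group", "zonal"),
}


@dataclass(frozen=True)
class Backend:
    name: str
    kind: str    # "managed_instance_group", "unmanaged_instance_group", or "network_endpoint_group"
    scope: str   # "zonal" or "regional"
    region: str


def validate_backends(backends: list[Backend], lb_region: str) -> list[str]:
    """Return a list of problems; an empty list means the configuration looks valid."""
    problems = []
    if not backends:
        problems.append("a backend service needs at least one backend")
    for b in backends:
        if (b.kind, b.scope) not in ALLOWED:
            problems.append(f"{b.name}: unsupported {b.scope} {b.kind}")
        if b.region != lb_region:
            problems.append(f"{b.name}: region {b.region} differs from {lb_region}")
    # A set with both True and False means NEGs and instance groups are mixed.
    uses_negs = {b.kind == "network_endpoint_group" for b in backends}
    if len(uses_negs) > 1:
        problems.append("instance groups and NEGs cannot share a backend service")
    return problems
```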
A regional health check periodically monitors the readiness of your backends. This reduces the risk that requests might be sent to backends that can't service the request.
An internal HTTP(S) load balancer requires the following firewall rules:
- An ingress allow rule to permit traffic from the health check ranges
- An ingress allow rule that permits traffic from the proxy-only subnet
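The effect of these two ingress allow rules on a backend VM can be modeled by checking the source address of an incoming connection. The sketch assumes the Google Cloud health check probe ranges 35.191.0.0/16 and 130.211.0.0/22 (confirm them against the current health check documentation) and a hypothetical proxy-only subnet of 10.129.0.0/23; it ignores ports and target tags, which real firewall rules also specify.

```python
# Model of the two required ingress allow rules, by source range only.
import ipaddress

# Google Cloud health check probe ranges (verify against current documentation).
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]
# Hypothetical proxy-only subnet for this region and VPC network.
PROXY_ONLY_SUBNET = ipaddress.ip_network("10.129.0.0/23")


def ingress_allowed(source_ip: str) -> bool:
    """Return True if a connection from source_ip matches one of the two allow rules."""
    addr = ipaddress.ip_address(source_ip)
    allowed = HEALTH_CHECK_RANGES + [PROXY_ONLY_SUBNET]
    return any(addr in net for net in allowed)
```

In this model, a request from another VM in the backend subnet (for example, 10.1.2.99) would not reach the backend directly; it must arrive through the load balancer, whose Envoy proxies connect from the proxy-only subnet.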
Timeouts and retries
Internal HTTP(S) Load Balancing has three distinct types of timeouts:
A configurable HTTP backend service timeout, which represents the amount of time the load balancer waits for your backend to return a complete HTTP response. The default value for the backend service timeout is 30 seconds. The full range of timeout values allowed is 1-2,147,483,647 seconds.
For example, if the value of the backend service timeout is the default value of 30 seconds, the backends have 30 seconds to respond to requests. The load balancer retries the HTTP GET request once if the backend closes the connection or times out before sending response headers to the load balancer. If the backend sends response headers or if the request sent to the backend is not an HTTP GET request, the load balancer does not retry. If the backend does not reply at all, the load balancer returns an HTTP 5xx response to the client. Change the timeout value if you want to allow more or less time for the backends to respond to requests.
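The built-in retry behavior described above can be summarized as a decision function. This Python sketch models only the case in which no retry policy is configured; the function name and the failure labels are hypothetical.

```python
# Simplified model of when the load balancer retries a request to a backend,
# following the rules in the paragraph above (no retry policy configured).

def should_retry(method: str, failure: str, response_headers_received: bool,
                 attempts_made: int) -> bool:
    """Return True if the load balancer would send the request to a backend again."""
    if attempts_made >= 2:             # the original attempt plus one retry
        return False
    if method.upper() != "GET":        # only HTTP GET requests are retried
        return False
    if response_headers_received:      # once headers arrive, there is no retry
        return False
    # The backend closed the connection or timed out before sending headers.
    return failure in {"connection_closed", "timeout"}
```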
The backend service timeout should be set to the maximum possible time from the first byte of the request to the last byte of the response, for the interaction between Envoy and your backend. If you are using WebSockets, the backend service timeout should be set to the maximum duration of a WebSocket, idle or active.
Consider increasing this timeout under any of these circumstances:
- You expect a backend to take longer to return HTTP responses.
- The connection is upgraded to a WebSocket.
The backend service timeout you set is a best-effort goal. It does not guarantee that underlying TCP connections will stay open for the duration of that timeout.
You can set the backend service timeout to whatever value you'd like; however, setting it to a value beyond one day (86,400 seconds) does not mean that the load balancer will keep a TCP connection running for that long. It might, but it might not. Google periodically restarts Envoy proxies for software updates and routine maintenance, and your backend service timeout does not override that. The longer you make your backend service timeout, the more likely it is that Google will terminate a TCP connection for Envoy maintenance. We recommend you implement retry logic to reduce the impact of such events.
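Client-side retry logic of the kind recommended above might look like the following Python sketch, which retries transient connection errors with capped exponential backoff and full jitter. The helper name, the default limits, and the set of retryable exceptions are assumptions to adapt to your client library; retry only requests that are safe to repeat.

```python
# Generic retry helper with capped exponential backoff and full jitter.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying the listed transient errors up to max_attempts times in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Wrapping an idempotent call, for example `call_with_retries(lambda: fetch_status())` where `fetch_status` is your own hypothetical request function, lets a connection reset during Envoy maintenance be absorbed by a later attempt instead of failing the caller.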
For more information, see Backend service settings.
The backend service timeout is not an HTTP idle (keepalive) timeout. It is possible that input and output (IO) from the backend is blocked due to a slow client (a browser with a slow connection, for example). This wait time isn't counted against the backend service timeout.
- An HTTP keepalive timeout, whose value is fixed at 10 minutes (600 seconds). This value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend. This timeout does not apply to WebSockets.
This table illustrates changes necessary to modify keepalive timeouts for common web server software:
| Web server software | Parameter | Default setting | Recommended setting |
|---|---|---|---|
| Apache | KeepAliveTimeout | KeepAliveTimeout 5 | KeepAliveTimeout 620 |
| nginx | keepalive_timeout | keepalive_timeout 75s; | keepalive_timeout 620s; |
- A stream idle timeout, whose value is fixed at 5 minutes (300 seconds). This value is not configurable. HTTP streams become idle after 5 minutes without activity.
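As an example of the keepalive guidance for a backend written in Python, the standard library's `http.server` applies a per-connection socket timeout through the handler's `timeout` attribute. This sketch sets it to 620 seconds, mirroring the Apache and nginx recommendations; the port and handler are assumptions, and a production backend would normally run behind a dedicated web server.

```python
# Minimal Python backend whose idle keepalive timeout (620 s) is longer than
# the load balancer's fixed 600 s keepalive timeout.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

LB_KEEPALIVE_SECONDS = 600  # fixed by the load balancer; not configurable


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open between requests
    timeout = 620                  # socket timeout, so idle connections close after 620 s

    def do_GET(self) -> None:
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # An explicit Content-Length lets the connection be reused.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the backend; refuse to run if the timeout is not above the LB's."""
    if KeepAliveHandler.timeout <= LB_KEEPALIVE_SECONDS:
        raise ValueError("backend keepalive timeout must exceed 600 seconds")
    ThreadingHTTPServer(("", port), KeepAliveHandler).serve_forever()
```

Calling `serve()` starts the server on port 8080. Because the backend waits 620 seconds before closing an idle connection, the load balancer's 600-second keepalive timeout expires first, so the backend does not close connections that the load balancer still expects to reuse.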
Retries are configurable by using a retry policy in the URL map, which sets values such as the number of retries (numRetries). Without a retry policy, requests are limited to one attempt by default.
The load balancer retries failed GET requests in certain circumstances, such as when the backend service timeout is exhausted. It does not retry failed POST requests. Retried requests only generate one log entry for the final response.
For more information, see Internal HTTP(S) Load Balancing logging and monitoring.
Accessing connected networks
You can access an internal HTTP(S) load balancer in your VPC network from a connected network by using the following:
- VPC Network Peering
- Cloud VPN and Cloud Interconnect
For detailed examples, see Internal HTTP(S) Load Balancing and connected networks.
If a backend becomes unhealthy, traffic is automatically redirected to healthy backends within the same region. If all backends are unhealthy, the load balancer returns an HTTP 503 Service Unavailable response.
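Conceptually, the failover behavior resembles the following Python sketch. Round-robin selection is a stand-in chosen for brevity; the load balancer's actual balancing mode differs, and the `Endpoint` type and addresses are hypothetical.

```python
# Hypothetical sketch: spread requests over healthy backends only, and signal
# "no healthy backend" so the caller can answer 503 Service Unavailable.
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    address: str
    healthy: bool  # in practice, set from health check results


class HealthyPicker:
    def __init__(self, endpoints: list[Endpoint]) -> None:
        self._endpoints = endpoints
        self._counter = itertools.count()

    def pick(self) -> Optional[str]:
        """Return the next healthy backend address, or None if none is healthy."""
        healthy = [e for e in self._endpoints if e.healthy]
        if not healthy:
            return None  # the load balancer would respond with HTTP 503
        return healthy[next(self._counter) % len(healthy)].address
```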
Google Cloud HTTP(S)-based load balancers have native support for the WebSocket protocol when you use HTTP or HTTPS as the protocol to the backend. The load balancer does not need any configuration to proxy WebSocket connections.
The WebSocket protocol provides a full-duplex communication channel between clients and servers. An HTTP(S) request initiates the channel. For detailed information about the protocol, see RFC 6455.
When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.
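The recognition step can be sketched as two checks on the handshake headers defined in RFC 6455: the client's `Upgrade: websocket` and `Connection: Upgrade` request headers, and the backend's `101 Switching Protocols` response with a matching `Sec-WebSocket-Accept` value. The function names are hypothetical, and this is not a description of how the load balancer is implemented internally.

```python
# Sketch of WebSocket handshake recognition per RFC 6455.
import base64
import hashlib
from collections.abc import Mapping

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def _header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive header lookup; returns "" when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def _tokens(value: str) -> set[str]:
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def is_websocket_upgrade_request(method: str, headers: Mapping[str, str]) -> bool:
    return (method.upper() == "GET"
            and "websocket" in _tokens(_header(headers, "Upgrade"))
            and "upgrade" in _tokens(_header(headers, "Connection")))


def expected_accept(client_key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_successful_upgrade_response(status: int, headers: Mapping[str, str],
                                   client_key: str) -> bool:
    return (status == 101
            and "websocket" in _tokens(_header(headers, "Upgrade"))
            and _header(headers, "Sec-WebSocket-Accept") == expected_accept(client_key))
```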
The timeout for a WebSocket connection depends on the configurable backend service timeout of the load balancer, which is 30 seconds by default. This timeout applies to WebSocket connections regardless of whether they are in use.
Session affinity for WebSockets works the same as for any other request. For information, see Session affinity.
gRPC is an open-source framework for remote procedure calls. It is based on the HTTP/2 standard. Use cases for gRPC include the following:
- Low-latency, highly scalable, distributed systems
- Developing mobile clients that communicate with a cloud server
- Designing new protocols that must be accurate, efficient, and language-independent
- Layered design to enable extension, authentication, and logging
To use gRPC with your Google Cloud applications, you must proxy requests end-to-end over HTTP/2. To do this:
- Configure an HTTPS load balancer.
- Enable HTTP/2 as the protocol from the load balancer to the backends.
The load balancer negotiates HTTP/2 with clients as part of the SSL handshake by using the ALPN TLS extension.
The load balancer may still negotiate HTTPS with some clients or accept insecure HTTP requests on a load balancer that is configured to use HTTP/2 between the load balancer and the backend instances. Those HTTP or HTTPS requests are transformed by the load balancer to proxy the requests over HTTP/2 to the backend instances.
You must enable TLS on your backends. For more information, see Encryption from the load balancer to the backends.
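On the client side, HTTP/2 negotiation through ALPN can be requested with Python's `ssl` module, as in the following sketch. The host name you pass and the optional CA file for a self-managed certificate are assumptions; because HTTP/2 requires TLS 1.2 or later, the sketch sets that minimum.

```python
# Sketch of a TLS client that offers HTTP/2 ("h2") before HTTP/1.1 through ALPN.
import socket
import ssl
from typing import Optional


def make_h2_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context that advertises h2 and http/1.1 via ALPN."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # HTTP/2 requires TLS 1.2+
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 443,
                        cafile: Optional[str] = None) -> Optional[str]:
    """Connect to host:port and return the ALPN protocol the server selected."""
    ctx = make_h2_client_context(cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

`negotiated_protocol("lb.example.internal")` (a hypothetical host name) would return `"h2"` if the load balancer selects HTTP/2, `"http/1.1"` if it falls back, or `None` if no protocol was negotiated through ALPN.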
Shared VPC architectures
Internal HTTP(S) Load Balancing supports networks that use Shared VPC. If you're not already familiar with Shared VPC, read the Shared VPC overview documentation.
In the context of Internal HTTP(S) Load Balancing, there are two ways to configure load balancing within a Shared VPC network. You can create the load balancer and its backend instances either in the service project or in the host project.
Load balancer and backends in a service project
In this model, the load balancer and backends exist in a service project and use IP addresses in subnets of a Shared VPC network.
This deployment model aligns closely with typical Shared VPC use cases:
- Maintains a clear separation of responsibilities between network administrators and service developers.
- Allows network administrators to securely and efficiently manage internal IP addresses.
In the host project:
- A Shared VPC Admin enables the host project and connects service projects to the host project.
- The host project's owner, editor, or more granular role (such as a Network Admin) creates networks and subnets, including proxy-only subnets, in the host project.
- The host project's owner, editor, or more granular role (such as a Security Admin) configures firewall rules in the host project.
- A Shared VPC Admin configures IAM policies that determine which subnets Service Project Admins can use.
In the service project:
- The service project's owner, editor, or more granular role (such as a Compute Admin) creates backend instances.
- The service project's owner, editor, or more granular role (such as a Network Admin or Load Balancer Admin) creates the load balancer components (forwarding rule, target HTTP(S) proxy, URL map, backend service(s), and health checks).
- These load balancing resources and backend instances reference a Shared VPC network and subnets in the host project.
Clients can access the load balancer if they are in the same Shared VPC network and region as the load balancer. Clients can be in a service project or the host project.
To learn how to configure an internal HTTP(S) load balancer for a Shared VPC network, see Setting up Internal HTTP(S) Load Balancing with Shared VPC.
Load balancer and backends in a host project
In this model, the Shared VPC network, load balancer components, and backends are all in the host project. This model does not separate network administration and service development responsibilities.
Configuration for this model is the same as configuring the load balancer in a standalone VPC network. Follow the steps in Setting up Internal HTTP(S) Load Balancing.
By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, 1.2, and 1.3 when terminating client SSL requests.
When the internal HTTP(S) load balancer uses HTTPS as a backend service protocol, it can negotiate TLS 1.0, 1.1, 1.2, or 1.3 to the backend.
Internal HTTP(S) Load Balancing operates at a regional level.
There's no guarantee that a request from a client in one zone of the region is sent to a backend that's in the same zone as the client. Session affinity doesn't reduce communication between zones.
Internal HTTP(S) Load Balancing isn't compatible with the following features:
When creating an internal HTTP(S) load balancer in a Shared VPC host or service project:
All load balancing components and backends must exist in the same project, either all in a host project or all in a service project. For example, you cannot deploy the load balancer's forwarding rule in one project and create backend instances in another project.
Clients can be located in either the host project, any attached service projects, or any connected networks. Clients must use the same Shared VPC network and be in the same region as the load balancer.
An internal HTTP(S) load balancer supports HTTP/2 only over TLS.
Clients connecting to an internal HTTP(S) load balancer must use HTTP version 1.1 or later. HTTP 1.0 is not supported.
Google Cloud doesn't warn you if your proxy-only subnet runs out of IP addresses.
The internal forwarding rule that your internal HTTP(S) load balancer uses must have exactly one port.
Internal HTTP(S) Load Balancing does not support Cloud Trace.
Internal HTTP(S) load balancers do not support backends in peered VPC networks. You must create the load balancer's forwarding rules and backends in the same VPC network.
- To configure load balancing for your services running on Compute Engine VMs, see Setting up Internal HTTP(S) Load Balancing for Compute Engine VMs.
- To configure load balancing on a Shared VPC setup, see Setting up Internal HTTP(S) Load Balancing for Shared VPC.
- To configure load balancing for your services running in GKE pods, see Container-native load balancing with standalone NEGs and the Attaching an internal HTTP(S) load balancer to standalone NEGs section.
- To configure Internal HTTP(S) Load Balancing with Private Service Connect, see Configuring Private Service Connect with consumer HTTP(S) service controls.
- To manage the proxy-only subnet resource, see Proxy-only subnets for internal HTTP(S) load balancers.