Google Cloud Internal HTTP(S) Load Balancing is a proxy-based, regional Layer 7 load balancer that enables you to run and scale your services behind an internal IP address.
Internal HTTP(S) Load Balancing distributes HTTP and HTTPS traffic to backends hosted on Compute Engine and Google Kubernetes Engine (GKE). The load balancer is accessible only in the chosen region of your Virtual Private Cloud (VPC) network on an internal IP address.
Internal HTTP(S) Load Balancing is a managed service based on the open source Envoy proxy. This enables rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been configured, it automatically allocates Envoy proxies to meet your traffic needs.
At a high level, an internal HTTP(S) load balancer consists of:
- An internal IP address to which clients send traffic. Only clients that are located in the same region as the load balancer can access this IP address. Internal client requests stay internal to your network and region.
- One or more backend services to which the load balancer forwards traffic. Backends can be Compute Engine VMs, groups of Compute Engine VMs (through instance groups), or GKE nodes (through network endpoint groups [NEGs]). These backends must be located in the same region as the load balancer.
Two additional components are used to deliver the load balancing service:
- A URL map, which defines traffic control rules (based on Layer 7 parameters such as HTTP headers) that map to specific backend services. The load balancer evaluates incoming requests against the URL map to route traffic to backend services or perform additional actions (such as redirects).
- Health checks, which periodically check the status of backends and reduce the risk that client traffic is sent to a non-responsive backend.
For limitations specific to Internal HTTP(S) Load Balancing, see the Limitations section.
For information about how the Google Cloud load balancers differ from each other, see the following documents:
Internal HTTP(S) Load Balancing addresses many use cases. This section provides a few high-level examples. For additional examples, see traffic management use cases.
Load balancing using path-based routing
One common use case is load balancing traffic among services. In this example,
an internal client can request video and image content by using the same base URL,
mygcpservice.internal, with the paths /video and /images.
The internal HTTP(S) load balancer's URL map specifies that requests to path
/video should be sent to the video backend service, while requests to path
/images should be sent to the images backend service. In the following example,
the video and images backend services are served by using Compute Engine
VMs, but they can also be served by using GKE pods.
When an internal client sends a request to the load balancer's internal IP address, the load balancer evaluates the request according to this logic and sends the request to the correct backend service.
The following diagram illustrates this use case.
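As a rough illustration of the routing logic described above, the following Python sketch models a URL map as a list of path prefixes and selects a backend service for a request path. It is a simplified model under stated assumptions, not how the load balancer is implemented: the service names `video-backend-service` and `images-backend-service`, the `select_backend` function, and the longest-prefix matching rule are hypothetical choices made for this example.

```python
from typing import Optional

# Hypothetical path rules: (path prefix, backend service name).
# The service names are illustrative and do not come from a real configuration.
PATH_RULES = [
    ("/video", "video-backend-service"),
    ("/images", "images-backend-service"),
]


def select_backend(path: str, rules=PATH_RULES) -> Optional[str]:
    """Return the backend service for the longest matching path prefix."""
    best = None
    for prefix, service in rules:
        # Match the prefix itself or anything beneath it ("/video/clip.mp4"),
        # but not unrelated paths that only share leading characters ("/videos").
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None
```

In this sketch, `select_backend("/video/intro.mp4")` selects the video service and `select_backend("/images")` selects the images service. A path that matches no rule returns `None`. An actual URL map would send such a request to a default service instead.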
Modernizing legacy services
Internal HTTP(S) Load Balancing can be an effective tool for modernizing legacy applications.
One example of a legacy application is a large monolithic application that you cannot easily update. In this case, you can deploy an internal HTTP(S) load balancer in front of your legacy application. You can then use the load balancer's traffic control capabilities to direct a subset of traffic to new microservices that replace the functionality that your legacy application provides.
To begin, you would configure the load balancer's URL map to route all traffic to the legacy application by default. This maintains the existing behavior. As replacement services are developed, you would update the URL map to route portions of traffic to these replacement services.
Imagine that your legacy application contains some video processing
functionality that is served when internal clients send requests to /video.
You could break this video service out into a separate microservice as follows:
- Add Internal HTTP(S) Load Balancing in front of your legacy application.
- Create a replacement video processing microservice.
- Update the load balancer's URL map so that all requests to path
/video are routed to the new microservice instead of to the legacy application.
As you develop additional replacement services, you would continue to update the URL map. Over time, fewer requests would be routed to the legacy application. Eventually, replacement services would exist for all the functionality that the legacy application provided. At this point, you could retire your legacy application.
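The incremental migration described above can be sketched in Python as a routing table that starts with only a default service and gains path rules over time. This is an illustrative model only: the `UrlMapSketch` class and the service names `legacy-app` and `video-microservice` are hypothetical and are not part of any Google Cloud API.

```python
class UrlMapSketch:
    """Toy model of a URL map with a default service and path rules."""

    def __init__(self, default_service: str):
        self.default_service = default_service
        self.path_rules = {}  # path prefix -> backend service name

    def add_path_rule(self, prefix: str, service: str) -> None:
        """Route requests under `prefix` to `service` from now on."""
        self.path_rules[prefix] = service

    def route(self, path: str) -> str:
        """Return the service for `path`, falling back to the default."""
        matches = [
            prefix
            for prefix in self.path_rules
            if path == prefix or path.startswith(prefix + "/")
        ]
        if not matches:
            return self.default_service
        # Prefer the most specific (longest) matching prefix.
        return self.path_rules[max(matches, key=len)]


# Step 1: everything goes to the legacy application by default.
url_map = UrlMapSketch(default_service="legacy-app")
before = url_map.route("/video/encode")

# Step 2: a replacement service exists, so move only /video traffic.
url_map.add_path_rule("/video", "video-microservice")
after = url_map.route("/video/encode")
untouched = url_map.route("/reports")
```

Here `before` is `"legacy-app"`, `after` is `"video-microservice"`, and `untouched` is still `"legacy-app"`. Only the migrated path changes behavior, so each replacement service can be introduced without disturbing the rest of the application.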
Three-tier web services
You can use Internal HTTP(S) Load Balancing to support traditional three-tier web services. The following example shows how you can use three types of Google Cloud load balancers to scale three tiers. At each tier, the load balancer type depends on your traffic type:
- Web tier: Traffic enters from the internet and is load balanced by using an external HTTP(S) load balancer.
- Application tier: The application tier is scaled by using a regional internal HTTP(S) load balancer.
- Database tier: The database tier is scaled by using an internal TCP/UDP load balancer.
The diagram shows how traffic moves through the tiers:
- An external HTTP(S) load balancer distributes traffic from the internet to a set of web frontend instance groups in various regions.
- These frontends send the HTTP(S) traffic to a set of regional, internal HTTP(S) load balancers (the subject of this overview).
- The internal HTTP(S) load balancers distribute the traffic to middleware instance groups.
- These middleware instance groups send the traffic to internal TCP/UDP load balancers, which load balance the traffic to data storage clusters.
You can access an internal HTTP(S) load balancer in your VPC network from a connected network by using the following:
- VPC Network Peering
- Cloud VPN and Cloud Interconnect
For detailed examples, see Internal load balancing and connected networks.
Architecture and resources
The following diagram shows the Google Cloud resources required for an internal HTTP(S) load balancer.
The following resources define an internal HTTP(S) load balancer:
An internal managed forwarding rule specifies an internal IP address, port, and regional target HTTP(S) proxy. Clients use the IP address and port to connect to the load balancer's Envoy proxies.
A regional target HTTP(S) proxy receives a request from the client. The HTTP(S) proxy evaluates the request by using the URL map to make traffic routing decisions. The proxy can also authenticate communications by using SSL certificates.
The HTTP(S) proxy uses a regional URL map to make a routing determination based on HTTP attributes (such as the request path, cookies, or headers). Based on the routing decision, the proxy forwards client requests to specific regional backend services. The URL map can specify additional actions to take such as rewriting headers, sending redirects to clients, and configuring timeout policies (among others).
A regional backend service distributes requests to healthy backends (either instance groups containing Compute Engine VMs or NEGs containing GKE containers).
One or more backends must be connected to the backend service. Backends can be instance groups or NEGs in any of the following configurations:
- Managed instance groups (zonal or regional)
- Unmanaged instance groups (zonal)
- Network endpoint groups (zonal)
You cannot use instance groups and NEGs on the same backend service.
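The backend constraints stated in this overview (at least one backend, backends in the same region as the load balancer, and no mixing of instance groups and NEGs) can be expressed as a small validation routine. The following Python sketch is a hypothetical model for checking a planned configuration before deployment. The `Backend` and `BackendService` classes and the `validate_backend_service` function are assumptions made for this example and do not correspond to a Google Cloud client library.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class BackendKind(Enum):
    INSTANCE_GROUP = "instance-group"
    NEG = "network-endpoint-group"


@dataclass
class Backend:
    name: str
    kind: BackendKind
    region: str


@dataclass
class BackendService:
    name: str
    region: str
    backends: List[Backend] = field(default_factory=list)


def validate_backend_service(service: BackendService) -> List[str]:
    """Return a list of human-readable problems; empty means no problems found."""
    errors = []
    if not service.backends:
        errors.append("at least one backend must be attached")
    # Instance groups and NEGs cannot share a backend service.
    if len({b.kind for b in service.backends}) > 1:
        errors.append("instance groups and NEGs cannot be mixed")
    # Backends must be in the same region as the load balancer.
    for backend in service.backends:
        if backend.region != service.region:
            errors.append(
                f"backend {backend.name} is in {backend.region}, "
                f"expected {service.region}"
            )
    return errors
```

For example, a service in `us-west1` with one instance group and one NEG would produce the "cannot be mixed" error, while a service with two instance groups in `us-west1` would return an empty list.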
A regional health check periodically monitors the readiness of your backends. This reduces the risk that requests might be sent to backends that can't service the request.
A proxy-only subnet whose IP addresses are the source of traffic from the load balancer proxies to your backends. You must create one proxy-only subnet in each region of a VPC network in which you use internal HTTP(S) load balancers. Google manages this subnet, and all your internal HTTP(S) load balancers in the region share it. You cannot use this subnet to host your backends.
If you are using HTTPS-based load balancing, you must install one or more SSL certificates on the target HTTPS proxy.
These certificates are used by target HTTPS proxies to secure communications between the load balancer and the client.
For information about SSL certificate limits and quotas, see SSL certificates on the load balancing quotas page.
For the best security, use end-to-end encryption for your HTTPS load balancer deployment. For more information, see Encryption from the load balancer to the backends.
For general information about how Google encrypts user traffic, see the Encryption in Transit in Google Cloud white paper.
Your internal HTTP(S) load balancer requires the following firewall rules:
- An ingress allow rule to permit traffic from the health check ranges
- An ingress allow rule that permits traffic from the proxy-only subnet
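To make the intent of these two rules concrete, the following Python sketch checks whether a source IP address falls inside the ranges that the backends need to accept. It uses only the standard `ipaddress` module. The values are assumptions that this overview does not state: `130.211.0.0/22` and `35.191.0.0/16` are used here as the health check ranges, and `10.129.0.0/23` is a hypothetical proxy-only subnet. Confirm the actual ranges for your deployment before relying on them.

```python
import ipaddress

# Assumed health check source ranges; verify against current documentation.
HEALTH_CHECK_RANGES = ["130.211.0.0/22", "35.191.0.0/16"]
# Hypothetical proxy-only subnet for the load balancer's region.
PROXY_ONLY_SUBNET = "10.129.0.0/23"

ALLOWED_SOURCES = HEALTH_CHECK_RANGES + [PROXY_ONLY_SUBNET]


def is_allowed_source(source_ip: str, allowed_cidrs=ALLOWED_SOURCES) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

With these assumed values, traffic from `10.129.0.15` (a proxy) and `35.191.10.1` (a health check prober) would be permitted, while traffic from `10.1.2.3` would not match either rule.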
Timeouts and retries
The backend service timeout is a request/response timeout for HTTP(S) traffic. This is the amount of time that the load balancer waits for a backend to return a full response to a request.
For example, if the value of the backend service timeout is the default value of 30 seconds, the backends have 30 seconds to respond to requests. The load balancer retries the HTTP GET request once if the backend closes the connection or times out before sending response headers to the load balancer. If the backend sends response headers or if the request sent to the backend is not an HTTP GET request, the load balancer does not retry. If the backend does not reply at all, the load balancer returns an HTTP 5xx response to the client. For these load balancers, change the timeout value if you want to allow more or less time for the backends to respond to requests.
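The retry conditions described above can be summarized as a small decision function. The following Python sketch is one reading of that paragraph, not the load balancer's actual implementation. The `Outcome` enum, the `should_retry` function, and the `MAX_RETRIES` constant are hypothetical names introduced for this example.

```python
from enum import Enum


class Outcome(Enum):
    RESPONSE_COMPLETE = "response-complete"
    CLOSED_BEFORE_HEADERS = "closed-before-headers"
    TIMEOUT_BEFORE_HEADERS = "timeout-before-headers"
    FAILED_AFTER_HEADERS = "failed-after-headers"


# The text describes a single retry for eligible requests.
MAX_RETRIES = 1

RETRYABLE_OUTCOMES = {
    Outcome.CLOSED_BEFORE_HEADERS,
    Outcome.TIMEOUT_BEFORE_HEADERS,
}


def should_retry(method: str, outcome: Outcome, retries_used: int) -> bool:
    """Decide whether a failed backend request would be retried."""
    # Only HTTP GET requests are retried.
    if method.upper() != "GET":
        return False
    # The request is retried at most once.
    if retries_used >= MAX_RETRIES:
        return False
    # No retry once the backend has started sending response headers.
    return outcome in RETRYABLE_OUTCOMES
```

Under this reading, a GET request whose backend times out before sending headers is retried once, while a POST request with the same outcome, or a GET request that fails after headers were sent, is not retried.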
Traffic types, scheme, and scope
Backend services support the HTTP, HTTPS, or HTTP/2 protocols. Clients and backends do not need to use the same request protocol. For example, clients can send requests to the load balancer by using HTTP/2, and the load balancer can forward these requests to backends by using HTTP/1.1.
Because the scope of an internal HTTP(S) load balancer is regional, not global, clients and backend VMs or endpoints must all be in the same region.
Internal HTTP(S) Load Balancing operates at a regional level.
There's no guarantee that a request from a client in one zone of the region is sent to a backend that's in the same zone as the client. Session affinity doesn't reduce communication between zones.
Internal HTTP(S) Load Balancing isn't compatible with the following features:
When creating an internal HTTP(S) load balancer in a Shared VPC host project:
- Client VMs can be located in either the host project or any connected service project. The client VMs must use the same Shared VPC network and the same region as the load balancer.
- All the load balancer's components and backends must be in the host project. This is different from other Google Cloud load balancers because none of the internal HTTP(S) load balancer components can be in a service project when the load balancer uses a Shared VPC network.
- The host project within the Shared VPC network owns and creates
the proxy-only subnet (purpose=
The WebSocket protocol is not supported.
An internal HTTP(S) load balancer supports HTTP/2 only over TLS.
Google Cloud doesn't warn you if your proxy-only subnet runs out of IP addresses.
Within each VPC network, each internal managed forwarding rule must have its own IP address. For more information, see Multiple forwarding rules with a common IP address.
The internal forwarding rule that your internal HTTP(S) load balancer uses must have exactly one port.
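The two forwarding rule constraints above (a distinct IP address per rule within a VPC network, and exactly one port per rule) can be checked against a planned configuration before deployment. The following Python sketch shows one way to do that. The `ForwardingRule` class and the `validate_forwarding_rules` function are hypothetical and are not part of a Google Cloud client library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ForwardingRule:
    name: str
    network: str
    ip_address: str
    ports: Tuple[int, ...]


def validate_forwarding_rules(rules: List[ForwardingRule]) -> List[str]:
    """Return a list of problems; an empty list means both constraints hold."""
    errors = []
    # Each rule must reference exactly one port.
    for rule in rules:
        if len(rule.ports) != 1:
            errors.append(
                f"{rule.name}: expected exactly one port, got {len(rule.ports)}"
            )
    # Within a VPC network, no two rules may share an IP address.
    usage = Counter((rule.network, rule.ip_address) for rule in rules)
    for (network, ip_address), count in sorted(usage.items()):
        if count > 1:
            errors.append(
                f"{ip_address} in {network} is used by {count} forwarding rules"
            )
    return errors
```

For example, two rules in the same network that both use `10.1.2.99` would be reported, as would a single rule that lists ports 80 and 443 together.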
- To configure load balancing for your services running on Compute Engine VMs, see Setting up Internal HTTP(S) Load Balancing for Compute Engine VMs.
- To configure load balancing for your services running in GKE pods, see Container-native load balancing with standalone NEGs and the Attaching an internal HTTP(S) load balancer to standalone NEGs section.
- To manage the proxy-only subnet resource, see Proxy-only subnets for internal HTTP(S) load balancers.