Internal HTTP(S) Load Balancing concepts

Internal HTTP(S) Load Balancing is a proxy-based, regional Layer 7 load balancer that enables you to run and scale your services behind a private load balancing IP address that is accessible only in the load balancer's region in your VPC network.


Google Cloud Internal HTTP(S) Load Balancing is a proxy-based load balancer for HTTP and HTTPS traffic in your private internal network. It distributes traffic to backends hosted on Google Compute Engine and Google Kubernetes Engine. The load balancer is accessible only in the chosen region of your VPC network on a private, internal (RFC 1918) IP address.

Internal HTTP(S) Load Balancing is a managed service based on the open source Envoy proxy. This enables rich traffic control capabilities based on HTTP(S) parameters. Once the load balancer has been configured, it automatically allocates Envoy proxies to meet your traffic needs.

For information about how an internal HTTP(S) load balancer compares to other Google Cloud load balancers, refer to Choosing a load balancer.

High-level diagram

At a high level, an internal HTTP(S) load balancer consists of:

  • A private IP address to which clients send traffic. This IP address is accessible only to clients located in the same region as the load balancer. Internal client requests stay internal to your network and region.
  • One or more backend services to which the load balancer forwards traffic. Backends can be Google Compute Engine VMs, groups of Google Compute Engine VMs (via managed instance groups), or Google Kubernetes Engine nodes (via network endpoint groups). These backends must be located in the same region as the load balancer.

Internal services with Layer 7-based load balancing

Two additional components are used to deliver the load balancing service:

  • A URL map, which defines traffic control rules (based on Layer 7 parameters like HTTP headers) that map to specific backend services. The load balancer evaluates incoming requests against the URL map to route traffic to backend services or perform additional actions (like redirects).
  • Health checks, which periodically check the status of backends and reduce the risk that client traffic is sent to a non-responsive backend.
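
The effect of health checks on routing can be sketched as follows. This is a minimal illustration, not the load balancer's actual implementation: the `Backend` class, the backend names, and the `probe_results` table are hypothetical stand-ins for real probes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    name: str
    healthy: bool = True  # result of the most recent probe


def run_health_checks(backends: List[Backend], probe: Callable[[Backend], bool]) -> None:
    """Probe every backend once and record the result."""
    for backend in backends:
        backend.healthy = probe(backend)


def eligible_backends(backends: List[Backend]) -> List[Backend]:
    """Return only the backends whose most recent probe succeeded."""
    return [b for b in backends if b.healthy]


# Hypothetical probe outcomes standing in for real HTTP(S) probes.
probe_results = {"vm-a": True, "vm-b": False, "vm-c": True}

backends = [Backend(name) for name in probe_results]
run_health_checks(backends, lambda b: probe_results[b.name])
eligible_names = [b.name for b in eligible_backends(backends)]
print(eligible_names)
```

Because probes run periodically, a backend can fail between two probes, which is why health checks reduce the risk of sending traffic to a non-responsive backend rather than eliminate it.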

Traffic types, scheme, and scope

Backend services support the HTTP, HTTPS, or HTTP/2 protocols. Note that clients and backends do not need to use the same request protocol. Clients can, for example, send requests to the load balancer using HTTP/2. The load balancer can forward these requests to backends using HTTP/1.1.

Because the scope of an internal HTTP(S) load balancer is regional, not global, clients and backend VMs or endpoints must all be in the same region. Clients in connected networks can use a Cloud VPN tunnel or Cloud Interconnect attachment in the same region as the load balancer.

Use cases

Load balancing using path-based routing

One common use case is load balancing traffic among services. In this example, an internal client can request video and image content using the same base URL, mygcpservice.internal, with paths /video and /images.

The internal HTTP(S) load balancer's URL map specifies that requests to path /video should be sent to the video backend service, while requests to path /images should be sent to the images backend service. In the example below, the video and images backend services are served using Google Compute Engine VMs, but they can be GKE pods instead.

When an internal client sends a request to the load balancer's internal IP address, the load balancer evaluates the request according to this logic and sends the request to the correct backend service.
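
The routing decision described above can be sketched in a few lines of Python. This is a simplified longest-prefix match written for illustration. The backend service names are hypothetical, and a real URL map has its own matching semantics that this sketch does not reproduce.

```python
from typing import Dict, Optional

# Hypothetical path rules for the example host mygcpservice.internal.
PATH_RULES: Dict[str, str] = {
    "/video": "video-backend-service",
    "/images": "images-backend-service",
}


def route(path: str, rules: Dict[str, str], default: Optional[str] = None) -> Optional[str]:
    """Return the backend service for the longest matching path prefix."""
    best = None
    for prefix in rules:
        # Match the prefix itself or anything below it, but not e.g. /videos.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return rules[best] if best is not None else default


print(route("/video/intro.mp4", PATH_RULES))  # video-backend-service
print(route("/images", PATH_RULES))           # images-backend-service
```

A request whose path matches no rule returns `default`, which here stands in for whatever default behavior the URL map is configured with.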

The following diagram illustrates this use case:

Internal (micro) services with Layer 7-based load balancing

Modernizing legacy services

Internal HTTP(S) Load Balancing can be an effective tool for modernizing legacy applications.

One example of a legacy application is a large monolithic application that you cannot easily update. In this case, you can deploy an internal HTTP(S) load balancer in front of your legacy application. You can then use the load balancer's traffic control capabilities to direct a subset of traffic to new microservices that replace functionality provided by your legacy application.

To start, you would configure the load balancer's URL map to route all traffic to the legacy application by default. This maintains the existing behavior. As replacement services are developed, you would update the URL map to route portions of traffic to these replacement services.

Imagine that your legacy application contains some video processing functionality that is served when internal clients send requests to /video. You could break this video service out into a separate microservice as follows:

  • Add an internal HTTP(S) load balancer in front of your legacy application.
  • Create a replacement video processing microservice.
  • Update the load balancer's URL map so that all requests to path /video are routed to the new microservice instead of the legacy application.

As you develop additional replacement services, you would continue to update the URL map. Over time, fewer requests would be routed to the legacy application until, eventually, replacement services exist for all of the functionality provided by the legacy application. At this point, you could retire your legacy application.
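
The incremental migration described above can be modeled as a URL map with a default service plus path rules that are added over time. The following is a sketch under stated assumptions: the `UrlMap` class and the service names `legacy-app` and `video-microservice` are hypothetical and are not part of any Google Cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UrlMap:
    """Toy URL map: a default service plus path-prefix overrides."""
    default_service: str
    path_rules: Dict[str, str] = field(default_factory=dict)

    def add_rule(self, prefix: str, service: str) -> None:
        """Route requests under `prefix` to `service`."""
        self.path_rules[prefix] = service

    def resolve(self, path: str) -> str:
        """Return the service that handles `path`."""
        for prefix, service in self.path_rules.items():
            if path == prefix or path.startswith(prefix + "/"):
                return service
        return self.default_service


# Step 1: all traffic goes to the legacy application by default.
url_map = UrlMap(default_service="legacy-app")
before = url_map.resolve("/video/encode")

# Step 2: the video microservice is ready, so /video is carved out.
url_map.add_rule("/video", "video-microservice")
after = url_map.resolve("/video/encode")
untouched = url_map.resolve("/reports")

print(before, after, untouched)
```

Each new replacement service adds one more rule. When every path the application serves is covered by a rule, the default service no longer receives traffic and the legacy application can be retired.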

3-tier web services

You can use Internal HTTP(S) Load Balancing to support traditional 3-tier web services.

In the following diagram, three types of Google Cloud load balancers scale the three tiers:

  1. A global, external HTTP(S) load balancer distributes traffic from the internet to a set of web frontend instance groups in various regions.
  2. These frontends send the HTTP(S) traffic to a set of regional, internal HTTP(S) load balancers (the subject of this overview).
  3. The internal HTTP(S) load balancers distribute the traffic to middleware instance groups.
  4. These middleware instances send the traffic to internal TCP/UDP load balancers, which load balance the traffic to data storage clusters.

Layer 7-based routing for internal tiers in multi-tier app

Architecture and resources

The following diagram shows the Google Cloud resources required for an internal HTTP(S) load balancer.

Internal HTTP(S) Load Balancing components

The following resources define an internal HTTP(S) load balancer:

  • An internal managed forwarding rule specifies an internal IP address, port, and regional target HTTP(S) proxy. Clients use the IP address and port to connect to the load balancer's Envoy proxies.

  • A regional target HTTP(S) proxy receives a request from the client. The HTTP(S) proxy evaluates the request using the URL map to make traffic routing decisions. The proxy can also authenticate communications using SSL certificates.

  • If you are using internal HTTPS load balancing, the HTTP(S) proxy uses regional SSL certificates to prove its identity to clients. Note that a target HTTP(S) proxy supports up to a documented number of SSL certificates.

  • The HTTP(S) proxy uses a regional URL map to make a routing determination based on HTTP attributes (like the request path, cookies, or headers). Based on the routing decision, the proxy forwards client requests to specific regional backend services. The URL map can specify additional actions to take, for example, rewriting headers, sending redirects to clients, and timeout policies (among others).

  • A regional backend service distributes requests to healthy backends (either instance groups containing Compute Engine VMs or NEGs containing Kubernetes Engine containers).

  • One or more backends must be connected to the backend service. Backends can be instance groups or NEGs in any of the following configurations:

    • Managed instance groups (zonal or regional)
    • Unmanaged instance groups (zonal)
    • Network endpoint groups (zonal)

    You cannot use instance groups and NEGs on the same backend service.

  • A regional health check periodically monitors the readiness of your backends. This reduces the risk that requests might be sent to backends that can't service the request.

  • A proxy-only subnet whose IP addresses are the source of traffic from the load balancer proxies to your backends. You must create one proxy-only subnet in each region of a VPC network in which you use internal HTTP(S) load balancers. This subnet is managed by Google and shared by all of your internal HTTP(S) load balancers in the region. You cannot use this subnet to host your backends.

  • A firewall rule that allows your backends to accept connections from the proxy-only subnet. Refer to the example in Configuring firewall rules.
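
Two of the constraints in the list above (backends must be in the load balancer's region, and instance groups and NEGs cannot share a backend service) lend themselves to a quick pre-deployment check. The sketch below models them with plain Python data classes. All class names, field names, and resource names are hypothetical and do not mirror the Compute Engine API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    name: str
    kind: str    # "instance_group" or "neg"
    region: str


@dataclass
class BackendService:
    name: str
    region: str
    health_check: str
    backends: List[Backend] = field(default_factory=list)


def validate(service: BackendService) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not service.backends:
        problems.append("backend service has no backends")
    kinds = {b.kind for b in service.backends}
    if len(kinds) > 1:
        problems.append("instance groups and NEGs are mixed on one backend service")
    for b in service.backends:
        if b.region != service.region:
            problems.append(f"backend {b.name} is outside region {service.region}")
    return problems


good = BackendService(
    "video", "us-west1", "video-hc",
    [Backend("video-ig", "instance_group", "us-west1")],
)
mixed = BackendService(
    "images", "us-west1", "images-hc",
    [
        Backend("images-ig", "instance_group", "us-west1"),
        Backend("images-neg", "neg", "us-east1"),
    ],
)

print(validate(good))
print(validate(mixed))
```

The check covers only the rules stated in this section. It says nothing about quotas or other limits.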


Limitations

  • Internal HTTP(S) Load Balancing operates at a regional level.

  • There's no guarantee that a request from a client in one zone of the region is sent to a backend that's in the same zone as the client. Session affinity doesn't reduce communication between zones.

  • Internal HTTP(S) Load Balancing isn't compatible with VPC Network Peering. If you need an internal load balancer that is compatible with VPC Network Peering, use Internal TCP/UDP Load Balancing.

  • The WebSocket protocol is not supported.

  • Google Cloud doesn't warn you if your proxy-only subnet runs out of IP addresses.

  • Within each VPC network, each internal managed forwarding rule must have its own IP address. For more information, see Multiple forwarding rules with a common IP address.

  • The internal forwarding rule used by your internal HTTP(S) load balancer must have exactly one port.
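
Because Google Cloud doesn't warn you when the proxy-only subnet runs out of addresses, and because forwarding rules have IP address and port constraints, it can help to check a planned configuration yourself. The sketch below uses Python's standard `ipaddress` module. The subnet range, the in-use count, and the forwarding rule tuples are hypothetical values, and the capacity calculation ignores any addresses the platform may reserve.

```python
import ipaddress
from collections import Counter
from typing import List, Tuple

# Hypothetical proxy-only subnet range for one region.
PROXY_ONLY_SUBNET = ipaddress.ip_network("10.129.0.0/26")


def remaining_addresses(subnet: ipaddress.IPv4Network, in_use: int) -> int:
    """Addresses left in the range (ignores any platform-reserved ones)."""
    return subnet.num_addresses - in_use


def from_proxy_subnet(source_ip: str) -> bool:
    """True if a connection's source address belongs to the proxy-only subnet."""
    return ipaddress.ip_address(source_ip) in PROXY_ONLY_SUBNET


def check_forwarding_rules(rules: List[Tuple[str, List[int]]]) -> List[str]:
    """Flag rules that share an IP address or do not have exactly one port."""
    problems = []
    ip_counts = Counter(ip for ip, _ in rules)
    for ip, ports in rules:
        if len(ports) != 1:
            problems.append(f"{ip}: expected exactly one port, got {len(ports)}")
        if ip_counts[ip] > 1:
            problems.append(f"{ip}: address is shared by {ip_counts[ip]} forwarding rules")
    return problems


# Hypothetical forwarding rules as (IP address, ports) pairs.
rules = [
    ("10.1.2.99", [80]),
    ("10.1.2.100", [80, 443]),
    ("10.1.2.99", [443]),
]

print(remaining_addresses(PROXY_ONLY_SUBNET, in_use=60))
print(from_proxy_subnet("10.129.0.5"))
for problem in check_forwarding_rules(rules):
    print(problem)
```

The same membership test is what a backend firewall rule expresses: allow traffic whose source address falls inside the proxy-only subnet range.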
