Backend services overview


A backend service defines how Cloud Load Balancing distributes traffic. The backend service configuration contains a set of values, such as the protocol used to connect to backends, various distribution and session settings, health checks, and timeouts. These settings provide fine-grained control over how your load balancer behaves. If you need to get started quickly, most of the settings have default values that allow for easy configuration. A backend service is either global or regional in scope.

Load balancers, Envoy proxies, and proxyless gRPC clients use the configuration information in the backend service resource to do the following:

  • Direct traffic to the correct backends, which are instance groups or network endpoint groups (NEGs).
  • Distribute traffic according to a balancing mode, which is a setting for each backend.
  • Determine which health check is monitoring the health of the backends.
  • Specify session affinity.
  • Determine whether other services are enabled, including the following services that are only available for certain load balancers:
    • Cloud CDN
    • Google Cloud Armor security policies
    • Identity-Aware Proxy

You set these values when you create a backend service or add a backend to the backend service.
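For example, you might create a global backend service and set several of these values with the gcloud CLI. The following is a minimal sketch, not a complete load balancer setup; the backend service and health check names (web-backend-service, http-basic-check) are placeholders, and the flags shown assume a global external HTTP(S) load balancer:

# Create a global backend service that uses the HTTP protocol
# and references an existing health check.
gcloud compute backend-services create web-backend-service \
    --load-balancing-scheme=EXTERNAL_MANAGED \
    --protocol=HTTP \
    --port-name=http \
    --health-checks=http-basic-check \
    --global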

The following table summarizes which load balancers use backend services. The product that you are using also determines the maximum number of backend services, the scope of a backend service, the type of backends supported, and the backend service's load balancing scheme. The load balancing scheme is an identifier that Google uses to classify forwarding rules and backend services. Each load balancing product uses one load balancing scheme for its forwarding rules and backend services. Some schemes are shared among products.

Table: Backend services and supported backend types
Product Maximum number of backend services Scope of backend service Supported backend types Load balancing scheme
Global external HTTP(S) load balancer Multiple Global Each backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • All serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
  • Private Service Connect NEGs: If more than one NEG is specified, the NEGs must be in different regions
EXTERNAL_MANAGED
Global external HTTP(S) load balancer (classic) Multiple Global 1 Each backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • All serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
  • One internet NEG for an external backend
EXTERNAL
Regional external HTTP(S) load balancer Multiple Regional Each backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • A single Private Service Connect NEG
EXTERNAL_MANAGED
Internal HTTP(S) load balancer Multiple Regional Each backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • A single Private Service Connect NEG
INTERNAL_MANAGED
External SSL proxy load balancer 1 Global 1 The backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • One internet NEG for an external backend
EXTERNAL
External TCP proxy load balancer 1 Global 1 The backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs 2
  • One internet NEG for an external backend
EXTERNAL
Internal regional TCP proxy load balancer (Preview) 1 Regional The backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT type zonal NEGs
  • All hybrid connectivity NEGs: One or more NON_GCP_PRIVATE_IP_PORT type NEGs
  • A combination of zonal and hybrid NEGs: GCE_VM_IP_PORT and NON_GCP_PRIVATE_IP_PORT type NEGs
INTERNAL_MANAGED
Network load balancer 1 Regional The backend service supports the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
EXTERNAL
Internal TCP/UDP load balancer 1 Regional, but configurable to be globally accessible The backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP type zonal NEGs
INTERNAL
Traffic Director Multiple Global Each backend service supports one of the following backend combinations:
  • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
  • All zonal NEGs: One or more GCE_VM_IP_PORT or NON_GCP_PRIVATE_IP_PORT type zonal NEGs
  • One internet NEG of type INTERNET_FQDN_PORT
  • One or more service bindings
INTERNAL_SELF_MANAGED
1 Backend services used by the global external HTTP(S) load balancer (classic), external SSL proxy load balancers, and external TCP proxy load balancers are always global in scope, in either Standard or Premium Network Tier. However, in Standard Tier, the forwarding rule and its external IP address are regional, and all backends attached to the backend service must be in the same region as the forwarding rule.
2 For GKE deployments, mixed NEG backends are only supported with standalone NEGs.

Backends

A backend is one or more endpoints that receive traffic from a Google Cloud load balancer, a Traffic Director-configured Envoy proxy, or a proxyless gRPC client. There are several types of backends, described in the sections that follow: instance groups, zonal NEGs, internet NEGs, serverless NEGs, and service bindings.

You cannot delete a backend instance group or NEG that is associated with a backend service. Before you delete an instance group or NEG, you must first remove it as a backend from all backend services that reference it.

Instance groups

This section discusses how instance groups work with the backend service.

Backend VMs and external IP addresses

Backend VMs in backend services do not need external IP addresses:

  • For global external HTTP(S) load balancers, external SSL proxy load balancers, and external TCP proxy load balancers: Clients communicate with a Google Front End (GFE) which hosts your load balancer's external IP address. GFEs communicate with backend VMs or endpoints by sending packets to an internal address created by joining an identifier for the backend's VPC network with the internal IPv4 address of the backend. For instance group backends, the internal IPv4 address is always the primary internal IPv4 address that corresponds to the nic0 interface of the VM. For GCE_VM_IP_PORT endpoints in a zonal NEG, you can specify the endpoint's IPv4 address as either the primary IPv4 address associated with any NIC of a VM or any IPv4 address from an alias IP range associated with any NIC of a VM. Communication between GFEs and backend VMs or endpoints is facilitated through special routes.
  • For regional external HTTP(S) load balancers: Clients communicate with an Envoy proxy which hosts your load balancer's external IP address. Envoy proxies communicate with backend VMs or endpoints by sending packets to an internal address created by joining an identifier for the backend's VPC network with the internal IPv4 address of the backend.
    • For instance group backends, the internal IPv4 address is always the primary internal IPv4 address that corresponds to the nic0 interface of the VM, and nic0 must be in the same network as the load balancer.
    • For GCE_VM_IP_PORT endpoints in a zonal NEG, you can specify the endpoint's IPv4 address as either the primary IPv4 address of a VM's NIC or an IPv4 address from an alias IP range of a VM's NIC, as long as the VM's NIC is in the same network as the load balancer.
  • For network load balancers: Clients communicate directly with backends by way of Google's Maglev pass-through load balancing infrastructure. Packets are routed and delivered to the nic0 interface of a backend VM with the original source and destination IP addresses preserved. Backends respond to clients using direct server return. The methods used to select a backend and to track connections are configurable.

Named ports

The backend service's named port attribute is only applicable to proxy load balancers using instance group backends. The named port defines the destination port used for the TCP connection between the proxy (GFE or Envoy) and the backend instance.

Named ports are configured as follows:

  • On each instance group backend, you must configure one or more named ports using key/value pairs. The key represents a meaningful port name that you choose, and the value represents the port number you assign to the name. The mapping of names to numbers is done individually for each instance group backend.

  • On the backend service, you specify a single named port using just the port name (--port-name).

On a per-instance group backend basis, the backend service translates the port name to a port number. When an instance group's named port matches the backend service's --port-name, the backend service uses this port number for communication with the instance group's VMs.

For example, you might set the named port on an instance group with the name my-service-name and the port 8888:

gcloud compute instance-groups unmanaged set-named-ports my-unmanaged-ig \
    --named-ports=my-service-name:8888

Then you refer to the named port in the backend service configuration with the --port-name on the backend service set to my-service-name:

gcloud compute backend-services update my-backend-service \
    --port-name=my-service-name

A backend service can use a different port number when communicating with VMs in different instance groups if each instance group specifies a different port number for the same port name.
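As a hedged illustration, the set-named-ports command can map the same port name to a different number on a second instance group; the instance group name and zone below are placeholders:

# Map the same port name to a different port number on another group.
gcloud compute instance-groups set-named-ports my-other-ig \
    --named-ports=my-service-name:8080 \
    --zone=us-central1-a

A backend service with --port-name=my-service-name then connects to VMs in my-unmanaged-ig on port 8888 and to VMs in my-other-ig on port 8080.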

The resolved port number used by the proxy load balancer's backend service doesn't need to match the port number used by the load balancer's forwarding rules. A proxy load balancer listens for TCP connections sent to the IP address and destination port of its forwarding rules. Because the proxy opens a second TCP connection to its backends, the second TCP connection's destination port can be different.

Named ports are only applicable to instance group backends. Zonal NEGs with GCE_VM_IP_PORT endpoints, hybrid NEGs with NON_GCP_PRIVATE_IP_PORT endpoints, and internet NEGs define ports using a different mechanism, namely, on the endpoints themselves. Serverless NEGs reference Google services and PSC NEGs reference service attachments using abstractions that do not involve specifying a destination port.

Internal TCP/UDP load balancers and external TCP/UDP network load balancers don't use named ports. This is because they are pass-through load balancers that route connections directly to backends instead of creating new connections. Packets are delivered to the backends preserving the destination IP address and port of the load balancer's forwarding rule.


Restrictions and guidance for instance groups

Keep the following restrictions and guidance in mind when you create instance groups for your load balancers:

  • Do not put a VM in more than one load-balanced instance group. If a VM is a member of two or more unmanaged instance groups, or a member of one managed instance group and one or more unmanaged instance groups, Google Cloud limits you to only using one of those instance groups at a time as a backend for a particular backend service.

    If you need a VM to participate in multiple load balancers, you must use the same instance group as a backend on each of the backend services.

  • For proxy load balancers, when you want to balance traffic to different ports, specify the required named ports on one instance group and have each backend service subscribe to a unique named port.

  • You can use the same instance group as a backend for more than one backend service. In this situation, the backends must use compatible balancing modes. Compatible means that the balancing modes must be the same, or they must be a combination of CONNECTION and RATE.

    Incompatible balancing mode combinations are as follows:

    • CONNECTION with UTILIZATION
    • RATE with UTILIZATION

    Consider the following example:

    • You have two backend services: external-https-backend-service for an external HTTP(S) load balancer and internal-tcp-backend-service for an internal TCP/UDP load balancer.
    • You're using an instance group called instance-group-a in internal-tcp-backend-service.
    • In internal-tcp-backend-service, you must apply the CONNECTION balancing mode because internal TCP/UDP load balancers only support the CONNECTION balancing mode.
    • You can also use instance-group-a in external-https-backend-service if you apply the RATE balancing mode in external-https-backend-service.
    • You cannot also use instance-group-a in external-https-backend-service with the UTILIZATION balancing mode.
  • To change the balancing mode for an instance group serving as a backend for multiple backend services:

    • Remove the instance group from all backend services except for one.
    • Change the balancing mode for the backend on the one remaining backend service.
    • Re-add the instance group as a backend to the remaining backend services, if they support the new balancing mode.
  • If your instance group is associated with several backend services, each backend service can reference the same named port or a different named port on the instance group.

  • We recommend not adding an autoscaled managed instance group to more than one backend service. Doing so might cause unpredictable and unnecessary scaling of instances in the group, especially if you use the HTTP Load Balancing Utilization autoscaling metric.

    • While not recommended, this scenario might work if the autoscaling metric is either CPU Utilization or a Cloud Monitoring Metric that is unrelated to the load balancer's serving capacity. Using one of these autoscaling metrics might prevent erratic scaling.

Zonal network endpoint groups

Network endpoints represent services by their IP address or an IP address/port combination, rather than referring to a VM in an instance group. A network endpoint group (NEG) is a logical grouping of network endpoints.

Zonal network endpoint groups (NEGs) are zonal resources that represent collections of either IP addresses or IP address/port combinations for Google Cloud resources within a single subnet.

A backend service that uses zonal NEGs as its backends distributes traffic among applications or containers running within VMs.

There are two types of network endpoints available for zonal NEGs:

  • GCE_VM_IP endpoints (supported only with internal TCP/UDP load balancers).
  • GCE_VM_IP_PORT endpoints.
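For example, a zonal NEG with GCE_VM_IP_PORT endpoints might be created and populated as follows. This is a minimal sketch; the NEG name, zone, network, subnet, VM name, IP address, and port are placeholders:

# Create a zonal NEG for IP:port endpoints in one subnet.
gcloud compute network-endpoint-groups create my-zonal-neg \
    --network-endpoint-type=gce-vm-ip-port \
    --zone=us-central1-a \
    --network=default \
    --subnet=default

# Attach an endpoint on a VM in that zone.
gcloud compute network-endpoint-groups update my-zonal-neg \
    --zone=us-central1-a \
    --add-endpoint="instance=my-vm,ip=10.1.2.3,port=8080"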

To see which products support zonal NEG backends, see Table: Backend services and supported backend types.

For details, see Zonal NEGs overview.

Internet network endpoint groups

Internet NEGs are global resources that define external backends. An external backend is a backend that is hosted within on-premises infrastructure or on infrastructure provided by third parties.

An internet NEG is a combination of an IP address or hostname, plus an optional port:

  • A publicly resolvable fully qualified domain name and an optional port, for example backend.example.com:443 (default ports: 80 for HTTP and 443 for HTTPS).
  • A publicly accessible IP address and an optional port, for example 203.0.113.8:80 or 203.0.113.8:443 (default ports: 80 for HTTP and 443 for HTTPS)
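For example, an internet NEG that uses the FQDN endpoint shown above might be created as follows (a minimal sketch; the NEG name is a placeholder):

# Create a global internet NEG for FQDN:port endpoints.
gcloud compute network-endpoint-groups create my-internet-neg \
    --network-endpoint-type=internet-fqdn-port \
    --global

# Add the external backend's domain name and port.
gcloud compute network-endpoint-groups update my-internet-neg \
    --add-endpoint="fqdn=backend.example.com,port=443" \
    --global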

To see which products support internet NEG backends, see Table: Backend services and supported backend types.

For more information about Internet NEGs, see Internet network endpoint group overview.

Serverless network endpoint groups

A network endpoint group (NEG) specifies a group of backend endpoints for a load balancer. A serverless NEG is a backend that points to a Cloud Run, App Engine, Cloud Functions, or API Gateway service.

A serverless NEG can represent one of the following:

  • A Cloud Run service or a group of services.
  • A Cloud Functions function or a group of functions.
  • An App Engine app (Standard or Flex), a specific service within an app, a specific version of an app, or a group of services.
  • An API Gateway that provides access to your services through a REST API consistent across all services, regardless of service implementation. This capability is in Preview.

To set up a serverless NEG for serverless applications that share a URL pattern, you use a URL mask. A URL mask is a template of your URL schema (for example, example.com/<service>). The serverless NEG will use this template to extract the <service> name from the incoming request's URL and route the request to the matching Cloud Run, Cloud Functions, or App Engine service with the same name.
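For example, a serverless NEG that uses the URL mask from the preceding paragraph might look like the following sketch; the NEG name and region are placeholders, and the mask assumes Cloud Run services:

# Create a serverless NEG that routes by URL mask instead of
# pointing at a single named service.
gcloud compute network-endpoint-groups create my-serverless-neg \
    --region=us-central1 \
    --network-endpoint-type=serverless \
    --cloud-run-url-mask="example.com/<service>"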

To see which load balancers support serverless NEG backends, see Table: Backend services and supported backend types.

For more information about serverless NEGs, see the Serverless network endpoint groups overview.

Service bindings

A service binding is a backend that establishes a connection between a backend service in Traffic Director and a service registered in Service Directory. A backend service can reference several service bindings. A backend service with a service binding cannot reference any other type of backend.

Mixed backends

The following usage considerations apply when you add different types of backends to a single backend service:

  • A single backend service cannot simultaneously use both instance groups and zonal NEGs.
  • You can use a combination of different types of instance groups on the same backend service. For example, a single backend service can reference a combination of both managed and unmanaged instance groups. For complete information about which backends are compatible with which backend services, see the table in the previous section.
  • With certain proxy load balancers, you can use a combination of zonal NEGs (with GCE_VM_IP_PORT endpoints) and hybrid connectivity NEGs (with NON_GCP_PRIVATE_IP_PORT endpoints) to configure hybrid load balancing. To see which load balancers have this capability, see Table: Backend services and supported backend types.

Protocol to the backends

When you create a backend service, you must specify the protocol used to communicate with the backends. You can specify only one protocol per backend service — you cannot specify a secondary protocol to use as a fallback.

Which protocols are valid depends on the type of load balancer or whether you are using Traffic Director.

Table: Protocol to the backends
Product Load balancing scheme Backend service protocol options
Global external HTTP(S) load balancer EXTERNAL_MANAGED HTTP, HTTPS, HTTP/2
Global external HTTP(S) load balancer (classic) EXTERNAL HTTP, HTTPS, HTTP/2
Regional external HTTP(S) load balancer (Preview) EXTERNAL_MANAGED HTTP, HTTPS, HTTP/2
External SSL proxy load balancer EXTERNAL SSL or TCP
External TCP proxy load balancer EXTERNAL TCP or SSL
Internal HTTP(S) load balancer INTERNAL_MANAGED HTTP, HTTPS, HTTP/2
Internal regional TCP proxy load balancer (Preview) INTERNAL_MANAGED TCP
Network load balancer EXTERNAL TCP, UDP, or UNSPECIFIED
Internal TCP/UDP load balancer INTERNAL TCP or UDP
Traffic Director INTERNAL_SELF_MANAGED HTTP, HTTPS, HTTP/2, gRPC, TCP

Changing a backend service's protocol makes the backends inaccessible through load balancers for a few minutes.
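For example, you can change a backend service's protocol with a command like the following sketch (the backend service name is a placeholder, and the scope flag depends on the load balancer):

# Switch the backend service to HTTPS; expect a brief period
# where backends are inaccessible through the load balancer.
gcloud compute backend-services update my-backend-service \
    --protocol=HTTPS \
    --global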

Encryption between the load balancer and backends

For information about this topic, see Encryption to the backends.

Traffic distribution

The values of the following fields in the backend service resource determine some aspects of the backend's behavior:

  • A balancing mode defines how the load balancer measures backend readiness for new requests or connections.
  • A target capacity defines a target maximum number of connections, a target maximum rate, or target maximum CPU utilization.
  • A capacity scaler adjusts overall available capacity without modifying the target capacity.

Balancing mode

The balancing mode determines whether the backends of a load balancer or Traffic Director can handle additional traffic or are fully loaded. Google Cloud has three balancing modes:

  • CONNECTION: Determines how the load is spread based on the number of concurrent connections that the backend can handle.
  • RATE: Determines how the load is spread based on a target maximum number of requests (queries) per second (RPS, QPS). The target maximum RPS/QPS can be exceeded if all backends are at or above capacity.
  • UTILIZATION: Determines how the load is spread based on the utilization of instances in an instance group.

Balancing modes available for each load balancer

You set the balancing mode when you add a backend to the backend service. The balancing modes available to a load balancer depend on the type of load balancer and the type of backends.

Pass-through load balancers require the CONNECTION balancing mode but don't support setting any target capacity.

The HTTP(S) proxy load balancers support either RATE or UTILIZATION balancing modes for instance group backends, RATE balancing mode for zonal NEGs with GCE_VM_IP_PORT endpoints, and RATE balancing mode for hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints). For any other type of supported backend, balancing mode must be omitted.

  • For the global external HTTP(S) load balancer (classic), a region is selected based on the location of the client and whether the region has available capacity, based on the load balancing mode's target capacity. Then, within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend in the region. Requests or connections are then distributed in a round robin fashion among instances or endpoints within the backend.
  • For global external HTTP(S) load balancers, a region is selected based on the location of the client and whether the region has available capacity, based on the load balancing mode's target capacity. Within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. Within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.
  • For regional external HTTP(S) load balancers, and internal HTTP(S) load balancers, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. Within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.

The TCP/SSL proxy load balancers support either CONNECTION or UTILIZATION balancing modes for instance group backends, CONNECTION balancing mode for zonal NEGs with GCE_VM_IP_PORT endpoints, and CONNECTION balancing mode for hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints). For any other type of supported backend, balancing mode must be omitted.

  • For the external TCP proxy load balancer and the external SSL proxy load balancer, a region is selected based on the location of the client and whether the region has available capacity based on the load balancing mode's target capacity. Then, within a region, the load balancing mode's target capacity is used to compute proportions for how many requests or connections should go to each backend (instance group or NEG) in the region. After the load balancer has selected a backend, requests or connections are then distributed in a round robin fashion among VM instances or network endpoints within each individual backend.

  • For the internal regional TCP proxy load balancer, the load balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG). Within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.

The following table summarizes the load balancing modes available for each load balancer and backend combination.

Table: Balancing modes available for each load balancer
Load balancer Backends Balancing modes available
  • Global external HTTP(S) load balancer
  • Global external HTTP(S) load balancer (classic)
  • Regional external HTTP(S) load balancer (Preview)
Instance groups RATE or UTILIZATION
Zonal NEGs (GCE_VM_IP_PORT endpoints) RATE
Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints) RATE
  • External TCP proxy load balancer
  • External SSL proxy load balancer
  • Internal regional TCP proxy load balancer (Preview)
Instance groups CONNECTION or UTILIZATION
Zonal NEGs (GCE_VM_IP_PORT endpoints) CONNECTION
Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints)
(supported by the internal regional TCP proxy load balancer only)
CONNECTION
Network load balancer Instance groups CONNECTION
Internal TCP/UDP load balancer Instance groups CONNECTION
Zonal NEGs (GCE_VM_IP endpoints) CONNECTION

If the average utilization of all VMs that are associated with a backend service is less than 10%, Google Cloud might prefer specific zones. This can happen when you use managed regional instance groups, managed zonal instance groups in different zones, and unmanaged zonal instance groups. This zonal imbalance automatically resolves as more traffic is sent to the load balancer.

For more information, see gcloud compute backend-services add-backend.
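For example, adding an instance group backend with a balancing mode and a target capacity might look like the following sketch (the backend service, instance group, and zone are placeholders):

# Add an instance group backend that is considered at capacity
# at an average of 100 requests per second per VM.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group=my-ig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --global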

Target capacity

Each balancing mode has a corresponding target capacity, which defines one of the following target maximums:

  • Number of connections
  • Rate
  • CPU utilization

For every balancing mode, the target capacity is not a circuit breaker. A load balancer can exceed the maximum under certain conditions, for example, if all backend VMs or endpoints have reached the maximum.

Connection balancing mode

For CONNECTION balancing mode, the target capacity defines a target maximum number of concurrent connections. Except for internal TCP/UDP load balancers and network load balancers, you must use one of the following settings to specify a target maximum number of connections:

  • max-connections-per-instance (per VM): Target average number of connections for a single VM.
  • max-connections-per-endpoint (per endpoint in a zonal NEG): Target average number of connections for a single endpoint.
  • max-connections (for zonal NEGs and zonal instance groups): Target average number of connections for the whole NEG or instance group. For regional managed instance groups, use max-connections-per-instance instead.

The following table shows how the target capacity parameter defines the following:

  • The target capacity for the whole backend
  • The expected target capacity for each instance or endpoint
Table: Target capacity for backends using the CONNECTION balancing mode
Backend type Target capacity
If you specify Whole backend capacity Expected per instance or per endpoint capacity
Instance group
N instances,
H healthy
max-connections-per-instance=X X × N (X × N)/H
Zonal NEG
N endpoints,
H healthy
max-connections-per-endpoint=X X × N (X × N)/H
Instance groups
(except regional managed instance groups)

H healthy instances
max-connections=Y Y Y/H

As illustrated, the max-connections-per-instance and max-connections-per-endpoint settings are proxies for calculating a target maximum number of connections for the whole instance group or whole zonal NEG:

  • In an instance group with N instances, setting max-connections-per-instance=X has the same meaning as setting max-connections=X × N.
  • In a zonal NEG with N endpoints, setting max-connections-per-endpoint=X has the same meaning as setting max-connections=X × N.
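For example, with an instance group of four VMs, the following sketch sets a per-instance target that is equivalent to max-connections=4000 for the whole group (the backend service, instance group, and zone are placeholders):

# Target an average of 1,000 concurrent connections per VM.
gcloud compute backend-services add-backend my-tcp-backend-service \
    --instance-group=my-ig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=CONNECTION \
    --max-connections-per-instance=1000 \
    --global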

Rate balancing mode

For the RATE balancing mode, you must define the target capacity using one of the following parameters:

  • max-rate-per-instance (per VM): Provide a target average HTTP request rate for a single VM.
  • max-rate-per-endpoint (per endpoint in a zonal NEG): Provide a target average HTTP request rate for a single endpoint.
  • max-rate (for zonal NEGs and zonal instance groups): Provide a target average HTTP request rate for the whole NEG or instance group. For regional managed instance groups, use max-rate-per-instance instead.

The following table shows how the target capacity parameter defines the following:

  • The target capacity for the whole backend
  • The expected target capacity for each instance or endpoint
Table: Target capacity for backends using the RATE balancing mode
Backend type Target capacity
If you specify Whole backend capacity Expected per instance or per endpoint capacity
Instance group
N instances,
H healthy
max-rate-per-instance=X X × N (X × N)/H
Zonal NEG
N endpoints,
H healthy
max-rate-per-endpoint=X X × N (X × N)/H
Instance groups
(except regional managed instance groups)

H healthy instances
max-rate=Y Y Y/H

As illustrated, the max-rate-per-instance and max-rate-per-endpoint settings are proxies for calculating a target maximum rate of HTTP requests for the whole instance group or whole zonal NEG:

  • In an instance group with N instances, setting max-rate-per-instance=X has the same meaning as setting max-rate=X × N.
  • In a zonal NEG with N endpoints, setting max-rate-per-endpoint=X has the same meaning as setting max-rate=X × N.
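For example, a zonal NEG backend with a per-endpoint rate target might be added as follows (a minimal sketch; the backend service, NEG, and zone are placeholders):

# In a NEG with N endpoints, this is equivalent to
# max-rate=5 × N for the whole NEG.
gcloud compute backend-services add-backend my-backend-service \
    --network-endpoint-group=my-zonal-neg \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=5 \
    --global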

Utilization balancing mode

The UTILIZATION balancing mode has no mandatory target capacity. When you use the Google Cloud console to add a backend instance group to a backend service and you select the UTILIZATION balancing mode, the console sets the value of max-utilization to 0.8 (80%). In addition to max-utilization, the UTILIZATION balancing mode supports more complex target capacities, as summarized in the table in the following section.

The max-utilization target capacity can only be specified per instance group and cannot be applied to a particular VM in the group.
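For example, the following sketch adds an instance group in UTILIZATION mode with the same 0.8 target that the console uses (the backend service, instance group, and zone are placeholders):

# Consider the group at capacity at 80% average instance utilization.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group=my-ig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=UTILIZATION \
    --max-utilization=0.8 \
    --global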

Changing the balancing mode of a load balancer

For some load balancers or load balancer configurations, you cannot change the balancing mode because the backend service has only one possible balancing mode. For others, depending on the backend used, you can change the balancing mode because more than one mode is available to those backend services.

To see which balancing modes are supported for each load balancer, see Table: Balancing modes available for each load balancer.

Balancing modes and target capacity settings

This table summarizes all possible balancing modes for a given load balancer and type of backend. It also shows the available or required capacity settings that you must specify with the balancing mode.

Table: Target capacity specification for balancing modes
Load balancer Type of backend Balancing mode Target capacity
  • Global external HTTP(S) load balancer
  • Global external HTTP(S) load balancer (classic)
  • Regional external HTTP(S) load balancer
  • Internal HTTP(S) load balancer
  • Traffic Director
Instance group RATE You must specify one of the following:
  • max-rate per zonal instance group
  • max-rate-per-instance
     (zonal or regional instance groups)
UTILIZATION You can optionally specify one of the following:
  • (1) max-utilization
  • (2) max-rate per zonal instance group
  • (3) max-rate-per-instance
     (zonal or regional instance groups)
  • (1) and (2) together
  • (1) and (3) together
Zonal NEG (GCE_VM_IP_PORT) RATE You must specify one of the following:
  • max-rate per zonal NEG
  • max-rate-per-endpoint
Hybrid NEG (NON_GCP_PRIVATE_IP_PORT) RATE You must specify one of the following:
  • max-rate per hybrid NEG
  • max-rate-per-endpoint
  • External SSL proxy load balancer
  • External TCP proxy load balancer
  • Internal regional TCP proxy load balancer (Preview)
Instance group CONNECTION You must specify one of the following:
  • max-connections per zonal instance group
  • max-connections-per-instance  (zonal or regional instance groups)
UTILIZATION You can optionally specify one of the following:
  • (1) max-utilization
  • (2) max-connections per zonal instance group
  • (3) max-connections-per-instance
     (zonal or regional instance groups)
  • (1) and (2) together
  • (1) and (3) together
Zonal NEG (GCE_VM_IP_PORT) CONNECTION You must specify one of the following:
  • max-connections per zonal NEG
  • max-connections-per-endpoint
Hybrid NEG (NON_GCP_PRIVATE_IP_PORT)
(supported with the internal regional TCP proxy load balancer)
CONNECTION You must specify one of the following:
  • max-connections per hybrid NEG
  • max-connections-per-endpoint
Internal TCP/UDP load balancer Instance group CONNECTION You cannot specify a target maximum number of connections.
Zonal NEGs (GCE_VM_IP) CONNECTION You cannot specify a target maximum number of connections.
External TCP/UDP Network load balancer Instance group CONNECTION You cannot specify a target maximum number of connections.

Capacity scaler

For certain proxy load balancer configurations, you can adjust the capacity scaler to scale the effective target capacity (effective target utilization, effective target rate, or effective target connections) without explicitly changing one of the --max-* parameters.

By default, the value of the capacity scaler is 1.0 (100%). You can set the capacity scaler to either of these values:

  • exactly 0.0, which will prevent all new connections
  • a value between 0.1 (10%) and 1.0 (100%)

The following examples demonstrate how the capacity scaler works in conjunction with the target capacity setting.

  • If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 1.0, the effective target capacity is also 80 RPS.

  • If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 0.5, the effective target capacity is 40 RPS (0.5 times 80).

  • If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 0.0, the effective target capacity is zero. A capacity scaler of zero takes the backend out of rotation.
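For example, the second scenario in the preceding list might be configured as follows (a sketch; the backend service, instance group, and zone are placeholders):

# Halve the effective target capacity without changing max-rate.
gcloud compute backend-services update-backend my-backend-service \
    --instance-group=my-ig \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0.5 \
    --global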

Traffic Director and traffic distribution

Traffic Director also uses backend service resources. Specifically, Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED. For an internal self-managed backend service, traffic distribution is based on the combination of a load balancing mode and a load balancing policy. The backend service directs traffic to a backend according to the backend's balancing mode. Then Traffic Director distributes traffic according to a load balancing policy.

Internal self-managed backend services support the following balancing modes:

  • UTILIZATION, if all the backends are instance groups
  • RATE, if all the backends are either instance groups or zonal NEGs

If you choose RATE balancing mode, you must specify a maximum rate, maximum rate per instance, or maximum rate per endpoint.

For more information about Traffic Director, see Traffic Director concepts.

Backend subsetting

Backend subsetting is an optional feature that improves performance and scalability by assigning a subset of backends to each of the proxy instances.

Backend subsetting is supported for the following:

  • Internal HTTP(S) Load Balancing
  • Internal TCP/UDP Load Balancing

Backend subsetting for internal HTTP(S) load balancer

For internal HTTP(S) load balancers, backend subsetting automatically assigns only a subset of the backends within the regional backend service to each proxy instance.

By default, each internal HTTP(S) load balancer proxy instance opens connections to all the backends within a backend service. When the number of proxy instances and the number of backends are both large, opening connections to all the backends can lead to performance issues. By enabling subsetting, each proxy opens connections to only a subset of the backends, reducing the number of connections that are kept open to each backend. Reducing the number of simultaneously open connections to each backend can improve performance for both the backends and the proxies.

The following diagram shows a load balancer with two proxies. Without backend subsetting, traffic from both proxies is distributed to all the backends in the backend service. With backend subsetting enabled, traffic from each proxy is distributed to a subset of the backends. Traffic from proxy 1 is distributed to backends 1 and 2, and traffic from proxy 2 is distributed to backends 3 and 4.

Comparing an internal HTTP(S) load balancer without and with backend subsetting.

You can additionally refine the load balancing traffic to the backends by setting the localityLbPolicy policy. For more information, see Traffic policies.

To read about setting up backend subsetting for internal HTTP(S) load balancers, see Configure backend subsetting.
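As a hedged example based on the --subsetting-policy flag mentioned in the caveats that follow, enabling subsetting on a regional backend service might look like this (the backend service name and region are placeholders):

# Enable backend subsetting for an internal HTTP(S) load balancer.
gcloud compute backend-services update my-backend-service \
    --subsetting-policy=CONSISTENT_HASH_SUBSETTING \
    --region=us-central1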

Caveats related to backend subsetting for internal HTTP(S) load balancer
  • Although backend subsetting is designed to ensure that all backend instances remain well utilized, it can introduce some bias in the amount of traffic that each backend receives. Setting the localityLbPolicy to LEAST_REQUEST is recommended for backend services that are sensitive to the balance of backend load.
  • Enabling and then disabling subsetting breaks existing connections.
  • Backend subsetting requires that the session affinity is NONE (a 5-tuple hash). Other session affinity options can only be used if backend subsetting is disabled. The default values of the --subsetting-policy and --session-affinity flags are both NONE, and only one of them at a time can be set to a different value.

Backend subsetting for internal TCP/UDP load balancer

Backend subsetting for internal TCP/UDP load balancers lets you scale your internal TCP/UDP load balancer to support a larger number of backend VM instances per internal backend service.

For information about how subsetting affects this limit, see the "Backend services" section of Load balancing resource quotas and limits.

By default, subsetting is disabled, which limits the backend service to distributing to up to 250 backend instances or endpoints. If your backend service needs to support more than 250 backends, you can enable subsetting. When subsetting is enabled, a subset of backend instances is selected for each client connection.

The following diagram shows a scaled-down model of the difference between these two modes of operation.

Comparing an internal TCP/UDP load balancer without and with subsetting.

Without subsetting, the complete set of healthy backends is better utilized, and new client connections are distributed among all healthy backends according to traffic distribution. Subsetting imposes load balancing restrictions but allows the load balancer to support more than 250 backends.

For configuration instructions, see Subsetting.

Caveats related to backend subsetting for internal TCP/UDP load balancer
  • When subsetting is enabled, not all backends will receive traffic from a given sender even when the number of backends is small.
  • For the maximum number of backend instances when subsetting is enabled, see the quotas page.
  • Only 5-tuple session affinity is supported with subsetting.
  • Packet mirroring is not supported with subsetting.
  • Enabling and then disabling subsetting breaks existing connections.
  • If on-premises clients need to access an internal TCP/UDP load balancer, subsetting can substantially reduce the number of backends that receive connections from your on-premises clients. This is because the region of the Cloud VPN tunnel or Cloud Interconnect VLAN attachment determines the subset of the load balancer's backends. All Cloud VPN and Cloud Interconnect endpoints in a specific region use the same subset. Different subsets are used in different regions.

Backend subsetting pricing

There is no charge for using backend subsetting. For more information, see All networking pricing.

Session affinity

Session affinity allows you to control how the load balancer selects backends for new connections in a predictable way, as long as the number of healthy backends remains constant. This is useful for applications that need multiple requests from a given user to be directed to the same backend or endpoint. Such applications usually include stateful servers used for ad serving, games, or services with heavy internal caching.

Google Cloud load balancers provide session affinity on a best-effort basis. Factors such as changing backend health check states, adding or removing backends, or changes to backend fullness, as measured by the balancing mode, can break session affinity.

Load balancing with session affinity works well when there is a reasonably large distribution of unique connections. Reasonably large means at least several times the number of backends. Testing a load balancer with a small number of connections doesn't result in an accurate representation of the distribution of client connections among backends.

By default, all Google Cloud load balancers select backends by using a five-tuple hash (--session-affinity=NONE), as follows:

  • Packet's source IP address
  • Packet's source port (if present in the packet's header)
  • Packet's destination IP address
  • Packet's destination port (if present in the packet's header)
  • Packet's protocol

For pass-through load balancers, new connections are distributed to healthy backend instances or endpoints (in the active pool, if a failover policy is configured).

For proxy-based load balancers, as long as the number of healthy backend instances or endpoints remains constant, and as long as the previously-selected backend instance or endpoint is not at capacity, subsequent requests or connections go to the same backend VM or endpoint. The target capacity of the balancing mode determines when the backend is at capacity.
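For example, switching a backend service from the default five-tuple hash to client IP affinity might look like the following sketch (the backend service name is a placeholder, and the scope flag depends on the load balancer):

# Use a two-tuple hash so requests from one client IP
# prefer the same backend.
gcloud compute backend-services update my-backend-service \
    --session-affinity=CLIENT_IP \
    --global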

The following table shows the session affinity options supported for each product:

Table: Supported session affinity settings
Product Session affinity options
Global external HTTP(S) load balancer
Regional external HTTP(S) load balancer
  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE)
  • Header field (HEADER_FIELD)
  • HTTP cookie (HTTP_COOKIE)

Also note:

  • Session affinity settings are fulfilled only if the load balancing locality policy (LocalityLbPolicy) is set to RING_HASH or MAGLEV.
  • For the global external HTTP(S) load balancer, don't configure session affinity if you're using weighted traffic splitting. If you do, the weighted traffic splitting configuration takes precedence.
Global external HTTP(S) load balancer (classic)
  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE)
Internal HTTP(S) load balancer
  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE)
  • Header field (HEADER_FIELD)
  • HTTP cookie (HTTP_COOKIE)

Session affinity settings are fulfilled only if the load balancing locality policy (LocalityLbPolicy) is set to RING_HASH or MAGLEV.

Internal TCP/UDP load balancer
  • None (NONE)
  • Client IP, no destination (Preview) (CLIENT_IP_NO_DESTINATION)
  • Client IP, Destination IP (CLIENT_IP)
  • Client IP, Destination IP, Protocol (CLIENT_IP_PROTO)
  • Client IP, Client Port, Destination IP, Destination Port, Protocol (CLIENT_IP_PORT_PROTO)

For specific information about Internal TCP/UDP Load Balancing and session affinity, see the Internal TCP/UDP Load Balancing overview.

Network load balancer 1
  • None (NONE)
  • Client IP, Destination IP (CLIENT_IP)
  • Client IP, Destination IP, Protocol (CLIENT_IP_PROTO)
  • Client IP, Client Port, Destination IP, Destination Port, Protocol (CLIENT_IP_PORT_PROTO)

For specific information about Network Load Balancing and session affinity, see the External TCP/UDP Network Load Balancing overview.

External SSL proxy load balancer
External TCP proxy load balancer
  • None (NONE)
  • Client IP (CLIENT_IP)
Traffic Director
  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE) (HTTP protocols only)
  • Header field (HEADER_FIELD) (HTTP protocols only)
  • HTTP cookie (HTTP_COOKIE) (HTTP protocols only)

When proxyless gRPC services are configured, Traffic Director does not support session affinity.

1 This table documents session affinities supported by backend service-based network load balancers. Target pool-based network load balancers don't use backend services. Instead, you set session affinity for network load balancers through the sessionAffinity parameter in Target Pools.

Keep the following in mind when configuring session affinity:

  • Do not rely on session affinity for authentication or security purposes. Session affinity is designed to break whenever the number of serving and healthy backends changes. Activities that result in breaking session affinity include:

    • Adding backend instance groups or NEGs to the backend service
    • Removing backend instance groups or NEGs from the backend service
    • Adding instances to an existing backend instance group (which happens automatically when you enable autoscaling with managed instance groups)
    • Removing instances from an existing backend instance group (which happens automatically when you enable autoscaling with managed instance groups)
    • Adding endpoints to an existing backend NEG
    • Removing endpoints from an existing backend NEG
    • When a healthy backend fails its health check and becomes unhealthy
    • When an unhealthy backend passes its health check and becomes healthy
    • For pass-through load balancers: during failover and failback, if a failover policy is configured
    • For proxy load balancers: when a backend is at or above capacity
  • Using a session affinity other than None with the UTILIZATION balancing mode is not recommended. This is because changes in the instance utilization can cause the load balancing service to direct new requests or connections to backend VMs that are less full. This breaks session affinity. Instead, use either the RATE or CONNECTION balancing mode to reduce the chance of breaking session affinity. For more details, see Losing session affinity.

  • For external and internal HTTP(S) load balancers, session affinity might be broken when the intended endpoint or instance exceeds its balancing mode's target maximum. Consider the following example:

    • A load balancer has one NEG and three endpoints.
    • Each endpoint has a target capacity of 1 RPS.
    • The balancing mode is RATE.
    • At the moment, each endpoint is processing 1.1, 0.8, and 1.6 RPS, respectively.
    • When an HTTP request with affinity for the last endpoint arrives at the load balancer, session affinity claims that endpoint, even though it is already processing 1.6 RPS and is over its target capacity.
    • Instead, the new request might go to the middle endpoint, which is processing 0.8 RPS, breaking session affinity.
  • The default values of the --session-affinity and --subsetting-policy flags are both NONE, and only one of them at a time can be set to a different value.

The following sections discuss the different types of session affinity.

Client IP, no destination affinity

Client IP, no destination affinity (CLIENT_IP_NO_DESTINATION) directs requests from the same client source IP address to the same backend instance.

When you use client IP, no destination affinity, keep the following in mind:

  • Client IP, no destination affinity is a one-tuple hash consisting of the client's source IP address.

  • If a client moves from one network to another, its IP address changes, resulting in broken affinity.

Client IP, no destination affinity is only an option for internal TCP/UDP load balancers.

Client IP affinity

Client IP affinity (CLIENT_IP) directs requests from the same client IP address to the same backend instance. Client IP affinity is an option for every Google Cloud load balancer that uses backend services.

When you use client IP affinity, keep the following in mind:

  • Client IP affinity is a two-tuple hash consisting of the client's IP address and the IP address of the load balancer's forwarding rule that the client contacts.

  • The client IP address as seen by the load balancer might not be the originating client if it is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address. This can cause incoming traffic to clump unnecessarily onto the same backend instances.

  • If a client moves from one network to another, its IP address changes, resulting in broken affinity.

To learn which products support client IP affinity, see Table: Supported session affinity settings.

Generated cookie affinity

When you set generated cookie affinity, the load balancer issues a cookie on the first request. For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint.

  • For global external HTTP(S) load balancers, the cookie is named GCLB.
  • For regional external HTTP(S) load balancers, internal HTTP(S) load balancers, and Traffic Director, the cookie is named GCILB.

Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity. For example:

  1. With cookie-based affinity, the load balancer can uniquely identify two or more client systems that share the same source IP address. Using client IP-based affinity, the load balancer treats all connections from the same source IP address as if they were from the same client system.

  2. If a client changes its IP address, cookie-based affinity lets the load balancer recognize subsequent connections from that client instead of treating the connection as new. An example of when a client changes its IP address is when a mobile device moves from one network to another.

When a load balancer creates a cookie for generated cookie-based affinity, it sets the path attribute of the cookie to /. If the URL map's path matcher has multiple backend services for a host name, all backend services share the same session cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. You can set it to 0 (default), which means the cookie is only a session cookie. Or you can set the lifetime of the cookie to a value from 1 to 86400 seconds (24 hours) inclusive.
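For example, enabling generated cookie affinity with a one-hour cookie lifetime might look like the following sketch (the backend service name is a placeholder, and the scope flag depends on the load balancer):

# Issue an affinity cookie that expires after 3,600 seconds.
gcloud compute backend-services update my-backend-service \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl=3600 \
    --global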

To learn which products support generated cookie affinity, see Table: Supported session affinity settings.

Header field affinity

Traffic Director and an internal HTTP(S) load balancer can use header field affinity when both of the following are true:

  • The load balancing locality policy is RING_HASH or MAGLEV.
  • The backend service's consistentHash specifies the name of the HTTP header (httpHeaderName).

To learn which products support header field affinity, see Table: Supported session affinity settings.

HTTP cookie affinity

Traffic Director and an internal HTTP(S) load balancer can use HTTP cookie affinity when both of the following are true:

  • The load balancing locality policy is RING_HASH or MAGLEV.
  • The backend service's consistent hash specifies the name of the HTTP cookie.

HTTP cookie affinity routes requests to backend VMs or endpoints in a NEG based on the HTTP cookie named in the HTTP_COOKIE flag. If the client does not provide the cookie, the proxy generates the cookie and returns it to the client in a Set-Cookie header.

To learn which products support HTTP cookie affinity, see Table: Supported session affinity settings.

Losing session affinity

Regardless of the type of affinity chosen, a client can lose affinity with a backend in the following situations:

  • If the backend instance group or zonal NEG runs out of capacity, as defined by the balancing mode's target capacity. In this situation, Google Cloud directs traffic to a different backend instance group or zonal NEG, which might be in a different zone. You can mitigate this by ensuring that you specify the correct target capacity for each backend based on your own testing.
  • Autoscaling adds instances to, or removes instances from, a managed instance group. When this happens, the number of instances in the instance group changes, so the backend service recomputes hashes for session affinity. You can mitigate this by ensuring that the minimum size of the managed instance group can handle a typical load. Autoscaling is then only performed during unexpected increases in load.
  • If a backend VM or endpoint in a NEG fails health checks, the load balancer directs traffic to a different healthy backend. Refer to the documentation for each Google Cloud load balancer for details about how the load balancer behaves when all of its backends fail health checks.
  • When the UTILIZATION balancing mode is in effect for backend instance groups, session affinity breaks because of changes in backend utilization. You can mitigate this by using the RATE or CONNECTION balancing mode, whichever is supported by the load balancer's type.

When you use HTTP(S) Load Balancing, External SSL Proxy Load Balancing, or External TCP Proxy Load Balancing, keep the following additional points in mind:

  • If the routing path from a client on the internet to Google changes between requests or connections, a different Google Front End (GFE) might be selected as the proxy. This can break session affinity.
  • When you use the UTILIZATION balancing mode, especially without a defined maximum target capacity, session affinity is likely to break when traffic to the load balancer is low. Switch to using the RATE or CONNECTION balancing mode, as supported by your chosen load balancer.

Backend service timeout

Most Google Cloud load balancers have a backend service timeout. The default value is 30 seconds. The full range of allowed timeout values is 1 to 2,147,483,647 seconds.

  • For external HTTP(S) load balancers and internal HTTP(S) load balancers using the HTTP, HTTPS, or HTTP/2 protocol, the backend service timeout is a request/response timeout for HTTP(S) traffic.

    For more details about the backend service timeout, see the documentation for each load balancer.

  • For external SSL proxy load balancers and external TCP proxy load balancers, the timeout is an idle timeout. To allow more or less time before the connection is deleted, change the timeout value. This idle timeout is also used for WebSocket connections.

  • For internal TCP/UDP load balancers and network load balancers, you can set the value of the backend service timeout using gcloud or the API, but the value is ignored. Backend service timeout has no meaning for these pass-through load balancers.

  • For Traffic Director, the backend service timeout field (specified using timeoutSec) is not supported with proxyless gRPC services. For such services, configure the backend service timeout using the maxStreamDuration field. This is because gRPC does not support the semantics of timeoutSec that specifies the amount of time to wait for a backend to return a full response after the request is sent. gRPC's timeout specifies the amount of time to wait from the beginning of the stream until the response has been completely processed, including all retries.
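For the load balancers where the timeout is meaningful, you can change it with a command like the following sketch (the backend service name is a placeholder, and the scope flag depends on the load balancer):

# Raise the backend service timeout from the 30-second default.
gcloud compute backend-services update my-backend-service \
    --timeout=60s \
    --global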

Health checks

Each backend service whose backends are instance groups or zonal NEGs must have an associated health check. Backend services using a serverless NEG or an internet NEG as a backend must not reference a health check.

When you create a load balancer using the Google Cloud console, you can create the health check (if one is required) when you create the load balancer, or you can reference an existing health check.

When you create a backend service using either instance group or zonal NEG backends using the Google Cloud CLI or the API, you must reference an existing health check. Refer to the load balancer guide in the Health Checks Overview for details about the type and scope of health check required.
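For example, creating a health check and referencing it from a new backend service might look like the following sketch (the health check and backend service names, port, and request path are placeholders):

# Create an HTTP health check that probes each backend.
gcloud compute health-checks create http my-health-check \
    --port=80 \
    --request-path=/healthz

# Reference the existing health check from the backend service.
gcloud compute backend-services create my-backend-service \
    --protocol=HTTP \
    --health-checks=my-health-check \
    --global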


Additional features enabled on the backend service resource

Some optional Google Cloud features (such as Cloud CDN and Google Cloud Armor) are available for backend services used by global external HTTP(S) load balancers. Google Cloud Armor is also supported with external SSL proxy load balancers and external TCP proxy load balancers.


Traffic management features

Traffic management features are supported only for some products. These features are supported by the following load balancers:

  • Global external HTTP(S) load balancer (circuit breaking is not supported)
  • Regional external HTTP(S) load balancer
  • Internal HTTP(S) load balancer
  • Traffic Director (but not supported with proxyless gRPC services)

API and gcloud reference

For more information about the properties of the backend service resource, see the backendServices REST API reference and the gcloud compute backend-services command reference.
