Backend services overview

A backend service defines how Cloud Load Balancing distributes traffic. The backend service configuration contains a set of values, such as the protocol used to connect to backends, various distribution and session settings, health checks, and timeouts. These settings provide fine-grained control over how your load balancer behaves. Most settings have default values, so you can get started quickly with minimal configuration.

You can configure a backend service for the following Google Cloud load balancing services:

  • External HTTP(S) Load Balancing
  • Internal HTTP(S) Load Balancing
  • SSL Proxy Load Balancing
  • TCP Proxy Load Balancing
  • Internal TCP/UDP Load Balancing

Traffic Director also uses backend services. Network Load Balancing does not use a backend service.

Load balancers, Envoy proxies, and proxyless gRPC clients use the configuration information in the backend service resource to perform the following functions:

  • To direct traffic to the correct backends, which are instance groups or network endpoint groups (NEGs). Note that an external HTTP(S) load balancer can be configured to use a backend bucket instead of a backend service. For information about using backend buckets with external HTTP(S) load balancers, see Setting up a load balancer with backend buckets.
  • To distribute traffic according to a balancing mode, which is a setting for each backend
  • To determine which health check is monitoring the health of the backends
  • To specify session affinity
  • To determine whether Cloud CDN is enabled (external HTTP(S) load balancers only)
  • To determine whether Google Cloud Armor security policies are enabled (external HTTP(S) load balancers only)
  • To determine whether Identity-Aware Proxy is enabled (external HTTP(S) load balancers only)

You set some values when you create a backend service. You set other values when you add a backend to a backend service.

A backend service is either global or regional in scope.

For more information about the properties of the backend service resource, see the following references:

The product that you are using, which is either a load balancer or Traffic Director, determines the maximum number of backend services, the scope of a backend service, the type of backends each backend service can use, and the backend service's load balancing scheme:

External HTTP(S) Load Balancing
  • Maximum number of backend services: Multiple
  • Scope of backend service: Global¹
  • Load balancing scheme: EXTERNAL
  • Supported backend types (each backend service supports one of these combinations):
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
      • All zonal NEGs: One or more zonal NEGs
      • All serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
      • One internet NEG

Internal HTTP(S) Load Balancing
  • Maximum number of backend services: Multiple
  • Scope of backend service: Regional
  • Load balancing scheme: INTERNAL_MANAGED
  • Supported backend types (each backend service supports one of these combinations):
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
      • All zonal NEGs: One or more zonal NEGs

SSL Proxy Load Balancing
  • Maximum number of backend services: 1
  • Scope of backend service: Global¹
  • Load balancing scheme: EXTERNAL
  • Supported backend types (the backend service supports one of these combinations):
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
      • All zonal NEGs: One or more zonal NEGs
      • One internet NEG

TCP Proxy Load Balancing
  • Maximum number of backend services: 1
  • Scope of backend service: Global¹
  • Load balancing scheme: EXTERNAL
  • Supported backend types (the backend service supports one of these combinations):
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
      • All zonal NEGs: One or more zonal NEGs
      • One internet NEG

Internal TCP/UDP Load Balancing
  • Maximum number of backend services: 1
  • Scope of backend service: Regional, but can be configured to be globally accessible
  • Load balancing scheme: INTERNAL
  • Supported backend types:
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends

Traffic Director
  • Maximum number of backend services: Multiple
  • Scope of backend service: Global
  • Load balancing scheme: INTERNAL_SELF_MANAGED
  • Supported backend types (each backend service supports one of these combinations):
      • All instance group backends: One or more managed, unmanaged, or a combination of managed and unmanaged instance group backends
      • All zonal NEGs: One or more zonal NEGs

¹ Backend services used by HTTP(S) Load Balancing, SSL Proxy Load Balancing, and TCP Proxy Load Balancing are always global in scope, in either Standard or Premium Network Tier. However, in Standard Tier the following restrictions apply:

Backends

A backend is a resource to which a Google Cloud load balancer or a Traffic Director-configured Envoy proxy or proxyless gRPC client distributes traffic. Each backend service is associated with one or more compatible backends. The types of backends are described in the sections that follow.

In addition, by using a backend bucket instead of a backend service, you can use a Cloud Storage bucket as a backend.

You cannot use different types of backends with the same backend service. For example, a single backend service cannot reference a combination of instance groups and zonal NEGs. However, you can use a combination of different types of instance groups on the same backend service. For example, a single backend service can reference a combination of both managed and unmanaged instance groups. For complete information about which backends are compatible with which backend services, see the table in the previous section.

You cannot delete a backend instance group or NEG that is associated with a backend service. Before you delete an instance group or NEG, you must first remove it as a backend from all backend services that reference it.

Protocol to the backends

When you create a backend service, you must specify the protocol used to communicate with the backends. You can specify only one protocol per backend service — you cannot specify a secondary protocol to use as a fallback.

The available protocols are:

  • HTTP
  • HTTPS
  • HTTP/2
  • SSL
  • TCP
  • UDP
  • gRPC (Traffic Director only)

Which protocols are valid depends on the type of load balancer or whether you are using Traffic Director. For more information, see Load balancer features and Traffic Director features.

Changing a backend service's protocol makes the backends inaccessible through load balancers for a few minutes.
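As a sketch, specifying the protocol when you create a backend service looks like the following. The service and health check names are hypothetical, and the health check must already exist:

```shell
# Create a global backend service that connects to its backends
# over HTTPS (hypothetical resource names):
gcloud compute backend-services create my-backend-service \
    --protocol=HTTPS \
    --health-checks=my-https-health-check \
    --global
```

The `--protocol` value must be one that the load balancer type supports; for example, you cannot use UDP with a proxy-based load balancer.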

Encryption between the load balancer and backends

For HTTP(S) Load Balancing, TCP Proxy Load Balancing, and SSL Proxy Load Balancing, Google automatically encrypts traffic between Google Front Ends (GFEs) and your backends that reside within Google Cloud.

In addition to this network-level encryption, you can use a secure protocol, such as SSL, HTTPS, or HTTP/2 (using TLS), as the backend service protocol for the GFE-based load balancers as well as for Internal HTTP(S) Load Balancing and Traffic Director.

When a load balancer connects to your backends using a secure backend service protocol, the load balancer is the SSL or HTTPS client. Similarly, when a client-side proxy configured using Traffic Director connects to your backends using a secure backend service protocol, the proxy is the SSL or HTTPS client.

A secure protocol to connect to backend instances is recommended in the following cases:

  • When you require an auditable, encrypted connection from the load balancer (or Traffic Director) to the backend instances.

  • When the load balancer connects to a backend instance that is outside of Google Cloud (via an internet NEG). Communication to an internet NEG backend might transit the public internet. When the load balancer connects to an internet NEG, the certificate must be signed by a public CA and meet the validation requirements.

To use a secure protocol between the load balancer and your backends, keep the following requirements in mind:

  • You must configure your load balancer's backend services to use the SSL (TLS), HTTPS, or HTTP/2 protocol.

  • On backend instances, you must configure software to serve traffic using the same protocol as the backend service. For example, if the backend service uses HTTPS, make sure to configure your backend instances to use HTTPS. If you use the HTTP/2 protocol, your backends must use TLS. For configuration instructions, refer to your backend instance's software documentation.

  • You must install private keys and certificates on your backend instances. These certificates don't need to match the load balancer's SSL certificate. For installation instructions, refer to your backend instance's software documentation.

  • You must configure the software running on your backend instances to operate as an SSL or HTTPS server. For configuration instructions, refer to your backend instance's software documentation.

Also keep in mind the following caveats:

  • When a load balancer starts a TLS session to backends, the GFE doesn't use the Server Name Indication (SNI) extension.

  • When a load balancer connects to backends that are within Google Cloud, the load balancer accepts any certificate your backends present. Load balancers do not perform certificate validation. For example, the certificate is treated as valid even in the following circumstances:

    • The certificate is self-signed.
    • The certificate is signed by an unknown certificate authority (CA).
    • The certificate has expired or is not yet valid.
    • The CN and subjectAlternativeName attributes don't match a Host header or DNS PTR record.

For additional information on Google's encryption, see Encryption in Transit in Google Cloud.
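Because load balancers accept any certificate presented by backends inside Google Cloud, a self-signed certificate is often sufficient for backend-side TLS (but not for internet NEG backends, which require a certificate signed by a public CA). Assuming OpenSSL is installed on the backend, a minimal sketch:

```shell
# Generate a self-signed certificate and key for a backend instance.
# For backends inside Google Cloud, the load balancer does not
# validate this certificate, so the subject name is arbitrary:
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout backend-key.pem -out backend-cert.pem \
    -subj "/CN=backend.internal"
```

You then configure the backend's web server software to serve TLS using this key and certificate.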

Instance groups

This section discusses how instance groups work with the backend service.

Backend VMs and external IP addresses

Backend VMs in backend services do not need external IP addresses:

  • For external HTTP(S) load balancers, SSL proxy load balancers, and TCP proxy load balancers: Clients communicate with a Google Front End (GFE) using your load balancer's external IP address. The GFE communicates with backend VMs or endpoints using the internal IP addresses of their primary network interface. Because the GFE is a proxy, the backend VMs themselves do not require external IP addresses.

  • For internal HTTP(S) load balancers, internal TCP/UDP load balancers, and Traffic Director: Backend VMs for internal load balancers and Traffic Director do not require external IP addresses.

Named ports

A load balancer can listen on the frontend on one or more port numbers that you configure in the load balancer's forwarding rule. On the backend, the load balancer can forward traffic to the same or to a different port number. This is the port number that your backend instances (Compute Engine instances) are listening on. You configure this port number in the instance group and refer to it in the backend service configuration.

The backend port number is called a named port because it is a name/value pair. In the instance group, you define the key name and value for the port. Then you refer to the named port in the backend service configuration.

If an instance group's named port matches the --port-name in the backend service configuration, the backend service uses this port number for communication with the instance group's VMs.

The following load balancer types require each backend Compute Engine instance group to have a named port:

  • Internal HTTP(S) Load Balancing
  • HTTP(S) Load Balancing
  • SSL Proxy Load Balancing
  • TCP Proxy Load Balancing

To learn how to create named ports, see the following instructions:

For example, you might set the named port on an instance group as follows, where the service name is my-service-name and the port is 8888:

gcloud compute instance-groups unmanaged set-named-ports my-unmanaged-ig \
    --named-ports=my-service-name:8888

You can then set the --port-name on the backend service to my-service-name:

gcloud compute backend-services update my-backend-service \
    --port-name=my-service-name

Note the following:

  • Each backend service subscribes to a single port name. Consequently, each of its backend instance groups must have at least one named port for that name.

  • A backend service can use a different port number when communicating with VMs in different instance groups if each instance group specifies a different port number for the same port name.

  • The resolved port number used by the backend service does not have to match the port number used by the load balancer's forwarding rules.
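For example, two instance groups can publish the same port name with different port numbers, and the backend service resolves each one independently (hypothetical group names and zones):

```shell
# Same named port, different port numbers per instance group:
gcloud compute instance-groups unmanaged set-named-ports ig-a \
    --zone=us-central1-a \
    --named-ports=my-service-name:8888

gcloud compute instance-groups unmanaged set-named-ports ig-b \
    --zone=us-central1-b \
    --named-ports=my-service-name:8080
```

A backend service subscribed to `my-service-name` then sends traffic to port 8888 on VMs in `ig-a` and to port 8080 on VMs in `ig-b`.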

Named ports are not used in these circumstances:

  • For zonal NEG or internet NEG backends, because these NEGs define ports using a different mechanism, namely, on the endpoints themselves.
  • For serverless NEG backends, because these NEGs don't have endpoints.
  • For internal TCP/UDP load balancers because an internal TCP/UDP load balancer is a pass-through load balancer, not a proxy, and its backend service does not subscribe to a named port.

For more information about named ports, see gcloud compute instance-groups managed set-named-ports and gcloud compute instance-groups unmanaged set-named-ports in the SDK documentation.

Restrictions and guidance for instance groups

Keep the following restrictions and guidance in mind when you create instance groups for your load balancers:

  • Do not put a VM in more than one load-balanced instance group. If a VM is a member of two or more unmanaged instance groups, or a member of one managed instance group and one or more unmanaged instance groups, Google Cloud limits you to only using one of those instance groups at a time as a backend for a particular backend service.

    If you need a VM to participate in multiple load balancers, you must use the same instance group as a backend on each of the backend services. To balance traffic to different ports, create the required named ports on the one instance group and have each backend service subscribe to a unique named port.

  • It is possible to use the same instance group as a backend for more than one backend service. In this situation, those backend services must all use the same balancing mode.

    This is a complex setup, and is not always possible. For example, the same instance group cannot simultaneously be a backend for a backend service of an internal TCP/UDP load balancer and an external HTTP(S) load balancer. An internal TCP/UDP load balancer uses backend services whose backends must use the CONNECTION balancing mode, while an external HTTP(S) load balancer uses backend services whose backends can support either RATE or UTILIZATION modes. Since no balancing mode is common to both, you cannot use the same instance group as a backend for both types of load balancer.

    • If your instance group is associated with several backend services, each backend service can reference the same named port or a different named port on the instance group.
  • We recommend not adding an autoscaled managed instance group to more than one backend service. Doing so might cause unpredictable and unnecessary scaling of instances in the group, especially if you use the HTTP Load Balancing Utilization autoscaling metric.

    • While not recommended, this scenario might work if the autoscaling metric of the managed instance group is either CPU Utilization or a Cloud Monitoring Metric which is unrelated to the load balancer's serving capacity. Using one of these autoscaling metrics might prevent the erratic scaling that results with the HTTP Load Balancing Utilization autoscaling metric.

Zonal network endpoint groups

A backend service that uses zonal network endpoint groups (NEGs) as its backends distributes traffic among applications or containers running within VMs.

A zonal network endpoint is a combination of an IP address and a port, specified in one of two ways:

  • By specifying an IP address:port pair, such as 10.0.1.1:80.
  • By specifying a network endpoint IP address only. The default port for the NEG is automatically used as the port of the IP address:port pair.

Network endpoints represent services by their IP address and port, rather than referring to a particular VM. A network endpoint group is a logical grouping of network endpoints.
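As a sketch, creating a zonal NEG with a default port and adding one endpoint looks like the following (hypothetical names; the VM must exist in the same zone):

```shell
# Create a zonal NEG whose endpoints are VM IP:port pairs:
gcloud compute network-endpoint-groups create my-zonal-neg \
    --zone=us-central1-a \
    --network-endpoint-type=gce-vm-ip-port \
    --default-port=80

# Add a single endpoint; omitting the port would use the default:
gcloud compute network-endpoint-groups update my-zonal-neg \
    --zone=us-central1-a \
    --add-endpoint=instance=my-vm,ip=10.0.1.1,port=80
```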

For more information, see Overview of network endpoint groups in load balancing.

Internet network endpoint groups

Internet NEGs are global resources that are hosted within on-premises infrastructure or on infrastructure provided by third parties.

An internet NEG is a combination of an IP address or hostname, plus an optional port:

  • A publicly resolvable fully qualified domain name and an optional port, for example backend.example.com:443 (default ports: 80 for HTTP and 443 for HTTPS).
  • A publicly accessible IP address and an optional port, for example 203.0.113.8:80 or 203.0.113.8:443 (default ports: 80 for HTTP and 443 for HTTPS)

A backend service of an external HTTP(S) load balancer that uses an internet network endpoint group as its backend distributes traffic to a destination outside of Google Cloud.
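A minimal sketch of creating an internet NEG keyed by FQDN and adding an external endpoint (hypothetical NEG name and hostname):

```shell
# Internet NEGs are global resources:
gcloud compute network-endpoint-groups create my-internet-neg \
    --global \
    --network-endpoint-type=internet-fqdn-port

# Add an endpoint by publicly resolvable hostname and port:
gcloud compute network-endpoint-groups update my-internet-neg \
    --global \
    --add-endpoint=fqdn=backend.example.com,port=443
```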

For more information, see Internet network endpoint group overview.

Serverless network endpoint groups

A network endpoint group (NEG) specifies a group of backend endpoints for a load balancer. A serverless NEG is a backend that points to a Cloud Run, App Engine, or Cloud Functions service.

A serverless NEG can represent:

  • A Cloud Run service or a group of services sharing the same URL pattern.
  • A Cloud Functions function or a group of functions sharing the same URL pattern.
  • An App Engine app (Standard or Flex), a specific service within an app, or even a specific version of an app.
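For example, a serverless NEG pointing at a Cloud Run service can be sketched as follows (hypothetical names; serverless NEGs are regional):

```shell
# Create a serverless NEG backed by a Cloud Run service:
gcloud compute network-endpoint-groups create my-serverless-neg \
    --region=us-central1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=my-cloud-run-service
```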

For more information, see Serverless network endpoint group overview.

Traffic distribution

The values of the following fields in the backend services resource determine some aspects of the backend's behavior:

  • A balancing mode, which the load balancer uses to determine candidate backends for new requests or connections.
  • A target capacity, which defines a target maximum number of connections, a target maximum rate, or target maximum CPU utilization.
  • A capacity scaler, which can be used to adjust overall available capacity without modifying the target capacity.

Balancing mode

The balancing mode determines whether backends of a load balancer can handle additional traffic or are fully loaded. Google Cloud has three balancing modes:

  • CONNECTION
  • RATE
  • UTILIZATION

The balancing mode options depend on the backend service's load balancing scheme, the backend service's protocol, and the type of backends connected to the backend service.

You set the balancing mode when you add a backend to the backend service. Note that you cannot set a balancing mode for a serverless NEG.
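Because the balancing mode is a property of the backend, you specify it with the add-backend command; a sketch with hypothetical names:

```shell
# Add an instance group backend in RATE mode, capping each VM
# at 100 requests per second:
gcloud compute backend-services add-backend my-backend-service \
    --global \
    --instance-group=my-instance-group \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100
```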

CONNECTION
  • Supported load balancing schemes: EXTERNAL, INTERNAL
  • Supported backend service protocols: SSL, TCP, UDP. Protocol options are limited by the type of load balancer; for example, Internal TCP/UDP Load Balancing only uses the TCP or UDP protocol.
  • Supported backend types: Either instance groups or zonal NEGs, if supported. For example, Internal TCP/UDP Load Balancing does not support zonal NEGs.
  • Applicable products:
      • SSL Proxy Load Balancing
      • TCP Proxy Load Balancing
      • Internal TCP/UDP Load Balancing

RATE
  • Supported load balancing schemes: EXTERNAL, INTERNAL_MANAGED, INTERNAL_SELF_MANAGED
  • Supported backend service protocols: HTTP, HTTPS, HTTP2, gRPC
  • Supported backend types: Instance groups or zonal NEGs
  • Applicable products:
      • External HTTP(S) Load Balancing
      • Internal HTTP(S) Load Balancing
      • Traffic Director (INTERNAL_SELF_MANAGED; HTTP and gRPC protocols only)

UTILIZATION
  • Supported load balancing schemes: EXTERNAL, INTERNAL_MANAGED, INTERNAL_SELF_MANAGED
  • Supported backend service protocols: Any protocol
  • Supported backend types: Instance groups only. Zonal NEGs do not support utilization mode.
  • Applicable products:
      • External HTTP(S) Load Balancing
      • Internal HTTP(S) Load Balancing
      • SSL Proxy Load Balancing
      • TCP Proxy Load Balancing
      • Traffic Director (INTERNAL_SELF_MANAGED; HTTP and gRPC protocols only)

If the average utilization of all VMs that are associated with a backend service is less than 10%, Google Cloud might prefer specific zones. This can happen when you use managed regional instance groups, managed zonal instance groups in different zones, and unmanaged zonal instance groups. This zonal imbalance automatically resolves as more traffic is sent to the load balancer.

For more information, see gcloud beta compute backend-services add-backend.

Changing the balancing mode of a load balancer

For some load balancers, you cannot change the balancing mode, because the backend service has only one possible balancing mode:

  • The backend services for internal TCP/UDP load balancers can use only CONNECTION balancing mode.
  • The backend services for external HTTP(S) load balancers where all backends are NEGs can use only RATE balancing mode.
  • The backend services for internal HTTP(S) load balancers where all backends are NEGs can use only RATE balancing mode.
  • The backend services for SSL proxy load balancers where all backends are NEGs can use only CONNECTION balancing mode
  • The backend services for TCP proxy load balancers where all backends are NEGs can use only CONNECTION balancing mode

For some load balancers, you can change the balancing mode, because more than one mode is available to their backend services:

  • The backend services for external HTTP(S) load balancers where all backends are instance groups can use RATE or UTILIZATION balancing mode.
  • The backend services for internal HTTP(S) load balancers where all backends are instance groups can use RATE or UTILIZATION balancing mode.
  • The backend services for SSL proxy load balancers where all backends are instance groups can use CONNECTION or UTILIZATION balancing mode.
  • The backend services for TCP proxy load balancers where all backends are instance groups can use CONNECTION or UTILIZATION balancing mode.

If the same instance group is a backend for multiple backend services, it must use the same balancing mode for each backend service. To change the balancing mode for an instance group serving as a backend for multiple backend services:

  • Remove the instance group from all backend services except for one.
  • Change the balancing mode for the backend on the one remaining backend service.
  • Re-add the instance group as a backend to the remaining backend services, if they support the new balancing mode.
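The steps above can be sketched as the following command sequence, assuming two hypothetical global backend services (service-a and service-b) that share one instance group:

```shell
# 1. Remove the shared instance group from all backend services
#    except one:
gcloud compute backend-services remove-backend service-b \
    --global \
    --instance-group=shared-ig \
    --instance-group-zone=us-central1-a

# 2. Change the balancing mode on the remaining backend service:
gcloud compute backend-services update-backend service-a \
    --global \
    --instance-group=shared-ig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=UTILIZATION \
    --max-utilization=0.8

# 3. Re-add the instance group with the same (new) balancing mode:
gcloud compute backend-services add-backend service-b \
    --global \
    --instance-group=shared-ig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=UTILIZATION \
    --max-utilization=0.8
```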

In addition, you cannot use the same instance group as a backend for some different backend service combinations. For example, you cannot use the same instance group as a backend for an internal TCP/UDP load balancer, which supports only CONNECTION balancing mode and simultaneously for an external HTTP(S) load balancer, which supports either RATE or UTILIZATION balancing mode.

Target capacity

Each balancing mode has a corresponding target capacity, which defines a target maximum number of connections, a target maximum rate, or target maximum CPU utilization. For every balancing mode, the target capacity is not a circuit breaker. A load balancer will exceed the maximum under certain conditions; for example, if all backend VMs or endpoints have reached it.

  • For CONNECTION balancing mode, the target capacity defines a target maximum number of concurrent connections. Except for internal TCP/UDP load balancers, you must specify this target using one of three methods: the maximum number of connections for each VM (max-connections-per-instance), the maximum for each endpoint in a zonal NEG (max-connections-per-endpoint), or, for zonal NEGs and zonal unmanaged instance groups, the maximum for the whole group (max-connections).

    When you specify max-connections-per-instance or max-connections-per-endpoint, Google Cloud computes the effective per-VM or per-endpoint target as follows:

    • (max connections per VM * total number of VMs) / number of healthy VMs
    • (max connections per endpoint * total number of endpoints) / number of healthy endpoints

    When you specify max-connections, Google Cloud computes the effective per-VM or per-endpoint target in this way:

    • max connections / number of healthy VMs
    • max connections / number of healthy endpoints
  • For RATE balancing mode, the target capacity defines a target maximum rate for HTTP requests. You must specify the target rate using one of three methods: a target maximum rate for each VM (max-rate-per-instance), for each endpoint in a zonal NEG (max-rate-per-endpoint), or, for zonal NEGs and zonal unmanaged instance groups, for the whole group (max-rate).

    When you specify max-rate-per-instance or max-rate-per-endpoint, Google Cloud computes the effective per-VM or per-endpoint target rate as follows:

    • max rate per VM * total number of VMs / number of healthy VMs
    • max rate per endpoint * total number of endpoints / number of healthy endpoints

    When you specify max-rate, Google Cloud computes the effective per-VM or per-endpoint target rate in this way:

    • max rate / number of healthy VMs
    • max rate / number of healthy endpoints
  • For UTILIZATION balancing mode, there isn't a mandatory target capacity. You have a number of options that depend on the type of backend, as summarized in the following table.
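The effective-target formulas above can be checked with simple shell arithmetic. For example, with a hypothetical max-rate-per-instance of 100 and one of four VMs unhealthy:

```shell
# Effective per-VM rate target = (max rate per VM * total VMs) / healthy VMs
MAX_RATE_PER_VM=100
TOTAL_VMS=4
HEALTHY_VMS=3
echo $(( MAX_RATE_PER_VM * TOTAL_VMS / HEALTHY_VMS ))   # prints 133
```

Each healthy VM temporarily absorbs a share of the unhealthy VM's target, so the per-VM target rises from 100 to 133 until the fourth VM becomes healthy again.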

This table explains all possible balancing modes for a given load balancer and type of backend. It also shows the available or required capacity settings that you must specify in conjunction with the balancing mode.

Internal TCP/UDP Load Balancing
  • Instance group backends, CONNECTION mode: You cannot specify a maximum number of connections.

SSL Proxy Load Balancing, TCP Proxy Load Balancing
  • Instance group backends, CONNECTION mode: You must specify either max connections per zonal instance group, or max connections per instance (zonal or regional instance groups).
  • Instance group backends, UTILIZATION mode: You can optionally specify max utilization alone, max connections per zonal instance group alone, max connections per instance alone (zonal or regional instance groups), or max utilization together with either max-connections setting.
  • Zonal NEG backends, CONNECTION mode: You must specify either max connections per zonal NEG or max connections per endpoint.

HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, Traffic Director
  • Instance group backends, RATE mode: You must specify either max rate per zonal instance group, or max rate per instance (zonal or regional instance groups).
  • Instance group backends, UTILIZATION mode: You can optionally specify max utilization alone, max rate per zonal instance group alone, max rate per instance alone (zonal or regional instance groups), or max utilization together with either max-rate setting.
  • Zonal NEG backends, RATE mode: You must specify either max rate per zonal NEG or max rate per endpoint.

Capacity scaler

You can optionally adjust the capacity scaler to scale down the effective target capacity (max utilization, max rate, or max connections) without changing the target capacity setting itself. The capacity scaler is supported for all load balancers that support a target capacity. The only exception is the internal TCP/UDP load balancer.

By default, the capacity scaler is 1.0 (100%), so the effective capacity equals the configured target capacity. For example, if max utilization is set to 80% and the capacity scaler is 1.0, the backend service stops sending requests to an instance when its utilization reaches 80%:

capacity-scaler: 1.0 * 0.8 = 0.8 (80%)

You can set the capacity scaler to 0.0 or from 0.1 (10%) to 1.0 (100%). You cannot configure a setting that is larger than 0.0 and smaller than 0.1. A scale factor of zero (0.0) prevents all new connections. You cannot configure a setting of 0.0 when there is only one backend attached to the backend service.
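Because the capacity scaler is a property of an individual backend, you adjust it with the update-backend command; a sketch with hypothetical names:

```shell
# Halve the effective capacity of one instance group backend
# without changing its target capacity settings:
gcloud compute backend-services update-backend my-backend-service \
    --global \
    --instance-group=my-instance-group \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0.5
```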

Using the capacity scaler when two backend services share an instance group backend

In a situation where you have two backend services that use the same instance group backend, you need to allocate the instance group's capacity between the two backend services.

For example, if max utilization is 80% and you set the capacity scaler to 0.5 on both backend services, each backend service targets at most 40% of the instance group's capacity:

capacity-scaler: 0.5 * 0.8 = 0.4 (40%)

When a backend service has used its 40% share of the instance group's capacity, it stops sending new requests to that instance group.

Traffic Director and traffic distribution

Traffic Director also uses backend service resources. Specifically, Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED. For an internal self-managed backend service, traffic distribution is based on the combination of a load balancing mode and a load balancing policy. The backend service directs traffic to a backend (instance group or NEG) according to the backend's balancing mode, then, once a backend has been selected, Traffic Director distributes traffic according to a load balancing policy.

Internal self-managed backend services support the following balancing modes:

  • UTILIZATION, if all the backends are instance groups
  • RATE, if all the backends are either instance groups or zonal NEGs

If you choose RATE balancing mode, you must specify a maximum rate, maximum rate per instance, or maximum rate per endpoint.

For more information about Traffic Director, see Traffic Director concepts.

Session affinity

Without session affinity, load balancers distribute new requests based on a 5-tuple hash consisting of the client's IP address and source port, the load balancer's forwarding rule IP address and destination port, and the L3 protocol (TCP protocol for all load balancers that use backend services). The balancing mode of the backend instance group or zonal NEG determines when the backend is at capacity. Some applications – such as stateful servers used by ads serving, games, or services with heavy internal caching – need multiple requests from a given user to be directed to the same backend or endpoint.

Session affinity is available for TCP traffic, including the SSL, HTTP(S), and HTTP/2 protocols. As long as a backend instance or endpoint remains healthy and is not at capacity, as defined by its balancing mode, the load balancer directs subsequent requests to the same backend VM or endpoint. Keep the following in mind when configuring session affinity:

  • UDP traffic doesn't have the concept of a session, so session affinity doesn't make sense for this type of traffic.

  • When proxyless gRPC services are configured, Traffic Director does not support session affinity.

  • Do not rely on session affinity for authentication or security purposes. Session affinity is designed to break when a backend is at or above capacity or if it becomes unhealthy.

  • Google Cloud load balancers provide session affinity on a best-effort basis. Factors such as changing backend health check states or changes to backend fullness, as measured by the balancing mode, can break session affinity. Using a session affinity other than None with the UTILIZATION balancing mode is not recommended because changes in the instance utilization can cause the load balancing service to direct new requests or connections to backend VMs that are less full, breaking session affinity. Instead, use either the RATE or CONNECTION balancing mode, as supported by your chosen load balancer, to reduce the chance of breaking session affinity.

The following table shows the session affinity options:

  • Internal TCP/UDP Load Balancing: None; Client IP; Client IP and protocol; Client IP, protocol, and port
  • TCP Proxy Load Balancing and SSL Proxy Load Balancing: None; Client IP
  • External HTTP(S) Load Balancing: None; Client IP; Generated cookie
  • Internal HTTP(S) Load Balancing: None; Client IP; Generated cookie; Header field; HTTP cookie
  • Network Load Balancing: Network Load Balancing doesn't use backend services. Instead, you set session affinity for network load balancers through target pools. See the sessionAffinity parameter in Target Pools.
  • Traffic Director: None; Client IP; Generated cookie; Header field; HTTP cookie

The following sections discuss the different types of session affinity.

Client IP affinity

Client IP affinity directs requests from the same client IP address to the same backend instance. Client IP affinity is an option for every Google Cloud load balancer that uses backend services.

When you use client IP affinity, keep the following in mind:

  • Client IP affinity is a two-tuple hash consisting of the client's IP address and the IP address of the load balancer's forwarding rule that the client contacts.

  • The client IP address as seen by the load balancer might not be the originating client's address if the client is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address. This can cause incoming traffic to clump unnecessarily onto the same backend instances.

  • If a client moves from one network to another, its IP address changes, resulting in broken affinity.
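As a sketch, client IP affinity is enabled per backend service with the --session-affinity flag of the gcloud command-line tool. The backend service name below is a placeholder; adjust the scope flag for your load balancer:

```shell
# Enable client IP affinity on an existing backend service.
# "my-backend-service" is a placeholder name; use --region=REGION
# instead of --global for regional load balancers.
gcloud compute backend-services update my-backend-service \
    --session-affinity=CLIENT_IP \
    --global
```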

Generated cookie affinity

When you set generated cookie affinity, the load balancer issues a cookie on the first request. For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint.

  • For external HTTP(S) load balancers, the cookie is named GCLB.
  • For internal HTTP(S) load balancers and Traffic Director, the cookie is named GCILB.

Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity. For example:

  1. With cookie-based affinity, the load balancer can uniquely identify two or more client systems that share the same source IP address. Using client IP-based affinity, the load balancer treats all connections from the same source IP address as if they were from the same client system.

  2. If a client changes its IP address (for example, a mobile device moving from network to network), cookie-based affinity allows the load balancer to recognize subsequent connections from that client instead of treating each connection as new.

When a load balancer creates a cookie for generated cookie-based affinity, it sets the path attribute of the cookie to /. If the load balancer's URL map has a path matcher that specifies more than one backend service for a given host name, all backend services using cookie-based session affinity share the same session cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. You can set it to 0 (default), which means the cookie is only a session cookie, or you can set the lifetime of the cookie to a value from 1 to 86400 seconds (24 hours) inclusive.
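For example, generated cookie affinity with a one-hour cookie lifetime might be configured as follows; the backend service name is a placeholder:

```shell
# Use a generated cookie for session affinity and keep the cookie
# for 3600 seconds instead of the session-only default of 0.
gcloud compute backend-services update my-backend-service \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl=3600 \
    --global
```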

Header field affinity

An internal HTTP(S) load balancer can use header field affinity when the load balancing locality policy is RING_HASH or MAGLEV and the backend service's consistent hash specifies the name of the HTTP header. Header field affinity routes requests to backend VMs or endpoints in a zonal NEG based on the value of the HTTP header named in the --custom-request-header flag.

For more information about Internal HTTP(S) Load Balancing, in which header field affinity is used, see Internal HTTP(S) Load Balancing overview.

HTTP cookie affinity

An internal HTTP(S) load balancer can use HTTP cookie affinity when the load balancing locality policy is RING_HASH or MAGLEV and the backend service's consistent hash specifies the name of the HTTP cookie.

HTTP cookie affinity routes requests to backend VMs or endpoints in a NEG based on the HTTP cookie named in the HTTP_COOKIE flag. If the client does not provide the cookie, the proxy generates the cookie and returns it to the client in a Set-Cookie header.

For more information about Internal HTTP(S) Load Balancing, in which HTTP cookie affinity is used, see Internal HTTP(S) Load Balancing overview.

Losing session affinity

Regardless of the type of affinity chosen, a client can lose affinity with a backend in the following situations:

  • If the backend instance group or zonal NEG runs out of capacity, as defined by the balancing mode's target capacity. In this situation, Google Cloud directs traffic to a different backend instance group or zonal NEG, which might be in a different zone. You can mitigate this by ensuring that you specify the correct target capacity for each backend based on your own testing.
  • Autoscaling adds instances to, or removes instances from, a managed instance group. When this happens, the number of instances in the instance group changes, so the backend service recomputes hashes for session affinity. You can mitigate this by ensuring that the minimum size of the managed instance group can handle a typical load. Autoscaling is then only performed during unexpected increases in load.
  • If a backend VM or endpoint in a NEG fails health checks, the load balancer directs traffic to a different healthy backend. Refer to the documentation for each Google Cloud load balancer for details about how the load balancer behaves when all of its backends fail health checks.
  • When the UTILIZATION balancing mode is in effect for backend instance groups, session affinity breaks because of changes in backend utilization. You can mitigate this by using the RATE or CONNECTION balancing mode, whichever is supported by the load balancer's type.

When you use HTTP(S) Load Balancing, SSL Proxy Load Balancing, or TCP Proxy Load Balancing, keep the following additional points in mind:

  • If the routing path from a client on the internet to Google changes between requests or connections, a different Google Front End (GFE) might be selected as the proxy. This can break session affinity.
  • When you use the UTILIZATION balancing mode, especially without a defined maximum target capacity, session affinity is likely to break when traffic to the load balancer is low. Switch to the RATE or CONNECTION balancing mode instead.
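Because the balancing mode is a per-backend setting, switching a backend from UTILIZATION to RATE can be sketched as follows; the service name, instance group, zone, and target rate are all placeholders:

```shell
# Switch an instance group backend to RATE balancing mode with a
# target of 100 requests per second per instance (placeholder values).
gcloud compute backend-services update-backend my-backend-service \
    --instance-group=my-instance-group \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --global
```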

Backend service timeout

Most Google Cloud load balancers have a backend service timeout. The default value is 30 seconds.

  • For external HTTP(S) load balancers and internal HTTP(S) load balancers using the HTTP, HTTPS, or HTTP/2 protocol, the backend service timeout is a request/response timeout for HTTP(S) traffic: the amount of time that the load balancer waits for a backend to return a complete response to a request. For example, if the backend service timeout is the default value of 30 seconds, the backends have 30 seconds to deliver a complete response. The load balancer retries an HTTP GET request once if the backend closes the connection or times out before sending response headers to the load balancer. If the backend sends response headers (even if the response body is otherwise incomplete), or if the request sent to the backend is not an HTTP GET request, the load balancer does not retry. If the backend does not reply at all, the load balancer returns an HTTP 5xx response to the client. For these load balancers, change the timeout value if you want to allow more or less time for the backends to respond completely to requests.

  • For external HTTP(S) load balancers, if the HTTP connection is upgraded to a WebSocket, the backend service timeout defines the maximum amount of time that a WebSocket can be open, whether idle or not.

  • For SSL proxy load balancers and TCP proxy load balancers, the timeout is an idle timeout. For these load balancers, change the timeout value if you want to allow more or less time before the connection is deleted. This idle timeout is also used for WebSocket connections.

  • Internal TCP/UDP load balancers ignore the value of the backend service timeout.

  • When proxyless gRPC services are configured, Traffic Director does not support the backend service timeout.
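For load balancers that honor the timeout, the value is set on the backend service itself. For example (placeholder service name, raising the timeout to 60 seconds):

```shell
# Raise the backend service timeout from the 30-second default to 60 seconds.
gcloud compute backend-services update my-backend-service \
    --timeout=60s \
    --global
```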

Health checks

Each backend service whose backends are instance groups or zonal NEGs must have an associated health check. Backend services using an internet NEG as a backend must not reference a health check.

When you create a load balancer using the Google Cloud Console, you can create the health check, if it is required, when you create the load balancer, or you can reference an existing health check.

When you create a backend service using either instance group or zonal NEG backends using the gcloud command-line tool or the API, you must reference an existing health check. Refer to the load balancer guide in the Health Checks Overview for details about the type and scope of health check required.
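As an illustration of this ordering requirement, the health check must exist before the backend service that references it is created. The names, port, and request path below are placeholders:

```shell
# Create the health check first...
gcloud compute health-checks create http my-http-hc \
    --port=80 \
    --request-path=/healthz

# ...then reference it when creating the backend service.
gcloud compute backend-services create my-backend-service \
    --protocol=HTTP \
    --health-checks=my-http-hc \
    --global
```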

For more information, see the Health Checks Overview.

Additional features enabled on the backend service resource

The following optional Google Cloud features are available for backend services used by HTTP(S) Load Balancing. They are not discussed in this document, but are covered on the following pages:

  • Google Cloud Armor provides protection against DDoS and other attacks with security policies.
  • Cloud CDN is a low-latency content delivery system.
  • User-defined request headers are additional headers that the load balancer adds to requests.
  • IAP allows you to require authentication with a Google Account using an OAuth 2.0 sign-in workflow and control access using IAM permissions.

Other notes

The following features are supported only with internal HTTP(S) load balancers and Traffic Director; however, they are not supported when you use proxyless gRPC services with Traffic Director.

  • Circuit breaking
  • Outlier detection
  • Load balancing policies
  • HTTP cookie-based session affinity
  • HTTP header-based session affinity

What's next

For related documentation and information on how backend services are used in load balancing, review the following: