A backend service defines how Cloud Load Balancing distributes traffic. The backend service configuration contains a set of values, such as the protocol used to connect to backends, various distribution and session settings, health checks, and timeouts. These settings provide fine-grained control over how your load balancer behaves. If you need to get started quickly, most of the settings have default values that allow for easy configuration.
You can configure a backend service for the following Google Cloud load balancing services:
- External HTTP(S) Load Balancing
- Internal HTTP(S) Load Balancing
- SSL Proxy Load Balancing
- TCP Proxy Load Balancing
- Internal TCP/UDP Load Balancing
- External TCP/UDP Network Load Balancing
Traffic Director also uses backend services.
Load balancers, Envoy proxies, and proxyless gRPC clients use the configuration information in the backend service resource to do the following:
- Direct traffic to the correct backends, which are instance groups or network endpoint groups (NEGs). You can configure an external HTTP(S) load balancer to use a backend bucket instead of a backend service. For information about using backend buckets with external HTTP(S) load balancers, see Setting up a load balancer with backend buckets.
- Distribute traffic according to a balancing mode, which is a setting for each backend.
- Determine which health check is monitoring the health of the backends.
- Specify session affinity.
- Determine whether other services are enabled, including the following:
- Cloud CDN (external HTTP(S) load balancers only)
- Google Cloud Armor security policies (external HTTP(S) load balancers only)
- Identity-Aware Proxy (external HTTP(S) load balancers only)
You set these values when you create a backend service or add a backend to the backend service.
A backend service is either global or regional in scope.
For more information about the properties of the backend service resource, see the following references:
- Global backend service API resource
- Regional backend service API resource
- The gcloud compute backend-services page, for both global and regional backend services
The product that you are using, which is either a load balancer or Traffic Director, determines the following:
- Maximum number of backend services
- Scope of a backend service
- Type of backends that each backend service can use
- Backend service's load balancing scheme
Product | Maximum number of backend services | Scope of backend service | Supported backend types | Load balancing scheme
---|---|---|---|---
External HTTP(S) Load Balancing | Multiple | Global1 | Instance groups, zonal NEGs (`GCE_VM_IP_PORT` endpoints), internet NEGs, or serverless NEGs | `EXTERNAL`
Internal HTTP(S) Load Balancing | Multiple | Regional | Instance groups or zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `INTERNAL_MANAGED`
SSL Proxy Load Balancing | 1 | Global1 | Instance groups or zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `EXTERNAL`
TCP Proxy Load Balancing | 1 | Global1 | Instance groups or zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `EXTERNAL`
Network Load Balancing | 1 | Regional | Instance groups | `EXTERNAL`
Internal TCP/UDP Load Balancing | 1 | Regional, but configurable to be globally accessible | Instance groups or zonal NEGs (`GCE_VM_IP` endpoints) | `INTERNAL`
Traffic Director | Multiple | Global | Instance groups or zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `INTERNAL_SELF_MANAGED`

1 Global in scope. When the Standard Network Service Tier is used, the following restrictions apply:
- The forwarding rule and its external IP address are regional.
- All backends connected to the backend service must be located in the same region as the forwarding rule.
Backends
A backend is a group of endpoints that receive traffic from a Google Cloud load balancer, a Traffic Director-configured Envoy proxy, or a proxyless gRPC client. There are several types of backends:
- Instance group containing virtual machine (VM) instances. An instance group can be a managed instance group, with or without autoscaling, or it can be an unmanaged instance group. More than one backend service can reference an instance group, but all backend services that reference the instance group must use the same balancing mode.
- Zonal NEG
- Serverless NEG
- Internet NEG
In addition, by using a backend bucket instead of a backend service, you can have a Cloud Storage bucket backend.
You cannot use different types of backends with the same backend service. For example, a single backend service cannot reference a combination of instance groups and zonal NEGs. However, you can use a combination of different types of instance groups on the same backend service. For example, a single backend service can reference a combination of both managed and unmanaged instance groups. For complete information about which backends are compatible with which backend services, see the table in the previous section.
You cannot delete a backend instance group or NEG that is associated with a backend service. Before you delete an instance group or NEG, you must first remove it as a backend from all backend services that reference it.
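For example, a minimal sketch of detaching an instance group before deleting it; the resource names `web-backend-service` and `ig-a` are illustrative:

```
# Remove the instance group from the backend service first.
gcloud compute backend-services remove-backend web-backend-service \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --global

# Only after the instance group is no longer referenced can it be deleted.
gcloud compute instance-groups managed delete ig-a \
    --zone=us-central1-a
```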
Protocol to the backends
When you create a backend service, you must specify the protocol used to communicate with the backends. You can specify only one protocol per backend service — you cannot specify a secondary protocol to use as a fallback.
The available protocols are:
- HTTP
- HTTPS
- HTTP/2
- SSL
- TCP
- UDP
- gRPC (Traffic Director only)
Which protocols are valid depends on the type of load balancer or whether you are using Traffic Director. For more information, see Load balancer features and Traffic Director features.
Changing a backend service's protocol makes the backends inaccessible through load balancers for a few minutes.
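For example, you set the protocol with the `--protocol` flag. The following sketch uses illustrative names; a regional backend service would take `--region` instead of `--global`:

```
# Create a global backend service that uses HTTP to reach its backends.
# http-basic-check is a placeholder for an existing health check.
gcloud compute backend-services create web-backend-service \
    --protocol=HTTP \
    --health-checks=http-basic-check \
    --global

# Switching the protocol later briefly makes the backends inaccessible.
gcloud compute backend-services update web-backend-service \
    --protocol=HTTPS \
    --global
```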
Encryption between the load balancer and backends
For information about this topic, see Encryption to the backends.
Instance groups
This section discusses how instance groups work with the backend service.
Backend VMs and external IP addresses
Backend VMs in backend services do not need external IP addresses:
- For external HTTP(S) load balancers, SSL proxy load balancers, and TCP proxy load balancers: Clients communicate with a Google Front End (GFE) using your load balancer's external IP address. The GFE communicates with backend VMs or endpoints using a combination of an identifier for the backend's VPC network and the VM or endpoint's internal IP address. Internal IP addresses must be associated with the primary network interface (nic0) of the backend. Communication between GFEs and backend VMs or endpoints is facilitated through special routes.
- For network load balancers, packets are first routed to the network load balancer's external IP address. The load balancer then uses consistent hashing to route them to backend VMs.
- For internal HTTP(S) load balancers, internal TCP/UDP load balancers, and Traffic Director: Backend VMs do not require external IP addresses.
Named ports
The following load balancer types require each backend Compute Engine instance group to have a named port:
- Internal HTTP(S) Load Balancing
- HTTP(S) Load Balancing
- SSL Proxy Load Balancing
- TCP Proxy Load Balancing
A load balancer can listen on the frontend on one or more port numbers that you configure in the load balancer's forwarding rule. On the backend, the load balancer can forward traffic to the same or to a different port number. This is the port number that your backend instances (Compute Engine instances) are listening on. You configure this port number in the instance group and refer to it in the backend service configuration.
The backend port number is called a named port because it is a name/value pair. In the instance group, you define the key name and value for the port. Then you refer to the named port in the backend service configuration.
If an instance group's named port matches the `--port-name` in the backend service configuration, the backend service uses this port number for communication with the instance group's VMs.
To learn how to create named ports, see the following instructions:
- Unmanaged instance groups: Working with named ports
- Managed instance groups: Assigning named ports to managed instance groups
For example, you might set the named port on an instance group as follows, where the service name is `my-service-name` and the port is `8888`:

```
gcloud compute instance-groups unmanaged set-named-ports my-unmanaged-ig \
    --named-ports=my-service-name:8888
```

You can then set the `--port-name` on the backend service to `my-service-name`:

```
gcloud compute backend-services update my-backend-service \
    --port-name=my-service-name
```
Note the following:
- Each backend service subscribes to a single port name. Each of its backend instance groups must have at least one named port for that name.
- A backend service can use a different port number when communicating with VMs in different instance groups if each instance group specifies a different port number for the same port name.
- The resolved port number used by the backend service does not have to match the port number used by the load balancer's forwarding rules.
Named ports are not used in these circumstances:
- For zonal NEG or internet NEG backends, because these NEGs define ports using a different mechanism, namely, on the endpoints themselves.
- For serverless NEG backends, because these NEGs don't have endpoints.
- For internal TCP/UDP load balancers, because an internal TCP/UDP load balancer is a pass-through load balancer, not a proxy. Also, its backend service does not subscribe to a named port.
- For network load balancers, because a network load balancer is a pass-through load balancer, not a proxy, and its backend service does not subscribe to a named port.
For more information about named ports, see gcloud compute instance-groups managed set-named-ports and gcloud compute instance-groups unmanaged set-named-ports in the SDK documentation.
Restrictions and guidance for instance groups
Keep the following restrictions and guidance in mind when you create instance groups for your load balancers:
Do not put a VM in more than one load-balanced instance group. If a VM is a member of two or more unmanaged instance groups, or a member of one managed instance group and one or more unmanaged instance groups, Google Cloud limits you to only using one of those instance groups at a time as a backend for a particular backend service.
If you need a VM to participate in multiple load balancers, you must use the same instance group as a backend on each of the backend services. To balance traffic to different ports, create the required named ports on the one instance group and have each backend service subscribe to a unique named port.
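For example, assuming one instance group `ig-a` serves two applications on different ports, a sketch of this pattern looks like the following (names and ports are illustrative):

```
# Define two named ports on the same instance group.
gcloud compute instance-groups managed set-named-ports ig-a \
    --named-ports=http-app:8080,admin-app:9000 \
    --zone=us-central1-a

# Each backend service subscribes to its own named port.
gcloud compute backend-services update web-backend-service \
    --port-name=http-app \
    --global
gcloud compute backend-services update admin-backend-service \
    --port-name=admin-app \
    --global
```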
You can use the same instance group as a backend for more than one backend service. In this situation, the backends must use compatible balancing modes. Compatible means that the balancing modes must be the same, or they must be a combination of `CONNECTION` and `RATE`. Incompatible combinations are as follows:
- `CONNECTION` with `UTILIZATION`
- `RATE` with `UTILIZATION`

Consider the following example:
- You have two backend services: `external-https-backend-service` for an external HTTP(S) load balancer and `internal-tcp-backend-service` for an internal TCP/UDP load balancer.
- You're using an instance group called `instance-group-a` in `internal-tcp-backend-service`.
- In `internal-tcp-backend-service`, you must apply the `CONNECTION` balancing mode because internal TCP/UDP load balancers support only the `CONNECTION` balancing mode.
- You can also use `instance-group-a` in `external-https-backend-service` if you apply the `RATE` balancing mode in `external-https-backend-service`.
- You cannot also use `instance-group-a` in `external-https-backend-service` with the `UTILIZATION` balancing mode.
To change the balancing mode for an instance group serving as a backend for multiple backend services:
- Remove the instance group from all backend services except for one.
- Change the balancing mode for the backend on the one remaining backend service.
- Re-add the instance group as a backend to the remaining backend services, if they support the new balancing mode.
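A sketch of those steps, assuming an instance group `ig-a` that backs both `service-1` and `service-2` (all names are illustrative):

```
# 1. Remove the instance group from all backend services except one.
gcloud compute backend-services remove-backend service-2 \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --global

# 2. Change the balancing mode on the remaining backend service.
gcloud compute backend-services update-backend service-1 \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --global

# 3. Re-add the instance group with the new, compatible balancing mode.
gcloud compute backend-services add-backend service-2 \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --global
```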
If your instance group is associated with several backend services, each backend service can reference the same named port or a different named port on the instance group.
We recommend not adding an autoscaled managed instance group to more than one backend service. Doing so might cause unpredictable and unnecessary scaling of instances in the group, especially if you use the HTTP Load Balancing Utilization autoscaling metric.
- While not recommended, this scenario might work if the autoscaling metric is either CPU Utilization or a Cloud Monitoring Metric that is unrelated to the load balancer's serving capacity. Using one of these autoscaling metrics might prevent erratic scaling.
Zonal network endpoint groups
Network endpoints represent services by their IP address or an IP address/port combination, rather than referring to a VM in an instance group. A network endpoint group (NEG) is a logical grouping of network endpoints.
Zonal network endpoint groups (NEGs) are zonal resources that represent collections of either IP addresses or IP address/port combinations for Google Cloud resources within a single subnet.
There are two types of network endpoints available for zonal NEGs:
- `GCE_VM_IP` endpoints
- `GCE_VM_IP_PORT` endpoints
For details, see Zonal NEGs overview.
A backend service that uses zonal NEGs as its backends distributes traffic among applications or containers running within VMs.
Zonal NEGs using `GCE_VM_IP_PORT` endpoints can be used as backends for the following load balancer types:
- Internal HTTP(S) Load Balancing
- HTTP(S) Load Balancing
- SSL Proxy Load Balancing
- TCP Proxy Load Balancing
Traffic Director also supports zonal NEG backends with `GCE_VM_IP_PORT` endpoints.
Zonal NEGs using `GCE_VM_IP` endpoints can be used as backends for Internal TCP/UDP Load Balancing only.
Zonal NEGs are not supported by Network Load Balancing.
For more information, see Overview of network endpoint groups in load balancing.
Internet network endpoint groups
Internet NEGs are global resources that represent backends hosted outside of Google Cloud, within on-premises infrastructure or on infrastructure provided by third parties.
An internet NEG is a combination of an IP address or hostname, plus an optional port:
- A publicly resolvable fully qualified domain name and an optional port, for example `backend.example.com:443` (default ports: `80` for HTTP and `443` for HTTPS).
- A publicly accessible IP address and an optional port, for example `203.0.113.8:80` or `203.0.113.8:443` (default ports: `80` for HTTP and `443` for HTTPS).
A backend service of an external HTTP(S) load balancer that uses an internet network endpoint group as its backend distributes traffic to a destination outside of Google Cloud.
For more information, including which load balancers support internet NEGs, see Internet network endpoint group overview.
Serverless network endpoint groups
A network endpoint group (NEG) specifies a group of backend endpoints for a load balancer. A serverless NEG is a backend that points to a Cloud Run, App Engine, or Cloud Functions service.
A serverless NEG can represent:
- A Cloud Run service or a group of services sharing the same URL pattern.
- A Cloud Functions function or a group of functions sharing the same URL pattern.
- An App Engine app (Standard or Flex), a specific service within an app, or even a specific version of an app.
For more information, including which load balancers support serverless NEGs, see Serverless network endpoint group overview.
Traffic distribution
The values of the following fields in the backend services resource determine some aspects of the backend's behavior:
- A balancing mode defines how the load balancer measures backend readiness for new requests or connections.
- A target capacity defines a target maximum number of connections, a target maximum rate, or target maximum CPU utilization.
- A capacity scaler adjusts overall available capacity without modifying the target capacity.
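All three fields are set when you attach a backend. A minimal sketch with illustrative names:

```
# Attach an instance group backend, setting the balancing mode,
# the target capacity, and the capacity scaler in one step.
gcloud compute backend-services add-backend web-backend-service \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --capacity-scaler=0.9 \
    --global
```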
Balancing mode
The balancing mode determines whether backends of a load balancer can handle additional traffic or are fully loaded. Google Cloud has three balancing modes:
- `CONNECTION`
- `RATE`
- `UTILIZATION`
The balancing mode options depend on the backend service's load balancing scheme, the backend service's protocol, and the type of backends connected to the backend service.
You set the balancing mode when you add a backend to the backend service. Note that you cannot specify a balancing mode when using serverless NEGs or internet NEGs as backends for a load balancer.
Balancing mode | Supported load balancing schemes | Compatible backend service protocols1 | Compatible backends2 | Applicable products
---|---|---|---|---
`CONNECTION` | `EXTERNAL`, `INTERNAL` | SSL, TCP, UDP | Either instance groups or zonal NEGs, if supported | SSL Proxy Load Balancing, TCP Proxy Load Balancing, Network Load Balancing, Internal TCP/UDP Load Balancing
`RATE` | `EXTERNAL`, `INTERNAL_MANAGED`, `INTERNAL_SELF_MANAGED` | HTTP, HTTPS, HTTP2, gRPC | Instance groups or zonal NEGs | External HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, Traffic Director
`UTILIZATION` | `EXTERNAL`, `INTERNAL_MANAGED`, `INTERNAL_SELF_MANAGED` | No special restriction | Instance groups only. Zonal NEGs do not support utilization mode. | External HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, SSL Proxy Load Balancing, TCP Proxy Load Balancing, Traffic Director

1 Protocols are further restricted based on the type of load balancer.
2 For the supported backend types (for example, instance groups and zonal NEGs), see Backends on the Load balancer features page.
If the average utilization of all VMs that are associated with a backend service is less than 10%, Google Cloud might prefer specific zones. This can happen when you use managed regional instance groups, managed zonal instance groups in different zones, and unmanaged zonal instance groups. This zonal imbalance automatically resolves as more traffic is sent to the load balancer.
For more information, see gcloud beta compute backend-services add-backend.
Changing the balancing mode of a load balancer
For some load balancers, you cannot change the balancing mode because the backend service has only one possible balancing mode. For others, depending on the backend used, you can change the balancing mode because more than one mode is available to those backend services.
Load balancer | Backends | Balancing modes available
---|---|---
HTTP(S) Load Balancing | Instance groups | `RATE` or `UTILIZATION`
HTTP(S) Load Balancing | Zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `RATE`
Internal HTTP(S) Load Balancing | Instance groups | `RATE` or `UTILIZATION`
Internal HTTP(S) Load Balancing | Zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `RATE`
TCP Proxy Load Balancing | Instance groups | `CONNECTION` or `UTILIZATION`
TCP Proxy Load Balancing | Zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `CONNECTION`
SSL Proxy Load Balancing | Instance groups | `CONNECTION` or `UTILIZATION`
SSL Proxy Load Balancing | Zonal NEGs (`GCE_VM_IP_PORT` endpoints) | `CONNECTION`
Network Load Balancing | Instance groups | `CONNECTION`
Internal TCP/UDP Load Balancing | Instance groups | `CONNECTION`
Internal TCP/UDP Load Balancing | Zonal NEGs (`GCE_VM_IP` endpoints) | `CONNECTION`
Target capacity
Each balancing mode has a corresponding target capacity, which defines one of the following target maximums:
- Number of connections
- Rate
- CPU utilization
For every balancing mode, the target capacity is not a circuit breaker. A load balancer can exceed the maximum under certain conditions, for example, if all backend VMs or endpoints have reached the maximum.
Connection balancing mode
For `CONNECTION` balancing mode, the target capacity defines a target maximum number of concurrent connections. Except for internal TCP/UDP load balancers and network load balancers, you must use one of the following settings to specify a target maximum number of connections:
- `max-connections-per-instance` (per VM): Target average number of connections for a single VM.
- `max-connections-per-endpoint` (per endpoint in a zonal NEG): Target average number of connections for a single endpoint.
- `max-connections` (per zonal NEG and for zonal instance groups): Target average number of connections for the whole NEG or instance group. For regional managed instance groups, use `max-connections-per-instance` instead.
The following table shows how the target capacity parameter defines the following:
- The target capacity for the whole backend
- The expected target capacity for each instance or endpoint
Backend type | If you specify | Whole backend capacity | Expected per-instance or per-endpoint capacity
---|---|---|---
Instance group, `N` instances, `H` healthy | `max-connections-per-instance=X` | `X × N` | `(X × N)/H`
Zonal NEG, `N` endpoints, `H` healthy | `max-connections-per-endpoint=X` | `X × N` | `(X × N)/H`
Instance groups (except regional managed instance groups), `H` healthy instances | `max-connections=Y` | `Y` | `Y/H`
As illustrated, the `max-connections-per-instance` and `max-connections-per-endpoint` settings are proxies for calculating a target maximum number of connections for the whole instance group or whole zonal NEG:
- In an instance group with `N` instances, setting `max-connections-per-instance=X` has the same meaning as setting `max-connections=X × N`.
- In a zonal NEG with `N` endpoints, setting `max-connections-per-endpoint=X` has the same meaning as setting `max-connections=X × N`.
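For example, a TCP proxy backend service might set a per-VM connection target as follows (the names are illustrative):

```
# CONNECTION balancing mode with a per-instance target: with 4 instances,
# the whole group targets 4 × 250 = 1000 concurrent connections.
gcloud compute backend-services add-backend tcp-backend-service \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --balancing-mode=CONNECTION \
    --max-connections-per-instance=250 \
    --global
```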
Rate balancing mode
For the `RATE` balancing mode, you must define the target capacity using one of the following parameters:
- `max-rate-per-instance` (per VM): Provide a target average HTTP request rate for a single VM.
- `max-rate-per-endpoint` (per endpoint in a zonal NEG): Provide a target average HTTP request rate for a single endpoint.
- `max-rate` (per zonal NEG and for zonal instance groups): Provide a target average HTTP request rate for the whole NEG or instance group. For regional managed instance groups, use `max-rate-per-instance` instead.
The following table shows how the target capacity parameter defines the following:
- The target capacity for the whole backend
- The expected target capacity for each instance or endpoint
Backend type | If you specify | Whole backend capacity | Expected per-instance or per-endpoint capacity
---|---|---|---
Instance group, `N` instances, `H` healthy | `max-rate-per-instance=X` | `X × N` | `(X × N)/H`
Zonal NEG, `N` endpoints, `H` healthy | `max-rate-per-endpoint=X` | `X × N` | `(X × N)/H`
Instance groups (except regional managed instance groups), `H` healthy instances | `max-rate=Y` | `Y` | `Y/H`
As illustrated, the `max-rate-per-instance` and `max-rate-per-endpoint` settings are proxies for calculating a target maximum rate of HTTP requests for the whole instance group or whole zonal NEG:
- In an instance group with `N` instances, setting `max-rate-per-instance=X` has the same meaning as setting `max-rate=X × N`.
- In a zonal NEG with `N` endpoints, setting `max-rate-per-endpoint=X` has the same meaning as setting `max-rate=X × N`.
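For example, attaching a zonal NEG with a per-endpoint rate target might look like the following sketch (names are illustrative):

```
# RATE balancing mode with a per-endpoint target: with N endpoints,
# the whole NEG targets N × 100 requests per second.
gcloud compute backend-services add-backend web-backend-service \
    --network-endpoint-group=neg-a \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=100 \
    --global
```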
Utilization balancing mode
The `UTILIZATION` balancing mode has no mandatory target capacity. You have a number of options that depend on the type of backend, as summarized in the table in the following section.
Balancing mode combinations
This table summarizes all possible balancing modes for a given load balancer and type of backend. It also shows the available or required capacity settings that you must specify with the balancing mode.
Load balancer | Type of backend | Balancing mode | Target capacity
---|---|---|---
Internal TCP/UDP Load Balancing | Instance group | `CONNECTION` | You cannot specify a target maximum number of connections.
Internal TCP/UDP Load Balancing | Zonal NEGs (`GCE_VM_IP`) | `CONNECTION` | You cannot specify a target maximum number of connections.
External TCP/UDP Network Load Balancing | Instance group | `CONNECTION` | You cannot specify a target maximum number of connections.
SSL Proxy Load Balancing, TCP Proxy Load Balancing | Instance group | `CONNECTION` | You must specify one of the following: `max-connections` (zonal instance groups only) or `max-connections-per-instance`
SSL Proxy Load Balancing, TCP Proxy Load Balancing | Instance group | `UTILIZATION` | You can optionally specify one of the following: `max-utilization`, `max-connections` (zonal instance groups only), or `max-connections-per-instance`
SSL Proxy Load Balancing, TCP Proxy Load Balancing | Zonal NEG (`GCE_VM_IP_PORT`) | `CONNECTION` | You must specify one of the following: `max-connections` or `max-connections-per-endpoint`
HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, Traffic Director | Instance group | `RATE` | You must specify one of the following: `max-rate` (zonal instance groups only) or `max-rate-per-instance`
HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, Traffic Director | Instance group | `UTILIZATION` | You can optionally specify one of the following: `max-utilization`, `max-rate` (zonal instance groups only), or `max-rate-per-instance`
HTTP(S) Load Balancing, Internal HTTP(S) Load Balancing, Traffic Director | Zonal NEG (`GCE_VM_IP_PORT`) | `RATE` | You must specify one of the following: `max-rate` or `max-rate-per-endpoint`
Capacity scaler
You can optionally adjust the capacity scaler to scale down the effective target capacity (max utilization, max rate, or max connections) without changing the target capacity itself. The capacity scaler is supported for all load balancers that support a target capacity. The only exceptions are the network load balancer and the internal TCP/UDP load balancer.
By default, the value of the capacity scaler is `1.0` (100%). You can set the capacity scaler to either of these values:
- Exactly `0.0`, which prevents all new connections
- A value between `0.1` (10%) and `1.0` (100%)
The following examples demonstrate how the capacity scaler works in conjunction with the target capacity setting.
- If the balancing mode is `RATE`, the max-rate is set to 80 RPS, and the capacity scaler is `1.0`, the effective target capacity is also 80 RPS.
- If the balancing mode is `RATE`, the max-rate is set to 80 RPS, and the capacity scaler is `0.5`, the effective target capacity is 40 RPS (`0.5 × 80`).
- If the balancing mode is `RATE`, the max-rate is set to 80 RPS, and the capacity scaler is `0.0`, the effective target capacity is zero. A capacity scaler of zero takes the backend out of rotation.
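For example, to temporarily halve a backend's effective capacity without touching its configured target, you might run the following (names are illustrative):

```
# Scale the effective target capacity to 50% of the configured maximum.
gcloud compute backend-services update-backend web-backend-service \
    --instance-group=ig-a \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0.5 \
    --global
```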
Traffic Director and traffic distribution
Traffic Director also uses backend service resources. Specifically, Traffic Director uses backend services whose load balancing scheme is `INTERNAL_SELF_MANAGED`. For an internal self-managed backend service, traffic distribution is based on the combination of a load balancing mode and a load balancing policy. The backend service directs traffic to a backend according to the backend's balancing mode. Then Traffic Director distributes traffic according to a load balancing policy.
Internal self-managed backend services support the following balancing modes:
- `UTILIZATION`, if all the backends are instance groups
- `RATE`, if all the backends are either instance groups or zonal NEGs
If you choose the `RATE` balancing mode, you must specify a maximum rate, a maximum rate per instance, or a maximum rate per endpoint.
For more information about Traffic Director, see Traffic Director concepts.
Session affinity
When session affinity is disabled (`--session-affinity=NONE`), by default, load balancers distribute new requests based on a 5-tuple hash, as follows:
- Packet's source IP address
- Packet's source port
- Packet's destination IP address
- Packet's destination port
- Packet's protocol
The balancing mode of the backend instance group or zonal NEG determines when the backend is at capacity. Some applications need multiple requests from a given user to be directed to the same backend or endpoint. Such applications include stateful servers used by ads serving, games, or services with heavy internal caching.
Session affinity is available for TCP traffic, including the SSL, HTTP(S), and HTTP/2 protocols. If a backend instance or endpoint is healthy and is not at capacity (based on the balancing mode), subsequent requests go to the same backend VM or endpoint. Keep the following in mind when configuring session affinity:
Protocols other than TCP (for example, UDP) don't have a native concept of a session; however, the session affinity selection can affect non-TCP packets.
When proxyless gRPC services are configured, Traffic Director does not support session affinity.
Do not rely on session affinity for authentication or security purposes. Session affinity is designed to break when a backend is at or above capacity or if it becomes unhealthy.
Google Cloud load balancers provide session affinity on a best-effort basis. Factors such as changing backend health check states or changes to backend fullness, as measured by the balancing mode, can break session affinity. Using a session affinity other than `None` with the `UTILIZATION` balancing mode is not recommended. This is because changes in the instance utilization can cause the load balancing service to direct new requests or connections to backend VMs that are less full. This breaks session affinity. Instead, use either the `RATE` or `CONNECTION` balancing mode to reduce the chance of breaking session affinity.
The following table shows the session affinity options:
Product | Session affinity options |
---|---|
• Internal TCP/UDP Load Balancing | • None • Client IP • Client IP and protocol • Client IP, protocol, and port |
• TCP Proxy Load Balancing • SSL Proxy Load Balancing |
• None • Client IP |
• External HTTP(S) Load Balancing | • None • Client IP • Generated cookie |
• Internal HTTP(S) Load Balancing | • None • Client IP • Generated cookie • Header field • HTTP cookie |
• Network Load Balancing | • None • Client IP • Client IP and protocol • Client IP, protocol, and port |
• Traffic Director | • None • Client IP • Generated cookie (HTTP protocols only) • Header field (HTTP protocols only) • HTTP cookie (HTTP protocols only) |
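For example, a sketch of enabling client IP affinity on an existing backend service (the service name is illustrative; regional backend services take `--region` instead of `--global`):

```
# Enable client IP session affinity.
gcloud compute backend-services update web-backend-service \
    --session-affinity=CLIENT_IP \
    --global
```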
The following sections discuss the different types of session affinity.
Client IP affinity
Client IP affinity directs requests from the same client IP address to the same backend instance. Client IP affinity is an option for every Google Cloud load balancer that uses backend services.
When you use client IP affinity, keep the following in mind:
Client IP affinity is a two-tuple hash consisting of the client's IP address and the IP address of the load balancer's forwarding rule that the client contacts.
The client IP address as seen by the load balancer might not be the originating client if it is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address. This can cause incoming traffic to clump unnecessarily onto the same backend instances.
If a client moves from one network to another, its IP address changes, resulting in broken affinity.
Generated cookie affinity
When you set generated cookie affinity, the load balancer issues a cookie on the first request. For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint.
- For external HTTP(S) load balancers, the cookie is named `GCLB`.
- For internal HTTP(S) load balancers and Traffic Director, the cookie is named `GCILB`.
Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity. For example:
With cookie-based affinity, the load balancer can uniquely identify two or more client systems that share the same source IP address. Using client IP-based affinity, the load balancer treats all connections from the same source IP address as if they were from the same client system.
If a client changes its IP address, cookie-based affinity lets the load balancer recognize subsequent connections from that client instead of treating the connection as new. An example of when a client changes its IP address is when a mobile device moves from one network to another.
When a load balancer creates a cookie for generated cookie-based affinity, it sets the `path` attribute of the cookie to `/`. If the URL map's path matcher has multiple backend services for a host name, all backend services share the same session cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. You can set it to `0` (default), which means the cookie is only a session cookie. Or you can set the lifetime of the cookie to a value from `1` to `86400` seconds (24 hours) inclusive.
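For example, generated cookie affinity and the cookie lifetime can be set together; the one-hour TTL here is illustrative:

```
# Generated cookie affinity with a one-hour cookie lifetime.
gcloud compute backend-services update web-backend-service \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl=3600 \
    --global
```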
Header field affinity
An internal HTTP(S) load balancer can use header field affinity when both of the following are true:
- The load balancing locality policy is RING_HASH or MAGLEV.
- The backend service's consistent hash specifies the name of the HTTP header.
Header field affinity routes requests to backend VMs or endpoints in a zonal NEG based on the value of the HTTP header named in the `--custom-request-header` flag.
For more information about Internal HTTP(S) Load Balancing, in which header field affinity is used, see Internal HTTP(S) Load Balancing overview.
HTTP cookie affinity
An internal HTTP(S) load balancer can use HTTP cookie affinity when both of the following are true:
- The load balancing locality policy is RING_HASH or MAGLEV.
- The backend service's consistent hash specifies the name of the HTTP cookie.
HTTP cookie affinity routes requests to backend VMs or endpoints in a NEG based on the HTTP cookie named in the `HTTP_COOKIE` flag. If the client does not provide the cookie, the proxy generates the cookie and returns it to the client in a `Set-Cookie` header.
For more information about Internal HTTP(S) Load Balancing, in which HTTP cookie affinity is used, see Internal HTTP(S) Load Balancing overview.
Losing session affinity
Regardless of the type of affinity chosen, a client can lose affinity with a backend in the following situations:
- If the backend instance group or zonal NEG runs out of capacity, as defined by the balancing mode's target capacity. In this situation, Google Cloud directs traffic to a different backend instance group or zonal NEG, which might be in a different zone. You can mitigate this by ensuring that you specify the correct target capacity for each backend based on your own testing.
- Autoscaling adds instances to, or removes instances from, a managed instance group. When this happens, the number of instances in the instance group changes, so the backend service recomputes hashes for session affinity. You can mitigate this by ensuring that the minimum size of the managed instance group can handle a typical load. Autoscaling is then only performed during unexpected increases in load.
- If a backend VM or endpoint in a NEG fails health checks, the load balancer directs traffic to a different healthy backend. Refer to the documentation for each Google Cloud load balancer for details about how the load balancer behaves when all of its backends fail health checks.
- When the `UTILIZATION` balancing mode is in effect for backend instance groups, session affinity breaks because of changes in backend utilization. You can mitigate this by using the `RATE` or `CONNECTION` balancing mode, whichever is supported by the load balancer's type.
When you use HTTP(S) Load Balancing, SSL Proxy Load Balancing, or TCP Proxy Load Balancing, keep the following additional points in mind:
- If the routing path from a client on the internet to Google changes between requests or connections, a different Google Front End (GFE) might be selected as the proxy. This can break session affinity.
- When you use the `UTILIZATION` balancing mode, especially without a defined target maximum capacity, session affinity is likely to break when traffic to the load balancer is low. Switch to using the `RATE` or `CONNECTION` balancing mode, as supported by your chosen load balancer.
Backend service timeout
Most Google Cloud load balancers have a backend service timeout. The default value is 30 seconds. The full range of allowed timeout values is 1 to 2,147,483,647 seconds.
- For external HTTP(S) load balancers and internal HTTP(S) load balancers using the HTTP, HTTPS, or HTTP/2 protocol, the backend service timeout is a request/response timeout for HTTP(S) traffic. This is the amount of time that the load balancer waits for a backend to return a full response to a request. For example, if the value of the backend service timeout is the default value of 30 seconds, the backends have 30 seconds to deliver a complete response to requests. The load balancer retries the HTTP GET request once if the backend closes the connection or times out before sending response headers to the load balancer. If the backend sends response headers (even if the response body is otherwise incomplete) or if the request sent to the backend is not an HTTP GET request, the load balancer does not retry. If the backend does not reply at all, the load balancer returns an HTTP `5xx` response to the client. To change the allotted time for backends to respond to requests, change the timeout value.
- For HTTP traffic, the maximum amount of time for the client to complete sending its request is equal to the backend service timeout. If you see HTTP `408` responses with the `jsonPayload.statusDetail` of `client_timed_out`, it means that there was insufficient progress while the request from the client was proxied or the response from the backend was proxied. If the problem is caused by clients that are experiencing performance issues, you can resolve it by increasing the backend service timeout.
- For external HTTP(S) load balancers and internal HTTP(S) load balancers, if the HTTP connection is upgraded to a WebSocket, the backend service timeout defines the maximum amount of time that a WebSocket can be open, whether idle or not.
- For SSL proxy load balancers and TCP proxy load balancers, the timeout is an idle timeout. To allow more or less time before the connection is deleted, change the timeout value. This idle timeout is also used for WebSocket connections.
- For internal TCP/UDP load balancers and network load balancers, you can set the value of the backend service timeout using `gcloud` or the API, but the value is ignored. Backend service timeout has no meaning for these pass-through load balancers.
- When proxyless gRPC services are configured, Traffic Director does not support the backend service timeout.
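For example, to give backends more time to respond, you might raise the timeout from its default (the value and name are illustrative):

```
# Increase the backend service timeout from the 30-second default.
gcloud compute backend-services update web-backend-service \
    --timeout=60s \
    --global
```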
Health checks
Each backend service whose backends are instance groups or zonal NEGs must have an associated health check. Backend services using a serverless NEG or an internet NEG as a backend must not reference a health check.
When you create a load balancer using the Google Cloud Console, you can create the health check, if it is required, when you create the load balancer, or you can reference an existing health check.
When you create a backend service with either instance group or zonal NEG backends using the `gcloud` command-line tool or the API, you must reference an existing health check. For details about the type and scope of health check required, refer to the load balancer guide in the Health Checks Overview.
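A minimal sketch of creating a health check and referencing it from a backend service, with illustrative names:

```
# Create an HTTP health check.
gcloud compute health-checks create http http-basic-check \
    --port=80

# Reference the existing health check from the backend service.
gcloud compute backend-services update web-backend-service \
    --health-checks=http-basic-check \
    --global
```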
Additional features enabled on the backend service resource
The following optional Google Cloud features are available for backend services used by an external HTTP(S) load balancer. They are not discussed in this document, but are discussed on the following pages:
- Google Cloud Armor provides protection against DDoS and other attacks with security policies.
- Cloud CDN is a low-latency content delivery system.
- Custom headers are additional headers that the load balancer adds to requests. See Creating custom headers.
- IAP lets you do the following:
- Require authentication with a Google Account with OAuth 2.0 sign-in
- Control access by using Identity and Access Management permissions
Other notes
The following features are supported only with internal HTTP(S) load balancers and Traffic Director; however, they are not supported when you use proxyless gRPC services with Traffic Director.
- Circuit breaking
- Outlier detection
- Load balancing policies
What's next
For related documentation and information about how backend services are used in load balancing, review the following:
- Creating custom headers
- Creating an HTTP(S) load balancer
- Conceptual information about HTTP(S) Load Balancing
- Enabling Connection Draining
- Encryption in Transit in Google Cloud