Understanding backend services

A backend service is a resource with fields containing configuration values for the following Google Cloud load balancing services:

  • External HTTP(S) Load Balancing
  • Internal HTTP(S) Load Balancing
  • SSL Proxy Load Balancing
  • TCP Proxy Load Balancing
  • Internal TCP/UDP Load Balancing

Network Load Balancing does not use a backend service.

The load balancers use the configuration information in the backend service resource for the following functions:

  • To direct traffic to the correct backends, which are instance groups or network endpoint groups
  • To distribute traffic according to a balancing mode. The balancing mode is defined in the backend service for each backend.
  • To monitor backend health by using the health check designated in the backend service
  • To maintain session affinity


The number of backend services per load balancer depends on the load balancer type:

Load balancer type                 Number of backend services
---------------------------------  --------------------------
External HTTP(S) Load Balancing    Multiple
Internal HTTP(S) Load Balancing    Multiple
SSL Proxy Load Balancing           1
TCP Proxy Load Balancing           1
Internal TCP/UDP Load Balancing    1

Each backend service contains one or more backends.

For a given backend service, all backends must either be instance groups or network endpoint groups. You can associate different types of instance groups (for example, managed and unmanaged instance groups) with the same backend service, but you cannot associate instance groups and network endpoint groups with the same backend service.

Backend service settings

Each backend service has the following configurable settings:

  • Session affinity (optional). Normally, load balancers use a hash algorithm to distribute requests among available instances. In normal use, the hash is based on the source IP address, destination IP address, source port, destination port, and protocol (a 5-tuple hash). Session affinity adjusts what is included in the hash and attempts to send all requests from the same client to the same virtual machine instance.
  • The backend service timeout. This value is interpreted in different ways depending on the type of load balancer and protocol used:
    • For an HTTP(S) load balancer, the backend service timeout is a request/response timeout, except for connections that are upgraded to use the Websocket protocol.
    • When sending WebSocket traffic to an HTTP(S) load balancer, the backend service timeout is interpreted as the maximum amount of time that a WebSocket, idle or active, can remain open.
    • For an SSL proxy or TCP proxy load balancer, the backend service timeout is interpreted as an idle timeout for all traffic.
    • For an internal TCP/UDP load balancer, the backend service timeout parameter is ignored.
  • A health check. The health checker polls instances attached to the backend service at configured intervals. Instances that pass the health check are allowed to receive new requests. Unhealthy instances are not sent requests until they are healthy again.
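These settings correspond to flags on the backend service commands. As a rough sketch (the service name, health check name, and values below are hypothetical placeholders, not taken from this document), creating a backend service with an explicit timeout and session affinity might look like this:

```shell
# Sketch only: my-backend-service and my-health-check are placeholder names.
# --timeout sets the backend service timeout described above (default 30s);
# --session-affinity CLIENT_IP enables client IP session affinity;
# --health-checks attaches a health check that must already exist.
gcloud compute backend-services create my-backend-service \
    --protocol HTTP \
    --health-checks my-health-check \
    --timeout 60s \
    --session-affinity CLIENT_IP \
    --global
```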

See the backend service API resource or the gcloud command-line tool user guide for descriptions of the properties that are available when working with backend services.


You can add multiple backends to a single backend service. Each backend is a resource to which a Google Cloud load balancer distributes traffic. Three types of resources can be used as backends: instance groups, network endpoint groups (NEGs), and backend buckets.

For a given backend service, the backends must either all be instance groups, or, if supported, NEGs or backend buckets. You cannot use different types of backends with the same backend service. Additionally:

  • Backends for internal TCP/UDP load balancers only support instance group backends.
  • If an HTTP(S) load balancer has two (or more) backend services, you can use instance groups as backends for one backend service and NEGs as backends for the other backend service.

Backends and external IP addresses

The backend VMs do not need external IP addresses:

  • For HTTP(S), SSL Proxy, and TCP Proxy load balancers: Clients communicate with a Google Front End (GFE) using your load balancer's external IP address. The GFE communicates with backend VMs using the internal IP addresses of their primary network interface. Because the GFE is a proxy, the backend VMs themselves do not require external IP addresses.

  • For network load balancers: Network load balancers route packets using bidirectional network address translation (NAT). When backend VMs send replies to clients, they use the external IP address of the load balancer's forwarding rule as the source IP address.

  • For internal load balancers: Backend VMs for an internal load balancer do not need external IP addresses.

Traffic distribution

The values of the following fields in the backend services resource determine some aspects of the backend's behavior:

  • A balancing mode, which tells the load balancing system how to determine when the backend is at full usage. If all backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests. The balancing mode can be based on connections, backend utilization, or requests per second (rate).
  • A capacity setting. Capacity is an additional control that interacts with the balancing mode setting. For example, if you normally want your instances to operate at a maximum of 80% backend utilization, you would set your balancing mode to backend utilization and your capacity to 80%. If you want to cut instance utilization in half, you could leave the capacity at 80% backend utilization and set capacity scaler to 0.5. To drain the backend service, set capacity scaler to 0 and leave the capacity as is. For more information about capacity and backend utilization, read Scaling Based on CPU or Load Balancing Serving Capacity.
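The 80% utilization and 0.5 capacity scaler scenario above could be configured roughly as follows (the service name, instance group, and zone are hypothetical placeholders):

```shell
# Sketch only: my-backend-service, web-ig, and the zone are placeholders.
# --max-utilization 0.8 sets the full-usage target for UTILIZATION mode;
# --capacity-scaler 0.5 halves the effective capacity, as described above.
# Setting --capacity-scaler 0 instead would drain this backend.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group web-ig \
    --instance-group-zone us-central1-a \
    --balancing-mode UTILIZATION \
    --max-utilization 0.8 \
    --capacity-scaler 0.5 \
    --global
```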

Note the following:

  • If the average utilization of all instances in backend instance groups connected to the same backend service is less than 10%, GCP might prefer specific zones. This can happen when you use managed regional instance groups, managed zonal instance groups, and unmanaged zonal instance groups. This zonal imbalance automatically resolves itself as more traffic is sent to the load balancer. Backend services in other regions do not affect any of this.

Traffic Director also uses backend service resources. Specifically, Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED. For an internal self-managed backend service, traffic distribution is based on a combination of a balancing mode and a load balancing policy. The backend service first directs traffic to a backend (instance group or NEG) according to the backend's balancing mode; after a backend is selected, Traffic Director distributes traffic according to the load balancing policy.

Internal self managed backend services support the following balancing modes:

  • UTILIZATION, if all the backends are instance groups
  • RATE, if all the backends are either instance groups or NEGs

If you choose RATE balancing mode, you must specify a maximum rate per backend, instance, or endpoint.
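A hedged sketch of adding a backend in RATE mode follows; the names are placeholders, and the rate value is illustrative only:

```shell
# Sketch only: my-backend-service, web-ig, and the zone are placeholders.
# With RATE balancing mode a maximum rate is required:
# use --max-rate-per-instance for instance group backends,
# or --max-rate-per-endpoint for NEG backends.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group web-ig \
    --instance-group-zone us-central1-a \
    --balancing-mode RATE \
    --max-rate-per-instance 100 \
    --global
```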

Protocol to the backends

When you create a backend service, you must specify a protocol used for communication with its backends. A backend service can only use one protocol. You cannot specify a secondary protocol to use as a fallback.

The available protocols are:

  • HTTP
  • HTTP/2
  • SSL
  • TCP
  • UDP

Which protocol is valid depends on the type of load balancer you create, including its load balancing scheme. Refer to the documentation for each type of load balancer for more information about which protocols can be used for its backend services.

HTTP/2 as a protocol to the backends is also available for load balancing with Ingress.

Changing a backend service's protocol makes the backends inaccessible through load balancers for a few minutes.
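As a sketch of the two operations above (all names are hypothetical placeholders), the protocol is set when the backend service is created and can later be changed with an update:

```shell
# Sketch only: my-backend-service and my-health-check are placeholders.
# The protocol is fixed per backend service; no fallback can be specified.
gcloud compute backend-services create my-backend-service \
    --protocol HTTP2 \
    --health-checks my-health-check \
    --global

# Changing the protocol later briefly makes the backends inaccessible
# through the load balancer, as noted above.
gcloud compute backend-services update my-backend-service \
    --protocol HTTP \
    --global
```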

Instance groups

Backend services and autoscaled managed instance groups

Autoscaled managed instance groups are useful if you need many machines all configured the same way, and you want to automatically add or remove instances based on need.

The autoscaling percentage works with the backend service balancing mode. For example, suppose you set the balancing mode to a backend utilization of 80% and leave the capacity scaler at 100%, and you set the Target load balancing usage in the autoscaler to 80%. Whenever the backend utilization of the group rises above 64% (80% of 80%), the autoscaler will instantiate new instances from the template until usage drops down to about 64%. If the overall usage drops below 64%, the autoscaler will remove instances until usage gets back to 64%.

New instances have a cooldown period before they are considered part of the group, so it's possible for traffic to exceed the backend service's 80% backend utilization during that time, causing excess traffic to be routed to the next available backend service. Once the instances are available, new traffic will be routed to them. Also, if the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler will stop adding instances no matter what the usage is. In this case, extra traffic will be load balanced to the next available region.
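The interaction between the two settings can be verified with a quick calculation, using the hypothetical 80%/80% values from the example above:

```shell
# The autoscaler targets the product of the backend service's maximum
# utilization and the autoscaler's target load balancing usage:
# 0.80 * 0.80 = 0.64, i.e. the 64% threshold described above.
awk 'BEGIN {
  max_utilization   = 0.80   # backend service balancing mode target
  autoscaler_target = 0.80   # autoscaler target load balancing usage
  printf "effective utilization target: %.0f%%\n", \
         max_utilization * autoscaler_target * 100
}'
# → effective utilization target: 64%
```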

Configuring autoscaled managed instance groups

To configure autoscaled managed instance groups, perform the following steps:

  1. Create an instance template for your instance group.
  2. Create a managed instance group and assign the template to it.
  3. Turn on autoscaling based on load balancing serving capacity.
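The three steps above can be sketched with gcloud as follows; every name and value here is a hypothetical placeholder, not a recommendation:

```shell
# 1. Create an instance template (placeholder name and machine type).
gcloud compute instance-templates create web-template \
    --machine-type e2-medium

# 2. Create a managed instance group from the template.
gcloud compute instance-groups managed create web-mig \
    --zone us-central1-a \
    --template web-template \
    --size 3

# 3. Turn on autoscaling based on load balancing serving capacity.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone us-central1-a \
    --max-num-replicas 10 \
    --target-load-balancing-utilization 0.8
```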

Restrictions and guidance for instance groups

Because Cloud Load Balancing offers a great deal of flexibility in how you configure load balancing, it is possible to create configurations that do not behave well. Keep the following restrictions and guidance in mind when creating instance groups for use with load balancing.

  • Do not put a virtual machine instance in more than one instance group.
  • Do not delete an instance group if it is being used by a backend.
  • Your configuration will be simpler if you do not add the same instance group to two different backends. If you do add the same instance group to two backends:
    • Both backends must use the same balancing mode, either UTILIZATION or RATE.
    • You can use maxRatePerInstance and maxRatePerGroup together. It is acceptable to set one backend to use maxRatePerInstance and the other to maxRatePerGroup.
    • If your instance group serves different ports for different backend services, define a distinct named port in the instance group for each of those services.
  • All instances in a managed or unmanaged instance group must be in the same VPC network and, if applicable, the same subnet.
  • If you are using a managed instance group with autoscaling, do not use the maxRate setting in the backend service. You may use either the maxUtilization or maxRatePerInstance setting.
  • Do not make an autoscaled managed instance group the target of two different load balancers.
  • When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of the subnet.

Network endpoint groups

A network endpoint is a combination of an IP address and a port, specified in one of two ways:

  • By specifying an IP address and port combination (an IP address:port pair)
  • By specifying an IP address only. The default port for the NEG is then automatically used as the port of the IP address:port pair.

Network endpoints represent services by their IP address and port, rather than referring to a particular VM. A network endpoint group (NEG) is a logical grouping of network endpoints.

A backend service that uses network endpoint groups as its backends distributes traffic among applications or containers running within VM instances. For more information, see Network Endpoint Groups in Load Balancing Concepts.
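A minimal sketch of both endpoint forms described above follows; the NEG name, zone, instance names, IP addresses, and ports are all hypothetical placeholders:

```shell
# Sketch only: my-neg, vm-1, vm-2, and all addresses are placeholders.
# Create a zonal NEG with a default port.
gcloud compute network-endpoint-groups create my-neg \
    --zone us-central1-a \
    --network-endpoint-type GCE_VM_IP_PORT \
    --default-port 8080

# Add one endpoint by IP only (it inherits the NEG's default port 8080)
# and one as an explicit IP address:port pair.
gcloud compute network-endpoint-groups update my-neg \
    --zone us-central1-a \
    --add-endpoint instance=vm-1,ip=10.0.0.5 \
    --add-endpoint instance=vm-2,ip=10.0.0.6,port=9000
```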

Session affinity

Without session affinity, load balancers distribute new requests according to the balancing mode of the backend instance group or NEG. Some applications, such as stateful servers used by ad serving, games, or services with heavy internal caching, need multiple requests from a given user to be directed to the same instance.

Session affinity makes this possible, identifying TCP traffic from the same client based on parameters such as the client's IP address or the value of a cookie, directing those requests to the same backend instance if the backend is healthy and has capacity (according to its balancing mode).

Session affinity has little meaningful effect on UDP traffic, because a session for UDP is a single request and response.

Session affinity can break if the instance becomes unhealthy or overloaded, so you should not assume perfect affinity.

For HTTP(S) Load Balancing, session affinity works best with the RATE balancing mode.

Different load balancers support different session affinity options, as summarized in the following table:

Load balancer              Session affinity options
-------------------------  ----------------------------------------------
Internal                   None; Client IP; Client IP and protocol;
                           Client IP, protocol, and port
TCP Proxy and SSL Proxy    None; Client IP
HTTP(S)                    None; Client IP; Generated cookie
Network                    Network Load Balancing doesn't use backend
                           services. Instead, you set session affinity
                           for network load balancers through target
                           pools. See the sessionAffinity parameter in
                           Target Pools.

The following sections discuss two common types of session affinity.

Using client IP affinity

Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address. Client IP affinity is an option for every Google Cloud load balancer that uses backend services.

When using client IP affinity, keep the following in mind:

  • The client IP address as seen by the load balancer might not be the address of the originating client if the client is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address. This can cause incoming traffic to clump unnecessarily onto the same backend instances.

  • If a client moves from one network to another, its IP address changes, resulting in broken affinity.


To set client IP affinity:

  1. In the Google Cloud Console, go to the Backend configuration portion of the load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. In the Edit backend service dialog box, select Client IP from the Session affinity drop-down menu.
    This action enables client IP session affinity. The Affinity cookie TTL field is grayed out as it has no meaning for client IP affinity.
  6. Click the Update button for the Backend service.
  7. Click the Update button for the load balancer.


You can use the create command to set session affinity for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity client_ip


Consult the API reference for backend services.

Using generated cookie affinity

When you set generated cookie affinity, the load balancer issues a cookie on the first request. For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint.

  • For external HTTP(S) load balancers, the cookie is named GCLB.
  • For internal HTTP(S) load balancers and Traffic Director, the cookie is named GCILB.

Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity. For example:

  1. With cookie-based affinity, the load balancer can uniquely identify two or more client systems that share the same source IP address. Using client IP-based affinity, the load balancer treats all connections from the same source IP address as if they were from the same client system.

  2. If a client changes its IP address - for example, a mobile device moving from network to network - cookie-based affinity allows the load balancer to recognize subsequent connections from that client instead of treating the connection as new.

When a load balancer creates a cookie for generated cookie-based affinity, it sets the path attribute of the cookie to /. If the load balancer's URL map has a path matcher that specifies more than one backend service for a given host name, all backend services using cookie-based session affinity share the same session cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. You can set it to 0 (default), which means the cookie is only a session cookie, or you can set the lifetime of the cookie to a value from 1 to 86400 seconds (24 hours) inclusive.


To set generated cookie affinity:

  1. In the Google Cloud Console, you can modify Generated Cookie Affinity in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. Select Generated cookie from the Session affinity drop-down menu to select Generated Cookie Affinity.
  6. In the Affinity cookie TTL field, set the cookie's lifetime in seconds.
  7. Click the Update button for the Backend service.
  8. Click the Update button for the load balancer.


Turn on generated cookie affinity by setting --session-affinity to generated_cookie and setting --affinity-cookie-ttl to the cookie lifetime in seconds. You can use the create command to set it for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity generated_cookie \
    --affinity-cookie-ttl 86400


Consult the API reference for backend services.

Disabling session affinity

You can turn off session affinity by updating the backend service and setting session affinity to none, or by opening the backend service in a text editor with the edit command and setting session affinity to none there. You can also use either command to modify the cookie lifetime.


To disable session affinity:

  1. In the Google Cloud Console, you can disable session affinity in the Backend configuration portion of the load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. Select None from the Session affinity drop-down menu to turn off session affinity.
  6. Click the Update button for the Backend service.
  7. Click the Update button for the load balancer.


To disable session affinity, run the following command:

  gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
  --session-affinity none


Alternatively, open the backend service in a text editor and set session affinity to none:

gcloud compute backend-services edit [BACKEND_SERVICE_NAME]


Consult the API reference for backend services.

Losing session affinity

Regardless of the type of affinity chosen, a client can lose affinity with the instance in the following scenarios.

  • The instance group runs out of capacity, and traffic has to be routed to a different zone. In this case, traffic from existing sessions may be sent to the new zone, breaking affinity. You can mitigate this by ensuring that your instance groups have enough capacity to handle all local users.
  • Autoscaling adds instances to, or removes instances from, the instance group. In either case, the backend service reallocates load, and the target may move. You can mitigate this by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load, then only using autoscaling for unexpected increases in load.
  • The target instance fails health checks. Affinity is lost as the session is moved to a healthy instance.
  • The balancing mode is set to backend utilization, which may cause your computed capacities across zones to change, sending some traffic to another zone within the region. This is more likely at low traffic when computed capacity is less stable.
  • When the backends are in multiple cloud regions and your client routing is designed so that the first and subsequent requests in a connection egress from different geographical locations, you might lose session affinity. This is because the additional requests might get routed to a different cloud region that is determined by its new egress location.

Configuring the timeout setting

For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default.


To configure the timeout setting:

  1. In the Google Cloud Console, you can modify the timeout setting in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for the Backend service.
  5. On the line for Protocol, Port, and Timeout settings, select the Edit pencil.
  6. Enter a new Timeout Setting in seconds.
  7. Click the Update button for the Backend service.
  8. Click the Update button for the load balancer.


To change the timeout setting with the gcloud command-line tool, use the `gcloud compute backend-services update` command. Append --help to the command for detailed information.

gcloud compute backend-services update [BACKEND_SERVICE] [--timeout=TIMEOUT]


Consult the REST API reference for backend services.

Named ports

For internal HTTP(S), external HTTP(S), SSL Proxy, and TCP Proxy load balancers, backend services must have an associated named port if their backends are instance groups. The named port tells the load balancer which configured named port on the backend instance group to use; the instance group translates the name into a port number. This is the port that the load balancer uses to connect to the backend VMs, which can be different from the port that clients use to contact the load balancer itself.

Named ports are key-value pairs representing a service name and a port number on which a service is running. The key-value pair is defined on an instance group. When a backend service uses that instance group as a backend, it can "subscribe" to the named port:

  • Each instance group can have up to five named ports (key-value pairs) defined.
  • Each backend service for an HTTP(S), SSL Proxy, or TCP Proxy load balancer using instance group backends can only "subscribe" to a single named port.
  • When you specify a named port for a backend service, all of the backend instance groups must have at least one named port defined that uses that same name.

Named ports cannot be used under these circumstances:

  • For NEG backends: NEGs define ports per endpoint, and there's no named port key-value pair associated with a NEG.
  • For internal TCP/UDP load balancers: Because internal TCP/UDP load balancers are pass-through load balancers (not proxies), their backend services do not support setting a named port.
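The named-port relationship described above can be sketched in two commands; the instance group, service names, ports, and zone are hypothetical placeholders:

```shell
# Sketch only: web-ig, my-backend-service, and the ports are placeholders.
# Define named ports (key-value pairs) on the instance group; up to five
# may be defined per group.
gcloud compute instance-groups set-named-ports web-ig \
    --zone us-central1-a \
    --named-ports http:8080,metrics:9090

# Have the backend service "subscribe" to exactly one of those names.
gcloud compute backend-services update my-backend-service \
    --port-name http \
    --global
```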

Health checks

Each backend service must have a health check associated with it. The health check must exist before you create the backend service.

A health check runs continuously and its results help determine which instances are able to receive new requests.

Unhealthy instances do not receive new requests and continue to be polled. If an unhealthy instance passes a health check, it is deemed healthy and begins receiving new connections.
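Because the health check must exist first, the creation order matters. A hedged sketch follows; the names, port, and thresholds are placeholder values:

```shell
# Sketch only: my-health-check and my-backend-service are placeholders.
# Create the health check first: poll port 80 every 10 seconds and mark
# an instance unhealthy after 3 consecutive failures.
gcloud compute health-checks create http my-health-check \
    --port 80 \
    --check-interval 10s \
    --unhealthy-threshold 3

# Then create the backend service that references it.
gcloud compute backend-services create my-backend-service \
    --protocol HTTP \
    --health-checks my-health-check \
    --global
```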


Viewing the results of a backend services health check

After you create your health checks and backend service, you can view the health check results.


To view the result of a health check on a backend service:

  1. Go to the load balancing summary page.
    Go to the Load balancing page
  2. Click the name of a load balancer.
  3. Under Backend, for a Backend service, view the Healthy column in the Instance group table.


To view the results of the latest health check with the gcloud command-line tool, use the backend-services get-health command.

gcloud compute backend-services get-health [BACKEND_SERVICE]

The command returns a healthState value for all instances in the specified backend service, with a value of either HEALTHY or UNHEALTHY:

    - healthState: UNHEALTHY
      instance: us-central1-b/instances/www-video1
    - healthState: HEALTHY
      instance: us-central1-b/instances/www-video2
  kind: compute#backendServiceGroupHealth


For API commands, see the Health Checks page.

Additional features enabled on the backend service resource

Some optional Google Cloud features are enabled using the backend service resource; they are not discussed in this document.

Other notes

Changes to your backend services are not instantaneous. It can take several minutes for changes to propagate throughout the network.

The following Traffic Director features are not supported with Google Cloud load balancers:

  • Circuit breaking
  • Outlier detection
  • Load balancing policies
  • HTTP cookie-based session affinity
  • HTTP header-based session affinity

What's next

For related documentation and information on how backend services are used in load balancing, review the following: