Backend Services

An HTTP(S) load balancing backend service is a centralized service for managing backends, which in turn manage instances that handle user requests. You configure your load balancing service to route requests to your backend service. The backend service in turn knows which instances it can use, how much traffic they can handle, and how much traffic they are currently handling. In addition, the backend service monitors health checking and does not send traffic to unhealthy instances.

Backend service components

A configured backend service contains one or more backends. Each backend service contains the following:

  • A health check. The health checker polls instances attached to the backend service at configured intervals. Instances that pass the health check are allowed to receive new requests. Unhealthy instances are not sent requests until they are healthy again.
  • Session affinity (optional). Normally, HTTP(S) load balancing uses a round-robin algorithm to distribute requests among available instances. This can be overridden with session affinity. Session affinity attempts to send all requests from the same client to the same virtual machine instance.
  • One or more backends. A backend contains the following:
    • An Instance Group containing virtual machine instances. The instance group may be a Managed Instance Group (with or without Autoscaling) or an Unmanaged Instance Group. A backend cannot be added to a backend service if it doesn't contain an instance group.
    • A balancing mode, which tells the load balancing system how to determine when the backend is at full usage. If all the backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests. The balancing mode can be based on CPU Utilization or Rate (requests per second (RPS)).
    • A capacity setting. Capacity is an additional control that interacts with the balancing mode setting. For example, if you normally want your instances to operate at a maximum of 80% CPU utilization, you would set your balancing mode to 80% CPU utilization and your capacity to 100%. If you want to cut instance utilization in half, you could leave the balancing mode at 80% CPU utilization and set Capacity to 50%. To drain the backend service, set Capacity to 0% and leave the balancing mode as is.
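
The interaction between the balancing mode and the capacity setting can be sketched as a simple multiplication. The function name below is hypothetical and is not part of any Google Cloud tool; it only reproduces the arithmetic of the three cases just described (normal, halved, and drained), with percentages written as fractions.

```python
def effective_max_utilization(max_utilization: float, capacity_scaler: float) -> float:
    """Multiply the balancing-mode target by the capacity setting.

    Both arguments are fractions between 0.0 and 1.0 (0.8 means 80%).
    """
    for name, value in (("max_utilization", max_utilization),
                        ("capacity_scaler", capacity_scaler)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return max_utilization * capacity_scaler


if __name__ == "__main__":
    # Balancing mode fixed at 80% CPU; capacity at 100%, 50%, and 0% (drained).
    for capacity in (1.0, 0.5, 0.0):
        ceiling = effective_max_utilization(0.8, capacity)
        print(f"capacity {capacity:.0%}: effective CPU ceiling {ceiling:.0%}")
```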

A backend service can have at most 500 endpoints (IP address and port pairs) in a given zone.

See the backend service API resource or the gcloud user guide for descriptions of the properties that are available when working with backend services.

Backend services and regions

HTTP(S) load balancing is a global service. You may have more than one backend service in a region, and you may assign backend services to more than one region, all serviced by the same global load balancer. Traffic is allocated to backend services as follows:

  1. When a user request comes in, the load balancing service determines the approximate origin of the request from the source IP address.
  2. The load balancing service knows the locations of the instances owned by the backend service, their overall capacity, and their overall current usage.
  3. If the closest instances to the user have available capacity, then the request is forwarded to that closest set of instances.
  4. Incoming requests to the given region are distributed evenly across all available backend services and instances in that region. However, at very small loads, the distribution may appear to be uneven.
  5. If there are no healthy instances with available capacity in a given region, the load balancer instead sends the request to the next closest region with available capacity.
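
The allocation steps above can be sketched as a small selection routine. This is an illustration only, not the load balancer's actual implementation: the Region class, its fields, and the use of a distance in kilometers as the measure of closeness are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Region:
    name: str
    distance_km: float       # hypothetical distance from the request's origin
    healthy_instances: int   # instances currently passing health checks
    capacity_rps: float      # total requests per second the region may take
    current_rps: float       # requests per second currently being served

    def has_available_capacity(self) -> bool:
        return self.healthy_instances > 0 and self.current_rps < self.capacity_rps


def pick_region(regions: Iterable[Region]) -> Optional[Region]:
    """Return the closest region with healthy, available capacity, or None."""
    candidates = [r for r in regions if r.has_available_capacity()]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.distance_km)


if __name__ == "__main__":
    regions = [
        Region("near", 100, 3, 1000, 1000),   # closest, but already full
        Region("mid", 500, 2, 1000, 200),
        Region("far", 2000, 5, 1000, 0),
    ]
    chosen = pick_region(regions)
    print(chosen.name if chosen else "no region available")
```

In this sketch a full or unhealthy region is skipped, which mirrors the spill-over to the next closest region described in step 5.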

Session affinity

By default, HTTP(S) load balancing distributes requests evenly among available instances. However, some applications, such as stateful servers used for ad serving, games, or services with heavy internal caching, need multiple requests from a given user to end up on the same instance. Session affinity makes this possible, identifying requests from a user by the client IP or the value of a cookie and directing such requests to a consistent instance as long as that instance is healthy and has capacity. Affinity can break if the instance becomes unhealthy or overloaded, so your system must not assume perfect affinity.

Setting session affinity

HTTP(S) load balancing offers the following types of session affinity: client IP affinity and generated cookie affinity.

Client IP affinity

Client IP affinity directs requests from the same client IP address to the same instance based on a hash of the IP address. This is simple and does not involve a user cookie. However, because of NATs, CDNs, and other internet routing technologies, sometimes requests from multiple independent users can look as if they come from the same client, causing many users to clump unnecessarily onto the same instances. In addition, clients who move from one network to another may change IP address, thus losing affinity.
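
As a rough illustration of hash-based affinity, the sketch below maps a client IP address to one instance out of a list. The hash function (SHA-256) and the modulo mapping are assumptions chosen for the example; this document does not specify which hash the load balancer uses. The function name and instance names are hypothetical.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_instance(client_ip: str, instances: Sequence[str]) -> str:
    """Map a client IP address to one instance, deterministically."""
    if not instances:
        raise ValueError("at least one healthy instance is required")
    # Normalize the textual address to its packed bytes (works for IPv4 and IPv6).
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


if __name__ == "__main__":
    pool = ["vm-a", "vm-b", "vm-c"]
    # Every request from the same address lands on the same instance.
    print(pick_instance("203.0.113.7", pool))
    print(pick_instance("203.0.113.7", pool))
```

Because the mapping depends only on the address, all users behind one NAT gateway share a single key and therefore a single instance, which is the clumping effect described above.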


In the Google Cloud Platform Console, you can modify session affinity in the Backend configuration portion of the HTTP(S) load balancer page.
Go to the Load balancing page

Select Client IP from the Session affinity pull-down menu to turn on client IP session affinity. Ignore the Affinity cookie TTL field as this has no meaning with client IP affinity.


You can use the create command to set session affinity for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity client_ip

Generated cookie affinity

With generated cookie affinity, the load balancer issues a cookie named GCLB on the first request and then directs each subsequent request that has the same cookie to the same instance. Cookie-based affinity lets the load balancer distinguish different clients that share an IP address, so it can spread those clients across the instances more evenly. It also lets the load balancer maintain instance affinity even when the client's IP address changes.

The path of the cookie is always /, so if different backend services on the same hostname both enable cookie-based affinity, they will both be balanced by the same cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. It can be set to 0 (default), which means the cookie is only a session cookie, or it can have a lifetime of 1 to 86400 seconds (24 hours).
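
To make the path and lifetime rules concrete, the following sketch builds a cookie string with Python's standard http.cookies module. It only illustrates the rules stated above (name GCLB, path /, lifetime 0 or 1 to 86400 seconds). The cookie value and the function name are hypothetical, and the exact attributes that the load balancer emits are not specified in this document.

```python
from http.cookies import SimpleCookie


def build_affinity_cookie(value: str, ttl_seconds: int = 0) -> str:
    """Return a cookie string following the GCLB rules described above."""
    if not 0 <= ttl_seconds <= 86400:
        raise ValueError("ttl_seconds must be between 0 and 86400")
    cookie = SimpleCookie()
    cookie["GCLB"] = value
    cookie["GCLB"]["path"] = "/"          # the path is always /
    if ttl_seconds > 0:
        # A positive lifetime becomes Max-Age; 0 leaves a session cookie.
        cookie["GCLB"]["max-age"] = ttl_seconds
    return cookie["GCLB"].OutputString()


if __name__ == "__main__":
    print(build_affinity_cookie("opaque-token"))        # session cookie
    print(build_affinity_cookie("opaque-token", 3600))  # one-hour lifetime
```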


In the Google Cloud Platform Console, you can modify session affinity in the Backend configuration portion of the HTTP(S) load balancer page.
Go to the Load balancing page

Select Generated cookie from the Session affinity pull-down menu to turn on generated cookie affinity. In the Affinity cookie TTL field, set the cookie's lifetime in seconds.


Turn on generated cookie affinity by setting --session-affinity to generated_cookie and setting --affinity-cookie-ttl to the cookie lifetime in seconds. You can use the create command to set it for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity generated_cookie \
    --affinity-cookie-ttl 86400

Disabling session affinity

You can turn off session affinity by updating the backend service and setting session affinity to none, or you can edit the backend service and set session affinity to none in a text editor. You can also use either command to modify the cookie lifetime.


In the Google Cloud Platform Console, you can modify session affinity in the Backend configuration portion of the HTTP(S) load balancer page.
Go to the Load balancing page

Select None from the Session affinity pull-down menu to turn off session affinity.


gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity none

gcloud compute backend-services edit [BACKEND_SERVICE_NAME]

Losing session affinity

Regardless of the type of affinity chosen, a client can lose affinity with the instance in the following scenarios.

  • The instance group runs out of capacity, and traffic has to be routed to a different zone. In this case, traffic from existing sessions may be sent to the new zone, breaking affinity. You can mitigate this by ensuring that your instance groups have enough capacity to handle all local users.
  • Autoscaling adds instances to, or removes instances from, the instance group. In either case, the backend service reallocates load, and the target may move. You can mitigate this by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load, then only using autoscaling for unexpected increases in load.
  • The target instance fails health checks. Affinity is lost as the session is moved to a healthy instance.
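
The effect of resizing an instance group on affinity can be illustrated with a deliberately naive model. The sketch below assumes a plain hash-modulo mapping from a client key to an instance, which is an assumption for illustration and not a description of how the load balancer assigns targets. All names are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence


def pick_instance_by_hash(client_key: str, instances: Sequence[str]) -> str:
    """Naive affinity: hash the client key and take it modulo the pool size."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def count_moved_clients(clients: Iterable[str],
                        before: Sequence[str],
                        after: Sequence[str]) -> int:
    """Count clients whose target instance differs between two pools."""
    return sum(
        1 for client in clients
        if pick_instance_by_hash(client, before) != pick_instance_by_hash(client, after)
    )


if __name__ == "__main__":
    clients = [f"client-{i}" for i in range(1000)]
    before = ["vm-a", "vm-b", "vm-c"]
    after = before + ["vm-d"]          # the autoscaler adds one instance
    moved = count_moved_clients(clients, before, after)
    print(f"{moved} of {len(clients)} clients changed instance")
```

Under this model, adding a single instance moves a large share of clients, which is why keeping the minimum group size large enough for expected load reduces affinity loss.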

Backend services and autoscaled managed instance groups

Autoscaled managed instance groups are useful if you need many machines all configured the same way, and you want to automatically add or remove instances based on need. First, create an instance template for your instance group. Then, create a managed instance group and assign the template to it. Lastly, turn on autoscaling for the managed instance group and select HTTP load balancing usage under Autoscale based on, then set a Target load balancing usage percentage.

The autoscaling percentage works with the backend service balancing mode. If, for example, you set the balancing mode to a CPU utilization of 80% and leave the capacity scaler at 100%, and you set the Target load balancing usage in the autoscaler to 80%, whenever CPU utilization of the group rises above 64% (80% of 80%), the autoscaler will instantiate new instances from the template until usage drops down to about 64%. If the overall usage drops below 64%, the autoscaler will remove instances until usage gets back to 64%.
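
The 64% figure is the product of the three settings, as the short sketch below shows. The function names and the tolerance band are hypothetical and exist only to make the arithmetic explicit; the autoscaler's real decision logic is not described in this document.

```python
def autoscaler_cpu_threshold(max_utilization: float,
                             capacity_scaler: float,
                             target_lb_usage: float) -> float:
    """CPU level at which the autoscaler starts adding instances."""
    return max_utilization * capacity_scaler * target_lb_usage


def scaling_action(current_cpu: float, threshold: float, tolerance: float = 0.01) -> str:
    """Decide a direction; the tolerance band is an assumption of this sketch."""
    if current_cpu > threshold + tolerance:
        return "add instances"
    if current_cpu < threshold - tolerance:
        return "remove instances"
    return "hold"


if __name__ == "__main__":
    # 80% CPU balancing mode, 100% capacity, 80% target load balancing usage.
    threshold = autoscaler_cpu_threshold(0.8, 1.0, 0.8)
    print(f"threshold: {threshold:.0%}")
    for cpu in (0.70, 0.64, 0.50):
        print(f"group CPU {cpu:.0%}: {scaling_action(cpu, threshold)}")
```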

New instances have a cooldown period before they are considered part of the group, so it's possible for traffic to exceed the backend service's 80% CPU utilization during that time, causing excess traffic to be routed to the next available backend service. Once the instances are available, new traffic will be routed to them. Also, if the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler will stop adding instances no matter what the usage is. In this case, extra traffic will be load balanced to the next available region.

Restrictions and guidance

Because Compute Engine offers a great deal of flexibility in how you configure load balancing, it is possible to create configurations that do not behave well. Please keep the following restrictions and guidance in mind when creating instance groups for use with load balancing.

  • Do not put a virtual machine instance in more than one instance group.
  • Do not delete an instance group if it is being used by a backend.
  • Your configuration will be simpler if you do not add the same instance group to two different backends. If you do add the same instance group to two backends:
    • Both backends must use the same balancing mode, either UTILIZATION or RATE.
    • You can use maxRatePerInstance and maxRatePerGroup together. It is acceptable to set one backend to use maxRatePerInstance and the other to maxRatePerGroup.
    • If your instance group serves two or more ports for several backends respectively, you have to specify different port names in the instance group.
  • All instances in a managed or unmanaged instance group must be in the same network and, if applicable, the same subnetwork.
  • If you are using a managed instance group with autoscaling, do not use the maxRate balancing mode in the backend service. You may use either the maxUtilization or maxRatePerInstance mode.
  • Do not make an autoscaled managed instance group the target of two different load balancers.
  • When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of the subnetwork.
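
Two of the rules above lend themselves to a mechanical check before you apply a configuration. The sketch below is a hypothetical pre-flight validator, not part of gcloud or the API. It assumes each backend is a dictionary with group, balancingMode, and optionally maxRate keys, plus an autoscaled flag that is invented for this example.

```python
from collections import defaultdict
from typing import List


def find_violations(backends: List[dict]) -> List[str]:
    """Return human-readable problems found in a list of backend definitions."""
    problems = []
    modes_by_group = defaultdict(set)
    for backend in backends:
        modes_by_group[backend["group"]].add(backend["balancingMode"])
        # Autoscaled managed instance groups must not use a group-wide maxRate.
        if backend.get("autoscaled") and backend.get("maxRate") is not None:
            problems.append(f"{backend['group']}: autoscaled group must not set maxRate")
    # Backends that share an instance group must use the same balancing mode.
    for group, modes in sorted(modes_by_group.items()):
        if len(modes) > 1:
            problems.append(f"{group}: backends sharing this group use different balancing modes")
    return problems


if __name__ == "__main__":
    backends = [
        {"group": "ig-1", "balancingMode": "RATE", "maxRate": 500, "autoscaled": True},
        {"group": "ig-1", "balancingMode": "UTILIZATION"},
    ]
    for problem in find_violations(backends):
        print(problem)
```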

Create a backend service

Backend services need an HTTP or HTTPS health check. If you do not wish to use the default health check, create your own health check before creating the backend service. Use an HTTP health check if the traffic between the load balancing service and the instance is HTTP. Use an HTTPS health check if the traffic between the load balancing service and the instance is HTTPS.

To create a backend service with gcloud compute, use the backend-services create command. This creates the backend service, but does not fully define it. The next command lets you add a backend and specify an instance group to handle requests sent to this service.

[PORT_NAME] is the name of a service already specified in one or more of the instance groups that will be added to the backend service. This name has been mapped to a port number in the instance group. If neither --port-name nor --protocol is specified, the default value of http (port 80) is used for HTTP connections, and https (port 443) is used for HTTPS connections. You must add http or https as a service to your instance group if you are not specifying a different port name.

[TIMEOUT] is the amount of time to wait for a backend to respond to a request before considering the request failed. For example, specifying 10s gives backends 10 seconds to respond to requests. Valid units for this flag are s for seconds, m for minutes, and h for hours.
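
The timeout format can be illustrated with a small parser that converts a value such as 10s into seconds. This is a sketch of the format described above, not the parser that gcloud uses. The function name is hypothetical, and rejecting values without a unit is an assumption of the example.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_timeout(value: str) -> int:
    """Convert a timeout such as '10s', '2m', or '1h' into seconds."""
    text = value.strip()
    if len(text) < 2 or text[-1] not in _UNIT_SECONDS or not text[:-1].isdecimal():
        raise ValueError(f"expected a whole number followed by s, m, or h, got {value!r}")
    return int(text[:-1]) * _UNIT_SECONDS[text[-1]]


if __name__ == "__main__":
    for sample in ("10s", "2m", "1h"):
        print(sample, "=", parse_timeout(sample), "seconds")
```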

See Session affinity for information about --session-affinity and --affinity-cookie-ttl parameters.


gcloud compute backend-services create [BACKEND_SERVICE_NAME] \
  --http-health-checks [HTTP_HEALTH_CHECK] \
  [--protocol [PROTOCOL_TYPE]; default="http"] \
  [--description [DESCRIPTION]] \
  [--port-name [PORT_NAME]; default="http"] \
  [--timeout [TIMEOUT]; default="30s"]


gcloud compute backend-services create [BACKEND_SERVICE_NAME] \
  --https-health-checks [HTTPS_HEALTH_CHECK] \
  --protocol "https" \
  [--description [DESCRIPTION]] \
  [--port-name [PORT_NAME]; default is "https" when --protocol is set to "https"] \
  [--timeout [TIMEOUT]; default="30s"]

Add instance groups to a backend service

To define the instances that are included in a backend service, you must add a backend and assign an instance group to it. You must create the instance group before you add it to the backend.

When adding the instance group to the backend, you must also define certain parameters. These parameters govern whether the load balancer considers the instance group to have enough capacity to handle new requests. If the group is at capacity, additional requests are automatically sent to the next-closest region that has capacity. In addition, if you are using autoscaling, these values affect when the autoscaler adds or removes instances.

  • --balancing-mode: Defines whether usage is determined by CPU utilization (UTILIZATION) or by number of incoming requests per second (RATE).
  • --max-utilization: If you set --balancing-mode to UTILIZATION, sets the maximum average CPU utilization of the backend service. Acceptable values are 0.0 (0%) through 1.0 (100%). Not valid if balancing mode is set to RATE.
  • --max-rate and --max-rate-per-instance: Sets the maximum number of requests per second that can be sent to the instance group (--max-rate) or the maximum number of requests per second that can be sent to each instance (--max-rate-per-instance). These settings are mutually exclusive. However, one of them can be set even if --balancing-mode is set to UTILIZATION. If either --max-rate or --max-rate-per-instance is set and --balancing-mode is set to RATE, then only that value is considered when judging capacity. If either --max-rate or --max-rate-per-instance is set and --balancing-mode is set to UTILIZATION, then instances are judged to be at capacity when either the UTILIZATION or RATE value is reached. --max-rate must not be used with Autoscaled Managed Instance Groups.
  • --capacity-scaler: An additional setting that applies to either balancing mode. This value is multiplied by the rate or utilization value to set the current max usage of the instance group. Acceptable values are 0.0 (0%) through 1.0 (100%). Setting this value to 0.0 (0%) drains the backend service. Note that draining a backend service only prevents new connections to instances in the group. All existing connections are allowed to continue until they close by normal means.
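
The way these flags combine can be sketched as a single predicate. The function below is an illustration of the rules in the list above, not the load balancer's implementation. Its name and parameters are hypothetical, and requiring max_utilization in UTILIZATION mode is an assumption of the sketch.

```python
from typing import Optional


def is_at_capacity(balancing_mode: str,
                   cpu_utilization: float,
                   requests_per_second: float,
                   instance_count: int,
                   max_utilization: Optional[float] = None,
                   max_rate: Optional[float] = None,
                   max_rate_per_instance: Optional[float] = None,
                   capacity_scaler: float = 1.0) -> bool:
    """Return True when the instance group should receive no more new requests."""
    if max_rate is not None and max_rate_per_instance is not None:
        raise ValueError("max_rate and max_rate_per_instance are mutually exclusive")

    # Derive the group-wide rate limit, scaled by the capacity scaler.
    rate_limit = None
    if max_rate is not None:
        rate_limit = max_rate * capacity_scaler
    elif max_rate_per_instance is not None:
        rate_limit = max_rate_per_instance * instance_count * capacity_scaler

    if balancing_mode == "RATE":
        if max_utilization is not None:
            raise ValueError("max_utilization is not valid in RATE mode")
        if rate_limit is None:
            raise ValueError("RATE mode needs max_rate or max_rate_per_instance")
        return requests_per_second >= rate_limit

    if balancing_mode == "UTILIZATION":
        if max_utilization is None:
            raise ValueError("this sketch requires max_utilization in UTILIZATION mode")
        if cpu_utilization >= max_utilization * capacity_scaler:
            return True
        # An optional rate limit also applies in UTILIZATION mode.
        return rate_limit is not None and requests_per_second >= rate_limit

    raise ValueError(f"unknown balancing mode: {balancing_mode}")
```

For example, a group of two instances in UTILIZATION mode with a 0.8 utilization target and 100 requests per second per instance is treated as full at 250 requests per second even when CPU is only at 50%, because the rate limit of 200 has been reached. Setting capacity_scaler to 0.0 makes every limit zero, so the group always reports as full, which corresponds to draining.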

Use the backend-services add-backend command to add an instance group to a backend service. Specify the name of the instance group with the --instance-group flag:

gcloud compute backend-services add-backend [BACKEND_SERVICE_NAME] \
  --instance-group [INSTANCE_GROUP] \
  [--balancing-mode [BALANCING_MODE]] \
  [--capacity-scaler [CAPACITY_SCALER]] \
  [--description [DESCRIPTION]] \
  [--max-rate [MAX_RATE] | --max-rate-per-instance [MAX_RATE_PER_INSTANCE]] \
  [--max-utilization [MAX_UTILIZATION]]

To create a backend service with the API, send a POST request and specify the fully qualified URI of the instance group.

With an HTTP health check:

{
  "backends": [
    {
      "group": "[PROJECT_ID]/zones/[ZONE]/instanceGroups/[INSTANCE_GROUP]"
    }
  ],
  "healthChecks": ["[PROJECT_ID]/global/httpHealthChecks/[HEALTH_CHECK]"]
}

With an HTTPS health check:

{
  "backends": [
    {
      "group": "[PROJECT_ID]/zones/[ZONE]/instanceGroups/[INSTANCE_GROUP]"
    }
  ],
  "healthChecks": ["[PROJECT_ID]/global/httpsHealthChecks/[HEALTH_CHECK]"]
}

The group and healthChecks values must contain the fully qualified URI for the resources.
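
A request body with fully qualified URIs can be assembled programmatically, as in the sketch below. The helper name is hypothetical, and the API prefix passed in the usage example is an assumption, since the URI prefix is not shown in this document. A real request likely needs further fields, so check the backend service API resource before relying on this shape.

```python
import json


def backend_service_body(api_prefix: str,
                         project_id: str,
                         zone: str,
                         instance_group: str,
                         health_check: str,
                         https: bool = False) -> dict:
    """Build the backends and healthChecks fields with fully qualified URIs."""
    project_uri = f"{api_prefix}/{project_id}"
    kind = "httpsHealthChecks" if https else "httpHealthChecks"
    return {
        "backends": [
            {"group": f"{project_uri}/zones/{zone}/instanceGroups/{instance_group}"}
        ],
        "healthChecks": [f"{project_uri}/global/{kind}/{health_check}"],
    }


if __name__ == "__main__":
    # The prefix below is an assumption for illustration.
    body = backend_service_body(
        "https://www.googleapis.com/compute/v1/projects",
        "my-project", "us-central1-b", "www-video", "basic-check",
    )
    print(json.dumps(body, indent=2))
```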

Health checking

Each backend service must have a Health Check associated with it. This health check is run continuously and its results are used to determine which instances should receive new requests. Unhealthy instances do not receive new requests. Unhealthy instances continue to be polled. If an unhealthy instance passes a health check, it is deemed healthy and will begin receiving new connections.

Best practice configuration is to check health and serve traffic on the same port. However, it is possible to perform health checks on one port, but serve traffic on another. If you do use two different ports, be careful that firewall rules and services running on instances are configured appropriately. If you run them both on the same port, but decide to switch ports at some point, be sure to update both the backend service and the health check.

To view the results of the latest health check with gcloud compute, use the backend-services get-health command.

gcloud compute backend-services get-health [BACKEND_SERVICE]

The command returns a healthState value for all instances in the specified backend service, with a value of either HEALTHY or UNHEALTHY:

- healthState: UNHEALTHY
  instance: us-central1-b/instances/www-video1
- healthState: HEALTHY
  instance: us-central1-b/instances/www-video2
kind: compute#backendServiceGroupHealth
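
If you collect this output in a script, a short summary is often more useful than the raw list. The sketch below assumes the entries have already been parsed into dictionaries with the healthState and instance keys shown above; parsing the command output is out of scope here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize_health(entries: List[Dict[str, str]]) -> dict:
    """Count instances per health state and list the ones that are not healthy."""
    counts = Counter(entry["healthState"] for entry in entries)
    unhealthy = sorted(
        entry["instance"] for entry in entries if entry["healthState"] != "HEALTHY"
    )
    return {
        "HEALTHY": counts.get("HEALTHY", 0),
        "UNHEALTHY": counts.get("UNHEALTHY", 0),
        "unhealthy_instances": unhealthy,
    }


if __name__ == "__main__":
    sample = [
        {"healthState": "UNHEALTHY", "instance": "us-central1-b/instances/www-video1"},
        {"healthState": "HEALTHY", "instance": "us-central1-b/instances/www-video2"},
    ]
    print(summarize_health(sample))
```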

A backend service that is not referenced by a valid global forwarding rule is not health checked and therefore has no health status.

To check health with the API, send a POST request to the following URI, with the instance group URI as the value of a group key in the JSON body:


{
  "group": [INSTANCE_GROUP]
}

List backend services

To list existing backend services with gcloud compute, use the backend-services list command:

gcloud compute backend-services list

In the API, send an empty GET request to: [PROJECT_ID]/global/backendServices

Get a backend service

To get information about a single backend service with gcloud compute, use the backend-services describe command:

gcloud compute backend-services describe [BACKEND_SERVICE]

In the API, send an empty GET request to: [PROJECT_ID]/global/backendServices/[BACKEND_SERVICE]

Delete backend services

To delete a backend service, you must first make sure that the backend service is not being referenced by any URL maps. If a URL map is currently referencing a backend service, you must delete the URL map to remove the reference.

To delete a backend service with gcloud compute, use the backend-services delete command:

gcloud compute backend-services delete [BACKEND_SERVICE]

In the API, send an empty DELETE request to: [PROJECT_ID]/global/backendServices/[BACKEND_SERVICE]

Update a backend service

To edit an existing backend service with gcloud compute, use the backend-services edit command. This launches a text editor containing the backend service file:

gcloud compute backend-services edit [BACKEND_SERVICE]

A sample resource is included in the file comments to indicate the required format. You can uncomment any line by deleting the leading hash (#).

In the API, you can update the backend service using the update method or the patch method. With the update method, you must reproduce the entire contents of the backend service object in the request body that you send as a PUT request. The patch method accepts a partial resource representation as the body of a PATCH request. In both cases, send the requests to the following URI: [PROJECT_ID]/global/backendServices/[BACKEND_SERVICE]

The body of your request must define the resource representation fields that are appropriate for either the update or patch methods.
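
The difference between the two methods can be modeled with plain dictionaries. This is a simplified sketch under stated assumptions: it treats the resource as a flat dictionary and merges only top-level fields, while the real patch method may handle nested fields differently. The function names and sample field values are hypothetical.

```python
import copy


def apply_update(current: dict, body: dict) -> dict:
    """update: the request body replaces the whole resource."""
    return copy.deepcopy(body)


def apply_patch(current: dict, body: dict) -> dict:
    """patch: only the fields present in the body are changed."""
    merged = copy.deepcopy(current)
    merged.update(copy.deepcopy(body))
    return merged


if __name__ == "__main__":
    current = {"description": "video backend", "timeoutSec": 30, "portName": "http"}
    change = {"timeoutSec": 10}
    print("patch: ", apply_patch(current, change))    # other fields are kept
    print("update:", apply_update(current, change))   # other fields are gone
```

With the update method, omitting a field from the body is equivalent to dropping it from the resource in this model, which is why the full object has to be reproduced.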
