Understanding backend services

A backend service is a resource with fields containing configuration values for the following Google Cloud Platform load balancing services:

  • HTTP(S) Load Balancing
  • SSL Proxy Load Balancing
  • TCP Proxy Load Balancing
  • Internal TCP/UDP Load Balancing
  • Internal HTTP(S) Load Balancing (Beta)

A backend service directs traffic to backends, which are instance groups or network endpoint groups.

The backend service performs various functions, such as:

  • Directing traffic according to a balancing mode. The balancing mode is defined in the backend service for each backend.
  • Monitoring backend health according to a health check
  • Maintaining session affinity

Architecture

The number of backend services per load balancer depends on the load balancer type:

Load balancer type                 Number of backend services
HTTP(S) Load Balancing             Multiple
SSL Proxy Load Balancing           1
TCP Proxy Load Balancing           1
Internal TCP/UDP Load Balancing    1
Internal HTTP(S) Load Balancing    Multiple

Each backend service contains one or more backends.

For a given backend service, all backends must either be instance groups or network endpoint groups. You can associate different types of instance groups (for example, managed and unmanaged instance groups) with the same backend service, but you cannot associate instance groups and network endpoint groups with the same backend service.

Backend service settings

Each backend service has the following configurable settings:

  • Session affinity (optional). Normally, load balancers use a hash algorithm to distribute requests among available instances. In normal use, the hash is based on the source IP address, destination IP address, source port, destination port, and protocol (a 5-tuple hash). Session affinity adjusts what is included in the hash and attempts to send all requests from the same client to the same virtual machine instance.
  • The backend service timeout. This value is interpreted in different ways depending on the type of load balancer and protocol used:
    • For an HTTP(S) load balancer, the backend service timeout is a request/response timeout, except for connections that are upgraded to use the WebSocket protocol.
    • When sending WebSocket traffic to an HTTP(S) load balancer, the backend service timeout is interpreted as the maximum amount of time that a WebSocket, idle or active, can remain open.
    • For an SSL proxy or TCP proxy load balancer, the backend service timeout is interpreted as an idle timeout for all traffic.
    • For an internal TCP/UDP load balancer, the backend service timeout parameter is ignored.
  • A health check. The health checker polls instances attached to the backend service at configured intervals. Instances that pass the health check are allowed to receive new requests. Unhealthy instances are not sent requests until they are healthy again.

See the backend service API resource or the gcloud command-line tool user guide for descriptions of the properties that are available when working with backend services.
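The settings above correspond to flags on the gcloud command-line tool. As an illustrative sketch (the resource names my-backend-service and my-health-check are placeholders), a global backend service with a non-default timeout, client IP session affinity, and an existing health check might be created as follows:

```shell
# Create a global backend service for an HTTP(S) load balancer,
# attaching an existing health check, a 60-second timeout, and
# client IP session affinity. Resource names are placeholders.
gcloud compute backend-services create my-backend-service \
    --protocol HTTP \
    --health-checks my-health-check \
    --timeout 60s \
    --session-affinity client_ip \
    --global
```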

Backends

Each backend is a resource to which a GCP load balancer distributes traffic. There are two types of resources that can be used as backends: instance groups and network endpoint groups (NEGs).

For a given backend service, the backends must either all be instance groups or, if supported, NEGs. You cannot use both instance groups and NEGs as backends on the same backend service. Additionally:

  • Internal TCP/UDP load balancers support only instance group backends.
  • If an HTTP(S) load balancer has two (or more) backend services, you can use instance groups as backends for one backend service and NEGs as backends for the other backend service.

Backends and external IP addresses

The backend VMs do not need external IP addresses:

  • For HTTP(S), SSL Proxy, and TCP Proxy load balancers: Clients communicate with a Google Front End (GFE) using your load balancer's external IP address. The GFE communicates with backend VMs using the internal IP addresses of their primary network interface. Because the GFE is a proxy, the backend VMs themselves do not require external IP addresses.

  • For network load balancers: Network load balancers route packets using bidirectional network address translation (NAT). When backend VMs send replies to clients, they use the external IP address of the load balancer's forwarding rule as the source IP address.

  • For internal load balancers: Backend VMs for an internal load balancer do not need external IP addresses.

Traffic distribution

The values of the following fields in the backend services resource determine some aspects of the backend's behavior:

  • A balancing mode, which tells the load balancing system how to determine when the backend is at full usage. If all backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests. The balancing mode can be based on connections, CPU utilization, or requests per second (rate).
  • A capacity setting. Capacity is an additional control that interacts with the balancing mode setting. For example, if you normally want your instances to operate at a maximum of 80% CPU utilization, you would set your balancing mode to CPU utilization and your capacity to 80%. If you want to cut instance utilization in half, you could leave the capacity at 80% CPU utilization and set capacity scaler to 0.5. To drain the backend service, set capacity scaler to 0 and leave the capacity as is. For more information about capacity and CPU utilization, read Scaling Based on CPU or Load Balancing Serving Capacity.
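The example above maps onto per-backend flags of the update-backend command. As a hedged sketch (resource names and zone are hypothetical), setting an 80% utilization target and then halving effective capacity looks like this:

```shell
# Target 80% CPU utilization for this backend, then halve its
# effective capacity with the capacity scaler (0.8 * 0.5 = 40%).
gcloud compute backend-services update-backend my-backend-service \
    --instance-group my-instance-group \
    --instance-group-zone us-central1-b \
    --balancing-mode UTILIZATION \
    --max-utilization 0.8 \
    --capacity-scaler 0.5 \
    --global
```

Setting --capacity-scaler to 0 drains the backend without changing its configured capacity.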

Traffic Director also uses backend service resources. Specifically, Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED. For an internal self managed backend service, traffic distribution is accomplished by using a combination of a load balancing mode and a load balancing policy. The backend service directs traffic to a backend (instance group or NEG) according to the backend's balancing mode, then, once a backend has been selected, Traffic Director distributes traffic according to a load balancing policy.

Internal self managed backend services support the following balancing modes:

  • UTILIZATION, if all the backends are instance groups
  • RATE, if all the backends are either instance groups or NEGs

If you choose RATE balancing mode, you must specify a maximum rate per backend, instance, or endpoint.
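As an illustrative sketch of RATE mode (names and values are placeholders), the per-instance maximum rate is set when the backend is attached:

```shell
# Attach an instance group backend in RATE balancing mode, capping
# each instance at 100 requests per second. Names are placeholders.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group my-instance-group \
    --instance-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-instance 100 \
    --global
```

For NEG backends, the per-endpoint flag --max-rate-per-endpoint is used instead.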

Protocol to the backends

When you create a backend service, you must specify a protocol used for communication with its backends. A backend service can only use one protocol. You cannot specify a secondary protocol to use as a fallback.

The available protocols are:

  • HTTP
  • HTTPS
  • HTTP/2
  • SSL
  • TCP
  • UDP

Which protocol is valid depends on the type of load balancer you create, including its load balancing scheme. Refer to the documentation for each type of load balancer for more information about which protocols can be used for its backend services.

HTTP/2 as a protocol to the backends is also available for load balancing with Ingress.
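The protocol is fixed at creation time with the --protocol flag. For example, a sketch of a backend service that speaks HTTP/2 to its backends (resource names are placeholders):

```shell
# Create a global backend service that uses HTTP/2 to communicate
# with its backends.
gcloud compute backend-services create my-http2-service \
    --protocol HTTP2 \
    --health-checks my-health-check \
    --global
```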

Backend services and regions

HTTP(S) Load Balancing is a global service. You may have more than one backend service in a region, and you may assign backend services to more than one region, all serviced by the same global load balancer. Traffic is allocated to backend services as follows:

  1. When a user request comes in, the load balancing service determines the approximate origin of the request from the source IP address.
  2. The load balancing service knows the locations of the instances owned by the backend service, their overall capacity, and their overall current usage.
  3. If the closest instances to the user have available capacity, then the request is forwarded to that closest set of instances.
  4. Incoming requests to the given region are distributed evenly across all available backend services and instances in that region. However, at very small loads, the distribution may appear to be uneven.
  5. If there are no healthy instances with available capacity in a given region, the load balancer instead sends the request to the next closest region with available capacity.

Instance groups

Backend services and autoscaled managed instance groups

Autoscaled managed instance groups are useful if you need many machines all configured the same way, and you want to automatically add or remove instances based on need.

The autoscaling percentage works with the backend service balancing mode. For example, suppose you set the balancing mode to a CPU utilization of 80% and leave the capacity scaler at 100%, and you set the Target load balancing usage in the autoscaler to 80%. Whenever the CPU utilization of the group rises above 64% (80% of 80%), the autoscaler will instantiate new instances from the template until usage drops down to about 64%. If the overall usage drops below 64%, the autoscaler will remove instances until usage gets back to 64%.

New instances have a cooldown period before they are considered part of the group, so it's possible for traffic to exceed the backend service's 80% CPU utilization during that time, causing excess traffic to be routed to the next available backend service. Once the instances are available, new traffic will be routed to them. Also, if the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler will stop adding instances no matter what the usage is. In this case, extra traffic will be load balanced to the next available region.

Configuring autoscaled managed instance groups

To configure autoscaled managed instance groups, perform the following steps:

  1. Create an instance template for your instance group.
  2. Create a managed instance group and assign the template to it.
  3. Turn on autoscaling based on load balancing serving capacity.
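The three steps above can be sketched with gcloud as follows; the machine type, image, sizes, and names are examples, not recommendations:

```shell
# 1. Create an instance template (machine type and image are examples).
gcloud compute instance-templates create my-template \
    --machine-type n1-standard-1 \
    --image-family debian-9 \
    --image-project debian-cloud

# 2. Create a managed instance group from that template.
gcloud compute instance-groups managed create my-mig \
    --template my-template \
    --size 3 \
    --zone us-central1-b

# 3. Turn on autoscaling based on load balancing serving capacity
#    (here, 80% of the backend service's configured capacity).
gcloud compute instance-groups managed set-autoscaling my-mig \
    --zone us-central1-b \
    --max-num-replicas 10 \
    --target-load-balancing-utilization 0.8
```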

Restrictions and guidance for instance groups

Because Cloud Load Balancing offers a great deal of flexibility in how you configure load balancing, it is possible to create configurations that do not behave well. Keep the following restrictions and guidance in mind when creating instance groups for use with load balancing.

  • Do not put a virtual machine instance in more than one instance group.
  • Do not delete an instance group if it is being used by a backend.
  • Your configuration will be simpler if you do not add the same instance group to two different backends. If you do add the same instance group to two backends:
    • Both backends must use the same balancing mode, either UTILIZATION or RATE.
    • You can use maxRatePerInstance and maxRatePerGroup together. It is acceptable to set one backend to use maxRatePerInstance and the other to maxRatePerGroup.
    • If your instance group serves different ports for different backend services, specify a different named port for each in the instance group.
  • All instances in a managed or unmanaged instance group must be in the same VPC network and, if applicable, the same subnet.
  • If you are using a managed instance group with autoscaling, do not use the maxRate balancing mode in the backend service. You may use either the maxUtilization or maxRatePerInstance mode.
  • Do not make an autoscaled managed instance group the target of two different load balancers.
  • When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of the subnet.

Network endpoint groups

A network endpoint is a combination of an IP address and a port, specified in one of two ways:

  • By specifying an IP address:port pair, such as 10.0.1.1:80.
  • By specifying a network endpoint IP address only. The default port for the NEG is automatically used as the port of the IP address:port pair.

Network endpoints represent services by their IP address and port, rather than referring to a particular VM. A network endpoint group (NEG) is a logical grouping of network endpoints.

A backend service that uses network endpoint groups as its backends distributes traffic among applications or containers running within VM instances. For more information, see Network Endpoint Groups in Load Balancing Concepts.
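As a hedged sketch of attaching a zonal NEG as a backend (names and zone are placeholders), NEG backends use RATE mode with a per-endpoint cap:

```shell
# Attach a zonal network endpoint group as a backend. NEG backends
# are balanced by rate, capped per endpoint. Names are placeholders.
gcloud compute backend-services add-backend my-backend-service \
    --network-endpoint-group my-neg \
    --network-endpoint-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-endpoint 100 \
    --global
```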

Session affinity

Without session affinity, load balancers distribute new requests according to the balancing mode of the backend instance group or NEG. Some applications – such as stateful servers used by ads serving, games, or services with heavy internal caching – need multiple requests from a given user to be directed to the same instance.

Session affinity makes this possible, identifying TCP traffic from the same client based on parameters such as the client's IP address or the value of a cookie, directing those requests to the same backend instance if the backend is healthy and has capacity (according to its balancing mode).

Session affinity has little meaningful effect on UDP traffic, because a session for UDP is a single request and response.

Session affinity can break if the instance becomes unhealthy or overloaded, so you should not assume perfect affinity.

For HTTP(S) Load Balancing, session affinity works best with the RATE balancing mode.

Different load balancers support different session affinity options, as summarized in the following table:

Load balancer           Session affinity options
Internal                • None
                        • Client IP
                        • Client IP and protocol
                        • Client IP, protocol, and port
TCP Proxy, SSL Proxy    • None
                        • Client IP
HTTP(S)                 • None
                        • Client IP
                        • Generated cookie
Network                 Network Load Balancing doesn't use backend services. Instead, you set session affinity for network load balancers through target pools. See the sessionAffinity parameter in Target Pools.

The following sections discuss two common types of session affinity.

Using client IP affinity

Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address. Client IP affinity is an option for every GCP load balancer that uses backend services.

When using client IP affinity, keep the following in mind:

  • The client IP address as seen by the load balancer might not be the address of the originating client if the client is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy carry the IP address of the NAT router or proxy as the client IP address, which can cause incoming traffic to clump unnecessarily onto the same backend instances.

  • If a client moves from one network to another, its IP address changes, resulting in broken affinity.

Console


To set client IP affinity:

  1. In the Google Cloud Platform Console, go to the Backend configuration portion of the load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. In the Edit backend service dialog box, select Client IP from the Session affinity drop-down menu.
    This action enables client IP session affinity. The Affinity cookie TTL field is grayed out as it has no meaning for client IP affinity.
  6. Click the Update button for the Backend service.
  7. Click the Update button for the load balancer.

gcloud


You can use the create command to set session affinity for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity client_ip

API


Consult the API reference for backend services.

Using generated cookie affinity

When generated cookie affinity is set, the load balancer issues a cookie named GCLB on the first request and then directs each subsequent request that has the same cookie to the same instance. Cookie-based affinity allows the load balancer to distinguish different clients using the same IP address so it can spread those clients across the instances more evenly. Cookie-based affinity also allows the load balancer to maintain instance affinity even when the client's IP address changes.

The path of the cookie is always /, so if there are two backend services on the same hostname that enable cookie-based affinity, the two services are balanced by the same cookie.

The lifetime of the HTTP cookie generated by the load balancer is configurable. It can be set to 0 (default), which means the cookie is only a session cookie, or it can have a lifetime of 1 to 86400 seconds (24 hours).

Console


To set generated cookie affinity:

  1. In the Google Cloud Platform Console, you can modify Generated Cookie Affinity in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. Select Generated cookie from the Session affinity drop-down menu to enable generated cookie affinity.
  6. In the Affinity cookie TTL field, set the cookie's lifetime in seconds.
  7. Click the Update button for the Backend service.
  8. Click the Update button for the load balancer.

gcloud


Turn on generated cookie affinity by setting --session-affinity to generated_cookie and setting --affinity-cookie-ttl to the cookie lifetime in seconds. You can use the create command to set it for a new backend service, or the update command to set it for an existing backend service. This example shows using it with the update command.

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity generated_cookie \
    --affinity-cookie-ttl 86400

API


Consult the API reference for backend services.

Disabling session affinity

You can turn off session affinity by updating the backend service and setting session affinity to none, or by opening the backend service in a text editor and setting session affinity to none there. You can also use either command to modify the cookie lifetime.

Console


To disable session affinity:

  1. In the Google Cloud Platform Console, you can disable session affinity in the Backend configuration portion of the load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for a Backend service.
  5. Select None from the Session affinity drop-down menu to turn off session affinity.
  6. Click the Update button for the Backend service.
  7. Click the Update button for the load balancer.

gcloud


To disable session affinity, run the following command:

gcloud compute backend-services update [BACKEND_SERVICE_NAME] \
    --session-affinity none

Alternatively, open the backend service in a text editor and set session affinity to none:

gcloud compute backend-services edit [BACKEND_SERVICE_NAME]

API


Consult the API reference for backend services.

Losing session affinity

Regardless of the type of affinity chosen, a client can lose affinity with the instance in the following scenarios.

  • The instance group runs out of capacity, and traffic has to be routed to a different zone. In this case, traffic from existing sessions may be sent to the new zone, breaking affinity. You can mitigate this by ensuring that your instance groups have enough capacity to handle all local users.
  • Autoscaling adds instances to, or removes instances from, the instance group. In either case, the backend service reallocates load, and the target may move. You can mitigate this by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load, then only using autoscaling for unexpected increases in load.
  • The target instance fails health checks. Affinity is lost as the session is moved to a healthy instance.
  • The balancing mode is set to CPU utilization, which may cause your computed capacities across zones to change, sending some traffic to another zone within the region. This is more likely at low traffic when computed capacity is less stable.

Configuring the timeout setting

For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default.

Console


To configure the timeout setting:

  1. In the Google Cloud Platform Console, you can modify the timeout setting in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for the Backend service.
  5. On the line for Protocol, Port, and Timeout settings, select the Edit pencil.
  6. Enter a new Timeout Setting in seconds.
  7. Click the Update button for the Backend service.
  8. Click the Update button for the load balancer.

gcloud


To change the timeout setting with the gcloud command-line tool, use the gcloud compute backend-services update command. Append --help to the command for detailed information.

gcloud compute backend-services update [BACKEND_SERVICE] [--timeout=TIMEOUT]

API


Consult the REST API reference for backend services.

Named ports

For internal HTTP(S), external HTTP(S), SSL Proxy, and TCP Proxy load balancers, backend services must have an associated named port if their backends are instance groups. The named port informs the load balancer that it should use that configured named port on the backend instance group, which translates that to a port number. This is the port that the load balancer uses to connect to the backend VMs, which can be different from the port that clients use to contact the load balancer itself.

Named ports are key-value pairs representing a service name and a port number on which a service is running. The key-value pair is defined on an instance group. When a backend service uses that instance group as a backend, it can "subscribe" to the named port:

  • Each instance group can have up to five named ports (key-value pairs) defined.
  • Each backend service for an HTTP(S), SSL Proxy, or TCP Proxy load balancer using instance group backends can only "subscribe" to a single named port.
  • When you specify a named port for a backend service, all of the backend instance groups must have at least one named port defined that uses that same name.

Named ports cannot be used under these circumstances:

  • For NEG backends: NEGs define ports per endpoint, and there's no named port key-value pair associated with a NEG.
  • For internal TCP/UDP load balancers: Because internal TCP/UDP load balancers are pass-through load balancers (not proxies), their backend services do not support setting a named port.
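The named-port workflow described above has two halves: defining the key-value pair on the instance group, then subscribing the backend service to it by name. A sketch, with placeholder names:

```shell
# Define a named port (key "http", value 8080) on the instance group.
gcloud compute instance-groups set-named-ports my-instance-group \
    --named-ports http:8080 \
    --zone us-central1-b

# Subscribe the backend service to that named port. The load balancer
# resolves "http" to 8080 when connecting to the backend VMs.
gcloud compute backend-services update my-backend-service \
    --port-name http \
    --global
```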

Health checks

Each backend service must have a Health Check associated with it. A health check runs continuously and its results help determine which instances should receive new requests.

Unhealthy instances do not receive new requests and continue to be polled. If an unhealthy instance passes a health check, it is deemed healthy and will begin receiving new connections.

Best practices for health checks

The best practice when configuring a health check is to check health and serve traffic on the same port. However, it is possible to perform health checks on one port while serving traffic on another. If you use two different ports, ensure that firewall rules and services running on instances are configured appropriately. If you run health checks and serve traffic on the same port, but decide to switch ports at some point, be sure to update both the backend service and the health check.
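When health checks use a separate port, the firewall must admit probe traffic on that port. A sketch, assuming the default network and port 8080; the source ranges are GCP's documented health check probe ranges:

```shell
# Allow Google Cloud health check probes (documented source ranges
# 130.211.0.0/22 and 35.191.0.0/16) to reach the health check port.
gcloud compute firewall-rules create fw-allow-health-checks \
    --network default \
    --allow tcp:8080 \
    --source-ranges 130.211.0.0/22,35.191.0.0/16
```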

Backend services that do not have a valid forwarding rule referencing them are not health checked and have no health status.

Creating health checks

You need to create a health check before you create a backend service. We recommend creating a health check that uses the same protocol as the traffic you are load balancing.

Console

In the console, you can create a health check when you create your backend service.

gcloud

For gcloud commands, see the Health Check page to create a health check.
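As an illustrative sketch (name and values are placeholders), an HTTP health check might be created like this:

```shell
# An HTTP health check polling port 80 every 5 seconds; two
# consecutive successes mark an instance healthy, and two
# consecutive failures mark it unhealthy.
gcloud compute health-checks create http my-health-check \
    --port 80 \
    --check-interval 5s \
    --timeout 5s \
    --healthy-threshold 2 \
    --unhealthy-threshold 2
```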

API

For API commands, see the Health Checks page.

Creating a backend service

Console


To create a new backend service:

  1. In the Google Cloud Platform Console, you can create a backend service in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. In the Create or select backend services & backend buckets drop-down menu, select Backend services -> create a backend service.
  5. Enter a Name for the backend service.
  6. Optionally, enter a Description.
  7. Select the Edit pencil on the line for Protocol, Port, and Timeout settings.
    • Choose a Protocol of http, https, or http2.
    • Enter a port name for the Named port.
    • Optionally, change the default Timeout setting.
  8. Under Backend type, choose the Instance group radio button.
  9. In the Backends dialog box, you can create one or more backends.
    1. In the New backend dialog box, select an existing Instance group from the drop-down menu.
    2. Enter one or more Port numbers, separated by commas, through which the backend will receive requests.
      • For a Protocol of http, this field is set to 80.
      • For a Protocol of https, this field is set to 443.
      • For a Protocol of http2, this field is set to 443.
    3. Set the percentage for Maximum CPU utilization.
    4. Optionally, set the Maximum RPS, leaving the field blank for unlimited. Set RPS Per instance or RPS Per group.
    5. Set the percentage for Capacity.
    6. Click the Done button.
  10. Check or uncheck the Enable Cloud CDN box.
  11. In the Health check drop-down menu, select an existing health check or Create another health check by completing the following steps:
    1. Set the Name.
    2. Optionally, set the Description.
    3. Set the Protocol and Port. Consult the Health checks section for best practices.
    4. Set the Proxy protocol.
    5. Optionally, set the values for Request and Response.
    6. Under Health Criteria, set the following items:
      1. Set the Check interval in seconds.
      2. Set the Timeout in seconds.
      3. Set the number of consecutive successes in Healthy threshold.
      4. Set the number of consecutive failures in Unhealthy threshold.
      5. Click Save and continue.
    7. Click Advanced Configurations to modify Session affinity, the Connection draining timeout, Custom request headers, or Security policies.
      1. To set Session affinity, select Client IP or Generated cookie. If you selected Generated cookie, set an Affinity cookie TTL in seconds.
      2. To set the Connection draining timeout, enter the number of seconds that the instance waits for in-flight connections to complete.
      3. To create custom request headers, click +Add header, then add the Header name and Header value for the header.
      4. To enable a Security policy, select one from the drop-down menu.
    8. Click the Update button for the Backend service.
    9. Click the Update button for the load balancer.

gcloud


To create a backend service using the gcloud command-line tool, consult the Cloud SDK documentation.

API


For API commands, see the Backend Services page to create a backend service.

Modifying a backend service

Changes to your backend services are not instantaneous. It can take several minutes for your changes to propagate throughout the network.

Console


To modify an existing backend service:

  1. In the Google Cloud Platform Console, you can edit a backend service in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Under Backend services, select the Edit pencil for a backend service. You can modify the following fields:
    1. Under Backends, add a new Backend or select the Edit pencil for an existing backend.
    2. Select the Edit pencil on the line for Protocol, Port, and Timeout settings.
    3. Select an existing Health Check or create a new one by following the previous health check creation steps.
    4. Modify the Session affinity and, when required, the Affinity cookie TTL.
    5. Click Advanced Configurations to modify the Connection draining timeout.
    6. Check or uncheck the Enable Cloud CDN box.
  5. Click Advanced Configurations to modify Session affinity, the Connection draining timeout, Custom request headers, or Security policies.
    1. To set Session affinity, select Client IP or Generated cookie. If you selected Generated cookie, set an Affinity cookie TTL in seconds.
    2. To set the Connection draining timeout, enter the number of seconds that the instance waits for in-flight connections to complete.
    3. To create custom request headers, click +Add header, then add the Header name and Header value for the header.
    4. To enable a Security policy, select one from the drop-down menu.
    5. Click the Update button for the Backend service.
    6. Click the Update button for the load balancer.

gcloud


To modify a backend service using the gcloud command-line tool, consult the Cloud SDK documentation.

API


To modify a backend service with the API, see the API docs.

Adding instance groups to a backend service

To define the instances that are included in a backend service, you must add a backend and assign an instance group to it. You must create the instance group before you add it to the backend.

When adding the instance group to the backend, you must also define certain parameters.

Console


To add an instance group to a backend service:

  1. In the Google Cloud Platform Console, you can add an instance group to a backend in the Backend configuration portion of the HTTP(S) load balancer page.
    Go to the Load balancing page
  2. Select the Edit pencil for your load balancer.
  3. Select Backend configuration.
  4. Select the Edit pencil for the Backend configuration.
  5. Select the Edit pencil for a Backend.
  6. In the Edit Backend dialog box, in the Instance group drop-down menu, select an Instance group.
  7. Click the Done button for Edit Backend.
  8. Click the Update button for the Backend service.
  9. Click the Update button for the load balancer.

gcloud


To add an instance group to a backend service using the gcloud command-line tool, consult the Cloud SDK.
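A sketch of adding an instance group backend (resource names and zone are placeholders), setting the balancing mode and target utilization at the same time:

```shell
# Add an existing instance group as a backend of a global backend
# service, balancing on CPU utilization with an 80% target.
gcloud compute backend-services add-backend my-backend-service \
    --instance-group my-instance-group \
    --instance-group-zone us-central1-b \
    --balancing-mode UTILIZATION \
    --max-utilization 0.8 \
    --global
```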

API


To add an instance group to a backend service with the API, consult the API docs.

Adding network endpoint groups to a backend service

To add a network endpoint group to a backend service, see Adding a network endpoint group to a backend service.

Viewing the results of a backend services health check

Once you have created your health checks and backend service, you can view the health check results.

Console


To view the result of a health check on a backend service:

  1. Go to the load balancing summary page.
    Go to the Load balancing page
  2. Click the name of a load balancer.
  3. Under Backend, for a Backend service, view the Healthy column in the Instance group table.

gcloud


To view the results of the latest health check with the gcloud command-line tool, use the backend-services get-health command.

gcloud compute backend-services get-health [BACKEND_SERVICE]

The command returns a healthState value for all instances in the specified backend service, with a value of either HEALTHY or UNHEALTHY:

  healthStatus:
    - healthState: UNHEALTHY
      instance: us-central1-b/instances/www-video1
    - healthState: HEALTHY
      instance: us-central1-b/instances/www-video2
  kind: compute#backendServiceGroupHealth
  

API


For API commands, see the Health Checks page.

Other notes

The following Traffic Director features are not supported with GCP load balancers:

  • Circuit breaking
  • Outlier detection
  • Load balancing policies
  • HTTP cookie-based session affinity
  • HTTP header-based session affinity

What's next

For related documentation and information on how backend services are used in load balancing, review the following:
