Cloud Load Balancing extensions overview

This page provides an overview of Cloud Load Balancing extensions.

Service Extensions lets you instruct supported Application Load Balancers to send a callout from the load balancing data path to extension backend services.

You can configure Application Load Balancers to use the following types of callout extensions:

  • Route extensions, early in the request processing lifecycle

  • Traffic extensions, just before the load balancer sends requests to backends or receives responses from them

Extensibility points in the load balancing data path

Service Extensions supports callout extensions in different stages of the load balancing data path.

Figure 1 shows how Service Extensions supports callout extensions in the routing and traffic management stages for regional external Application Load Balancers, regional internal Application Load Balancers, and cross-region internal Application Load Balancers.

Figure 1. Regional external Application Load Balancers, regional internal Application Load Balancers, and cross-region internal Application Load Balancers support extensions in the routing and traffic management stages.

Figure 2 shows how Service Extensions supports callout extensions in the traffic management stage for global external Application Load Balancers.

Figure 2. Global external Application Load Balancers support extensions in the traffic management stage.

How route extensions work

Route extensions run first in the request processing path when the load balancer receives request headers and before it evaluates the URL map.

After a load balancer invokes a route extension for a request, it does the following:

  • Selects the backend service by evaluating the URL map
  • Applies Google Cloud Armor policies for the selected backend service
  • Applies Identity-Aware Proxy (IAP) policies for the selected backend service
  • Performs fault injection
  • Performs request header transformations and resolves custom request header variables
  • Invokes traffic extensions, if they exist in the processing path of the selected backend service
  • Performs URL rewrites
  • Performs redirects or routing to the selected backend service and applies timeouts and retry policies in the URL map and other load balancing settings for the backend service

How traffic extensions work

Load balancers run traffic extensions last in the request processing path and first in the response processing path.

These extensions let you modify the headers and payloads of both requests and responses without impacting the choice of the backend service. You can also use traffic callouts for custom logging by specifying the information that you want to log, the format, and the external provider.

Before a load balancer invokes a traffic extension on the request path for a request, it does the following:

  • Performs fault injection
  • Performs request header transformations and resolves custom request header variables
  • Selects a backend service for the request
  • Applies Google Cloud Armor policies for the selected backend service
  • Applies IAP policies for the selected backend service
  • Applies Cloud CDN caching policies for the selected backend service in the case of global external Application Load Balancers

After a load balancer invokes a traffic extension on the request path for a request, it does the following:

  • Performs URL rewrites
  • Performs header manipulation according to the URL map
  • Performs redirects or routing to the selected backend service while applying timeouts and retry policies in the URL map and the load balancing settings for the backend service
  • Performs request mirroring

After a load balancer invokes a traffic extension on the response path for a request, it does the following:

  • Performs response header transformations and resolves custom response header variables
  • Performs logging by using Cloud Logging
  • Performs Cloud CDN caching in the case of global external Application Load Balancers

Callout extensions data flow

A load balancer communicates with a callout extension by using Envoy's ext_proc gRPC API. This API lets the extension service respond to events in the lifecycle of an HTTP request by examining and modifying the headers or the body of the request.

An abbreviated version of the API is as follows.

// The gRPC API to be implemented by the external processing server
service ExternalProcessor {
  rpc Process(stream ProcessingRequest) returns (stream ProcessingResponse) {
  }
}

// Envoy sets one of these fields depending on the processing stage.
message ProcessingRequest {
  oneof request {
    HttpHeaders request_headers = 2;
    HttpHeaders response_headers = 3;
    HttpBody request_body = 4;
    HttpBody response_body = 5;
  }
}

// For every ProcessingRequest message received by the server, the server must
// send back exactly one ProcessingResponse message.
message ProcessingResponse {
  // The server must set one of these fields corresponding to the field set in
  // the ProcessingRequest message. Alternatively, the server can set the
  // immediate_response field to make the load balancer terminate request
  // processing and send the specified response back to the client.
  oneof response {
    HeadersResponse request_headers = 1;
    HeadersResponse response_headers = 2;
    BodyResponse request_body = 3;
    BodyResponse response_body = 4;

    ImmediateResponse immediate_response = 7;
  }
}

Figure 3 shows how you can deploy the extension backend service with a gRPC server on a user-managed compute resource, such as virtual machine (VM) instances or Google Kubernetes Engine (GKE), and represent it to the load balancer as a regular backend service.

Figure 3. Application Load Balancers send Service Extensions callouts to extension backend services.

For example, on receiving the headers for an HTTP request, the load balancer sends the ProcessingRequest message to the extension service with the request_headers field set to the HTTP headers from the client. The extension service must respond with a suitable ProcessingResponse message with any configured changes to the headers or body.
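The pairing between a ProcessingRequest and its ProcessingResponse can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a real server: plain dictionaries stand in for the generated protobuf classes, and handle_message is a hypothetical helper name.

```python
# Fields of the ProcessingRequest `request` oneof from the abbreviated API.
STAGE_FIELDS = ("request_headers", "response_headers",
                "request_body", "response_body")


def handle_message(processing_request: dict) -> dict:
    """Return a response that sets the field matching the request's stage.

    Dicts stand in for protobuf messages (an assumption for illustration).
    """
    set_fields = [f for f in STAGE_FIELDS if f in processing_request]
    if len(set_fields) != 1:
        raise ValueError("exactly one stage field must be set")
    stage = set_fields[0]
    # An empty sub-message is used here to mean "continue without changes".
    return {stage: {}}
```

A real extension would replace the empty sub-message with the header or body changes that it wants the load balancer to apply.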

For REQUEST_HEADERS and RESPONSE_HEADERS events, the extension service can manipulate the HTTP headers in the request or response. The service can add, modify, or delete headers by setting the request_headers or response_headers field in the ProcessingResponse message appropriately. Specify header values in the raw_value field.
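As a rough sketch of such a header change, the following Python helper builds a dictionary shaped like a ProcessingResponse that sets and removes headers. The nested field names (response, header_mutation, set_headers, remove_headers, header, key) are assumed to follow Envoy's ext_proc definitions, which the abbreviated API above does not show. The helper name and the example header names are hypothetical.

```python
def build_headers_response(stage: str, set_headers: dict,
                           remove_headers: list) -> dict:
    """Build a dict shaped like a ProcessingResponse with header mutations.

    stage is "request_headers" or "response_headers".
    """
    if stage not in ("request_headers", "response_headers"):
        raise ValueError(f"unsupported stage: {stage}")
    mutation = {
        "set_headers": [
            # raw_value holds bytes; the value field is deliberately unset.
            {"header": {"key": key, "raw_value": value.encode("utf-8")}}
            for key, value in set_headers.items()
        ],
        "remove_headers": list(remove_headers),
    }
    return {stage: {"response": {"header_mutation": mutation}}}


# Example: add one header and delete another on the request path.
example = build_headers_response(
    "request_headers", {"x-example-tenant": "blue"}, ["x-example-debug"])
```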

Traffic extensions can change the headers and the body of both requests and responses. The extension server can also override the processing mode dynamically, which lets it enable or disable the extension for subsequent phases of request processing.

Route extensions have the following restrictions:

  • Route extensions allow changing only the request headers. So, the extension service must not set anything other than request_headers in the ProcessingResponse message.

  • Route extensions cannot override the processing mode of the ext_proc stream. Load balancers call them only for request headers.

You can deploy the ext_proc gRPC service on VM instances or on GKE and configure an instance group or network endpoint group (NEG) to represent the endpoints of this service. You can't host extension backend services on Cloud Run.
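The restriction that route extensions set only request_headers can be checked before a response is sent. The following Python sketch assumes that a dictionary stands in for the ProcessingResponse message, with one key per field that is set. The function name is hypothetical.

```python
def validate_route_extension_response(processing_response: dict) -> None:
    """Raise ValueError if a route extension response sets a disallowed field.

    The dict's keys represent the ProcessingResponse fields that are set.
    """
    allowed = {"request_headers"}
    extra = set(processing_response) - allowed
    if extra:
        raise ValueError(f"route extensions must not set: {sorted(extra)}")
```

A check like this is most useful in unit tests for an extension service that serves both route and traffic callouts.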

For route extensions, to recompute the route, you must set the clear_route_cache field in the request_headers section of the ProcessingResponse message. If the field is left unset, the proxy does not recompute the backend service for the request.

Load balancers don't re-evaluate route rules after calling a traffic extension. If you set the clear_route_cache field in the ProcessingResponse of traffic extensions, it's ignored.
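The following Python sketch models the two rules above: a route extension triggers route recomputation only when clear_route_cache is set, and the field has no effect for traffic extensions. Dictionaries stand in for the protobuf messages, and the placement of clear_route_cache inside a nested response sub-message is an assumption based on Envoy's ext_proc definitions. The function name and the extension_kind values are hypothetical.

```python
def route_is_recomputed(extension_kind: str, processing_response: dict) -> bool:
    """Model whether the load balancer recomputes the route after a callout."""
    if extension_kind != "route":
        # Traffic extensions: clear_route_cache is ignored.
        return False
    common = processing_response.get("request_headers", {}).get("response", {})
    return bool(common.get("clear_route_cache", False))
```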

Supported Application Load Balancers

Service Extensions supports callout extensions for the following Application Load Balancers:

Application Load Balancer                          Route extensions      Traffic extensions
Global external Application Load Balancer          Not supported         Supported
Regional external Application Load Balancer        Supported             Supported
Regional internal Application Load Balancer        Supported             Supported
Cross-region internal Application Load Balancer    Supported (Preview)   Supported (Preview)
Classic Application Load Balancer                  Not supported         Not supported

Supported backends for extension services

You can host an extension on a backend service that uses one of the following types of backends that run the ext_proc gRPC service:

Limitations

This section lists some limitations with callout extensions.

Limitations with header manipulation

The following are the limitations with header manipulation:

  • Header manipulation is not supported for the following headers:

    • X-user-IP
    • CDN-Loop
    • Headers starting with X-Forwarded, X-Google, X-GFE, or X-Amz-
    • connection
    • keep-alive
    • transfer-encoding, te
    • upgrade
    • proxy-connection, proxy-authenticate, proxy-authorization
    • trailers

    For LbTrafficExtension, header manipulation is also not supported for the :method, :authority, :scheme, and host headers.

  • When the ext_proc server specifies header values in HeaderMutation, the load balancer ignores the value field. Use the raw_value field instead.
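An extension service can filter its own header changes against the list above before it builds a HeaderMutation. The following Python sketch is one way to do that. It assumes that header names are compared case-insensitively, and the function and constant names are hypothetical.

```python
# Headers that cannot be manipulated, taken from the list above.
_BLOCKED_NAMES = {
    "x-user-ip", "cdn-loop", "connection", "keep-alive",
    "transfer-encoding", "te", "upgrade", "proxy-connection",
    "proxy-authenticate", "proxy-authorization", "trailers",
}
_BLOCKED_PREFIXES = ("x-forwarded", "x-google", "x-gfe", "x-amz-")
# Additional headers that are blocked for traffic extensions only.
_TRAFFIC_ONLY_BLOCKED = {":method", ":authority", ":scheme", "host"}


def can_mutate_header(name: str, traffic_extension: bool = False) -> bool:
    """Return True if the header is not on the unsupported list."""
    lowered = name.lower()
    if lowered in _BLOCKED_NAMES or lowered.startswith(_BLOCKED_PREFIXES):
        return False
    if traffic_extension and lowered in _TRAFFIC_ONLY_BLOCKED:
        return False
    return True
```

For example, can_mutate_header("X-Forwarded-For") returns False, while can_mutate_header(":authority") returns True unless traffic_extension is set.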

Limitations with HTTP/1.1 clients and backends

The following are the limitations with HTTP/1.1 clients and backends:

  • When you configure either REQUEST_BODY or RESPONSE_BODY for an extension, if the load balancer receives a matching request, it removes the Content-Length header from the response and switches to chunked body encoding.

  • While streaming a message body to the ext_proc server, the load balancer might send a final trailing ProcessingRequest message that has an empty body and end_stream set to true to indicate that the stream has ended.
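A body handler therefore needs to tolerate a final chunk that carries no data. The following Python sketch shows one way to reassemble a streamed body. It assumes that each body message is represented as a dictionary with body and end_stream keys. The function name is hypothetical.

```python
def collect_body(messages):
    """Concatenate streamed body chunks until end_stream is seen.

    Each message is a dict such as {"body": b"...", "end_stream": False}.
    Returns (body_bytes, stream_ended).
    """
    chunks = []
    for message in messages:
        # A trailing message can have an empty body; appending b"" is harmless.
        chunks.append(message.get("body", b""))
        if message.get("end_stream", False):
            return b"".join(chunks), True
    return b"".join(chunks), False
```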

Other limitations

The following is a limitation with ProcessingResponse messages:

  • The maximum size of one ProcessingResponse message is 128 KB. If a received message exceeds this limit, the stream is closed with a RESOURCE_EXHAUSTED error.
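An extension service can guard against this limit before it sends a response. The following Python sketch assumes that the limit applies to the serialized message and that 1 KB means 1,024 bytes. Neither assumption is confirmed by this page, and the names are hypothetical.

```python
# Assumption: 128 KB is interpreted as 128 * 1024 bytes.
MAX_PROCESSING_RESPONSE_BYTES = 128 * 1024


def fits_size_limit(serialized_response: bytes) -> bool:
    """Return True if the serialized response is within the assumed limit."""
    return len(serialized_response) <= MAX_PROCESSING_RESPONSE_BYTES
```

With generated protobuf classes, the argument would typically be the output of the message's SerializeToString() method.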

What's next