Service Extensions lets you instruct supported Application Load Balancers to send a callout from the load balancing data path to user-managed callout services or Google services.
Callouts data flow
A load balancer communicates with a callout by using Envoy's ext_proc gRPC API. This API lets the extension service respond to events in the lifecycle of an HTTP request by examining and modifying the headers or the body of the request.
An abbreviated version of the API is as follows.
    // The gRPC API to be implemented by the external processing server
    service ExternalProcessor {
      rpc Process(stream ProcessingRequest) returns (stream ProcessingResponse) {
      }
    }

    // Envoy sets one of these fields depending on the processing stage.
    message ProcessingRequest {
      oneof request {
        HttpHeaders request_headers = 2;
        HttpHeaders response_headers = 3;
        HttpBody request_body = 4;
        HttpBody response_body = 5;
      }
    }

    message ProcessingResponse {
      oneof response {
        HeadersResponse request_headers = 1;
        HeadersResponse response_headers = 2;
        BodyResponse request_body = 3;
        BodyResponse response_body = 4;
        ImmediateResponse immediate_response = 7;
      }
    }
You can deploy the ext_proc gRPC service on VM instances or on
GKE and configure an instance group or network endpoint group
(NEG) to represent the endpoints of this service.
Figure 1 shows how you can deploy the callout backend service with a gRPC server on a user-managed compute resource such as virtual machine (VM) instances or Google Kubernetes Engine (GKE) and represent it to the load balancer as a regular backend service.
After receiving the headers for an HTTP request, the load balancer sends the
ProcessingRequest message to the extension service with the request_headers
field set to the HTTP headers from the client.
The extension service must respond to the ProcessingRequest message with a
corresponding ProcessingResponse message that contains any configured changes
to the headers or body of the ProcessingRequest message. Alternatively, the
service can set the immediate_response field to make the load balancer end request
processing and send the specified response back to the client.
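The request and response pairing described above can be sketched in Python. This is a minimal illustration, not a working gRPC server: the two dataclasses are simplified stand-ins for the generated ext_proc message classes, and the `x-api-key` check and the dictionary shape of `immediate_response` are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessingRequest:
    """Stand-in for the ext_proc message; exactly one field is set."""
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[bytes] = None
    response_body: Optional[bytes] = None


@dataclass
class ProcessingResponse:
    """Stand-in for the ext_proc message; exactly one field is set."""
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[dict] = None
    response_body: Optional[dict] = None
    immediate_response: Optional[dict] = None


def handle(req: ProcessingRequest) -> ProcessingResponse:
    """Answer each request with the response field for the same stage."""
    if req.request_headers is not None:
        # Hypothetical policy: end processing when an API key header is missing.
        if "x-api-key" not in req.request_headers:
            return ProcessingResponse(
                immediate_response={"status": 403, "body": b"missing API key"})
        return ProcessingResponse(request_headers={})  # empty dict: no changes
    if req.response_headers is not None:
        return ProcessingResponse(response_headers={})
    if req.request_body is not None:
        return ProcessingResponse(request_body={})
    if req.response_body is not None:
        return ProcessingResponse(response_body={})
    raise ValueError("ProcessingRequest has no stage field set")


def process(requests):
    """Mirror the bidirectional Process stream: one response per request."""
    for req in requests:
        yield handle(req)
```

In a real service, the same dispatch would live inside the `Process` method of the gRPC servicer generated from the ext_proc definition.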
For REQUEST_HEADERS and RESPONSE_HEADERS events, the extension service can
manipulate the HTTP headers in the request or response. The service can add,
modify, or delete headers by setting the request_headers or response_headers
field in the ProcessingResponse message appropriately. Set header values in the
raw_value field; the load balancer ignores the value field.
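As an illustration of a header change, the following sketch builds a plain dictionary shaped like Envoy's HeaderMutation message. The `set_headers`, `remove_headers`, `key`, and `raw_value` names come from Envoy's API and aren't shown in the abbreviated definition above; the helper name `build_header_mutation` and the choice to lowercase header names are assumptions made for the example.

```python
def build_header_mutation(set_headers: dict, remove_headers: list) -> dict:
    """Build a dict shaped like Envoy's HeaderMutation message."""
    return {
        "set_headers": [
            # raw_value is a bytes field, so the string value is encoded.
            # The value field is deliberately left unset.
            {"header": {"key": name.lower(),
                        "raw_value": value.encode("utf-8")}}
            for name, value in set_headers.items()
        ],
        "remove_headers": [name.lower() for name in remove_headers],
    }


mutation = build_header_mutation({"X-Tenant": "acme"}, ["X-Debug"])
```

With generated protobuf classes, the same structure would be passed as the header mutation of the response returned for the headers event.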
Traffic extensions can change the headers and the body of both requests and responses. The extension server can override the processing mode dynamically to enable or disable the extension for subsequent phases of request processing. Load balancers don't reevaluate route rules after calling a traffic extension.
Edge and route extensions only support HTTP headers. They can't inspect or mutate HTTP bodies.
Body processing modes
You can configure one of the following two send modes for request and response
body processing, by setting it as the value for the
request_body_send_mode or response_body_send_mode field, respectively.
The default mode is STREAMED, which is recommended for most use cases.
| Mode | Description | Supported events required | Extensions supported |
|---|---|---|---|
| STREAMED | Calls are executed in the streaming mode. This default setting is also used if the mode isn't set. The proxy sends body chunks to the extension service and expects a single response per chunk. The extension can send modified chunks back, acknowledge chunks without any changes, or delete chunks. The proxy sends only a limited amount of data at a time. So, the extension service must acknowledge chunks as soon as possible. Although the body mode can't be changed dynamically, an advanced extension server can dynamically select the future HTTP events to receive. By returning the | Must include REQUEST_BODY for requests or RESPONSE_BODY for responses. | Traffic extensions (for both requests and responses). |
| FULL_DUPLEX_STREAMED | Calls are executed in the full duplex mode. The proxy sends chunks as they arrive and doesn't buffer them. Because there is no buffering, the proxy is less sensitive to extension latency. The proxy can receive as many reply chunks as needed. Reply chunks are disconnected from the chunks that the proxy sends. Subsequent chunks are sent for processing as they arrive at the proxy, without waiting for the previous chunks and events to be fully processed. The extension can freely buffer, modify, and rechunk the body contents. If the extension doesn't send the body contents back, the next extension in the chain receives an empty body. The | Must include REQUEST_BODY and REQUEST_TRAILERS for requests or RESPONSE_BODY and RESPONSE_TRAILERS for responses. | Traffic extensions (for both requests and responses). Route extensions (for requests). |
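The event requirements for each mode can be checked before you deploy a configuration. The following is a small sketch of such a check. The helper `missing_events` and the `"request"` and `"response"` direction labels are hypothetical names for this example and aren't part of any Google Cloud API.

```python
# Events that each body send mode requires, per direction.
REQUIRED_EVENTS = {
    ("STREAMED", "request"): {"REQUEST_BODY"},
    ("STREAMED", "response"): {"RESPONSE_BODY"},
    ("FULL_DUPLEX_STREAMED", "request"): {"REQUEST_BODY", "REQUEST_TRAILERS"},
    ("FULL_DUPLEX_STREAMED", "response"): {"RESPONSE_BODY", "RESPONSE_TRAILERS"},
}


def missing_events(mode: str, direction: str, supported_events) -> list:
    """Return the required events that are absent from supported_events."""
    required = REQUIRED_EVENTS[(mode, direction)]
    return sorted(required - set(supported_events))


# A full duplex request configuration that forgot the trailers event.
print(missing_events("FULL_DUPLEX_STREAMED", "request",
                     ["REQUEST_HEADERS", "REQUEST_BODY"]))
```

An empty list means that the configured events satisfy the selected mode.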
Supported backends for user-managed callout backend services
You can host user-managed callout extensions on a backend service that uses one
of the following types of backends that run the ext_proc gRPC service:
- All managed and unmanaged instance group backends
- All zonal NEGs
- All hybrid connectivity NEGs
- Private Service Connect NEGs pointing to VPC services
- Serverless NEGs pointing to Cloud Run services
Recommended optimizations for callouts
Integrating an extension into the load balancing processing path incurs additional latency for requests and responses. Each type of data that the extension service processes—including request headers, request body, response headers, and response body—adds latency.
Consider the following optimizations to minimize the latency:
- Configure the extension to process only the data that you need. For example, to modify only request headers, set the supported_events field in the extension to REQUEST_HEADERS.
- Deploy callouts in the same zones as the regular destination backend service for the load balancer. When using a cross-region internal Application Load Balancer, place the extension service backends in the same region as the load balancer's proxy-only subnets.
- When using a global external Application Load Balancer, place the callout service backends in the geographic regions where the regular load balancer's destination VMs, GKE workloads, and Cloud Run functions are located.
Limitations
This section lists some limitations with callouts.
Limitations with header manipulation
You can't change some headers. The following are the limitations with header manipulation:
- Header manipulation is not supported for the following headers:
  - X-user-IP
  - CDN-Loop
  - Headers starting with X-Forwarded, X-Google, X-GFE, or X-Amz-
  - connection
  - keep-alive
  - transfer-encoding, te
  - upgrade
  - proxy-connection, proxy-authenticate, proxy-authorization
  - trailers
- For LbTrafficExtension, header manipulation is also not supported for these: :method, :authority, :scheme, or host headers.
- When the ext_proc server specifies header values in HeaderMutation, the load balancer ignores the value field. Use the raw_value field instead.
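An extension service can screen its own mutations against these rules before it sends them. The following sketch shows one way to do that. The function name `can_mutate` and its `traffic_extension` flag are hypothetical. The check compares names case-insensitively because HTTP header names aren't case-sensitive.

```python
# Headers that can't be changed by any extension.
BLOCKED_EXACT = {
    "x-user-ip", "cdn-loop", "connection", "keep-alive",
    "transfer-encoding", "te", "upgrade", "proxy-connection",
    "proxy-authenticate", "proxy-authorization", "trailers",
}
BLOCKED_PREFIXES = ("x-forwarded", "x-google", "x-gfe", "x-amz-")
# Additional headers that can't be changed by LbTrafficExtension.
BLOCKED_FOR_TRAFFIC_EXTENSIONS = {":method", ":authority", ":scheme", "host"}


def can_mutate(name: str, traffic_extension: bool = False) -> bool:
    """Return True if the listed limitations don't cover this header."""
    lowered = name.lower()
    if lowered in BLOCKED_EXACT or lowered.startswith(BLOCKED_PREFIXES):
        return False
    if traffic_extension and lowered in BLOCKED_FOR_TRAFFIC_EXTENSIONS:
        return False
    return True
```

For example, `can_mutate("X-Forwarded-For")` returns `False`, while `can_mutate("Host")` returns `True` unless `traffic_extension=True` is passed.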
Limitations with HTTP/1.1 clients and backends
The following are the limitations with HTTP/1.1 clients and backends:
- When you configure either REQUEST_BODY or RESPONSE_BODY for an extension, if the load balancer receives a matching request, it removes the Content-Length header from the response and switches to chunked body encoding.
- While streaming a message body to the ext_proc server, at the end, the load balancer might send a trailing ProcessingRequest message with an empty body with end_stream set to true to indicate that the stream has ended.
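A body handler therefore shouldn't assume that the last chunk carries data. The following sketch reassembles a streamed body and tolerates an empty final message. Each chunk is modeled as a hypothetical `(body, end_stream)` tuple rather than a real ProcessingRequest message, and the function name `collect_body` is an assumption for this example.

```python
def collect_body(chunks) -> bytes:
    """Join streamed body chunks until one is marked as the end of stream.

    chunks: iterable of (body: bytes, end_stream: bool) tuples.
    """
    buffer = bytearray()
    for body, end_stream in chunks:
        buffer.extend(body)  # an empty trailing body adds nothing
        if end_stream:
            return bytes(buffer)
    raise ValueError("stream closed before the end of stream was signaled")


# The final message carries no data, only the end-of-stream marker.
body = collect_body([(b"he", False), (b"llo", False), (b"", True)])
```

Keying completion on the end-of-stream flag, rather than on a non-empty chunk or a Content-Length value, keeps the handler correct in both cases.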
Other limitations
The following are other limitations with callouts:
- The maximum size of one ProcessingResponse message is 128 KB. If a message received is over this limit, the stream is closed with a RESOURCE_EXHAUSTED error.
- The callout backend service can't use Cloud Armor, IAP, or Cloud CDN policies.
- The callout backend service must use HTTP/2 as the protocol.
- The callout backend service used by route extensions can't override the processing mode of an ext_proc stream.
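One way to stay under the ProcessingResponse size limit is to split large bodies before replying, which applies to FULL_DUPLEX_STREAMED mode, where the extension can rechunk the body freely. The following sketch makes several assumptions: it treats 128 KB as 128 × 1024 bytes, it reserves an arbitrary 4 KB of headroom for the rest of the message, and `rechunk` is a hypothetical helper name.

```python
MAX_RESPONSE_BYTES = 128 * 1024  # assumed interpretation of the 128 KB limit
HEADROOM_BYTES = 4 * 1024        # assumed allowance for non-body message fields


def rechunk(body: bytes, limit: int = MAX_RESPONSE_BYTES - HEADROOM_BYTES) -> list:
    """Split body into pieces of at most limit bytes, preserving order."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pieces = [body[i:i + limit] for i in range(0, len(body), limit)]
    return pieces or [b""]  # an empty body still yields one (empty) piece


pieces = rechunk(b"x" * 300_000)
```

Each piece would then be sent in its own ProcessingResponse message. Measure the serialized message size in a real service instead of relying on the assumed headroom.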
What's next
Configure a callout backend service.
This is a prerequisite to configuring route, authorization, and user-managed traffic extensions by using callouts.