Service Extensions lets you instruct supported Application Load Balancers to send a callout from the load balancing data path to callout backend services. This page provides an overview of Cloud Load Balancing callouts.
Callouts data flow
A load balancer communicates with a callout by using Envoy's ext_proc gRPC API. This API lets the extension service respond to events in the lifecycle of an HTTP request by examining and modifying the headers or the body of the request.
An abbreviated version of the API is as follows.
// The gRPC API to be implemented by the external processing server
service ExternalProcessor {
  rpc Process(stream ProcessingRequest) returns (stream ProcessingResponse) {
  }
}

// Envoy sets one of these fields depending on the processing stage.
message ProcessingRequest {
  oneof request {
    HttpHeaders request_headers = 2;
    HttpHeaders response_headers = 3;
    HttpBody request_body = 4;
    HttpBody response_body = 5;
  }
}

// For every ProcessingRequest message received by the server, the server must
// send back exactly one ProcessingResponse message.
message ProcessingResponse {
  // The server must set one of these fields corresponding to the field set in
  // the ProcessingRequest message. Alternatively, the server can set the
  // immediate_response field to make the load balancer terminate request
  // processing and send the specified response back to the client.
  oneof response {
    HeadersResponse request_headers = 1;
    HeadersResponse response_headers = 2;
    BodyResponse request_body = 3;
    BodyResponse response_body = 4;
    ImmediateResponse immediate_response = 7;
  }
}
Figure 3 shows how you can deploy the callout backend service with a gRPC server on a user-managed compute resource such as virtual machine (VM) instances or Google Kubernetes Engine (GKE) and represent it to the load balancer as a regular backend service.
For example, on receiving the headers for an HTTP request, the load balancer sends the ProcessingRequest message to the extension service with the request_headers field set to the HTTP headers from the client. The extension service must respond with a suitable ProcessingResponse message with any configured changes to the headers or body.
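The request and response pairing described above can be sketched in Python. This is a minimal illustration, not a working gRPC server: the dataclasses are hypothetical stand-ins for the classes that would be generated from Envoy's ext_proc protos, and an empty dictionary is used here to mean "no changes for this phase".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical plain-Python stand-ins for the generated protobuf messages.
@dataclass
class ProcessingRequest:
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[bytes] = None
    response_body: Optional[bytes] = None

@dataclass
class ProcessingResponse:
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[dict] = None
    response_body: Optional[dict] = None
    immediate_response: Optional[dict] = None

PHASES = ("request_headers", "response_headers", "request_body", "response_body")

def which_phase(req: ProcessingRequest) -> str:
    """Return the name of the single oneof field set on the request."""
    set_fields = [name for name in PHASES if getattr(req, name) is not None]
    if len(set_fields) != 1:
        raise ValueError("exactly one request field must be set")
    return set_fields[0]

def process(req: ProcessingRequest) -> ProcessingResponse:
    """Answer one request with exactly one response for the same phase."""
    phase = which_phase(req)
    # An empty dict stands for "no mutation" in this sketch.
    return ProcessingResponse(**{phase: {}})
```

A real extension service would apply its header or body changes where the sketch returns an empty response, or set immediate_response to end processing early.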
For REQUEST_HEADERS and RESPONSE_HEADERS events, the extension service can manipulate the HTTP headers in the request or response. The service can add, modify, or delete headers by setting the request_headers or response_headers field in the ProcessingResponse message appropriately. Use the raw_value field for headers.
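As an illustration of a header change that uses raw_value, the following sketch builds the mutation as nested dictionaries. The field names (set_headers, remove_headers, header, key, response, header_mutation) are meant to mirror Envoy's ext_proc and core protos; treat the exact nesting as an assumption to verify against the proto definitions. The helper names are hypothetical.

```python
from typing import Dict, List

def build_header_mutation(set_headers: Dict[str, str],
                          remove_headers: List[str]) -> dict:
    """Build a HeaderMutation-shaped dict that carries values in raw_value."""
    return {
        "set_headers": [
            # raw_value is a bytes field, so encode the string value.
            {"header": {"key": key, "raw_value": value.encode("utf-8")}}
            for key, value in set_headers.items()
        ],
        "remove_headers": list(remove_headers),
    }

def headers_response(phase: str, mutation: dict) -> dict:
    """Wrap a mutation in a ProcessingResponse-shaped dict for one phase."""
    if phase not in ("request_headers", "response_headers"):
        raise ValueError("phase must be request_headers or response_headers")
    return {phase: {"response": {"header_mutation": mutation}}}
```

For example, `headers_response("request_headers", build_header_mutation({"x-tenant": "a"}, ["x-debug"]))` describes adding one header and deleting another in the request headers phase.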
You can deploy the ext_proc gRPC service on VM instances or on GKE and configure an instance group or network endpoint group (NEG) to represent the endpoints of this service.
Traffic extensions allow changing the headers and the body of both requests and responses. The extension server can override the processing mode dynamically to enable or disable the extension for subsequent phases of request processing.
Route extensions and authorization extensions have the following restrictions:
- They allow changing only the request headers. So, the extension service must not set anything other than request_headers in the ProcessingResponse message.
- They can't override the processing mode of the ext_proc stream. Load balancers call them only for request headers.
Load balancers don't re-evaluate route rules after calling a traffic extension.
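The restrictions on route and authorization extensions can be checked before a response is sent. The sketch below assumes the response is modeled as a dictionary keyed by ProcessingResponse field name; mode_override is the field that Envoy's proto uses to override the processing mode, and the function name is hypothetical.

```python
from typing import Any, Dict, List

def disallowed_fields(response: Dict[str, Any]) -> List[str]:
    """Return the set fields that a route or authorization extension
    must not use, that is, everything except request_headers."""
    return sorted(
        name
        for name, value in response.items()
        if value is not None and name != "request_headers"
    )
```

An empty result means the response stays within the restrictions. A result such as `["mode_override"]` flags an attempt to override the processing mode.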
Supported backends for extension services
You can host an extension on a backend service that uses one of the following types of backends that run the ext_proc gRPC service:
- All managed and unmanaged instance group backends
- All zonal NEGs
- All hybrid connectivity NEGs
- Private Service Connect NEGs pointing to VPC services
- Serverless NEGs pointing to Cloud Run services
Recommended optimizations for callouts
Integrating an extension into the load balancing processing path incurs additional latency for requests and responses. Each type of data that the extension service processes—including request headers, request body, response headers, and response body—adds latency.
Consider the following optimizations to minimize the latency:
- Configure the extension to process only the data that you need. For example, to modify only request headers, set the supported_events field in the extension to REQUEST_HEADERS.
- Deploy callouts in the same zones as the regular destination backend service for the load balancer. When using a cross-region internal Application Load Balancer, place the extension service backends in the same region as the load balancer's proxy-only subnets.
- When using a global external Application Load Balancer, place the callout service backends in the geographic regions where the regular load balancer's destination VMs, GKE workloads, and Cloud Run functions are located.
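To illustrate the first optimization, the following sketch derives the smallest supported_events list from the parts of the traffic that an extension service needs to process. The mapping and the function name are hypothetical helpers; RESPONSE_HEADERS is assumed to follow the same naming pattern as the event names that appear on this page.

```python
from typing import List, Set, Tuple

# (direction, part) -> event name used in the supported_events field.
EVENT_FOR = {
    ("request", "headers"): "REQUEST_HEADERS",
    ("request", "body"): "REQUEST_BODY",
    ("response", "headers"): "RESPONSE_HEADERS",  # assumed name
    ("response", "body"): "RESPONSE_BODY",
}

def minimal_supported_events(needs: Set[Tuple[str, str]]) -> List[str]:
    """List only the events for the data the extension actually needs."""
    unknown = set(needs) - set(EVENT_FOR)
    if unknown:
        raise ValueError(f"unknown needs: {sorted(unknown)}")
    return [event for key, event in EVENT_FOR.items() if key in needs]
```

An extension that only rewrites request headers would pass `{("request", "headers")}` and configure the single resulting event, so the load balancer makes no callouts for the other phases.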
Limitations
This section lists some limitations with callouts.
Limitations with header manipulation
You cannot change some headers. The following are the limitations with header manipulation:
- Header manipulation is not supported for the following headers:
  - X-user-IP
  - CDN-Loop
  - Headers starting with X-Forwarded, X-Google, X-GFE, or X-Amz-
  - connection
  - keep-alive
  - transfer-encoding, te
  - upgrade
  - proxy-connection, proxy-authenticate, proxy-authorization
  - trailers
  For LbTrafficExtension, header manipulation is also not supported for these: :method, :authority, :scheme, or host headers.
- When the ext_proc server specifies header values in HeaderMutation, the load balancer ignores the value field. Use the raw_value field instead.
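An extension service can screen its own mutations against the list above before sending them. The following sketch encodes that list; the function and constant names are hypothetical, and matching is done case-insensitively because HTTP header names are case-insensitive.

```python
BLOCKED_NAMES = {
    "x-user-ip", "cdn-loop", "connection", "keep-alive",
    "transfer-encoding", "te", "upgrade", "proxy-connection",
    "proxy-authenticate", "proxy-authorization", "trailers",
}
BLOCKED_PREFIXES = ("x-forwarded", "x-google", "x-gfe", "x-amz-")
# Additionally blocked for LbTrafficExtension only.
TRAFFIC_EXTENSION_BLOCKED = {":method", ":authority", ":scheme", "host"}

def can_mutate(name: str, traffic_extension: bool = False) -> bool:
    """Return True if the header is not on the unsupported list."""
    lowered = name.strip().lower()
    if lowered in BLOCKED_NAMES or lowered.startswith(BLOCKED_PREFIXES):
        return False
    if traffic_extension and lowered in TRAFFIC_EXTENSION_BLOCKED:
        return False
    return True
```

For instance, `can_mutate("Host")` is True, while `can_mutate("Host", traffic_extension=True)` and `can_mutate("X-Forwarded-For")` are both False.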
Limitations with HTTP/1.1 clients and backends
The following are the limitations with HTTP/1.1 clients and backends:
- When you configure either REQUEST_BODY or RESPONSE_BODY for an extension, if the load balancer receives a matching request, it removes the Content-Length header from the response and switches to chunked body encoding.
- While streaming a message body to the ext_proc server, at the end, the load balancer might send a trailing ProcessingRequest message with an empty body with end_stream set to true to indicate that the stream has ended.
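Body handling code therefore needs to tolerate a final chunk that carries no data. The sketch below models each streamed body message as a hypothetical `(body, end_stream)` tuple and reassembles the payload whether the last data chunk or a separate empty chunk carries end_stream.

```python
from typing import Iterable, Tuple

def collect_body(chunks: Iterable[Tuple[bytes, bool]]) -> bytes:
    """Concatenate body chunks until one arrives with end_stream set."""
    buffer = bytearray()
    for body, end_stream in chunks:
        buffer.extend(body)  # an empty trailing chunk adds nothing
        if end_stream:
            break
    return bytes(buffer)
```

Both `[(b"hel", False), (b"lo", True)]` and `[(b"hel", False), (b"lo", False), (b"", True)]` yield the same payload. In a real server, the trailing empty message still needs its own ProcessingResponse, because the server must send exactly one response for every request message.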
Other limitations
The following are other limitations with callouts:
- The maximum size of one ProcessingResponse message is 128 KB. If a message received is over this limit, the stream is closed with a RESOURCE_EXHAUSTED error.
- The callout backend service cannot use Google Cloud Armor, IAP, or Cloud CDN policies.
- The callout backend service must use HTTP/2 as the protocol.
- The callout backend service used by route extensions cannot override the processing mode of the ext_proc stream.
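An extension service can guard against the message size limit before it replies. This sketch assumes that 128 KB means 128 × 1024 bytes and that the limit applies to the serialized message; both are assumptions, and the names are hypothetical.

```python
# Assumption: 128 KB is interpreted as 128 * 1024 = 131072 bytes.
MAX_PROCESSING_RESPONSE_BYTES = 128 * 1024

def fits_size_limit(serialized_response: bytes) -> bool:
    """Check a serialized ProcessingResponse against the size limit."""
    return len(serialized_response) <= MAX_PROCESSING_RESPONSE_BYTES
```

With generated protobuf classes, the bytes to check would typically come from the message's `SerializeToString()` method.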