Troubleshooting a StreamingPull subscription

This document provides common troubleshooting tips for Pub/Sub subscriptions that use the StreamingPull API.

Read more about StreamingPull subscriptions in the Pull subscriber guide.

StreamingPull has a 100% error rate

StreamingPull streams always close with a non-OK status. Unlike with unary RPCs, this status only indicates that the stream has been broken; it does not mean that requests are failing. A 100% error rate for the StreamingPull API can look surprising, but this behavior is by design.

Failed precondition, unavailable, or not found errors

Because StreamingPull streams always close with an error, stream termination metrics are not helpful for diagnosing errors. Instead, focus on StreamingPull response metrics such as subscription/streaming_pull_response_count.

Look for these errors:

  • Failed precondition errors that can occur in these cases:

    • Pub/Sub attempts to decrypt a message with a disabled Cloud KMS key.

    • The subscription is temporarily suspended because its backlog contains messages that are encrypted with a disabled Cloud KMS key.

  • Unavailable errors that can occur when Pub/Sub is unable to process a request. This is most likely a transient condition and the client library retries the requests.

  • Not found errors that can occur when the subscription has been deleted or never existed. The latter case happens when you provide an invalid subscription path.
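
The three cases above can be sketched as a small triage helper. This is only an illustration, not part of any Pub/Sub client library: the function names, the advice strings, and the loose path pattern are assumptions made for this example. The status names are the standard gRPC status codes, and the path shape is projects/<project>/subscriptions/<subscription>.

```python
import re

# Loose shape check only (an assumption for this sketch); it does not enforce
# the full naming rules for project IDs or subscription names.
_SUBSCRIPTION_PATH = re.compile(r"projects/[^/]+/subscriptions/[^/]+")

# Hypothetical advice strings keyed by gRPC status code name.
_ADVICE = {
    "FAILED_PRECONDITION": (
        "Check whether a Cloud KMS key used to encrypt messages in the "
        "backlog is disabled."
    ),
    "UNAVAILABLE": (
        "Most likely transient; the client library retries the request."
    ),
    "NOT_FOUND": (
        "Confirm that the subscription exists and has not been deleted."
    ),
}


def is_valid_subscription_path(path: str) -> bool:
    """Return True if the path has the expected overall shape."""
    return _SUBSCRIPTION_PATH.fullmatch(path) is not None


def triage(status_name: str, subscription_path: str) -> str:
    """Suggest a next step for a StreamingPull response status."""
    if status_name == "NOT_FOUND" and not is_valid_subscription_path(
        subscription_path
    ):
        return (
            "Subscription path is malformed; expected "
            "projects/<project>/subscriptions/<subscription>."
        )
    return _ADVICE.get(
        status_name,
        "Not one of the three errors covered here; inspect the response "
        "metric for details.",
    )
```

For example, triage("NOT_FOUND", "my-sub") reports a malformed path, while the same status with a well-formed path suggests checking whether the subscription still exists.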

Large backlog of small messages in a StreamingPull subscription

The gRPC StreamingPull stack is optimized for high throughput and therefore buffers messages. If you're attempting to process a large backlog of small messages, you might see messages delivered multiple times, and messages might not be load-balanced effectively across clients.

The buffer between the Pub/Sub service and the client library user space is roughly 10 MB. To understand the effect of this buffer on client library behavior, consider this example:

  • There's a backlog of 10,000 1-KB messages on a subscription.

  • Each message takes one second for sequential processing by a single-threaded client instance.

  • The first client instance to establish a StreamingPull connection to the service for that subscription fills its buffer with all 10,000 messages.

  • It takes 10,000 seconds (almost three hours) to process the buffer.

  • In that time, some buffered messages exceed their acknowledgment deadlines and are resent to the same client, resulting in duplicates.

  • When multiple client instances are running, the messages stuck in the one client's buffer are not available to any other client instance.

This situation does not occur if you use flow control for StreamingPull. With flow control, the service never sends a single client the entire 10 MB of messages at once, so it can load-balance messages effectively across multiple subscribers.
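
The arithmetic in the example above can be checked with a short sketch. It is a simplified model, not a description of how the service schedules messages: the function names are made up for this illustration, the 600-second deadline and the per-client limit of 100 are hypothetical values, and the model ignores lease extension and redelivery.

```python
BUFFER_BYTES = 10 * 1000 * 1000  # "roughly 10 MB" from the example
MESSAGE_BYTES = 1000             # 1-KB messages
BACKLOG = 10_000                 # messages in the backlog
SECONDS_PER_MESSAGE = 1          # sequential, single-threaded processing


def buffered_messages(backlog: int, message_bytes: int, buffer_bytes: int) -> int:
    """Messages that fit into one client's buffer."""
    return min(backlog, buffer_bytes // message_bytes)


def drain_seconds(messages: int, seconds_per_message: int) -> int:
    """Time for one single-threaded client to work through its buffer."""
    return messages * seconds_per_message


def late_messages(messages: int, seconds_per_message: int, deadline_seconds: int) -> int:
    """Messages whose processing finishes after the deadline.

    Assumes a fixed deadline with no lease extension (a simplification).
    """
    on_time = min(messages, deadline_seconds // seconds_per_message)
    return messages - on_time


def initial_assignment(backlog: int, clients: int, per_client_limit: int):
    """Fill each client up to its limit in connection order.

    Returns the per-client counts and what is left for later delivery.
    """
    assigned = []
    remaining = backlog
    for _ in range(clients):
        take = min(remaining, per_client_limit)
        assigned.append(take)
        remaining -= take
    return assigned, remaining


if __name__ == "__main__":
    n = buffered_messages(BACKLOG, MESSAGE_BYTES, BUFFER_BYTES)
    print(n, drain_seconds(n, SECONDS_PER_MESSAGE) / 3600)
    print(late_messages(n, SECONDS_PER_MESSAGE, 600))
    print(initial_assignment(BACKLOG, 4, n))    # buffer-sized limit
    print(initial_assignment(BACKLOG, 4, 100))  # hypothetical flow-control limit
```

With a buffer-sized limit, the first of four clients receives all 10,000 messages and the others receive none. With the hypothetical limit of 100 outstanding messages per client, each client starts with 100 and the remaining 9,600 stay available to whichever client frees capacity first. In the Python client library, such a limit is typically set through the flow control settings passed when subscribing; check the library reference for the exact parameters.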