Learn about troubleshooting steps that you might find helpful if you run into problems using Pub/Sub.

Cannot create a subscription

Check that you have done the following:

  • Specified a subscription name in the name field. For version v1beta2 and above, the subscription name must be in the form projects/project-identifier/subscriptions/subscription-name.
  • Specified the name of an existing topic to which you want to subscribe, in the topic field. For version v1beta2 and above, the topic name must be in the form projects/project-identifier/topics/topic-name.
  • Specified https:// in lower case (not http:// or HTTPS://) as the protocol for your receiving URL in the pushEndpoint field.
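The required name and endpoint formats above can be checked with a small helper. This is a hypothetical sketch, not part of any Pub/Sub client library:

```python
import re

# Hypothetical helpers (not part of any client library) that check the fully
# qualified resource-name formats required by v1beta2 and later.
_SUBSCRIPTION_RE = re.compile(r"^projects/[^/]+/subscriptions/[^/]+$")
_TOPIC_RE = re.compile(r"^projects/[^/]+/topics/[^/]+$")

def is_valid_subscription_name(name: str) -> bool:
    return _SUBSCRIPTION_RE.match(name) is not None

def is_valid_topic_name(name: str) -> bool:
    return _TOPIC_RE.match(name) is not None

def is_valid_push_endpoint(url: str) -> bool:
    # The scheme must be exactly lowercase "https://".
    return url.startswith("https://")
```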

403 (Forbidden) error

If you get this error, do the following:

  • Make sure you've enabled the Pub/Sub API in the Google Cloud console.
  • Make sure that the principal making the request has the required permissions on the relevant Pub/Sub API resources, especially if you are using Pub/Sub API for cross-project communication.
  • If you're using Dataflow, make sure that both <projectId> and the Compute Engine Service account <projectId> have the required permissions on the relevant Pub/Sub API resource. See Dataflow Security and Permissions for more information.
  • If you're using App Engine, check your project's Permissions page to see if an App Engine Service Account is listed as an Editor. If it is not, add your App Engine Service Account as an Editor. Normally, the App Engine Service Account is of the form <project-id>@appspot.gserviceaccount.com.

Dealing with duplicates and forcing retries

When you do not acknowledge a message before its acknowledgement deadline has expired, Pub/Sub resends the message. As a result, Pub/Sub can send duplicate messages. Use Cloud Monitoring to monitor acknowledge operations with the expired response code to detect this condition. To get this data, select the subscription/expired_ack_deadlines_count metric.

Use Cloud Monitoring to search for expired message acknowledgement deadlines
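One way to express this in Cloud Monitoring is a metrics filter like the following (a sketch; the metric and resource types come from Monitoring's Pub/Sub metrics list):

    resource.type = "pubsub_subscription"
    metric.type = "pubsub.googleapis.com/subscription/expired_ack_deadlines_count"

A sustained non-zero value for this metric indicates that subscribers are not acknowledging messages before the deadline, and that duplicates are likely.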

To reduce the duplication rate, extend the message deadline.

  • Client libraries handle deadline extension automatically, but note that there are default limits on the maximum deadline extension that can be configured.
  • If you are building your own client library, use the modifyAckDeadline method to extend the acknowledgement deadline.

Alternatively, to force Pub/Sub to retry a message, set modifyAckDeadline to 0.
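Why a deadline of 0 forces a retry can be illustrated with a small in-memory model. This is an assumption for illustration only; the real service's lease tracking is more involved:

```python
class FakeSubscription:
    """Toy model of per-message ack leases; timestamps are plain numbers."""

    def __init__(self):
        self._lease_expiry = {}  # ack_id -> time at which the lease expires

    def deliver(self, ack_id, now, ack_deadline_s):
        # A delivered message is leased until now + ack_deadline_s.
        self._lease_expiry[ack_id] = now + ack_deadline_s

    def modify_ack_deadline(self, ack_id, now, ack_deadline_s):
        # Setting ack_deadline_s to 0 expires the lease immediately,
        # which is what forces the retry.
        self._lease_expiry[ack_id] = now + ack_deadline_s

    def redeliverable(self, now):
        # Messages whose lease has expired are eligible for redelivery.
        return [a for a, t in self._lease_expiry.items() if t <= now]
```

With a 60-second lease the message is not redeliverable; after modify_ack_deadline(..., 0) it becomes redeliverable at once.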

Publish operations fail with DEADLINE_EXCEEDED

This error is likely caused by a client-side bottleneck, such as insufficient service CPUs, poor thread health, or network congestion. If a Publish call returns DEADLINE_EXCEEDED, asynchronous Publish calls are being enqueued faster than they can be sent to the service, which progressively increases request latency. To determine the publish throughput of a single VM for different parameters (cores, workers, message size), see Testing Cloud Pub/Sub clients to maximize streaming performance. Alternatively, you might be running a version of the client library with a known issue; check the issue tracker for your client library from the list.

Finally, you might be setting a deadline lower than Pub/Sub's typical publish latency. We recommend setting the initial deadline to 10 seconds and the total timeout to 600 seconds.

Check if any of the following helps:

  • Check whether you are publishing messages faster than the client can send them. Usually each asynchronous Publish call returns a Future object. To track the number of messages waiting to be sent, increment a counter when you make a Publish call and decrement it only in the callback of the Future object. When you make Publish calls faster than the corresponding requests to the Pub/Sub service can be completed, the latency of later Publish calls increases drastically.
  • Check that you have sufficient upload bandwidth between the machine where the publisher is running and Google Cloud. It is common for WiFi networks used for development to have bandwidth of 1-10 MB/s, or roughly 1,000-10,000 typical messages per second. Publishing messages in a loop, without any rate limiting, can create short bursts of high bandwidth. You might get more bandwidth by running the publisher on a machine within Google Cloud, or by reducing the rate at which you publish messages to match your available bandwidth.
  • Ensure you have a long enough timeout for the call defined in the retry settings. There will be cases where spikes in request latency lead to errors even if you are not accumulating a large backlog of unsent messages. Increasing the initial deadline to 10 seconds and the total timeout to 600 seconds leads to a drop in the rate of timeouts. Note that if your issues are caused by a persistent bottleneck, rather than occasional timeouts, retrying more times will lead to more errors.
  • Check whether you see very high latency between your host and Google Cloud for any of the reasons like startup network congestion or firewalls. Calculating network throughput has pointers on finding out your bandwidth and latency for different scenarios.
  • Upgrade to the latest version of the client library. Make sure that you have picked up any relevant updates that might include fixes like performance improvements.
  • Check whether the VM hosting the publisher client is running out of resources, including CPU, RAM and threads. It could be that the call is waiting a long time to be scheduled before actually making a request to the service.
  • Ultimately, there are limits to how much data a single machine can publish. You may need to scale horizontally by running multiple instances of the publisher client on several machines. Testing Cloud Pub/Sub clients to maximize streaming performance demonstrates how Pub/Sub throughput scales on a single Google Cloud VM as the number of CPUs increases. For instance, you can achieve 500 MB/s to 700 MB/s for 1KB messages on a 16-core Compute Engine instance.
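The bookkeeping described in the first bullet above can be sketched with the standard library. The thread pool stands in for an asynchronous publisher client, and all names here are hypothetical:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BacklogTracker:
    """Counts messages handed to publish() whose futures have not completed."""

    def __init__(self):
        self._lock = threading.Lock()
        self.outstanding = 0

    def publish(self, executor, send_fn, message):
        with self._lock:
            self.outstanding += 1
        future = executor.submit(send_fn, message)
        # Decrement only once the send has actually finished; a steadily
        # growing `outstanding` means you publish faster than you can send.
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future):
        with self._lock:
            self.outstanding -= 1
```

Watching `outstanding` over time tells you whether Publish calls are being enqueued faster than requests complete, which is the condition that leads to DEADLINE_EXCEEDED.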

Using excessive administrative operations

If you find that you're using up too much of your quota for administrative operations, you might need to refactor your code. As an illustration, consider this pseudo-code. In this example, an administrative operation (GET) is used to check for the presence of a subscription before attempting to consume from it. Both GET and CREATE are admin operations:

    if !GetSubscription my-sub {
      CreateSubscription my-sub
    }
    Consume from subscription my-sub

A more efficient pattern is to try to consume messages from the subscription (assuming that you can be reasonably sure of the subscription's name). In this optimistic approach, you only get or create the subscription if there is an error. Consider this example:

    try {
      Consume from subscription my-sub
    } catch NotFoundError {
      CreateSubscription my-sub
      Consume from subscription my-sub
    }
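The optimistic pattern above could be sketched as follows. The fake client and NotFoundError are stand-ins for a real client library and its not-found error:

```python
class NotFoundError(Exception):
    """Stand-in for a client library's not-found error."""

class FakeClient:
    """Stand-in for a Pub/Sub client; tracks admin-operation usage."""

    def __init__(self):
        self._subscriptions = set()
        self.admin_calls = 0  # GET/CREATE operations count against admin quota

    def create_subscription(self, name):
        self.admin_calls += 1
        self._subscriptions.add(name)

    def consume(self, name):
        if name not in self._subscriptions:
            raise NotFoundError(name)
        return f"messages from {name}"

def consume_optimistically(client, name):
    # Try to consume first; create the subscription only on NotFoundError,
    # so the steady state costs zero admin operations per consume.
    try:
        return client.consume(name)
    except NotFoundError:
        client.create_subscription(name)
        return client.consume(name)
```

Only the very first call pays for an admin operation; every subsequent call consumes directly.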