Troubleshoot Cloud Run issues

This page shows you how to resolve issues with Cloud Run.

Deployment errors

This section lists issues that you might encounter with deployment and provides suggestions for how to fix each of them.

Container failed to start

The following error occurs when you try to deploy:

Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable.

To resolve this issue, rule out the following potential causes:

  1. Verify that you can run your container image locally. If your container image cannot run locally, you need to diagnose and fix the issue locally first.

  2. Check if your container is listening for requests on the expected port as documented in the container runtime contract. Your container must listen for incoming requests on the port that is defined by Cloud Run and provided in the PORT environment variable. See Configuring containers for instructions on how to specify the port.

  3. Check if your container is listening on all network interfaces, commonly denoted as 0.0.0.0.

  4. Verify that your container image is compiled for 64-bit Linux as required by the container runtime contract.

  5. Use Cloud Logging to look for application errors in stdout or stderr logs. You can also look for crashes captured in Error Reporting.

    You might need to update your code or your revision settings to fix errors or crashes. You can also troubleshoot your service locally.
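
Checks 2 and 3 above can be illustrated with a minimal sketch that uses only the Python standard library. The handler name, the make_server helper, and the fallback port 8080 for local runs are assumptions made for this example; they are not part of the container runtime contract.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET request with a short body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(default_port=8080):
    # Read the port from the PORT environment variable; the 8080 fallback
    # is an assumption that only matters when running outside Cloud Run.
    port = int(os.environ.get("PORT", default_port))
    # Bind to 0.0.0.0 (all interfaces), not 127.0.0.1.
    return HTTPServer(("0.0.0.0", port), HelloHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Running the script locally with PORT set to a free port and requesting http://localhost:PORT/ is a quick way to rule out the first three causes before redeploying.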

Internal error, resource readiness deadline exceeded

The following error occurs when you try to deploy or try to call another Google Cloud API:

The server has encountered an internal error. Please try again later. Resource readiness deadline exceeded.

This issue might occur when the Cloud Run service agent does not exist, or when it does not have the Cloud Run Service Agent (roles/run.serviceAgent) role.

To verify that the Cloud Run service agent exists in your Cloud project and has the necessary role, perform the following steps:

  1. Open the Cloud Console:

    Go to the Permissions page

  2. In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.

  3. In the Principals list, locate the Cloud Run service agent, which uses the ID
    service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com.

  4. Verify that the service agent has the Cloud Run Service Agent role. If the service agent does not have the role, grant it.

Default Compute Engine service account is deleted

The following error occurs when you try to deploy:

ERROR: (gcloud.run.deploy) User EMAIL_ADDRESS does not have permission to access namespace NAMESPACE_NAME (or it may not exist): Permission 'iam.serviceaccounts.actAs' denied on service account PROJECT_NUMBER-compute@developer.gserviceaccount.com (or it may not exist).

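This issue occurs when the default Compute Engine service account no longer exists in the project and no other service account is specified at deployment, or when the deploying user lacks the iam.serviceaccounts.actAs permission on that service account.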

To resolve this issue:

  1. Specify a service account using the --service-account gcloud flag.
  2. Verify that the service account you specify has the permissions required to deploy.

To verify that the default Compute Engine service account exists in your Cloud project, perform the following steps:

  1. Open the Cloud Console:

    Go to the Permissions page

  2. In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.

  3. In the Principals list, locate the default Compute Engine service account, which uses the ID
    PROJECT_NUMBER-compute@developer.gserviceaccount.com.

Cloud Run Service Agent doesn't have permission to read the image

The following error occurs when you try to deploy from PROJECT-ID using an image that is stored in Container Registry in PROJECT-ID-2:

Google Cloud Run Service Agent must have permission to read the image, gcr.io/PROJECT-ID/IMAGE-NAME. Ensure that the provided container image URL is correct and that above account has permission to access the image. If you just enabled the Cloud Run API, the permissions might take a few minutes to propagate. Note that PROJECT-ID/IMAGE-NAME is not in project PROJECT-ID-2. Permission must be granted to the Google Cloud Run Service Agent from this project.

To resolve this issue, follow the instructions for deploying container images from other Google Cloud projects to ensure that your principals have the necessary permissions.

Serving errors

This section lists issues that you might encounter with serving and provides suggestions for how to fix each of them.

HTTP 403: Client is not authorized to invoke/call the service

One of the following errors occurs during serving:

403 Forbidden
Your client does not have permission to get URL from this server.
The request was not authenticated. Either allow unauthenticated invocations or set the proper Authorization header
The request was not authorized to invoke this service

To resolve this issue:

  • If the service is meant to be invocable by anyone, update its IAM settings to make the service public.
  • If the service is meant to be invocable only by certain identities, make sure that you invoke it with the proper authorization token.
    • If invoked by a developer or an end user: Ensure the developer or user has the run.routes.invoke permission, which you can provide through either the Cloud Run Admin (roles/run.admin) or the Cloud Run Invoker (roles/run.invoker) role.
    • If invoked by a service account: Ensure the service account is a member of the Cloud Run service and has the Cloud Run Invoker (roles/run.invoker) role. Additionally, the Google-signed ID token must have the audience claim (aud) set to the URL of the receiving service.
  • If the project is within a VPC-SC perimeter, verify that VPC-SC policies are not denying run.googleapis.com/HttpIngress traffic that originates from the caller's IP or identity. To check if this is the case:

    1. Open Logs Explorer in the Cloud Console (not the Logs page for Cloud Run):

      Go to Logs Explorer

    2. Enter the following text in the query field:

      resource.type="audited_resource"
      log_name="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy"
      resource.labels.method="run.googleapis.com/HttpIngress"
      
    3. If you see any log entries after you use this query, examine the log entries to determine if you need to update your VPC-SC policies.
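
When a service account call is rejected, it can help to confirm that the ID token's audience claim (aud) matches the URL of the receiving service, as described above. The following is a debugging sketch only: the function names are hypothetical, the token is decoded without verifying its signature, and the result must not be used for any authorization decision.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token, service_url):
    """Check whether the token's aud claim equals the receiving service URL."""
    return decode_jwt_payload(token).get("aud") == service_url
```

Avoid writing the token itself to logs while debugging; printing only the decoded aud value is usually enough.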

HTTP 404: Not Found

The following issue occurs during serving:

You encounter an HTTP 404 error.

To resolve this issue:

  1. Verify that the app does not return 404 when you run it locally.
  2. Verify that the URL you are requesting is correct by checking the service detail page in the Cloud Console or running the following command:

    gcloud run services describe SERVICE_NAME | grep URL
    
  3. Inspect your app logic for places where it explicitly returns a 404.

HTTP 429: Service reached its maximum number of container instances

The following error occurs during serving:

HTTP 429
The request was aborted because there was no available instance.
The Cloud Run service probably has reached its maximum container instance limit. Consider increasing this limit. This error can also be caused by a sudden increase in traffic, a long container startup time or a long request processing time.

To resolve this issue, increase the maximum instances setting for the service or, if you need more than 1000 instances, request a quota increase.

HTTP 500: Cloud Run couldn't manage the rate of traffic

The following error occurs during serving:

HTTP 500
The request was aborted because there was no available instance

This error can be caused by one of the following:

To resolve this issue, address the previously listed issues.

In addition to fixing these issues, as a workaround you can implement exponential backoff and retries for requests that the client must not drop.
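
The exponential backoff workaround can be sketched on the client side as follows. This is one possible shape, not a prescribed implementation: RetryableError, call_with_backoff, and the default delays are hypothetical names and values chosen for the example, and the random jitter is an added assumption intended to keep many clients from retrying at the same moment.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, such as HTTP 429 or 500."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call func, retrying on RetryableError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last failure.
            # The delay ceiling doubles on every attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep a random time up to the ceiling to spread retries out.
            sleep(random.uniform(0, ceiling))
```

A caller would wrap its request in a function that raises RetryableError for the status codes it considers transient and pass that function to call_with_backoff.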

When the root cause of the issue is a period of heightened transient errors attributable solely to Cloud Run, you can contact Support.

HTTP 500: Container instances are exceeding memory limits

The following error occurs during serving:

In Cloud Logging:

While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory.

To resolve this issue:

  1. Determine if your container instances are exceeding the available memory. Look for related errors in the varlog/system logs.
  2. If the instances are exceeding the available memory, consider increasing the memory limit.

Note that in Cloud Run, files written to the local filesystem count towards the available memory. This also includes any log files that are written to locations other than /var/log/* and /dev/log.
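
Because files on the local filesystem count towards memory, it can be useful to measure how much space a directory tree occupies from inside the container. The helper below is a rough sketch with a hypothetical name; which directory to inspect (for example, a temporary directory your app writes to) is an assumption that depends on your service.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path) and os.path.isfile(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file may have been removed while walking; ignore it.
                continue
    return total
```

Logging this value periodically alongside your application's own memory usage may help distinguish a memory leak from growing on-disk files.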

HTTP 503: Long running requests are timing out

One of the following errors occurs during serving:

HTTP 503
The request failed because either the HTTP response was malformed or connection to the instance had an error.
[CRITICAL] WORKER TIMEOUT

To resolve this issue:

  • If your service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error.

  • If requests are terminating earlier with error code 503, you might need to update the request timeout setting for your language framework:

HTTP 503: Unable to process some requests due to high concurrency setting

The following error occurs during serving:

HTTP 503
The Cloud Run service probably has reached its maximum container instance limit. Consider increasing this limit.

This issue occurs when your container instances use so much CPU to process requests that they cannot handle all incoming requests, so some requests return a 503 error code.

To resolve this issue, try one or more of the following:

Connection reset by peer

The following error occurs during serving:

Connection reset by peer

This error occurs when an application has an established TCP connection with a peer across the network and that peer unexpectedly closes the connection.

To resolve this issue:

  • If you are trying to perform background work with CPU throttling, try using the "CPU is always allocated" CPU allocation setting.

  • Ensure that you are within the outbound request timeouts. If your application keeps a connection idle beyond this threshold, the gateway reaps the connection.

  • By default, the TCP socket option keepalive is disabled for Cloud Run. There is no direct way to configure the keepalive option in Cloud Run at the service level, but you can enable the keepalive option for each socket connection by providing the correct socket options when opening a new TCP socket connection, depending on the client library that you are using for this connection in your application.
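
As an illustration of enabling keepalive per socket, the following sketch uses Python's standard socket module. The function name and the timing values are assumptions chosen for the example; the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options are Linux-specific, so they are applied only when the platform exposes them. Higher-level client libraries usually have their own way to pass socket options.

```python
import socket


def open_keepalive_socket(idle_seconds=60, interval_seconds=10, probe_count=5):
    """Create an unconnected TCP socket with keepalive enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Turn on keepalive probes for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tune the probe timing where the platform supports it (Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)
    return sock
```

The caller connects the returned socket as usual, for example with sock.connect((host, port)), and closes it when done.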

Identity token signature redacted by Google

The following error occurs during serving:

SIGNATURE_REMOVED_BY_GOOGLE

This can occur during development and testing in the following circumstance:

  1. A user logs in using the gcloud command-line tool or Cloud Shell.
  2. The user generates an ID token using gcloud commands.
  3. The user tries to use the ID token to invoke a non-public Cloud Run service.

This is by design. For security reasons, Google removes the token signature to prevent any non-public Cloud Run service from replaying ID tokens that are generated in this manner.

To resolve this issue, invoke your private service with a new ID token. Refer to testing authentication in your service for more information.

Issue caused by a limitation in the container sandbox

The following error occurs during serving in the container sandbox:

Container Sandbox: Unsupported syscall setsockopt(0x3,0x1,0x6,0xc0000753d0,0x4,0x0)

If your container runs locally but fails in Cloud Run, the Cloud Run container sandbox might be responsible for the failure of your container.

To resolve this issue:

  1. Open Logs Explorer in the Cloud Console (not the Logs page for Cloud Run):

    Go to Logs Explorer

  2. Enter the following text in the query field:

    resource.type="cloud_run_revision"
    logName="projects/PROJECT_ID/logs/run.googleapis.com%2Fvarlog%2Fsystem"
    
  3. If you find a Container Sandbox log with a DEBUG severity and you suspect that it is responsible for the failure of your container, contact Support and provide the log message in your support ticket.

    Google Cloud support might ask you to trace system calls made by your service to diagnose lower-level system calls that are not surfaced in Cloud Logging logs.

OpenBLAS warning in logs

If you use OpenBLAS-based libraries such as NumPy with the first generation execution environment, you might see the following warning in your logs:

OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

This is only a warning and it doesn't impact your service. The warning appears because the container sandbox used by the first generation execution environment does not expose low-level hardware features. If you don't want these warnings in your logs, you can switch to the second generation execution environment.

Mapping custom domains

Custom domain is stuck in certificate provisioning state

One of the following errors occurs when you try to map a custom domain:

The domain is available over HTTP.  Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin and to accept HTTP traffic.
Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin.

To resolve this issue:

  • Wait at least 24 hours. Provisioning the SSL certificate usually takes about 15 minutes, but it can take up to 24 hours.
  • Verify that you've properly updated your DNS records at your domain registrar using the Google Admin Toolbox dig tool.

    The DNS records in your domain registrar need to match what the Cloud Console prompts you to add.

  • Confirm that the root of the domain is verified under your account using one of the following methods:

    • Follow the instructions for adding verified domain owners and check that your account is listed as a Verified Owner.
    • Visit the following URL:

      https://www.google.com/webmasters/verification/details?domain=ROOT_DOMAIN
      
  • Verify that the certificate for the domain is not expired. To find the expiry bounds, use the following command:

    echo | openssl s_client -servername 'ROOT_DOMAIN' -connect 'ROOT_DOMAIN:443' 2>/dev/null | openssl x509 -startdate -enddate -noout
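
The openssl command above prints lines such as notBefore=... and notAfter=... in the certificate's date format. As a small sketch, the following helpers (hypothetical names) convert such a line into a number of days relative to now, using ssl.cert_time_to_seconds from the Python standard library.

```python
import ssl
import time


def parse_openssl_date(line):
    """Convert a line like 'notAfter=Jan  5 09:34:43 2018 GMT' to epoch seconds."""
    text = line.strip()
    # Drop the 'notBefore=' or 'notAfter=' label if it is present.
    value = text.partition("=")[2] or text
    return ssl.cert_time_to_seconds(value)


def days_until(line, now=None):
    """Days from now until the date on the line; negative if it has passed."""
    now = time.time() if now is None else now
    return (parse_openssl_date(line) - now) / 86400.0
```

A negative result for the notAfter line means the certificate has already expired.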
    

Admin API

The feature is not supported in the declared launch stage

The following error occurs when you call the Cloud Run Admin API:

The feature is not supported in the declared launch stage

This error occurs when you call the Cloud Run Admin API directly and use a beta feature without specifying a launch stage annotation.

To resolve this issue, annotate the resource with a run.googleapis.com/launch-stage value of BETA in the request if any beta feature is used.

The following example adds a launch stage annotation to a service request:

kind: Service
metadata:
  annotations:
    run.googleapis.com/launch-stage: BETA