Troubleshoot Cloud Run issues

This page shows you how to resolve issues with Cloud Run.

Deployment errors

This section lists issues that you might encounter with deployment and provides suggestions for how to fix each of them.

Container failed to start

The following error occurs when you try to deploy:

Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable.

To resolve this issue, rule out the following potential causes:

  1. Verify that you can run your container image locally. If your container image cannot run locally, you need to diagnose and fix the issue locally first.

  2. Check if your container is listening for requests on the expected port as documented in the container runtime contract. Your container must listen for incoming requests on the port that is defined by Cloud Run and provided in the PORT environment variable. See Configuring containers for instructions on how to specify the port.

  3. Check if your container is listening on all network interfaces, commonly denoted as 0.0.0.0.

  4. Verify that your container image is compiled for 64-bit Linux as required by the container runtime contract.

  5. Use Cloud Logging to look for application errors in stdout or stderr logs. You can also look for crashes captured in Error Reporting.

    You might need to update your code or your revision settings to fix errors or crashes. You can also troubleshoot your service locally.
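Steps 1 through 3 can be checked locally before you deploy. This is a sketch using Docker, where IMAGE_URL and the port value 8080 are placeholders:

```shell
# Run the image locally, passing a PORT value that the app must bind to
# on all interfaces (0.0.0.0). IMAGE_URL is a placeholder for your image.
docker run --rm -e PORT=8080 -p 8080:8080 IMAGE_URL

# In another terminal, confirm that the container answers on that port.
curl -s http://localhost:8080/
```

If the container starts locally and responds to the request, the problem is more likely in your Cloud Run configuration than in the image itself.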

Internal error, resource readiness deadline exceeded

The following error occurs when you try to deploy or try to call another Google Cloud API:

The server has encountered an internal error. Please try again later. Resource readiness deadline exceeded.

This issue might occur when the Cloud Run service agent does not exist, or when it does not have the Cloud Run Service Agent (roles/run.serviceAgent) role.

To verify that the Cloud Run service agent exists in your Cloud project and has the necessary role, perform the following steps:

  1. Open the Google Cloud console:

    Go to the Permissions page

  2. In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.

  3. In the Principals list, locate the ID of the Cloud Run service agent, which uses the ID
    service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com.

  4. Verify that the service agent has the Cloud Run Service Agent role. If the service agent does not have the role, grant it.
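If the role is missing, you can also grant it from the command line. This is a sketch using gcloud, with PROJECT_ID and PROJECT_NUMBER as placeholders for your project's ID and number:

```shell
# Grant the Cloud Run Service Agent role to the Cloud Run service agent.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
  --role="roles/run.serviceAgent"
```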

Error user 'root' is not found in /etc/passwd

The following error occurs when you try to deploy:

ERROR: User "root" not found in /etc/passwd

The issue occurs when customer-managed encryption keys are specified by using the --key parameter.

To resolve this issue, specify USER 0 instead of USER root in the Dockerfile.
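For example, a Dockerfile that previously declared the root user by name can declare it by numeric UID instead:

```dockerfile
# Before (fails when deploying with customer-managed encryption keys):
# USER root

# After: refer to root by its numeric UID.
USER 0
```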

Default Compute Engine service account is deleted

The following error occurs when you try to deploy:

ERROR: (gcloud.run.deploy) User EMAIL_ADDRESS does not have permission to access namespace NAMESPACE_NAME (or it may not exist): Permission 'iam.serviceaccounts.actAs' denied on service account PROJECT_NUMBER-compute@developer.gserviceaccount.com (or it may not exist).

This issue occurs when the default Compute Engine service account has been deleted from the project and no other service account is specified for the deployment.

To resolve this issue:

  1. Specify a service account using the --service-account gcloud flag.
  2. Verify that the service account you specify has the permissions required to deploy.
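The two steps above can be combined into a single deploy command. This is a sketch, where SERVICE_NAME, IMAGE_URL, and SA_EMAIL are placeholders for your service name, container image, and service account email:

```shell
gcloud run deploy SERVICE_NAME \
  --image=IMAGE_URL \
  --service-account=SA_EMAIL
```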

To verify that the default Compute Engine service account exists in your Cloud project, perform the following steps:

  1. Open the Google Cloud console:

    Go to the Permissions page

  2. In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.

  3. In the Principals list, locate the ID of the default Compute Engine service account, which uses the ID
    PROJECT_NUMBER-compute@developer.gserviceaccount.com.

Cloud Run Service Agent doesn't have permission to read the image

The following error occurs when you try to deploy from PROJECT-ID using an image that is stored in Container Registry in PROJECT-ID-2:

Google Cloud Run Service Agent must have permission to read the image, gcr.io/PROJECT-ID/IMAGE-NAME. Ensure that the provided container image URL is correct and that above account has permission to access the image. If you just enabled the Cloud Run API, the permissions might take a few minutes to propagate. Note that PROJECT-ID/IMAGE-NAME is not in project PROJECT-ID-2. Permission must be granted to the Google Cloud Run Service Agent from this project.

To resolve this issue, follow these troubleshooting recommendations:

  • Follow the instructions for deploying container images from other Google Cloud projects to ensure that your principals have the necessary permissions.

  • This issue might also occur if the project is in a VPC-SC perimeter with a restriction on the Cloud Storage API that prohibits requests from the Cloud Run service agent. To fix this:

    1. Open Logs Explorer in the Google Cloud console. (Do not use the Logs page inside the Cloud Run page):

      Go to Logs Explorer

    2. Enter the following text in the query field:

      protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
      severity=ERROR
      protoPayload.status.details.violations.type="VPC_SERVICE_CONTROLS"
      protoPayload.authenticationInfo.principalEmail="service-PROJECT_ID@serverless-robot-prod.iam.gserviceaccount.com"
      
    3. If you see any log entries after you use this query, examine the log entries to determine whether you need to update your VPC-SC policies. They may indicate that you need to add service-PROJECT_ID@serverless-robot-prod.iam.gserviceaccount.com to a pre-existing access policy.
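As a sketch of the cross-project grant, assuming the image is stored in Container Registry in PROJECT-ID and you deploy from PROJECT-ID-2, you could grant the deploying project's Cloud Run service agent read access to the registry's backing Cloud Storage bucket:

```shell
# PROJECT_NUMBER is the number of the project you deploy from (PROJECT-ID-2).
# The bucket name assumes the default Container Registry storage bucket.
gsutil iam ch \
  serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com:objectViewer \
  gs://artifacts.PROJECT-ID.appspot.com
```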

Container import errors

The following error occurs when you try to deploy:

The service has encountered an error during container import. Please try again later. Resource readiness deadline exceeded.

To resolve this issue, rule out the following potential causes:

  1. Ensure that the container's file system does not contain non-UTF-8 characters.

  2. Some Windows-based Docker images use foreign layers. Although Container Registry doesn't throw an error when foreign layers are present, Cloud Run's control plane does not support them. To resolve this issue, try setting the --allow-nondistributable-artifacts flag in the Docker daemon.

Serving errors

This section lists issues that you might encounter with serving and provides suggestions for how to fix each of them.

HTTP 403: Client is not authorized to invoke/call the service

One of the following errors occurs during serving:

403 Forbidden
Your client does not have permission to get URL from this server.
The request was not authenticated. Either allow unauthenticated invocations or set the proper Authorization header
The request was not authorized to invoke this service

To resolve this issue:

  • If the service is meant to be invocable by anyone, update its IAM settings to make the service public.
  • If the service is meant to be invocable only by certain identities, make sure that you invoke it with the proper authorization token.
    • If invoked by a developer or invoked by an end user: Ensure the developer or user has the run.routes.invoke permission, which you can grant through the Cloud Run Admin (roles/run.admin) and Cloud Run Invoker (roles/run.invoker) roles.
    • If invoked by a service account: Ensure the service account is a member of the Cloud Run service and has the Cloud Run Invoker (roles/run.invoker) role. Additionally, the Google-signed ID token must have the audience claim (aud) set to the URL of the receiving service.
  • If the project is within a VPC-SC perimeter, verify that VPC-SC policies are not denying run.googleapis.com/HttpIngress traffic that originates from the caller's IP or identity. To check if this is the case:

    1. Open Logs Explorer in the Google Cloud console (not the Logs page for Cloud Run):

      Go to Logs Explorer

    2. Enter the following text in the query field:

      resource.type="audited_resource"
      log_name="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy"
      resource.labels.method="run.googleapis.com/HttpIngress"
      
    3. If you see any log entries after you use this query, examine the log entries to determine if you need to update your VPC-SC policies.
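For an authorized test invocation from a developer machine, a common pattern is to attach a Google-signed identity token to the request. SERVICE_URL is a placeholder for your service's URL:

```shell
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" SERVICE_URL
```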

HTTP 404: Not Found

The following issue occurs during serving:

You encounter an HTTP 404 error.

To resolve this issue:

  1. Verify that the app does not return 404 when you run it locally.
  2. Verify that the URL you are requesting is correct by checking the service detail page in the Cloud console or running the following command:

    gcloud run services describe SERVICE_NAME | grep URL
    
  3. Inspect whether your app logic might be explicitly returning 404 errors.

  4. Make sure your app does not start listening on its configured port before it is ready to receive requests.

HTTP 429: No available container instances

The following error occurs during serving:

HTTP 429
The request was aborted because there was no available instance.
The Cloud Run service might have reached its maximum container instance
limit or the service was otherwise not able to scale to incoming requests.
This might be caused by a sudden increase in traffic, a long container startup time or a long request processing time.

To resolve this issue, check the "Container instance count" metric for your service and consider increasing this limit if your usage is nearing the maximum. See "max instance" settings, and if you need more instances, request a quota increase.
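Raising the limit can be sketched with gcloud, using SERVICE_NAME as a placeholder and 20 as an example limit:

```shell
gcloud run services update SERVICE_NAME --max-instances=20
```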

HTTP 500: Cloud Run couldn't manage the rate of traffic

The following error occurs during serving:

HTTP 500
The request was aborted because there was no available instance

This error can be caused by one of the following:

  • A sudden, large increase in traffic.
  • A long container startup time.
  • A long request processing time.

To resolve this issue, address the previously listed issues.

In addition to fixing these issues, as a workaround you can implement exponential backoff and retries for requests that the client must not drop.
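A client-side retry loop with exponential backoff can be sketched in shell as follows. retry_with_backoff is a hypothetical helper, the attempt counts and delays are example values, and the wrapped command must be idempotent:

```shell
# Hypothetical helper: retry a command, doubling the wait between attempts.
retry_with_backoff() {
  max_attempts=5
  delay=1
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))        # exponential backoff
    attempt=$((attempt + 1))
  done
}

# Example usage (SERVICE_URL is a placeholder; request must be idempotent):
# retry_with_backoff curl --fail --silent --output /dev/null https://SERVICE_URL/
```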

When the root cause of the issue is a period of heightened transient errors attributable solely to Cloud Run, you can contact Support.

HTTP 500 / HTTP 503: Container instances are exceeding memory limits

The following error occurs during serving:

In Cloud Logging:

While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory.

To resolve this issue:

  1. Determine if your container instances are exceeding the available memory. Look for related errors in the varlog/system logs.
  2. If the instances are exceeding the available memory, consider increasing the memory limit.

Note that in Cloud Run, files written to the local filesystem count towards the available memory. This also includes any log files that are written to locations other than /var/log/* and /dev/log.
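Increasing the memory limit can be sketched with gcloud, using SERVICE_NAME as a placeholder and 512Mi as an example value:

```shell
gcloud run services update SERVICE_NAME --memory=512Mi
```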

HTTP 503: Malformed response or container instance connection issue

One of the following errors occurs during serving:

HTTP 503
The request failed because either the HTTP response was malformed or connection to the instance had an error.
[CRITICAL] WORKER TIMEOUT

To resolve this issue, follow these troubleshooting recommendations:

  • Use Cloud Logging to look for out of memory errors in the logs. If you see error messages regarding container instances exceeding memory limits, follow the recommendations to resolve this issue.

  • If requests are terminating with error code 503 before reaching the request timeout set in Cloud Run, you might need to update the request timeout setting for your language framework.

  • In some instances, a 503 error code can be an indirect result of a downstream network bottleneck, sometimes seen when load testing. For example, if your service routes traffic through a Serverless VPC Access connector, ensure that the connector has not exceeded its throughput threshold by following these steps:

    1. Open Serverless VPC Access in the Google Cloud console:

      Go to Serverless VPC Access

    2. Check for any red bars in the throughput chart histogram. If there is a red bar, consider increasing the maximum number of instances or the instance type that your connector uses. Alternatively, compress traffic sent through a Serverless VPC Access connector.

  • Setting a lower Cloud Run service concurrency configuration may alleviate 503 errors depending on the underlying application.

HTTP 503: Unable to process some requests due to high concurrency setting

The following error occurs during serving:

HTTP 503
The Cloud Run service probably has reached its maximum container instance limit. Consider increasing this limit.

This issue occurs when your container instances are using a lot of CPU to process requests, and as a result, the container instances cannot process all of the requests, so some requests return a 503 error code.

To resolve this issue, try one or more of the following:

  • Increase the CPU allocated to each container instance.
  • Lower the concurrency setting so that each instance processes fewer requests at a time.
  • Increase the maximum number of container instances.
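For example, lowering the concurrency setting and raising the CPU per instance can be sketched with gcloud, using SERVICE_NAME as a placeholder and example values:

```shell
gcloud run services update SERVICE_NAME --concurrency=10 --cpu=2
```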

HTTP 504: Gateway timeout error

The following error occurs during serving:

HTTP 504
The request has been terminated because it has reached the maximum request timeout.

If your service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error, as documented in the container runtime contract.
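Increasing the request timeout can be sketched with gcloud, using SERVICE_NAME as a placeholder and 900 seconds as an example value:

```shell
gcloud run services update SERVICE_NAME --timeout=900
```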

Connection reset by peer

The following error occurs during serving:

Connection reset by peer

This error occurs when an application has an established TCP connection with a peer across the network and that peer unexpectedly closes the connection.

To resolve this issue:

  • If you are trying to perform background work with CPU throttling, try using the "CPU is always allocated" CPU allocation setting.

  • Ensure that you stay within the outbound requests timeouts. If your application maintains a connection in an idle state beyond this threshold, the gateway needs to reap the connection.

  • By default, the TCP socket option keepalive is disabled for Cloud Run. There is no direct way to configure the keepalive option at the service level, but you can enable it for each socket connection by providing the appropriate socket options when you open a new TCP connection, depending on the client library that your application uses for that connection.

Identity token signature redacted by Google

The following error occurs during serving:

SIGNATURE_REMOVED_BY_GOOGLE

This can occur during development and testing in the following circumstance:

  1. A user logs in using Google Cloud CLI or Cloud Shell.
  2. The user generates an ID token using gcloud commands.
  3. The user tries to use the ID token to invoke a non-public Cloud Run service.

This is by design. Google removes the token signature due to security concerns to prevent any non-public Cloud Run service from replaying ID tokens that are generated in this manner.

To resolve this issue, invoke your private service with a new ID token. Refer to testing authentication in your service for more information.
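One way to mint a usable ID token for testing is to impersonate a service account that has the invoker role. This is a sketch, where SA_EMAIL and SERVICE_URL are placeholders:

```shell
# Generate an ID token by impersonating a service account, with the
# audience set to the receiving service's URL.
TOKEN=$(gcloud auth print-identity-token \
  --impersonate-service-account=SA_EMAIL \
  --audiences=SERVICE_URL)

curl -H "Authorization: Bearer ${TOKEN}" SERVICE_URL
```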

Issue caused by a limitation in the container sandbox

The following error occurs during serving in the container sandbox:

Container Sandbox: Unsupported syscall setsockopt(0x3,0x1,0x6,0xc0000753d0,0x4,0x0)

If your container runs locally but fails in Cloud Run, the Cloud Run container sandbox might be responsible for the failure of your container.

To resolve this issue:

  1. Open Logs Explorer in the Google Cloud console (not the Logs page for Cloud Run):

    Go to Logs Explorer

  2. Enter the following text in the query field:

    resource.type="cloud_run_revision"
    logName="projects/PROJECT_ID/logs/run.googleapis.com%2Fvarlog%2Fsystem"
    
  3. If you find a Container Sandbox log with a DEBUG severity and you suspect that it is responsible for the failure of your container, contact Support and provide the log message in your support ticket.

    Google Cloud support might ask you to trace system calls made by your service to diagnose lower-level system calls that are not surfaced in Cloud Logging logs.

OpenBLAS warning in logs

If you use OpenBLAS-based libraries such as NumPy with the first generation execution environment, you might see the following warning in your logs:

OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

This is just a warning and it doesn't affect your service. The warning occurs because the container sandbox used by the first generation execution environment does not expose low-level hardware features. You can optionally switch to the second generation execution environment if you don't want these warnings in your logs.

Spark fails when obtaining IP address of machine to bind to

One of the following errors occurs during serving:

assertion failed: Expected hostname (not IP) but got <IPv6 ADDRESS>
assertion failed: Expected hostname or IPv6 IP enclosed in [] but got <IPv6 ADDRESS>

To resolve this issue, set ENV SPARK_LOCAL_IP="127.0.0.1" in your Dockerfile. In Cloud Run, if the SPARK_LOCAL_IP variable is not set, it defaults to its IPv6 counterpart instead of localhost. Note that setting RUN export SPARK_LOCAL_IP="127.0.0.1" does not persist to runtime, so Spark behaves as if the variable were not set.
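In Dockerfile form, the working and non-working variants look like this:

```dockerfile
# Persists into the runtime environment; Spark binds to localhost.
ENV SPARK_LOCAL_IP="127.0.0.1"

# Does NOT persist to runtime; Spark behaves as if the variable were unset.
# RUN export SPARK_LOCAL_IP="127.0.0.1"
```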

Mapping custom domains

Custom domain is stuck in the certificate provisioning state

One of the following errors occurs when you try to map a custom domain:

The domain is available over HTTP.  Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin and to accept HTTP traffic.
Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin.

To resolve this issue:

  • Wait at least 24 hours. Provisioning the SSL certificate usually takes about 15 minutes, but it can take up to 24 hours.
  • Verify that you've properly updated your DNS records at your domain registrar by using the Google Admin Toolbox dig tool.

    The DNS records in your domain registrar need to match what the Google Cloud console prompts you to add.

  • Confirm that the root of the domain is verified under your account using one of the following methods:

    • Follow the instructions for adding verified domain owners and check that your account is listed as a Verified Owner.
    • Visit the following URL:

      https://www.google.com/webmasters/verification/details?domain=ROOT_DOMAIN
      
  • Verify that the certificate for the domain has not expired. To find its validity dates, use the following command:

    echo | openssl s_client -servername 'ROOT_DOMAIN' -connect 'ROOT_DOMAIN:443' 2>/dev/null | openssl x509 -startdate -enddate -noout
    

Admin API

The feature is not supported in the declared launch stage

The following error occurs when you call the Cloud Run Admin API:

The feature is not supported in the declared launch stage

This error occurs when you call the Cloud Run Admin API directly and use a beta feature without specifying a launch stage annotation.

To resolve this issue, annotate the resource with a run.googleapis.com/launch-stage value of BETA in the request if any beta feature is used.

The following example adds a launch stage annotation to a service request:

kind: Service
metadata:
  annotations:
    run.googleapis.com/launch-stage: BETA

Troubleshooting network file system issues

Learn more about Using network file systems.

Cannot access files using NFS

Error: mount.nfs: Protocol not supported
Suggested remedy: Some base images, for example debian and adoptopenjdk/openjdk11, are missing the nfs-kernel-server dependency.

Error: mount.nfs: Connection timed out
Suggested remedy: If the connection times out, make sure you are providing the correct IP address of the Filestore instance.

Error: mount.nfs: access denied by server while mounting IP_ADDRESS:/FILESHARE
Suggested remedy: If access was denied by the server, check that the file share name is correct.

Cannot access files using Cloud Storage FUSE

See Cloud Storage FUSE troubleshooting guide.