This page shows you how to resolve issues with Cloud Run.
If your issue is not listed below, check whether it is a known issue.
Deployment errors
This section lists issues that you might encounter with deployment and provides suggestions for how to fix each of them.
Container failed to start
The following error occurs when you try to deploy:
Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable.
To resolve this issue, rule out the following potential causes:
Verify that you can run your container image locally. If your container image cannot run locally, diagnose and fix the issue locally first.
Check if your container is listening for requests on the expected port as documented in the container runtime contract. Your container must listen for incoming requests on the port that is defined by Cloud Run and provided in the PORT environment variable. See Configuring containers for instructions on how to specify the port.
Check if your container is listening on all network interfaces, commonly denoted as 0.0.0.0.
Verify that your container image is compiled for 64-bit Linux as required by the container runtime contract.
Use Cloud Logging to look for application errors in stdout or stderr logs. You can also look for crashes captured in Error Reporting.
You might need to update your code or your revision settings to fix errors or crashes. You can also troubleshoot your service locally.
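The port and interface requirements above can be illustrated with Python's standard-library HTTP server. This is a minimal sketch, not a production server: the handler name `Hello` is hypothetical, and the fallback port `8080` (the value Cloud Run sets by default) is only used when running the image locally without PORT.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def bind_address() -> tuple:
    """Return the (host, port) pair the server should listen on."""
    # Cloud Run provides the port in the PORT environment variable.
    # 8080 is only a fallback for local runs (assumption for this sketch).
    port = int(os.environ.get("PORT", "8080"))
    # Listen on all interfaces; binding to 127.0.0.1 would make the
    # server unreachable from outside the container.
    return ("0.0.0.0", port)


class Hello(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(bind_address(), Hello).serve_forever()
```

If this works with `docker run -e PORT=9090 -p 9090:9090 IMAGE` locally but not on Cloud Run, the port and interface are probably not the cause.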
Internal error, resource readiness deadline exceeded
The following error occurs when you try to deploy or try to call another Google Cloud API:
The server has encountered an internal error. Please try again later. Resource readiness deadline exceeded.
This issue might occur when the Cloud Run service agent does not exist, or when it does not have the Cloud Run Service Agent (roles/run.serviceAgent) role.
To verify that the Cloud Run service agent exists in your Google Cloud project and has the necessary role, perform the following steps:
Open the IAM page in the Google Cloud console.
In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.
In the Principals list, locate the Cloud Run service agent, whose ID is service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com.
Verify that the service agent has the Cloud Run Service Agent role. If the service agent does not have the role, grant it.
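From the command line, one way to run the same check is to export the project's IAM policy with `gcloud projects get-iam-policy PROJECT_ID --format=json` and search it. The helper below is a hypothetical sketch, not an official tool; it assumes the standard IAM policy JSON shape (a `bindings` list of `role` and `members` entries).

```python
import json


def cloud_run_service_agent(project_number: str) -> str:
    """IAM member string for the Cloud Run service agent of a project."""
    return (
        f"serviceAccount:service-{project_number}"
        "@serverless-robot-prod.iam.gserviceaccount.com"
    )


def has_role(policy: dict, member: str, role: str) -> bool:
    """True if `member` appears in a binding for `role` in an IAM policy dict."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


if __name__ == "__main__":
    import sys

    # Hypothetical usage:
    #   gcloud projects get-iam-policy PROJECT_ID --format=json | python check_agent.py PROJECT_NUMBER
    policy = json.load(sys.stdin)
    agent = cloud_run_service_agent(sys.argv[1])
    print("has roles/run.serviceAgent:", has_role(policy, agent, "roles/run.serviceAgent"))
```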
Error: user 'root' is not found in /etc/passwd
The following error occurs when you try to deploy:
ERROR: "User \"root\""not found in /etc/passwd
This issue occurs when customer-managed encryption keys are specified using the --key parameter.
To resolve this issue, specify USER 0 instead of USER root in the Dockerfile.
Default Compute Engine service account is deleted
The following error occurs when you try to deploy:
ERROR: (gcloud.run.deploy) User EMAIL_ADDRESS does not have permission to access namespace NAMESPACE_NAME (or it may not exist): Permission 'iam.serviceaccounts.actAs' denied on service account PROJECT_NUMBER-compute@developer.gserviceaccount.com (or it may not exist).
This issue occurs in one of the following situations:
- The default Compute Engine service account does not exist in the project, and no service account is specified with the --service-account gcloud flag at the time of deployment.
- The developer or principal deploying the service does not have the permissions for the default Compute Engine service account that are required to deploy.
To resolve this issue:
- Specify a service account using the --service-account gcloud flag.
- Verify that the service account you specify has the permissions required to deploy.
To verify whether the default Compute Engine service account exists in your Google Cloud project, perform the following steps:
Open the IAM page in the Google Cloud console.
In the upper-right corner of the Permissions page, select the Include Google-provided role grants checkbox.
In the Principals list, locate the default Compute Engine service account, whose ID is PROJECT_NUMBER-compute@developer.gserviceaccount.com.
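As a command-line alternative, the sketch below checks the JSON output of `gcloud iam service-accounts list --format=json` for the default Compute Engine service account. It is a hypothetical helper that assumes each listed account carries an `email` field. It only checks that the account exists, not whether the deploying principal holds the `iam.serviceaccounts.actAs` permission on it.

```python
import json


def default_compute_sa(project_number: str) -> str:
    """Email of the default Compute Engine service account."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def account_exists(accounts: list, email: str) -> bool:
    """True if `email` appears in a list of service account dicts."""
    return any(acct.get("email") == email for acct in accounts)


if __name__ == "__main__":
    import sys

    # Hypothetical usage:
    #   gcloud iam service-accounts list --project PROJECT_ID --format=json | python check_sa.py PROJECT_NUMBER
    accounts = json.load(sys.stdin)
    email = default_compute_sa(sys.argv[1])
    print(email, "exists" if account_exists(accounts, email) else "not found")
```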
Cloud Run Service Agent doesn't have permission to read the image
The following error occurs when you try to deploy to PROJECT-ID-2 using an image that is stored in Container Registry in PROJECT-ID:
Google Cloud Run Service Agent must have permission to read the image, gcr.io/PROJECT-ID/IMAGE-NAME. Ensure that the provided container image URL is correct and that above account has permission to access the image. If you just enabled the Cloud Run API, the permissions might take a few minutes to propagate. Note that PROJECT-ID/IMAGE-NAME is not in project PROJECT-ID-2. Permission must be granted to the Google Cloud Run Service Agent from this project.
To resolve this issue, follow these troubleshooting recommendations:
Follow the instructions for deploying container images from other Google Cloud projects to ensure that your principals have the necessary permissions.
This issue might also occur if the project is in a VPC Service Controls perimeter with a restriction on the Cloud Storage API that prohibits requests from the Cloud Run service agent. To fix this:
Open Logs Explorer in the Google Cloud console. Do not use the Logs page inside the Cloud Run page.
Enter the following text in the query field:
protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog" severity=ERROR protoPayload.status.details.violations.type="VPC_SERVICE_CONTROLS" protoPayload.authenticationInfo.principalEmail="service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com"
If you see any log entries after you use this query, examine the log entries to determine whether you need to update your VPC Service Controls policies. They may indicate that you need to add service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com to an existing access policy.
Container import errors
The following error occurs when you try to deploy:
The service has encountered an error during container import. Please try again later. Resource readiness deadline exceeded.
To resolve this issue, rule out the following potential causes:
Ensure that the container's file system does not contain non-UTF-8 characters.
Some Windows-based Docker images use foreign layers. Although Container Registry doesn't throw an error when foreign layers are present, Cloud Run's control plane does not support them. To resolve this, you can try setting the --allow-nondistributable-artifacts flag in the Docker daemon.
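To look for non-UTF-8 characters, you can walk the file tree with raw byte paths and try to decode each name. This sketch assumes the offending characters are in file or directory names (it does not inspect file contents), and the script name in the usage comment is hypothetical.

```python
import os


def is_utf8(name: bytes) -> bool:
    """True if a raw file-system name decodes as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_paths(root: bytes) -> list:
    """Walk `root` with bytes paths and return entries whose names are not valid UTF-8."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad


if __name__ == "__main__":
    import sys

    # Hypothetical usage, e.g. on the directory you COPY into the image:
    #   python find_non_utf8.py ./app
    root = os.fsencode(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in non_utf8_paths(root):
        print(repr(path))
```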
Serving errors
This section lists issues that you might encounter with serving and provides suggestions for how to fix each of them.
HTTP 401: Client is not authenticated properly
The following error occurs during serving:
The request was not authorized to invoke this service
To resolve this issue:
If invoked by a service account, the audience claim (aud) of the Google-signed ID token must be set to one of the following:
- The Cloud Run URL of the receiving service, using the form https://service-xyz.run.app.
  - The Cloud Run service must require authentication.
  - The Cloud Run service can be invoked by the Cloud Run URL or through a load balancer URL.
- The Client ID of an OAuth 2.0 Client ID with type Web application, using the form nnn-xyz.apps.googleusercontent.com.
  - The Cloud Run service can be invoked through an HTTPS load balancer secured by IAP.
  - This is useful for a Google Cloud load balancer backed by multiple Cloud Run services in different regions.
- A configured custom audience, using the exact value provided. For example, if the custom audience is service.example.com, the audience claim (aud) value must also be service.example.com. If the custom audience is https://service.example.com, the audience claim value must also be https://service.example.com.
You can use the jwt.io tool to check the claims in a JWT.
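To check the aud claim locally, you can decode the token's payload segment yourself. This is a minimal sketch: it does not verify the signature, so use it only to inspect tokens, never to authorize requests. The exact string comparison mirrors the custom audience rule above.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (inspection only)."""
    payload = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token: str, expected: str) -> bool:
    """Exact comparison of the aud claim, which may be a string or a list."""
    aud = jwt_claims(token).get("aud")
    if isinstance(aud, list):
        return expected in aud
    return aud == expected
```

For example, a token whose aud is https://service.example.com does not match service.example.com, which is the mismatch that produces a 401.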
If the auth token is of an invalid format, a 401 error occurs. If the token is of a valid format and the IAM member used to generate the token is missing the run.routes.invoke permission, a 403 error occurs.
HTTP 403: Client is not authorized to invoke or call the service
The following error might or might not be in Cloud Logging with resource.type = "cloud_run_revision":
The request was not authenticated. Either allow unauthenticated invocations or set the proper Authorization header
The following error is present in the HTTP response returned to the client:
403 Forbidden Your client does not have permission to get URL from this server.
To resolve this issue when the resource.type = "cloud_run_revision" Cloud Logging error is present:
- If the service is meant to be invocable by anyone, update its IAM settings to make the service public.
- If the service is meant to be invocable only by certain identities, make sure that you invoke it with the proper authorization token.
  - If invoked by a developer or invoked by an end user: Ensure that the developer or user has the run.routes.invoke permission, which you can grant through the Cloud Run Admin (roles/run.admin) or Cloud Run Invoker (roles/run.invoker) role.
  - If invoked by a service account: Ensure that the service account has been granted the Cloud Run Invoker (roles/run.invoker) role on the Cloud Run service.
  - Calls that are missing an auth token, or whose auth token is of a valid format but was generated by an IAM member that lacks the run.routes.invoke permission, result in this 403 error.
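For service-to-service calls, a common pattern is to fetch a Google-signed ID token whose audience is the receiving service's Cloud Run URL and send it as a bearer token. This sketch assumes the `google-auth` package (with its `requests` transport) is installed and that application default credentials belong to a service account with roles/run.invoker. The URL is a placeholder, and because it is used as the audience, pass the service's base URL without a path.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach a Google-signed ID token as a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def invoke(url: str) -> bytes:
    """Call a private Cloud Run service, using its URL as the token audience."""
    # Imported here so build_request stays usable without google-auth installed.
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    token = google.oauth2.id_token.fetch_id_token(auth_request, url)
    with urllib.request.urlopen(build_request(url, token)) as response:
        return response.read()


if __name__ == "__main__":
    SERVICE_URL = "https://service-xyz.run.app"  # placeholder URL
    print(invoke(SERVICE_URL))
```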
To resolve this issue when the resource.type = "cloud_run_revision" Cloud Logging error is not present:
- A 403 status code can be returned when a service has ingress configured to All but the request was blocked by VPC Service Controls. See the next section on 404 errors for more information on troubleshooting VPC Service Controls denials.
HTTP 404: Not Found
The following issue occurs during serving:
You encounter an HTTP 404 error.
To resolve this issue:
Verify that the URL you are requesting is correct by checking the service detail page in the Cloud console or by running the following command:
gcloud run services describe SERVICE_NAME | grep URL
Inspect where your app logic might be returning 404 error codes. If your app is returning the 404, it will be visible in Cloud Logging.
Make sure your app does not start listening on its configured port before it is ready to receive requests.
Verify that the app does not return a 404 error code when you run it locally.
A 404 is returned when a Cloud Run service's ingress settings are set to "Internal" or "Internal and Cloud Load Balancing" and a request does not satisfy the specified network restriction. In this scenario, the request does not reach the container, and the 404 is not present in Cloud Logging with the following filter:
resource.type="cloud_run_revision"
log_name="projects/PROJECT_ID/logs/run.googleapis.com%2Frequests"
httpRequest.status=404
With the same ingress settings, the request might be blocked by VPC Service Controls based on the caller's context, including project and IP address. To check for a VPC Service Controls policy violation:
Open Logs Explorer in the Google Cloud console (not the Logs page for Cloud Run).
Enter the following text in the query field:
resource.type="audited_resource" log_name="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy" resource.labels.method="run.googleapis.com/HttpIngress"
If you see any log entries after you use this query, examine the log entries to determine whether or not you need to update your VPC Service Controls policies.
HTTP 429: No available container instances
The following error occurs during serving:
HTTP 429 The request was aborted because there was no available instance. The Cloud Run service might have reached its maximum container instance limit or the service was otherwise not able to scale to incoming requests. This might be caused by a sudden increase in traffic, a long container startup time or a long request processing time.
To resolve this issue, check the "Container instance count" metric for your service and consider increasing the maximum number of instances if your usage is nearing that limit. See "max instance" settings, and if you need more instances, request a quota increase.
HTTP 500: Cloud Run couldn't manage the rate of traffic
The following error occurs during serving and can also occur when the service has not reached its maximum container instance limit:
HTTP 500 The request was aborted because there was no available instance
This error can be caused by one of the following:
- A sudden, large increase in traffic.
- A long cold start time.
- A long request processing time, or a sudden increase in request processing time.
- The service reaching its maximum container instance limit (HTTP 429).
- Transient factors attributed to the Cloud Run service.
To resolve this issue, address the previously listed issues.
In addition to fixing these issues, as a workaround you can implement exponential backoff and retries for requests that the client must not drop.
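The retry workaround can be sketched as exponential backoff with full jitter. This is a minimal sketch, assuming a caller-supplied `send` function (hypothetical) that performs one request and returns its HTTP status code; the retryable status set and the delay values are assumptions to tune. Retry only requests that are safe to repeat, otherwise a retried request might be applied twice.

```python
import random
import time
from typing import Callable

# Status codes treated as retryable in this sketch (an assumption).
RETRYABLE = {429, 500, 503}


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Between attempts, waits a random time in [0, min(max_delay, base_delay * 2**n)]
    ("full jitter") so that many clients do not retry in lockstep.
    """
    attempt = 0
    while True:
        status = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
```

Injecting `sleep` keeps the function testable and lets an async client substitute its own wait.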
Note that a short and sudden increase in traffic or request processing time might only be visible in Cloud Monitoring if you zoom in to 10 second resolution.
When the root cause of the issue is a period of heightened transient errors attributable solely to Cloud Run, you can contact Support.
HTTP 500 / HTTP 503: Container instances are exceeding memory limits
The following error occurs during serving:
In Cloud Logging:
While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory.
To resolve this issue:
- Determine if your container instances are exceeding the available memory. Look for related errors in the varlog/system logs.
- If the instances are exceeding the available memory, consider increasing the memory limit.
Note that in Cloud Run, files written to the local filesystem count towards the available memory. This also includes any log files that are written to locations other than /var/log/* and /dev/log.
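Because files written to the local filesystem count towards memory, it can help to measure how much data the directories your app writes to are holding. This is a hypothetical diagnostic; the /tmp default is an assumption, so point it at whatever paths your app writes to.

```python
import os


def tree_size_bytes(root: str) -> int:
    """Total size in bytes of regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file may have been deleted while walking.
                pass
    return total


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    print(f"{root}: {tree_size_bytes(root) / 2**20:.1f} MiB")
```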
HTTP 503: Malformed response or container instance connection issue
The following error occurs during serving:
HTTP 503 The request failed because either the HTTP response was malformed or connection to the instance had an error.
To resolve this issue, follow these troubleshooting recommendations:
Check Cloud Logging: Use Cloud Logging to look for out-of-memory errors in the logs. If you see error messages regarding container instances exceeding memory limits, follow the recommendations to resolve this issue.
App-level timeouts: If requests are terminating with error code 503 before reaching the request timeout set in Cloud Run, you might need to update the request timeout setting for your language framework:
- Node.js developers might need to update the server.timeout property via server.setTimeout (use server.setTimeout(0) to achieve an unlimited timeout), depending on the version you are using.
- Python developers who see [CRITICAL] WORKER TIMEOUT in their logs need to update Gunicorn's default timeout.
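For Gunicorn, one option is a gunicorn.conf.py settings file, which Gunicorn reads as plain Python. This sketch disables Gunicorn's worker timeout so that Cloud Run's request timeout applies instead, and binds to the PORT that Cloud Run provides. The worker and thread counts and the `main:app` module name are illustrative assumptions, not recommendations.

```python
# gunicorn.conf.py: read from the working directory by default,
# or explicitly with `gunicorn -c gunicorn.conf.py main:app`.
import os

# Listen on all interfaces, on the port Cloud Run provides (8080 as a local fallback).
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# Illustrative values only; size these to your CPU and concurrency settings.
workers = 1
threads = 8

# 0 disables Gunicorn's worker timeout (default 30 seconds), which otherwise
# kills slow workers with "[CRITICAL] WORKER TIMEOUT" before Cloud Run's
# request timeout is reached.
timeout = 0
```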
Downstream network bottleneck: In some instances, a 503 error code can result indirectly from a downstream network bottleneck, such as during load testing. For example, if your service routes traffic through a Serverless VPC Access connector, ensure that the connector has not exceeded its throughput threshold by following these steps:
Open Serverless VPC Access in the Google Cloud console.
Check for any red bars in the throughput chart histogram. If there is a red bar, consider increasing the maximum instances or instance type your connector uses. Alternatively, compress traffic sent through a Serverless VPC Access connector.
Inbound request limit to a single container: There is a known issue where high request rates per container result in this 503 error. If a container instance receives more than 800 requests per second, the available TCP sockets can be exhausted. To remedy this, try any of the following:
Turn on HTTP/2 for your service and make any needed changes to your service to support HTTP/2.
Avoid sending more than 800 requests per second to a single container instance by lowering the configured concurrency. Use the following equation to estimate the request rate per container instance:
requests/sec/container_instance ~= concurrency * (1sec / median_request_latency)
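Applying the equation above, a quick calculation shows when a configuration crosses the 800 requests-per-second threshold, and inverts the equation to estimate a concurrency ceiling. The function names are hypothetical and the result is only an estimate, because real latency varies.

```python
import math

SOCKET_LIMIT_RPS = 800  # per-instance rate cited above


def estimated_rps(concurrency: int, median_latency_s: float) -> float:
    """requests/sec/instance ~= concurrency * (1 s / median request latency)."""
    return concurrency * (1.0 / median_latency_s)


def max_concurrency(median_latency_s: float, limit_rps: float = SOCKET_LIMIT_RPS) -> int:
    """Largest concurrency whose estimated rate stays at or below `limit_rps`."""
    return max(1, math.floor(limit_rps * median_latency_s))
```

For example, a concurrency of 80 with a median latency of 50 ms gives an estimate of about 1,600 requests per second per instance; lowering concurrency to about 40 brings the estimate back to 800.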
HTTP 503: Unable to process some requests due to high concurrency setting
The following error occurs during serving:
HTTP 503 The Cloud Run service probably has reached its maximum container instance limit. Consider increasing this limit.
This issue occurs when your container instances are using a lot of CPU to process requests; as a result, the container instances cannot process all of the requests, so some requests return a 503 error code.
To resolve this issue, try one or more of the following:
Increase the maximum number of container instances for your service.
Lower the service's concurrency. Refer to setting concurrency for more detailed instructions.
HTTP 504: Gateway timeout error
HTTP 504 The request has been terminated because it has reached the maximum request timeout.
If your service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error, as documented in the container runtime contract.
To troubleshoot this issue, try one or more of the following:
Instrument logging and tracing to understand where your app is spending time before exceeding your configured request timeout.
Outbound connections are reset occasionally, due to infrastructure updates. If your application reuses long-lived connections, then we recommend that you configure your application to re-establish connections to avoid the reuse of a dead connection.
- Depending on your app's logic or error handling, a 504 error might be a signal that your application is trying to reuse a dead connection and the request blocks until your configured request timeout.
- You can use a liveness probe to help terminate an instance that returns persistent errors.
Out-of-memory errors that happen inside the app's code, for example java.lang.OutOfMemoryError, do not necessarily terminate a container instance. If memory usage does not exceed the container memory limit, the instance is not terminated. Depending on how the app handles app-level out-of-memory errors, requests might hang until they exceed your configured request timeout.
- If you want the container instance to terminate in this scenario, configure your app-level memory limit to be greater than your container memory limit.
- You can use a liveness probe to help terminate an instance that returns persistent errors.
Connection reset by peer
One of the following errors occurs during serving:
Connection reset by peer
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
grpc.StatusRuntimeException: UNAVAILABLE: io exception
psycopg.OperationalError: the connection is closed
ECONNRESET
This error occurs when an application has an established TCP connection with a peer across the network and that peer unexpectedly closes the connection.
To resolve this issue:
If you are trying to perform background work with CPU throttling, try using the "CPU is always allocated" CPU allocation setting.
Ensure that you are within the outbound request timeouts. If your application keeps any connection idle beyond these thresholds, the gateway reaps the connection.
By default, the TCP socket option keepalive is disabled for Cloud Run. There is no direct way to configure the keepalive option in Cloud Run at the service level, but you can enable the keepalive option for each socket connection by providing the correct socket options when opening a new TCP socket connection, depending on the client library that you are using for this connection in your application.
Occasionally, outbound connections will be reset due to infrastructure updates. If your application reuses long-lived connections, then we recommend that you configure your application to re-establish connections to avoid the reuse of a dead connection.
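As an illustration of setting keepalive per socket, the sketch below uses Python's standard socket module. The idle, interval, and probe-count values are assumptions to tune, and the TCP_KEEP* options are Linux-specific, so the code sets them only where the platform exposes them. Most client libraries (database drivers, HTTP clients) expose their own keepalive settings, which you would normally use instead of raw sockets.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60, interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket; tuning options are set only if available."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux: seconds idle before the first probe, seconds between probes,
    # and unanswered probes before the connection is considered dead.
    for option, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        if hasattr(socket, option):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, option), value)


def connect_with_keepalive(host: str, port: int, timeout_s: float = 10.0) -> socket.socket:
    """Open a new TCP connection with keepalive enabled."""
    sock = socket.create_connection((host, port), timeout=timeout_s)
    enable_keepalive(sock)
    return sock
```

If a call on a reused connection fails with ConnectionResetError, close that socket and open a new one with `connect_with_keepalive` rather than retrying on the dead connection.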
Identity token signature redacted by Google
The following error occurs during serving:
SIGNATURE_REMOVED_BY_GOOGLE
This can occur during development and testing in the following circumstance:
- A user logs in using Google Cloud CLI or Cloud Shell.
- The user generates an ID token using gcloud commands.
- The user tries to use the ID token to invoke a non-public Cloud Run service.
This is by design. Google removes the token signature due to security concerns to prevent any non-public Cloud Run service from replaying ID tokens that are generated in this manner.
To resolve this issue, invoke your private service with a new ID token. Refer to testing authentication in your service for more information.
OpenBLAS warning in logs
If you use OpenBLAS-based libraries such as NumPy with the first generation execution environment, you might see the following warning in your logs:
OpenBLAS WARNING - could not determine the L2 cache size on this system,
assuming 256k
This is just a warning and it doesn't impact your service. This warning results because the container sandbox used by the first generation execution environment does not expose low level hardware features. You can optionally switch to the second generation execution environment if you don't want to have these warnings in your logs.
Spark fails when obtaining IP address of machine to bind to
One of the following errors occurs during serving:
assertion failed: Expected hostname (not IP) but got <IPv6 ADDRESS>
assertion failed: Expected hostname or IPv6 IP enclosed in [] but got <IPv6 ADDRESS>
To resolve this issue, set ENV SPARK_LOCAL_IP="127.0.0.1" in your Dockerfile. In Cloud Run, if the variable SPARK_LOCAL_IP is not set, it defaults to its IPv6 counterpart instead of localhost. Note that RUN export SPARK_LOCAL_IP="127.0.0.1" only sets the variable for that build step, so it is not available at runtime and Spark acts as if it was not set.
Mapping custom domains
Custom domain is stuck in certificate provisioning state
One of the following errors occurs when you try to map a custom domain:
The domain is available over HTTP. Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin and to accept HTTP traffic.
Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin.
To resolve this issue:
- Wait at least 24 hours. Provisioning the SSL certificate usually takes about 15 minutes, but it can take up to 24 hours.
Verify that you've properly updated your DNS records at your domain registrar using the Google Admin Toolbox dig tool.
The DNS records in your domain registrar need to match what the Google Cloud console prompts you to add.
Confirm that the root of the domain is verified under your account using one of the following methods:
- Follow the instructions for adding verified domain owners and check that your account is listed as a Verified Owner.
- Use the Search Console.
Verify that the certificate for the domain is not expired. To find the expiry bounds, use the following command:
echo | openssl s_client -servername 'ROOT_DOMAIN' -connect 'ROOT_DOMAIN:443' 2>/dev/null | openssl x509 -startdate -enddate -noout
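If you script this check, the notAfter= line printed by the command above uses the date format that Python's `ssl.cert_time_to_seconds` parses. The sketch below is a hypothetical helper that reads that output from standard input and reports whether the certificate has expired.

```python
import ssl
import time
from typing import Optional


def not_after_seconds(openssl_output: str) -> int:
    """Extract the notAfter date from `openssl x509 -enddate` output as a Unix timestamp."""
    for line in openssl_output.splitlines():
        if line.startswith("notAfter="):
            return ssl.cert_time_to_seconds(line[len("notAfter="):].strip())
    raise ValueError("no notAfter= line found")


def is_expired(openssl_output: str, now: Optional[float] = None) -> bool:
    """True if the certificate's notAfter time is in the past."""
    current = time.time() if now is None else now
    return not_after_seconds(openssl_output) < current


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe the openssl command above into this script.
    output = sys.stdin.read()
    expiry = time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(not_after_seconds(output)))
    print(("EXPIRED" if is_expired(output) else "valid") + "; notAfter = " + expiry)
```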
Admin API
The feature is not supported in the declared launch stage
The following error occurs when you call the Cloud Run Admin API:
The feature is not supported in the declared launch stage
This error occurs when you call the Cloud Run Admin API directly and use a beta feature without specifying a launch stage annotation.
To resolve this issue, annotate the resource with a run.googleapis.com/launch-stage value of BETA in the request if any beta feature is used.
The following example adds a launch stage annotation to a service request:
kind: Service
metadata:
  annotations:
    run.googleapis.com/launch-stage: BETA
Troubleshooting network file system issues
Learn more about Using network file systems.
Cannot access files using NFS
Error | Suggested remedy
---|---
mount.nfs: Protocol not supported | Some base images, for example debian and adoptopenjdk/openjdk11, are missing the nfs-kernel-server dependency. Install it in your container image.
mount.nfs: Connection timed out | If the connection times out, make sure you are providing the correct IP address of the Filestore instance.
mount.nfs: access denied by server while mounting IP_ADDRESS:/FILESHARE | If access was denied by the server, check that the file share name is correct.
Cannot access files using Cloud Storage FUSE
See the Cloud Storage FUSE troubleshooting guide.