This page describes the common troubleshooting strategies for Cloud Run errors.
All Cloud Run incidents that stem from the underlying Google Cloud infrastructure are published to Personalized Service Health, which identifies Google Cloud service disruptions relevant to your projects. You should also consider setting up alerts on Personalized Service Health events. For information about incidents affecting all Google Cloud services, see the Google Cloud Service Health dashboard.
See the following sections in this Cloud Run troubleshooting guide to resolve issues related to your Cloud Run resources.
Cloud Run troubleshooting strategies
The following sections explain how you can apply general troubleshooting strategies to resolve your error. If you continue to encounter errors even after following the steps in the troubleshooting guide, contact support.
Output good logs using Cloud Logging
Troubleshooting your Cloud Run resource is easier if you have good logs for debugging. Write logs in a way that correlates your container logs with the corresponding request log.
With correlated logs, you can identify the request that needs further analysis, find the request trace, and analyze the root cause of the issue. For more information on writing logs, see Write container logs.
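For example, the following minimal sketch of this pattern, assuming a Flask service and a project ID supplied through an environment variable, writes structured JSON log lines to stdout with the logging.googleapis.com/trace field so that Cloud Logging can correlate each container log with its request log:

```python
import json
import os
import sys

from flask import Flask, request

app = Flask(__name__)

# Assumption for this sketch: the project ID is provided through an
# environment variable that you set on the service.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "my-project")


def log_with_trace(message, severity="INFO"):
    """Write a structured log line to stdout so Cloud Logging can
    correlate it with the request log through the trace field."""
    entry = {"message": message, "severity": severity}
    # Cloud Run passes the trace context in the X-Cloud-Trace-Context
    # header, formatted as TRACE_ID/SPAN_ID;o=OPTIONS.
    trace_header = request.headers.get("X-Cloud-Trace-Context")
    if trace_header:
        trace_id = trace_header.split("/")[0]
        entry["logging.googleapis.com/trace"] = f"projects/{PROJECT_ID}/traces/{trace_id}"
    print(json.dumps(entry), file=sys.stdout, flush=True)


@app.route("/")
def index():
    log_with_trace("Handling request")
    return "OK"
```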
Investigate instances using the Logs Explorer
Each request log in Cloud Run contains an instanceId field that identifies the instance that handles your request. Depending on the concurrency value you specify, a single instance can handle multiple requests at the same time.
When multiple instances emit logs at once, filter by a single instance to identify the sequence of requests that led up to an instance crash.
Filtering by an instance also lets you debug performance issues related to cold starts or increased latency. These issues can also stem from variables declared in global scope whose values are reused across subsequent or concurrent requests, for example when you create a single connection pool as a global object for the instance and then use it within multiple requests.
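For illustration, the following minimal sketch shows that global-scope pattern: a connection pool created once per instance and shared by every request the instance handles. The psycopg2 pool and the DATABASE_URL environment variable are assumptions for this example, not part of Cloud Run.

```python
# Illustrative sketch only: psycopg2 and DATABASE_URL are assumptions.
import os

from psycopg2 import pool

# Created once when the instance starts (cold start), then reused by all
# requests, including concurrent ones, handled by this instance.
db_pool = pool.ThreadedConnectionPool(
    minconn=1,
    maxconn=10,
    dsn=os.environ.get("DATABASE_URL", "postgresql://localhost/mydb"),
)


def handle_request():
    # Each request borrows a connection from the shared pool and returns it.
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()
    finally:
        db_pool.putconn(conn)
```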
To filter a specific instance in the Logs Explorer, follow these steps:
1. In the Google Cloud console, go to the Logs Explorer page.
2. Select an existing Google Cloud project at the top of the page, or create a new project.
3. Select the resource Cloud Run Revision for a service, or Cloud Run Job for a job.
4. Expand a log entry to filter by a specific instance.
5. Click the instance ID value, and select Show matching entries.
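If you prefer to query logs programmatically instead of through the console, a sketch along these lines uses the google-cloud-logging client with an equivalent filter. The project ID and service name are placeholders, and it assumes the instance ID appears in the labels.instanceId field of the request log, as shown in the expanded log entry.

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project ID

# Roughly equivalent to clicking the instance ID value and selecting
# "Show matching entries" in the Logs Explorer.
log_filter = (
    'resource.type="cloud_run_revision" '
    'resource.labels.service_name="my-service" '  # placeholder service name
    'labels.instanceId="INSTANCE_ID"'  # paste the instance ID you copied
)

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    print(entry.timestamp, entry.payload)
```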
Resolve unexpected request latencies
If you encounter issues with latency, do the following:
Check if the latency is affecting all requests to your Cloud Run resource or only a small percentage. Cloud Run is automatically integrated with Cloud Monitoring with no setup or configuration required.
To see individual request latency metrics, follow these steps:
1. In the Google Cloud console, go to the Cloud Run page.
2. Select the service or job from the list.
3. Click the METRICS tab to show the Request latencies dashboard.
To view latency metrics in Cloud Monitoring, select Cloud Run Revision > Request_latencies > Request latency from the Metrics list.
For a list of all available Cloud Run metrics and more in-depth details, see Google Cloud metrics in Cloud Monitoring.
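To read the same latency data programmatically, a sketch along the following lines queries the run.googleapis.com/request_latencies metric for the last hour. It assumes the google-cloud-monitoring client library, and the project ID and service name are placeholders.

```python
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"  # placeholder project ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},  # the last hour
    }
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type="run.googleapis.com/request_latencies" '
            'resource.labels.service_name="my-service"'  # placeholder service
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        # Request latencies are distribution values; print the mean.
        print(point.interval.end_time, point.value.distribution_value.mean)
```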
Identify the requests with high latency to understand the source of the latency. You can use Cloud Trace or Cloud Logging to see how long a particular request took.
To identify requests with high latency using Cloud Logging, apply the traceSampled=true filter to correlate logs in Cloud Logging with traces in Cloud Trace. For more information, see Integrate with Cloud Logging.
Sometimes dependencies, such as requests to other services, cause latency issues. To identify such requests, add explicit logging for those outbound calls. If you don't log them, their latency can appear to originate from your Cloud Run service itself.
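For example, a minimal sketch of such explicit logging times an outbound dependency call and writes its duration as a structured log line; the requests library and the URL are assumptions for illustration:

```python
import json
import sys
import time

import requests  # assumed HTTP client for the downstream call


def call_backend(url):
    """Call a downstream service and log how long it took, so dependency
    latency is visible in Cloud Logging rather than blending into the
    service's own request latency."""
    start = time.monotonic()
    try:
        return requests.get(url, timeout=10)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        print(
            json.dumps(
                {
                    "severity": "INFO",
                    "message": f"Downstream call to {url} took {elapsed_ms:.1f} ms",
                    "component": "dependency-latency",
                }
            ),
            file=sys.stdout,
            flush=True,
        )
```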
Additionally, evaluate latency spikes in the context of the chosen time window. A spike's significance is relative: a spike that looks large in a small window might be negligible over a larger window, and vice versa. The time window therefore significantly affects how you interpret latency data.
To reduce latency for incoming requests and avoid cold starts, try increasing the minimum number of instances. Also consider modifying your source code and adjusting the scaling settings to limit the number of connections to a backing service.
For more information, see Optimizing performance.