Introduction to Cloud Run troubleshooting

This page describes common troubleshooting strategies for Cloud Run errors. Personalized Service Health publishes all Cloud Run incidents that stem from the underlying Google Cloud infrastructure, so you can use it to identify Google Cloud service disruptions that affect your projects. You should also consider setting up alerts on Personalized Service Health events. For information about incidents affecting all Google Cloud services, see the Google Cloud Service Health dashboard.

To resolve issues specific to your Cloud Run resource, see the relevant sections of the Cloud Run troubleshooting guide.

Cloud Run troubleshooting strategies

The following sections explain how you can apply general troubleshooting strategies to resolve your error. If you continue to encounter errors even after following the steps in the troubleshooting guide, see What's next.

Output good logs using Cloud Logging

Troubleshooting your Cloud Run resource is easier if you have good logs for debugging. You should write logs in a way that correlates your container logs with a request log.

With correlated logs, you can identify the request that needs further analysis, find the request trace, and analyze the root cause of the issue. For more information on writing logs, see Write container logs.
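As a minimal sketch of what correlation can look like, the following Python function builds a JSON log line that carries the trace ID of the incoming request in the `logging.googleapis.com/trace` field, taken from the `X-Cloud-Trace-Context` request header. The function name `build_log_entry` and the project ID `my-project` are illustrative assumptions, not part of any Cloud Run API; see Write container logs for the supported approach in your language.

```python
import json


def build_log_entry(message, severity, trace_header, project_id):
    """Return a JSON log line that carries the request's trace ID."""
    entry = {"message": message, "severity": severity}
    # Header format: TRACE_ID/SPAN_ID;o=OPTIONS. Only the trace ID is needed.
    trace_id = trace_header.split("/", 1)[0] if trace_header else ""
    if trace_id and project_id:
        entry["logging.googleapis.com/trace"] = (
            f"projects/{project_id}/traces/{trace_id}"
        )
    return json.dumps(entry)


if __name__ == "__main__":
    # Hypothetical values; in a real handler, read the header from the request.
    print(build_log_entry("cache miss", "WARNING",
                          "105445aa7843bc8bf206b12000100000/1;o=1", "my-project"))
```

Writing one such line per event to standard output is usually enough for the container log entry to appear alongside the request log that shares the same trace.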

Investigate instances using the Logs Explorer

Each request log in Cloud Run contains an instanceId field that identifies the instance that handled the request. Depending on the concurrency value you specify, a single instance can handle multiple requests at the same time.

When multiple instances emit logs at once, filter the logs by instance to identify the sequence of requests that led up to an instance crash.

Filtering by instance lets you debug specific performance issues related to cold starts or increased latency. These issues can also be tied to variables declared in global scope whose values are reused across subsequent or concurrent requests. For example, you might create a single connection pool as a global object for the instance, and then use it within multiple requests.
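The following sketch illustrates the global-object pattern in Python. `ConnectionPool` is a hypothetical stand-in for a real client or database pool, and the pool size of 5 is arbitrary. The point is only that the object is created once per instance and then shared by every request that the instance handles, so its state carries over between requests.

```python
import threading

_pool = None
_pool_lock = threading.Lock()


class ConnectionPool:
    """Stand-in for a real client or database pool (hypothetical)."""

    def __init__(self, size):
        self.size = size


def get_pool():
    """Create the pool on first use, then return the same object every time."""
    global _pool
    if _pool is None:
        with _pool_lock:
            if _pool is None:  # re-check after acquiring the lock
                _pool = ConnectionPool(size=5)
    return _pool


def handle_request():
    # Every request served by this instance sees the same pool object.
    return id(get_pool())
```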

To filter a specific instance in the Logs Explorer, follow these steps:

1. In the Google Cloud console, go to the Logs Explorer page:

  Go to Logs Explorer
2. Select an existing Google Cloud project at the top of the page, or create a new project.
3. Select the resource Cloud Run Revision for a service, or Cloud Run Job for a job.
4. Expand a log entry to filter by a specific instance.
5. Click the instance ID value, and select Show matching entries.
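If you prefer to type the query yourself, the steps above roughly correspond to a filter on the resource type and the instance ID. The helper below is a sketch that assembles such a query string. It assumes that the instance ID is exposed on log entries as `labels.instanceId`, and the function name `instance_filter` is hypothetical. Verify the field path against an expanded log entry in your own project.

```python
def instance_filter(instance_id, resource_type="cloud_run_revision"):
    """Build a Logs Explorer query that matches one instance's entries."""
    if resource_type not in ("cloud_run_revision", "cloud_run_job"):
        raise ValueError(f"unexpected resource type: {resource_type}")
    # Separate lines in a Logs Explorer query are combined with AND.
    return "\n".join([
        f'resource.type="{resource_type}"',
        f'labels.instanceId="{instance_id}"',
    ])


if __name__ == "__main__":
    print(instance_filter("INSTANCE_ID"))  # replace with a real instance ID
```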

While you investigate instances, you can use Gemini Cloud Assist Investigations to gain additional insights into your logs. For more information about different ways to initiate an investigation by using the Logs Explorer, see Troubleshoot issues with Gemini Cloud Assist Investigations in the Gemini documentation.

Resolve unexpected request latencies

If you encounter issues with latency, do the following:

Check if the latency is affecting all requests to your Cloud Run resource or only a small percentage. Cloud Run is automatically integrated with Cloud Monitoring with no setup or configuration required.

To see individual request latency metrics, follow these steps:
1. In the Google Cloud console, go to the Cloud Run page:
  
  Go to Cloud Run
2. Select the service or job from the list.
3. Click the METRICS tab to show the Request latencies dashboard.

To view latency metrics in Cloud Monitoring, select Cloud Run Revision > Request_latencies > Request latency from the Metrics list.

For a list of all available Cloud Run metrics and more in-depth details, see Google Cloud metrics in Cloud Monitoring.

Identify the request with high latency to understand the source of the latency. You can use Cloud Trace or Cloud Logging to see how long a particular request took.

To identify requests with high latency using Cloud Logging, apply the traceSampled=true filter to correlate logs in Cloud Logging with traces in Cloud Trace. For more information, see Integrate with Cloud Logging.

Dependencies, such as requests to other services, sometimes cause latency issues. To identify such requests, add explicit logging around them. Without such logs, latency caused by a dependency can look like it originates in your Cloud Run service.
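One way to add explicit logging around a dependency is to wrap each outbound call in a timer that writes a structured log line. The context manager below is a minimal sketch: the name `timed_dependency`, the field names `dependency`, `status`, and `elapsed_ms`, and the dependency name `inventory-api` are all assumptions chosen for illustration.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed_dependency(name, out=sys.stdout):
    """Log how long a call to a dependency took, even if it raises."""
    start = time.monotonic()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        elapsed_ms = round((time.monotonic() - start) * 1000, 1)
        out.write(json.dumps({
            "severity": "INFO" if status == "ok" else "ERROR",
            "message": f"dependency call: {name}",
            "dependency": name,
            "status": status,
            "elapsed_ms": elapsed_ms,
        }) + "\n")


if __name__ == "__main__":
    with timed_dependency("inventory-api"):  # hypothetical dependency name
        time.sleep(0.05)  # stands in for the outbound request
```

Comparing `elapsed_ms` with the total request latency shows how much of the time was spent waiting on the dependency rather than in your own code.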

Additionally, evaluate latency spikes in the context of the chosen time window. A spike's significance is relative; a spike that looks large in a small window might be negligible in a larger window, and vice versa. The time window therefore strongly affects how you interpret latency data.
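A small worked example makes the time-window effect concrete. The numbers below are invented for illustration: one hour of per-minute latency samples at a 100 ms baseline, with a five-minute burst at 1000 ms.

```python
def window_mean(samples, start, end):
    """Mean of samples[start:end]; each sample is one minute's latency in ms."""
    window = samples[start:end]
    return sum(window) / len(window)


# Hypothetical per-minute latencies: 100 ms baseline, 5-minute burst of 1000 ms.
latencies_ms = [100] * 30 + [1000] * 5 + [100] * 25

burst_only = window_mean(latencies_ms, 30, 35)  # 5-minute window
whole_hour = window_mean(latencies_ms, 0, 60)   # 60-minute window

print(f"5-minute window mean: {burst_only:.0f} ms")   # 1000 ms
print(f"60-minute window mean: {whole_hour:.0f} ms")  # 175 ms
```

The same burst is a tenfold increase in the five-minute view but raises the hourly mean from 100 ms to only 175 ms, which is why the window you choose changes the conclusion you draw.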
Try increasing the number of minimum instances to reduce latency for incoming requests and avoid cold starts. You should also consider modifying your source code and adjusting the scaling settings to limit the number of connections to a backing service.

For more information, see Optimizing performance.
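When you limit connections to a backing service, a rough sizing calculation can help you pick a maximum instance count. The sketch below assumes that each instance opens at most a fixed-size connection pool and that the backing service enforces a hard connection limit. The function and parameter names are hypothetical, and the numbers in the usage example are made up.

```python
def max_instances_for_backend(backend_connection_limit, pool_size_per_instance,
                              reserved_connections=0):
    """Largest instance count whose pools together fit within the backend limit."""
    if pool_size_per_instance <= 0:
        raise ValueError("pool_size_per_instance must be positive")
    usable = backend_connection_limit - reserved_connections
    return max(usable // pool_size_per_instance, 0)


if __name__ == "__main__":
    # Example: 100 allowed connections, 10 kept free for admin tools,
    # and a pool of 5 connections per instance.
    print(max_instances_for_backend(100, 5, reserved_connections=10))  # 18
```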

Troubleshoot connectivity issues

If your Cloud Run service is experiencing connectivity issues, consider these strategies and tools to diagnose the problem:

PCAP sidecar: For deeper network-level analysis, deploy a PCAP sidecar alongside your Cloud Run service. This sidecar container performs a packet capture using tcpdump within the same network namespace. The sidecar is decoupled from the main ingress container, so the main container doesn't require any modifications for the packet capture. Sidecars also use their own resources, which prevents tcpdump from competing for the resources you allocate to the primary service.
Network intelligence and connectivity tests for Cloud Run revisions and Cloud Run functions: Perform automated checks on the network path between your Cloud Run resource and an endpoint. This helps you find misconfigurations that might block traffic to or from your Cloud Run resource when connecting to a VM instance, an IP address, or a Google-managed service.
Review logs for your Cloud Run resource: Logs show error messages about connection problems, such as failures, timeouts, or refused connections. These logs often reveal if the connection issue is with your application or the network.

Troubleshoot with Gemini assistance

You can use Gemini Cloud Assist chat to analyze logs and troubleshoot errors. Its log analysis capabilities can help you pinpoint and resolve errors more quickly.

To use Gemini Cloud Assist from the Google Cloud console, do the following:

1. Ensure that Gemini Cloud Assist is set up for your Google Cloud user account and project.
2. Set up your Cloud Run development environment in your Google Cloud project and ensure you have the appropriate deployment permissions.
3. Go to the Cloud Run page in the Google Cloud console.

  Go to Cloud Run
4. In the console toolbar, select a Google Cloud project. Use a project associated with a project ID you submitted after you were granted access to Gemini Cloud Assist.
5. Click Open or close Gemini AI chat (the spark icon).

  The Gemini panel opens.
6. If necessary, click Accept if you agree to the terms.
7. If you have a question about a specific application, provide context by going to the page that shows your resource before asking your question. When generating a response, Gemini includes information about the current console page and project.
8. Enter a prompt in the Gemini panel.

The following table provides some example prompts for using Gemini Cloud Assist with Cloud Run.

Prompt	Type of response
"Can you explain this error message I'm seeing in my Cloud Run container logs?"	Without a specific error message in the prompt, the output provides troubleshooting guidance for common Cloud Run container log error messages.
"Can you explain this error message I'm seeing in my logs for my Cloud Run service? `HTTP 429 The request was aborted because there was no available instance. The Cloud Run service might have reached its maximum container instance limit or the service was otherwise not able to scale to incoming requests. This might be caused by a sudden increase in traffic, a long container startup time or a long request processing time.`"	Explanation of the Cloud Run error message and how to address the error.
"How do I fix the following error message when deploying my Cloud Run service? `HTTP 404: Not found`"	Common causes of the error and how to troubleshoot the error.
"My Cloud Run service keeps crashing. What could be causing this?"	Approach for investigating the cause and how to address the problem.
"How do I know if a 5XX error surfaced in the logs was due to a Google infrastructure service outage?"	Steps for determining the cause of the 5XX error.
"How do I identify the cause of this error: `com.google.apps.framework.request.BadRequestException Project PROJECT has serving status SYSTEM_DISABLED and cannot be modified`?"	Suggests that the Google Cloud project has been administratively disabled at the system level. Provides steps to investigate the cause further.
"Why can I access my Cloud Run service from a browser if I have set `ingress=internal`?"	An explanation of the expected behavior when configuring the `ingress` setting to `internal`. Includes other scenarios where access might seem to work from your browser.

For more details, see the following resources:

Learn how to write better prompts.
Learn how to use the Gemini Cloud Assist panel.
Read Use Gemini for AI assistance and development.
Learn how Gemini for Google Cloud uses your data.

Use Gemini Cloud Assist Investigations

In addition to interactive chat, Gemini Cloud Assist can perform more automated, in-depth analysis through Gemini Cloud Assist Investigations. This feature is integrated directly into workflows such as the Logs Explorer and is designed for root-cause analysis.

When you initiate an investigation from an error or a specific resource, Gemini Cloud Assist analyzes logs, configurations, and metrics. It uses this data to produce ranked observations and hypotheses about probable root causes, and then provides you with recommended next steps. If you have a support package, you can also transfer the investigation results to a Google Cloud support case, providing additional context that can help resolve your case faster.

For more information about different ways to initiate an investigation, see Troubleshoot issues with Gemini Cloud Assist Investigations in the Gemini documentation.

What's next

If you can't find a solution to your problem in the Cloud Run documentation, try the following:

Open a support case by contacting Cloud Customer Care.
Get support from the community by asking questions on Stack Overflow or searching for similar issues with the google-cloud-run tag.
Open bugs or feature requests by using the public issue tracker.