In Cloud Run, each revision is automatically scaled to the number of container instances needed to handle all incoming requests.
When more instances are processing requests, more CPU and memory will be used, likely resulting in higher costs. To help you manage this, Cloud Run provides a concurrency setting that specifies the maximum number of requests that can be sent at the same time to a given container instance.
By default, each container instance in Cloud Run can receive up to 80 requests at the same time, which is also the maximum. You can change the concurrency setting at any time. By comparison, Cloud Functions has a fixed concurrency of 1.
The following diagram shows how the concurrency setting affects the number of container instances needed to handle incoming concurrent requests.
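As a rough illustration of that relationship, the sketch below groups a batch of simultaneous requests into instances, with at most `concurrency` requests per instance. It is a simplified model rather than Cloud Run's actual autoscaling logic. The function name `pack_requests`, the request IDs, and the settings tried are hypothetical.

```python
def pack_requests(request_ids, concurrency):
    """Group simultaneous requests into instances.

    Simplified model: each instance takes at most `concurrency` requests.
    Real autoscaling also weighs other signals, which this sketch ignores.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return [
        request_ids[i:i + concurrency]
        for i in range(0, len(request_ids), concurrency)
    ]


# Six hypothetical requests arriving at the same moment.
requests = ["r1", "r2", "r3", "r4", "r5", "r6"]

for setting in (1, 2, 80):  # illustrative concurrency settings
    instances = pack_requests(requests, setting)
    print(f"concurrency={setting}: {len(instances)} instance(s) -> {instances}")
```

In this model, a setting of 1 needs one instance per request, while a setting at or above the number of simultaneous requests needs only a single instance.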
When to limit concurrency to one request at a time
You can limit concurrency so that only one request at a time will be sent to each running container instance. You should consider doing this in cases where:
- Each request uses most of the available CPU or memory.
- Your container image is not designed for handling multiple requests at the same time, for example, if your container relies on global state.
Note that a concurrency of 1 is likely to negatively affect scaling performance, because many container instances will have to start up to handle a spike in incoming requests.
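To see why a container that relies on global state may need a concurrency of 1, consider the following minimal sketch. It is not taken from any Cloud Run sample: the handler, the `current_user` global, and the use of a `threading.Barrier` to force two requests to overlap are all assumptions made for illustration.

```python
import threading

# Module-level (global) state, shared by every request in this instance.
current_user = None


def handle_request(user, barrier):
    """Hypothetical handler that stores per-request data in a global."""
    global current_user
    current_user = user
    # Stand-in for slow work such as I/O. The barrier makes every
    # in-flight request reach this point before any of them continues.
    barrier.wait()
    # By now another request may have overwritten the global.
    return f"hello {current_user}"


def simulate(users):
    """Run one request per user at the same time and collect responses."""
    barrier = threading.Barrier(len(users))
    responses = {}

    def worker(user):
        responses[user] = handle_request(user, barrier)

    threads = [threading.Thread(target=worker, args=(u,)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses


# One request at a time: the response matches the caller.
print(simulate(["alice"]))
# Two overlapping requests: both responses name the same user,
# so one caller receives data that belongs to the other.
print(simulate(["alice", "bob"]))
```

Limiting concurrency to 1 avoids this overlap without code changes. Keeping request data in local variables instead of globals is the alternative when higher concurrency is wanted.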
The following metrics show 400 clients making 3 requests per second to a Cloud Run service that is set to a maximum concurrency of 1. The green top line shows the requests over time, and the bottom blue line shows the number of container instances started to handle the requests.
The following metrics show 400 clients making 3 requests per second to a Cloud Run service that is set to a maximum concurrency of 80. The green top line shows the requests over time, and the bottom blue line shows the number of container instances started to handle the requests. Notice that far fewer instances are needed to handle the same request volume.
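A back-of-the-envelope estimate can make the difference between the two settings concrete. The sketch below rests on several assumptions that are not in the metrics above: that each of the 400 clients sends 3 requests per second, that the average request latency is 100 ms (a hypothetical figure), and that the number of in-flight requests equals request rate times latency (Little's law). The function name is illustrative, and the results are not the values plotted in the charts.

```python
import math


def estimate_instances(clients, requests_per_client_per_sec,
                       avg_latency_ms, concurrency):
    """Estimate instances needed under a simplified steady-state model."""
    # Total request rate, assuming every client sends at the same rate.
    request_rate = clients * requests_per_client_per_sec
    # Little's law: requests in flight = arrival rate * time in system.
    in_flight = request_rate * avg_latency_ms / 1000
    # Each instance handles at most `concurrency` requests at once.
    return max(1, math.ceil(in_flight / concurrency))


# 100 ms latency is an assumed value, chosen only for illustration.
for setting in (1, 80):
    n = estimate_instances(400, 3, avg_latency_ms=100, concurrency=setting)
    print(f"concurrency={setting}: about {n} instance(s)")
```

Under these assumptions, 1,200 requests per second at 100 ms each means about 120 requests in flight, so the estimate is roughly 120 instances at a concurrency of 1 and 2 instances at a concurrency of 80.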
To manage the concurrency of your Cloud Run services, see Setting concurrency.
To optimize your concurrency setting, see development tips for tuning concurrency.