In Cloud Run, each revision is automatically scaled to the number of container instances needed to handle all incoming requests. The number of instances scheduled is affected by the concurrency setting and by the maximum number of container instances setting:
Maximum simultaneous requests = concurrency per instance * max instances
Although allowing for a higher maximum concurrency can reduce the number of container instances needed, in some cases you may want to limit the total number of container instances that can be started, for cost control reasons, or for better compatibility with other resources used by your service. For example, your Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections.
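As a rough illustration of the formula and the database example above, the following Python sketch computes the request ceiling for a revision and derives a maximum instances value from a database connection limit. The function names and the `connections_per_instance` parameter are hypothetical, and the sketch assumes every instance opens the same fixed number of connections.

```python
def max_simultaneous_requests(concurrency: int, max_instances: int) -> int:
    """Upper bound on requests a revision can process at the same time."""
    if concurrency < 1 or max_instances < 1:
        raise ValueError("concurrency and max_instances must be positive")
    return concurrency * max_instances


def max_instances_for_connection_limit(db_connection_limit: int,
                                       connections_per_instance: int) -> int:
    """Largest instance count whose combined connections fit the limit.

    Assumes (hypothetically) that each instance opens exactly
    connections_per_instance connections to the database.
    """
    if connections_per_instance < 1:
        raise ValueError("connections_per_instance must be positive")
    if db_connection_limit < connections_per_instance:
        raise ValueError("the limit cannot serve even a single instance")
    return db_connection_limit // connections_per_instance


if __name__ == "__main__":
    # Illustrative values only: 80 requests per instance, 10 instances.
    print(max_simultaneous_requests(80, 10))
    # A database accepting 100 connections, 5 connections per instance.
    print(max_instances_for_connection_limit(100, 5))
```

With these illustrative values, the revision can serve at most 800 requests at once, and the database limit suggests a maximum instances value of 20.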
About maximum container instances
You use the maximum container instances setting to limit the total number of instances that can be started, as documented in Setting a maximum number of container instances.
Exceeding maximum instances
Under normal circumstances, your revision scales up by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests queue for up to 60 seconds. During this 60-second window, if an instance finishes processing requests, it becomes available to process queued requests. If no instances become available during the 60-second window, the request fails with a 429 error code on Cloud Run (fully managed).
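A caller that might receive these 429 responses could retry with exponential backoff. This page does not prescribe a client-side strategy, so the following Python sketch is only one common approach. The `send` callable is a hypothetical stand-in for whatever issues the request and returns its HTTP status code, and the delay values are arbitrary.

```python
import random
import time
from typing import Callable


def call_with_retry(send: Callable[[], int],
                    max_attempts: int = 5,
                    base_delay: float = 0.5,
                    max_delay: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a status other than 429.

    Returns the last status code seen. Waits between attempts with
    exponential backoff and full jitter (an assumed policy, not one
    required by Cloud Run).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return status
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. The jitter spreads retries out so that many clients do not retry at the same moment.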
The maximum instances limit is an upper limit. Setting a high limit does not mean that your revision will scale up to the specified number of container instances. It only means that the number of container instances at any point in time should not exceed the limit.
In some cases, such as rapid traffic surges, Cloud Run may, for a short period of time, create slightly more container instances than the specified max instances value. If your service cannot tolerate this temporary behavior, you may want to factor in a safety margin and set a lower max instances value.
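One way to factor in a safety margin is sketched below in Python. This page does not quantify "slightly more", so `overshoot_percent` is a hypothetical allowance that you would choose yourself, and the function name is illustrative.

```python
def max_instances_with_margin(hard_limit: int, overshoot_percent: int) -> int:
    """Largest max instances value that stays within hard_limit even if
    the instance count temporarily exceeds it by overshoot_percent.

    overshoot_percent is an assumed allowance, not a documented figure.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if hard_limit < 1 or overshoot_percent < 0:
        raise ValueError("hard_limit must be positive and overshoot non-negative")
    return hard_limit * 100 // (100 + overshoot_percent)


if __name__ == "__main__":
    # If a backend tolerates at most 100 instances and you allow for a
    # hypothetical 10% overshoot, configure a lower value.
    print(max_instances_with_margin(100, 10))
```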
When you deploy a new revision, Cloud Run gradually migrates traffic from the old revision to the new one. Because maximum instances limits are set for each revision, you may temporarily exceed the specified limit during the period after deployment.
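Because the limit applies per revision, a conservative way to plan for a deployment is to add up the limits of every revision that may be serving traffic at the same time. The Python sketch below does this. The revision names and the helper function are hypothetical, and the sum is a worst-case bound rather than a prediction of actual behavior.

```python
from typing import Mapping


def worst_case_instances(revision_limits: Mapping[str, int]) -> int:
    """Sum of max instances limits across revisions serving traffic.

    This is an upper bound under the assumption that every revision
    scales to its own limit at the same moment.
    """
    if any(limit < 0 for limit in revision_limits.values()):
        raise ValueError("limits must be non-negative")
    return sum(revision_limits.values())


if __name__ == "__main__":
    # Hypothetical rollout: the old and new revisions each allow 20 instances.
    during_rollout = {"my-service-old": 20, "my-service-new": 20}
    print(worst_case_instances(during_rollout))
```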
Idle instances and minimizing cold starts
To minimize the impact of cold starts, Cloud Run may maintain a reserve of idle container instances for your revision. These instances are ready to handle requests in case of a sudden traffic spike. Note that for Cloud Run (fully managed), you are not billed for these idle instances.
For example, when an instance has finished handling requests, the container instance may remain idle for a period of time in case another request needs to be handled. An idle container instance may persist resources, such as open database connections. However, for Cloud Run (fully managed), the CPU will not be available.
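The following Python sketch shows one way a connection can persist across requests handled by the same container instance: it is created lazily and stored at module level. The standard library `sqlite3` module stands in for a real database client here, and `get_connection` and `handle_request` are hypothetical names.

```python
import sqlite3
from typing import Optional

# Module-level state lives as long as the container instance, so it
# survives between requests and while the instance is idle.
_connection: Optional[sqlite3.Connection] = None


def get_connection() -> sqlite3.Connection:
    """Return the shared connection, opening it on first use."""
    global _connection
    if _connection is None:
        # An in-memory SQLite database is a placeholder for a real server.
        _connection = sqlite3.connect(":memory:", check_same_thread=False)
    return _connection


def handle_request() -> int:
    """Hypothetical request handler that reuses the shared connection."""
    row = get_connection().execute("SELECT 1").fetchone()
    return row[0]
```

Because each instance keeps its connection open while idle, the number of open connections tracks the number of instances, not the number of requests in flight.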
To manage the max instances of your Cloud Run services, see Setting a maximum number of container instances.
To manage the maximum concurrency of your Cloud Run services, see Setting concurrency.
To optimize your concurrency setting, see development tips for tuning concurrency.