You can avoid slow container start times for your service and reduce service latency by setting a minimum number of instances. This page describes how to enable idle instances for your service using the minimum instances settings.
By default, Cloud Run scales the number of instances of a service in or out based on the number of incoming requests.
However, if your service requires reduced latency, especially when scaling from zero active instances, you can change this default behavior by specifying a minimum number of container instances to be kept warm and ready to serve requests. Refer to General development tips for more details on this optimization.
Cloud Run removes instances that are not serving requests (idle).
With minimum instances set, Cloud Run keeps at least the number of minimum instances running, even if they're not serving requests. Active instances above the min-instances number might become idle if they are not receiving requests.
For example, if min-instances is 10 and the number of active instances is 0, then the number of idle instances is 10. When the number of active instances increases to 6, the number of idle instances decreases to 4.
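The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the example's numbers, not Cloud Run's scheduling logic; the function name `idle_from_minimum` is hypothetical, and it counts only the idle instances that are kept because of the minimum instances setting.

```python
def idle_from_minimum(min_instances: int, active_instances: int) -> int:
    """Idle instances kept warm only because of the minimum instances setting.

    Assumption: instances above the minimum are not counted here, because
    the text only says they "might" become idle.
    """
    if min_instances < 0 or active_instances < 0:
        raise ValueError("instance counts must be non-negative")
    # Once active instances reach the minimum, none are kept idle on its account.
    return max(min_instances - active_instances, 0)


print(idle_from_minimum(10, 0))  # 10
print(idle_from_minimum(10, 6))  # 4
```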
Note that if a service has not recently served traffic, the active instances metric can indicate that no instances are active, even if you specified one or more for minimum instances.
Applying minimum instances at service-level versus revision-level
You can configure minimum instances at the service level or at the revision level. Google recommends that you apply minimum instances at the service level and avoid combining service-level and revision-level minimum instances.
If you apply minimum instances at the revision level, the setting goes into effect when the revision is deployed. If you apply minimum instances at the service level, the setting goes into effect without needing to deploy a new revision.
Tagged revisions and service-level minimum instances
Tagged revisions are started, but only count toward the service-level minimum instances if they are a part of a traffic split.
Billing
Instances kept running using the minimum instances feature do incur billing costs. Because these charges are very predictable, Google recommends purchasing a Committed use discount.
Minimum instances and always-allocated CPU
You can configure CPU to be always-allocated if you need CPU outside of requests.
Minimum instances restarts
Minimum instances can be restarted at any time.
Revisions and minimum instances
When minimum instances are set at the service level, they are distributed to all revisions that are serving traffic proportionally to the traffic split.
When minimum instances are set at the revision level, minimum instances are started whenever the revision is referenced in a traffic split (even at 0%) or has a traffic tag assigned.
Required roles
To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:
- Cloud Run Developer (roles/run.developer) on the Cloud Run service
- Service Account User (roles/iam.serviceAccountUser) on the service identity
For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
Setting and updating service-level minimum instances
By default, container instances have service-level minimum instances turned off, with a setting of 0. You can change this default using the Google Cloud console, the Google Cloud CLI, or a YAML file:
Console
In the Google Cloud console, go to Cloud Run:
If you are configuring a new service, click Deploy container and select Service to display the Create service form. If you are configuring an existing service, click the service to display its detail panel, then click the pen icon next to Min instances at the top right of the detail panel.
Locate the Service autoscaling form.
- In the field labelled Minimum number of instances, specify the number of container instances to be kept warm, ready to receive requests.
Click Create for a new service or Deploy for an existing service.
gcloud
Update service-min-instances for a given service using the following command:
gcloud run services update SERVICE --service-min-instances MIN-VALUE
Replace:
- SERVICE with the name of your service.
- MIN-VALUE with the number of container instances to be kept warm, ready to receive requests. Specify default to clear any minimum instance setting.
Alternatively, you can set service-min-instances during deployment using the command:
gcloud run deploy --image IMAGE_URL --service-min-instances MIN-VALUE
Replace:
- IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the shape LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG
- MIN-VALUE with the number of container instances to be kept warm, ready to receive requests. Specify default to clear any minimum instance setting.
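If you script these updates, a small helper can build the argument list for the update command before handing it to a process runner. The following is a minimal sketch, assuming the `--service-min-instances` flag shown above; the function name and the `my-service` value are hypothetical placeholders, and the sketch only prints the command rather than running it.

```python
import shlex


def service_min_instances_command(service: str, min_value) -> list[str]:
    """Build the argument list for the service-level update command."""
    # "default" clears the setting; otherwise require a non-negative integer.
    if min_value != "default":
        if isinstance(min_value, bool) or not isinstance(min_value, int) or min_value < 0:
            raise ValueError("min_value must be a non-negative integer or 'default'")
    return [
        "gcloud", "run", "services", "update", service,
        "--service-min-instances", str(min_value),
    ]


cmd = service_min_instances_command("my-service", 2)  # "my-service" is a placeholder
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the command, provided the Google Cloud CLI is installed and authenticated.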
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the run.googleapis.com/minScale attribute:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/minScale: 'MIN-INSTANCE'
Replace:
- SERVICE with the name of your Cloud Run service
- MIN-INSTANCE with the number of instances to be kept warm, ready to receive requests.
Create or update the service using the following command:
gcloud run services replace service.yaml
REST API
To update service-level minimum instances for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.
For example, using curl:
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{ "scaling": { "minInstanceCount": MIN-VALUE }}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=scaling.minInstanceCount"
Replace:
- ACCESS_TOKEN with a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
- MIN-VALUE with the number of container instances to be kept warm, ready to receive requests.
- SERVICE with the name of the service.
- REGION with the Google Cloud region of the service.
- PROJECT_ID with the Google Cloud project ID.
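The same PATCH request can be assembled from code with Python's standard library. This is a minimal sketch that mirrors the curl example (same endpoint, update mask, and body); the function name and the sample argument values are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_min_instances_patch(project_id, region, service, min_value, access_token):
    """Build (but do not send) the PATCH request that sets minInstanceCount."""
    url = (
        "https://run.googleapis.com/v2/"
        f"projects/{project_id}/locations/{region}/services/{service}"
        "?update_mask=scaling.minInstanceCount"
    )
    body = json.dumps({"scaling": {"minInstanceCount": min_value}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


# Placeholder values; the request is only sent if you pass it to urlopen().
request = build_min_instances_patch(
    "my-project", "us-central1", "my-service", 2, "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen(request)` with a real access token in place of the placeholder.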
View service-level minimum instances
To view the current service-level minimum instances settings for your Cloud Run service:
Console
In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details panel.
The current setting is shown at the upper right of the service details panel, next to Min instances.
gcloud
Use the following command:
gcloud run services describe SERVICE
Locate the value for Service-level Min Instances: in the returned configuration.
Setting and updating revision-level minimum instances
Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.
By default, container instances have min-instances turned off, with a setting of 0. You can change this default using the Google Cloud console, the Google Cloud CLI, or a YAML file when you create a new service or deploy a new revision:
Console
In the Google Cloud console, go to Cloud Run:
Click Deploy container and select Service to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.
If you are configuring a new service, fill out the initial service settings page, then click Container(s), volumes, networking, security to expand the service configuration page.
Click the Container tab.
- In the field labelled Minimum number of instances, specify the number of container instances to be kept warm, ready to receive requests.
Click Create or Deploy.
gcloud
You can update min-instances for a given service by using the following command:
gcloud run services update SERVICE --min-instances MIN-VALUE
Replace:
- SERVICE with the name of your service.
- MIN-VALUE with the number of container instances to be kept warm, ready to receive requests. Specify default to clear any minimum instance setting.
You can also set min-instances during deployment using the command:
gcloud run deploy --image IMAGE_URL --min-instances MIN-VALUE
Replace:
- IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the shape LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG
- MIN-VALUE with the number of container instances to be kept warm, ready to receive requests. Specify default to clear any minimum instance setting.
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the autoscaling.knative.dev/minScale attribute:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: 'MIN-INSTANCE'
      name: REVISION
Replace:
- SERVICE with the name of your Cloud Run service
- MIN-INSTANCE with the number of instances to be kept warm, ready to receive requests.
- REVISION with a new revision name or delete it (if present). If you supply a new revision name, it must meet the following criteria:
  - Starts with SERVICE-
  - Contains only lowercase letters, numbers and -
  - Does not end with a -
  - Does not exceed 63 characters
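The revision name criteria above can be checked before you edit the YAML file. The following is a minimal sketch that tests only the four listed criteria; the function name is hypothetical, and Cloud Run might apply further validation that is not covered here.

```python
import re


def is_valid_revision_name(name: str, service: str) -> bool:
    """Check a revision name against the four criteria listed above."""
    return (
        name.startswith(f"{service}-")                      # starts with SERVICE-
        and re.fullmatch(r"[a-z0-9-]+", name) is not None  # lowercase letters, numbers, -
        and not name.endswith("-")                          # does not end with a -
        and len(name) <= 63                                 # at most 63 characters
    )


print(is_valid_revision_name("hello-v2", "hello"))  # True
print(is_valid_revision_name("hello-V2", "hello"))  # False
```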
Create or update the service using the following command:
gcloud run services replace service.yaml
Terraform
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
The following google_cloud_run_v2_service resource specifies a minimum number of instances of 1 under template.scaling. Replace 1 with your own minimum number of instances.
View revision-level minimum instances settings
To view the current revision-level minimum instances settings for your Cloud Run service:
Console
In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details page.
Click the Revisions tab.
In the details panel at the right, the revision-level minimum instances setting is listed under the Container tab.
gcloud
Use the following command:
gcloud run services describe SERVICE
Locate the revision-level minimum instances setting in the returned configuration.
Using both service-level and revision-level minimum or maximum instances
The following table shows the behavior if you combine service-level minimum instances and revision-level minimum or maximum instances:
Configuration setting | Behavior |
---|---|
Both service-level minimum instances and revision-level minimum instances are set. | The effective value for the revision is the larger of revision-level minimum instances and service-level minimum instances. |
Both service-level minimum instances and revision-level maximum instances are set. | The effective value for the revision is the smaller of revision-level maximum instances and service-level minimum instances. This holds true even if the revision-level maximum instances prevents the service from reaching the number of instances configured for service-level minimum instances. |
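The two rules in the table can be expressed as a small function. This is a sketch of the table's rules only, not Cloud Run's implementation; the function name is hypothetical, and applying the revision-level maximum after the revision-level minimum when both are set is an assumption, because the table does not cover that case.

```python
def effective_minimum(service_min, revision_min=None, revision_max=None):
    """Effective minimum instances for one revision, per the table above."""
    effective = service_min
    if revision_min is not None:
        # Larger of revision-level and service-level minimum instances.
        effective = max(effective, revision_min)
    if revision_max is not None:
        # Revision-level maximum instances caps the result (ordering is an assumption).
        effective = min(effective, revision_max)
    return effective


print(effective_minimum(5, revision_min=6))  # 6
print(effective_minimum(5, revision_max=3))  # 3
```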
Using service level minimum instances with traffic splitting
If you use traffic splitting, the service-level minimum instances are divided across the revisions based on the proportion of the traffic split. For example, if the service-level minimum instances = 10, a 50/50 traffic split allocates 5 service-level minimum instances to each revision.
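The proportional division described above can be worked through in Python. This is a minimal sketch: the function and revision names are hypothetical, and shares are returned as exact fractions because the rule Cloud Run uses to turn a fractional share into whole instances is not stated here and is not modeled.

```python
from fractions import Fraction


def proportional_shares(service_min: int, traffic_percent: dict[str, int]) -> dict[str, Fraction]:
    """Split service-level minimum instances by each revision's traffic percentage."""
    if sum(traffic_percent.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    # Exact arithmetic; no rounding to whole instances is applied.
    return {rev: Fraction(service_min * pct, 100) for rev, pct in traffic_percent.items()}


shares = proportional_shares(10, {"revision-a": 50, "revision-b": 50})
print({rev: str(share) for rev, share in shares.items()})
# {'revision-a': '5', 'revision-b': '5'}
```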
The following table shows sample configuration scenarios:
Sample use case | Sample configuration | Resulting behavior |
---|---|---|
No revision-level settings | Service-level minimum instances: 10 | Revision A receives 6 instances from service-level minimum instances proportional to the traffic split. Revision B receives 4 instances from service-level minimum instances proportional to the traffic split. |
Receiving more than the service-level minimum instances due to revision-level minimum instances | Service-level minimum instances: 10 | Revision A receives 6 instances from revision-level minimum instances. Revision B receives 5 instances from service-level minimum instances proportional to the traffic split. This exceeds service-level minimum instances and is intended. |
Receiving less than service-level minimum instances due to revision-level maximum instances | Service-level minimum instances: 10 | Revision A receives 3 instances from service-level minimum instances driven by the traffic split, but is limited to its revision-level maximum instances. Revision B receives 5 instances from service-level minimum instances proportional to the traffic split. This results in 8 service-level instances, as 2 are lost due to revision-level maximum instances of revision A. |
Service-level minimum instances is greater than the number of revisions in the traffic split and there is a fractional amount of instances proportional to the traffic split | Service-level minimum instances: 3 | Revision A gets 1 minimum instance and revision B gets 2 minimum instances. Instance count for the service is 3. |