This page describes how to set the maximum number of instances that can be used for your Cloud Run service. Specifying maximum instances in Cloud Run allows you to limit the scaling of your service in response to incoming requests, although this maximum setting can be exceeded for a brief period due to circumstances such as traffic spikes. Use this setting as a way to control your costs or to limit the number of connections to a backing service, such as to a database.
For information about the maximum instance limits that might apply to your service, refer to Maximum instances limits.
For more information on the way Cloud Run autoscales container instances, refer to Instance autoscaling.
Required roles
To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:
- Cloud Run Developer (roles/run.developer) on the Cloud Run service
- Service Account User (roles/iam.serviceAccountUser) on the service identity
For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
Setting and updating maximum instances
Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.
By default, Cloud Run services are configured to scale out to a maximum of 100 instances.
You can change the maximum instances setting using the Google Cloud console, the gcloud command line, or a YAML file when you create a new service or deploy a new revision.
Console
In the Google Cloud console, go to Cloud Run.
Click Deploy container and select Service to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.
If you are configuring a new service, fill out the initial service settings page, then click Container(s), volumes, networking, security to expand the service configuration page.
Click the Container tab.
- In the field labelled Maximum number of instances, specify the desired maximum number of instances, using any integer value from 1 to the maximum limit.
Click Create or Deploy.
gcloud
You can update the maximum number of instances of a given service by using the following command:
gcloud run services update SERVICE --max-instances MAX-VALUE
Replace
- SERVICE with the name of your service.
- MAX-VALUE with the desired maximum number of container instances, using any integer value from 1 to the maximum limit. Specify default to clear any maximum instance setting and restore the default of 100 instances.
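As a minimal sketch of the rule above, the following shell helper checks a candidate MAX-VALUE before it is passed to the update command. The function names is_valid_max_value and set_max_instances are hypothetical and not part of gcloud. The check covers only the lower bound and the default keyword; it does not know the regional maximum limit.

```shell
# Hypothetical helper: returns 0 if the argument is acceptable for
# --max-instances, that is, an integer of 1 or more, or the word "default".
# It does NOT check the regional upper limit.
is_valid_max_value() {
  case "$1" in
    default) return 0 ;;          # clears the setting and restores the default
    ''|*[!0-9]*) return 1 ;;      # empty, negative, fractional, or non-numeric
    *) [ "$1" -ge 1 ] ;;          # integers must be at least 1
  esac
}

# Hypothetical wrapper: validate first, then run the update command shown above.
# $1 = service name, $2 = desired maximum
set_max_instances() {
  if ! is_valid_max_value "$2"; then
    echo "invalid maximum instances value: $2" >&2
    return 1
  fi
  gcloud run services update "$1" --max-instances "$2"
}
```

For example, set_max_instances my-service 20 would run the update, while set_max_instances my-service 0 would be rejected before gcloud is called. Here my-service is a placeholder service name.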
You can also set the maximum number of instances during deployment using the command:
gcloud run deploy --image IMAGE_URL --max-instances MAX-VALUE
Replace
- IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the shape LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
- MAX-VALUE with the desired maximum number of container instances.
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the autoscaling.knative.dev/maxScale: attribute:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: 'MAX-INSTANCE'
      name: REVISION
Replace
- SERVICE with the name of your Cloud Run service
- MAX-INSTANCE with the desired maximum number.
- REVISION with a new revision name or delete it (if present). If you supply a new revision name, it must meet the following criteria:
  - Starts with SERVICE-
  - Contains only lowercase letters, numbers and -
  - Does not end with a -
  - Does not exceed 63 characters
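The revision name criteria above can be checked locally before you edit service.yaml. The following is a sketch only: the function name is_valid_revision_name is hypothetical, and it encodes just the four criteria listed here, not any further validation that Cloud Run may apply.

```shell
# Hypothetical helper: returns 0 if a revision name meets the listed criteria.
# $1 = service name, $2 = candidate revision name
is_valid_revision_name() {
  case "$2" in
    "$1"-*) ;;                    # must start with SERVICE-
    *) return 1 ;;
  esac
  case "$2" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;  # lowercase letters, numbers, and - only
    *-) return 1 ;;               # must not end with -
  esac
  [ "${#2}" -le 63 ]              # must not exceed 63 characters
}
```

With a service named hello, the name hello-v2 passes, while hello-V2 (uppercase letter), hello-v2- (trailing hyphen), and other-v2 (wrong prefix) are rejected.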
Create or update the service using the following command:
gcloud run services replace service.yaml
Terraform
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
The following google_cloud_run_v2_service resource specifies a maximum number of instances of 10 under template.scaling. Replace 10 with your desired maximum number of instances.
Maximum instances limits
By default, Cloud Run services are configured to scale out to a maximum of 100 instances.
The maximum limit depends on the region of the Cloud Run service and its CPU and memory configurations.
The quotas page shows the baseline quotas per-region.
The maximum number of instances is determined as the minimum of:
- regional quota baseline / requested multiple of 1 CPU
- regional quota baseline / requested multiple of 2GB memory
For example, with a baseline quota of 1000 instances, a service configured with either 4GB of memory or 2 CPUs gets an effective limit of 500 instances.
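The calculation above can be sketched as a small shell function. This is an illustration under stated assumptions, not an official tool: the function name effective_max_instances is hypothetical, and it only handles whole CPUs and memory in whole multiples of 2GB, because this page does not say how fractional CPU or smaller memory sizes are treated.

```shell
# Hypothetical helper: effective maximum instances for a given configuration.
# $1 = regional baseline quota (instances)
# $2 = CPUs per instance (whole number, 1 or more)
# $3 = memory per instance in GB (whole multiple of 2)
effective_max_instances() {
  by_cpu=$(( $1 / $2 ))           # baseline / multiple of 1 CPU
  by_mem=$(( $1 / ($3 / 2) ))     # baseline / multiple of 2GB memory
  # The effective limit is the smaller of the two results.
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}
```

Calling effective_max_instances 1000 2 2 or effective_max_instances 1000 1 4 prints 500, matching the example above. With 4 CPUs and 8GB of memory against the same baseline, both divisions give 250.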
If you want to specify a maximum number of instances greater than the maximum allowed in the region of the Cloud Run service, you must request a quota increase.
View maximum instances settings
To view the current maximum instances settings for your Cloud Run service:
Console
In the Google Cloud console, go to Cloud Run.
Click the service you are interested in to open the Service details page.
Click the Revisions tab.
In the details panel at the right, the maximum instances setting is listed under the Container tab.
gcloud
Use the following command:
gcloud run services describe SERVICE
Locate the maximum instances setting in the returned configuration.
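To avoid scanning the full output by eye, one option is to filter the exported YAML for the autoscaling.knative.dev/maxScale annotation used in the YAML instructions earlier on this page. The following sketch assumes the annotation appears on a single line in the export, as in that example. The function name print_max_scale is hypothetical.

```shell
# Hypothetical helper: reads a service's exported YAML on stdin and prints
# the numeric value of the autoscaling.knative.dev/maxScale annotation.
# Prints nothing if the annotation is not present.
print_max_scale() {
  awk -F': *' '/autoscaling\.knative\.dev\/maxScale:/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Intended usage (requires gcloud and an existing service):
#   gcloud run services describe SERVICE --format export | print_max_scale
```

The awk filter strips the surrounding quotes from the annotation value, so a line such as autoscaling.knative.dev/maxScale: '10' prints 10.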