This page describes how to manually scale your service. It also provides instructions for a common use case, changing the instance count based on a schedule using Cloud Scheduler jobs and the Cloud Run Admin API.
Overview
By default, Cloud Run automatically scales out to a specified or default maximum number of instances depending on traffic and CPU utilization. However, for some use cases, you might want to set a specific number of instances by using manual scaling.
Manual scaling lets you set a specific instance count, regardless of traffic or utilization, and without requiring redeployment. This lets you write your own scaling logic in an external system. See Schedule-based scaling for an example.
Switching between automatic and manual scaling
Switching scaling modes impacts instance count and service-level minimum and maximum instance settings as shown in the following table:
Scaling switch direction | Instance count | Min and max instances |
---|---|---|
From automatic to manual | If you don't specify an instance count in the command that switches modes, the manual instance count inherits the service-level minimum instances setting. | After the switch, service-level minimum and maximum instances are unset. |
From manual to automatic | The manual instance count is unset. | You must specify either both service-level minimum and maximum instances, or neither of them. (Specifying only one returns an error.) If you specify neither of these in the command that switches modes, service-level minimum and maximum instances inherit the manual instance count. |
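The rules in the table can be modeled as a small state transition. The following Python sketch is only an illustration of the table, not Cloud Run code: the `ScalingConfig` class and the `to_manual` and `to_automatic` functions are hypothetical names introduced here. The table does not say what happens when you switch to manual scaling without a count and no service-level minimum is set, so the sketch leaves the count unset in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingConfig:
    """Hypothetical model of a service's scaling settings."""
    mode: str  # "automatic" or "manual"
    manual_instance_count: Optional[int] = None
    min_instances: Optional[int] = None
    max_instances: Optional[int] = None


def to_manual(cfg: ScalingConfig, instance_count: Optional[int] = None) -> ScalingConfig:
    """Switch to manual scaling.

    If no count is given, inherit the service-level minimum instances.
    Service-level minimum and maximum instances are unset after the switch.
    """
    count = instance_count if instance_count is not None else cfg.min_instances
    return ScalingConfig(mode="manual", manual_instance_count=count)


def to_automatic(
    cfg: ScalingConfig,
    min_instances: Optional[int] = None,
    max_instances: Optional[int] = None,
) -> ScalingConfig:
    """Switch to automatic scaling.

    Both limits or neither must be given. If neither is given, both
    inherit the manual instance count. The manual count is unset.
    """
    if (min_instances is None) != (max_instances is None):
        raise ValueError("specify both minimum and maximum instances, or neither")
    if min_instances is None:
        min_instances = max_instances = cfg.manual_instance_count
    return ScalingConfig(
        mode="automatic", min_instances=min_instances, max_instances=max_instances
    )
```

For example, switching a service with a minimum of 2 instances to manual scaling without a count yields a manual count of 2, and switching back without limits yields a minimum and maximum of 2.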
Revision-level minimum and maximum settings and manual scaling
If you set your service to manual scaling, any revision-level minimum and maximum instance settings are ignored.
Traffic splits for manual scaling
The following list describes how instances are allocated when splitting traffic under manual scaling. This includes behavior for traffic-tag-only revisions.
- During a traffic split, each revision is allocated instances in proportion to its share of traffic, similar to traffic splitting with service-level minimum instances.
- If the number of revisions receiving traffic exceeds the manual instance count, some of the revisions will have no instances. Traffic sent to those revisions will get the same error as if the revisions were disabled.
- For all revisions receiving traffic in a traffic split, any revision-level minimum and maximum instances are disabled.
- If a revision is active only due to traffic tags:
  - If revision-level minimum instances is set, the specified number of instances will start but do not count toward the total service manual instance count. The revision will not autoscale.
  - If revision-level minimum instances is not set, the revision scales out to at most one instance, in response to traffic sent to the tag URL.
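To build intuition for proportional allocation, the following Python sketch splits a manual instance count across revisions by traffic percentage. This page does not state how Cloud Run rounds fractional shares, so the largest-remainder rounding used here is an assumption for illustration only, and `allocate_instances` is a hypothetical helper, not part of any Cloud Run API.

```python
from typing import Dict


def allocate_instances(manual_instance_count: int, traffic_percent: Dict[str, int]) -> Dict[str, int]:
    """Split instances across revisions in proportion to traffic.

    Assumption: fractional shares are rounded with the largest-remainder
    method. The actual rounding rule used by Cloud Run is not documented here.
    """
    total = sum(traffic_percent.values())
    if total <= 0:
        raise ValueError("traffic percentages must sum to a positive number")

    # Integer arithmetic: quotient is the guaranteed share, remainder ranks
    # which revisions receive the leftover instances.
    shares = {
        rev: divmod(manual_instance_count * pct, total)
        for rev, pct in traffic_percent.items()
    }
    allocation = {rev: quotient for rev, (quotient, _) in shares.items()}
    leftover = manual_instance_count - sum(allocation.values())

    # Hand out leftover instances to the largest remainders first.
    # sorted() is stable, so ties keep the input order.
    by_remainder = sorted(shares, key=lambda rev: shares[rev][1], reverse=True)
    for rev in by_remainder[:leftover]:
        allocation[rev] += 1
    return allocation
```

With a manual count of 10 and a 90/10 split, this sketch assigns 9 and 1 instances. With a manual count of 2 and three revisions receiving traffic, one revision ends up with no instances, which corresponds to the case described in the list above.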
Billing behavior using manual scaling
When you use manual scaling, billing behavior is similar to the behavior when you use the minimum instances feature.
That is, with manual scaling and instance-based billing, manually scaled idle instances are billed as active instances.
If you use manual scaling with request-based billing, manually scaled idle instances are billed as idle minimum-instances. For complete billing details, see the pricing page.
Required roles
To get the permissions that you need to deploy Cloud Run services, ask your administrator to grant you the following IAM roles:
- Cloud Run Developer (roles/run.developer) on the Cloud Run service
- Service Account User (roles/iam.serviceAccountUser) on the service identity
- Artifact Registry Reader (roles/artifactregistry.reader) on the Artifact Registry repository of the deployed container image (if applicable)
For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
Configure scaling
You can configure the scaling mode using the Google Cloud console, the Google Cloud CLI, a YAML file, or the API when you create or update a service:
Console
In the Google Cloud console, go to Cloud Run:
If you are configuring a new service, click Deploy container and select Service to display the Create service form. If you are configuring an existing service, click the service to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Service scaling form (for a new service) or the Edit scaling form for an existing service.
In the field labelled Number of instances, specify the number of container instances for the service.
Click Create for a new service or Save for an existing service.
gcloud
To specify scaling for a new service, use the deploy command:
gcloud beta run deploy SERVICE \
  --scaling=INSTANCE_COUNT \
  --image IMAGE_URL
Replace the following:
- SERVICE with the name of your service.
- INSTANCE_COUNT with the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service. Specify a value of auto to use the default Cloud Run autoscaling behavior.
- IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the shape LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
Specify scaling for an existing service by using the following update command:
gcloud beta run services update SERVICE \
  --scaling=INSTANCE_COUNT
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the scalingMode and manualInstanceCount attributes:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: MODE
    run.googleapis.com/manualInstanceCount: INSTANCE_COUNT
Replace the following:
- SERVICE with the name of your Cloud Run service.
- MODE with manual for manual scaling, or automatic for the default Cloud Run autoscaling behavior.
- INSTANCE_COUNT with the number of instances you are manually scaling for the service. Specify a value of 0 to disable the service.
Create or update the service using the following command:
gcloud run services replace service.yaml
REST API
To update the manual instance count for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.
For example, using curl:
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":MANUAL_INSTANCE_COUNT }}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount"
Replace:
- ACCESS_TOKEN with a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
- MANUAL_INSTANCE_COUNT with the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service.
- SERVICE with the name of the service.
- REGION with the Google Cloud region that the service is deployed in.
- PROJECT_ID with the Google Cloud project ID.
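If you drive scaling from your own code rather than from curl, the same PATCH request can be assembled with the Python standard library. The following is a minimal sketch that mirrors the URL, headers, and body of the curl example. The function name `build_scaling_request` and its parameters are hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_scaling_request(
    project_id: str,
    region: str,
    service: str,
    instance_count: int,
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request from the curl example."""
    url = (
        f"https://run.googleapis.com/v2/projects/{project_id}"
        f"/locations/{region}/services/{service}"
        "?update_mask=launchStage,scaling.manualInstanceCount"
    )
    # Same JSON body as the curl example; a count of 0 disables the service.
    body = {
        "launchStage": "BETA",
        "scaling": {"manualInstanceCount": instance_count},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen with a valid access token, for example one obtained from gcloud auth print-access-token.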
View scaling configuration for your service
To view the scaling configuration for your Cloud Run service:
Console
In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details panel.
The current scaling setting is shown at the upper right of the service details panel, after the Scaling label, next to the pen icon.
gcloud
Use the following command to view the current scaling configuration for the service:
gcloud beta run services describe SERVICE
Replace SERVICE with the name of your service.
Look for the field Scaling: Manual (Instances: ) near the top of the text returned by the describe command.
YAML
Use the following command to download the service YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
The scaling configuration is contained in the scalingMode and manualInstanceCount attributes.
Disable a service
If you disable a service, requests to its service URL will fail with a Service unavailable or Service disabled error. Requests to service revisions that are only active due to traffic tags are not impacted because those revisions are not disabled.
To disable a service, set scaling to zero. You can disable a service using the Google Cloud console, the Google Cloud CLI, a YAML file, or the API:
Console
In the Google Cloud console, go to Cloud Run:
Click the service you want to disable to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Edit scaling form and select Manual scaling.
In the field labelled Number of instances, enter the value 0 (zero).
Click Save.
gcloud
To disable a service, use the following command to set scaling to zero:
gcloud beta run services update SERVICE --scaling=0
Replace SERVICE with the name of your service.
YAML
Download your service's YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Set the manualInstanceCount attribute to zero (0):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: '0'
Replace SERVICE with the name of your Cloud Run service.
Create or update the service using the following command:
gcloud run services replace service.yaml
REST API
To disable a service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.
For example, using curl:
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":0 }}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount"
Replace:
- ACCESS_TOKEN with a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
- SERVICE with the name of the service.
- REGION with the Google Cloud region that the service is deployed in.
- PROJECT_ID with the Google Cloud project ID.
Schedule-based scaling example
A common use case of manual scaling is changing the instance count based on a predefined schedule. In this example, we use Cloud Scheduler to schedule two jobs, each of which invokes the Cloud Run Admin API to scale the number of instances. The first job sets the service to manually scale out to 10 instances during business hours (9am-5pm, M-F). The second job sets the service to scale in to zero instances during off-hours.
Notice that setting the instances to zero as shown in the example disables the service, but not the Cloud Scheduler jobs. Those jobs continue to run and will reset (and re-enable) the service to 10 instances as scheduled.
In this example, we use the Cloud Run quickstart for simplicity, but you can use a service of your choice.
To set up schedule-based manual scaling:
Deploy your service using the following command:
gcloud beta run deploy SERVICE \
  --image=us-docker.pkg.dev/cloudrun/container/hello \
  --region=REGION \
  --project PROJECT_ID
Replace the following variables:
- REGION with the region the Cloud Run service is deployed to.
- SERVICE with the name of the Cloud Run service.
- PROJECT_ID with the Google Cloud project ID.
Configure your service for manual scaling to 10 instances using the following command:
gcloud beta run services update SERVICE \
  --region=REGION \
  --scaling=10
Create a Cloud Scheduler job that manually scales the service instances out to 10 instances during business hours:
gcloud scheduler jobs create http hello-start-instances \
  --location=REGION \
  --schedule="0 9 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":10}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com
This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting the number of instances to 10. The example uses the Compute Engine default service account PROJECT_NUMBER-compute@developer.gserviceaccount.com for the Cloud Scheduler jobs. You can use any service account that has permissions to update Cloud Run services.
Create a Cloud Scheduler job that manually scales the service instances in to zero instances during off hours, disabling the service:
gcloud scheduler jobs create http hello-stop-instances \
  --location=REGION \
  --schedule="0 17 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":0}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com
This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting the manual instance count to zero. This effectively disables the service, but not the Cloud Scheduler jobs, which will continue to run and reset (and re-enable) the service to 10 instances as scheduled.
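To check what instance count the two jobs should leave in place at a given moment, the schedule can be expressed as a small function. The following Python sketch is illustrative only: `desired_instance_count` is a hypothetical helper, and it assumes the datetime you pass is already expressed in the jobs' time zone (America/Los_Angeles in this example).

```python
from datetime import datetime

BUSINESS_HOURS_INSTANCES = 10  # set by the 9:00 job, Monday to Friday
OFF_HOURS_INSTANCES = 0        # set by the 17:00 job, Monday to Friday


def desired_instance_count(local_time: datetime) -> int:
    """Return the instance count the schedule implies at local_time.

    Assumption: local_time is already in the time zone used by the
    Cloud Scheduler jobs. No time zone conversion is done here.
    """
    is_weekday = local_time.weekday() < 5       # Monday is 0, Friday is 4
    in_business_hours = 9 <= local_time.hour < 17
    if is_weekday and in_business_hours:
        return BUSINESS_HOURS_INSTANCES
    return OFF_HOURS_INSTANCES
```

A function like this can be useful in a monitoring check that compares the expected count against the manualInstanceCount reported for the service.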