Manual scaling

This page describes how to manually scale your service. It also provides instructions for a common use case, changing the instance count based on a schedule using Cloud Scheduler jobs and the Cloud Run Admin API.

Overview

By default, Cloud Run automatically scales out to a specified or default maximum number of instances depending on traffic and CPU utilization. However, for some use cases, you might want to set a specific number of instances using manual scaling.

Manual scaling lets you set a specific instance count, regardless of traffic or utilization, and without requiring redeployment. This lets you write your own scaling logic using an external system. For an example, see Schedule-based scaling example.

Revision-level minimum and maximum settings and manual scaling

If you set your service to manual scaling, any revision-level minimum and maximum instance settings are ignored.

Traffic splits for manual scaling

The following list describes how instances are allocated when splitting traffic under manual scaling, including the behavior for revisions that are active only because of traffic tags.

- During a traffic split, each revision is allocated instances in proportion to its share of traffic, similar to traffic splitting with service-level minimum instances.
- If the number of revisions receiving traffic exceeds the manual instance count, some of the revisions will have no instances. Traffic sent to those revisions gets the same error as if the revisions were disabled.
- For all revisions receiving traffic in a traffic split, any revision-level minimum and maximum instance settings are disabled.
- If a revision is active only due to traffic tags:
  - If revision-level minimum instances is set, the specified number of instances start but do not count toward the total service manual instance count. The revision does not autoscale.
  - If revision-level minimum instances is not set, the revision scales out to at most one instance, in response to traffic sent to the tag URL.
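
As a rough illustration of the proportional allocation described above, the following shell sketch computes a revision's share of the manual instance count from its traffic percentage. It is arithmetic only, not Cloud Run's actual allocator: this page does not state how fractional shares are rounded, so the function name `share_of` and the round-down behavior are assumptions made for illustration.

```shell
# share_of TOTAL PERCENT
# Prints the integer share of TOTAL instances for a revision that receives
# PERCENT of traffic. Rounding down is an assumption; the real rounding
# rule is not specified on this page.
share_of() {
  total=$1
  percent=$2
  echo $(( total * percent / 100 ))
}

# Example: a manual instance count of 10 split 70/30 between two revisions.
share_of 10 70   # prints 7
share_of 10 30   # prints 3
```

With percentages that do not divide evenly, round-down shares might not add up to the total, which is one reason to treat this as a mental model rather than a prediction of the exact per-revision counts.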

Billing behavior using manual scaling

When you use manual scaling, billing behavior is similar to the behavior when you use the minimum instances feature.

That is, with manual scaling and instance-based billing, manually scaled idle instances are billed as active instances.

If you use manual scaling with request-based billing, manually scaled idle instances are billed as idle minimum-instances. For complete billing details, see the pricing page.

Required roles

To get the permissions that you need to deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity
Artifact Registry Reader (roles/artifactregistry.reader) on the Artifact Registry repository of the deployed container image (if applicable)

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Configure scaling

You can configure the scaling mode using the Google Cloud console, the Google Cloud CLI, a YAML file, the REST API, or Terraform when you create or update a service:

Console

In the Google Cloud console, go to Cloud Run:
If you are configuring a new service, select Services from the menu, and click Deploy container to display the Create service form. If you are configuring an existing service, click the service to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Service scaling form (for a new service) or the Edit scaling form for an existing service.

In the field labelled Number of instances, specify the number of container instances for the service.
Click Create for a new service or Save for an existing service.

gcloud

To specify scaling for a new service, use the deploy command:

gcloud beta run deploy SERVICE \
    --scaling=INSTANCE_COUNT \
    --image IMAGE_URL

Replace the following:

SERVICE: the name of your service.
INSTANCE_COUNT: the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service. Specify a value of auto to use the default Cloud Run autoscaling behavior.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG .

Specify scaling for an existing service by using the following update command:

gcloud beta run services update SERVICE \
   --scaling=INSTANCE_COUNT
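
Because `--scaling` accepts either an instance count or `auto`, a wrapper script might validate the value before calling gcloud. The following is a minimal sketch: the function name `is_valid_scaling` is hypothetical, and it only checks the two forms described above (a non-negative integer or `auto`), not any upper limit that Cloud Run might enforce.

```shell
# is_valid_scaling VALUE
# Returns 0 if VALUE is "auto" or a non-negative integer, 1 otherwise.
is_valid_scaling() {
  case "$1" in
    auto) return 0 ;;            # default autoscaling behavior
    ''|*[!0-9]*) return 1 ;;     # empty, or contains a non-digit character
    *) return 0 ;;               # digits only, including 0 (disables the service)
  esac
}

# Hypothetical usage: only call gcloud when the value is well-formed.
# if is_valid_scaling "$COUNT"; then
#   gcloud beta run services update SERVICE --scaling="$COUNT"
# fi
```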

YAML

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
```
gcloud run services describe SERVICE --format export > service.yaml
```
Update the scalingMode and manualInstanceCount attributes:
```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: MODE
    run.googleapis.com/manualInstanceCount: INSTANCE_COUNT
```
Replace the following:
- SERVICE: the name of your Cloud Run service
- MODE: manual for manual scaling, or automatic for the default Cloud Run autoscaling behavior.
- INSTANCE_COUNT: the number of instances you are manually scaling for the service. Specify a value of 0 to disable the service.
Create or update the service using the following command:
```
gcloud run services replace service.yaml
```

REST API

To set the manual instance count for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

    curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer ACCESS_TOKEN" \
    -X PATCH \
    -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":MANUAL_INSTANCE_COUNT }}' \
    https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount

Replace the following:

ACCESS_TOKEN: a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
MANUAL_INSTANCE_COUNT: the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service.
SERVICE: the name of the service.
REGION: the Google Cloud region that the service is deployed in.
PROJECT_ID: the Google Cloud project ID.
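
If you send this request from a script, it can help to build the URL and the JSON body in small functions so that they can be inspected separately from the call itself. The sketch below is one way to do that; the function names `scaling_url`, `scaling_body`, and `set_instance_count` are hypothetical, and the request they produce is the same one shown in the curl example above.

```shell
# scaling_url PROJECT_ID REGION SERVICE
# Prints the Cloud Run Admin API v2 URL from the example above,
# including the update mask.
scaling_url() {
  printf 'https://run.googleapis.com/v2/projects/%s/locations/%s/services/%s?update_mask=launchStage,scaling.manualInstanceCount\n' "$1" "$2" "$3"
}

# scaling_body COUNT
# Prints the JSON request body that sets the manual instance count.
scaling_body() {
  printf '{"launchStage":"BETA","scaling":{"manualInstanceCount":%s}}\n' "$1"
}

# set_instance_count PROJECT_ID REGION SERVICE COUNT
# Sends the PATCH request, using a token from the active gcloud account.
set_instance_count() {
  curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -X PATCH \
    -d "$(scaling_body "$4")" \
    "$(scaling_url "$1" "$2" "$3")"
}

# Hypothetical usage: set_instance_count PROJECT_ID REGION SERVICE 5
```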

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

resource "google_cloud_run_v2_service" "default" {
  name     = "SERVICE_NAME"
  location = "REGION"
  launch_stage = "BETA"

  template {
    containers {
      image = "IMAGE_URL"
    }
  }
  scaling {
    scaling_mode = "MANUAL"
    manual_instance_count = "INSTANCE_COUNT"
  }
}

Replace the following:

SERVICE_NAME: the name of your Cloud Run service.
REGION: the Google Cloud region. For example, europe-west1.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG .
INSTANCE_COUNT: the number of instances you are manually scaling for the service. This number of instances is divided among all revisions with specified traffic based on the percent of traffic they are receiving.

View scaling configuration for your service

To view the current scaling configuration for your Cloud Run service:

Console

In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details panel.
The current scaling setting is shown at the upper right of the service details panel, after the Scaling label, next to the pen icon.

gcloud

Use the following command to view the current scaling configuration for the service:

gcloud beta run services describe SERVICE

Replace SERVICE with the name of your service.

Look for the field Scaling: Manual (Instances: ) near the top of the output of the describe command.

YAML

Use the following command to download the service YAML configuration:

gcloud run services describe SERVICE --format export > service.yaml

The scaling configuration is contained in the scalingMode and manualInstanceCount attributes.
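
To pull just these two attributes out of the exported file, a text filter is usually enough. The sketch below assumes the attributes appear as `run.googleapis.com/...` annotations, as in the YAML example in the Configure scaling section; the function name `show_scaling` is hypothetical.

```shell
# show_scaling FILE
# Prints the scaling-related annotation lines from an exported service YAML.
show_scaling() {
  grep -E 'run\.googleapis\.com/(scalingMode|manualInstanceCount):' "$1"
}

# Hypothetical usage, after running the describe command above:
# show_scaling service.yaml
```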

Disable a service

When you disable a service, any requests that are currently being processed will be allowed to complete. However, any further requests to the service URL will fail with a Service unavailable or Service disabled error.

Requests to service revisions that are only active due to traffic tags are not impacted because those revisions are not disabled.

To disable a service, set scaling to zero. You can do this using the Google Cloud console, the Google Cloud CLI, a YAML file, the REST API, or Terraform:

Console

In the Google Cloud console, go to Cloud Run:
Click the service you want to disable to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Edit scaling form and select Manual scaling.

In the field labelled Number of instances, enter the value 0 (zero).
Click Save.

gcloud

To disable a service, use the following command to set scaling to zero:

gcloud beta run services update SERVICE --scaling=0

Replace SERVICE with the name of your service.

YAML

Download your service's YAML configuration:

gcloud run services describe SERVICE --format export > service.yaml

Set the manualInstanceCount attribute to zero (0):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: "0"

Replace SERVICE with the name of your Cloud Run service.

Create or update the service using the following command:
```
gcloud run services replace service.yaml
```

REST API

To disable a service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

    curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer ACCESS_TOKEN" \
    -X PATCH \
    -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":0 }}' \
    https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount

Replace the following:

ACCESS_TOKEN: a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
SERVICE: the name of the service.
REGION: the Google Cloud region that the service is deployed in.
PROJECT_ID: the Google Cloud project ID.

Terraform

To disable a service, set the manual_instance_count attribute to zero (0):

resource "google_cloud_run_v2_service" "default" {
  name     = "SERVICE_NAME"
  location = "REGION"
  launch_stage = "BETA"

  template {
    containers {
      image = "IMAGE_URL"
    }
  }
  scaling {
    scaling_mode = "MANUAL"
    manual_instance_count = "0"
  }
}

Replace the following:

SERVICE_NAME: the name of your Cloud Run service.
REGION: the Google Cloud region. For example, europe-west1.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG

Schedule-based scaling example

A common use case for manual scaling is changing the instance count based on a predefined schedule. In this example, we use Cloud Scheduler to schedule two jobs, each of which invokes the Cloud Run Admin API to scale the number of instances. The first job sets the service to manually scale out to 10 instances during business hours (9:00 AM to 5:00 PM, Monday through Friday). The second job sets the service to scale in to zero instances during off-hours.

Notice that setting the instances to zero as shown in the example disables the service, but not the Cloud Scheduler jobs. Those jobs continue to run and will reset (and re-enable) the service to 10 instances as scheduled.
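
The combined effect of the two jobs described above can be summarized as a function from the day and hour (in the jobs' time zone) to the instance count in effect. The following sketch is only a model of the schedule for reasoning about it, not something Cloud Scheduler runs. The function name `expected_count` is hypothetical, and the result assumes that both jobs have already run at least once and have succeeded.

```shell
# expected_count DAY_OF_WEEK HOUR
# DAY_OF_WEEK: 1 (Monday) through 7 (Sunday), as printed by `date +%u`.
# HOUR: 0 through 23, as an integer without a leading zero.
# Prints the instance count that the two example jobs leave in effect.
expected_count() {
  dow=$1
  hour=$2
  if [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 9 ] && [ "$hour" -lt 17 ]; then
    echo 10   # business hours: the 9 AM job has run, the 5 PM job has not
  else
    echo 0    # evenings and weekends: the last job to run was the 5 PM one
  fi
}

expected_count 3 10   # Wednesday 10:00, prints 10
expected_count 6 10   # Saturday 10:00, prints 0
```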

In this example, we use the Cloud Run quickstart for simplicity, but you can use a service of your choice.

To set up schedule-based manual scaling:

Deploy your service using the following command:
```
gcloud beta run deploy SERVICE \
   --image=us-docker.pkg.dev/cloudrun/container/hello \
   --region=REGION \
   --project PROJECT_ID
```
Replace the following:
- SERVICE: the name of the Cloud Run service.
- REGION: the region the Cloud Run service is deployed to.
- PROJECT_ID: the Google Cloud project ID.

Configure your service for manual scaling to 10 instances using the following command:

gcloud beta run services update SERVICE \
   --region=REGION \
   --scaling=10

Create a Cloud Scheduler job that scales the service out to 10 instances at the start of business hours:
```
gcloud scheduler jobs create http hello-start-instances \
  --location=REGION \
  --schedule="0 9 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":10}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com
```
This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting the number of instances to 10. The example uses the Compute Engine default service account PROJECT_NUMBER-compute@developer.gserviceaccount.com for the Cloud Scheduler jobs. You can use any service account that has permissions to update Cloud Run services.

Note: This example uses the X-HTTP-Method-Override=PATCH header because the Cloud Scheduler CLI does not support setting http-method=PATCH. If you configure the Cloud Scheduler job using the Google Cloud console, you can set the HTTP method to PATCH, and exclude the header.

Create a Cloud Scheduler job that scales the service in to zero instances at the end of business hours, disabling the service:

gcloud scheduler jobs create http hello-stop-instances \
  --location=REGION \
  --schedule="0 17 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":0}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com

This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting manual scaling instances to zero. This effectively disables the service, but not the Cloud Scheduler jobs, which will continue to run and reset (and re-enable) the service to 10 instances as scheduled.
