Manual scaling

This page describes how to manually scale your service. It also provides instructions for a common use case: changing the instance count on a schedule using Cloud Scheduler jobs and the Cloud Run Admin API.

Overview

By default, Cloud Run automatically scales out to a specified or default maximum number of instances depending on traffic and CPU utilization. However, for some use cases, you might want to set a specific number of instances by using manual scaling.

Manual scaling lets you set a specific instance count, regardless of traffic or utilization, and without requiring redeployment. This lets you write your own scaling logic in an external system. See Schedule-based scaling for an example of this.

Switching between automatic and manual scaling

Switching scaling modes impacts instance count and service-level minimum and maximum instance settings as shown in the following table:

| Scaling switch direction | Instance count | Min and max instances |
|---|---|---|
| From automatic to manual | If instance count is unspecified in the command that switches modes, inherit the service-level minimum instances setting. | After the switch, service-level minimum and maximum instances are unset. |
| From manual to automatic | The manual instance count is unset. | You must specify either both service-level minimum and maximum instances, or neither of them. (Specifying only one returns an error.) If you specify neither of these in the command that switches modes, service-level minimum and maximum instances inherit the manual instance count. |
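
The switching rules in the table can be modeled as two small functions. This is a minimal sketch for reasoning about the rules, not Cloud Run code: the function names and the use of None for "unspecified" are assumptions made for illustration, and the case where no count is given and no service-level minimum is set is not covered by the table, so None is simply passed through.

```python
from typing import Optional, Tuple


def switch_to_manual(requested_count: Optional[int],
                     service_min: Optional[int]) -> Optional[int]:
    """Return the manual instance count after switching from automatic.

    If the switching command does not specify a count, the service-level
    minimum instances setting is inherited. The table does not say what
    happens when that setting is also unset, so None is passed through.
    """
    return requested_count if requested_count is not None else service_min


def switch_to_automatic(manual_count: int,
                        new_min: Optional[int] = None,
                        new_max: Optional[int] = None) -> Tuple[int, int]:
    """Return (min, max) service-level instances after switching to automatic."""
    if (new_min is None) != (new_max is None):
        # Specifying only one of the two settings returns an error.
        raise ValueError("specify both minimum and maximum instances, or neither")
    if new_min is None:
        # Neither was given: both inherit the manual instance count.
        return manual_count, manual_count
    return new_min, new_max
```

For example, switch_to_automatic(4) yields (4, 4), while switch_to_automatic(4, new_min=1) raises an error because only one bound was given.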

Revision-level minimum and maximum settings and manual scaling

If you set your service to manual scaling, any revision-level minimum and maximum instance settings are ignored.

Traffic splits for manual scaling

The following list describes how instances are allocated when splitting traffic under manual scaling. This includes behavior for traffic-tag-only revisions.

  • During a traffic split, each revision is allocated instances proportionally, based on traffic split, similar to traffic splitting with service-level minimum instances.

  • If the number of revisions receiving traffic exceeds the manual instance count, some of the revisions will have no instances. Traffic sent to those revisions will get the same error as if the revisions were disabled.

  • For all revisions receiving traffic in a traffic split, any revision-level minimum and maximum instances are disabled.

  • If a revision is active only due to traffic tags:

    • If revision-level minimum instances is set, the specified number of instances start but do not count toward the service's total manual instance count. The revision will not autoscale.
    • If revision-level minimum instances is not set, the revision scales out to at most one instance, in response to traffic sent to the tag URL.
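
To build intuition for proportional allocation, the following sketch splits a manual instance count across revisions by traffic percentage. The function name is hypothetical, and the largest-remainder rounding is an assumption chosen for illustration; this page does not specify how Cloud Run rounds, so treat the exact numbers as indicative only.

```python
from typing import Dict


def allocate_instances(manual_count: int,
                       traffic_percent: Dict[str, int]) -> Dict[str, int]:
    """Split manual_count across revisions in proportion to traffic share.

    Uses largest-remainder rounding, which is an assumption made for this
    illustration; the rounding Cloud Run actually applies may differ.
    """
    total = sum(traffic_percent.values())
    if total <= 0:
        raise ValueError("at least one revision must receive traffic")
    # Whole instances each revision is guaranteed by its exact share.
    alloc = {rev: manual_count * pct // total
             for rev, pct in traffic_percent.items()}
    leftover = manual_count - sum(alloc.values())
    # Hand the remaining instances to the largest fractional remainders.
    by_remainder = sorted(
        traffic_percent,
        key=lambda rev: (manual_count * traffic_percent[rev]) % total,
        reverse=True,
    )
    for rev in by_remainder[:leftover]:
        alloc[rev] += 1
    return alloc
```

With a manual instance count of 2 and three revisions receiving traffic, one revision necessarily ends up with zero instances, which matches the case described in the list above.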

Billing behavior using manual scaling

When you use manual scaling, billing behavior is similar to the behavior when you use the minimum instances feature.

That is, with manual scaling and instance-based billing, manually scaled idle instances are billed as active instances.

If you use manual scaling with request-based billing, manually scaled idle instances are billed as idle minimum-instances. For complete billing details, see the pricing page.

Required roles

To get the permissions that you need to deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Configure scaling

You can configure the scaling mode using the Google Cloud console, the Google Cloud CLI, a YAML file, or the API when you create or update a service:

Console

  1. In the Google Cloud console, go to Cloud Run.

  2. If you are configuring a new service, click Deploy container and select Service to display the Create service form. If you are configuring an existing service, click the service to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.

  3. Locate the Service scaling form (for a new service) or the Edit scaling form (for an existing service) and select Manual scaling.

    In the field labelled Number of instances, specify the number of container instances for the service.

  4. Click Create for a new service or Save for an existing service.

gcloud

To specify scaling for a new service, use the deploy command:

gcloud beta run deploy SERVICE \
    --scaling=INSTANCE_COUNT \
    --image IMAGE_URL

Replace the following:

  • SERVICE with the name of your service
  • INSTANCE_COUNT with the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service. Specify a value of auto to use the default Cloud Run autoscaling behavior.
  • IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the form LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.

Specify scaling for an existing service by using the following update command:

gcloud beta run services update SERVICE \
   --scaling=INSTANCE_COUNT

YAML

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Update the scalingMode and manualInstanceCount attributes:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
      annotations:
        run.googleapis.com/launch-stage: BETA
        run.googleapis.com/scalingMode: MODE
        run.googleapis.com/manualInstanceCount: 'INSTANCE_COUNT'

    Replace the following:

    • SERVICE with the name of your Cloud Run service
    • MODE with manual for manual scaling, or automatic for the default Cloud Run autoscaling behavior.
    • INSTANCE_COUNT with the number of instances you are manually scaling for the service. Specify a value of 0 to disable the service.
  3. Create or update the service using the following command:

    gcloud run services replace service.yaml

REST API

To set the manual instance count for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

    curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer ACCESS_TOKEN" \
    -X PATCH \
    -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":MANUAL_INSTANCE_COUNT }}' \
    "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount"

Replace:

  • ACCESS_TOKEN with a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
  • MANUAL_INSTANCE_COUNT with the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service.
  • SERVICE with the name of the service.
  • REGION with the Google Cloud region that the service is deployed in.
  • PROJECT_ID with the Google Cloud project ID.
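
If you drive scaling from your own code rather than curl, the same request can be assembled with the Python standard library. This is a minimal sketch that mirrors the curl example above; the function name is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_scaling_request(project_id: str, region: str, service: str,
                          instance_count: int,
                          access_token: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request shown in the curl example."""
    url = (
        "https://run.googleapis.com/v2/"
        f"projects/{project_id}/locations/{region}/services/{service}"
        "?update_mask=launchStage,scaling.manualInstanceCount"
    )
    body = {"launchStage": "BETA",
            "scaling": {"manualInstanceCount": instance_count}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen. The access token is obtained as described for ACCESS_TOKEN above.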

View scaling configuration for your service

To view the scaling configuration for your Cloud Run service:

Console

  1. In the Google Cloud console, go to Cloud Run.

  2. Click the service you are interested in to open the Service details panel.

  3. The current scaling setting is shown at the upper right of the service details panel, after the Scaling label, next to the pen icon.

gcloud

Use the following command to view the current scaling configuration for the service:

gcloud beta run services describe SERVICE

Replace SERVICE with the name of your service.

Look for the field Scaling: Manual (Instances: ) near the top of the output of the describe command.

YAML

Use the following command to download the service YAML configuration:

gcloud run services describe SERVICE --format export > service.yaml

The scaling configuration is contained in the scalingMode and manualInstanceCount attributes.
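
To check these two attributes from a script, a line-based scan of the exported file is often enough. The sketch below is not a full YAML parser: it assumes each annotation sits on a single line, as in the YAML example on this page, and the function name is hypothetical.

```python
import re
from typing import Dict

# Matches the two scaling annotations and captures the name and the value.
_ANNOTATION = re.compile(
    r"^\s*run\.googleapis\.com/(scalingMode|manualInstanceCount):\s*(.+?)\s*$"
)


def read_scaling_annotations(yaml_text: str) -> Dict[str, str]:
    """Collect the scalingMode and manualInstanceCount annotation values."""
    found = {}
    for line in yaml_text.splitlines():
        match = _ANNOTATION.match(line)
        if match:
            # Strip optional YAML quotes around the value.
            found[match.group(1)] = match.group(2).strip("'\"")
    return found
```

A typical call would read the downloaded file, for example read_scaling_annotations(open("service.yaml").read()). If neither annotation is present, the function returns an empty dictionary.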

Disable a service

If you disable a service, requests to its service URL will fail with a Service unavailable or Service disabled error. Requests to service revisions that are only active due to traffic tags are not impacted because those revisions are not disabled.

To disable a service, set scaling to zero. You can disable a service using the Google Cloud console, the Google Cloud CLI, a YAML file, or the API:

Console

  1. In the Google Cloud console, go to Cloud Run.

  2. Click the service you want to disable to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.

  3. Locate the Edit scaling form and select Manual scaling.

    In the field labelled Number of instances, enter the value 0 (zero).

  4. Click Save.

gcloud

To disable a service, use the following command to set scaling to zero:

gcloud beta run services update SERVICE --scaling=0

Replace SERVICE with the name of your service.

YAML

  1. Download your service's YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Set the manualInstanceCount attribute to zero (0):

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
      annotations:
        run.googleapis.com/launch-stage: BETA
        run.googleapis.com/scalingMode: manual
        run.googleapis.com/manualInstanceCount: '0'

    Replace SERVICE with the name of your Cloud Run service.

  3. Create or update the service using the following command:

    gcloud run services replace service.yaml

REST API

To disable a service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

    curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer ACCESS_TOKEN" \
    -X PATCH \
    -d '{"launchStage":"BETA","scaling":{"manualInstanceCount":0 }}' \
    "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount"

Replace:

  • ACCESS_TOKEN with a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
  • SERVICE with the name of the service.
  • REGION with the Google Cloud region that the service is deployed in.
  • PROJECT_ID with the Google Cloud project ID.

Schedule-based scaling example

A common use case of manual scaling is changing the instance count based on a predefined schedule. In this example, we use Cloud Scheduler to schedule two jobs, each of which invokes the Cloud Run Admin API to scale the number of instances. The first job sets the service to manually scale out to 10 instances during business hours (9am-5pm, M-F). The second job sets the service to scale in to zero instances during off-hours.

Notice that setting the instances to zero as shown in the example disables the service, but not the Cloud Scheduler jobs. Those jobs continue to run and will reset (and re-enable) the service to 10 instances as scheduled.
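
The net effect of the two jobs can be expressed as a small function, which is handy for checking the schedule before you create the jobs. This is a sketch only: the function name is hypothetical, and the timestamp is assumed to be expressed in the same time zone as the Cloud Scheduler jobs.

```python
from datetime import datetime


def target_instance_count(local_time: datetime,
                          business_count: int = 10) -> int:
    """Return the instance count the two scheduled jobs would leave in place.

    local_time is assumed to be expressed in the jobs' time zone already.
    """
    is_weekday = local_time.weekday() < 5   # Monday=0 ... Friday=4
    in_hours = 9 <= local_time.hour < 17    # 9am up to, not including, 5pm
    return business_count if (is_weekday and in_hours) else 0
```

The function describes the steady state once both jobs have run at least once. Before the first job fires, the service keeps whatever instance count you set manually.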

In this example, we use the Cloud Run quickstart for simplicity, but you can use a service of your choice.

To set up schedule-based manual scaling:

  1. Deploy your service using the following command:

    gcloud beta run deploy SERVICE \
       --image=us-docker.pkg.dev/cloudrun/container/hello \
       --region=REGION \
       --project PROJECT_ID

    Replace the following variables:

    • REGION with the region the Cloud Run service is deployed to.
    • SERVICE with the name of the Cloud Run service.
    • PROJECT_ID with the Google Cloud project ID.
  2. Configure your service for manual scaling to 10 instances using the following command:

    gcloud beta run services update SERVICE \
       --region=REGION \
       --scaling=10
  3. Create a Cloud Scheduler job that manually scales the service instances out to 10 instances during business hours:

    gcloud scheduler jobs create http hello-start-instances \
      --location=REGION \
      --schedule="0 9 * * MON-FRI" \
      --time-zone=America/Los_Angeles \
      --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
      --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
      --http-method=PUT \
      --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":10}}' \
      --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com

    This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting the number of instances to 10. The example uses the Compute Engine default service account PROJECT_NUMBER-compute@developer.gserviceaccount.com for the Cloud Scheduler jobs. Replace PROJECT_NUMBER with your Google Cloud project number. You can use any service account that has permissions to update Cloud Run services.

  4. Create a Cloud Scheduler job that manually scales the service instances in to zero instances during off hours, disabling the service:

    gcloud scheduler jobs create http hello-stop-instances \
      --location=REGION \
      --schedule="0 17 * * MON-FRI" \
      --time-zone=America/Los_Angeles \
      --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.manualInstanceCount" \
      --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
      --http-method=PUT \
      --message-body='{"launchStage":"BETA","scaling":{"manualInstanceCount":0}}' \
      --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com

    This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting manual scaling instances to zero. This effectively disables the service, but not the Cloud Scheduler jobs, which will continue to run and reset (and re-enable) the service to 10 instances as scheduled.
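
Because the two job definitions differ only in name, schedule and instance count, you may prefer to generate the command arguments from one place. The following sketch builds the argument list for a single job using only the flags shown in the commands above; the function name and parameters are hypothetical, and nothing is executed.

```python
import json
from typing import List


def scheduler_job_args(job_name: str, schedule: str, instance_count: int,
                       project_id: str, region: str, service: str,
                       service_account: str) -> List[str]:
    """Return the argument list for one `gcloud scheduler jobs create http` call."""
    uri = (
        f"https://run.googleapis.com/v2/projects/{project_id}/"
        f"locations/{region}/services/{service}"
        "?update_mask=launchStage,scaling.manualInstanceCount"
    )
    # Compact JSON body, matching the --message-body value in the commands above.
    body = json.dumps(
        {"launchStage": "BETA",
         "scaling": {"manualInstanceCount": instance_count}},
        separators=(",", ":"),
    )
    return [
        "gcloud", "scheduler", "jobs", "create", "http", job_name,
        f"--location={region}",
        f"--schedule={schedule}",
        "--time-zone=America/Los_Angeles",
        f"--uri={uri}",
        "--headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH",
        "--http-method=PUT",
        f"--message-body={body}",
        f"--oauth-service-account-email={service_account}",
    ]
```

Passing the list to subprocess.run(args, check=True) runs the command without shell quoting issues, because the URI and the JSON body are each passed as a single argument.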