Billing settings for services

This page describes billing settings assuming the use of the default Cloud Run autoscaling behavior. See Billing behavior using manual scaling for additional considerations if you use manual scaling.

There are two billing settings in Cloud Run services:

Request-based billing (default): Cloud Run instances are charged only while they process requests, start, or shut down. See instance lifecycle for more details. This setting was previously called CPU only allocated during request processing.
Instance-based billing: Cloud Run instances are charged for the entire lifecycle of instances, even when there are no incoming requests. Instance-based billing can be useful for running short-lived background tasks and other asynchronous processing tasks. This setting was previously called CPU always allocated.

If you choose request-based billing, you are charged per request, and instance time is charged only while the instance processes requests. If you choose instance-based billing, you are charged for the entire lifecycle of the instance. See the Cloud Run pricing tables for details.
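To make the difference concrete, the following Python sketch computes billable instance time for a single instance under each setting. It assumes, as described in the Cloud Run pricing documentation, that requests processed concurrently on the same instance are billed as one busy period, and it takes startup and shutdown time as inputs. The function names and the (start, end) interval format are illustrative, not part of any Cloud Run API, and rounding and free-tier allowances are ignored.

```python
from typing import List, Optional, Tuple

# A request's (start_seconds, end_seconds) on one instance; illustrative format.
Interval = Tuple[float, float]


def request_based_seconds(
    requests: List[Interval], startup_s: float = 0.0, shutdown_s: float = 0.0
) -> float:
    """Billable time under request-based billing (sketch).

    Overlapping requests are merged, so an instance serving several
    requests at once is counted only once for that period.
    """
    busy = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:
            # Gap before this request: close out the previous busy period.
            if cur_start is not None and cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current busy period: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        busy += cur_end - cur_start
    return busy + startup_s + shutdown_s


def instance_based_seconds(instance_start: float, instance_stop: float) -> float:
    """Billable time under instance-based billing: the whole lifecycle."""
    return instance_stop - instance_start
```

For example, requests at (0, 2), (1, 3), and (10, 11) seconds on an instance that lives from second 0 to second 20 give 4 billable seconds with request-based billing and 20 with instance-based billing.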

Recommender automatically analyzes the traffic your Cloud Run service received over the past month and recommends switching from request-based billing to instance-based billing if that would be cheaper.

CPU allocation impact

Selecting a billing setting impacts how CPU is allocated.

With request-based billing, CPU is only allocated during request processing.
With instance-based billing, CPU is allocated for the entire container instance lifecycle.

How to choose the appropriate billing setting

Choosing the appropriate billing setting for your use case depends on several factors, such as traffic patterns, background execution, and cost, each of which is described in the following sections.

Traffic pattern considerations

Request-based billing is recommended when incoming traffic is sporadic, bursty or spiky.
Instance-based billing is recommended when incoming traffic is steady or slowly varying.

Background execution considerations

Selecting instance-based billing allocates CPU even outside of request processing, letting you execute short-lived background tasks and other asynchronous processing work after returning responses. For example:

Running monitoring agents, such as OpenTelemetry, that may assume they can run in the background.
Using Go goroutines, Node.js async operations, Java threads, or Kotlin coroutines.
Using application frameworks that rely on built-in scheduling or background functionality.
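The following Python sketch shows the kind of work this enables: a request handler that returns immediately after scheduling follow-up work on a thread pool. The handler and the `_flush_telemetry` task are hypothetical stand-ins for your framework's handler and for work such as exporting telemetry. With request-based billing, CPU is only allocated during request processing, so work like this may not make progress after the response is sent; instance-based billing keeps CPU allocated between requests.

```python
import concurrent.futures
import threading
from typing import List

# Results recorded by background work; a stand-in for real side effects.
completed: List[str] = []
_completed_lock = threading.Lock()

# Shared pool for work that continues after the response is returned.
background_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def _flush_telemetry(request_id: str) -> None:
    """Hypothetical background task, e.g. exporting spans or metrics."""
    with _completed_lock:
        completed.append(f"flushed {request_id}")


def handle_request(request_id: str) -> str:
    """Hypothetical handler: schedule follow-up work, then respond at once."""
    background_pool.submit(_flush_telemetry, request_id)
    return f"200 OK ({request_id})"
```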

Idle instances, including those kept warm using minimum instances, can be shut down at any time. Before an instance is stopped, Cloud Run sends it a SIGTERM signal and allows a 10-second grace period. If you need to finish outstanding tasks before the container is terminated, trap SIGTERM and use that time.
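A minimal Python sketch of trapping SIGTERM with the standard `signal` module: on SIGTERM, the handler drains a queue of outstanding tasks until shortly before the 10-second grace period runs out. The `pending_tasks` queue, the one-second safety margin, and the function names are illustrative assumptions; a real server would also stop accepting requests and exit after draining. Python runs signal handlers in the main thread, so install the handler there.

```python
import signal
import threading
import time
from collections import deque
from typing import Callable, Deque

GRACE_PERIOD_S = 10.0   # time between SIGTERM and the instance being stopped
SAFETY_MARGIN_S = 1.0   # illustrative buffer so draining ends before the deadline

# Hypothetical queue of work that must finish before the container stops.
pending_tasks: Deque[Callable[[], None]] = deque()
shutdown_requested = threading.Event()


def drain_pending_tasks(budget_s: float) -> int:
    """Run queued tasks until the queue is empty or the budget is spent."""
    deadline = time.monotonic() + budget_s
    done = 0
    while pending_tasks and time.monotonic() < deadline:
        task = pending_tasks.popleft()
        task()
        done += 1
    return done


def _handle_sigterm(signum, frame) -> None:
    # Let other parts of the app know that shutdown has started, then drain.
    shutdown_requested.set()
    drain_pending_tasks(GRACE_PERIOD_S - SAFETY_MARGIN_S)
    # A real server would close its listener and exit here.


signal.signal(signal.SIGTERM, _handle_sigterm)
```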

Consider using Cloud Tasks for executing asynchronous tasks. Cloud Tasks automatically retries failed tasks and supports running times up to 30 minutes.

Cost considerations

If you are using request-based billing, instance-based billing can be more economical if:

Your Cloud Run service processes a high number of concurrent requests at a fairly steady rate.
You don't see a lot of "idle" instances when looking at the instance count metric.

You can use the pricing calculator to estimate cost differences.
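As a back-of-the-envelope complement to the calculator, the comparison reduces to simple arithmetic: request-based billing charges for busy time plus a per-request fee, while instance-based billing charges for the whole instance lifetime at its own rate. The Python sketch below assumes a single combined per-second rate for each setting, covering your CPU and memory configuration; the rates are placeholders that you would take from the Cloud Run pricing tables, not real prices. Ignoring per-request fees, instance-based billing comes out cheaper when the busy fraction of an instance's lifetime exceeds the ratio of the two rates.

```python
from typing import Tuple


def compare_billing(
    busy_s: float,
    lifetime_s: float,
    request_count: int,
    request_based_rate: float,
    instance_based_rate: float,
    per_request_fee: float = 0.0,
) -> Tuple[str, float, float]:
    """Return (cheaper setting, request-based cost, instance-based cost).

    Rates are placeholder per-second prices; take real values from the
    Cloud Run pricing tables for your CPU and memory configuration.
    """
    request_cost = busy_s * request_based_rate + request_count * per_request_fee
    instance_cost = lifetime_s * instance_based_rate
    cheaper = "instance-based" if instance_cost < request_cost else "request-based"
    return cheaper, request_cost, instance_cost


def break_even_utilization(request_based_rate: float, instance_based_rate: float) -> float:
    """Busy fraction above which instance-based billing is cheaper,
    ignoring per-request fees."""
    return instance_based_rate / request_based_rate
```

With placeholder rates of 1.0 and 0.8, an instance busy for 90 of its 100 seconds is cheaper on instance-based billing, while one busy for 30 of 100 seconds is cheaper on request-based billing.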

Autoscaling considerations

Cloud Run by default autoscales the number of container instances.

For a service set to request-based billing, Cloud Run autoscales the number of instances based on CPU utilization only during request processing.

For a service set to instance-based billing, Cloud Run autoscales the number of instances based on CPU utilization for the entire lifecycle of the container instance, except when scaling to and from zero, where it only uses requests.

See manual scaling for additional considerations if you use manual scaling instead of the Cloud Run autoscaling feature.

Instance-based billing considerations

Even with instance-based billing, Cloud Run autoscaling is still in effect and may terminate instances that aren't needed to handle incoming traffic or current CPU utilization outside of requests. An instance never stays idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.

Combining instance-based billing with minimum instances keeps at least that many instances running with full access to CPU resources, enabling background processing use cases. With this pattern, Cloud Run still applies instance autoscaling, even if a service is using CPU outside of any requests.

If you use health check probes, instance-based billing applies to every probe. See container health check probes for billing details.

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity

If you are deploying a service or function from source code, you must also have additional roles granted to you on your project and Cloud Build service account.

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Set and update billing

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

If you select instance-based billing, you must specify at least 512MiB of memory.
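If you generate deployment settings programmatically, a pre-deploy check for this requirement might look like the following Python sketch. It only understands the `Mi` and `Gi` suffixes (for example `512Mi` or `1Gi`), which is a simplifying assumption, and the helper names are hypothetical.

```python
import re

MIN_INSTANCE_BASED_MIB = 512  # minimum memory for instance-based billing

# Only binary Mi/Gi suffixes are handled in this sketch.
_MIB_PER_UNIT = {"Mi": 1, "Gi": 1024}


def memory_to_mib(quantity: str) -> float:
    """Convert a quantity such as '512Mi' or '1Gi' to MiB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Mi|Gi)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    return float(match.group(1)) * _MIB_PER_UNIT[match.group(2)]


def check_instance_based_memory(quantity: str) -> None:
    """Raise if the memory limit is too small for instance-based billing."""
    mib = memory_to_mib(quantity)
    if mib < MIN_INSTANCE_BASED_MIB:
        raise ValueError(
            f"instance-based billing needs at least {MIN_INSTANCE_BASED_MIB} MiB, got {mib:g} MiB"
        )
```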

You can change the billing setting using the Google Cloud console, the gcloud CLI, or a YAML file when you create a new service or deploy a new revision:

Console

In the Google Cloud console, go to the Cloud Run Services page:

Go to Cloud Run
Click Deploy container to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.
If you are configuring a new service, fill out the initial service settings page.
Select a billing setting under Billing. Select request-based billing for your instances to be charged only during request processing. Select instance-based billing for your instances to be charged for the entire lifetime of instances.
Click Create or Deploy.

gcloud

You can update the billing setting. To set instance-based billing for a given service:

```
gcloud run services update SERVICE --no-cpu-throttling
```

Replace SERVICE with the name of your service.

To set request-based billing:

```
gcloud run services update SERVICE --cpu-throttling
```

You can also set your billing setting during deployment. To set your billing setting to instance-based billing:

```
gcloud run deploy --image IMAGE_URL --no-cpu-throttling
```

To set your billing setting to request-based billing:

```
gcloud run deploy --image IMAGE_URL --cpu-throttling
```

Replace IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL has the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.

YAML

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
```
gcloud run services describe SERVICE --format export > service.yaml
```
Update the run.googleapis.com/cpu-throttling annotation:
```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: 'BOOLEAN'
      name: REVISION
```
Replace the following:
- SERVICE: the name of your Cloud Run service
- BOOLEAN: true to set request-based billing, or false to set instance-based billing.
- REVISION: a new revision name, or delete this line (if present). If you supply a new revision name, it must meet the following criteria:
  - Starts with SERVICE-
  - Contains only lowercase letters, numbers and -
  - Does not end with a -
  - Does not exceed 63 characters
Create or update the service using the following command:
```
gcloud run services replace service.yaml
```
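If you generate revision names in a script, the naming rules listed above can be checked before running `gcloud run services replace`. A minimal Python sketch follows; `is_valid_revision_name` is a hypothetical helper, not part of gcloud.

```python
import re

MAX_REVISION_NAME_LEN = 63


def is_valid_revision_name(service: str, revision: str) -> bool:
    """Check a revision name against the rules for the REVISION field."""
    return (
        revision.startswith(f"{service}-")                     # starts with SERVICE-
        and re.fullmatch(r"[a-z0-9-]+", revision) is not None  # lowercase, digits, '-'
        and not revision.endswith("-")                         # no trailing '-'
        and len(revision) <= MAX_REVISION_NAME_LEN             # at most 63 characters
    )
```

For example, `hello-v2` is valid for a service named `hello`, while `hello-V2` (uppercase letter) and `hello-v2-` (trailing hyphen) are not.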

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

```
resource "google_cloud_run_v2_service" "default" {
  name     = "cloudrun-service-cpu-allocation"
  location = "us-central1"

  deletion_protection = false # set to "true" in production

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello"
      resources {
        # false: CPU stays allocated outside requests (instance-based billing).
        # true: CPU is allocated only during request processing (request-based billing).
        cpu_idle = false
      }
    }
  }
}
```

View billing settings

To view the current billing setting for your Cloud Run service:

Console

In the Google Cloud console, go to the Cloud Run Services page:

Go to Cloud Run
Click the service you are interested in to open the Service details page.
Click the Revisions tab.
In the details panel at the right, the Billing setting is listed under the General tab.

gcloud

Use the following command:
```
gcloud run services describe SERVICE
```
Locate the Billing setting in the returned configuration.
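To read the setting from a script instead, you can request machine-readable output with `gcloud run services describe SERVICE --format json` and inspect the `run.googleapis.com/cpu-throttling` revision-template annotation used in the YAML example. The Python sketch below assumes that a missing annotation means the default, request-based billing; verify this against your own service's output.

```python
import json

CPU_THROTTLING = "run.googleapis.com/cpu-throttling"


def billing_setting(describe_json: str) -> str:
    """Map the cpu-throttling annotation in describe output to a billing setting."""
    service = json.loads(describe_json)
    annotations = (
        service.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("annotations", {})
    )
    value = annotations.get(CPU_THROTTLING)
    # Assumption: no annotation means the default, request-based billing.
    if value is None or value.lower() == "true":
        return "request-based"
    if value.lower() == "false":
        return "instance-based"
    raise ValueError(f"unexpected {CPU_THROTTLING} value: {value!r}")
```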