Deployment options and resource model

Deployment options

Cloud Run offers multiple deployment options. After deployment, all options run as sandboxed container instances on Cloud Run's fully managed and highly scalable infrastructure.

Deployable container images

You can deploy any container image that adheres to Cloud Run's container runtime contract to a Cloud Run service, job, or worker pool.
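The sketch below illustrates one well-known requirement of the runtime contract as it applies to services: the container listens on 0.0.0.0 at the port given by the PORT environment variable, which defaults to 8080. It is a minimal sketch that uses only Python's standard library. The names resolve_port, HelloHandler, and make_server are illustrative, and the full contract covers more than this one requirement.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env=None) -> int:
    """Return the port from PORT, falling back to 8080 when it is unset."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    # Bind to 0.0.0.0 (all interfaces), not 127.0.0.1, so that requests
    # routed into the container can reach the server.
    return HTTPServer(("0.0.0.0", port), HelloHandler)


if __name__ == "__main__":
    make_server(resolve_port()).serve_forever()
```

Jobs and worker pools do not need to serve HTTP at all, so this pattern only applies to services.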

Deploy from source code

For convenience, Cloud Run lets you build and deploy source code from a single command. See deploying services from source code and deploying worker pools from source code for details.

When you deploy from source code, Cloud Build transforms the code into a container image stored in Artifact Registry. You can deploy source code that includes a Dockerfile or that uses one of the supported language runtimes.

Functions

You can deploy single-purpose functions that respond to events emitted from your cloud infrastructure and services. Cloud Run triggers your function when a watched event fires.

A functions deployment is a special type of source code deployment, where you only have to provide the function code. You can write Cloud Run functions using a number of supported programming languages.

Deploying a function creates a Cloud Run service.
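The handler below sketches what "single-purpose" means in practice: it acts on one event type and ignores everything else. The event type string is Cloud Storage's object-finalized event, used here only as an example. The handle_event signature is hypothetical. In a real deployment, the Functions Framework for your language decodes the incoming event and invokes your function.

```python
from typing import Any, Mapping

# Example event type: Cloud Storage emits it when an object is created.
FINALIZED = "google.cloud.storage.object.v1.finalized"


def handle_event(event_type: str, data: Mapping[str, Any]) -> str:
    """Hypothetical single-purpose handler for new Cloud Storage objects."""
    if event_type != FINALIZED:
        return "ignored"
    # The bucket and object name identify the file that triggered the event.
    return f"processing gs://{data['bucket']}/{data['name']}"
```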

Continuous source code deployment from Git

Cloud Run helps you configure continuous deployment from Git. As with source deployments, you can deploy source code that includes a Dockerfile or is written in one of the supported language runtimes.

Continuous deployment from Git is available for Cloud Run services. For Cloud Run jobs, you can configure it manually in Cloud Build.

Cloud Run services

Services are one of the main resources of Cloud Run. Each service is located in a specific Google Cloud region. To provide redundancy and failover, Cloud Run automatically replicates services across multiple zones within a region. A given Google Cloud project can run many services in different regions.

Each service exposes a unique endpoint. By default, Cloud Run automatically scales the number of instances to handle incoming requests. If needed, you can switch the service to manual scaling. You can deploy a service from a container, repository, or source code.

The following diagram shows the Cloud Run resource model for services:

[Diagram: Cloud Run services and revisions]

The diagram shows a Google Cloud project containing three Cloud Run services, Service A, Service B and Service C, each of which has several revisions:

Service A is receiving multiple requests, so Cloud Run has started multiple instances to handle the load. Each of these instances runs just one container (the application's container).
Service B has no requests, so it is idle and Cloud Run isn't running any instances.
Service C has requests and has scaled to handle the load by creating multiple instances. In this case, each instance runs a set of multiple containers. In each set, only the ingress container receives requests, while the other containers help fulfill them.

Cloud Run service revisions

Each deployment to a service creates a revision. A revision consists of one or more container images, along with configuration settings such as environment variables, memory limits, or the request concurrency value.

Revisions are immutable: you cannot modify a revision after it is created. For example, when you deploy a container image to a new service, Cloud Run creates the first revision. If you then deploy a different container image to that same service, Cloud Run creates a second revision. If you subsequently set an environment variable, Cloud Run creates a third revision. Over time, Cloud Run eventually removes older, unused revisions.

Cloud Run automatically routes requests as soon as possible to the latest healthy service revision.
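To make the revision lifecycle concrete, here is a small in-memory model that replays the example above: every change produces a new frozen Revision, and traffic goes to the newest healthy one. It is a conceptual sketch, not Cloud Run's implementation. The Service and Revision classes, the healthy flag, and the revision naming scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of container images plus configuration."""
    name: str
    images: Tuple[str, ...]
    env: Tuple[Tuple[str, str], ...]
    healthy: bool


class Service:
    def __init__(self, name: str) -> None:
        self.name = name
        self.revisions: List[Revision] = []

    def deploy(self, images: Optional[List[str]] = None,
               env: Optional[Dict[str, str]] = None,
               healthy: bool = True) -> Revision:
        """Every change creates a new revision; existing ones are never edited."""
        prev = self.revisions[-1] if self.revisions else None
        if prev is None and images is None:
            raise ValueError("the first deployment needs a container image")
        # Carry forward whatever the caller did not change.
        new_images = tuple(images) if images is not None else prev.images
        merged_env = dict(prev.env) if prev else {}
        merged_env.update(env or {})
        rev = Revision(
            name=f"{self.name}-{len(self.revisions) + 1:05d}",  # illustrative naming
            images=new_images,
            env=tuple(sorted(merged_env.items())),
            healthy=healthy,
        )
        self.revisions.append(rev)
        return rev

    def serving_revision(self) -> Optional[Revision]:
        """Return the newest healthy revision, which receives the traffic."""
        for rev in reversed(self.revisions):
            if rev.healthy:
                return rev
        return None
```

Deploying image v1, then image v2, then setting MODE=prod yields three revisions. The first revision still reports image v1, because frozen dataclasses reject any attempt to modify them.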

Cloud Run service instances

Cloud Run automatically scales each service revision that receives requests to the number of instances needed to handle them. An instance can receive many requests at the same time, and the request concurrency setting sets the maximum number of requests that can be sent in parallel to each instance of a revision.
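The concurrency setting puts a floor on the instance count. For example, with 250 requests in flight and a concurrency of 80, no fewer than ceil(250 / 80) = 4 instances can serve them all. The function below computes only this lower bound. Cloud Run's actual autoscaler takes additional signals into account and is not reproduced here. The function name is illustrative.

```python
import math


def min_instances_needed(in_flight_requests: int, concurrency: int) -> int:
    """Lower bound on instances so that no instance exceeds `concurrency`."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if in_flight_requests <= 0:
        return 0  # with no traffic, a revision can scale to zero instances
    return math.ceil(in_flight_requests / concurrency)
```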

Cloud Run jobs

Each job is located in a specific Google Cloud region and consists of one or more tasks, each of which runs one or more containers to completion. Job tasks are independent and can be executed in parallel within a given job execution.
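Because tasks are independent, a common pattern is for each task to process its own slice of the input. Cloud Run jobs expose the task's position through the CLOUD_RUN_TASK_INDEX (starting at 0) and CLOUD_RUN_TASK_COUNT environment variables. The sketch below uses those variables to assign work in round-robin order. The my_shard helper and the round-robin scheme are illustrative choices, not a prescribed API.

```python
import os
from typing import List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def my_shard(items: Sequence[T], env: Optional[Mapping[str, str]] = None) -> List[T]:
    """Return the slice of `items` that this task should process."""
    env = os.environ if env is None else env
    index = int(env.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(env.get("CLOUD_RUN_TASK_COUNT", "1"))
    # Round-robin assignment: item i belongs to task i % count.
    return [item for i, item in enumerate(items) if i % count == index]
```

With seven files and three tasks, task 1 handles the second and fifth files. Together, the three shards cover every file exactly once.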

Cloud Run job executions

Running a job creates a job execution, which starts all of the job's tasks. All tasks in a job execution must complete successfully for the execution to succeed. You can set a timeout for tasks and specify the number of retries in case a task fails.

If any task exceeds its maximum number of retries, Cloud Run marks that task as failed and the job execution as failed. By default, up to 100 tasks execute in parallel, but you can specify a lower maximum if a backing resource, such as a database, requires it.
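The following local simulation models the rules above. At most `parallelism` tasks run at once, each task gets one initial attempt plus up to `max_retries` retries, and the execution succeeds only if every task succeeds. It is a sketch that uses a thread pool as a stand-in for Cloud Run instances. The function names and the task_fn(index, attempt) signature are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_execution(task_count: int, task_fn: Callable[[int, int], None],
                  parallelism: int = 100, max_retries: int = 3) -> Dict[int, str]:
    """Simulate one job execution and return each task's final status."""

    def run_task(index: int) -> str:
        # One initial attempt plus up to `max_retries` retries.
        for attempt in range(max_retries + 1):
            try:
                task_fn(index, attempt)
                return "succeeded"
            except Exception:
                continue  # treat any exception as a failed attempt
        return "failed"

    # The pool size caps how many tasks run at the same time.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        statuses = list(pool.map(run_task, range(task_count)))
    return dict(enumerate(statuses))


def execution_succeeded(results: Dict[int, str]) -> bool:
    """The execution succeeds only if every task succeeded."""
    return all(status == "succeeded" for status in results.values())
```

A task that fails transiently but recovers within its retry budget does not fail the execution. A task that fails on every attempt does.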

Cloud Run job tasks

Each job execution runs a number of tasks in parallel, and each task runs in one instance. Cloud Run automatically retries failed tasks according to the job's maxRetries setting.
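Because Cloud Run may rerun a failed task, task code should be safe to execute more than once. The sketch below reads CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_ATTEMPT, the environment variable that holds the task's retry count and starts at 0. It keys its output by task index, so a retry overwrites an earlier partial result instead of duplicating it. The `results` dictionary is a hypothetical stand-in for whatever external storage a real task writes to.

```python
import os
from typing import Dict, Mapping, Optional


def run_task(results: Dict[int, str], env: Optional[Mapping[str, str]] = None) -> str:
    """Do this task's work in a way that is safe to retry."""
    env = os.environ if env is None else env
    index = int(env.get("CLOUD_RUN_TASK_INDEX", "0"))
    attempt = int(env.get("CLOUD_RUN_TASK_ATTEMPT", "0"))
    # Key by task index, not by attempt: a retry replaces the earlier
    # result for this task rather than adding a second one.
    results[index] = f"task {index} done on attempt {attempt}"
    return results[index]
```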

Cloud Run worker pools

Worker pools are a Cloud Run resource designed specifically for non-request workloads, such as pull queues. Unlike services, worker pools have the following characteristics:

No endpoint/URL
No requirement for the deployed container to listen for requests on a port
No automatic scaling
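A worker pool container typically runs a loop that pulls work instead of waiting for incoming requests, so it opens no port. The minimal sketch below uses Python's queue.Queue as a stand-in for a real pull source such as a message queue subscription. The STOP sentinel exists only so the example can terminate. A production worker would normally loop indefinitely.

```python
import queue
from typing import Callable, Optional

STOP = None  # sentinel used only by this sketch to end the loop


def worker_loop(q: "queue.Queue[Optional[str]]", handle: Callable[[str], None]) -> int:
    """Pull messages until the sentinel arrives; return how many were handled."""
    handled = 0
    while True:
        msg = q.get()  # blocks until a message is available
        if msg is STOP:
            return handled
        handle(msg)
        handled += 1
```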

Similar to a Cloud Run service, deploying or updating a worker pool creates a new revision.

You scale worker pool instances manually to the number the workload needs. If necessary, you can create your own autoscaler. One example is the Kafka autoscaler, which handles scaling for workloads arriving from a Kafka message queue.
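At its core, a custom autoscaler turns a backlog measurement into a target instance count. The rule below is hypothetical and does not describe how the Kafka autoscaler works. It requests enough instances to cover the backlog at an assumed per-instance throughput, clamped to a minimum and maximum. Applying the result would also require updating the worker pool's manual instance count through Cloud Run's management tooling, which is not shown.

```python
import math


def desired_instances(backlog: int, msgs_per_instance: int,
                      min_instances: int = 0, max_instances: int = 10) -> int:
    """Hypothetical scaling rule: cover the backlog, within [min, max]."""
    if msgs_per_instance < 1:
        raise ValueError("msgs_per_instance must be at least 1")
    wanted = math.ceil(max(backlog, 0) / msgs_per_instance)
    return max(min_instances, min(max_instances, wanted))
```

A backlog of 1,000 messages at 150 messages per instance yields 7 instances. Clamping keeps a sudden spike from scaling past the configured ceiling.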