Resource model

The following diagram shows the Cloud Run resource model for services:

[Figure: Cloud Run services and revisions]

The diagram shows a Google Cloud project containing three Cloud Run services, Service A, Service B and Service C, each of which has several revisions.

In the diagram, Service A is receiving many requests, so several instances have been started, each running a single container. Service B is not currently receiving requests, so no instances have been started yet. Service C runs multiple containers per instance in each revision; only the ingress container receives requests. Each multi-container instance scales as an independent unit.

Cloud Run services

The service is the main resource of Cloud Run. Each service is located in a specific Google Cloud region. For redundancy and failover, services are automatically replicated across multiple zones in that region. A given Google Cloud project can run many services in different regions.

Each service exposes a unique endpoint and automatically scales the underlying infrastructure to handle incoming requests.

Cloud Run revisions

Each deployment to a service creates a revision. A revision consists of one or more container images, along with settings such as environment variables, memory limits, and the concurrency value.

Revisions are immutable: once a revision has been created, it cannot be modified. For example, when you deploy a container image to a new Cloud Run service, the first revision is created. If you then deploy a different container image to that same service, a second revision is created. If you subsequently set an environment variable, a third revision is created, and so on.
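
The deployment sequence above can be sketched as a small Python model. This is only an illustration of the immutability rule, not the Cloud Run API: the Service and Revision classes, the deploy method, the image names, and the revision naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of a service configuration (toy model)."""
    name: str
    image: str
    env: Tuple[Tuple[str, str], ...] = ()


@dataclass
class Service:
    """Toy service that only ever appends revisions."""
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def deploy(self, image: Optional[str] = None,
               env: Optional[Dict[str, str]] = None) -> Revision:
        # Start from the latest revision's settings, if there is one.
        latest = self.revisions[-1] if self.revisions else None
        if image is None:
            if latest is None:
                raise ValueError("the first deployment needs a container image")
            image = latest.image
        merged = dict(latest.env) if latest else {}
        merged.update(env or {})
        # Any change produces a new revision; existing ones are never edited.
        revision = Revision(
            name=f"{self.name}-{len(self.revisions) + 1:05d}",  # hypothetical naming
            image=image,
            env=tuple(sorted(merged.items())),
        )
        self.revisions.append(revision)
        return revision


service = Service("hello")
service.deploy(image="example/app:v1")   # first revision
service.deploy(image="example/app:v2")   # second revision: new image
service.deploy(env={"MODE": "prod"})     # third revision: new environment variable
```

After these three calls the first revision still has its original image and an empty environment; only the newest revision carries the MODE variable.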

Requests are automatically routed as soon as possible to the latest healthy service revision.
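
As a rough illustration of "latest healthy revision", the following sketch picks the newest revision that is marked healthy. The RevisionStatus type and pick_serving_revision function are hypothetical names for this example; how Cloud Run actually determines health and shifts traffic is not described here.

```python
from typing import List, NamedTuple, Optional


class RevisionStatus(NamedTuple):
    name: str
    healthy: bool


def pick_serving_revision(revisions: List[RevisionStatus]) -> Optional[str]:
    """Return the newest healthy revision; revisions are ordered oldest to newest."""
    for revision in reversed(revisions):
        if revision.healthy:
            return revision.name
    return None  # no healthy revision to route to


history = [
    RevisionStatus("hello-00001", healthy=True),
    RevisionStatus("hello-00002", healthy=True),
    RevisionStatus("hello-00003", healthy=False),  # newest, but not healthy
]
serving = pick_serving_revision(history)  # "hello-00002"
```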

Cloud Run jobs

Each job is located in a specific Google Cloud region and executes one or more containers to completion. A job consists of one or more independent tasks that are executed in parallel in a given job execution. Each task runs one container, and might retry it on failure.

Cloud Run job executions

When a job is executed, a job execution is created in which all job tasks are started. All tasks in a job execution must complete successfully for the job execution to be successful. You can set timeouts on tasks and specify the number of retries in case of task failure. If any task exceeds its maximum number of retries, that task is marked as failed and the job execution is marked as failed. By default, tasks execute in parallel up to a maximum of 100, but you can specify a lower maximum if any of your backing resources require it.
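
The success rule for a job execution can be sketched in Python as follows. This is a local simulation under stated assumptions, not Cloud Run code: run_task, run_execution, and make_flaky are hypothetical helpers, tasks are plain callables that raise on failure, and threads stand in for parallel task containers. Task timeouts are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Task = Callable[[], None]


def run_task(task: Task, max_retries: int) -> bool:
    """Run one task, retrying on failure up to max_retries extra attempts."""
    for _ in range(max_retries + 1):
        try:
            task()
            return True
        except Exception:
            pass  # attempt failed; retry if attempts remain
    return False  # retries exhausted: the task is marked as failed


def run_execution(tasks: List[Task], max_retries: int,
                  parallelism: int = 100) -> bool:
    """An execution succeeds only if every task eventually succeeds."""
    workers = max(1, min(parallelism, len(tasks)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(lambda t: run_task(t, max_retries), tasks))
    return all(outcomes)


def make_flaky(failures: int) -> Task:
    """Build a task that fails the given number of times, then succeeds."""
    state = {"left": failures}

    def task() -> None:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")

    return task


# Two failures fit within three retries, so this execution succeeds.
ok = run_execution([make_flaky(0), make_flaky(2)], max_retries=3, parallelism=2)
# Five failures exceed three retries, so one task fails and so does the execution.
failed = run_execution([make_flaky(0), make_flaky(5)], max_retries=3, parallelism=2)
```

Lowering the parallelism argument models the case where a backing resource, such as a database with a limited number of connections, cannot handle every task at once.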

Cloud Run instances

Each revision receiving requests is automatically scaled to the number of instances needed to handle all these requests. Note that the ingress container within an instance can receive many requests at the same time. With the concurrency setting, you can set the maximum number of requests that can be sent in parallel to a given instance.
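
To see how the concurrency setting relates to instance count, the sketch below computes the smallest number of instances that could hold a given number of simultaneous requests. It is back-of-the-envelope arithmetic only: the function name and the example values are assumptions, and the real autoscaler may start more instances than this lower bound.

```python
import math


def min_instances_for(in_flight_requests: int, concurrency: int) -> int:
    """Lower bound on instances needed so no instance exceeds its concurrency."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests cannot be negative")
    return math.ceil(in_flight_requests / concurrency)


idle = min_instances_for(0, concurrency=80)      # 0: no requests, no instances
busy = min_instances_for(250, concurrency=80)    # 4: three full instances plus one
serial = min_instances_for(250, concurrency=1)   # 250: one request per instance
```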