This page lists key requirements and behaviors of containers in Cloud Run. There are a few differences between Cloud Run services and Cloud Run jobs: these are called out where appropriate.
Supported languages and images
Your container image can run code written in the programming language of your choice and use any base image, provided that it respects the constraints listed on this page.
Executables in the container image must be compiled for Linux 64-bit. Cloud Run specifically supports the Linux x86_64 ABI format.
Cloud Run accepts container images in the Docker Image Manifest V2 (Schema 1 and Schema 2) and OCI image formats. Cloud Run also accepts Zstd-compressed container images.
If deploying a multi-architecture image, the manifest list must include `linux/amd64`.
For functions deployed with Cloud Run, you can use one of the Cloud Run runtime base images published by Google Cloud's buildpacks to receive automatic security and maintenance updates. See the runtime support schedule for the list of supported runtimes.
Listening for requests on the correct port (services)
A Cloud Run service starts Cloud Run instances to handle incoming requests. A Cloud Run instance always has one single ingress container that listens for requests, and optionally one or more sidecar containers. The following port configuration details apply only to the ingress container, not to sidecars.
The ingress container within an instance must listen for requests on `0.0.0.0` on the port to which requests are sent. By default, requests are sent to port `8080`, but you can configure Cloud Run to send requests to a port of your choice. Cloud Run injects the `PORT` environment variable into the ingress container.
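As a minimal sketch of this contract using only the Python standard library: the handler body and greeting are illustrative, but binding to `0.0.0.0` and reading `PORT` follow the requirements above.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server():
    # Cloud Run injects PORT into the ingress container; default to 8080 locally.
    port = int(os.environ.get("PORT", "8080"))
    # Bind to 0.0.0.0 so the container accepts requests from outside localhost.
    return HTTPServer(("0.0.0.0", port), Handler)
```

At container startup, the entrypoint would call `make_server().serve_forever()`.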
Containers running in a job execution must exit upon completion
For Cloud Run jobs, the container must exit with exit code 0 when the job has successfully completed, and exit with a non-zero exit code when the job has failed.
Because jobs should not serve requests, the container should not listen on a port or start a web server.
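A job's entrypoint can signal success or failure through its exit code. The following Python sketch assumes a hypothetical `run_job()` function standing in for your actual batch work:

```python
import sys

def run_job():
    # Hypothetical placeholder for the task's actual work;
    # raise an exception to signal failure.
    return True

def main():
    try:
        run_job()
    except Exception as exc:
        print("Job failed: %s" % exc, file=sys.stderr)
        sys.exit(1)  # non-zero exit code marks the task as failed
    sys.exit(0)      # exit code 0 marks the task as successful
```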
Transport layer encryption (TLS)
The container should not implement any transport layer security directly. TLS is terminated by Cloud Run for HTTPS and gRPC, and then requests are proxied as HTTP/1 or gRPC to the container without TLS.
If you configure a Cloud Run service to use HTTP/2 end-to-end, your container must handle requests in HTTP/2 cleartext (h2c) format, because TLS is still terminated automatically by Cloud Run.
Responses (services)
For Cloud Run services, your container must send a response within the time specified in the request timeout setting after it receives a request, including the container startup time. Otherwise the request is ended and a 504 error is returned.
Response caching and cookies
If your Cloud Run service's response contains a `Set-Cookie` header, Cloud Run sets the `Cache-Control` header to `private` so that the response is not cached. This prevents other users from retrieving the cookie.
Environment variables
Different sets of environment variables are available for Cloud Run services and jobs.
Environment variables for services
The following environment variables are automatically added to all running containers, except `PORT`, which is only added to the ingress container:
| Name | Description | Example |
|---|---|---|
| `PORT` | The port your HTTP server should listen on. | `8080` |
| `K_SERVICE` | The name of the Cloud Run service being run. | `hello-world` |
| `K_REVISION` | The name of the Cloud Run revision being run. | `hello-world.1` |
| `K_CONFIGURATION` | The name of the Cloud Run configuration that created the revision. | `hello-world` |
Environment variables for jobs
For Cloud Run jobs, the following environment variables are set:
| Name | Description | Example |
|---|---|---|
| `CLOUD_RUN_JOB` | The name of the Cloud Run job being run. | `hello-world` |
| `CLOUD_RUN_EXECUTION` | The name of the Cloud Run execution being run. | `hello-world-abc` |
| `CLOUD_RUN_TASK_INDEX` | The index of this task. Starts at 0 for the first task and increments by 1 for every successive task, up to the maximum number of tasks minus 1. If you set `--parallelism` to greater than 1, tasks might not follow the index order; for example, task 2 could start before task 1. | `0` |
| `CLOUD_RUN_TASK_ATTEMPT` | The number of times this task has been retried. Starts at 0 for the first attempt and increments by 1 for every successive retry, up to the maximum retries value. | `0` |
| `CLOUD_RUN_TASK_COUNT` | The number of tasks defined in the `--tasks` parameter. | `1` |
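The task variables are typically used to partition work across tasks. As a minimal Python sketch (the strided slicing scheme is one illustrative choice, not a Cloud Run requirement):

```python
import os

def my_shard(items):
    """Return the slice of `items` this task is responsible for."""
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    # Take every count-th item starting at this task's index; with the
    # defaults (index 0, count 1) a single task processes everything.
    return items[index::count]
```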
Request and response header requirements (services)
For Cloud Run services, header names are restricted to printable non-whitespace ASCII characters and cannot contain colons. Header values are restricted to visible ASCII characters plus space and horizontal tab, as specified in IETF RFC 7230.
Filesystem access
The filesystem in each of your containers is writable and is subject to the following behavior:
- This is an in-memory filesystem, so writing to it uses the instance's memory.
- Data written to the filesystem does not persist when the instance is stopped.
Note that you cannot specify a size limit for this filesystem, so writing to it can use up all of the memory allocated to your instance, which crashes the instance. You can avoid this issue by using a dedicated in-memory volume with a size limit.
Instance lifecycle
Lifecycle characteristics differ for Cloud Run jobs and services, so these are described separately in the following subsections.
For services
The following apply to services only.
Service scaling
A Cloud Run service is automatically scaled to the number of instances needed to handle all incoming requests and events, or based on CPU utilization.
Every instance runs a fixed number of containers – one ingress container and optionally one or more sidecar containers.
When a revision does not receive any traffic, it is scaled in to the minimum number of instances configured (zero by default).
Startup
For Cloud Run services, your instances must listen for requests within 4 minutes after being started and all containers within the instance need to be healthy. During this startup time, instances are allocated CPU. You can enable startup CPU boost to temporarily increase CPU allocation during instance startup in order to reduce startup latency.
Requests will be sent to the ingress container as soon as it is listening on the configured port.
A request waiting for an instance will be kept pending in a queue as follows:
- If new instances are starting up, such as during a scale-out, requests will pend for at least the average startup time of container instances of this service. This includes when the request initiates a scale-out, such as when scaling from zero.
- If the startup time is less than 10 seconds, requests will pend for up to 10 seconds.
- If there are no instances in the process of starting, and the request does not initiate a scale-out, requests will pend for up to 10 seconds.
You can configure a startup probe to determine whether the container has started and is ready to serve requests.
For a Cloud Run service consisting of multi-container instances, you can specify the sequence in which the containers are started within the instance by configuring the container startup order.
Processing a request
For Cloud Run services, CPU is always allocated to all containers including sidecars within an instance as long as the Cloud Run revision is processing at least one request.
Idle
For Cloud Run services, an idle instance is one that is not processing any requests.
The CPU allocated to all containers in an idle instance depends on the configured billing settings.
Unless an instance must be kept idle due to the minimum number of instances configuration setting, it will not be kept idle for longer than 15 minutes.
Shutdown
For Cloud Run services, an idle instance can be shut down at any time, including instances kept warm via a minimum number of instances. If an instance that is processing requests needs to be shut down, new incoming requests are routed to other instances and requests currently being processed are given time to complete. In exceptional cases, Cloud Run might initiate a shutdown and send a SIGTERM signal to a container that is still processing requests.
Before shutting down an instance, Cloud Run sends a `SIGTERM` signal to all the containers in the instance, indicating the start of a 10-second period before the actual shutdown occurs, at which point Cloud Run sends a `SIGKILL` signal. During this period, the instance is allocated CPU and billed.
In services that use the first generation execution environment, if the instance does not trap the `SIGTERM` signal, it is immediately shut down. (Refer to the code samples to learn how to trap the `SIGTERM` signal.)
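A minimal Python sketch of trapping `SIGTERM`; the cleanup shown (a log line and a clean exit) is illustrative, and real services would flush logs and close connections within the 10-second grace period.

```python
import signal
import sys

def handle_sigterm(signum, frame):
    # Perform graceful cleanup here, then exit before SIGKILL arrives.
    print("Received SIGTERM; shutting down cleanly.", flush=True)
    sys.exit(0)

# Register the handler so shutdown is not immediate in the
# first generation execution environment.
signal.signal(signal.SIGTERM, handle_sigterm)
```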
Forced termination
If one or more Cloud Run containers exceed the total container memory limit, the instance is terminated. All requests that are still processing on the instance end with an HTTP 500 error.
For jobs
For Cloud Run jobs, container instances run until the container exits, the task timeout is reached, or the container crashes.
Forced termination
A Cloud Run container instance that exceeds the allowed memory limit is terminated. All requests that are still processing on the container instance end with an HTTP 500 error.
If a task exceeds the task timeout, Cloud Run sends a `SIGTERM` signal indicating the start of a 10-second period before the actual shutdown occurs, at which point Cloud Run sends a `SIGKILL` signal, shutting down the container instance. During this period, container instances are allocated CPU and billed.
Refer to the `SIGTERM` code sample to learn how to trap the `SIGTERM` signal.
Container instance resources
CPU
Each Cloud Run container in an instance is allocated the vCPU that has been configured (1 by default). You can configure CPU limits on each container separately.
A vCPU is implemented as an abstraction of underlying hardware to provide the approximate equivalent CPU time of a single hardware hyper-thread on variable CPU platforms. All CPU platforms used by Cloud Run support the AVX2 instruction set. Note that the container contract does not contain any additional CPU platform details.
The container might be executed on multiple cores simultaneously.
For Cloud Run services, CPU allocation depends on the selected billing.
If you select instance-based billing, CPU is allocated during the life of the instance. If you select request-based billing (default), CPU is allocated when instances are processing requests. Refer to billing settings for details.
If you have configured a number of minimum instances, you must use instance-based billing to ensure that CPU is allocated outside of requests.
You can enable startup CPU boost to temporarily increase CPU allocation during instance startup in order to reduce startup latency.
Memory
Each Cloud Run container is allocated the memory that has been configured (512 MiB by default). You can configure memory limits on each container separately.
Typical uses of memory include:
- Code loaded into memory to run the service
- Writing to the filesystem
- Extra processes running in the container such as an nginx server
- In-memory caching systems such as the PHP OpCache
- Per request memory usage
- Shared in-memory volumes
GPU
You can configure a container in a Cloud Run instance to access a GPU. If the Cloud Run service is deployed with sidecar containers, only one container in the deployment can access the GPU. See Configure GPU for requirements and details.
NVIDIA libraries
By default, all of the NVIDIA L4 driver libraries are mounted under `/usr/local/nvidia/lib64`. Cloud Run automatically appends this path to the `LD_LIBRARY_PATH` environment variable (that is, `${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64`) of the container with the GPU. This allows the dynamic linker to find the NVIDIA driver libraries.
If you want to use a CUDA version greater than 12.2, the easiest way is to depend on a newer NVIDIA base image with forward compatibility packages already installed. Another option is to manually install the NVIDIA forward compatibility packages and add them to `LD_LIBRARY_PATH`. Consult NVIDIA's compatibility matrix to determine which CUDA versions are forward compatible with the provided NVIDIA driver version (535.129.03).
Concurrency (services)
For Cloud Run services, each Cloud Run instance by default is set to multiple concurrency, where the ingress container can receive more than one request at the same time. You can change this by setting concurrency.
Container sandbox
If you use the first generation execution environment, the Cloud Run containers are sandboxed using the gVisor container runtime sandbox. As documented in the gVisor syscall compatibility reference, some system calls might not be supported by this container sandbox.
If you use the second generation execution environment,
you have full Linux compatibility.
Cloud Run jobs always use the second generation execution environment.
Within the second generation execution environment, `/sys/class/dmi/id/product_name` is set to `Google Compute Engine`.
The second generation execution environment runs your service code in a separate process namespace, so your service code starts as the container init process, which has special process semantics. In the first generation execution environment, your service code does not run as the container init process.
Instance metadata server
Cloud Run instances expose a metadata server that you can use to retrieve details about your containers, such as the project ID, region, instance ID or service accounts. You can also use the metadata server to generate tokens for the service identity.
To access metadata server data, use HTTP requests to the `http://metadata.google.internal/` endpoint with the `Metadata-Flavor: Google` header; no client libraries are required. For more information, see Getting metadata.
The following table lists some of the available metadata server information:
| Path | Description |
|---|---|
| `/computeMetadata/v1/project/project-id` | Project ID of the project the Cloud Run service or job belongs to. |
| `/computeMetadata/v1/project/numeric-project-id` | Project number of the project the Cloud Run service or job belongs to. |
| `/computeMetadata/v1/instance/region` | Region of this Cloud Run service or job; returns `projects/PROJECT-NUMBER/regions/REGION`. |
| `/computeMetadata/v1/instance/id` | Unique identifier of the instance (also available in logs). |
| `/computeMetadata/v1/instance/service-accounts/default/email` | Email for the service identity of this Cloud Run service or job. |
| `/computeMetadata/v1/instance/service-accounts/default/token` | Generates an OAuth2 access token for the service account of this Cloud Run service or job. The Cloud Run service agent is used to fetch a token. This endpoint returns a JSON response with an `access_token` attribute. Read more about how to extract and use this access token. |
Note that Cloud Run does not provide details about which Google Cloud zone the instances are running in. As a consequence, the metadata attribute `/computeMetadata/v1/instance/zone` always returns `projects/PROJECT-NUMBER/zones/REGION-1`.
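As an illustration, metadata paths like those above can be fetched with a plain HTTP request carrying the required `Metadata-Flavor: Google` header; note that `get_metadata` only succeeds from inside a running Cloud Run instance.

```python
from urllib.request import Request, urlopen

METADATA_ROOT = "http://metadata.google.internal"

def metadata_request(path):
    # Every metadata server call must carry the Metadata-Flavor: Google header.
    return Request(METADATA_ROOT + path, headers={"Metadata-Flavor": "Google"})

def get_metadata(path):
    # Only resolves from inside a Cloud Run instance.
    with urlopen(metadata_request(path)) as resp:
        return resp.read().decode("utf-8")
```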
File names
The file names that you use in containers must be UTF-8 compatible: either UTF-8, or something that can be safely auto-converted to UTF-8. If your file names use different encodings, run `docker build` on a machine with UTF-8 compatible file names, and avoid copying files with incompatible names into the container.
Container deployment fails if file names are not UTF-8 compatible. Note that there is no restriction on the character encoding you use within a file.
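To check a build context before deploying, a small Python sketch can flag file names that are not valid UTF-8; it walks with byte paths so raw, undecodable names are visible (assumes a Linux filesystem where names are byte strings).

```python
import os

def non_utf8_names(root):
    """Return paths under `root` whose file or directory names are not valid UTF-8."""
    bad = []
    # Walk with a bytes path so names come back as raw bytes rather than
    # surrogate-escaped strings.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```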
Outbound connections
Outbound request timeouts
For Cloud Run services and jobs, there is a timeout after 10 minutes of idle time for requests from your container to VPC. For requests from your container to the internet, there is a timeout after 20 minutes of idle time.
Outbound connection resets
Connection streams from your container to both VPC and internet can be occasionally terminated and replaced when underlying infrastructure is restarted or updated. If your application reuses long-lived connections, we recommend that you configure your application to re-establish connections to avoid the reuse of a dead connection.
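A generic reconnect wrapper illustrates this recommendation; `make_conn` and `use_conn` are hypothetical callables standing in for your own connection type and operation, and the exponential backoff policy is an assumption, not a Cloud Run requirement.

```python
import time

def with_reconnect(make_conn, use_conn, retries=3, backoff=0.5):
    """Run `use_conn` on a connection, rebuilding the connection if it
    was reset by the underlying infrastructure instead of reusing it."""
    conn = make_conn()
    for attempt in range(retries + 1):
        try:
            return use_conn(conn)
        except (ConnectionResetError, BrokenPipeError):
            if attempt == retries:
                raise
            time.sleep(backoff * (2 ** attempt))  # assumed backoff policy
            conn = make_conn()  # re-establish rather than reuse a dead connection
```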