This documentation is for the latest version of Knative serving, which uses fleets and Anthos Service Mesh.

The past version (Cloud Run for Anthos) has been archived but the documentation remains available for existing users.

Container runtime contract

This page lists key requirements and behaviors of containers in Knative serving.

Supported languages and images

Your container image can run code written in the programming language of your choice and use any base image, provided that it respects the constraints listed on this page.

Executables in the container image must be compiled for Linux 64-bit. Knative serving specifically supports the Linux x86_64 ABI format.

Listening for requests on the correct port

The container must listen for requests on 0.0.0.0 on the port to which requests are sent. By default, requests are sent to 8080, but you can configure Knative serving to send requests to the port of your choice.

Inside Knative serving container instances, the value of the PORT environment variable always reflects the port to which requests are sent. It defaults to 8080.
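
The port contract above can be sketched with the Python standard library. This is a minimal illustration, not an official sample: `resolve_port`, `HelloHandler`, and `make_server` are hypothetical names introduced here, and the handler simply answers every GET request with a fixed plain-text body.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def resolve_port(environ=os.environ):
    """Return the port from the PORT variable, falling back to 8080."""
    return int(environ.get("PORT", "8080"))


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies to every GET with a short text body."""

    def do_GET(self):
        body = b"Hello from the container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(environ=os.environ):
    """Bind to 0.0.0.0 (all interfaces) on the resolved port, over plain HTTP."""
    return ThreadingHTTPServer(("0.0.0.0", resolve_port(environ)), HelloHandler)
```

Calling `make_server().serve_forever()` from the container entrypoint would start serving. Binding to `127.0.0.1` instead of `0.0.0.0` would make the server unreachable from outside the container.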

Transport layer encryption (TLS)

The container should not implement any transport layer security directly. TLS is terminated by Knative serving for HTTPS and gRPC, and then requests are proxied as HTTP or gRPC to the container without TLS.

Responses

After it receives a request, your container instance must send a response within the time specified in the request timeout setting. This time includes the container instance startup time. Otherwise, the request is ended and a 504 error is returned.

Environment variables

The following environment variables are automatically added to the running containers:

Name	Description	Example
`PORT`	The port your HTTP server should listen on.	`8080`
`K_SERVICE`	The name of the Knative serving service being run.	`hello-world`
`K_REVISION`	The name of the Knative serving revision being run.	`hello-world.1`
`K_CONFIGURATION`	The name of the Knative serving configuration that created the revision.	`hello-world`
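
As a small illustration of reading these variables, the sketch below collects them into a dictionary. The function name `runtime_identity` is hypothetical, and mapping missing variables to `None` is an assumption made here so that the code also runs outside Knative serving, for example on a local machine.

```python
import os


def runtime_identity(environ=os.environ):
    """Collect the injected variables; variables that are not set map to None."""
    names = ("PORT", "K_SERVICE", "K_REVISION", "K_CONFIGURATION")
    return {name: environ.get(name) for name in names}
```

A typical use is to attach `K_SERVICE` and `K_REVISION` to log entries so that output can be traced back to the revision that produced it.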

Filesystem access

The filesystem of your container is writable and is subject to the following behavior:

This is an in-memory filesystem, so writing to it uses the container instance's memory.
Data written to the filesystem does not persist when the container instance is stopped.
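
Because files count against the container instance's memory, one reasonable pattern is to delete scratch files as soon as they are no longer needed. The sketch below shows this with the standard `tempfile` module. The function name `process_with_scratch_file` is hypothetical, and the "processing" step is reduced to reading the data back.

```python
import os
import tempfile


def process_with_scratch_file(payload: bytes) -> int:
    """Write payload to a scratch file, read it back, then delete the file.

    Returns the number of bytes read back.
    """
    with tempfile.TemporaryDirectory() as scratch_dir:
        path = os.path.join(scratch_dir, "payload.bin")
        with open(path, "wb") as out:
            out.write(payload)
        with open(path, "rb") as src:
            size = len(src.read())
    # Leaving the TemporaryDirectory block removes the directory and its
    # contents; on an in-memory filesystem that space is memory again.
    return size
```

Data that must outlive the container instance needs to be stored outside the container filesystem.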

Container instance lifecycle

In response to incoming requests, a service is automatically scaled to a certain number of container instances, each of which runs the deployed container image.

When a revision does not receive any traffic, it is scaled in to the minimum number of container instances configured (zero by default).

Startup

Your container instances must listen for requests within 4 minutes after being started. During this startup time, container instances are allocated CPU.

Computation is scoped to a request

After startup, you should only expect to be able to do computation within the scope of a request: a container instance does not have any CPU allocated if it is not processing a request.

Shutdown

A container instance can be shut down at any time.

When a container instance needs to be shut down, new incoming requests are routed to other instances and requests currently being processed are given time to complete. The container instance then receives a SIGTERM signal indicating the start of a 10-second period before being shut down (with a SIGKILL signal). During this period, the container instance is allocated CPU and billed. If the container instance does not catch the SIGTERM signal, it is immediately shut down.
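
A minimal sketch of catching SIGTERM in Python follows. The names `shutting_down`, `handle_sigterm`, `install_sigterm_handler`, and `wait_for_shutdown` are hypothetical. The handler only sets a flag, on the assumption that the actual cleanup (flushing logs, closing connections) runs elsewhere and finishes well inside the 10-second period.

```python
import signal
import threading

# Grace period between SIGTERM and SIGKILL, as described above.
GRACE_PERIOD_SECONDS = 10

# Set once SIGTERM arrives; request-handling code can check it.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Record that shutdown has begun. Kept short on purpose."""
    shutting_down.set()


def install_sigterm_handler():
    """Register the handler. Python requires this call on the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def wait_for_shutdown(timeout=None):
    """Block until SIGTERM was received; return False if the timeout expires."""
    return shutting_down.wait(timeout)
```

An entrypoint would call `install_sigterm_handler()` before it starts serving, then run its cleanup once `wait_for_shutdown()` returns.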

Unless a container instance must be kept idle due to the minimum number of container instances configuration setting, it will not be kept idle for longer than 15 minutes.

Container instance resources

The resource requests for your container instances are scheduled in the nodes of your GKE cluster. Each node shares the total amount of compute resources available to your GKE cluster.

Therefore, the amount of compute resources available to a Knative serving service is limited only by the resources available in the node where it runs. Learn more about compute resources for requests.

For example, if you allocate 512MiB of memory for a container, and that container is running in the only pod within a node that has 8GiB of memory, then that container can try to use more RAM than the 512MiB you allocated.

CPU

By default, the queue proxy sidecar reserves 25 milliCPU and there is no limit to the amount of vCPU that your Knative serving services can use. The queue proxy's resource consumption depends on how many requests are getting queued and the size of the requests.

A vCPU is implemented as an abstraction of underlying hardware to provide the approximate equivalent CPU time of a single hardware hyper-thread on variable CPU platforms. The container instance may be executed on multiple cores simultaneously. The vCPU is only allocated during container instance startup and request processing; it is throttled otherwise.

To allocate a different vCPU value, refer to the documentation for allocating CPU.

Memory

By default, the queue proxy sidecar does not reserve any memory and there is no limit to the amount of memory that your Knative serving services can use. If desired, you can configure memory limits for your Knative serving services. For more information about how GKE handles memory, see Allocatable memory and CPU resources.

Typical uses of memory include:

Code loaded into memory to run the service
Writing to the filesystem
Extra processes running in the container such as an nginx server
In-memory caching systems such as the PHP OpCache
Per-request memory usage

Concurrency

By default, each Knative serving container instance allows multiple concurrent requests, so a container instance can receive more than one request at the same time. You can change this by setting concurrency.
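
One consequence of concurrent requests is that any state shared between request handlers needs to be safe for concurrent access. The sketch below illustrates this with a lock-protected counter. `RequestCounter` is a hypothetical class used only for illustration, and it assumes a threaded server in which each request is handled on its own thread.

```python
import threading


class RequestCounter:
    """Hypothetical shared state updated by concurrently handled requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self):
        """Atomically add one and return the new total."""
        with self._lock:
            self._count += 1
            return self._count

    @property
    def value(self):
        """Return the current total."""
        with self._lock:
            return self._count
```

Code that cannot be made safe for concurrent requests is a case where lowering the concurrency setting is worth considering.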

Container instance sandbox

Knative serving does not use a container sandbox.

Container instance metadata server

Knative serving container instances expose a metadata server that you can use to retrieve details about your container instance, such as the project ID, region, instance ID or service accounts.

You can access this data from the metadata server using simple HTTP requests to the http://metadata.google.internal/ endpoint with the Metadata-Flavor: Google header. No client libraries are required. For more information, see Getting metadata.

The following table lists some of the available metadata server information:

Path	Description
`/computeMetadata/v1/project/project-id`	Project ID of this Knative serving service.
`/computeMetadata/v1/instance/region`	Region of this Knative serving service.
`/computeMetadata/v1/instance/id`	Unique identifier of the container instance (also available in logs).
`/computeMetadata/v1/instance/service-accounts/default/token`	Generates a token for the runtime service account of this Knative serving service
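
A request to the metadata server can be sketched with the Python standard library alone. The helper names `build_metadata_request` and `fetch_metadata` are hypothetical, and the 2-second timeout is an arbitrary assumption. The endpoint only resolves from inside an environment that provides the metadata server, so `fetch_metadata` will fail elsewhere.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal"


def build_metadata_request(path):
    """Build a GET request for a metadata path, with the required header."""
    return urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_metadata(path, timeout=2.0):
    """Return the response body as text. The timeout value is an assumption."""
    request = build_metadata_request(path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_metadata("/computeMetadata/v1/project/project-id")` would return the project ID when run inside a container instance.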