This page provides an architectural overview of Knative serving
and covers the changes that occur when you enable Knative serving in your
Google Kubernetes Engine cluster.
This information is useful for the following types of users:
Users getting started with Knative serving.
Operators with experience in running GKE clusters.
Application developers who need to know more about how
Knative serving integrates with Kubernetes clusters to design
better applications or configure their Knative serving
application.
Components in the default installation
When you install Knative serving as an add-on to your
Google Kubernetes Engine cluster, the knative-serving and gke-system namespaces are
automatically created. The following components are deployed into one of those
namespaces:
Components running in the knative-serving namespace:
Activator: When pods
are scaled in to zero or become overloaded with requests sent to the
revision, Activator temporarily queues the requests and sends metrics to
Autoscaler to spin up more pods. Once Autoscaler scales the revision based
on the reported metrics and available pods, Activator forwards queued
requests to the revision. Activator is a data plane component; data plane
components manage all functions and processes forwarding user traffic.
Autoscaler: Aggregates and processes metrics from Activator and the
queue proxy sidecar container, a component in the data plane that enforces
request concurrency limits. Autoscaler then calculates the observed
concurrency for the revision and adjusts the size of the deployment based
on the desired pod count. When pods are available in the revision,
Autoscaler is a control plane component; otherwise, when pods are scaled
in to zero, Autoscaler is a data plane component.
Controller: Creates and updates the child resources of Autoscaler and
the Service objects. Controller is a control plane
component; control plane components manage all functions and processes
establishing the request path of user traffic.
Webhook: Sets default values, rejects inconsistent and invalid
objects, and validates
and mutates
Kubernetes API calls against Knative serving resources.
Webhook is a control plane component.
Components running in the gke-system namespace:
Cluster Local Gateway: Load balancer in the data plane responsible for
handling internal traffic that travels from one
Knative serving service to another. The Cluster Local
Gateway can only be accessed from within your GKE
cluster and does not register an external domain, which prevents accidental
exposure of private information or internal processes.
Istio Ingress Gateway: Load balancer in the data plane that is
responsible for receiving and handling incoming traffic from outside
the cluster, including traffic from either external
or internal networks.
Istio Pilot: Configures the Cluster Local Gateway and the Istio
Ingress Gateway to handle HTTP requests at the correct endpoints. Istio
Pilot is a control plane component. For more information, see Istio Pilot.
Knative serving components are updated automatically with any
GKE control plane cluster updates. For more information,
see Available GKE versions.
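The Autoscaler behavior described above (turning observed concurrency into a desired pod count) can be illustrated with a minimal Python sketch. This is a simplification and not the actual Autoscaler algorithm: the function name, the per-pod target, and the max_pods cap are hypothetical names introduced here for illustration only.

```python
import math


def desired_pod_count(observed_concurrency: float, target_per_pod: float, max_pods: int) -> int:
    """Return how many pods keep each pod at or below its target concurrency.

    All parameters are illustrative assumptions, not Knative serving settings.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    if observed_concurrency <= 0:
        # No in-flight requests: the revision can be scaled in to zero.
        return 0
    # Round up so that no pod exceeds its target, then apply the cap.
    return min(max_pods, math.ceil(observed_concurrency / target_per_pod))
```

For example, with 25 concurrent requests and a target of 10 requests per pod, the sketch asks for 3 pods; with no traffic it asks for 0, which is the state in which Activator starts queuing requests.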
Cluster resource usage
The initial installation of Knative serving requires approximately 1.5
virtual CPUs and 1 GB of memory for your cluster. The number of nodes in your
cluster does not affect the CPU and memory requirements for a
Knative serving installation.
An Activator can consume a maximum of 1000 milliCPU and 600 MiB RAM while handling requests.
When an existing Activator can't support the number of incoming requests, an
additional Activator spins up, which requires a reservation of 300 milliCPU and
60 MiB RAM.
Every pod created by a Knative serving service includes a
queue proxy sidecar that enforces request concurrency limits. The queue proxy
reserves 25 milliCPU and has no memory reservation. The queue proxy's
consumption depends on how many requests are getting queued and the size of the
requests; there are no limits on the CPU and memory resources it can consume.
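As a rough worked example of the reservations above, the following Python sketch adds up what additional Activators and queue proxy sidecars reserve on top of the initial installation. The constants come from this section; the function and class names are hypothetical, and the sketch assumes that the first Activator is already covered by the initial installation figures. It counts reservations only, not actual consumption, which is unbounded for the queue proxy.

```python
from dataclasses import dataclass

# Reservation figures taken from the text above.
ACTIVATOR_RESERVED_MILLI_CPU = 300
ACTIVATOR_RESERVED_MIB = 60
QUEUE_PROXY_RESERVED_MILLI_CPU = 25  # the queue proxy has no memory reservation


@dataclass(frozen=True)
class Reservation:
    milli_cpu: int
    memory_mib: int


def estimate_additional_reservation(extra_activators: int, service_pods: int) -> Reservation:
    """Sum the reservations of extra Activators and one queue proxy per service pod."""
    if extra_activators < 0 or service_pods < 0:
        raise ValueError("counts must be non-negative")
    milli_cpu = (
        extra_activators * ACTIVATOR_RESERVED_MILLI_CPU
        + service_pods * QUEUE_PROXY_RESERVED_MILLI_CPU
    )
    memory_mib = extra_activators * ACTIVATOR_RESERVED_MIB
    return Reservation(milli_cpu=milli_cpu, memory_mib=memory_mib)
```

With 2 additional Activators and 40 service pods, the sketch yields 1600 milliCPU and 120 MiB of reserved resources.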
Creating a Service
Figure: Knative serving Service architecture
Knative serving extends Kubernetes by defining a set of Custom
Resource Definitions (CRDs):
Service, Revision, Configuration, and Route. These CRDs define and control how
your applications behave on the cluster:
Knative serving Service is the top-level custom resource
defined by Knative serving. It represents a single application and
manages the whole lifecycle of your workload. Your service ensures your app
has a route, a configuration, and a new revision for each update of
the service.
Revision is a point-in-time, immutable snapshot of the code and
configuration.
Configuration maintains the current settings for your latest revision and
records a history of all past revisions. Modifying a configuration creates a
new revision.
Route defines an HTTP endpoint and associates the endpoint with one or more
revisions to which requests are forwarded.
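The relationship between a configuration and its revisions can be sketched as a small data model. The following Python example is illustrative only: the class fields, the update method, and the revision naming scheme are assumptions made for this sketch and are not the Knative serving API. It shows the two properties stated above: a revision is immutable, and modifying a configuration creates a new revision while the history is kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Revision:
    """Immutable, point-in-time snapshot of code (image) and configuration (env)."""
    name: str
    image: str
    env: Tuple[Tuple[str, str], ...]


@dataclass
class Configuration:
    """Holds the latest settings and the history of all revisions."""
    service_name: str
    revisions: List[Revision] = field(default_factory=list)

    def update(self, image: str, env: Dict[str, str]) -> Revision:
        # Every modification stamps out a new revision; old ones are never edited.
        generation = len(self.revisions) + 1
        revision = Revision(
            name=f"{self.service_name}-{generation:05d}",  # hypothetical naming scheme
            image=image,
            env=tuple(sorted(env.items())),
        )
        self.revisions.append(revision)
        return revision

    @property
    def latest(self) -> Revision:
        return self.revisions[-1]
```

The frozen dataclass makes any attempt to change an existing revision raise an error, which mirrors the immutability of revisions.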
When a user creates a Knative serving Service, the following
happens:
The Knative serving Service object defines:
A configuration for how to serve your revisions.
An immutable revision for this version of your service.
A route to manage specified traffic allocation to your revision.
The route object creates a VirtualService. The VirtualService object configures
Ingress Gateway and Cluster Local Gateway to route gateway traffic to the
correct revision.
The revision object creates the following control plane components: a
Kubernetes Service object and a Deployment object.
Network configuration connects Activator, Autoscaler, and load balancers
for your app.
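To make the idea of a route's traffic allocation more concrete, here is a minimal Python sketch that chooses a revision for one request. It assumes that the allocation is expressed as percentages that sum to 100, which is an assumption of this sketch; the function name and the revision names are hypothetical.

```python
from typing import Dict


def pick_revision(traffic: Dict[str, int], point: float) -> str:
    """Map a point in [0, 100) to a revision according to percentage weights.

    traffic maps a revision name to its share of traffic in percent.
    """
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    if not 0 <= point < 100:
        raise ValueError("point must be in [0, 100)")
    cumulative = 0
    for name, percent in traffic.items():
        cumulative += percent
        if point < cumulative:
            return name
    raise AssertionError("unreachable when percentages sum to 100")
```

In use, the point would be drawn per request, for example with random.random() * 100, so that a 90/10 allocation sends roughly nine out of ten requests to the first revision.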
Request handling
The following diagram shows a high-level overview of a possible request path for
user traffic through the Knative serving data plane components on a
sample Google Kubernetes Engine cluster:
Figure: Knative serving cluster architecture
The next diagram expands on the diagram above to give an in-depth view of
the user traffic's request path, which is also described in detail below:
Figure: Knative serving request path
For a Knative serving request path:
Traffic arrives through:
The Ingress Gateway for traffic from outside the cluster
The Cluster Local Gateway for traffic within the cluster
The VirtualService component, which specifies traffic routing rules,
configures the gateways so that user traffic is routed to the correct
revision.
Kubernetes Service, a control plane component, determines the next step in
the request path depending on the availability of pods to handle the
traffic:
If there are no pods in the revision:
Activator temporarily queues the received request and pushes a
metric to Autoscaler to scale up more pods.
Autoscaler scales the Deployment to the desired number of pods.
Deployment creates more pods to receive additional requests.
Activator retries requests to the queue proxy sidecar.
If the service is scaled out (pods are available), the Kubernetes Service
sends the request to the queue proxy sidecar.
The queue proxy sidecar enforces the request queue parameters, that is, the
number of single-threaded or multi-threaded requests that the container can
handle at a time.
If the queue proxy sidecar has more requests than it can handle, Autoscaler
creates more pods to handle additional requests.
The queue proxy sidecar sends traffic to the user container.
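The scale-from-zero branch of the request path above can be sketched as a small simulation. This Python example is a toy model under stated assumptions, not how the components are implemented: the RevisionSim class, its method names, and the container_concurrency parameter are hypothetical, and the real path involves separate Activator, Autoscaler, Deployment, and queue proxy processes.

```python
import math
from collections import deque
from typing import Deque, List


class RevisionSim:
    """Toy model of one revision: Activator queue, pod count, delivered requests."""

    def __init__(self, container_concurrency: int) -> None:
        if container_concurrency <= 0:
            raise ValueError("container_concurrency must be positive")
        self.container_concurrency = container_concurrency
        self.pods = 0
        self.activator_queue: Deque[str] = deque()
        self.delivered: List[str] = []

    def handle(self, request: str) -> str:
        if self.pods == 0:
            # No pods: Activator holds the request until pods exist.
            self.activator_queue.append(request)
            return "queued-by-activator"
        # Pods are available: the request goes to the queue proxy sidecar.
        self.delivered.append(request)
        return "sent-to-queue-proxy"

    def autoscale(self) -> int:
        pending = len(self.activator_queue)
        if pending:
            # Scale so that the queued requests fit the per-pod concurrency.
            self.pods = max(self.pods, math.ceil(pending / self.container_concurrency))
            # Activator retries the queued requests in arrival order.
            while self.activator_queue:
                self.delivered.append(self.activator_queue.popleft())
        return self.pods
```

In this model, three requests that arrive while the revision has no pods are queued, one autoscale step with a per-pod concurrency of 2 creates two pods and drains the queue, and later requests go straight to the queue proxy.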
Last updated 2025-09-04 UTC.