Quotas and limits

This document lists the quotas and system limits that apply to Google Kubernetes Engine. Quotas specify the amount of a countable, shared resource that you can use, and they are defined by Google Cloud services such as Google Kubernetes Engine. System limits are fixed values that cannot be changed.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

The Cloud Quotas system does the following:

  • Monitors your consumption of Google Cloud products and services
  • Restricts your consumption of those resources
  • Provides a way to request changes to the quota value

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

To adjust most quotas, use the Google Cloud console. For more information, see Request a quota adjustment.

There are also system limits on GKE resources. System limits can't be changed.

Limits per project

In a single project, you can create a maximum of 100 zonal clusters per zone, plus 100 regional clusters per region.

Note: Clusters created in Autopilot mode are preconfigured as regional clusters.
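
To see how many clusters already count against these limits, you can list the clusters in your project along with their locations. The following is a minimal sketch using the gcloud CLI:

gcloud container clusters list --format="table(name,location)"

Each row shows a cluster and the zone or region it runs in, which you can tally against the per-zone and per-region limits.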

Limits per cluster

The following list describes the limits per GKE cluster.

Any GKE versions specified in the following list apply to both cluster nodes and the control plane.

Nodes per cluster
  • GKE Standard cluster: 15,000 nodes
    Note: If you plan to run more than 2,000 nodes, use a regional cluster.
    Note: Running more than 5,000 nodes is available only for regional clusters that are either private or use Private Service Connect, and that have GKE Dataplane V2 disabled. Contact support to increase this quota limit.
  • GKE Autopilot cluster: 5,000 nodes
    Note: If you plan to run more than 1,000 nodes, use GKE Autopilot version 1.23 or later.
    Note: Running more than 400 nodes might require lifting a cluster size quota for clusters that were created on earlier versions. Contact support for assistance.

Nodes per node pool
  • GKE Standard cluster: 1,000 nodes per zone. For TPU nodes, the limit is 2,000 nodes per zone, which requires one of the following versions or later: 1.28.5-gke.135500, 1.29.1-gke.1206000, 1.30.
  • GKE Autopilot cluster: Not applicable

Nodes in a zone
  • GKE Standard cluster:
      • No node limitations for container-native load balancing with NEG-based Ingress, which is recommended whenever possible. In GKE versions 1.17 and later, NEG-based Ingress is the default mode.
      • 1,000 nodes if you are using Instance Group-based Ingress.
  • GKE Autopilot cluster: Not applicable

Pods per node2
  • GKE Standard cluster: 256 Pods
    Note: For GKE versions earlier than 1.23.5-gke.1300, the limit is 110 Pods.
  • GKE Autopilot cluster: Set dynamically to any value between 8 and 256. GKE considers the cluster size and the number of workloads when provisioning the maximum Pods per node.
      • For GKE versions earlier than 1.28, the limit is 32 Pods.
      • For Accelerator class Pods and Performance class Pods, the limit is one Pod per node.

Pods per cluster
  • GKE Standard cluster: 200,000 Pods1
  • GKE Autopilot cluster: 200,000 Pods

Containers per cluster
  • GKE Standard cluster: 400,000 containers
  • GKE Autopilot cluster: 400,000 containers

Etcd database size
  • GKE Standard cluster: 6 GB
  • GKE Autopilot cluster: 6 GB
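
To see how close a running cluster is to the node and Pod limits above, you can count the objects directly with kubectl. This is a minimal sketch that assumes your kubeconfig already points at the cluster:

# Count the nodes in the cluster
kubectl get nodes --no-headers | wc -l

# Count Pods across all namespaces, including system Pods
kubectl get pods --all-namespaces --no-headers | wc -l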

If you're a platform administrator, we recommend that you become familiar with how quotas affect large workloads running on GKE. For additional recommendations, best practices, limits, and quotas for large workloads, see Guidelines for creating scalable clusters.

Limit for API requests

The default rate limit for the Kubernetes Engine API is 3,000 requests per minute, enforced in intervals of 100 seconds.
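
If a tool or script bursts past this limit, its requests fail with rate-limit errors until the current interval resets, so clients typically retry with backoff. The following shell sketch is an illustration only: it calls the Kubernetes Engine clusters list endpoint, and PROJECT_ID and the delay values are placeholders, not prescribed settings.

# Hypothetical sketch: retry a Kubernetes Engine API call with exponential backoff.
for delay in 1 2 4 8; do
  if curl --fail -sS -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://container.googleapis.com/v1/projects/PROJECT_ID/locations/-/clusters"; then
    break   # request succeeded
  fi
  sleep "$delay"   # back off before retrying
done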

Resource quotas

For clusters with 100 nodes or fewer, GKE applies a Kubernetes resource quota to every namespace. These quotas protect the cluster's control plane from instability caused by potential bugs in applications deployed to the cluster. You cannot remove these quotas because they are enforced by GKE.

GKE automatically updates the resource quota values in proportion to the number of nodes. For clusters with more than 100 nodes, GKE removes the resource quota.

To examine resource quotas, use the following command:

kubectl get resourcequota gke-resource-quotas -o yaml

To view the values for a given namespace, specify the namespace by adding the --namespace option.
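
For example, to view the quota object that GKE enforces in the kube-system namespace (a minimal sketch; any namespace in the cluster works the same way):

kubectl get resourcequota gke-resource-quotas --namespace kube-system -o yaml

To see used versus hard limits in a more readable layout, you can instead run:

kubectl describe resourcequota gke-resource-quotas --namespace kube-system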

Check your quota

Console

  1. In the Google Cloud console, go to the Quotas page.

  2. The Quotas page displays a list of quotas, prefiltered to show only GKE quotas.
  3. To search for the exact quota, use the Filter table. If you don't know the name of the quota, you can use the links on the Quotas page.

gcloud

  1. To check your quotas, run the following command:
    gcloud compute project-info describe --project PROJECT_ID

    Replace PROJECT_ID with your own project ID.

  2. To check your used quota in a region, run the following command:
    gcloud compute regions describe REGION

    Replace REGION with the name of the region that you want to check.
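
    The output includes a quotas list with metric, limit, and usage fields. If you have jq installed, you can filter to the quotas that already have usage, as in this sketch:

    gcloud compute regions describe REGION --format=json | jq '.quotas[] | select(.usage > 0)'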

Notes

  1. The maximum number of Pods per GKE Standard cluster includes system Pods. The number of system Pods varies depending on cluster configuration and enabled features.

  2. The maximum number of Pods that can fit in a node depends on the size of your Pod resource requests and the capacity of the node. You might not reach every limit at the same time. As a best practice, we recommend that you load test large deployments.
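
On Standard clusters, you can also cap the Pods per node yourself when you create a cluster or node pool, which bounds how much of the per-node Pod capacity each node can use. The following is a hedged sketch using gcloud; CLUSTER_NAME, POOL_NAME, and the numeric values are illustrative, and you may need zone or region flags depending on your configuration:

# Set a default maximum of 110 Pods per node for the cluster's node pools
gcloud container clusters create CLUSTER_NAME --default-max-pods-per-node 110

# Or set the maximum for a single node pool
gcloud container node-pools create POOL_NAME --cluster CLUSTER_NAME --max-pods-per-node 64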