Quotas and limits

This page explains Google Distributed Cloud release 1.16 quotas and limits for Google Cloud projects, clusters, and nodes.

Limits

The following sections outline some basic limits for your clusters. Take these limits into account when designing your applications to run on Google Distributed Cloud.

Maximum user clusters per admin cluster

Admin clusters manage the lifecycle for user clusters and their associated nodes. Admin clusters control critical user cluster operations, such as cluster creation, cluster or node resets, cluster upgrades, and cluster updates. The total number of user cluster nodes is one of the primary factors limiting performance and reliability.

Based on ongoing testing, an admin cluster can reliably support a maximum of 100 user clusters with 10 nodes each, for a total of 1,000 nodes.

Maximum number of pods per user cluster

We recommend that you limit the number of pods per user cluster to 15,000 or fewer. For example, if your cluster has 200 nodes, you should restrict the number of pods per node to 75 or fewer. Likewise, if you want to run 110 pods per node, you should restrict the number of nodes in your cluster to 136 or fewer. The following table provides examples of configurations that are and aren't recommended.

| Pods per node | Nodes per cluster | Pods per cluster | Result |
|---|---|---|---|
| 110 | 200 | 22,000 | Too many pods, not recommended |
| 110 | 136 | 14,960 | Within limit |
| 100 | 150 | 15,000 | Within limit |
| 75 | 200 | 15,000 | Within limit |

The maximum number of pods per user cluster recommendation takes precedence over the recommendations for pods per node and nodes per user cluster in the following sections.
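The arithmetic behind these examples can be sketched in Python. The function names are illustrative and aren't part of any Google Distributed Cloud tooling; only the 15,000-pod recommendation comes from this page.

```python
# Recommended maximum number of pods per user cluster (from this page).
MAX_PODS_PER_CLUSTER = 15_000


def max_nodes_for(pods_per_node: int) -> int:
    """Largest node count that keeps the cluster at or under the pod limit."""
    return MAX_PODS_PER_CLUSTER // pods_per_node


def within_pod_limit(pods_per_node: int, nodes: int) -> bool:
    """True if pods_per_node * nodes stays within the recommendation."""
    return pods_per_node * nodes <= MAX_PODS_PER_CLUSTER


print(max_nodes_for(110))          # 136
print(within_pod_limit(110, 200))  # False (22,000 pods)
print(within_pod_limit(75, 200))   # True (15,000 pods)
```

The check uses the worst case in which every node runs its maximum number of pods, which is the same assumption the examples on this page make.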

Maximum number of nodes per user cluster

We test Google Distributed Cloud to run workloads with up to 500 nodes. However, to ensure optimal performance and reliability, we recommend that you don't exceed 200 nodes per cluster when running workloads in production.

| Cluster type | Minimum nodes | Recommended maximum nodes | Absolute maximum nodes |
|---|---|---|---|
| User, Standalone, or Hybrid | 1 | 200 | 500 |

For single-node clusters, you must remove the node-role.kubernetes.io/master:NoSchedule taint to run workloads on the node. For details, see Kubernetes taints and tolerations.

Maximum number of pods per node

Google Distributed Cloud supports the configuration of maximum pods per node in the nodeConfig.PodDensity.MaxPodsPerNode setting of the cluster configuration file. The following table shows the minimum and maximum values supported for MaxPodsPerNode, which includes pods running add-on services:

| Cluster type | Minimum allowed value | Recommended maximum value | Maximum allowed value |
|---|---|---|---|
| All HA clusters and non-HA user clusters | 32 | 110 | 250 |
| All other non-HA clusters | 64 | 110 | 250 |
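As a rough illustration, the bounds in the table above can be turned into a small pre-flight check in Python. The dictionary keys and the function name are hypothetical labels chosen for this sketch; only the numeric bounds come from the table.

```python
# Bounds from the table above. The keys are hypothetical labels for the two rows.
MIN_ALLOWED = {
    "ha_or_non_ha_user": 32,  # all HA clusters and non-HA user clusters
    "other_non_ha": 64,       # all other non-HA clusters
}
RECOMMENDED_MAX = 110
MAX_ALLOWED = 250


def classify_max_pods_per_node(value: int, cluster_kind: str) -> str:
    """Return 'invalid', 'allowed', or 'recommended' for a MaxPodsPerNode value."""
    if value < MIN_ALLOWED[cluster_kind] or value > MAX_ALLOWED:
        return "invalid"
    if value > RECOMMENDED_MAX:
        return "allowed"  # accepted, but above the recommended maximum
    return "recommended"


print(classify_max_pods_per_node(110, "ha_or_non_ha_user"))  # recommended
print(classify_max_pods_per_node(200, "ha_or_non_ha_user"))  # allowed
print(classify_max_pods_per_node(32, "other_non_ha"))        # invalid
```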

Maximum number of endpoints

On RHEL and CentOS, there's a cluster-level limitation of 100,000 endpoints. This number is the sum of all pods that are referenced by a Kubernetes service. If two services reference the same set of pods, this situation counts as two separate sets of endpoints. The underlying nftables implementation on RHEL and CentOS causes this limitation; it's not an intrinsic limitation of Google Distributed Cloud.
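To make the counting rule concrete, the following Python sketch sums the pods referenced by each service. The service and pod names are made up for the example, and the mapping is an assumed input that you would build from your own cluster inventory.

```python
ENDPOINT_LIMIT = 100_000  # cluster-level limit on RHEL and CentOS (from this page)


def count_endpoints(service_to_pods: dict[str, set[str]]) -> int:
    """Sum the pods referenced by every service; shared pods count once per service."""
    return sum(len(pods) for pods in service_to_pods.values())


# Hypothetical inventory: two services select the same two pods.
services = {
    "web": {"pod-a", "pod-b"},
    "web-internal": {"pod-a", "pod-b"},
    "db": {"pod-c"},
}

total = count_endpoints(services)
print(total)                    # 5, not 3: the shared pods are counted twice
print(total <= ENDPOINT_LIMIT)  # True
```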

Mitigation

For RHEL and CentOS, there are no mitigations. For Ubuntu and Debian systems, we recommend switching from the default nftables to legacy iptables on large-scale clusters.

Dataplane V2 eBPF limit

The maximum number of entries in the BPF lbmap for Dataplane V2 is 65,536. Increases in the following areas can cause the total number of entries to grow:

  • Number of services
  • Number of ports per service
  • Number of backends per service

We recommend that you monitor the actual number of entries used by your cluster to ensure that you don't exceed the limit. Use the following command to get the current entries:

kubectl get po -n kube-system -l k8s-app=cilium | cut -d " " -f1 | grep anetd | head -n1 | \
    xargs -I % kubectl -n kube-system exec % -- cilium bpf lb list | wc -l
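Once you have a count, a small Python sketch like the following can express it as a share of the limit. The 80% alert threshold is an assumption made for illustration, not a documented recommendation, and the line count returned by the command should be treated as an approximation of the number of map entries.

```python
LBMAP_MAX_ENTRIES = 65_536  # Dataplane V2 lbmap limit (from this page)


def lbmap_utilization(entry_count: int) -> float:
    """Fraction of the lbmap limit that the observed entry count represents."""
    return entry_count / LBMAP_MAX_ENTRIES


def should_alert(entry_count: int, threshold: float = 0.8) -> bool:
    """True if utilization is at or above the (assumed) alert threshold."""
    return lbmap_utilization(entry_count) >= threshold


print(lbmap_utilization(32_768))  # 0.5
print(should_alert(32_768))       # False
print(should_alert(60_000))       # True
```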

We also recommend that you use your own monitoring pipeline to collect metrics from the anetd DaemonSet. Monitor for the following conditions to identify when the number of entries is causing problems:

cilium_bpf_map_ops_total{map_name="lb4_services_v2",operation="update",outcome="fail"} > 0
cilium_bpf_map_ops_total{map_name="lb4_backends_v2",operation="update",outcome="fail"} > 0

LoadBalancer and NodePort Services port limit

The port limit for LoadBalancer and NodePort Services is 2,768. The default port range is 30000-32767. If you exceed the limit, you can't create new LoadBalancer or NodePort Services and you can't add new node ports for existing services.

By default, Kubernetes allocates node ports to Services of type LoadBalancer. These allocations can quickly exhaust available node ports from the 2,768 allotted to your cluster. To save node ports, disable load balancer node port allocation by setting the allocateLoadBalancerNodePorts field to false in the LoadBalancer Service spec. This setting prevents Kubernetes from allocating node ports to LoadBalancer Services. For more information, see Disabling load balancer NodePort allocation in the Kubernetes documentation.
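As a minimal sketch, the following Python snippet builds a LoadBalancer Service manifest with node port allocation disabled and prints it as JSON, which kubectl accepts in place of YAML. The Service name, selector, and port numbers are placeholders; allocateLoadBalancerNodePorts is the Service spec field described above.

```python
import json


def load_balancer_service(name: str, selector: dict, port: int, target_port: int) -> dict:
    """Build a LoadBalancer Service manifest that doesn't allocate node ports."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # Prevents Kubernetes from allocating node ports for this Service.
            "allocateLoadBalancerNodePorts": False,
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


# Placeholder values for illustration only.
manifest = load_balancer_service("example-lb", {"app": "example"}, 80, 8080)
print(json.dumps(manifest, indent=2))
```

You can save the printed JSON to a file and apply it with kubectl apply -f, as you would a YAML manifest.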

Use the following command to check the number of ports allocated:

kubectl get svc -A | grep : | tr -s ' ' | cut -d ' '  -f6 | tr ',' '\n' | wc -l
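The same count can be sketched in Python if you already have the PORT(S) column values from kubectl get svc -A. The sample values below are invented for the example; an entry such as 80:30080/TCP carries a node port, while 80/TCP doesn't.

```python
NODE_PORT_MIN = 30000
NODE_PORT_MAX = 32767
NODE_PORT_CAPACITY = NODE_PORT_MAX - NODE_PORT_MIN + 1  # 2768 ports in the default range


def count_node_ports(port_columns: list[str]) -> int:
    """Count entries of the form port:nodePort/protocol across PORT(S) values."""
    count = 0
    for column in port_columns:
        for entry in column.split(","):
            if ":" in entry:  # only entries with a node port contain a colon
                count += 1
    return count


# Hypothetical PORT(S) values from three Services.
columns = ["80:30080/TCP,443:30443/TCP", "53/UDP,53/TCP", "8080:31000/TCP"]
used = count_node_ports(columns)
print(used)                       # 3
print(NODE_PORT_CAPACITY - used)  # 2765 node ports still available
```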

Bundled load balancer node connection limits

The number of connections allowed for each node used for bundled load balancing (MetalLB) is 28,000. The default ephemeral port range for these connections is 32768-60999. If you exceed the connection limit, requests to the LoadBalancer Service might fail.

If you need to expose a load balancer service that is capable of handling a substantial number of connections (for Ingress, for example), we recommend that you consider an alternate load balancing method to avoid this limitation with MetalLB.

Cluster quotas

You can register a maximum of 15 clusters by default. To register more clusters in GKE Hub, you can submit a request to increase your quota on the Quotas page in the Google Cloud console.

Scaling issues

This section describes some issues to keep in mind when scaling your clusters.

Resources reserved for system daemons

Starting with version 1.14, Google Distributed Cloud automatically reserves CPU and memory on each node for system daemons such as sshd or udev, so that these daemons have the resources they require. Without this feature, which is enabled by default, Pods can potentially consume most of the resources on a node, making it impossible for system daemons to complete their tasks.

Specifically, Google Distributed Cloud reserves 50 millicores of CPU (50 mCPU) and 280 mebibytes (280 MiB) of memory on each node for system daemons. One mCPU is one thousandth of a core, so 50/1000, or 5%, of a core on each node is reserved for system daemons. The amount of reserved resources is small and doesn't have a significant impact on Pod performance. However, the kubelet on a node may evict Pods if their use of CPU or memory exceeds the amounts that have been allocated to them.
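The effect of this reservation on a single node can be sketched as follows. The function name and the node size are illustrative, and the sketch subtracts only the system daemon reservation described here; any other reservations or eviction thresholds that apply to a node aren't modeled.

```python
SYSTEM_RESERVED_CPU_MILLICORES = 50  # from this page
SYSTEM_RESERVED_MEMORY_MIB = 280     # from this page


def after_system_reservation(cpu_cores: int, memory_gib: int) -> tuple[int, int]:
    """Return (millicores, MiB) left after the system daemon reservation."""
    cpu_millicores = cpu_cores * 1000 - SYSTEM_RESERVED_CPU_MILLICORES
    memory_mib = memory_gib * 1024 - SYSTEM_RESERVED_MEMORY_MIB
    return cpu_millicores, memory_mib


# Hypothetical node with 4 cores and 16 GiB of memory.
print(after_system_reservation(4, 16))  # (3950, 16104)
```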

etcd performance

Disk speed is critical to etcd performance and stability. A slow disk increases etcd request latency, which can lead to cluster stability problems. To improve cluster performance, Google Distributed Cloud stores Event objects in a separate, dedicated etcd instance. The standard etcd instance uses /var/lib/etcd as its data directory and port 2379 for client requests. The etcd-events instance uses /var/lib/etcd-events as its data directory and port 2382 for client requests.

We recommend that you use a solid-state disk (SSD) for your etcd stores. For optimal performance, mount separate disks to /var/lib/etcd and /var/lib/etcd-events. Using dedicated disks ensures that the two etcd instances don't share disk IO.

The etcd documentation provides additional hardware recommendations for ensuring the best etcd performance when running your clusters in production.

To check your etcd and disk performance, use the following etcd I/O latency metrics in the Metrics Explorer:

  • etcd_disk_backend_commit_duration_seconds: the duration should be less than 25 milliseconds for the 99th percentile (p99).
  • etcd_disk_wal_fsync_duration_seconds: the duration should be less than 10 milliseconds for the 99th percentile (p99).
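As an illustration of how these thresholds apply, the following Python sketch computes a nearest-rank 99th percentile from raw duration samples and compares it with the values above. It assumes you have exported individual samples in seconds; in a monitoring system these metrics are usually histograms and the percentile is computed by the query layer instead. The function names and sample data are hypothetical.

```python
# p99 thresholds in seconds, taken from the list above.
THRESHOLDS_SECONDS = {
    "etcd_disk_backend_commit_duration_seconds": 0.025,
    "etcd_disk_wal_fsync_duration_seconds": 0.010,
}


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def is_healthy(metric: str, samples: list[float]) -> bool:
    """True if the p99 of the samples is below the threshold for the metric."""
    return p99(samples) < THRESHOLDS_SECONDS[metric]


# Hypothetical samples: 90 fast operations and 10 slow ones.
samples = [0.004] * 90 + [0.020] * 10
print(p99(samples))                                                     # 0.02
print(is_healthy("etcd_disk_backend_commit_duration_seconds", samples)) # True
print(is_healthy("etcd_disk_wal_fsync_duration_seconds", samples))      # False
```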

For more information about etcd performance, see the following topics in the etcd documentation: "What does the etcd warning 'apply entries took too long' mean?" and "What does the etcd warning 'failed to send out heartbeat on time' mean?"
