[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2024-11-27。"],[],[],null,["# Plan for large GKE clusters\n\n[Autopilot](/kubernetes-engine/docs/concepts/autopilot-overview) [Standard](/kubernetes-engine/docs/concepts/choose-cluster-mode)\n\n*** ** * ** ***\n\nThis page describes the best practices you can follow when planning and designing very large-size clusters.\n\n\u003cbr /\u003e\n\nWhy plan for large GKE clusters\n-------------------------------\n\nEvery computer system including Kubernetes has some architectural limits.\nExceeding the limits may affect the performance of your cluster or in some cases\neven cause downtimes. Follow the best practices and execute recommended actions\nto ensure your clusters run your workloads reliably at scale.\n\nLimitations of large GKE clusters\n---------------------------------\n\nWhen GKE scales a cluster to a large number of nodes,\nGKE makes an effort to change the amount of resources available\nto match your system needs while staying within its\n[service-level objectives (SLOs)](https://landing.google.com/sre/sre-book/chapters/service-level-objectives/).\nGoogle Cloud supports large clusters. However, based on your use case, you must\nconsider the limitations of large clusters to better respond to your\ninfrastructure scale requirements.\n\nThis section describes the limitations and considerations when\ndesigning large GKE clusters based on the expected number of\nnodes.\n\n### Clusters with up to 5,000 nodes\n\nWhen designing your cluster architecture to scale up to 5,000 nodes, consider\nthe following conditions:\n\n- Only available for [regional](/kubernetes-engine/docs/concepts/regional-clusters) cluster.\n- Only available for [clusters that use Private Service Connect](/kubernetes-engine/docs/concepts/network-overview#public-cluster-psc).\n- Migrating from zonal to regional clusters requires you to recreate the cluster to unlock higher node quota level.\n\nIf you expect to scale your cluster beyond 5,000 nodes, contact\n[Cloud Customer Care](/support-hub) to increase the cluster size and quota limit.\n\n### Clusters with more than 5,000 nodes\n\nGKE supports large Standard clusters up to 15,000 nodes.\nIn version 1.31 and later, GKE supports large clusters up to\n65,000 nodes. The 65,000 limit is meant to be used to run large-scale AI workloads.\n\nIf you expect to scale your cluster to either 15,000 or 65,000 nodes, complete the following tasks:\n\n1. Consider the following limitations:\n\n - [Cluster autoscaler](/kubernetes-engine/docs/concepts/cluster-autoscaler) is not supported. Instead, [scale your node pools up or down](/kubernetes-engine/docs/how-to/node-pools#scale-node-pool) using the GKE API.\n - [Multi-network](/kubernetes-engine/docs/how-to/setup-multinetwork-support-for-pods) is not supported.\n - Services with more than 100 Pods must be [headless](/kubernetes-engine/docs/concepts/service#headless_service).\n - Every Pod should run on its own node, with the exception of system DaemonSets. 
Best practices for splitting workloads between multiple clusters
----------------------------------------------------------------

You can run your workloads on a single, large cluster. This approach is easier
to manage, more cost-efficient, and provides better resource utilization than
multiple clusters. However, in some cases you should consider splitting your
workloads across multiple clusters:

- Review [Multi-cluster use cases](/kubernetes-engine/fleet-management/docs/multi-cluster-use-cases) to learn more about general requirements and scenarios for using multiple clusters.
- From a scalability point of view, split your cluster when it could exceed one of the limits described in the following section or one of the [GKE quotas](/kubernetes-engine/quotas). Lowering the risk of reaching GKE limits reduces the risk of downtime and other reliability issues.

If you decide to split your cluster, use
[Fleet management](/kubernetes-engine/fleet-management/docs) to
simplify management of a multi-cluster fleet.

Limits and best practices
-------------------------

To ensure that your architecture supports large-scale GKE
clusters, review the following limits and related best practices. Exceeding
these limits can degrade cluster performance or cause reliability issues.

These best practices apply to any default Kubernetes cluster with no extensions
installed. Extending Kubernetes clusters with webhooks or
[custom resource definitions (CRDs)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
is common but can constrain your ability to scale the cluster.

The following table extends the main
[GKE quotas and limits](/kubernetes-engine/quotas#limits_per_cluster).
You should also familiarize yourself with the open-source
[Kubernetes limits](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)
for large-scale clusters.

The GKE version requirements mentioned in the table apply to both the nodes and the control plane.

What's next?
------------

- [Plan for large workloads](/kubernetes-engine/docs/concepts/planning-large-workloads)