```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topology-spread-deployment
  labels:
    app: myapp
spec:
  replicas: 30
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1 # Default. Spreads evenly. Maximum difference in scheduled Pods per Node.
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule # Default. Alternatively can be ScheduleAnyway
        labelSelector:
          matchLabels:
            app: myapp
        matchLabelKeys: # beta in 1.27
        - pod-template-hash
      containers:
      # pause is a lightweight container that simply sleeps
      - name: pause
        image: registry.k8s.io/pause:3.2
```

The following considerations apply when using topology spread constraints:

- A Pod's `labels.app: myapp` is matched by the constraint's `labelSelector`.
- The `topologyKey` specifies `kubernetes.io/hostname`. This label is automatically attached to all Nodes and is populated with the Node's hostname.
- The `matchLabelKeys` field prevents rollouts of new Deployments from considering Pods of old revisions when calculating where to schedule a Pod. The `pod-template-hash` label is automatically populated by a Deployment.
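To confirm how the replicas landed, you can count the Pods per Node. The following command is a minimal sketch that assumes the Deployment above has been applied and uses the `app: myapp` label from the example:

```bash
# Count how many Pods with the app=myapp label are scheduled on each Node.
kubectl get pods -l app=myapp -o custom-columns=NODE:.spec.nodeName --no-headers \
  | sort | uniq -c
```

With `maxSkew: 1` and `whenUnsatisfiable: DoNotSchedule`, the per-Node counts should differ by at most one across schedulable Nodes; replicas that can't be placed without violating the constraint stay `Pending`.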
Pod anti-affinity

[Pod anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) lets you define constraints for which Pods can be co-located on the same Node.

The following example manifest shows a Deployment that uses anti-affinity to limit replicas to one Pod per Node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity-deployment
  labels:
    app: myapp
spec:
  replicas: 30
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      name: with-pod-affinity
      labels:
        app: myapp
    spec:
      affinity:
        podAntiAffinity:
          # requiredDuringSchedulingIgnoredDuringExecution
          # prevents a Pod from being scheduled on a Node if it
          # does not meet the criteria.
          # Alternatively, you can use 'preferred' with a weight
          # rather than 'required'.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            # Your nodes might be configured with other keys
            # to use as `topologyKey`. `kubernetes.io/region`
            # and `kubernetes.io/zone` are common.
            topologyKey: kubernetes.io/hostname
      containers:
      # pause is a lightweight container that simply sleeps
      - name: pause
        image: registry.k8s.io/pause:3.2
```
This example Deployment specifies `30` replicas, but it only expands to as many Nodes as are available in your cluster.

The following considerations apply when using Pod anti-affinity:

- A Pod's `labels.app: myapp` is matched by the constraint's `labelSelector`.
- The `topologyKey` specifies `kubernetes.io/hostname`. This label is automatically attached to all Nodes and is populated with the Node's hostname. You can choose to use other labels if your cluster supports them, such as `region` or `zone`.
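If a hard one-Pod-per-Node rule is too strict, for example when you have fewer Nodes than replicas, you can express the same intent as a preference instead of a requirement, as the comments in the manifest note. The following fragment is a minimal sketch of that variant; the `weight` value of `100` is an arbitrary choice for illustration:

```yaml
affinity:
  podAntiAffinity:
    # 'preferred' lets the scheduler place extra replicas on a Node
    # that already runs a matching Pod when no better Node exists.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100 # 1-100; higher values weigh this preference more heavily
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname
```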
Pre-pull container images
In the absence of any other constraints, `kube-scheduler` prefers by default to schedule Pods on Nodes that already have the container image downloaded onto them. This behavior might be acceptable in smaller clusters without other scheduling configuration, where it's feasible to download the images onto every Node. However, relying on this behavior should be seen as a last resort. A better solution is to use `nodeSelector`, topology spread constraints, or affinity / anti-affinity. For more information, see [Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node).
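For comparison, a `nodeSelector` pins Pods to Nodes that carry a specific label. The following fragment is only a sketch; the `disktype=ssd` label is hypothetical and would first need to be applied to the target Nodes, for example with `kubectl label nodes NODE_NAME disktype=ssd`:

```yaml
# Pod template fragment: schedule only onto Nodes labeled disktype=ssd.
spec:
  nodeSelector:
    disktype: ssd # hypothetical label; apply it to your Nodes before use
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.2
```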
If you want to make sure container images are pre-pulled onto all Nodes, you can use a `DaemonSet` like the following example:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepulled-images
spec:
  selector:
    matchLabels:
      name: prepulled-images
  template:
    metadata:
      labels:
        name: prepulled-images
    spec:
      initContainers:
      - name: prepulled-image
        # Replace IMAGE with the image you want to pre-pull.
        image: IMAGE
        # Use a command that terminates immediately
        command: ["sh", "-c", "'true'"]
      containers:
      # pause is a lightweight container that simply sleeps
      - name: pause
        image: registry.k8s.io/pause:3.2
```
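To roll the DaemonSet out and confirm that every Node has pulled the image, you can use standard kubectl commands like the following. The file name `prepulled-images.yaml` is an assumption for this sketch:

```bash
# Apply the DaemonSet manifest (file name is an assumption).
kubectl apply -f prepulled-images.yaml

# Wait until a DaemonSet Pod is ready on every schedulable Node.
kubectl rollout status daemonset/prepulled-images

# Optional: list which Node each DaemonSet Pod landed on.
kubectl get pods -l name=prepulled-images -o wide
```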
After the Pod is `Running` on all Nodes, redeploy your Pods to see whether the containers are now evenly distributed across Nodes.
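One way to redeploy and re-check the spread, assuming the `topology-spread-deployment` example from earlier on this page, is a rolling restart followed by another look at the `NODE` column:

```bash
# Trigger a fresh scheduling pass for the example Deployment.
kubectl rollout restart deployment/topology-spread-deployment

# Inspect the NODE column to confirm the Pods are spread across Nodes.
kubectl get pods -l app=myapp -o wide
```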
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-03。"],[],[],null,["This pages shows you how to resolve issues with the Kubernetes scheduler\n(`kube-scheduler`) for Google Distributed Cloud.\n\nKubernetes always schedules Pods to the same set of nodes\n\nThis error might be observed in a few different ways:\n\n- **Unbalanced cluster utilization.** You can inspect cluster utilization for\n each Node with the `kubectl top nodes` command. The following exaggerated\n example output shows pronounced utilization on certain Nodes:\n\n NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%\n XXX.gke.internal 222m 101% 3237Mi 61%\n YYY.gke.internal 91m 0% 2217Mi 0%\n ZZZ.gke.internal 512m 0% 8214Mi 0%\n\n- **Too many requests.** If you schedule a lot of Pods at once onto the same\n Node and those Pods make HTTP requests, it's possible for the Node to be rate\n limited. The common error returned by the server in this scenario is `429 Too\n Many Requests`.\n\n- **Service unavailable.** A webserver, for example, hosted on a Node under high\n load might respond to all requests with `503 Service Unavailable` errors until\n it's under lighter load.\n\nTo check if you have Pods that are always scheduled to the same nodes, use the\nfollowing steps:\n\n1. Run the following `kubectl` command to view the status of the Pods:\n\n kubectl get pods -o wide -n default\n\n To see the distribution of Pods across Nodes, check the `NODE` column in the\n output. In the following example output, all of the Pods are scheduled on the\n same Node: \n\n NAME READY STATUS RESTARTS AGE IP NODE\n nginx-deployment-84c6674589-cxp55 1/1 Running 0 55s 10.20.152.138 10.128.224.44\n nginx-deployment-84c6674589-hzmnn 1/1 Running 0 55s 10.20.155.70 10.128.226.44\n nginx-deployment-84c6674589-vq4l2 1/1 Running 0 55s 10.20.225.7 10.128.226.44\n\nPods have a number of features that allow you to fine tune their scheduling\nbehavior. These features include topology spread constraints and anti-affinity\nrules. You can use one, or a combination, of these features. The requirements\nyou define are ANDed together by `kube-scheduler`.\n\nThe scheduler logs aren't captured at the default logging verbosity level. If\nyou need the scheduler logs for troubleshooting, do the following steps to\ncapture the scheduler logs:\n\n1. Increase the logging verbosity level:\n\n 1. Edit the `kube-scheduler` Deployment:\n\n kubectl --kubeconfig \u003cvar translate=\"no\"\u003eADMIN_CLUSTER_KUBECONFIG\u003c/var\u003e edit deployment kube-scheduler \\\n -n \u003cvar translate=\"no\"\u003eUSER_CLUSTER_NAMESPACE\u003c/var\u003e\n\n 2. Add the flag `--v=5` under the `spec.containers.command` section:\n\n containers:\n - command:\n - kube-scheduler\n - --profiling=false\n - --kubeconfig=/etc/kubernetes/scheduler.conf\n - --leader-elect=true\n - --v=5\n\n2. When you are finished troubleshooting, reset the verbosity level back\n to the default level:\n\n 1. Edit the `kube-scheduler` Deployment:\n\n kubectl --kubeconfig \u003cvar translate=\"no\"\u003eADMIN_CLUSTER_KUBECONFIG\u003c/var\u003e edit deployment kube-scheduler \\\n -n \u003cvar translate=\"no\"\u003eUSER_CLUSTER_NAMESPACE\u003c/var\u003e\n\n 2. 
Set the verbosity level back to the default value:\n\n containers:\n - command:\n - kube-scheduler\n - --profiling=false\n - --kubeconfig=/etc/kubernetes/scheduler.conf\n - --leader-elect=true\n\nTopology spread constraints\n\n[Topology spread constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/)\ncan be used to evenly distribute Pods among Nodes according to their `zones`,\n`regions`, `node`, or other custom-defined topology.\n\nThe following example manifest shows a Deployment that spreads replicas evenly\namong all schedulable Nodes using topology spread constraints: \n\n apiVersion: apps/v1\n kind: Deployment\n metadata:\n name: topology-spread-deployment\n labels:\n app: myapp\n spec:\n replicas: 30\n selector:\n matchLabels:\n app: myapp\n template:\n metadata:\n labels:\n app: myapp\n spec:\n topologySpreadConstraints:\n - maxSkew: 1 # Default. Spreads evenly. Maximum difference in scheduled Pods per Node.\n topologyKey: kubernetes.io/hostname\n whenUnsatisfiable: DoNotSchedule # Default. Alternatively can be ScheduleAnyway\n labelSelector:\n matchLabels:\n app: myapp\n matchLabelKeys: # beta in 1.27\n - pod-template-hash\n containers:\n # pause is a lightweight container that simply sleeps\n - name: pause\n image: registry.k8s.io/pause:3.2\n\nThe following considerations apply when using topology spread constraints:\n\n- A Pod's `labels.app: myapp` is matched by the constraint's `labelSelector`.\n- The `topologyKey` specifies `kubernetes.io/hostname`. This label is automatically attached to all Nodes and is populated with the Node's hostname.\n- The `matchLabelKeys` prevents rollouts of new Deployments from considering Pods of old revisions when calculating where to schedule a Pod. The `pod-template-hash` label is automatically populated by a Deployment.\n\nPod anti-affinity\n\n[Pod anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity)\nlets you define constraints for which Pods can be co-located on the same Node.\n\nThe following example manifest shows a Deployment that uses anti-affinity to\nlimit replicas to one Pod per Node: \n\n apiVersion: apps/v1\n kind: Deployment\n metadata:\n name: pod-affinity-deployment\n labels:\n app: myapp\n spec:\n replicas: 30\n selector:\n matchLabels:\n app: myapp\n template:\n metadata:\n name: with-pod-affinity\n labels:\n app: myapp\n spec:\n affinity:\n podAntiAffinity:\n # requiredDuringSchedulingIgnoredDuringExecution\n # prevents Pod from being scheduled on a Node if it\n # does not meet criteria.\n # Alternatively can use 'preferred' with a weight\n # rather than 'required'.\n requiredDuringSchedulingIgnoredDuringExecution:\n - labelSelector:\n matchExpressions:\n - key: app\n operator: In\n values:\n - myapp\n # Your nodes might be configured with other keys\n # to use as `topologyKey`. `kubernetes.io/region`\n # and `kubernetes.io/zone` are common.\n topologyKey: kubernetes.io/hostname\n containers:\n # pause is a lightweight container that simply sleeps\n - name: pause\n image: registry.k8s.io/pause:3.2\n\nThis example Deployment specifies `30` replicas, but only expands to as many Nodes are\navailable in your cluster.\n\nThe following considerations apply when using Pod anti-affinity:\n\n- A Pod's `labels.app: myapp` is matched by the constraint's `labelSelector`.\n- The `topologyKey` specifies `kubernetes.io/hostname`. This label is automatically attached to all Nodes and is populated with the Node's hostname. 
You can choose to use other labels if your cluster supports them, such as `region` or `zone`.\n\nPre-pull container images\n\nIn the absence of any other constraints, by default `kube-scheduler` prefers to\nschedule Pods on Nodes that already have the container image downloaded onto\nthem. This behavior might be of interest in smaller clusters without other\nscheduling configurations where it would be possible to download the images on\nevery Node. However, relying on this concept should be seen as a last resort. A\nbetter solution is to use `nodeSelector`, topology spread constraints, or\naffinity / anti-affinity. For more information, see\n[Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node).\n\nIf you want to make sure container images are pre-pulled onto all Nodes, you\ncan use a `DaemonSet` like the following example: \n\n apiVersion: apps/v1\n kind: DaemonSet\n metadata:\n name: prepulled-images\n spec:\n selector:\n matchLabels:\n name: prepulled-images\n template:\n metadata:\n labels:\n name: prepulled-images\n spec:\n initContainers:\n - name: prepulled-image\n image: \u003cvar label=\"image\" translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-l devsite-syntax-l-Scalar devsite-syntax-l-Scalar-Plain\"\u003eIMAGE\u003c/span\u003e\u003c/var\u003e\n # Use a command the terminates immediately\n command: [\"sh\", \"-c\", \"'true'\"]\n containers:\n # pause is a lightweight container that simply sleeps\n - name: pause\n image: registry.k8s.io/pause:3.2\n\nAfter the Pod is `Running` on all Nodes, redeploy your Pods again to see if the\ncontainers are now evenly distributed across Nodes.\n\nWhat's next\n\nIf you need additional assistance, reach out to\n\n[Cloud Customer Care](/support-hub).\nYou can also see\n[Getting support](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support) for more information about support resources, including the following:\n\n- [Requirements](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#intro-support) for opening a support case.\n- [Tools](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#support-tools) to help you troubleshoot, such as your environment configuration, logs, and metrics.\n- Supported [components](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#what-we-support)."]]