[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-03。"],[],[],null,["# Auto-repair nodes\n\n[Autopilot](/kubernetes-engine/docs/concepts/autopilot-overview) [Standard](/kubernetes-engine/docs/concepts/choose-cluster-mode)\n\n*** ** * ** ***\n\nThis page explains how node auto-repair works and\nhow to use the feature for Standard Google Kubernetes Engine (GKE) clusters.\n\n*Node auto-repair* helps keep the nodes in your GKE cluster in a\nhealthy, running state. When enabled, GKE makes periodic checks\non the health state of each node in your cluster. If a node fails consecutive\nhealth checks over an extended time period, GKE initiates a\nrepair process for that node.\n\nSettings for Autopilot and Standard\n-----------------------------------\n\nAutopilot clusters always automatically repair nodes. You can't disable\nthis setting.\n\nIn Standard clusters, node auto-repair is enabled by default for new\nnode pools. You can [disable auto repair](#disable) for an existing node pool,\nhowever we recommend keeping the default configuration.\n\nRepair criteria\n---------------\n\nGKE uses the node's health status to determine if a node\nneeds to be repaired. A node reporting a `Ready` status is considered healthy.\nGKE triggers a repair action if a node reports consecutive\nunhealthy status reports for a given time threshold.\nAn unhealthy status can mean:\n\n- A node reports a `NotReady` status on consecutive checks over the given time threshold (approximately 10 minutes).\n- A node does not report any status at all over the given time threshold (approximately 10 minutes).\n- A node's boot disk is out of disk space for an extended time period (approximately 30 minutes).\n- A node in an Autopilot cluster is cordoned for longer than the given time threshold (approximately 10 minutes).\n\nYou can manually check your node's health signals at any time by using the\n`kubectl get nodes` command.\n\nNode repair process\n-------------------\n\nIf GKE detects that a node requires repair, the node is drained\nand re-created. This process preserves the original name of the node.\nGKE waits one hour for the drain to complete. If the drain\ndoesn't complete, the node is shut down and a new node is created.\n\nIf multiple nodes require repair, GKE might repair nodes in\nparallel. GKE balances the number of repairs depending on the\nsize of the cluster and the number of broken nodes. GKE will\nrepair more nodes in parallel on a larger cluster, but fewer nodes as the number\nof unhealthy nodes grows.\n\nIf you disable node auto-repair at any time during the repair process, in-\nprogress repairs are *not* canceled and continue for any node under repair.\n| **Note:** Modifications on the boot disk of a node VM don't persist across node re-creations. To preserve modifications across node re-creation, use a [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/).\n| **Note:** Node auto-repair uses a set of signals, including signals from the [Node Problem Detector](https://github.com/kubernetes/node-problem-detector). 
Node repair process
-------------------

If GKE detects that a node requires repair, the node is drained and re-created. This process preserves the original name of the node. GKE waits one hour for the drain to complete. If the drain doesn't complete, the node is shut down and a new node is created.

If multiple nodes require repair, GKE might repair nodes in parallel. GKE balances the number of repairs depending on the size of the cluster and the number of broken nodes. GKE repairs more nodes in parallel in a larger cluster, but fewer nodes in parallel as the number of unhealthy nodes grows.

If you disable node auto-repair at any time during the repair process, in-progress repairs are *not* canceled and continue for any node currently under repair.

**Note:** Modifications on the boot disk of a node VM don't persist across node re-creations. To preserve modifications across node re-creation, use a [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/).

**Note:** Node auto-repair uses a set of signals, including signals from the [Node Problem Detector](https://github.com/kubernetes/node-problem-detector). The Node Problem Detector is enabled by default on nodes that use [Container-Optimized OS](/container-optimized-os/docs/how-to/monitoring) and Ubuntu images.

### Node repair history

GKE generates a log entry for automated repair events. You can check the logs by running the following command:

    gcloud container operations list
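If you want to see only auto-repair events, you can filter the output. The following is a minimal sketch that assumes the `AUTO_REPAIR_NODES` operation type; check the `operationType` values in your own output and adjust the filter as needed:

    # List only node auto-repair operations (operation type assumed; verify against your output).
    gcloud container operations list \
        --filter="operationType=AUTO_REPAIR_NODES"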
### Node auto-repair in TPU slice nodes

If a TPU slice node in a [multi-host TPU slice node pool](/kubernetes-engine/docs/concepts/tpus#node_pool) is unhealthy and requires auto-repair, the *entire* node pool is re-created. To learn more about the TPU slice node conditions, see [TPU slice node auto repair](/kubernetes-engine/docs/how-to/tpus#node-auto-repair).

Enable auto-repair for an existing Standard node pool
-----------------------------------------------------

You enable node auto-repair on a *per-node pool* basis.

If auto-repair is disabled on an existing node pool in a Standard cluster, use the following instructions to enable it:

### Console

1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.

   [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)

2. In the cluster list, click the name of the cluster you want to modify.

3. Click the **Nodes** tab.

4. Under **Node Pools**, click the name of the node pool you want to modify.

5. On the **Node pool details** page, click **Edit**.

6. Under **Management**, select the **Enable auto-repair** checkbox.

7. Click **Save**.

### gcloud

    gcloud container node-pools update POOL_NAME \
        --cluster CLUSTER_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --enable-autorepair

Replace the following:

- `POOL_NAME`: the name of your node pool.
- `CLUSTER_NAME`: the name of your Standard cluster.
- `CONTROL_PLANE_LOCATION`: the Compute Engine [location](/compute/docs/regions-zones#available) of your cluster's control plane. Provide a region for regional clusters, or a zone for zonal clusters.

Verify node auto-repair is enabled for a Standard node pool
-----------------------------------------------------------

Node auto-repair is enabled on a *per-node pool* basis. You can verify that a node pool in your cluster has node auto-repair enabled with the Google Cloud CLI or the Google Cloud console.

### Console

1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.

   [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)

2. On the **Google Kubernetes Engine** page, click the name of the cluster of the node pool you want to inspect.

3. Click the **Nodes** tab.

4. Under **Node Pools**, click the name of the node pool you want to inspect.

5. Under **Management**, in the **Auto-repair** field, verify that auto-repair is enabled.

### gcloud

Describe the node pool:

    gcloud container node-pools describe NODE_POOL_NAME \
        --cluster=CLUSTER_NAME

If node auto-repair is enabled, the output of the command includes these lines:

    management:
      ...
      autoRepair: true

Disable node auto-repair
------------------------

You can disable node auto-repair for an existing node pool in a Standard cluster by using the gcloud CLI or the Google Cloud console.

**Note:** For a node pool in a Standard cluster that is enrolled in a release channel, you can disable auto-repair only with the gcloud CLI.

### Console

1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.

   [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)

2. In the cluster list, click the name of the cluster you want to modify.

3. Click the **Nodes** tab.

4. Under **Node Pools**, click the name of the node pool you want to modify.

5. On the **Node pool details** page, click **Edit**.

6. Under **Management**, clear the **Enable auto-repair** checkbox.

7. Click **Save**.

### gcloud

    gcloud container node-pools update POOL_NAME \
        --cluster CLUSTER_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --no-enable-autorepair

Replace the following:

- `POOL_NAME`: the name of your node pool.
- `CLUSTER_NAME`: the name of your Standard cluster.
- `CONTROL_PLANE_LOCATION`: the Compute Engine [location](/compute/docs/regions-zones#available) of your cluster's control plane. Provide a region for regional clusters, or a zone for zonal clusters.

What's next
-----------

- [Learn more about node pools](/kubernetes-engine/docs/concepts/node-pools).