Under certain conditions, [PodDisruptionBudget (PDB)](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) policies can prevent nodes from being removed successfully from node pools. Under these conditions, the node status reports `Ready,SchedulingDisabled` even though the node has been removed. This document shows how to remove nodes from your Google Distributed Cloud clusters that are currently blocked by PDB issues.
This page is for Admins, architects, and Operators who manage the lifecycle of the underlying tech infrastructure, and who respond to alerts and pages when service level objectives (SLOs) aren't met or applications fail. To learn more about common roles and example tasks that we reference in Google Cloud content, see [Common GKE Enterprise user roles and tasks](/kubernetes-engine/enterprise/docs/concepts/roles-tasks).

PDB conflicts with the number of Pods available

PDB policies help ensure app performance by preventing Pods from going down at the same time when you make changes to the system. Consequently, PDB policies limit the number of simultaneously unavailable Pods in a replicated application.

However, a PDB policy can sometimes prevent node deletions that you want to make, if removing the node would violate the policy.

For example, a PDB policy can define that there should always be two Pods available in the system (`.spec.minAvailable` is 2). But if you have only two Pods, and you try to remove the node containing one of them, the PDB policy takes effect and prevents the removal of the node.

Similarly, when a PDB policy defines that no Pods should be unavailable (`.spec.maxUnavailable` is 0), the policy also prevents any associated nodes from being deleted. Even if you try to remove a single Pod at a time, the PDB policy prevents you from deleting the affected node.
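For illustration, here is a minimal sketch of a PDB that reproduces this blocking behavior. The name `example-pdb`, the namespace `default`, and the label `app=example` are hypothetical and are not part of the procedure that follows:

```
# Hypothetical example: create a PDB that allows zero voluntary disruptions
# for Pods labeled app=example. Draining any node that hosts such a Pod is
# then blocked until the PDB is relaxed or removed.
kubectl create poddisruptionbudget example-pdb \
  --namespace default \
  --selector=app=example \
  --max-unavailable=0
```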
Disable and re-enable the PDB policy

To resolve a PDB conflict, back up and then remove the PDB policy. After the PDB is deleted successfully, the node drains and the associated Pods are removed. You can then make the changes you want, and re-enable the PDB policy.

The following example shows how to delete a node in this condition, which can affect all types of Google Distributed Cloud clusters: admin, hybrid, standalone, and user clusters.

The same general procedure works for all cluster types. However, the specific commands for deleting a node from an admin cluster node pool (for admin, hybrid, or standalone clusters) vary slightly from the commands for deleting a node from a user cluster node pool.

1. For ease of reading, the `${KUBECONFIG}` variable is used in the following commands.

   Depending on the cluster type, export the admin cluster kubeconfig (ADMIN_KUBECONFIG) or user cluster kubeconfig (USER_CLUSTER_CONFIG) path to `${KUBECONFIG}` and complete the following steps:

   - To delete a node from a user cluster, set `export KUBECONFIG=USER_CLUSTER_CONFIG`.
   - To delete a node from an admin cluster, set `export KUBECONFIG=ADMIN_KUBECONFIG`.

2. Optional: If you are deleting a node from a user cluster node pool, run the following command to extract the user cluster kubeconfig file:

   ```
   kubectl --kubeconfig ADMIN_KUBECONFIG -n cluster-USER_CLUSTER_NAME \
     get secret USER_CLUSTER_NAME-kubeconfig \
     -o 'jsonpath={.data.value}' | base64 -d > USER_CLUSTER_CONFIG
   ```

   Replace the following entries with information specific to your cluster environment:

   - ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.
   - USER_CLUSTER_NAME: the name of the user cluster.
   - USER_CLUSTER_CONFIG: the path where you want to write the user cluster kubeconfig file.

3. After you remove the node from the node pool, check the node status. The affected node reports `Ready,SchedulingDisabled`:
   ```
   kubectl get nodes --kubeconfig ${KUBECONFIG}
   ```

   The node status looks similar to the following example output:
      NAME   STATUS                     ROLES    AGE     VERSION
      CP2    Ready                      Master   11m     v1.18.6-gke.6600
      CP3    Ready,SchedulingDisabled   <none>   9m22s   v1.18.6-gke.6600
      CP4    Ready                      <none>   9m18s   v1.18.6-gke.6600
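   If the cluster has many nodes, you can optionally narrow the output to cordoned nodes. This is a minimal sketch, assuming the affected nodes have `.spec.unschedulable` set to `true`:

   ```
   # List only nodes that are cordoned (marked unschedulable).
   kubectl get nodes --kubeconfig ${KUBECONFIG} \
     -o jsonpath='{range .items[?(@.spec.unschedulable==true)]}{.metadata.name}{"\n"}{end}'
   ```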
4. Check the PDBs in your cluster:

   ```
   kubectl get pdb --kubeconfig ${KUBECONFIG} -A
   ```
   The system reports PDBs similar to the ones shown in the following example output:

      NAMESPACE     NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
      gke-system    istio-ingress    1               N/A               1                     19m
      gke-system    istiod           1               N/A               1                     19m
      kube-system   coredns          1               N/A               0                     19m
      kube-system   log-aggregator   N/A             0                 0                     19m
      kube-system   prometheus       N/A             0                 0                     19m
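   To make the matching in the next step easier, you can optionally print each PDB's label selector alongside its name. A minimal sketch using `custom-columns` output:

   ```
   # Show each PDB's label selector so you can match it to Pods on the node.
   kubectl get pdb -A --kubeconfig ${KUBECONFIG} \
     -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels'
   ```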
5. Inspect the PDB. Find a match between the Pod label within the PDB and the matching Pods in the node. This match ensures that you disable the correct PDB to remove the node successfully:

   ```
   kubectl --kubeconfig ${KUBECONFIG} get pdb log-aggregator -n kube-system -o 'jsonpath={.spec}'
   ```

   The system returns matching label results in the PDB policy:

      {"maxUnavailable":0,"selector":{"matchLabels":{"app":"stackdriver-log-aggregator"}}}
6. Find Pods that match the PDB policy label:

   ```
   kubectl --kubeconfig ${KUBECONFIG} get pods -A --selector=app=stackdriver-log-aggregator \
     -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}'
   ```

   The command returns a list of Pods that match the PDB label, and verifies the PDB policy you need to remove:

      stackdriver-log-aggregator-0   CP3
      stackdriver-log-aggregator-1   CP3
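   As a quick cross-check, you can also read the PDB status directly. A `disruptionsAllowed` value of `0` confirms that Pod evictions, and therefore the node drain, are currently blocked by this policy:

   ```
   # Print how many voluntary disruptions the PDB currently allows.
   kubectl --kubeconfig ${KUBECONFIG} get pdb log-aggregator -n kube-system \
     -o jsonpath='{.status.disruptionsAllowed}'
   ```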
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2024-12-18(UTC)"],[],[],null,["Under certain conditions, [PodDisruptionBudgets (PDB)](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets)\npolicies can prevent nodes from being removed successfully from nodepools.\nUnder these conditions, the node status reports `Ready,SchedulingDisabled`\ndespite being removed. This document shows how to remove nodes from your\nGoogle Distributed Cloud clusters that are currently blocked by PDB issues.\n\nThis page is for Admins and architects and Operators who manage the\nlifecycle of the underlying tech infrastructure, and respond to alerts and pages\nwhen service level objectives (SLOs) aren't met or applications fail. To learn\nmore about common roles and example tasks that we reference in Google Cloud\ncontent, see\n[Common GKE user roles and tasks](/kubernetes-engine/enterprise/docs/concepts/roles-tasks).\n\nPDB conflicts with the number of Pods available\n\nPDB policies help ensure app performance by preventing Pods going down at the\nsame time when you make changes to the system. Consequently, PDB policies limit\nthe number of simultaneously unavailable Pods in a replicated application.\n\nHowever, the PDB policy can sometimes prevent node deletions that you want to\nmake if you would violate the policy by removing a node.\n\nFor example, a PDB policy can define that there should always be two Pods\navailable in the system (`.spec.minAvailable` is 2). But if you only have two\nPods, and you try to remove the node containing one of them, then the PDB policy\ntakes effect and prevents the removal of the node.\n\nSimilarly, when the PDB policy defines that no Pods should be unavailable\n(`.spec.maxUnavailable` is 0), the policy also prevents any associated nodes\nfrom being deleted. Even if you try to remove a single Pod at a time, the PDB\npolicy prevents you from deleting the affected node.\n\nDisable and re-enable the PDB policy\n\nTo resolve a PDB conflict, back-up and then remove the PDB policy. After the\nPDB is deleted successfully, the node drains and the associated Pods are\nremoved. You can then make the changes you want, and re-enable the PDB policy.\n\nThe following example shows how to delete a node in this condition, which can\naffect all types of Google Distributed Cloud clusters: admin, hybrid, standalone,\nand user clusters.\n\nThe same general procedure works for all cluster types. However, the specific\ncommands for deleting a node from an admin cluster nodepool\n(for admin, hybrid, or standalone clusters) vary slightly from the commands for\ndeleting a node from a user cluster nodepool.\n\n1. 
7. After you confirm the affected Pod, make a backup copy of the PDB policy. The following example backs up the `log-aggregator` policy:

   ```
   kubectl get pdb log-aggregator --kubeconfig ${KUBECONFIG} -n kube-system \
     -o yaml > log-aggregator.yaml
   ```
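   Optionally, verify that the backup is a valid manifest before you delete anything. A minimal sketch using a client-side dry run, which validates the file without changing the cluster:

   ```
   # Validate the backed-up PDB manifest; --dry-run=client makes no changes.
   kubectl apply -f log-aggregator.yaml --kubeconfig ${KUBECONFIG} --dry-run=client
   ```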
8. Delete the specific PDB policy. The following example deletes the `log-aggregator` policy:

   ```
   kubectl delete pdb log-aggregator --kubeconfig ${KUBECONFIG} -n kube-system
   ```

   After you delete the PDB policy, the node proceeds to drain. However, it can take up to 30 minutes for the node to be fully deleted. Continue to check the node status to confirm that the process has completed successfully.

   If you want to remove the node permanently, and also remove the storage resources associated with the node, you can do this before you restore the PDB policy. For more information, see Remove storage resources from permanently deleted nodes.

9. Restore the PDB policy from your copy:

   ```
   kubectl apply -f log-aggregator.yaml --kubeconfig ${KUBECONFIG}
   ```

10. Verify that the deleted Pods are recreated successfully. In this example, if there were two `stackdriver-log-aggregator-x` Pods, then they are recreated:

    ```
    kubectl get pods -o wide --kubeconfig ${KUBECONFIG} -A
    ```

11. If you want to restore the node, edit the appropriate node pool config, and restore the node IP address.

Remove storage resources from permanently deleted nodes

If you permanently delete a node and don't want to restore it to your system, you can also delete the storage resources associated with that node.

1. Check and get the name of the persistent volume (PV) associated with the node:

   ```
   kubectl get pv --kubeconfig ${KUBECONFIG} \
     -o=jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.spec.claimRef.name}{":\t"}{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values}{"\n"}{end}'
   ```

2. Delete the PV associated with the node:

   ```
   kubectl delete pv PV_NAME --kubeconfig ${KUBECONFIG}
   ```

   Replace PV_NAME with the name of the persistent volume to delete.

What's next

If you need additional assistance, reach out to [Cloud Customer Care](/support-hub). You can also see [Getting support](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support) for more information about support resources, including the following:

- [Requirements](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#intro-support) for opening a support case.
- [Tools](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#support-tools) to help you troubleshoot, such as your environment configuration, logs, and metrics.
- Supported [components](/kubernetes-engine/distributed-cloud/bare-metal/docs/getting-support#what-we-support).