Diagnose cluster issues
The health check feature regularly monitors the health of the cluster control plane and several critical components, and helps you detect and diagnose potential problems with your clusters.
If you need additional assistance, reach out to Cloud Customer Care.
Issues detected
The cluster health checker detects and alerts you to the following issues in a cluster:
kube-scheduler health on control plane nodes: If the kube-scheduler is unhealthy, the cluster is likely having trouble assigning Pods to nodes. To investigate further, examine the kube-scheduler Pod log.
kube-controller-manager health on control plane nodes: The kube-controller-manager runs various controllers, such as the ReplicaSet, Deployment, and Namespace controllers. If the kube-controller-manager is deemed unhealthy, one or more of the controllers it manages might not be working properly. To determine the precise issue, examine the kube-controller-manager Pod log, which might provide more information about the malfunctioning controller or controllers.
Root volume capacity: The health checker checks for sufficient capacity on the root volume of each control plane node. If the available capacity falls below 512 MB, the health checker alerts you to the risk of running out of disk space.
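The root volume check described above amounts to comparing free space against a threshold. The following is a minimal sketch of a comparable local check, not the health checker's actual implementation. The function names are hypothetical, and it assumes that "512 MB" means 512 MiB, which the documentation does not specify.

```python
import shutil

# Assumption: the documented "512 MB" threshold is treated as 512 MiB here.
THRESHOLD_BYTES = 512 * 1024 * 1024


def is_low_on_space(free_bytes: int, threshold_bytes: int = THRESHOLD_BYTES) -> bool:
    """Return True when the free capacity is below the threshold."""
    return free_bytes < threshold_bytes


def root_volume_free_bytes(path: str = "/") -> int:
    """Return the free bytes on the volume that contains `path`."""
    return shutil.disk_usage(path).free
```

For example, `is_low_on_space(root_volume_free_bytes())` reports whether the local root volume is below the assumed threshold.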
View health check events
To view alerts from the health checker for a specific cluster, run the following command:
gcloud container azure clusters describe CLUSTER_NAME \
--location GOOGLE_CLOUD_LOCATION
Replace the following:
CLUSTER_NAME: your cluster's name
GOOGLE_CLOUD_LOCATION: the name of the Google Cloud location that manages the cluster
Here's an excerpt of the kind of output you can expect:
{
  "name": "some-cluster-name",
  "description": "test-cluster",
  ...
  "errors": [
    {
      "message": "Replica (replica-name): kube-controller-manager is unhealthy"
    },
    {
      "message": "Replica (replica-name): not enough disk space on root volume, only 9 MB left"
    }
  ]
  ...
}
In this example, the error messages indicate that the kube-controller-manager
component is unhealthy and that a control plane node's root volume is running
low on free space.
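If you want to collect these messages programmatically, one option is to request JSON output and read the errors field. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, it relies on the general gcloud --format=json flag, and it assumes that the errors field has the shape shown in the excerpt above (a list of objects with a message key).

```python
import json
import subprocess
from typing import List


def describe_cluster(cluster_name: str, location: str) -> dict:
    """Run the describe command and parse its JSON output."""
    cmd = [
        "gcloud", "container", "azure", "clusters", "describe", cluster_name,
        "--location", location,
        "--format=json",  # ask gcloud for machine-readable output
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def health_errors(cluster: dict) -> List[str]:
    """Return the error messages, or an empty list if none are reported."""
    return [entry.get("message", "") for entry in cluster.get("errors", [])]
```

For example, `health_errors(describe_cluster("CLUSTER_NAME", "GOOGLE_CLOUD_LOCATION"))` would return the list of messages for that cluster, with the placeholders replaced as described earlier.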