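This page is intended for platform administrators.
This page describes how to check the health of nodes, system Pods, and network connectivity in a cluster.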
Check cluster health with actl
Run the following command to check the health of a cluster:
actl clusters baremetal check cluster CLUSTER_NAME --kubeconfig=ADMIN_KUBECONFIG
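Replace CLUSTER_NAME with the name of the cluster to check, and replace ADMIN_KUBECONFIG with the path to the admin cluster's kubeconfig file. The command checks the following:
- Health of the nodes in the cluster, such as kubelet status, containerd status, disk capacity, and registry mirror reachability.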
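- If the cluster is an admin cluster, the health of bare metal system Pods such as anthos-cluster-operator.
- Network connectivity between nodes, such as L2 connectivity between control plane nodes.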
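You may want to keep a copy of each health check run for later review. One way is to wrap the actl command in a small shell function. The following is a minimal sketch, not part of actl. The function name run_cluster_check and its output-file argument are hypothetical. The sketch assumes, without confirming it, that actl returns a non-zero exit status when the check fails. For that reason, it also treats any [FAILED] line in the output as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of actl: runs the documented health check,
# saves the console output to a file, and returns non-zero on problems.
run_cluster_check() {
  local cluster_name="$1" kubeconfig="$2" output_file="$3" status

  # Same command as documented; stdout and stderr both go to the output file.
  actl clusters baremetal check cluster "$cluster_name" \
    --kubeconfig="$kubeconfig" >"$output_file" 2>&1
  status=$?

  # Show the saved output to the caller.
  cat "$output_file"

  # Assumption: a non-zero exit status or any [FAILED] line means the check failed.
  if [ "$status" -ne 0 ] || grep -q '\[FAILED\]' "$output_file"; then
    echo "health check reported problems; see $output_file" >&2
    return 1
  fi
  return 0
}
```

Example usage: run_cluster_check CLUSTER_NAME ADMIN_KUBECONFIG check-cluster-output.txt. Checking both the exit status and the output text means the wrapper still flags a failure even if actl exits with status 0 after reporting [FAILED] items.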
The following is an example of a successful health check:
Please check the logs at actl-workspace/user-1/log/check-cluster-20210616-215509/check-cluster.log
[2021-06-16 21:55:16+0000] Waiting for health check job to finish... OK
[2021-06-16 21:55:46+0000] - Validation Category: machines, network, add-ons and kubernetes
[2021-06-16 21:55:46+0000] - [PASSED] add-ons
[2021-06-16 21:55:46+0000] - [PASSED] kubernetes
[2021-06-16 21:55:46+0000] - [PASSED] node-network
[2021-06-16 21:55:46+0000] - [PASSED] 10.200.0.6
[2021-06-16 21:55:46+0000] - [PASSED] 10.200.0.7
[2021-06-16 21:55:46+0000] - [PASSED] 10.200.0.8
[2021-06-16 21:55:46+0000] Flushing logs... OK
The following is an example of a failed health check:
Please check the logs at actl-workspace/user-1/log/check-cluster-20210807-001826/check-cluster.log
[2021-08-07 00:18:32+0000] Waiting for health check job to finish... OK
[2021-08-07 00:20:52+0000] - Validation Category: machines, network, add-ons and kubernetes
[2021-08-07 00:20:52+0000] - [FAILED] 10.200.0.6
actl-workspace/user-1/log/check-cluster-20210807-001826/10.200.0.6
[2021-08-07 00:20:52+0000] - [FAILED] 10.200.0.7
actl-workspace/user-1/log/check-cluster-20210807-001826/10.200.0.7
[2021-08-07 00:20:52+0000] - [FAILED] 10.200.0.8
actl-workspace/user-1/log/check-cluster-20210807-001826/10.200.0.8
[2021-08-07 00:20:52+0000] - [PASSED] add-ons
[2021-08-07 00:20:52+0000] - [PASSED] kubernetes
[2021-08-07 00:20:52+0000] - [PASSED] node-network
[2021-08-07 00:20:52+0000] Flushing logs... OK
[2021-08-07 00:20:52+0000] Error waiting for health check job: health check failed
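In the failed example, each [FAILED] line is followed by the directory that holds the logs for that target. When many nodes fail, it can help to list those pairs directly. The following is a hedged sketch of a hypothetical helper, list_failed_checks, that reads saved health check output on stdin. It assumes the output format matches the examples on this page. A target with no log path on the following line is printed with "-".

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<target><TAB><log path>" for each [FAILED] entry
# in actl health check output read from stdin (format assumed from the examples).
list_failed_checks() {
  awk '
    pending != "" {
      # Trim leading whitespace, in case the log path line is indented.
      line = $0
      sub(/^[ \t]+/, "", line)
      # A non-empty line that does not start with "[" is taken as the log path.
      if (line != "" && line !~ /^\[/) {
        print pending "\t" line
        pending = ""
        next
      }
      # No log path followed the failure.
      print pending "\t-"
      pending = ""
    }
    /\[FAILED\]/ {
      # Keep only the text after "[FAILED]", e.g. a node IP such as 10.200.0.6.
      target = $0
      sub(/.*\[FAILED\][ \t]*/, "", target)
      pending = target
    }
    END {
      if (pending != "") print pending "\t-"
    }
  '
}
```

Example usage: list_failed_checks < check-cluster-output.txt. Applied to the failed example above, this lists 10.200.0.6, 10.200.0.7, and 10.200.0.8, each with its directory under actl-workspace/user-1/log/check-cluster-20210807-001826/.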
What's next
- Learn how to delete a cluster.