Troubleshoot Arm workloads

Applies to: Autopilot, Standard

This page describes how to resolve issues with Arm workloads that are deployed on Google Kubernetes Engine (GKE) Autopilot or Standard clusters.

If you need additional assistance, reach out to Cloud Customer Care.

Pods crash on Arm nodes

This issue occurs if you deploy a Pod on an Arm node, but the container image isn't built for the Arm architecture.

To identify the issue, do the following:

Get the status of your Pods:
```
kubectl get pods
```
Get the logs of the crashing Pod:
```
kubectl logs POD_NAME
```
Replace POD_NAME with the name of the crashing Pod.

The error message in the Pod logs is similar to the following:
```
exec ./hello-app: exec format error
```
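When you need to check many Pods, you can filter their logs for this message instead of reading each log by hand. The following is a minimal sketch. `check_exec_format_error` is a hypothetical helper and is not part of kubectl or GKE. It assumes only that the log contains the `exec format error` text shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or GKE): reads Pod logs on stdin
# and prints "architecture-mismatch" if they contain "exec format error",
# otherwise "no-exec-format-error".
check_exec_format_error() {
  if grep -q 'exec format error'; then
    echo "architecture-mismatch"
  else
    echo "no-exec-format-error"
  fi
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl logs POD_NAME | check_exec_format_error
#
# Checking every Pod in the current namespace:
#   for pod in $(kubectl get pods -o name); do
#     printf '%s: ' "$pod"
#     kubectl logs "$pod" | check_exec_format_error
#   done
```

If a Pod is in `CrashLoopBackOff`, the current container might not have written any output yet. In that case, `kubectl logs --previous POD_NAME` shows the logs of the last terminated container instead.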

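To resolve this issue, make sure that your container image supports the Arm architecture. As a best practice, build multi-architecture images.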
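Before you deploy, you can check whether an image in a registry already includes an arm64 variant. This sketch makes a few assumptions. It requires Docker with the Buildx plugin, and it relies on the `Platform:` lines that `docker buildx imagetools inspect` prints for multi-platform images. `image_supports_arm64` is a hypothetical helper. A single-platform image might not print a `Platform:` line at all, so treat a non-match as "not confirmed" rather than as proof that Arm is unsupported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of
#   docker buildx imagetools inspect IMAGE
# on stdin and succeeds (exit status 0) if any manifest lists a linux/arm64
# platform. Assumes the "Platform:  os/arch" line format that imagetools
# prints for multi-platform images; the prefix match also accepts linux/arm64/v8.
image_supports_arm64() {
  grep -Eq '^[[:space:]]*Platform:[[:space:]]+linux/arm64'
}

# Example usage (requires Docker with Buildx):
#   docker buildx imagetools inspect IMAGE | image_supports_arm64 && echo "arm64 OK"
#
# One way to build and push a multi-architecture image with Buildx:
#   docker buildx build --platform linux/amd64,linux/arm64 -t IMAGE --push .
```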

Pods don't trigger scale-up

Applies to: Autopilot

This issue occurs in Autopilot clusters if you try to deploy an Arm workload on a GKE version or in a Google Cloud region that doesn't support Arm nodes.

To identify this issue, get the cluster event log:

```
kubectl get events -w
```

The output is similar to the following:

```
117s        Normal    NotTriggerScaleUp   pod/hello-app2-78fc858558-pg4hz   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match Pod's node affinity/selector
```
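To pick out only the affected Pods from a busy event stream, you can filter on the REASON column. This is a sketch that assumes `kubectl get events` prints its default columns (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE). `pods_not_triggering_scale_up` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get events` output on stdin and prints
# the OBJECT column (for example pod/NAME) of every NotTriggerScaleUp event.
# Assumes the default column order: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
# fflush() flushes each line so the filter also works with `kubectl get events -w`.
pods_not_triggering_scale_up() {
  awk '$3 == "NotTriggerScaleUp" { print $4; fflush() }'
}

# Example usage (requires kubectl access):
#   kubectl get events -w | pods_not_triggering_scale_up
# Alternatively, let the API server filter events by reason:
#   kubectl get events --field-selector reason=NotTriggerScaleUp
```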

To resolve this issue, make sure that your Autopilot cluster is running GKE version 1.24.1-gke.1400 or later, and that your Google Cloud region supports Arm nodes.
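To check the version requirement from a script, you can compare the cluster's control plane version against the minimum. This sketch assumes that GNU `sort -V` is available. `gke_version_at_least` is a hypothetical helper. It doesn't check region support, so you still need to confirm that separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if GKE version $1 is at least version $2.
# Relies on `sort -V` (GNU coreutils version sort), which compares the numeric
# parts of strings such as 1.24.1-gke.1400 numerically (900 < 1400).
gke_version_at_least() {
  local current="$1" minimum="$2"
  # The smaller version sorts first. If that is the minimum, the current
  # version is greater than or equal to it.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example usage (requires gcloud access to the project):
#   current="$(gcloud container clusters describe CLUSTER_NAME \
#       --location=LOCATION --format='value(currentMasterVersion)')"
#   gke_version_at_least "$current" "1.24.1-gke.1400" && echo "version OK"
```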

Pods stuck in the Pending state

Applies to: Autopilot

This issue occurs if you try to deploy Autopilot Pods on the Arm architecture, but your Google Cloud project has run out of quota.

To identify this issue, get the cluster event log:

```
kubectl get events -w
```

The output is similar to the following:

```
29m         Warning   FailedScaleUp       pod/hello-app-7b86c88cb8-8vt2k   Node scale up in zones asia-southeast1-b associated with this pod failed: GCE quota exceeded. Pod is at risk of not being scheduled.
```
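To confirm that quota is the cause and to see which zones are affected, you can extract the zone list from matching events. The pattern below is copied from the example message. It is therefore an assumption that GKE words the event the same way in your cluster. `quota_failed_zones` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get events` output on stdin and prints
# the zone list from FailedScaleUp events caused by exceeded GCE quota.
# The pattern mirrors the example message; other GKE versions might word the
# event differently, in which case nothing is printed.
quota_failed_zones() {
  sed -n 's/.*FailedScaleUp.*Node scale up in zones \(.*\) associated with this pod failed: GCE quota exceeded.*/\1/p'
}

# Example usage (requires kubectl access):
#   kubectl get events -w | quota_failed_zones
```

After you identify the zones, `gcloud compute regions describe REGION` lists the region's quotas with their current usage and limits. That output can help you decide which quota to increase.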

This event might not appear in the log immediately after you deploy the Pod.

To resolve this issue, request a quota increase.

What's next

If you need additional assistance, reach out to Cloud Customer Care.