Troubleshoot Google Distributed Cloud updates

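If you run into problems when you update Google Distributed Cloud, the following sections can help you troubleshoot them. For details about which settings you can update, see Settings that can and can't be updated in a cluster.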

Update timeouts

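The update timeout is calculated dynamically based on the resources being updated. However, the calculation isn't always accurate. If an update times out, you see an error similar to the following: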

In a user cluster:

Failed to update the cluster:...timed out waiting for the condition...

In an admin cluster:

Failed to update the admin cluster:...timed out waiting for the condition...

You can safely ignore these timeout errors and retry the update command. If the retried command times out again with the same error message, contact Cloud Customer Care.
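The retry advice above can be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; run_update_with_retry is a hypothetical helper, not part of gkectl. It retries the wrapped command once, and only when the output contains the timeout message; any other failure stops immediately with the command's exit status.

```shell
# Hypothetical helper (not part of gkectl): run an update command and retry it
# once if it fails with the "timed out waiting for the condition" message.
run_update_with_retry() {
  local attempt output status
  for attempt in 1 2; do
    # Capture stdout and stderr together so the error text can be inspected.
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Only the timeout error is treated as safe to retry.
    case $output in
      *'timed out waiting for the condition'*) ;;
      *) return "$status" ;;
    esac
    echo "Attempt ${attempt} timed out." >&2
  done
  echo "The update timed out again; contact Cloud Customer Care." >&2
  return "$status"
}
```

To use it, prefix your usual command, for example run_update_with_retry gkectl update cluster FLAGS, where FLAGS stands for the flags you normally pass to the update command.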

Update contains multiple changes

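The gkectl update admin and gkectl update cluster commands don't let you update more than one setting in a single command. If the configuration contains a diff that changes multiple settings, the command returns an error similar to the following example: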

Update summary for cluster X:
    antiAffinityGroups: enabled to be set to true from false          &config.AAGSpec{
        -   Enabled: false,
        +   Enabled: true,
          }
    user master cpu to be set to 5 from 4          config.NodePoolProps{
            Role:        "master",
            MachineType: "standard-master",
        -   CPUs:        4,
        +   CPUs:        5,
            MemoryMB:    8192,
            Replicas:    3,
            ... // 2 identical fields
            Labels:         nil,
            NodeTaints:     nil,
        -   Vsphere:        nil,
        &config.NodePoolVsphereSpec{Datastore: "lifecycle-workloads1-datastore1"},
        +   Vsphere:        nil,
            BootDiskSizeGB: nil,
            OSImageType:    "",
            ... // 5 identical fields
          }

Exit with error:
Failed to update the cluster: the update contains multiple changes. Please
update only one feature at a time

This error can have several causes, including:

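A bug or a misconfiguration.
You previously ran gkectl upgrade with configuration differences and expected those changes to be applied.
- gkectl upgrade doesn't apply any configuration differences other than the version bump.
You previously modified the configuration to apply another feature update but forgot to run the gkectl update command.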

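If you encounter this behavior, review the diff in the error message and update the settings you want one at a time, running a separate gkectl update command for each. To help identify the changes, you can use gkectl get-config to generate a configuration file from the cluster and review the existing state and configuration.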
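To see at a glance which settings a rejected update would change, you can filter the saved error output. This is a minimal sketch, assuming you redirected the gkectl output to a file and that each changed setting appears on a summary line containing "to be set to", followed by its Go diff after a run of spaces, as in the example above. Both function names are hypothetical.

```shell
# Hypothetical helper: print one line per setting listed in a saved
# "Update summary" (lines containing " to be set to "), dropping the Go diff
# that follows each summary text after two or more spaces.
list_pending_changes() {
  grep ' to be set to ' "$1" |
    sed -e 's/^ *//' -e 's/  .*$//'
}

# Hypothetical helper: fail if the saved summary lists more than one setting.
assert_single_change() {
  local count
  count=$(list_pending_changes "$1" | grep -c . || true)
  if [ "$count" -gt 1 ]; then
    echo "Summary lists $count changes; update one setting at a time:" >&2
    list_pending_changes "$1" >&2
    return 1
  fi
}
```

Against the example output above, list_pending_changes prints the antiAffinityGroups line and the user master cpu line, which suggests splitting the change into two separate updates.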

Unsupported changes

The gkectl update cluster and gkectl update admin commands ignore unsupported changes and display an error message similar to the following example:

detected unsupported changes: (-current +desired)
    ...
-   AdvancedNetworking:       &true,
+   AdvancedNetworking:       &false,
    ...
, which will be ignored

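If you encounter this behavior, review the diff in the error message and do the following:

If the change is unintended, edit the configuration YAML file and run the update again with only the intended changes.
- In the preceding example, if you didn't intend to disable AdvancedNetworking, set advancedNetworking: true in the configuration YAML file.
If the change is intended, the error means that the change isn't supported. Do one of the following:
- Re-create the cluster, if applicable.
- Contact Google Support.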
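When the diff is long, it can help to print only the fields that will be ignored. This is a minimal sketch, assuming the gkectl output was saved to a file and that the ignored fields appear as lines starting with - (current) or + (desired) between "detected unsupported changes" and "which will be ignored", as in the example above; list_ignored_changes is a hypothetical helper name.

```shell
# Hypothetical helper: print the "-current" / "+desired" lines of the
# "detected unsupported changes" block from saved gkectl output.
list_ignored_changes() {
  awk '
    /detected unsupported changes/ { inside = 1; next }
    /which will be ignored/        { inside = 0; next }
    inside && /^[-+]/              { print }
  ' "$1"
}
```

Each pair of - and + lines names one field whose requested value won't be applied, which is the field to restore in the configuration YAML file if the change was unintended.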

OS images don't exist

The gkectl update cluster and gkectl update admin commands might fail with an OS Images preflight check failure, similar to the following examples:

In a user cluster:

- Validation Category: OS Images
    - [FAILURE] User cluster OS images exist: os images  [xxxx] don't exist,
    please run `gkectl prepare` to upload os images.

In an admin cluster:

- Validation Category: OS Images
    - [FAILURE] Admin cluster OS images exist: os images [xxxx] don't exist,
    please run `gkectl prepare` to upload os images.

These errors can occur if the OS images were accidentally removed from the vCenter environment, for example by a periodic cleanup job.

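To re-import the OS images, run the gkectl prepare command as follows, replacing ADMIN_CLUSTER_KUBECONFIG with the path to your admin cluster kubeconfig file and TARGET_VERSION with the Google Distributed Cloud version of the bundle: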

gkectl prepare \
    --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-TARGET_VERSION.tgz \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --skip-upload-container-images
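If you run updates from a script, you can make the re-import conditional on the preflight failure. This is a minimal sketch, assuming the preflight output was saved to a file; the function name and its three positional arguments are hypothetical, while the gkectl prepare flags are exactly the ones in the preceding command.

```shell
# Hypothetical helper: re-import OS images only when saved preflight output
# reports the "OS images exist: os images [...] don't exist" failure.
# Arguments: LOG_FILE TARGET_VERSION ADMIN_CLUSTER_KUBECONFIG
reimport_os_images_if_missing() {
  local log_file=$1 target_version=$2 admin_kubeconfig=$3
  if ! grep -q "OS images exist: os images .* don't exist" "$log_file"; then
    echo "No missing OS images reported in $log_file"
    return 0
  fi
  # Same command as above; container images are not re-uploaded.
  gkectl prepare \
    --bundle-path "/var/lib/gke/bundles/gke-onprem-vsphere-${target_version}.tgz" \
    --kubeconfig "$admin_kubeconfig" \
    --skip-upload-container-images
}
```

The pattern matches both the user cluster and the admin cluster variants of the message, so the same helper covers either update command.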

The datastore doesn't have enough free space to run the new node pool

When you add a new node pool, the gkectl update cluster command might fail with a VSphere Datastore FreeSpace preflight check error, similar to the following example:

  - [FAILURE] VSphere Datastore FreeSpace: vCenter datastore: xxxx insufficient
  FreeSpace, requires at least xxx  GB

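This failure indicates that the datastore doesn't have enough free space to run the new node pool. To make the operation succeed, provide space in one of the following ways: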

Free up space on the datastore.
Configure a different datastore for the node pool in nodePools[].vsphere.datastore.
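Before you free up space, it can help to know how much is missing. This is a minimal sketch, assuming the preflight output was saved to a file, that the "requires at least N GB" figure is the total free space the operation needs, and that you measured the datastore's current free space in GB yourself; datastore_shortfall_gb is a hypothetical helper name.

```shell
# Hypothetical helper: given saved preflight output and the datastore's
# current free space in GB, print how many GB are still missing (0 if none).
# Assumes "requires at least N GB" reports the total free space required.
datastore_shortfall_gb() {
  awk -v free="$2" '
    /requires at least/ {
      for (i = 1; i < NF; i++) {
        if ($i == "least") need = $(i + 1) + 0
      }
    }
    END {
      short = need - free
      print (short > 0 ? short : 0)
    }
  ' "$1"
}
```

For example, if the message reports 120 GB and the datastore has 50 GB free, the helper prints 70, the amount to free up or the minimum size to look for in an alternative datastore.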

What's next

If you need additional assistance, contact Cloud Customer Care.

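You can also see Getting support for more information about support resources, including the following:

Requirements for opening a support case.
Tools to help you troubleshoot, such as logs and metrics.
Supported components, versions, and features of Google Distributed Cloud for VMware (software only).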