Recover a failed upgrade in GKE on Bare Metal

Occasionally, a cluster upgrade in GKE on Bare Metal fails, and you need to either start the upgrade over or fix an error condition before the upgrade can complete. The following sections list conditions that can occur when you upgrade clusters, along with the recovery step for each condition.

Recover admin, hybrid, and standalone cluster upgrades

Admin, hybrid, and standalone clusters are all upgraded with the bmctl upgrade command.

If your upgrade doesn't complete successfully, check the following conditions and try the upgrade again.

  • bmctl fails to parse the config file because the file contains an error. Fix the config file and rerun the command.
  • bmctl fails to bootstrap the temporary cluster. Retry the command.
  • The log files indicate that the upgrade failed a preflight check. Correct the condition that the preflight check reported, or force the upgrade.
  • The upgrade took too long (more than 30 minutes) or timed out. Retry the upgrade command.
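Two of the conditions above (a failed bootstrap and a timeout) are resolved by rerunning the same command. The following is a minimal sketch of a retry wrapper, not part of bmctl itself. The function name retry, the attempt count, and the delay are illustrative assumptions, and CLUSTER_NAME and ADMIN_KUBECONFIG are placeholders for your own values.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it exits with status 0 or
# MAX_ATTEMPTS is reached. Returns 1 if every attempt fails.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (placeholders, adjust to your environment):
# retry 3 60 bmctl upgrade cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
```

A blind retry only helps with transient failures. If the log shows a config parsing error or a failed preflight check, fix that condition first instead of rerunning the command unchanged.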

Recover user cluster upgrades

User clusters are upgraded from an associated admin cluster with the kubectl apply command.
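As a rough illustration of that command, the sketch below wraps kubectl apply in a small function. The function name apply_user_cluster_upgrade and both arguments are hypothetical, and the sketch assumes that the user cluster config file has already been edited to the target version (typically the anthosBareMetalVersion field) before it is applied.

```shell
#!/usr/bin/env bash

# apply_user_cluster_upgrade ADMIN_KUBECONFIG USER_CLUSTER_CONFIG
# Hypothetical helper: applies the edited user cluster config through the
# admin cluster, which then carries out the user cluster upgrade.
apply_user_cluster_upgrade() {
  local admin_kubeconfig="$1" user_cluster_config="$2"
  kubectl --kubeconfig "$admin_kubeconfig" apply -f "$user_cluster_config"
}

# Example usage (placeholder paths, adjust to your environment):
# apply_user_cluster_upgrade ADMIN_KUBECONFIG USER_CLUSTER_CONFIG
```

Passing the admin cluster kubeconfig explicitly keeps the command from accidentally targeting whichever cluster the current kubectl context points to.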

If your upgrade doesn't complete successfully, check the following conditions and try the upgrade again.

  • The preflight check failed. Check the preflight check log and fix the corresponding error. After the fix, you may need to construct a new preflight custom resource, according to the preflight check instructions, and apply it to the cluster. Verify that the triggered preflight check passes and that the cluster reconciler picks up the latest passing state.