If vCenter Server is down

This document describes how a cluster behaves if vCenter Server is down.

While vCenter Server is down:

  • The machines are in the Available state.

  • The nodes are in the Ready state.

  • The Pods are in the Running state.

  • Pods that connect to vCenter Server show some expected errors; for example, the vsphere-controller-manager and cluster-health-controller Pods.

  • Stateless Pods can be created and deleted.

  • Creating a stateful Pod fails because attaching a disk requires access to vCenter Server. Such Pods remain in the Pending state.

  • The gkectl diagnose command fails with an error similar to the following:

    Exit with error:
    failed to prepare diagnose parameters: failed to create vSphere client: Post "https://my-server": dial tcp 203.0.113.1:443: connect: connection timed out
    
  • Auto repair is not triggered, because the machine and node states do not change when connections to vCenter Server fail.
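The gkectl diagnose error above is a TCP connection timeout to vCenter Server on port 443. If you want to confirm whether vCenter Server is reachable from the machine where you run gkectl before interpreting the behaviors listed above, a plain TCP probe is one option. This is a minimal sketch under stated assumptions: the function name vcenter_reachable and the 5-second default timeout are hypothetical, and a successful connection only shows that the port accepts connections, not that vCenter Server is fully healthy.

```python
import socket


def vcenter_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Hypothetical helper: return True if a TCP connection to host:port
    succeeds within `timeout` seconds, otherwise False."""
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers connection timeouts, refused connections,
        # and DNS resolution failures.
        return False
```

For example, vcenter_reachable("my-server") returns False while vCenter Server is down in the way the error message above describes.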

After vCenter Server comes back online (vCenter Server versions earlier than 7.0U2)

  • The machines go to the Unavailable state, and auto repair or a manual workaround is needed to restore the correct states.

  • The cluster functions correctly even though the machines are in the Unavailable state.

After vCenter Server comes back online (vCenter Server 7.0U2 and later)

  • No extra steps are needed, and the cluster is healthy again.
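The two recovery cases above depend only on whether the vCenter Server version is earlier than 7.0U2. The following is a minimal sketch of that comparison, not part of any GKE or VMware tooling: the helper names are hypothetical, and it assumes version strings of the form 7.0, 7.0U2, or 7.0U3c (major.minor, an optional update number, and an optional patch letter that is ignored).

```python
import re

# Assumed format: "<major>.<minor>", optionally followed by "U<update>"
# and a patch letter, for example "6.7U3", "7.0U2", or "7.0U3c".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\s*U(\d+))?[a-z]?$", re.IGNORECASE)

# Threshold taken from this document: vCenter Server 7.0 Update 2.
THRESHOLD = (7, 0, 2)


def parse_vcenter_version(version: str) -> tuple:
    """Return (major, minor, update) for a string such as '7.0U2'."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized vCenter Server version: {version!r}")
    major, minor, update = match.groups()
    # A version without an update suffix, such as "8.0", is treated as update 0.
    return int(major), int(minor), int(update or 0)


def needs_recovery_steps(version: str) -> bool:
    """True if machines may stay Unavailable after vCenter Server returns."""
    return parse_vcenter_version(version) < THRESHOLD
```

For example, needs_recovery_steps("7.0U1") returns True, while needs_recovery_steps("7.0U2") returns False.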