Private cloud maintenance and updates

Private cloud environments are designed to have no single point of failure.

  • ESXi clusters are configured with vSphere High Availability (HA). Clusters are sized with at least one spare node for resiliency.
  • vSAN provides redundant primary storage, requiring at least three nodes to provide protection against a single failure. For larger clusters, you can configure vSAN to provide higher resiliency.
  • vCenter, PSC, and NSX Manager virtual machines (VMs) are configured with RAID-10 storage to protect against storage failure. The VMs are additionally protected against node and network failures by vSphere HA.
  • ESXi hosts have redundant fans and NICs.
  • Top-of-rack (ToR) and spine switches are configured in HA pairs to provide resiliency.
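The node counts above follow from how vSAN keeps data available. The sketch below assumes a RAID-1 (mirroring) storage policy, where tolerating `ftt` host failures requires `2 * ftt + 1` hosts so that a majority of an object's components survives. It then adds the spare node mentioned above. The function names are hypothetical, and erasure-coded policies (RAID-5/6) have different minimums that this sketch does not model.

```python
def min_hosts_raid1(ftt: int) -> int:
    """Minimum hosts for a vSAN RAID-1 (mirroring) policy tolerating `ftt` host failures.

    Assumption: RAID-1 only. Erasure coding (RAID-5/6) is not modeled here.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    # ftt + 1 data replicas plus enough witness capacity to keep a majority: 2*ftt + 1 hosts.
    return 2 * ftt + 1


def min_hosts_with_spare(ftt: int, spare_nodes: int = 1) -> int:
    """Add spare nodes so a failed or serviced host doesn't drop the cluster below the minimum."""
    return min_hosts_raid1(ftt) + spare_nodes


for ftt in (0, 1, 2):
    print(f"FTT={ftt}: {min_hosts_raid1(ftt)} hosts, {min_hosts_with_spare(ftt)} with a spare")
```

With FTT=1, the sketch reproduces the three-node minimum from the text, and four nodes once a spare is included.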

VMware Engine continuously monitors the following components for uptime and availability, and provides availability SLAs for them:

  • ESXi hosts
  • vCenter
  • PSC
  • NSX Manager

VMware Engine continuously monitors the following for failures:

  • Hard disks
  • Physical NIC ports
  • Servers
  • Fans
  • Power
  • Switches
  • Switch ports

If a disk or node fails, VMware Engine automatically and immediately adds a new node to the affected VMware cluster to restore normal operation.
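The monitoring and replacement behavior described above can be modeled as a small decision function. This is a hypothetical sketch, not VMware Engine's actual remediation logic: the `Cluster` class, the component names, and `remediate()` are illustrative. It assumes that only disk and node (server) failures trigger a replacement node, and that other monitored failures are absorbed by the redundant hardware described earlier.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the monitored components listed above.
MONITORED = {"disk", "nic_port", "server", "fan", "power", "switch", "switch_port"}
# Per the text, disk and node (server) failures trigger adding a new node.
NODE_REPLACEMENT_TRIGGERS = {"disk", "server"}


@dataclass
class Cluster:
    name: str
    nodes: list[str] = field(default_factory=list)


def remediate(cluster: Cluster, failed_component: str, new_node_id: str) -> bool:
    """Add a replacement node for disk or node failures; return True if a node was added."""
    if failed_component not in MONITORED:
        raise ValueError(f"unmonitored component: {failed_component}")
    if failed_component in NODE_REPLACEMENT_TRIGGERS:
        cluster.nodes.append(new_node_id)
        return True
    # Assumption: other failures are covered by redundant fans, NICs, power, and switches.
    return False


cluster = Cluster("cluster-1", ["node-1", "node-2", "node-3", "node-4"])
print(remediate(cluster, "disk", "node-5"), len(cluster.nodes))  # True 5
```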

VMware Engine backs up, maintains, and updates the following VMware elements in private clouds:

  • ESXi
  • vCenter Platform Services Controller
  • vSAN
  • NSX

Backup and restore

Backup includes:

  • Nightly incremental backups of vCenter, PSC, and distributed virtual switch (DVS) rules.
  • Use of vCenter native APIs to back up components at the application layer.
  • Automatic backup before any update or upgrade of the VMware management software.
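The two backup triggers listed above, a nightly schedule and a pending update or upgrade, can be expressed as one check. This is a minimal sketch under stated assumptions: `backup_due()` and the reason strings are hypothetical, and "nightly" is approximated as 24 hours since the last backup.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "nightly" approximated as at least 24 hours since the last backup.
NIGHTLY_INTERVAL = timedelta(hours=24)


def backup_due(now: datetime, last_backup: datetime, upgrade_pending: bool) -> Optional[str]:
    """Return why a backup should run now, or None if no backup is due."""
    if upgrade_pending:
        # Management software is always backed up before an update or upgrade.
        return "pre-upgrade"
    if now - last_backup >= NIGHTLY_INTERVAL:
        return "nightly-incremental"
    return None


last = datetime(2024, 1, 1, 2, 0)
print(backup_due(last + timedelta(hours=25), last, upgrade_pending=False))  # nightly-incremental
```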

Maintenance

VMware Engine performs the following types of planned maintenance.

Backend and internal maintenance

Backend and internal maintenance typically involves reconfiguring physical assets or installing software patches. It doesn't affect normal consumption of the assets being serviced. Because each physical rack has redundant NICs, normal network traffic and private cloud operations continue over the remaining path while one is serviced. You might notice a performance impact only if your organization expects to use the full redundant bandwidth during the maintenance interval.
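The bandwidth reasoning in the preceding paragraph comes down to simple arithmetic: while a redundant link is out of service, only the remaining links carry traffic. The sketch below assumes two equal links per rack with one serviced at a time; the 25 Gbps link speed and the function name are hypothetical values chosen for illustration, not published figures.

```python
def maintenance_headroom_gbps(
    demand_gbps: float,
    link_gbps: float,
    total_links: int = 2,
    links_in_maintenance: int = 1,
) -> float:
    """Spare bandwidth while some redundant links are being serviced.

    A negative result means demand exceeds what the remaining links can carry,
    which is when a performance impact would be noticeable.
    """
    if not 0 <= links_in_maintenance <= total_links:
        raise ValueError("links_in_maintenance must be between 0 and total_links")
    available = link_gbps * (total_links - links_in_maintenance)
    return available - demand_gbps


# Hypothetical: two 25 Gbps links per rack, one serviced at a time.
print(maintenance_headroom_gbps(demand_gbps=20, link_gbps=25))  # 5
print(maintenance_headroom_gbps(demand_gbps=40, link_gbps=25))  # -15
```

Only the second workload, which relies on more than one link's worth of bandwidth, would notice the maintenance.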

Portal maintenance

Some limited service downtime is required when the control plane or infrastructure is updated. Portal maintenance can occur as often as once per month but is expected to become less frequent over time. VMware Engine notifies you about impending portal maintenance and works to keep the maintenance interval as short as possible. During a portal maintenance interval, the following services continue to function without any impact:

  • VMware management plane and applications
  • vCenter access
  • All networking and storage

VMware infrastructure maintenance

Changes to the configuration of the VMware infrastructure are occasionally necessary. These maintenance intervals can occur every one to two months, but their frequency is expected to decline over time. This type of maintenance can usually be done without interrupting normal private cloud consumption. During a VMware maintenance interval, the following services continue to function without any impact:

  • VMware management plane and applications
  • vCenter access
  • All networking and storage

Updates and upgrades

VMware Engine is responsible for lifecycle management of VMware software (ESXi, vCenter, PSC, and NSX) in private clouds.

Software updates include the following:

  • Patches: security patches or bug fixes released by VMware
  • Updates: minor version change of a VMware stack component
  • Upgrades: major version change of a VMware stack component
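The three categories above can be distinguished by comparing version numbers. This sketch assumes a simplified `major.minor.patch` numbering, which does not map exactly onto every VMware release name (for example, "Update" releases); the `Version` type and `classify_change()` are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    # Assumption: simplified major.minor.patch numbering.
    major: int
    minor: int
    patch: int


def classify_change(current: Version, new: Version) -> str:
    """Label a version change as an upgrade, update, or patch, per the list above."""
    if new <= current:
        raise ValueError("new version must be newer than the current version")
    if new.major != current.major:
        return "upgrade"  # major version change
    if new.minor != current.minor:
        return "update"   # minor version change
    return "patch"        # security patch or bug fix


print(classify_change(Version(7, 0, 3), Version(8, 0, 0)))  # upgrade
```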

VMware Engine tests critical security patches as soon as VMware makes them available. Per the SLA, VMware Engine targets rolling out security patches to private cloud environments within one week of their availability.

When a new major version of VMware software is available, VMware Engine works with customers to coordinate a suitable maintenance window for applying the upgrade. VMware Engine applies major version upgrades at least six months after the major version is released and notifies customers one month in advance of applying major version upgrades.
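The timing commitments in the two preceding paragraphs can be checked with date arithmetic. This is a sketch under stated assumptions: "six months" and "one month" are treated as calendar months (clamped to the last day of a shorter month), and all function names are hypothetical.

```python
import calendar
from datetime import date, timedelta

PATCH_ROLLOUT_TARGET = timedelta(weeks=1)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the target month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def patch_rollout_target(available: date) -> date:
    """Target date for rolling out a security patch (one week after availability)."""
    return available + PATCH_ROLLOUT_TARGET


def earliest_major_upgrade(released: date) -> date:
    """Earliest date a major version upgrade is applied (six months after release)."""
    return add_months(released, 6)


def notification_deadline(planned_upgrade: date) -> date:
    """Latest date to notify customers (one month before the upgrade)."""
    return add_months(planned_upgrade, -1)


def upgrade_schedule_ok(released: date, planned: date, notified: date) -> bool:
    """True if a planned major upgrade respects both timing commitments."""
    return planned >= earliest_major_upgrade(released) and notified <= notification_deadline(planned)


print(upgrade_schedule_ok(date(2024, 1, 10), date(2024, 8, 1), date(2024, 6, 15)))  # True
```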

VMware Engine also works with key industry vendors to ensure that they support the latest VMware software version before rolling out a major version upgrade. For information about support for specific vendors, contact Cloud Customer Care.

Preparation

Google recommends making the following preparations before starting an update or upgrade:

  • Check storage capacity: Ensure your vSphere cluster's storage space utilization is lower than 75% to maintain the SLA. If the utilization is higher than 75%, upgrades might take longer than normal or fail completely. If your storage utilization is higher than 70%, add a node to expand the cluster and avoid any potential downtime during upgrades.
  • Change vSAN storage policies with FTT of 0: To maintain the SLA, move VMs that use a vSAN storage policy with Failures to Tolerate (FTT) set to 0 to a vSAN storage policy with FTT set to 1.
  • Remove VM CD mounts: Remove any CDs mounted on your workload VMs.
  • Complete VMware Tools installations: Complete any installations or upgrades of VMware Tools before the scheduled upgrade begins.
  • Remove SCSI bus sharing on VMs: Remove SCSI bus sharing on VMs if you don't want the VMs to be powered off.
  • Remove inaccessible VMs and datastores: Remove orphaned and inaccessible VMs from the vCenter inventory. Remove any inaccessible external datastores.
  • Disable DRS rules: DRS rules that pin a VM to a host prevent a node from entering maintenance mode. You can disable the DRS rules before the upgrade and enable them after the upgrade is complete.
  • Update VMware add-ons and third-party solutions: Verify that VMware add-ons and third-party solutions deployed on your private cloud vCenter are compatible with the VMware versions that the upgrade installs. Examples include tools for backup, monitoring, disaster recovery orchestration, and similar functions. Check with the solution vendor and, if necessary, update ahead of time to ensure compatibility after the upgrade.
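The checklist above can be turned into an automated pre-upgrade check. This is a hypothetical sketch: the `VM` and `ClusterState` classes and `preupgrade_findings()` are illustrative and do not correspond to any vSphere or VMware Engine API. In practice, the inputs would come from your own inventory tooling. Third-party compatibility is not modeled because it requires checking with each vendor.

```python
from dataclasses import dataclass, field

STORAGE_SLA_LIMIT = 0.75    # keep utilization below this to maintain the SLA
STORAGE_ADD_NODE_AT = 0.70  # above this, add a node before upgrading


@dataclass
class VM:
    name: str
    ftt: int = 1
    cd_mounted: bool = False
    tools_install_in_progress: bool = False
    scsi_bus_sharing: bool = False
    accessible: bool = True


@dataclass
class ClusterState:
    storage_used_fraction: float
    vms: list[VM] = field(default_factory=list)
    inaccessible_datastores: list[str] = field(default_factory=list)
    host_pinning_drs_rules: list[str] = field(default_factory=list)


def preupgrade_findings(state: ClusterState) -> list[str]:
    """Return human-readable items to fix before an update or upgrade."""
    findings = []
    used = state.storage_used_fraction
    if used >= STORAGE_SLA_LIMIT:
        findings.append(f"storage at {used:.0%}: upgrade may be slow or fail; add a node")
    elif used > STORAGE_ADD_NODE_AT:
        findings.append(f"storage at {used:.0%}: add a node to avoid downtime")
    for vm in state.vms:
        if vm.ftt == 0:
            findings.append(f"{vm.name}: move from an FTT=0 to an FTT=1 vSAN policy")
        if vm.cd_mounted:
            findings.append(f"{vm.name}: remove CD mount")
        if vm.tools_install_in_progress:
            findings.append(f"{vm.name}: finish VMware Tools install or upgrade")
        if vm.scsi_bus_sharing:
            findings.append(f"{vm.name}: SCSI bus sharing set; VM will be powered off")
        if not vm.accessible:
            findings.append(f"{vm.name}: remove inaccessible VM from inventory")
    for ds in state.inaccessible_datastores:
        findings.append(f"{ds}: remove inaccessible datastore")
    for rule in state.host_pinning_drs_rules:
        findings.append(f"{rule}: disable DRS rule that pins a VM to a host")
    return findings


state = ClusterState(0.72, [VM("web-01", ftt=0, cd_mounted=True)])
for item in preupgrade_findings(state):
    print(item)
```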

VMware Engine takes the following actions during an update or upgrade:

  • DRS rules: All DRS rules are disabled during the upgrade. After the upgrade completes, the rules are reapplied.
  • Serial ports: All serial port mappings are removed during the upgrade.
  • VM CD mounts: All CD mounts are removed during the upgrade.
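The disable-then-reapply pattern for DRS rules maps naturally onto a context manager. This is a hypothetical model, not VMware Engine's implementation: `DrsRule`, `WorkloadVM`, and `upgrade_window()` are illustrative. It assumes that only rules enabled before the upgrade are reapplied, and that serial port mappings and CD mounts are not restored afterward.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class DrsRule:
    name: str
    enabled: bool = True


@dataclass
class WorkloadVM:
    name: str
    serial_port: Optional[str] = None
    cd_image: Optional[str] = None


@contextmanager
def upgrade_window(rules: list[DrsRule], vms: list[WorkloadVM]) -> Iterator[None]:
    # Remember which rules were enabled so that only those are reapplied afterward.
    previously_enabled = [r for r in rules if r.enabled]
    for r in rules:
        r.enabled = False
    # Serial port mappings and CD mounts are removed and, in this model, not restored.
    for vm in vms:
        vm.serial_port = None
        vm.cd_image = None
    try:
        yield
    finally:
        # Reapply the rules even if the upgrade step raises an exception.
        for r in previously_enabled:
            r.enabled = True


rules = [DrsRule("pin-db-to-host-3")]
with upgrade_window(rules, [WorkloadVM("db-01", cd_image="installer.iso")]):
    print(rules[0].enabled)  # False
print(rules[0].enabled)  # True
```

Using `try`/`finally` is a design choice in this sketch: it guarantees the rules come back even when the upgrade step fails partway.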
