Zone virtualization


This document explains zone virtualization, which is the method Google uses to map public zones to clusters of internal physical hardware within our data centers. Zone virtualization enables us to seamlessly expand our zones, upgrade hardware, and decommission physical infrastructure without customer-facing impact. Read this document if your applications are distributed across multiple projects and you want to learn how Google spreads its zones across physical infrastructure.

Google Cloud resources are hosted in multiple regions worldwide. Regions are independent geographic areas that consist of zones. Zones and regions are logical abstractions of underlying physical resources. A region typically consists of three or more zones housed in three or more physical data centers. The Mexico, Osaka, and Montreal regions are exceptions: each has three zones housed in only one or two physical data centers, and each is in the process of expanding to at least three physical data centers. When you architect your solutions in Google Cloud, consider the guidance in Cloud locations, Google Cloud Platform SLAs, and the appropriate Google Cloud product documentation.

Placing your resources in multiple zones within a region reduces the risk of correlated physical and software infrastructure failures affecting your applications. Placing your resources in different regions provides an even higher degree of failure independence.
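
For example, you can enumerate a region's zones programmatically and spread replicas across them. The following is a minimal sketch using the google-cloud-compute Python client; the project ID, region, and replica count are placeholder values.

```python
from google.cloud import compute_v1

PROJECT_ID = "my-project"  # placeholder: your project ID
REGION = "asia-east1"      # placeholder: any Google Cloud region

def zones_in_region(project_id: str, region: str) -> list[str]:
    """Return the names of all zones in the given region that are UP."""
    client = compute_v1.ZonesClient()
    return [
        zone.name
        for zone in client.list(project=project_id)
        # zone.region is a URL that ends with ".../regions/<region>".
        if zone.region.endswith(f"/regions/{region}") and zone.status == "UP"
    ]

# Spread six replicas round-robin across the region's zones so that a
# single-zone failure affects only a fraction of them.
zones = zones_in_region(PROJECT_ID, REGION)
for i in range(6):
    print(f"replica-{i} -> {zones[i % len(zones)]}")
```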

For a list of Google Cloud geographical locations, see Regions and zones. For additional information about building resilient applications, see the Google Cloud solutions series on disaster recovery planning.

Clusters

All Google Cloud hardware is organized into clusters. A cluster represents a set of compute, network, and storage resources supported by building, power, and cooling infrastructure. Infrastructure components typically support a single cluster, ensuring that clusters share few dependencies.


Figure 1: There are three zones in the asia-east1 region. Each zone has its own cluster with individual resources.

However, components with demonstrated high reliability and downstream redundancy can be shared between clusters. For example, multiple clusters typically share a utility grid substation because substations are extremely reliable and clusters use redundant power systems. Google Cloud designs its physical infrastructure to support the Service Level Agreements (SLAs) and Service Level Objectives (SLOs) of Google Cloud services.

Zone-to-cluster mapping

When a project uses a region for the first time, Google Cloud selects for each zone in the region a unique cluster that becomes the default cluster for that project's zonal resources. However, hardware constraints might result in additional clusters being used for a zone. This selection is called a zone-to-cluster mapping. Default zone-to-cluster mappings are selected on a per-project basis so that every customer experiences the same capabilities and performance. Within a project, the mapping between a logical zone and a physical cluster is consistent; however, another project might have a completely different zone-to-cluster mapping based on its zonal resources. A project never has two zones mapped to the same physical cluster.
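
The following conceptual sketch illustrates these invariants. It is not a Google Cloud API, and the cluster names are invented: zone-to-cluster mappings are internal to Google and are not exposed to customers.

```python
# Illustrative only: cluster names are invented, and Google Cloud does not
# expose zone-to-cluster mappings to customers.
mappings = {
    "project-fizz": {"asia-east1-a": "cluster-w", "asia-east1-b": "cluster-x"},
    "project-buzz": {"asia-east1-a": "cluster-y", "asia-east1-b": "cluster-z"},
}

# Within a project, a logical zone always resolves to the same cluster.
assert mappings["project-fizz"]["asia-east1-a"] == "cluster-w"

# Another project can map the same logical zone to a different cluster.
assert (mappings["project-fizz"]["asia-east1-a"]
        != mappings["project-buzz"]["asia-east1-a"])

# A project never has two zones mapped to the same physical cluster.
for project, zone_map in mappings.items():
    clusters = list(zone_map.values())
    assert len(clusters) == len(set(clusters)), f"{project} reuses a cluster"
```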

You can align the zone-to-cluster mappings between projects by using Virtual Private Cloud (VPC) networks. Google Cloud attempts to assign the same zone-to-cluster map to all projects that share a VPC network. Consistent zone-to-cluster maps across your projects can be desirable because they make application component failures predictable and atomic: components in the same zone in different projects share the same physical clusters, so they fail together rather than partially.

Virtualized zones

As regions expand, each zone is supported by multiple clusters. We aim to group clusters with shared infrastructure, such as a building or cooling infrastructure, into logical zones so that shared infrastructure failures affect only one zone within a region.


Figure 2: Two of the three zones in asia-east1 have expanded and now have two clusters.

Customer workloads are maintained in as few clusters as possible. Usually, your zonal workload is contained in a single cluster. However, a zone-to-cluster mapping might include additional clusters in cases where additional capacity or specialized hardware is not available in the primary cluster for the map, as Figure 3 and the sketch that follows it show.


Figure 3: A diagram of the zone-to-cluster mapping for two projects:

  • Project Fizz has two clusters mapped to asia-east1-a because only Cluster z supports GPU workloads and only Cluster y supports TPU workloads.
  • Project Fizz and Project Buzz have different clusters mapped to asia-east1-b.
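
You can't inspect clusters directly, but before placing a specialized workload you can check which zones offer the hardware it needs. The following is a minimal sketch using the google-cloud-compute Python client; the project ID and zone list are placeholders.

```python
from google.cloud import compute_v1

PROJECT_ID = "my-project"  # placeholder: your project ID
ZONES = ["asia-east1-a", "asia-east1-b", "asia-east1-c"]

client = compute_v1.AcceleratorTypesClient()
for zone in ZONES:
    # List the accelerator (GPU) types this project can use in each zone;
    # zone virtualization resolves which cluster backs the hardware.
    names = [acc.name for acc in client.list(project=PROJECT_ID, zone=zone)]
    print(f"{zone}: {', '.join(names) or 'no accelerator types offered'}")
```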

Although zone-to-cluster mappings seldom change, changes do occur as capacity needs and underlying hardware offerings evolve. For example, clusters are added to a zone to increase capacity, and are removed from a zone when they're decommissioned. During any maintenance event, Google attempts to limit your downtime by using live migration when possible.

In the event of a cluster outage, the logical zone associated with that cluster is reported as having an outage on the Google Cloud Status Dashboard. However, because a zone might be composed of multiple clusters, not all customer resources in that zone are necessarily affected, and some customers might remain unaffected by a single-cluster outage. We strongly encourage the adoption of multi-zonal architectures to minimize outage impact.
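
One way to adopt a multi-zonal architecture is a regional managed instance group, which distributes identical VM instances across a region's zones. The following is a minimal sketch using the google-cloud-compute Python client; the project ID, instance template, group name, and zone list are placeholders, and the template is assumed to already exist.

```python
from google.cloud import compute_v1

PROJECT_ID = "my-project"  # placeholder: your project ID
REGION = "asia-east1"      # placeholder region
TEMPLATE = f"projects/{PROJECT_ID}/global/instanceTemplates/web-template"

# Distribute six instances across three zones so that a single-cluster
# (and therefore single-zone) outage affects only a subset of them.
mig = compute_v1.InstanceGroupManager(
    name="web-mig",
    instance_template=TEMPLATE,  # assumed to exist
    target_size=6,
    distribution_policy=compute_v1.DistributionPolicy(
        zones=[
            compute_v1.DistributionPolicyZoneConfiguration(zone=f"zones/{z}")
            for z in ("asia-east1-a", "asia-east1-b", "asia-east1-c")
        ]
    ),
)

client = compute_v1.RegionInstanceGroupManagersClient()
client.insert(
    project=PROJECT_ID,
    region=REGION,
    instance_group_manager_resource=mig,
).result()  # wait for the operation to complete
```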

Shared networks and virtualized zones

Virtual Private Cloud (VPC) networks are virtualized networks that provide connectivity between resources within a project. Multiple projects can share a VPC network to enable cross-project connectivity, and an organization can peer a shared VPC network to enable cross-organizational connectivity. Our zone virtualization mapping algorithm attempts to assign the same zone-to-cluster map to all projects that share a VPC network or that extend their VPC network through VPC Network Peering. This is true even when the projects are in different Google Cloud organizations. As network complexity grows with the number of projects and VPC networks, maintaining a consistent zone mapping becomes more challenging and therefore can't be guaranteed.
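
Conceptually, you can picture this as grouping projects by network connectivity and choosing one mapping per group. The following sketch is illustrative only, since Google's actual algorithm is internal; it models shared-VPC and peering relationships with a simple union-find, and all project names are hypothetical.

```python
# Illustrative only: Google's actual mapping algorithm is internal.
# Projects connected by a shared VPC network or VPC peering are grouped,
# and every project in a group would receive the same zone-to-cluster map.
parent: dict[str, str] = {}

def find(project: str) -> str:
    """Return the representative of the project's connectivity group."""
    parent.setdefault(project, project)
    while parent[project] != project:
        parent[project] = parent[parent[project]]  # path compression
        project = parent[project]
    return project

def connect(a: str, b: str) -> None:
    """Record that two projects share a VPC network or are peered."""
    parent[find(a)] = find(b)

# Hypothetical topology: fizz and buzz share a VPC; buzz peers with qux.
connect("project-fizz", "project-buzz")
connect("project-buzz", "project-qux")

groups: dict[str, list[str]] = {}
for p in ("project-fizz", "project-buzz", "project-qux", "project-solo"):
    groups.setdefault(find(p), []).append(p)

for members in groups.values():
    print(f"{members} share one zone-to-cluster mapping")
```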

What's next