This document explains zone virtualization, which is the method Google uses to map public zones to clusters of internal physical hardware within our data centers. Zone virtualization enables us to seamlessly expand our zones, upgrade hardware, and decommission physical infrastructure without customer-facing impact. Read about this topic if your apps are distributed across multiple projects or if you want to learn how Google spreads its zones across physical infrastructure.
Google Cloud resources are hosted in multiple regions worldwide. Each region is composed of three or four zones. Zones are logical groups of resources, designed to avoid correlated failures. Placing your resources in multiple zones within a region reduces the risk of correlated physical and software infrastructure failures impacting your apps. Placing your resources in different regions provides an even higher degree of failure independence.
For a list of Google Cloud geographical locations, see Regions and zones. For additional information about building resilient apps, see the Google Cloud solutions series on disaster recovery planning.
All Google Cloud hardware is organized into clusters. A cluster represents a set of compute, network, and storage resources supported by building, power, and cooling infrastructure. Infrastructure components typically support a single cluster, ensuring that clusters share few dependencies.
Figure 1: There are three zones in the asia-east1 region. Each zone has its own cluster with individual resources.
However, components with demonstrably high reliability and downstream redundancy can be shared between clusters. For example, a utility grid substation is typically shared by multiple clusters because it's extremely reliable and each cluster uses redundant power systems. Google Cloud designs its physical infrastructure to support the Service Level Agreements (SLAs) and Service Level Objectives (SLOs) of Google Cloud services.
When a project uses a region for the first time, Google Cloud selects a single unique cluster for each zone in the region to host that project's zonal resources. This selection is called a zone-to-cluster mapping. Default zone-to-cluster mappings are selected on a per-project basis so that every customer experiences the same capabilities and performance. Within a project, the mapping between a logical zone and a physical cluster is consistent; however, another project might have a completely different zone-to-cluster mapping based on that project's zonal resources. A project never has two zones mapped to the same physical cluster.
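The three properties in the paragraph above (a mapping that is stable within a project, possibly different across projects, and never placing two zones on one cluster) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Google's actual selection algorithm: the cluster names, the `assign_zone_clusters` function, and the hash-seeded random choice are all hypothetical.

```python
import hashlib
import random

# Hypothetical physical clusters in a region, and the region's logical zones.
REGION_CLUSTERS = ["cluster-p", "cluster-q", "cluster-r", "cluster-s", "cluster-t"]
ZONES = ["asia-east1-a", "asia-east1-b", "asia-east1-c"]


def assign_zone_clusters(project_id: str) -> dict[str, str]:
    """Return a hypothetical zone-to-cluster mapping for one project.

    Seeding the random choice from a hash of the project ID makes the
    mapping deterministic for a given project, while different projects
    usually (but not always) receive different mappings.
    """
    # Python's built-in hash() is randomized per process, so derive a
    # stable seed from SHA-256 instead.
    seed = int.from_bytes(hashlib.sha256(project_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Sampling without replacement guarantees no two zones share a cluster.
    chosen = rng.sample(REGION_CLUSTERS, k=len(ZONES))
    return dict(zip(ZONES, chosen))


fizz = assign_zone_clusters("fizz")
assert fizz == assign_zone_clusters("fizz")      # stable within a project
assert len(set(fizz.values())) == len(ZONES)     # one distinct cluster per zone
```

Deriving the mapping from the project ID, rather than storing it, is only a convenience for the sketch; a real system would more likely persist each project's mapping once it is chosen.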
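You can align the zone-to-cluster mappings between projects by using Virtual Private Cloud (VPC) networks: Google Cloud attempts to assign the same zone-to-cluster mapping to all projects that share a VPC network. This alignment might be desirable if you want application component failures to be predictable and atomic, for example, so that a single cluster failure affects the same zone in every project that hosts part of your app.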
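One way to picture this alignment is to derive the mapping from the shared network instead of from the individual project. The sketch below is hypothetical: it assumes each project is attached to at most one shared network, and the names `mapping_key` and `zone_cluster_map` are illustrative only, not part of any Google Cloud API.

```python
import hashlib
import random
from typing import Optional

# Hypothetical clusters and zones for one region.
CLUSTERS = ["cluster-p", "cluster-q", "cluster-r", "cluster-s"]
ZONES = ["asia-east1-a", "asia-east1-b", "asia-east1-c"]


def mapping_key(project_id: str, shared_network: Optional[str]) -> str:
    # Projects on the same shared network use the network as the key, so
    # they get the same mapping; other projects fall back to their own ID.
    return f"net:{shared_network}" if shared_network else f"project:{project_id}"


def zone_cluster_map(key: str) -> dict[str, str]:
    # Deterministic assignment derived from the key; sampling without
    # replacement keeps each zone on a distinct cluster.
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return dict(zip(ZONES, random.Random(seed).sample(CLUSTERS, k=len(ZONES))))


frontend = zone_cluster_map(mapping_key("frontend-prj", "shared-vpc-1"))
backend = zone_cluster_map(mapping_key("backend-prj", "shared-vpc-1"))
# Both projects see the same physical cluster behind each logical zone,
# so a single cluster failure hits the same zone in both projects.
assert frontend == backend
```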
As regions expand, each zone is supported by multiple clusters. We aim to group clusters with shared infrastructure, such as a building or cooling infrastructure, into logical zones so that shared infrastructure failures affect only one zone within a region.
Figure 2: Two of the three zones in asia-east1 have expanded and now have multiple clusters.
Customer workloads are kept in as few clusters as possible. In most cases, your zonal workload is contained in a single cluster. However, a zone-to-cluster mapping might include additional clusters when the primary cluster for the mapping lacks the needed capacity or specialized hardware.
Figure 3: A diagram of the zone-to-cluster mapping for two projects:
- Project Fizz has two clusters mapped to asia-east1-a because only Cluster z supports GPU workloads and only Cluster y supports TPU workloads.
- Project Fizz and Project Buzz have different clusters mapped to
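The "as few clusters as possible" goal resembles a small set-cover problem: choose the smallest group of clusters in a zone that together support every kind of hardware a workload needs. The sketch below brute-forces that choice for a scenario like Project Fizz's in Figure 3. Only the GPU-on-Cluster-z and TPU-on-Cluster-y facts come from the figure; `cluster-x`, the capability sets, and the `fewest_clusters` function are assumptions for illustration, not a description of Google's placement logic.

```python
from itertools import combinations

# Hypothetical capabilities of the clusters behind one logical zone.
ZONE_CLUSTERS = {
    "cluster-x": {"cpu"},
    "cluster-y": {"cpu", "tpu"},
    "cluster-z": {"cpu", "gpu"},
}


def fewest_clusters(required: set[str]) -> tuple[str, ...]:
    """Return the smallest set of clusters that covers every requirement."""
    names = sorted(ZONE_CLUSTERS)
    # Try all combinations of size 1, then size 2, and so on; the first
    # combination that covers the requirements is therefore minimal.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            supported = set().union(*(ZONE_CLUSTERS[c] for c in combo))
            if required <= supported:
                return combo
    raise ValueError(f"no cluster combination supports {required}")


print(fewest_clusters({"cpu"}))                # ('cluster-x',)
print(fewest_clusters({"cpu", "gpu", "tpu"}))  # ('cluster-y', 'cluster-z')
```

Exhaustive search is exponential in the number of clusters, which is acceptable here only because a zone has few clusters; with many candidates, a greedy approximation would be the usual substitute.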
Although zone-to-cluster mappings seldom change, changes do occur as capacity needs and underlying hardware offerings evolve. For example, clusters are added to a zone to increase capacity and removed from a zone when they're decommissioned. During any maintenance event, Google attempts to limit your downtime by using live migration when possible.
If a cluster has an outage, the logical zone associated with that cluster is reported as having an outage on the Google Cloud Status Dashboard. However, because a zone might be composed of multiple clusters, not all customer resources in that zone are necessarily affected, and some customers might remain unaffected by a single cluster outage. We strongly encourage adopting multi-zonal architectures to minimize outage impact.
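The gap between "a zone is reported as affected" and "a given resource is affected" can be shown with a toy inventory. In this hypothetical sketch, each resource records both its zone and the physical cluster it runs on (customers can't see the cluster in practice); the resource names, clusters, and helper functions are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    zone: str
    cluster: str  # hypothetical: physical clusters aren't visible to customers


# One replica of a web service in each placement; asia-east1-a spans two clusters.
INVENTORY = [
    Resource("web-1", "asia-east1-a", "cluster-x"),
    Resource("web-2", "asia-east1-a", "cluster-z"),
    Resource("web-3", "asia-east1-b", "cluster-q"),
]


def affected(resources: list[Resource], failed: str) -> list[str]:
    """Names of resources that go down when one physical cluster fails."""
    return [r.name for r in resources if r.cluster == failed]


def reported_zones(resources: list[Resource], failed: str) -> set[str]:
    """Logical zones that would be reported as having an outage."""
    return {r.zone for r in resources if r.cluster == failed}


def service_up(resources: list[Resource], failed: str) -> bool:
    """A replicated service survives if any replica runs on another cluster."""
    return any(r.cluster != failed for r in resources)


print(reported_zones(INVENTORY, "cluster-x"))    # {'asia-east1-a'}
print(affected(INVENTORY, "cluster-x"))          # ['web-1']
print(service_up(INVENTORY, "cluster-x"))        # True
print(service_up(INVENTORY[:1], "cluster-x"))    # False
```

The last line shows the single-placement case: a deployment confined to one cluster has no surviving replica, which is the risk that a multi-zonal architecture is meant to remove.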
Shared networks and virtualized zones
Virtual Private Cloud (VPC) networks are virtualized networks that provide connectivity between resources within a project. Multiple projects can share a VPC network to enable cross-project connectivity, and an organization can peer a shared VPC network to enable cross-organizational connectivity. Our zone virtualization mapping algorithm attempts to assign the same zone-to-cluster mapping to all projects that share a VPC network, even when the projects are in different Google Cloud organizations. As network complexity grows with the number of projects and VPC networks, maintaining consistent zone mappings becomes more challenging, so consistency can't be guaranteed.
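The following hypothetical sketch shows one way consistency can become impossible. Suppose Projects B and C already have different mappings, and Project A is connected to one network it shares with B and another it shares with C. For any zone where B and C use different clusters, no single mapping for A can match both. The project, network, and cluster names and the `conflicting_zones` helper are invented for illustration; this is a simple consistency check, not Google's mapping algorithm.

```python
# Existing, hypothetical zone-to-cluster mappings for two projects.
EXISTING = {
    "project-b": {"asia-east1-a": "cluster-1", "asia-east1-b": "cluster-2"},
    "project-c": {"asia-east1-a": "cluster-3", "asia-east1-b": "cluster-2"},
}

# Hypothetical networks and the projects that share each one.
NETWORK_MEMBERS = {
    "network-1": {"project-a", "project-b"},
    "network-2": {"project-a", "project-c"},
}


def conflicting_zones(project: str) -> dict[str, list[str]]:
    """Zones where the project's network neighbors use different clusters.

    Any zone returned here cannot be aligned with every neighbor at once.
    """
    # Collect every other project that shares at least one network.
    neighbors: set[str] = set()
    for members in NETWORK_MEMBERS.values():
        if project in members:
            neighbors |= members - {project}
    # For each zone, gather the clusters the neighbors already use.
    wanted: dict[str, set[str]] = {}
    for neighbor in neighbors:
        for zone, cluster in EXISTING.get(neighbor, {}).items():
            wanted.setdefault(zone, set()).add(cluster)
    return {z: sorted(c) for z, c in sorted(wanted.items()) if len(c) > 1}


print(conflicting_zones("project-a"))
# {'asia-east1-a': ['cluster-1', 'cluster-3']}
```

In this example Project A can still match both neighbors in asia-east1-b, so the mappings can be partially aligned even when full alignment is impossible.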