Multi-zone overview

Google Distributed Cloud (GDC) air-gapped provides deployment capabilities to ensure high availability and disaster recovery. This functionality is referred to as multi-zone on this page.

Multi-zone lets you run disconnected, mission-critical workloads on GDC by delivering high availability (HA) and disaster recovery (DR) capabilities similar to those of public hyperscale cloud providers. GDC provides managed and infrastructure services that are resilient to local failures. For some services, you must decide the level of resiliency your workload requires. The sections on this page describe examples of services that provide multi-zone capabilities.

GDC multi-zone provides global resource management capabilities to simplify managing resources across GDC zones. You can view all GDC resources and services managed in a universe, with visibility into alerts, health, usage, logging, monitoring, and billing.

Multi-zone universes in GDC require global resource and hardware symmetry. This symmetry means that hardware and organizations must be the same across all zones and cannot be modified independently in a specific zone.

Multi-zone also provides you with building blocks to meet your disaster recovery and business continuity goals. There are three main capabilities that multi-zone provides for you:

  • Continuity of control plane services. In case of a zonal disaster, critical functionality required to recover an organization and its associated services is already present in another zone.

  • Support for critical infrastructure for workloads. For example, asynchronous storage replication across GDC zones.

  • Managed services that offer global and zonal variants of their resources. You can deploy highly available applications, or isolate an application to a single zone.

What is a zone?

Each zone is an independent disaster domain. It is a full implementation of GDC air-gapped: a hardware and software solution that does not require connectivity to Google Cloud at any time. A zone manages infrastructure, services, APIs, and tooling through a local control plane.

A GDC air-gapped zone is composed of four layers:

  • Hardware: The underlying hardware and rack design defined by Google.

  • Infrastructure: Manages the hardware, and provides abstractions that allow the software layers to run without reference to hardware-specific configurations.

  • Service Platform: A framework for building services on Distributed Cloud that provides consistency among managed services and marketplace services.

  • Managed and Marketplace Services: Customer-facing cloud services running on GDC.

A group of connected air-gapped zones is considered a universe. To deploy fault-tolerant applications with high availability that help protect against unexpected failures, you must deploy your applications across multiple zones in a universe.

What is a region?

A region is a grouping of zones in a universe that meet defined latency requirements. A zone with no peers close enough is considered its own region. Zones in a region must be separated by at least 10 km to ensure they are separate disaster domains in many compliance regimes.

Regions can be hundreds of kilometers apart. For this reason, GDC only offers synchronous services within regions; asynchronous services are available within and between regions.

Asynchronous services perform replication in the background, providing low but non-zero recovery point objectives (RPO). Typically, asynchronous services remain available during network partitions.

Zones within a single region are required to meet the latency requirements that enable the delivery of strongly consistent services. Synchronous services perform replication immediately, giving assurance that every write is available in at least two zones. This is the core step needed to achieve zero RPO. Typically, synchronous services have higher latency than non-replicated services, and might become unavailable during network partitions.

What is a universe?

Zones with direct network connectivity, regardless of distance or latency, and a shared management and control plane belong to a universe. You're limited to a maximum of six zones per universe. Each zone is considered a single disaster domain within a universe.

Each universe can consist of multiple zones organized into interconnected regions. For example, a universe might have one region in the US state of Virginia and another in Amsterdam, Netherlands, each with three zones:

  • GDC Region 1 (Virginia)

    • Zone 1 (us-virginia1-a)
    • Zone 2 (us-virginia1-b)
    • Zone 3 (us-virginia1-c)
  • GDC Region 2 (Netherlands)

    • Zone 1 (eu-ams1-a)
    • Zone 2 (eu-ams1-b)
    • Zone 3 (eu-ams1-c)

The following diagram shows an example GDC universe.

A universe consists of zones that are grouped across regions.

A universe can have 1-6 zones, and can have one or two operation centers.

Universes offer the following recovery strategies, regardless of region configuration:

  • For universes with two zones, recovery must be triggered manually.
  • For universes with three or more zones, recovery can be triggered automatically.

Reach out to your operator for more information.

Zonal resources

Zonal resources operate within a single zone. Zonal outages can affect some or all of the resources in that zone. An example of a zonal resource is a virtual machine (VM) instance that resides within a specific zone. For more information on how zonal resources are managed in a GDC universe, see Management API servers.

Global resources

Global resources, such as organizations, are redundantly deployed across the zones and regions of a universe. This redundancy gives them higher availability relative to zonal resources. Global resources are deployed to and managed in the global API server.

For every organization, there are global APIs and zonal APIs.

Disaster domains

A disaster domain represents a collection of buildings that might be impacted at the same time because of their physical proximity. Thus, it is a durability-related construct used to simplify the requirements for zone separation. Usually, a single disaster domain corresponds to a single campus and is often referred to as a failure domain.

In most GDC universes, Google does not own the facilities, but rather works with colocation vendors who have data centers which provide access to robust infrastructure, redundant power, and high-speed connectivity. This approach ensures optimal performance and uptime for applications and services based on Google's strategy and best practices for HA and DR.

Reach out to your operator for more information on the disaster domain specifications for your universe.

Global and zonal APIs

GDC air-gapped offers two levels of management plane APIs to create and manage both global and zonal resources: global APIs and zonal APIs. See the documentation on global and zonal API servers for information on how these API types are managed in a GDC universe.

Both global and zonal APIs are Kubernetes declarative APIs served at different endpoints, and GDC resources are represented as Kubernetes custom resources in the API servers. The global API servers share a single etcd cluster distributed across zones to provide strong consistency with fault tolerance, at the cost of higher latency and reduced QPS (queries per second) compared to the zonal API servers. In every organization, a zonal management API server provides the zonal API for administrators and developers to manage zonal resources, and a global management API server provides the global API to manage global resources.
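
Because both API types are Kubernetes declarative APIs, you can interact with them using standard Kubernetes tooling. The following sketch assumes hypothetical kubeconfig file names and resource kinds for illustration; your operator provides the actual kubeconfig files for each API server.

    # The kubeconfig file names below are placeholders supplied by your operator.
    # List global resources (for example, projects) through the global API server:
    kubectl --kubeconfig global-api-server-kubeconfig get projects

    # List zonal resources (for example, VM instances in a project) through one
    # zone's Management API server:
    kubectl --kubeconfig zone1-management-api-server-kubeconfig get virtualmachines -n my-project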

For more information on APIs in GDC, see the APIs overview.

gdcloud CLI

The gdcloud CLI provides ways to interact with the zonal or global API to manage your resources and their deployment strategy, such as:

  • Sign in to the zonal or global console URL using the CLI
  • Use a zonal CLI flag for specific zone actions

The global URL is configured by default when you initialize the gdcloud CLI. You can update your gdcloud configuration to set zonal URLs and sign in to them to complete zone-specific tasks.

Likewise, the gdcloud CLI offers a --zone flag that you can set for many resource management tasks across command groups. When you're signed in with the global URL configuration, your CLI actions on global resources apply to all zones for which they are in scope.
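
The following sketch shows what this workflow can look like. The URLs are placeholders, and the configuration property and command group names are illustrative rather than exact; check the gdcloud reference for your release for precise syntax.

    # Sign in through the global console URL (the default set by gdcloud init):
    gdcloud auth login

    # Repoint the CLI at one zone's console URL for zone-specific tasks
    # (property name and URL are illustrative):
    gdcloud config set organization-console-url https://console.org-1.us-virginia1-a.example.com
    gdcloud auth login

    # Scope a resource command to a single zone with the --zone flag
    # (command group is illustrative):
    gdcloud compute instances list --zone us-virginia1-a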

For more information on using the gdcloud CLI for zonal and global services, see Manage resources across zones.

GDC console

The GDC console for a given organization is accessible from every zone within the same universe. Therefore, you can use the GDC console to manage all global and zonal resources within an organization.

The following multi-zone features are available from the GDC console:

  • Navigate using a fully qualified domain name (FQDN): You can use the global FQDN to automatically resolve to the most appropriate zonal console endpoint. If the global FQDN fails to resolve in a disruption, you can use the zonal FQDN to navigate to a specific console endpoint in a target zone.

  • Manage zonal resource creation: A zone picker is available when creating a zonal resource, which determines the zone in which the resource is created. Conversely, the zone picker is not visible when you create a global resource.

  • View existing resources across zones: Various resource pages in the GDC console display zonal resources by zone. You can use the zone picker to select the zone from which to view the list of resources.

For more information on managing resources across multiple zones in a GDC universe with the GDC console, see Manage resources across zones.

When a zonal connectivity issue is detected, the GDC console displays a persistent banner notifying you that you might be unable to make changes to multi-zone resources.

The GDC console displays a banner indicating that a zonal connectivity issue is detected.

Resources from all zones are accessible by navigating to the global and zonal GDC console URLs, but there is no aggregate view of resources from across zones.

Resource containers

An organization defines a security boundary that encloses infrastructure resources to be administered together. Each organization in GDC air-gapped provides both a global API and a zonal API to allow for the creation of both global and zonal resources within the organization. When creating a global organization, the operator is responsible for deploying zones and configuring zonal settings, such as the amount of storage and the number of physical servers available to user workloads.

Contact your operator for more information on the specific zone setup for your organization.

A project provides a logical grouping of service resources within an organization and provides a lifecycle and policy boundary for managing resources. By default, all projects are global and span the zones configured in your universe.
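
As an illustration, a project can be created declaratively through the global API. In the following sketch, the apiVersion, namespace, and kubeconfig name are assumptions; consult the projects documentation for the supported schema.

    # Create a global project through the global API server (schema is illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: resourcemanager.gdc.goog/v1
    kind: Project
    metadata:
      name: my-project
      namespace: platform
    EOF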

Although all service resources must be created in a project, not all services are global. For services that are only supported on a zonal level, you must deploy and manage them within the zones you choose. Refer to the appropriate documentation of a resource type for more information.

IAM

The following Identity and Access Management (IAM) services must be configured as global resources:

  • Identity providers (IdP) for authentication
  • Role-based access control (RBAC)
  • Service identities

Each IAM configuration spans all zones in a universe.

Authentication

You must connect an IdP to your organization using the global IdentityProviderConfig resource. This resource ensures that you use the same IdP to connect to your organization for all zones in the universe.
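
A minimal sketch of such a configuration follows. Apart from the resource kind, the apiVersion and spec fields are assumptions; follow Connect to an identity provider for the supported schema.

    # Configure a global identity provider (fields are illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: iam.gdc.goog/v1
    kind: IdentityProviderConfig
    metadata:
      name: example-idp
      namespace: org-1
    spec:
      oidc:
        issuerURI: https://idp.example.com/oidc
        clientID: example-client-id
    EOF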

For more information, see Connect to an identity provider.

Access

Every user or group must be assigned a global IAMRoleBinding resource to get access to the global API server, zonal Management API servers, and Kubernetes clusters consistently in each zone of the organization. The binding propagates as follows (a hedged sketch follows this list):

  • Global API server access: IAMRoleBinding is propagated as a ClusterRoleBinding or RoleBinding to a predefined ClusterRole in the global API server.

  • Zonal Management API server access: IAMRoleBinding is propagated as a ClusterRoleBinding or RoleBinding in the zonal Management API server.

  • Kubernetes cluster access: IAMRoleBinding is propagated as a ProjectRole and ProjectRoleBinding, which in turn propagate as a Kubernetes Role and RoleBinding to the Kubernetes namespaces, in the Management API server and Kubernetes clusters, that correspond to the project containing the ProjectRole and ProjectRoleBinding.
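
A minimal sketch of a global IAMRoleBinding follows; the apiVersion, role name, and spec layout are assumptions, so see Grant and revoke access for the supported schema.

    # Grant one user a predefined role consistently across all zones
    # (schema is illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: iam.gdc.goog/v1
    kind: IAMRoleBinding
    metadata:
      name: dev-project-viewer
      namespace: my-project
    spec:
      role: project-viewer
      subject:
        user: developer@example.com
    EOF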

For more information, see Grant and revoke access.

Service identity

Service accounts are principals that workloads and services use to programmatically consume resources and access microservices securely. They are a special kind of identity used by an application or workload rather than by a person. Similar to a user account, service accounts can be granted permissions and roles, but they can't sign in like a human user. The service identity feature is included in the global ProjectServiceAccount resource.
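
For illustration, the following sketch creates a service account in a project; the apiVersion and field layout are assumptions, so see Authenticate with service accounts for the supported schema.

    # Create a global service account for a workload (schema is illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: resourcemanager.gdc.goog/v1
    kind: ProjectServiceAccount
    metadata:
      name: app-workload-sa
      namespace: my-project
    EOF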

For more information, see Authenticate with service accounts.

Networking

You can configure the following networking services for your GDC zones:

  • Anycast services
  • Load balancing
  • Project network policies
  • DNS

Configure your global and zonal networking services to manage inter-zone and intra-zone networking traffic in your GDC universe.

Anycast services

Multi-zone provides Anycast networking services to serve your managed zones from multiple locations around the world for high availability. Likewise, Data Center Interconnection (DCI) options are implemented as a full mesh to interconnect multiple GDC air-gapped zones across diverse geographic locations. This approach enables GDC to deliver multi-zone disaster protection with site diversity while accommodating the requirement for complete disconnection from any Google infrastructure.

Anycast services are represented by unique /32 IPv4 prefixes, which are advertised using Border Gateway Protocol (BGP) to customer facilities, ensuring reachability from any connected location. While each Anycast service is accessible from all zones within the GDC air-gapped network, the actual endpoint to which traffic is directed depends on factors such as proximity and zone preference based on your custom routing policy.

Traffic delivery is optimized by routing it to the nearest available service instance, always within the same zone as the customer connection. This not only reduces latency but also enhances the overall performance and responsiveness of the service. For example, if an Anycast service is deployed across zone 1, zone 2, and zone 3, a customer request originating from zone 2 would typically be routed to the service instance within zone 2, as it is the closest and, therefore, most efficient option.

While the Anycast range is globally accessible, it is only provided to customers from the specific zones where the service is actively deployed. This access configuration means a service deployed in zone 1 would only be available to customers connected to zone 1 and not to those connected to other zones.

Furthermore, GDC implements a zone preference system in which each zone is assigned a numerical value during creation, irrespective of its zone name, that determines how strongly it attracts customer traffic. For example, if an Anycast service is deployed to zones with numerical values 1, 2, and 3, customer traffic is generally directed to the zone with the lowest value as the preferred location. This preference system provides a degree of predictability and control over traffic patterns, but it also includes built-in failover mechanisms. In the event of a failure or outage affecting the preferred zone, the system automatically shifts traffic to another zone, ensuring uninterrupted service availability.

Contact your operator for more information.

Load balancing

GDC provides an L4 passthrough load balancer for pod and VM workloads. This load balancer provides the following configurations:

  • Either TCP or UDP protocol.
  • No proxy between the workload and client.
  • Dedicated load balancing for particular zones, or globally across all zones in the universe.
  • Internal network traffic within your organization, or external network traffic between organizations.

The following diagram illustrates the components of an external passthrough L4 load balancer in a GDC universe:

An external passthrough L4 load balancer in a GDC universe.

The load balancer can be fine-tuned to function within a single zone or globally for all zones.
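
As a hedged illustration of these options, the following sketch declares a global external L4 forwarding rule; the kind, apiVersion, and fields are assumptions based on this page's description, so see Manage load balancers for the supported resources.

    # A global, external TCP passthrough forwarding rule (schema is illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: networking.gdc.goog/v1
    kind: ForwardingRuleExternal
    metadata:
      name: web-forwarding-rule
      namespace: my-project
    spec:
      protocol: TCP            # TCP or UDP; traffic passes through without a proxy
      ports:
      - 443
      backendServiceRef:
        name: web-backend-service
    EOF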

For more information on load balancing in GDC, see Manage load balancers.

Project network policies

Project network policies define either data transfer in or data transfer out rules for a project. Because projects are a global resource, you must define a project's network policies globally as well, to allow for cross-zone networking traffic for the services and workloads within a project.
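
A short sketch of a global project network policy follows; the spec layout is an assumption, so see Configure project network policies for the supported schema. This example would permit data transfer in from another project's workloads across all zones.

    # Allow inbound traffic from another project's workloads (schema is illustrative):
    cat <<EOF | kubectl --kubeconfig global-api-server-kubeconfig apply -f -
    apiVersion: networking.gdc.goog/v1
    kind: ProjectNetworkPolicy
    metadata:
      name: allow-ingress-from-other-project
      namespace: my-project
    spec:
      policyType: Ingress
      subject:
        subjectType: UserWorkload
      ingress:
      - from:
        - projects:
            matchNames:
            - other-project
    EOF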

For more information, see Configure project network policies.

DNS

Domain Name System (DNS) services are global and span across multiple zones. If a DNS service instance becomes inaccessible in a zone, clients are seamlessly served by another DNS service instance in another zone.

Each organization runs three global authoritative DNS servers, with deployments in every zone:

  • Global infrastructure internal server: the authoritative server that resolves the DNS requests within the infrastructure Virtual Private Cloud (VPC) network. This server only manages infrastructure workloads. User workloads don't interact with this component. All the global infrastructure internal deployments for an organization across all zones are accessible with an Anycast IP address.

  • Global customer internal server: the authoritative server that resolves DNS requests within the customer Virtual Private Cloud (VPC) network. This server only manages user workloads, such as a pod in a Kubernetes cluster or a virtual machine (VM), and resolves all DNS requests originating from those user workloads. All the global customer internal deployments for an organization across all zones are accessible with an Anycast IP address. Since VPCs span across zones, a request for resolution of a global fully qualified domain name (FQDN) from a zone can land on any of the healthy zones.

  • Global customer external server: the authoritative server that resolves the DNS requests originating from the customer's network. All the global customer external deployments for an organization across all zones are accessible with an Anycast IP address.

You can connect to GDC with a dedicated external network or a shared external network. These network types determine how GDC resolves your DNS requests.

A dedicated external network connects to the global customer external DNS server directly, which resolves the request. Alternatively, a shared external network connects to the root of your DNS hierarchy. This root server returns the name server (NS) record for the appropriate DNS zone, which points to the global customer external DNS server. Then, your DNS resolver recursively resolves the request.

GDC provides DNS resolution for internal and external traffic both globally and within a single zone.

Requests that originate from your external network are routed through your DNS resolver. Internal DNS requests, in contrast, originate from your workloads within a GDC universe.

DNS requests have the following FQDN format:

  • Global DNS requests: SERVICE_NAME.ORG_NAME.SUFFIX, such as service-1.org-1.google.com.
  • Zonal DNS requests: SERVICE_NAME.ORG_NAME.ZONE_NAME.SUFFIX, such as service-1.org-1.zone-1.google.com.
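
For example, you can verify resolution from a client with a standard lookup tool, reusing the example names above:

    # Resolve the global name; any healthy zone can answer:
    nslookup service-1.org-1.google.com

    # Resolve the zonal name to target a specific zone's endpoint:
    nslookup service-1.org-1.zone-1.google.com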

For more information on how to configure networking within a GDC universe, see Networking overview.

Storage

Multi-zone universes offer storage replication for resources such as file shares and object stores in synchronous mode for disaster recovery scenarios.

Synchronous replication maintains strict consistency between data in two zones, ensuring that all writes are duplicated immediately and providing zero RPO in disaster scenarios.

Typically, when a zone participating in synchronous replication becomes disconnected from its pair, both zones are unable to use the storage volume until the zones are rejoined. This restriction ensures that the data at each site does not diverge independently, which would make it impossible to reconstruct the shared data.

  • Synchronous File Share: The network file system (NFS) is a protocol that allows a server to share its file system with remote clients. Synchronous file shares appear to be standard NFS volumes, but are made fault-tolerant and highly available by replicating all data immediately to another zone. During zone partitions, the volume becomes unavailable until connectivity is restored. Manual failover is available, which allows an administrator to choose a single zone for continued operation.

  • Synchronous Bucket Replication: Bucket replication creates a bi-directional replication relationship between buckets in two zones. The data is always written to both zones immediately. Synchronous buckets remain available for reads during zone partitions, but writes from either zone are blocked until the partition recovers.

Asynchronous storage replication is not available in GDC multi-zone universes.

Latency requirements

To help you plan GDC zone locations, latency requirements are based on the requirements Google Cloud uses to ensure services can operate effectively. This approach lets you confidently choose GDC air-gapped locations knowing whether those zones will be in the same region and, therefore, support synchronous services.

Multi-zone latency requirements.

The maximum supported latency is less than 1 ms round-trip time (RTT) at the physical layer between any two zones in a region. Because calculating latency at the physical layer requires specialized equipment that is not available in most cases, you can approximate it by measuring the fiber length between two zones.
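
As a rough check, light propagates through optical fiber at about 200,000 km/s, which is roughly 5 µs per kilometer one way, or 10 µs per kilometer round trip. At that rate, 100 km of fiber corresponds to approximately 1 ms RTT, which is why fiber length serves as a practical proxy for the physical-layer latency requirement.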

For zones in a region, a fiber length of 50 km on the primary path and 100 km on the secondary path supports regional services. In a full mesh network, this requirement means each fiber length can be no more than 100 km, whereas in a ring network, each fiber length can be no more than 50 km.

Reach out to your operator for more information on your specific latency requirements.