Google Cloud products are served from specific regional failure domains and are backed by Service Level Agreements (SLAs), so you can design your application architecture around the structure of Google Cloud.
Google Cloud infrastructure services are available in locations across North America, South America, Europe, Asia, and Australia. These locations are divided into regions and zones. You can choose where to locate your applications to meet your latency, availability, and durability requirements.
Regions and zones
Regions are independent geographic areas that consist of zones.
A zone is a deployment area for Google Cloud resources within a region. Zones should be considered a single failure domain within a region. To deploy fault-tolerant applications with high availability and help protect against unexpected failures, deploy your applications across multiple zones in a region.
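As a rough illustration of the multi-zone advice above, the following Python sketch spreads replicas round-robin across the zones of one region and checks what survives a single-zone failure. The zone names and the helpers `spread_across_zones` and `surviving_replicas` are hypothetical; the sketch does not call any Google Cloud API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(replica_count: int, zones: list[str]) -> dict[str, str]:
    """Assign replicas to zones round-robin so that no single zone holds them all."""
    if len(zones) < 2:
        raise ValueError("need at least two zones to tolerate a zonal failure")
    zone_cycle = cycle(zones)
    return {f"replica-{i}": next(zone_cycle) for i in range(replica_count)}


def surviving_replicas(placement: dict[str, str], failed_zone: str) -> list[str]:
    """Return the replicas that are still running after one zone fails."""
    return [name for name, zone in placement.items() if zone != failed_zone]


# Hypothetical zone names, used only for illustration.
zones = ["example-region1-a", "example-region1-b", "example-region1-c"]
placement = spread_across_zones(6, zones)

print(Counter(placement.values()))  # two replicas per zone
print(surviving_replicas(placement, "example-region1-a"))
```

With six replicas over three zones, losing any one zone still leaves four replicas serving, which is the property the single-failure-domain guidance is aiming for.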
To protect against the loss of an entire region due to natural disaster, have a disaster recovery plan and know how to bring up your application in the unlikely event that your primary region is lost. See application deployment considerations for more information.
For more information about the specific resources available within each location option, see the Cloud locations page.
Google Cloud's services and resources can be zonal, regional, or managed by Google across multiple regions. For more information about what these options mean for your data, see geographic management of data.
Zonal resources operate within a single zone. Zonal outages can affect some or all of the resources in that zone. An example of a zonal resource is a Compute Engine virtual machine (VM) instance that resides within a specific zone.
Regional resources are deployed redundantly across multiple zones within a region; examples include App Engine applications and regional managed instance groups. This redundancy gives them higher availability than zonal resources.
Multiple Google Cloud services are managed by Google to be redundant and distributed within and across regions. These services optimize availability, performance, and resource efficiency. As a result, they require a trade-off in either latency or the consistency model. These trade-offs are documented on a product-specific basis.
The following services have one or more multiregional locations in addition to any regional locations:
- Cloud DLP
- Cloud Healthcare API
- Cloud Key Management Service
- Cloud Storage
- Database Migration Service
These multiregional services are designed to be able to function following the loss of a single region.
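The difference between zonal, regional, and multiregional resources can be made concrete with a small model. The sketch below is a simplification under stated assumptions: zone names follow a hypothetical `<region>-<letter>` pattern, a resource is treated as available while at least one of its zones is healthy, and the `Resource` class and helper functions are invented for illustration rather than taken from any Google Cloud library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    zones: frozenset  # zones the resource is deployed in


def region_of(zone: str) -> str:
    # Assumption: zone names look like "<region>-<letter>".
    return zone.rsplit("-", 1)[0]


def scope(resource: Resource) -> str:
    """Classify a resource by how widely it is deployed."""
    regions = {region_of(z) for z in resource.zones}
    if len(resource.zones) == 1:
        return "zonal"
    return "regional" if len(regions) == 1 else "multiregional"


def is_available(resource: Resource, failed_zones: set) -> bool:
    """Simplified rule: available while at least one zone is still healthy."""
    return bool(resource.zones - failed_zones)


vm = Resource("vm", frozenset({"r1-a"}))
mig = Resource("regional-mig", frozenset({"r1-a", "r1-b", "r1-c"}))
bucket = Resource("multiregion-bucket", frozenset({"r1-a", "r1-b", "r2-a", "r2-b"}))

zone_outage = {"r1-a"}
region_outage = {"r1-a", "r1-b", "r1-c"}

for res in (vm, mig, bucket):
    print(res.name, scope(res), is_available(res, zone_outage), is_available(res, region_outage))
```

In this model the zonal resource is lost in the zone outage, the regional resource survives the zone outage but not the region outage, and the multiregional resource survives both.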
You can find each product's exact configurations and options with respect to regions and zones in Google Cloud public documentation.
Google Cloud has been designed to operate globally from the ground up and continually conducts maintenance and upgrades 24/7/365 without inconveniencing you. Our global backbone provides tremendous flexibility for load balancing, and reduces end-user latency by having interconnects close to you. Our global cloud management plane simplifies managing multi-region deployments.
Underpinning and supporting many customer-facing Google Cloud services is a set of proven internal services like Spanner, Colossus, Borg, and Chubby.
These internal services are either globally load-balanced across multiple regions, or dedicated to each region in which they are available. Where services are load-balanced across multiple regions, we deploy updates progressively region-by-region, allowing us to detect and address problems without affecting your service usage. None of these internal services are limited to a single logical data center or to a single region.
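The region-by-region update pattern described above can be sketched in a few lines. This is a minimal illustration, not Google's actual rollout tooling: the `deploy`, `is_healthy`, and `rollback` callbacks are hypothetical, and rolling back the failing region is an assumption about how a detected problem might be addressed.

```python
from typing import Callable


def progressive_rollout(
    regions: list[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> list[str]:
    """Deploy one region at a time and stop at the first unhealthy region.

    Returns the regions that ended up running the new version.
    """
    updated: list[str] = []
    for region in regions:
        deploy(region)
        if not is_healthy(region):
            rollback(region)
            break  # remaining regions keep the old version
        updated.append(region)
    return updated


# Simulated run: the second region reports a problem after deployment.
log = []
result = progressive_rollout(
    ["region-1", "region-2", "region-3"],
    deploy=lambda r: log.append(("deploy", r)),
    is_healthy=lambda r: r != "region-2",
    rollback=lambda r: log.append(("rollback", r)),
)
print(result)  # ['region-1']
print(log)
```

Because the loop stops at the first failure, the third region is never touched, which is how a staged rollout limits the blast radius of a bad change.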
In general, for Google Cloud services, if a single region fails, only customers solely in that region are impacted; customers who have multi-region products are not impacted. Google Cloud has significant architecture in place with a goal to prevent correlated failures across regions.
All Google Cloud services rely upon core internal tools to provide fundamental services such as networking (in and out of data centers), access to data centers, and identity authorization systems. These tools are resilient to regional outages, with the goal of one region not being impacted if other regions become unavailable.
Google Cloud publishes guidance on our public website on how customers can architect their applications for the desired level of resilience, especially for commonly used Google Cloud products such as Compute Engine, BigQuery, Pub/Sub, and other services.
Our major dependencies are listed below, starting with dependencies common to all services, with the proviso that lower level implementation details are subject to change.
Common dependencies for all services
- Identity data plane for authentication and authorization
- Internal services that provide logging, metadata storage, and workflow management
- Access to Google Cloud APIs depends on DNS, globally-distributed load balancers, and points of presence (PoPs).
- The configuration of global resources: For example, IAM policies, global firewall rules, global load balancer configurations, and Pub/Sub topics are stored in replicated databases.
- When Google Cloud services make requests to customer-controlled endpoints, for example, Cloud EKM fetching customer keys, or Pub/Sub delivering messages, those requests depend on our global network infrastructure to access those customer-controlled endpoints.
Additional details on dependencies
- Compute Engine services
- The Google Cloud VM and Persistent Disk data planes depend on lower-level Compute Engine and Cloud Storage services such as Borg and Colossus.
- Google Cloud and infrastructure storage services like Spanner, Bigtable, and Cloud Storage depend on:
- Encryption and key management infrastructure for customer (Cloud KMS / Cloud EKM) and internal infrastructure for Google-owned keys
- Internal services to provide logging and auditing of data access
- Internal data replication services, where data is expected to be available across multiple regions
- Explicitly configured backups and replication to other regions depend on cross-region networking
- Messaging services
- Pub/Sub depends on our global network infrastructure to access customer-controlled endpoints
- Networking services
- Global load balancing, DNS, and failover between regions all depend on physical networking infrastructure.
- Preventing DDoS attacks, and the like, depends on lower-level Compute Engine infrastructure.
- Managed and hosted services like GKE and Cloud SQL
- Depend on Compute Engine and either Container Registry or Artifact Registry for VM images.
- Self-contained lower-level infrastructure
- Our internal cluster-level control plane including Borg and network fabrics
- Cluster-level storage, such as Colossus
- Encryption and key management infrastructure
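One way to reason about the dependencies listed above is to treat them as a graph and compute what a service relies on transitively. The sketch below uses a deliberately small, simplified map derived from a few of the bullets; the labels "key management", "access logging", and "global network" are shorthand chosen for this example, and the map is not a complete or authoritative description of Google Cloud internals.

```python
# Simplified, illustrative dependency map based on a few of the bullets above.
DEPENDS_ON = {
    "GKE": {"Compute Engine", "Artifact Registry"},
    "Cloud SQL": {"Compute Engine", "Artifact Registry"},
    "Compute Engine": {"Borg", "Colossus"},
    "Spanner": {"key management", "access logging"},
    "Pub/Sub": {"global network"},
}


def transitive_dependencies(service: str, graph: dict) -> set:
    """Return everything the service depends on, directly or indirectly."""
    seen = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def affected_services(dependency: str, graph: dict) -> list:
    """Return the services that rely on the given dependency, sorted by name."""
    return sorted(s for s in graph if dependency in transitive_dependencies(s, graph))


print(transitive_dependencies("GKE", DEPENDS_ON))
print(affected_services("Borg", DEPENDS_ON))
```

Running the same traversal over your own application's dependency map can help surface indirect dependencies that are easy to miss when reading a flat list.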
Maintaining and improving availability and resilience
Site Reliability Engineering (SRE) is Google's internal organization dedicated to availability, latency, performance, and capacity. Outages and service unavailability are correlated with the deployment of new code or changes to the environment. Using industry best practices, SRE balances the need to release new software and keep the environment secure against the understanding that those necessary changes might cause downtime.
Partnering with customers to build resilient services
If you have mission-critical needs and must architect for resilience and disaster recovery, our SRE/CRE and PSO teams can work with you to architect your applications to span multiple regions and zones, and can further assist you with designing high availability (HA) systems.
If you have heightened availability requirements around specific dates, such as Black Friday/Cyber Monday, Google Cloud has a program to partner with you to check and validate your specific application running on Google Cloud and identify any unexpected service dependencies between your application and our services.
Geographic management of data
Data locality for Google Cloud services is governed by the terms of service, including service-specific terms. Google understands that each customer might have unique security and compliance needs. The Google Cloud sales team can help you work towards meeting your requirements.
When using regional or zonal storage resources, we strongly recommend that you replicate data to another region or snapshot it to a multiregional storage resource for disaster recovery purposes.
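A simple audit can check whether this recommendation is being followed. The following sketch assumes you keep an inventory of datasets with their primary region and the locations of any replicas or snapshots; the `Dataset` class, the location names, and the `lacks_offregion_copy` helper are all hypothetical and do not query any Google Cloud service.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    primary_region: str
    # Locations of replicas or snapshots; a multiregional location counts
    # as distinct from the primary region in this simplified model.
    copy_locations: set = field(default_factory=set)


def lacks_offregion_copy(datasets: list) -> list:
    """Return names of datasets with no copy outside their primary region."""
    return [
        d.name
        for d in datasets
        if not (d.copy_locations - {d.primary_region})
    ]


inventory = [
    Dataset("orders-db", "region-1", {"region-2"}),
    Dataset("session-cache", "region-1"),
    Dataset("audit-logs", "region-1", {"region-1"}),
    Dataset("media", "region-2", {"multiregion-x"}),
]

print(lacks_offregion_copy(inventory))  # ['session-cache', 'audit-logs']
```

A copy in the same region as the primary does not count here, because it would be lost together with the primary if that region became unavailable.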
Application deployment considerations
- To build highly available services and applications that can withstand zones becoming unavailable, use the following:
- Regional resources, such as App Engine applications or regional managed instance groups, or managed multiregional resources, such as Cloud Storage, Datastore, Firestore, or Spanner.
- Zonal resources, such as Compute Engine virtual machines, but manage your own compute and storage redundancy across zones or across regions.
- To build disaster-recovery-capable applications that can withstand the extended loss of entire regions:
For data, use one or more of the following strategies:
- Use managed, multiregional storage services such as Cloud Storage, Datastore, Firestore, or Spanner.
- Use zonal or regional resources, but snapshot data to a multiregional resource such as Cloud Storage, Datastore, Firestore, or Spanner.
- Use zonal or regional resources, but manage your own data replication to one or more other regions.
For compute, use the following strategy:
- Use zonal or regional resources, such as Compute Engine or App Engine, and be ready to bring up your application in another region, manually or automatically, if your primary region fails. If your data is not already in a managed, multiregional resource, point the application at copies of your primary data.
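The compute failover strategy above comes down to a selection rule: serve from the first region that is both healthy and able to reach a copy of the data. The sketch below illustrates that rule under simple assumptions; the region names and the `region_is_up` and `has_data_copy` callbacks are hypothetical stand-ins for whatever health checks and replication tracking you run.

```python
from typing import Callable, Optional


def choose_serving_region(
    preferred: list[str],
    region_is_up: Callable[[str], bool],
    has_data_copy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region, in preference order, that is up and has data.

    Returns None when no region qualifies, which signals that the
    disaster recovery plan has a gap.
    """
    for region in preferred:
        if region_is_up(region) and has_data_copy(region):
            return region
    return None


# Simulated regional failure: the primary is down, and only one standby
# has received a copy of the primary data.
healthy = {"standby-region-1", "standby-region-2"}
with_data = {"primary-region", "standby-region-2"}

target = choose_serving_region(
    ["primary-region", "standby-region-1", "standby-region-2"],
    region_is_up=lambda r: r in healthy,
    has_data_copy=lambda r: r in with_data,
)
print(target)  # standby-region-2
```

The first standby is skipped even though it is healthy, because bringing up compute without access to the data would not restore the application.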
For more information about service dependencies, contact sales.
Additional solutions and tutorials
The following solutions and tutorials provide guidance for ensuring your application is highly available and can withstand outages:
Learn how to use Google Cloud to build scalable and resilient application architectures using patterns and practices that apply broadly to any web application.
Configure Compute Engine instances in different regions and use HTTP load balancing to distribute traffic across the regions to increase availability across regions and provide failover in the case of a service outage.
Design your application on the Compute Engine service to be robust against failures, network interruptions, and unexpected disasters.
Learn how to add basic disaster recovery to your Cassandra installation by backing up your data into, and restoring your data from, Cloud Storage.
General principles for designing and testing a disaster recovery plan with Google Cloud.