Choosing the Right Architecture for Global Data Distribution

This solution describes three example architectures that you can use to distribute data across Google Cloud Platform (GCP) regions.

Many enterprises work with data from geographically dispersed locations while responding to client requests in near-real time. For example, a demand-side platform (DSP) for digital advertising might have customers who expect database response times of less than twenty milliseconds, regardless of their geographical location or the current network load. A global DSP like this can't be built on a single centralized database: latency to that database grows with the physical distance to each client, and a single database is heavily affected by usage spikes.
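
To see why distance alone can break a twenty-millisecond budget, the following sketch estimates a lower bound on round-trip time between two cities. It assumes signals travel through optical fiber at roughly 200,000 km/s (about two-thirds of the speed of light in vacuum) along the great-circle path. Real network routes are longer and add queuing and processing delays, so actual latency is higher. The city coordinates are approximate and chosen only for illustration.

```python
import math

# Assumption: light in optical fiber travels at roughly 2/3 of c.
FIBER_SPEED_KM_PER_S = 200_000.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: there and back at fiber speed, nothing else."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Approximate coordinates, for illustration only.
NEW_YORK = (40.71, -74.01)
TOKYO = (35.68, 139.69)

distance = great_circle_km(*NEW_YORK, *TOKYO)
print(f"distance ~ {distance:.0f} km, minimum RTT ~ {min_rtt_ms(distance):.0f} ms")
```

The same arithmetic shows that a 20 ms round trip allows at most about 2,000 km of fiber path between client and server before any processing time is counted, which is why one centralized database can't meet that budget for clients on every continent.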

You can meet these needs with a distributed architecture for data storage. Not all architectures are appropriate for all business needs, and each architecture has varying strengths and weaknesses. This solution therefore offers various GCP alternatives to help implement your overall business strategy and guide your network implementation approach.

GCP advantages

GCP offers robust and stable network bandwidth around the world, along with several other advantages:

You can use GCP to build a global virtual network that lets your applications communicate across regions more securely using private IP addresses. For example, you can create a Virtual Private Cloud (VPC) network that contains Compute Engine virtual machine (VM) instances in two regions, such as us-central1 and asia-east1. The instances can then communicate directly with each other over their private IP addresses, which helps your organization keep communication between instances secure.
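
A VPC network is global, but each of its subnets belongs to a single region, and subnet ranges in the same network must not overlap. The following sketch uses Python's standard `ipaddress` module to check a hypothetical subnet plan for the two example regions and to find which regional subnet an internal address belongs to. The ranges and helper names are illustrative assumptions, not values or APIs that GCP provides.

```python
import ipaddress
from itertools import combinations
from typing import Dict, Optional

# Hypothetical subnet plan: one private range per region in a single VPC network.
SUBNETS: Dict[str, ipaddress.IPv4Network] = {
    "us-central1": ipaddress.ip_network("10.128.0.0/20"),
    "asia-east1": ipaddress.ip_network("10.140.0.0/20"),
}


def validate_plan(subnets: Dict[str, ipaddress.IPv4Network]) -> None:
    """Raise ValueError if any range is not private or if two ranges overlap."""
    for region, net in subnets.items():
        if not net.is_private:
            raise ValueError(f"{region}: {net} is not a private range")
    for (r1, n1), (r2, n2) in combinations(subnets.items(), 2):
        if n1.overlaps(n2):
            raise ValueError(f"{r1} {n1} overlaps {r2} {n2}")


def region_of(ip: str, subnets: Dict[str, ipaddress.IPv4Network]) -> Optional[str]:
    """Return the region whose subnet contains the given internal IP, if any."""
    addr = ipaddress.ip_address(ip)
    for region, net in subnets.items():
        if addr in net:
            return region
    return None


validate_plan(SUBNETS)
print(region_of("10.140.0.5", SUBNETS))  # asia-east1
```

Planning non-overlapping private ranges up front keeps every internal address unique across regions, so a VM in one region can reach a VM in another directly by its internal IP.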

With GCP anycast IP, a single global IP address is assigned to a managed service, such as load balancing. Using anycast IP, you can create a single, global load balancer instead of configuring load balancers in every region. The global load balancer routes client requests to applications running in the nearest regions and automatically scales to meet changing demand.
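
The routing decision of a global load balancer can be modeled as choosing, for each client, the lowest-latency region that is healthy and still has spare capacity, with overflow spilling to the next-closest region. This is a simplified sketch assuming hypothetical latency and capacity figures; it is not how Cloud Load Balancing is implemented internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Backend:
    region: str
    healthy: bool
    in_flight: int  # requests currently being served
    capacity: int   # hypothetical maximum concurrent requests


def pick_backend(rtt_ms: Dict[str, float], backends: List[Backend]) -> Optional[Backend]:
    """Return the closest healthy backend with spare capacity, or None."""
    candidates = [
        b for b in backends
        if b.healthy and b.in_flight < b.capacity and b.region in rtt_ms
    ]
    if not candidates:
        return None
    # Closest region first; a full region is skipped, so traffic overflows onward.
    return min(candidates, key=lambda b: rtt_ms[b.region])


backends = [
    Backend("us-central1", healthy=True, in_flight=80, capacity=100),
    Backend("asia-east1", healthy=True, in_flight=100, capacity=100),
    Backend("europe-west1", healthy=True, in_flight=10, capacity=100),
]
# Hypothetical round-trip times measured from a client in Tokyo.
client_rtt = {"asia-east1": 30.0, "us-central1": 140.0, "europe-west1": 230.0}

chosen = pick_backend(client_rtt, backends)
print(chosen.region if chosen else "no backend available")  # us-central1
```

Here asia-east1 is closest but full, so the request goes to the next-closest region with room.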

Three example data distribution architectures

This section outlines three deployment architectures and discusses when each architecture is appropriate. The architectures and use cases are:

  • Hybrid deployment, consisting of GCP and on-premises services. You want to maintain some on-premises services but would like to take advantage of GCP features. GCP is linked to your current network and incorporated into your ongoing company processes. Some or all on-premises data is copied to GCP or incorporated with GCP.

  • Hybrid deployment, consisting of GCP and other cloud service provider platforms. You want to maintain your current cloud service provider operations, but would like to include some GCP features and configure the two systems to communicate.

  • GCP using multiple regions. You want to support near-synchronous data transfers, possibly on a global scale. Deploying GCP resources in multiple regions lets you transfer data between locations around the world quickly and with little delay.

Hybrid deployment: GCP and on-premises services

Combining GCP with on-premises services is appropriate for use cases involving applications that store data on-premises and that also propagate data to GCP.

For example, in the retail industry, an enterprise might insert product master data into on-premises databases for a legacy inventory management system. The company might also need to propagate that data to a GCP database that's used for online web stores. With a hybrid approach, you can build a new system that uses GCP without affecting the existing on-premises system. In this architecture, GCP essentially works in parallel with the on-premises network structures.

You should consider the following issues when deciding whether to implement a hybrid GCP and on-premises deployment:

  • If data is both on-premises and in GCP, you must decide which data to treat as master data and where this master data should reside. For example, you might define GCP data to be the master data. In that case, GCP behaves as a data hub connecting one or more on-premises environments and exchanging data between them. After data is added or updated in the GCP environment, the data is transmitted to on-premises systems. Alternatively, on-premises systems could hold the master data and periodically update GCP.
  • If you are developing an application for this hybrid environment, keep in mind that managed services are available only for the resources in GCP. Applications that run both on-premises and in the GCP environment might not be able to rely on managed services such as automated backup, redundancy, and scalability.
  • To keep data portable and to help ensure consistent administrative operations, it might be easier to host cross-platform data stores, such as MySQL, on virtual machines in both your on-premises and GCP deployments.
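
When GCP holds the master data, the hub arrangement from the first consideration can be sketched as follows: every write goes to the master copy, which assigns a version number and pushes the change to each registered on-premises replica. Replicas ignore updates older than what they already hold, so retried or out-of-order deliveries don't overwrite newer data. The class and method names are hypothetical, and in-memory dictionaries stand in for real databases and network transport.

```python
from typing import Dict, List, Tuple


class Replica:
    """Hypothetical on-premises copy of the master data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Tuple[int, str]] = {}  # key -> (version, value)

    def apply(self, key: str, version: int, value: str) -> None:
        current = self.rows.get(key)
        # Ignore stale or duplicate updates so redelivery is harmless.
        if current is None or version > current[0]:
            self.rows[key] = (version, value)


class MasterHub:
    """Hypothetical GCP-side master copy that pushes changes to replicas."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, str]] = {}
        self.replicas: List[Replica] = []
        self.next_version = 1

    def register(self, replica: Replica) -> None:
        self.replicas.append(replica)
        # Bring a newly registered replica up to date with existing rows.
        for key, (version, value) in self.rows.items():
            replica.apply(key, version, value)

    def upsert(self, key: str, value: str) -> None:
        version = self.next_version
        self.next_version += 1
        self.rows[key] = (version, value)
        for replica in self.replicas:
            replica.apply(key, version, value)


hub = MasterHub()
store_a, store_b = Replica("store-a"), Replica("store-b")
hub.register(store_a)
hub.upsert("sku-123", "price=9.99")
hub.register(store_b)  # joins late and receives the existing row
hub.upsert("sku-123", "price=8.99")
store_a.apply("sku-123", 1, "price=9.99")  # a late duplicate of an old update is ignored
print(store_a.rows["sku-123"], store_b.rows["sku-123"])
```

If on-premises systems hold the master data instead, the same logic applies with the roles swapped and GCP acting as a replica.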

Example hybrid architecture

The following diagram illustrates an example of a hybrid architecture with GCP and on-premises systems.

[Diagram: architecture of a hybrid system]

In the example architecture:

  • Data is exchanged between on-premises file servers and Google Cloud Storage. This could involve backing up local files to GCP, using files as input for batch processing, or downloading files from GCP to on-premises networks.
  • Custom applications in local data centers use REST APIs to access applications on App Engine to retrieve or submit data. REST requests are typically synchronous and block clients until results are returned. In this architecture, App Engine provides auto-scaling to grow capacity as required, which helps keep latency low for these synchronous calls.
  • Custom applications submit messages directly to Google Cloud Pub/Sub to store them in a replicated queue for later processing. When messages arrive at Cloud Pub/Sub, Cloud Pub/Sub returns the status immediately and doesn't block clients. Messages can be retrieved and processed asynchronously using Cloud Functions, Cloud Dataflow, applications running on Compute Engine, and other methods. Client applications in on-premises environments can also retrieve messages.
  • Data stored in on-premises databases is exported (perhaps as CSV files) and uploaded to GCP for batch loading into databases managed by Cloud SQL.
  • A Firebase database is used to synchronize data between on-premises systems and GCP. Applications subscribe to keys in the database, and whenever values are updated, the applications are notified in real time and receive the updated values. Applications that interact with Firebase can be on-premises, on GCP, or both.
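
The difference between the synchronous REST calls and the asynchronous Cloud Pub/Sub path in this architecture can be illustrated with the standard library. In this sketch, a blocking function stands in for a REST call to App Engine, and a `queue.Queue` drained by a worker thread stands in for a Pub/Sub topic and its subscriber. Both are local stand-ins assumed for illustration; the real services are reached through their client libraries or REST endpoints.

```python
import queue
import threading
import time

PROCESSING_DELAY_S = 0.2  # hypothetical server-side work per request


def rest_call(payload: str) -> str:
    """Stand-in for a synchronous REST request: the caller waits for the result."""
    time.sleep(PROCESSING_DELAY_S)
    return f"processed {payload}"


messages = queue.Queue()
results = []


def subscriber() -> None:
    """Stand-in for an asynchronous consumer, such as a Cloud Function."""
    while True:
        payload = messages.get()
        time.sleep(PROCESSING_DELAY_S)
        results.append(f"processed {payload}")
        messages.task_done()


threading.Thread(target=subscriber, daemon=True).start()

start = time.perf_counter()
reply = rest_call("order-1")  # blocks until the "server" finishes
sync_elapsed = time.perf_counter() - start

start = time.perf_counter()
for i in range(3):
    messages.put(f"order-{i}")  # returns immediately; the work happens later
publish_elapsed = time.perf_counter() - start

messages.join()  # wait for the background consumer, for demonstration only
print(f"sync call blocked {sync_elapsed:.2f}s; publishing 3 messages took {publish_elapsed:.4f}s")
```

The publisher's cost stays nearly constant no matter how slow processing is, which is why the queue path suits work that doesn't need an immediate answer.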

Hybrid deployment: GCP and other cloud providers

You might combine GCP with other cloud providers to more effectively distribute your data, to leverage multiple fail-safe mechanisms, or to take advantage of specific GCP features. This architecture is a good choice when you already have production services running on other cloud providers, but want to take advantage of GCP features. For example, you might want to use BigQuery to analyze application data, as well as logs and monitoring metrics.

This architecture is similar to the hybrid on-premises and GCP architecture described earlier. You should consider the following issues when implementing a hybrid deployment of GCP and other cloud providers:

  • You can use open source multi-cloud client libraries such as jclouds and libcloud to help integrate APIs between GCP and other cloud services.
  • GCP offers services that work with Amazon Web Services (AWS): Storage Transfer Service can transfer data from Amazon S3, and Stackdriver can monitor AWS resources and collect their logs. You can export Stackdriver logs to BigQuery for further analysis.
  • Cloud Pub/Sub is a global service, and your applications don't need to know in which region Cloud Pub/Sub queues exist. You can publish messages to, or subscribe to, globally available topics. With GCP, client apps need to be aware of only a single endpoint. For other cloud providers, queues might be specific to a region. If so, when you deploy apps across multiple regions, client apps need to be aware of the endpoints for every region. Keeping track of these endpoints can be cumbersome, especially as you add services in new regions.
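
The practical difference shows up in client configuration. This sketch contrasts a client that always uses one global endpoint with a client that must look up a per-region endpoint. `pubsub.googleapis.com` is the Cloud Pub/Sub API's global endpoint; the regional hostnames and region names are hypothetical placeholders, not a real provider's addresses.

```python
from typing import Dict

# Global service: one endpoint regardless of where the client runs.
GLOBAL_ENDPOINT = "pubsub.googleapis.com"

# Regional service (hypothetical hostnames): every region needs its own entry,
# and this table must be updated whenever a region is added.
REGIONAL_ENDPOINTS: Dict[str, str] = {
    "region-a": "queue.region-a.example.com",
    "region-b": "queue.region-b.example.com",
}


def global_endpoint(region: str) -> str:
    """The client's region doesn't affect which endpoint it uses."""
    return GLOBAL_ENDPOINT


def regional_endpoint(region: str) -> str:
    """Fails for any region the client wasn't configured to know about."""
    try:
        return REGIONAL_ENDPOINTS[region]
    except KeyError:
        raise LookupError(f"no queue endpoint configured for {region}") from None


for region in ["region-a", "region-c"]:
    print(region, global_endpoint(region), end=" | ")
    try:
        print(regional_endpoint(region))
    except LookupError as err:
        print(err)
```

Deploying to a new region (region-c above) needs no change on the global side, while the regional lookup fails until someone updates the table.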

Example architecture for GCP combined with another cloud provider

The following diagram illustrates a hybrid architecture including GCP and other cloud providers.

[Diagram: architecture of a system involving GCP and another cloud provider]

In the example architecture:

  • Messages are exchanged between Cloud Pub/Sub and other public clouds. Cloud Pub/Sub provides a global endpoint and can act as a message hub between clouds, because applications don't need to know in which region the message queues actually exist.
  • The Stackdriver Monitoring agent is installed on virtual machines in other public clouds to collect metrics such as CPU utilization, memory usage, and process information. Stackdriver then monitors resource usage across the hybrid cloud environment.
  • Custom applications running on virtual machines in other cloud environments use REST APIs to call applications hosted on App Engine to submit or retrieve data.
  • Storage Transfer Service transfers files directly from Amazon S3 to Cloud Storage, either on demand or periodically. Transferred files can then be processed on Compute Engine and loaded into Cloud SQL.
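
A periodic transfer needs to copy only the objects that are new or changed since the last run. The following sketch computes such a plan by comparing object names and content checksums in a source and a destination listing. It is a simplified model of incremental synchronization, assumed for illustration; Storage Transfer Service's actual comparison rules and options are defined in its documentation. The listings are in-memory dictionaries rather than calls to Amazon S3 or Cloud Storage, and the object names are hypothetical.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Checksum of an object's contents (MD5, used here only for comparison)."""
    return hashlib.md5(data).hexdigest()


def plan_transfer(source: Dict[str, str], destination: Dict[str, str]) -> List[str]:
    """Return object names to copy: missing in the destination or with a different checksum."""
    return sorted(
        name for name, checksum in source.items()
        if destination.get(name) != checksum
    )


# Hypothetical listings: object name -> MD5 checksum of its contents.
s3_listing = {
    "sales/part-1.csv": md5_hex(b"jan"),
    "sales/part-2.csv": md5_hex(b"feb-revised"),
    "sales/part-3.csv": md5_hex(b"mar"),
}
gcs_listing = {
    "sales/part-1.csv": md5_hex(b"jan"),
    "sales/part-2.csv": md5_hex(b"feb"),
}

print(plan_transfer(s3_listing, gcs_listing))
# ['sales/part-2.csv', 'sales/part-3.csv']
```

Comparing checksums rather than timestamps avoids recopying unchanged files, which keeps repeated periodic runs cheap.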

GCP with multiple regions

An architecture based on GCP in multiple regions is a good choice when your application needs to serve users globally and synchronize data between regions with minimal latency. An example is an online multiplayer video game, which must function throughout the world with near-real-time synchronization between players.

This architecture takes advantage of GCP managed services to reduce administrative tasks and simplify system design, so you can focus on your applications and spend less time on infrastructure design. You should consider the following issues when implementing a GCP deployment with multiple regions:

  • You can easily deploy multi-regional data processing services, because message publishers and subscribers can run in any region. Cloud Pub/Sub can exchange messages between applications running in different regions without you having to specify where the application is running. In this architecture, Cloud Pub/Sub messages stay entirely within GCP and are not sent across the internet, resulting in lower latency.
  • Applications on Compute Engine instances can directly communicate across regions using private IP addresses within a GCP VPC network.
  • You can use REST APIs to make custom applications loosely coupled. Because the architecture is fully inside the GCP environment, you can use App Engine for managing applications where you expect minimal administrative tasks.
  • After distributing data across regions, you can use Cloud Dataflow or Cloud Dataproc to run ETL or analytical workloads.
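
In Cloud Pub/Sub, each subscription attached to a topic receives its own copy of every message published to that topic, so consumers in different regions can each process the same stream. The in-memory model below illustrates that fan-out under simplifying assumptions (no acknowledgements, retries, or delivery guarantees); the `Topic` class and subscription names are hypothetical and are not the Pub/Sub client library.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """Hypothetical in-memory model of a global topic with named subscriptions."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Deque[str]] = {}

    def subscribe(self, name: str) -> None:
        # A subscription receives only messages published after it is created.
        self.subscriptions[name] = deque()

    def publish(self, message: str) -> None:
        # Every subscription gets its own copy, wherever its consumer runs.
        for backlog in self.subscriptions.values():
            backlog.append(message)

    def pull(self, name: str) -> List[str]:
        backlog = self.subscriptions[name]
        messages = list(backlog)
        backlog.clear()
        return messages


scores = Topic()
scores.subscribe("leaderboard-us-central1")
scores.subscribe("leaderboard-asia-east1")
scores.publish("player42:+10")  # the publisher can run in any region
print(scores.pull("leaderboard-us-central1"), scores.pull("leaderboard-asia-east1"))
# ['player42:+10'] ['player42:+10']
```

Neither the publisher nor the subscribers refer to a region, which is what lets each piece be deployed wherever its users are.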

Example multi-region GCP architecture

The following diagram illustrates the architecture of a GCP deployment with multiple regions.

[Diagram: architecture of a system involving multiple GCP regions]

In the example architecture:

  • As with the hybrid cloud architecture, Stackdriver monitors all compute resources and displays a consolidated global view of resource usage. Collected logs and metrics are exported to BigQuery for further analysis.
  • As with the hybrid cloud architecture, Cloud Pub/Sub is used as a message hub. Cloud Pub/Sub allows services to be loosely coupled and independent from where the application actually runs.
  • Custom applications that run on App Engine or Compute Engine exchange messages directly with other custom applications using REST APIs. Direct calls couple applications more tightly than exchanging messages through Cloud Pub/Sub, but because the calls stay within GCP, latency is more predictable than in the hybrid architectures.
  • Storage Transfer Service is used to synchronize Cloud Storage buckets. Alternatively, the gsutil tool can be used for on-demand transfers between buckets across regions.
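
The consolidated global view of resource usage can be thought of as grouping per-instance metrics by region and computing summary statistics, much as a BigQuery query over exported metrics might do. The sketch below shows that aggregation in plain Python over hypothetical sample points; the field names are assumptions, not Stackdriver's export schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    region: str
    instance: str
    cpu_utilization: float  # fraction between 0 and 1


def per_region_summary(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Average and peak CPU utilization per region, sorted by region name."""
    by_region: Dict[str, List[float]] = defaultdict(list)
    for s in samples:
        by_region[s.region].append(s.cpu_utilization)
    return {
        region: {"mean": mean(values), "max": max(values)}
        for region, values in sorted(by_region.items())
    }


samples = [
    Sample("us-central1", "vm-1", 0.40),
    Sample("us-central1", "vm-2", 0.60),
    Sample("asia-east1", "vm-3", 0.90),
]
for region, stats in per_region_summary(samples).items():
    print(f"{region}: mean={stats['mean']:.2f} max={stats['max']:.2f}")
```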

