Analytics hybrid and multicloud pattern

Last reviewed 2023-12-14 UTC

This document discusses how the analytics hybrid and multicloud pattern capitalizes on the split between transactional and analytics workloads.

In enterprise systems, most workloads fall into these categories:

  • Transactional workloads include interactive applications like sales, financial processing, enterprise resource planning, or communication.
  • Analytics workloads include applications that transform, analyze, refine, or visualize data to aid decision-making processes.

Analytics systems obtain their data from transactional systems by either querying APIs or accessing databases. In most enterprises, analytics and transactional systems tend to be separate and loosely coupled. The objective of the analytics hybrid and multicloud pattern is to capitalize on this pre-existing split by running transactional and analytics workloads in two different computing environments. Raw data is first extracted from workloads that are running in the private computing environment and then loaded into Google Cloud, where it's used for analytical processing. Some of the results might then be fed back to transactional systems.
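
As a minimal sketch of this flow, the following Python snippet stages an extract from a transactional system in Cloud Storage and then loads it into BigQuery for analytical processing (an ELT-style approach). The project, bucket, dataset, and file names are hypothetical placeholders, not values from this document.

```python
from google.cloud import bigquery, storage

# Hypothetical names: replace with your own project, bucket, dataset, and file.
PROJECT_ID = "my-analytics-project"
BUCKET = "transactional-extracts"
LOCAL_EXTRACT = "orders_2024-01-01.csv"  # exported from the on-premises transactional system

# 1. Stage the raw extract in Cloud Storage (the landing zone for analytics data).
storage_client = storage.Client(project=PROJECT_ID)
blob = storage_client.bucket(BUCKET).blob(f"raw/{LOCAL_EXTRACT}")
blob.upload_from_filename(LOCAL_EXTRACT)

# 2. Load the staged file into BigQuery; transformations can then run as SQL
#    inside BigQuery (the "T" in ELT).
bq_client = bigquery.Client(project=PROJECT_ID)
load_job = bq_client.load_table_from_uri(
    f"gs://{BUCKET}/raw/{LOCAL_EXTRACT}",
    f"{PROJECT_ID}.analytics.orders_raw",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()  # wait for the load to complete
```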

The following diagram illustrates possible architectures by showing potential data pipelines. Each path or arrow represents a possible data movement and transformation pipeline option that can be based on either ETL or ELT, depending on the available data quality and the targeted use case.

To move your data into Google Cloud and unlock value from it, use data movement services, a complete suite of data ingestion, integration, and replication services.

Diagram: data flows from an on-premises or other cloud environment into Google Cloud, through the ingest, pipeline, storage, and analytics layers, and into the application and presentation layer.

As shown in the preceding diagram, connecting Google Cloud with on-premises environments and other cloud environments can enable various data analytics use cases, such as data streaming and database backups. To power the foundational transport of a hybrid and multicloud analytics pattern that requires a high volume of data transfer, Cloud Interconnect and Cross-Cloud Interconnect provide dedicated connectivity to on-premises and other cloud providers.

Advantages

Running analytics workloads in the cloud has several key advantages:

  • Inbound traffic—moving data from your private computing environment or other clouds to Google Cloud—might be free of charge.
  • Analytics workloads often need to process substantial amounts of data and can be bursty, so they're especially well suited to being deployed in a public cloud environment. By dynamically scaling compute resources, you can quickly process large datasets while avoiding upfront investments or having to overprovision computing equipment.
  • Google Cloud provides a rich set of services to manage data throughout its entire lifecycle, ranging from initial acquisition through processing and analyzing to final visualization.
    • Data movement services on Google Cloud provide a complete suite of products to move, integrate, and transform data seamlessly in different ways.
    • Cloud Storage is well suited for building a data lake.
  • Google Cloud helps you to modernize and optimize your data platform to break down data silos. Using a data lakehouse helps to standardize across different storage formats. It can also provide the flexibility, scalability, and agility needed to help ensure that your data generates value for your business, rather than inefficiencies. For more information, see BigLake.

  • BigQuery Omni provides compute power that runs locally to the storage on AWS or Azure, and helps you query your own data stored in Amazon Simple Storage Service (Amazon S3) or Azure Blob Storage. This multicloud analytics capability lets data teams break down data silos. For more information about querying data stored outside of BigQuery, see Introduction to external data sources.
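
As a hedged illustration, assuming an Omni or BigLake external table named `aws_dataset.orders_s3` has already been defined over data in Amazon S3 (all names here are hypothetical), querying it from the BigQuery Python client looks the same as querying a native table:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project

# The external (Omni/BigLake) table is queried like any other BigQuery table;
# the compute runs close to where the data is stored.
query = """
    SELECT customer_region, SUM(order_total) AS revenue
    FROM `my-analytics-project.aws_dataset.orders_s3`
    GROUP BY customer_region
    ORDER BY revenue DESC
"""
for row in client.query(query).result():
    print(row.customer_region, row.revenue)
```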

Best practices

To implement the analytics hybrid and multicloud architecture pattern, consider the following general best practices:

  • Use the handover networking pattern to enable the ingestion of data. If analytical results need to be fed back to transactional systems, you might combine both the handover and gated egress patterns.
  • Use Pub/Sub queues or Cloud Storage buckets to hand over data to Google Cloud from transactional systems that are running in your private computing environment. These queues or buckets can then serve as sources for data-processing pipelines and workloads, as the Pub/Sub sketch after this list illustrates.
  • To deploy ETL and ELT data pipelines, consider using Cloud Data Fusion or Dataflow, depending on your specific use case requirements. Both are fully managed, cloud-first data processing services for building and managing data pipelines; a minimal Dataflow pipeline sketch follows this list.
  • To discover, classify, and protect your valuable data assets, consider using Google Cloud Sensitive Data Protection capabilities, like de-identification techniques. These techniques let you mask, encrypt, and replace sensitive data, such as personally identifiable information (PII), using a randomly generated or pre-determined key, where applicable and compliant. A de-identification sketch follows this list.
  • When you have existing Hadoop or Spark workloads, consider migrating jobs to Dataproc and migrating existing HDFS data to Cloud Storage, as in the Dataproc job-submission sketch after this list.
  • When you're performing an initial data transfer from your private computing environment to Google Cloud, choose the transfer approach that is best suited for your dataset size and available bandwidth. For more information, see Migration to Google Cloud: Transferring your large datasets.
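
The following minimal sketch shows the handover side of the Pub/Sub option described in this list: a transactional system in the private environment publishes change events to a topic that pipelines in Google Cloud can later consume. The project and topic names are hypothetical.

```python
import json

from google.cloud import pubsub_v1

# Hypothetical names: replace with your own project and topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-analytics-project", "transactional-events")

event = {"order_id": "12345", "amount": 42.50, "currency": "EUR"}

# Publish the event; the returned future resolves once Pub/Sub has stored it.
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print(f"Published message {future.result()}")
```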
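
On the Google Cloud side, a minimal Apache Beam pipeline (runnable on Dataflow) might read those handed-over messages and append them to a BigQuery table. This is a sketch under assumed names (project, subscription, table, and bucket), not a production pipeline; the destination table is assumed to already exist.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical names: replace with your own project, subscription, table, and bucket.
options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-analytics-project",
    region="us-central1",
    temp_location="gs://transactional-extracts/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-analytics-project/subscriptions/transactional-events-sub"
        )
        | "ParseJson" >> beam.Map(lambda message: json.loads(message.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-analytics-project:analytics.transactional_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```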
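
As one hedged example of a Sensitive Data Protection de-identification technique, the following sketch asks the service to replace detected email addresses with their infoType name before a record moves on for analysis. The project name and content are placeholders.

```python
from google.cloud import dlp_v2

# Hypothetical project; the content would normally come from your pipeline.
client = dlp_v2.DlpServiceClient()
parent = "projects/my-analytics-project/locations/global"

item = {"value": "Customer contact: jane.doe@example.com"}
inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}]}

# Replace each detected value with its infoType name (one of several
# de-identification transformations the service offers).
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

response = client.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": item,
    }
)
print(response.item.value)  # e.g. "Customer contact: [EMAIL_ADDRESS]"
```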
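
For existing Spark workloads, the following is a minimal sketch of submitting a PySpark job to a Dataproc cluster. The project, region, cluster, and file locations are assumed; after the HDFS data has been migrated, the job reads from gs:// paths instead of hdfs:// paths.

```python
from google.cloud import dataproc_v1

# Hypothetical names: replace with your own project, region, cluster, and job file.
PROJECT_ID = "my-analytics-project"
REGION = "us-central1"

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "analytics-cluster"},
    "pyspark_job": {
        # The job itself reads its input from Cloud Storage (gs://) rather than HDFS.
        "main_python_file_uri": "gs://transactional-extracts/jobs/transform_orders.py"
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": PROJECT_ID, "region": REGION, "job": job}
)
result = operation.result()
print(f"Job finished: {result.reference.job_id}")
```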

  • If you require long-term, high-volume data transfer or exchange between Google Cloud and other clouds, evaluate using Google Cloud Cross-Cloud Interconnect to help you establish high-bandwidth dedicated connectivity between Google Cloud and other cloud service providers (available in certain locations).

  • If encryption is required at the connectivity layer, various options are available based on the selected hybrid connectivity solution. These options include VPN tunnels, HA VPN over Cloud Interconnect, and MACsec for Cross-Cloud Interconnect.

  • Use consistent tooling and processes across environments. In an analytics hybrid scenario, this practice can help increase operational efficiency, although it's not a prerequisite.