Healthcare analytics platform reference architecture

This document explains the architecture of the healthcare analytics platform—a set of tools on Google Cloud that helps you process clinical and operational healthcare data—to researchers, data scientists, IT teams, and business analysts.

Healthcare organizations often keep their data in silos that span multiple systems, data types, and modalities. You can modernize your infrastructure by moving healthcare data from multiple on-premises sources into an analytics platform on Google Cloud, which lets you harmonize data, monitor data pipelines, run analytics, and create visualizations for provider insights and data-driven decision-making.

Through this cloud-based analytics platform, your environment scales automatically in response to spikes in workload. For example, new data sources can trigger an increase in analytical processes or require additional compute power for research. By using the healthcare analytics platform, you can also minimize the resources needed by health systems managers to prepare the KPIs and metrics that drive business decisions. In addition, the healthcare analytics platform provides a holistic view of the processes required to deliver insights to healthcare providers, including the following:

  • Harmonizing healthcare-specific data types.
  • Developing portable cohort definitions, clinical measurement definitions, and dashboards.
  • Providing relevant datasets for integration into your analytics.
  • Securely sharing aggregated data subsets with others.

This document is the first part of a series about building and deploying your healthcare analytics platform on Google Cloud.

Core features

The core features of the healthcare analytics platform are as follows:

  • A set of tools for harmonizing and enriching clinical and operational healthcare data.
  • Cohort and machine learning (ML) tools for portable research and analytics.
  • Ingestion of healthcare-specific data types in raw form, with no need for parsing, indexing, or data management: Fast Healthcare Interoperability Resources (FHIR), HL7v2, and Digital Imaging and Communications in Medicine (DICOM).
  • Open source technology that ingests and harmonizes data on-premises, in multi-cloud environments, or on Google Cloud, with minimal code changes and with support from the user community.
  • Fast, scalable analytics that support data queries with common analytics tools and that simplify the process of creating visualizations.
  • Data lineage and indexing automation, pipelines for standard data models, and scalability, which reduce the need for management and also reduce overhead.
  • Secure data storage enabled by data encryption at rest by default, the Healthcare Data Protection Suite, and other functionalities.
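
As a rough illustration of how the three healthcare-specific data types might be routed at ingestion time, the following Python sketch maps each type to the Cloud Healthcare API store collection that holds it and builds the resource name of a store. The helper name store_path and all of the project, location, dataset, and store IDs are placeholders invented for this example. The sketch only formats strings and makes no API calls.

```python
# Sketch only: map each healthcare data type to the Cloud Healthcare API
# store collection that holds it. All IDs passed in are placeholders.
STORE_COLLECTIONS = {
    "FHIR": "fhirStores",
    "HL7v2": "hl7V2Stores",
    "DICOM": "dicomStores",
}


def store_path(project, location, dataset, data_type, store_id):
    """Return the resource name of the store for the given data type.

    store_path is a hypothetical helper for this example, not a
    function from any Google Cloud client library.
    """
    if data_type not in STORE_COLLECTIONS:
        raise ValueError(f"unsupported data type: {data_type!r}")
    collection = STORE_COLLECTIONS[data_type]
    return (
        f"projects/{project}/locations/{location}"
        f"/datasets/{dataset}/{collection}/{store_id}"
    )


# Example with placeholder IDs.
imaging_store = store_path(
    "my-project", "us-central1", "my-dataset", "DICOM", "my-dicom-store"
)
```

Keeping the mapping in a single dictionary means an unsupported data type fails early with a clear error instead of producing a malformed resource name.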

Reference architecture overview

The following diagram shows the reference architecture and the primary components of the healthcare analytics platform on Google Cloud.

Reference architecture of the healthcare analytics platform on Google Cloud.

The preceding diagram shows data ingestion into Google Cloud from clinical systems such as electronic health records (EHRs), picture archiving and communication systems (PACS), and historical databases. Operational and financial data is ingested from other systems. After ingestion, the data is transformed into a common data model or data type that lets you perform cross analytics. Provenance and lineage metadata is stored, and harmonized data is available for additional use, such as creating visualizations for provider insights, business applications, and data enrichment.
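
To make the idea of a common data model more concrete, the following Python sketch converts rows from two differently shaped sources into one record type that carries its own provenance, and then groups the records by patient so that clinical and financial data can be analyzed together. The CommonRecord type and every field name (mrn, encounter_date, patient, billed_on) are assumptions made for this example. They do not come from the platform or from any standard data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common data model shared by all sources."""
    patient_id: str
    event_date: str      # ISO 8601 date string
    category: str        # "clinical" or "financial"
    source_system: str   # provenance: where the record came from


def from_ehr(row):
    # Assumed EHR export layout: {"mrn": ..., "encounter_date": ...}
    return CommonRecord(row["mrn"], row["encounter_date"], "clinical", "ehr")


def from_billing(row):
    # Assumed billing export layout: {"patient": ..., "billed_on": ...}
    return CommonRecord(row["patient"], row["billed_on"], "financial", "billing")


def records_by_patient(records):
    """Group harmonized records by patient for cross-source analysis."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.patient_id, []).append(record)
    return grouped


harmonized = [
    from_ehr({"mrn": "P001", "encounter_date": "2021-03-04"}),
    from_billing({"patient": "P001", "billed_on": "2021-03-10"}),
]
by_patient = records_by_patient(harmonized)
```

After this step, by_patient["P001"] holds both a clinical and a financial record, each still labeled with the system it came from.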

Provider analytics workflow

The following diagram shows the workflow of provider analytics on the healthcare analytics platform on Google Cloud, highlighting the elements of data ingestion, transformation and harmonizing, storage, access, and enrichment.

Provider analytics workflow of the healthcare analytics platform on Google Cloud.

The preceding diagram starts with the sources that you draw data from, such as EHRs, PACS, and clinical data warehouses (CDWs). The next step is data ingestion, which you can do through the Cloud Healthcare API, and also by using Cloud Data Fusion and Cloud Storage. You then harmonize data through structural transformations, including conversion and de-identification, to prepare it for storage. The data and metadata are persisted in BigQuery through the Cloud Healthcare API, and also in Cloud Storage. The final steps of the workflow are data enrichment and discovery. You can explore data in BigQuery, run notebook-based analysis, enrich your data by using the Google Cloud Marketplace public datasets, and use AI Platform Prediction and AI Platform Training for advanced analytics.
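
The harmonization step in this workflow can be pictured as an ordered chain of structural transformations. The following Python sketch is a toy illustration of that ordering only. It does not call the Cloud Healthcare API or its de-identification features, and the function names, the identifier list, and the sample record are all invented for this example.

```python
from functools import reduce

# Assumed set of direct identifiers to drop. A real de-identification
# policy would be far more extensive.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def convert(record):
    """Structural conversion: normalize key names to lower case."""
    return {key.lower(): value for key, value in record.items()}


def de_identify(record):
    """Drop fields that directly identify a patient."""
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS
    }


def harmonize(record, steps=(convert, de_identify)):
    """Apply each transformation in order and return the final record."""
    return reduce(lambda current, step: step(current), steps, record)


raw = {
    "Name": "Jane Doe",
    "Phone": "555-0100",
    "Diagnosis": "J45",
    "Admitted": "2021-03-04",
}
harmonized_record = harmonize(raw)
```

In this sketch the order of the steps matters: conversion runs first so that the de-identification step can match lower-case key names. Only the diagnosis and admitted fields remain in harmonized_record, ready to be written to storage.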

What's next