Real-World Insights reference architecture

This article provides guidance for building and deploying the following platform components of Real-World Insights:

  • Web-based user interfaces for building studies and enrolling participants
  • Backend services for managing the flow of data
  • Mobile applications that participants use to discover, enroll, and participate in studies

Introduction

Real-World Insights is based on an open source implementation of the FDA MyStudies platform. Clinical researchers use Real-World Insights in the following industries:

  • Commercial pharmaceutical and medical technology
  • Health systems and research centers that collect patient-generated health data in a dedicated environment

Clinical researchers within these industries can deploy Real-World Insights using the Google Cloud Healthcare Data Protection Suite and Real-World Insights deployment templates.

The FDA released the MyStudies code and documentation to support clinical trials and real-world evidence studies. Real-World Insights extends that open source platform with enhanced security, configurable privacy controls, and interoperability with Google Cloud managed services.

Reference architecture diagrams

Deployment reference architecture diagram

The diagram shows the following components that are created during the deployment:

DevOps project: created by the user with a manual Terraform deployment

  • Identity and Access Management (IAM): permissions for the projects and folder
  • Terraform: used to deploy the other components
  • Cloud Storage: to store the state of the deployment
  • Cloud Build: used for CI/CD with configurations pulled from a GitHub repository

Audit project: created by an automated Terraform deployment

  • Cloud Storage: log sink bucket for storing logs from all other components
  • BigQuery: log sink dataset for analysis of logs from all other components

Network project: created by an automated Terraform deployment

  • Virtual Private Cloud: including networks and subnetworks
  • Cloud NAT: allows internet connectivity for the private bastion host and GKE cluster
  • Cloud Router: routing traffic to and from components
  • Compute Engine: an instance which serves as a bastion for admin access to components

Firebase project: created by an automated Terraform deployment

  • Pub/Sub: used for triggers based on survey responses; this is an optional component that can be used to trigger actions when survey responses are submitted
  • Firestore: raw ingested data from survey responses

Secrets project: created by an automated Terraform deployment

  • Secret Manager: used by components to store secrets and variables

Apps project: created by an automated Terraform deployment

  • Cloud DNS: DNS records for the static ingress IP address that mobile apps and users connect to
  • IAM: service accounts to be used by the components
  • Cloud Build: triggers to build and deploy the application containers in the GKE cluster
  • Google Kubernetes Engine (GKE): a GKE cluster into which the various application components will be deployed

Data project: created by an automated Terraform deployment

  • BigQuery: survey data from Firestore for analytics; this is an optional component that allows for analysis of survey data using BigQuery
  • Cloud Storage: used for consent documents submitted by participants
  • Cloud Storage: used for study designs created in Study Builder
  • Cloud Storage: to store SQL commands and scripts used by components
  • Cloud SQL: MySQL databases used by the GKE application components
  • IAM: IAM bindings

Application components reference architecture

The diagram shows the application components in the GKE cluster and their interactions:

  • The deployments and services are deployed into a single GKE cluster in the Apps project
  • The Auth Server component uses Cloud SQL in the Data project as storage
  • The Study Builder web app component uses Cloud Storage in the Data project to store study designs and Cloud SQL in the Data project to store Study Builder data
  • The Study Datastore component exposes the data stored in the Study Builder Cloud SQL database
  • The Response Datastore component uses Firestore in the Firebase project to store response data and Cloud SQL in the Data project to store activity data. The Firestore data can also be synced to BigQuery in the Data project for analysis
  • The Participant Datastore consists of three separate containerized services which use Cloud SQL in the Data project as storage:
    • User Management Service
    • Enrollment Management Service
    • Consent Management Service, which also stores consent documents in Cloud Storage
  • The Participant Manager web app component uses the same Cloud SQL database as the Participant Datastore and accesses the consent documents from Cloud Storage
  • The Response Datastore, Participant Datastore, and Participant Manager connect to the Auth Server for authentication
  • Study Builder web app includes its own authentication
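
The relationships in the list above can be captured in a small lookup table, which makes it easier to answer questions such as "which components are affected if this store is unavailable?". The following Python sketch is illustrative only: the store labels (for example, "Cloud SQL (Data)") and the `components_using` helper are names invented for this example, not identifiers from the platform.

```python
# Backing stores for each application component, as described in the list above.
# The "<service> (<project>)" labels are shorthand invented for this sketch.
COMPONENT_STORES = {
    "Auth Server": {"Cloud SQL (Data)"},
    "Study Builder": {"Cloud Storage (Data)", "Cloud SQL (Data)"},
    "Study Datastore": {"Cloud SQL (Data)"},
    "Response Datastore": {"Firestore (Firebase)", "Cloud SQL (Data)"},
    "Participant Datastore": {"Cloud SQL (Data)", "Cloud Storage (Data)"},
    "Participant Manager": {"Cloud SQL (Data)", "Cloud Storage (Data)"},
}

# Components that rely on the Auth Server; Study Builder has its own authentication.
USES_AUTH_SERVER = {"Response Datastore", "Participant Datastore", "Participant Manager"}


def components_using(store):
    """Return the sorted names of components that depend on the given store."""
    return sorted(name for name, stores in COMPONENT_STORES.items() if store in stores)
```

For example, `components_using("Firestore (Firebase)")` returns only the Response Datastore, while every component depends on Cloud SQL in the Data project.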

Terminology

The following list provides terms used in this page and their definitions:

  • Participant: A mobile app user who is enrolled in a study is referred to as a "participant" and is associated with a unique participant ID. A single mobile app user can enroll in multiple studies and is a unique participant in each study.
  • Study Content: All the content that is required to carry out a study such as study eligibility criteria, consent forms, questionnaires, and response types.
  • Response Data: Responses provided by a participant to questionnaires and activities within the context of a study.
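
The distinction between a mobile app user and a participant can be illustrated with a minimal data model. This is a sketch under stated assumptions, not the platform's schema: the `AppUser` class, its field names, and the use of random UUIDs as participant IDs are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AppUser:
    """A mobile app user, who becomes a participant on enrolling in a study."""

    user_id: str
    # Maps study ID to participant ID: one distinct participant ID per study.
    participant_ids: dict = field(default_factory=dict)

    def enroll(self, study_id: str) -> str:
        """Return the participant ID for a study, creating it on first enrollment."""
        if study_id not in self.participant_ids:
            # Assumption: a random UUID stands in for the real ID scheme.
            self.participant_ids[study_id] = uuid.uuid4().hex
        return self.participant_ids[study_id]
```

Enrolling the same user in two studies yields two different participant IDs, so data from one study does not share a participant ID with data from the other.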

Architecture components

The Real-World Insights platform includes the following components:

  • Study Builder (UI)
  • Study Datastore
  • Auth Server
  • Participant Datastore
  • Response Datastore
  • Participant Manager (UI)
  • Mobile Apps (UI)

Study Builder

The Study Builder provides a user interface for study administrators to create and launch studies and to manage study content during the course of a study. It does not handle any patient or participant information. The backend database is a MySQL database, which is shared with the Study Datastore. The Study Datastore provides this study information to all downstream applications. In addition, the Study Builder has its own built-in authentication and authorization functionality. The Study Builder is a Java application built on the Spring framework, and is deployed on Google Kubernetes Engine and Cloud SQL.

Study Datastore

The Study Datastore provides REST APIs for downstream client applications to obtain study content that was created using the Study Builder. The backend database is the shared MySQL database also used by the Study Builder. The Study Datastore uses basic authentication, with an API username and password provided to each client application. The Study Datastore is a Java application built on the Spring framework, and is deployed on Google Kubernetes Engine and Cloud SQL.
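
To make the basic authentication step concrete, the following Python sketch shows how a client application might attach its API credentials to a request. It is a minimal illustration that uses only the standard library; the base URL, the path, and the `build_study_request` helper are hypothetical and do not reflect the actual Study Datastore API.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_study_request(base_url: str, path: str, username: str, password: str):
    """Prepare (but do not send) a GET request with basic authentication.

    The URL and path are placeholders; substitute the values for your deployment.
    """
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

Because basic authentication sends credentials that are only base64-encoded, not encrypted, such requests should always be made over HTTPS.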

Auth Server

The Auth Server is the centralized authentication mechanism for client applications in the MyStudies platform, such as the mobile apps and the Participant Manager. The Auth Server includes an OAuth SCIM server, follows OpenID Connect (OIDC) flows, and uses ORY Hydra for token generation.

The Auth Server provides the following functionality to support mobile app users:

  • User registration
  • User credentials management
  • User authentication
  • Token management
  • User logout

The Auth Server provides the following functionality to support server-to-server authentication:

  • Client credentials management (client ID and secret)
  • Client credentials validation

The Auth Server is a Spring Boot application, and is deployed on Google Kubernetes Engine and Cloud SQL.
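
For server-to-server authentication, a calling service typically exchanges its client ID and secret for an access token. The following Python sketch assumes the standard OAuth 2.0 client credentials grant (RFC 6749, section 4.4) with HTTP Basic client authentication; the source does not state which grant the Auth Server uses, and the token URL and the `build_token_request` helper are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str):
    """Prepare (but do not send) an OAuth 2.0 client credentials token request.

    Assumption: the token endpoint accepts the client ID and secret through
    HTTP Basic authentication. The token URL depends on the deployment.
    """
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    request = urllib.request.Request(token_url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + credentials)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

Sending the prepared request with `urllib.request.urlopen` would return a JSON document containing the access token if the credentials are valid.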

Response Datastore

The Response Datastore acts as a resource server that stores participant response data and participant activity state data, and provides REST APIs for accessing that data. The Response Datastore requires a valid access token and client token to provide the resource to the participant. It does not store user data that may identify the participant. The Response Datastore is a Spring Boot application deployed on Google Kubernetes Engine, with Cloud SQL for activity data storage and Firestore for response data storage.
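
The two-token requirement can be sketched as a simple authorization check on the resource server side. This Python example is an assumption-laden illustration, not the platform's implementation: the `X-Client-Token` header name, the bearer-token format, and the injected validator functions are all hypothetical.

```python
from typing import Callable, Mapping


def authorize(
    headers: Mapping[str, str],
    is_valid_access_token: Callable[[str], bool],
    is_valid_client_token: Callable[[str], bool],
) -> int:
    """Return HTTP status 200 only if both tokens are present and valid, else 401."""
    access = headers.get("Authorization", "")
    client = headers.get("X-Client-Token", "")  # hypothetical header name
    prefix = "Bearer "
    if not access.startswith(prefix):
        return 401
    if not is_valid_access_token(access[len(prefix):]):
        return 401
    if not client or not is_valid_client_token(client):
        return 401
    return 200
```

Passing the validators in as functions keeps the check independent of how tokens are verified, which in a real deployment would involve the Auth Server.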

Participant Datastore

The Participant Datastore consists of three REST-based services which are deployed separately:

  • User Management: manage mobile app user registration and profile
  • Enrollment Management: manage mobile app user enrollment data
  • Consent Management: store and manage mobile app user study consent status and documents

These services are resource servers that store mobile app user information and require a valid access token and client token to provide the resource to the participant.

These services are Spring Boot applications deployed on Google Kubernetes Engine, with a single Cloud SQL database for backend data storage and Cloud Storage for consent document storage.

Participant Manager

The Participant Manager is an Angular web application which provides the following functionality:

  • Manage sites for a study
  • Manage the registry of participants for a study
  • Generate and distribute enrollment tokens to pre-screened study participants

The web application provides a user interface for study and site administrators to create study sites, add participants to the registry, manage these participants, and perform other related activities.

The Participant Manager consists of Spring Boot services and an Angular web application, both deployed on Google Kubernetes Engine. Participant Manager uses the same backend database as the Participant Datastore and accesses consent documents from Cloud Storage.

Mobile apps

The Real-World Insights platform also includes mobile applications that study participants use to discover, enroll in, and participate in studies. Study workflow features are provided by ResearchStack for the Android app and Apple ResearchKit for the iOS app.

The mobile apps communicate with the individual services directly to store or retrieve information such as consent documents, participant information, and study responses.

Study participants use the mobile apps for study activities, such as completing consent documents or study questionnaires. The content delivered to the mobile apps and the study activities presented are both defined using the Study Builder. The branding within the apps can also be customized during deployment.

What’s next