The Architecture Center provides content resources across a wide variety of big data and analytics subjects.
Big data and analytics resources in the Architecture Center
Analyzing FHIR data in BigQuery Explains the processes and considerations for analyzing Fast Healthcare Interoperability Resources (FHIR) data in BigQuery. Products used: BigQuery |
Architecture and functions in a data mesh Guidance on implementing a data mesh in Google Cloud... |
Architecture: Marketing Data Warehouse Provides a reference architecture that describes how you can build scalable marketing data warehouses. Marketing data warehouse solutions let you deliver timely, targeted, and tailored advertising experiences to your users while... Products used: AI Platform, AutoML, BigQuery, Cloud Data Fusion, Dataflow, Dataprep by Trifacta, Google Analytics, Looker |
Automatically apply sensitivity tags in Data Catalog to files, databases, and BigQuery tables Shows how to use Data Catalog with an automated Dataflow pipeline to identify and apply data sensitivity tags to your data in Cloud Storage files, relational databases, and BigQuery. Products used: Cloud Build, Cloud Data Loss Prevention, Cloud SQL, Cloud Storage, Compute Engine, Data Catalog, Dataflow, Secret Manager |
Build and visualize demand forecast predictions using Datastream, Dataflow, BigQuery ML, and Looker Shows you how to replicate and process operational data from an Oracle database into Google Cloud in real time. It also demonstrates how to forecast future demand, and how to visualize this forecast data as it arrives. Products used: BigQuery, Dataflow, Looker |
Building an ML vision analytics solution with Dataflow and Cloud Vision API How to deploy a Dataflow pipeline to process large-scale image files with Cloud Vision. Dataflow stores the results in BigQuery so that you can use them to train BigQuery ML pre-built models. The Dataflow pipeline you... Products used: BigQuery, Cloud Build, Cloud Pub/Sub, Cloud Storage, Cloud Vision, Dataflow |
Building custom data integrations using Fivetran and Cloud Functions Shows you how to use the Fivetran connector for Cloud Functions. Fivetran offers standardized data connectors for a wide range of business apps, event... Products used: BigQuery, Cloud Functions, Cloud Storage |
Cloud Monitoring metric export Describes a way to export Cloud Monitoring metrics for long-term analysis. Products used: App Engine, BigQuery, Cloud Monitoring, Cloud Pub/Sub, Cloud Scheduler, Datalab, Looker Studio |
Continuous data replication to BigQuery using Striim Demonstrates how to migrate a MySQL database to BigQuery using Striim. Striim is a comprehensive streaming extract, transform, and load (ETL) platform. Products used: BigQuery, Cloud SQL for MySQL, Compute Engine |
Continuous data replication to Cloud Spanner using Striim How to migrate a MySQL database to Cloud Spanner using Striim. This document focuses on the implementation of a continuous replication from Cloud SQL for MySQL to Cloud Spanner. Products used: Cloud SQL, Cloud SQL for MySQL, Cloud Spanner, Compute Engine |
Data analytics design patterns Provides links to business use cases, sample code, and technical reference guides for industry data analytics use cases. Use these resources to learn and to identify best practices that accelerate the implementation of your workloads. The design... |
Data science with R on Google Cloud: Exploratory data analysis Shows you how to get started with data science at scale with R on Google Cloud. This document is intended for those who have some experience with R and with Jupyter notebooks, and who are comfortable with SQL. Products used: BigQuery, Cloud Storage, Notebooks |
Discusses how to use Sensitive Data Protection to create an automated data transformation pipeline to de-identify sensitive data like personally identifiable information (PII). Products used: BigQuery, Cloud Data Loss Prevention, Cloud Pub/Sub, Cloud Storage, Dataflow, Identity and Access Management |
Example architecture for using a DLP proxy to query a database containing sensitive data Describes using Sensitive Data Protection to mitigate the risk of exposing sensitive data stored in Google Cloud databases to users, and yet still let them query meaningful data. Products used: Cloud Audit Logs, Cloud Data Loss Prevention, Cloud Key Management Service |
Genomic data processing reference architecture Describes reference architectures for using the Cloud Life Sciences API with other Google Cloud products to perform genomic data processing by using different methods and workflow engines. Specifically, this document focuses on the... Products used: Cloud Life Sciences, Cloud Storage, Compute Engine |
Geospatial analytics architecture Learn about Google Cloud geospatial capabilities and how you can use these capabilities in your geospatial analytics applications. Products used: BigQuery, Dataflow |
Import data from an external network into a secured BigQuery data warehouse Describes an architecture that you can use to help secure a data warehouse in a production environment, and provides best practices for importing data into BigQuery from an external network such as an on-premises environment. Products used: BigQuery |
Import data from Google Cloud into a secured BigQuery data warehouse Describes an architecture that you can use to help secure a data warehouse in a production environment, and provides best practices for data governance of a data warehouse in Google Cloud. Products used: BigQuery, Cloud Data Loss Prevention, Cloud Key Management Service, Dataflow |
Ingesting clinical and operational data with Cloud Data Fusion Explains to researchers, data scientists, and IT teams how Cloud Data Fusion can unlock data by ingesting, transforming, and storing the data in BigQuery, an aggregated data warehouse on Google Cloud. Healthcare organizations rely on... Products used: BigQuery, Cloud Data Fusion, Cloud Storage |
Jump Start Solution: Analytics lakehouse Helps you understand and deploy the Analytics lakehouse Jump Start Solution. |
Jump Start Solution: Data warehouse with BigQuery Demonstrates how you can build a data warehouse in Google Cloud using BigQuery as your data warehouse, with Looker Studio as a dashboard and visualization tool. |
Jump Start Solution: Large data sharing Go web app Demonstrates a Go app that can handle large quantities of file operations. |
Jump Start Solution: Large data sharing Java web app Demonstrates a Java app that can handle large quantities of file operations. |
Helps you plan, design, and implement the process of migrating your application and infrastructure workloads to Google Cloud, including computing, database, and storage workloads. Products used: App Engine, Cloud Build, Cloud Data Fusion, Cloud Deployment Manager, Cloud Functions, Cloud Run, Cloud Storage, Container Registry, Data Catalog, Dataflow, Direct Peering, Google Kubernetes Engine (GKE), Transfer Appliance |
Migrating On-Premises Hadoop Infrastructure to Google Cloud Guidance on moving on-premises Hadoop workloads to Google Cloud... Products used: BigQuery, Cloud Storage, Dataproc |
Optimizing large-scale ingestion of analytics events and logs Describes an architecture for optimizing large-scale analytics ingestion on Google Cloud, where 'large-scale' means greater than 100,000 events per second, or having a total aggregate event payload size of over 100 MB per second. Products used: BigQuery, Cloud Logging, Cloud Pub/Sub, Compute Engine, Dataflow |
Performing ETL from a relational database into BigQuery using Dataflow Demonstrates how to use Dataflow to extract, transform, and load (ETL) data from an online transaction processing (OLTP) relational database into BigQuery for analysis. Products used: BigQuery, Cloud Storage, Compute Engine, Dataflow |
Propensity modeling for gaming applications Learn how to use BigQuery ML to train, evaluate, and get predictions from several different types of propensity models. Propensity models can help you to determine the likelihood of specific users returning to your app, so you can use that... |
Security log analytics in Google Cloud Shows how to collect, export, and analyze logs from Google Cloud to help you audit usage and detect threats to your data and workloads. Use the included threat detection queries for BigQuery or Chronicle, or bring your own SIEM. Products used: BigQuery, Cloud Logging, Compute Engine, Looker Studio |
Set up a regulatory reporting architecture with BigQuery Shows you how to get started with a regulatory reporting solution for cloud and run a basic pipeline. Products used: BigQuery, Cloud Storage |
Smart API to predict customer propensity to purchase by using Apigee, BigQuery ML, and Cloud Spanner Create an API that can predict how likely a customer is to make a purchase. Products used: Apigee, AppSheet, BigQuery ML, Cloud Spanner |
Tracking provenance and lineage metadata for healthcare data Describes how to track provenance and lineage metadata for healthcare data in Google Cloud for researchers, data scientists, and IT teams. Provenance and lineage metadata can help healthcare organizations track where their clinical and... Products used: BigQuery, Cloud Data Fusion, Cloud Storage |
Transforming and harmonizing healthcare data for BigQuery Describes the processes and considerations involved in harmonizing healthcare data on Google Cloud for researchers, data scientists, and IT teams who want to create an analytics data lake in BigQuery. Clinical data can be inaccurate and... Products used: BigQuery, Cloud Data Fusion, Cloud Data Loss Prevention, Cloud Storage |
Use a CI/CD pipeline for data-processing workflows Describes how to set up a continuous integration/continuous deployment (CI/CD) pipeline for processing data by implementing CI/CD methods with managed products on Google Cloud. Products used: Cloud Build, Cloud Composer, Cloud Source Repositories, Cloud Storage, Compute Engine, Dataflow |
Shows how to use Apache Hive on Dataproc in an efficient and flexible way by storing Hive data in Cloud Storage and hosting the Hive metastore in a MySQL database on Cloud SQL. Products used: Cloud SQL, Cloud Storage, Dataproc |
Using Fivetran and ELT with BigQuery How your organization can benefit from replacing extract, transform, and load (ETL) with extract, load, and transform (ELT) by using Fivetran and BigQuery. It's intended for analysts, data scientists, and data engineers whose... Products used: BigQuery |
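The entry on optimizing large-scale ingestion of analytics events and logs defines 'large-scale' as more than 100,000 events per second, or a total aggregate event payload of over 100 MB per second. As a rough illustration only, that definition can be written as a simple check. The function name, the parameter names, and the reading of "MB" as decimal megabytes (1,000,000 bytes) are assumptions made for this sketch and do not come from the referenced document.

```python
# Thresholds quoted in the catalog entry on large-scale ingestion.
EVENTS_PER_SECOND_THRESHOLD = 100_000
# Assumption: "MB" is read as decimal megabytes (1,000,000 bytes).
BYTES_PER_SECOND_THRESHOLD = 100 * 1_000_000


def is_large_scale(events_per_second: float, payload_bytes_per_second: float) -> bool:
    """Return True when either rate exceeds its threshold.

    Hypothetical helper for illustration; it is not part of any Google Cloud API.
    """
    return (
        events_per_second > EVENTS_PER_SECOND_THRESHOLD
        or payload_bytes_per_second > BYTES_PER_SECOND_THRESHOLD
    )
```

Under these assumptions, a workload that produces 150,000 small events per second qualifies on event rate alone, and a workload that produces only a few events per second qualifies if those events total more than 100 MB per second.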