Data analytics design patterns

Last reviewed 2023-02-06 UTC

This page provides links to business use cases, sample code, and technical reference guides for industry data analytics use cases. Use these resources to learn, identify best practices to accelerate the implementation of your workloads.

The design patterns listed here are code-oriented use cases and meant to get you quickly to implementation. To see a broader range of analytics solutions, review the list of Data Analytics technical reference guides.

Anomaly detection

Solution	Description	Products	Links
Finding anomalies in time series data by using an LSTM autoencoder	Use this reference implementation to learn how to pre-process time series data to fill gaps in the source data, then run the data through an LSTM autoencoder to identify anomalies. The autoencoder is built as a Keras model that implements an LSTM neural network.	BigQuery Dataflow Pub/Sub	Sample code: Processing time-series data
Real-time credit card fraud detection	Learn how to use transactions and customer data to train machine learning models in BigQuery ML that can be used in a real-time data pipeline to identify, analyze, and trigger alerts for potential credit card fraud.	Vertex AI BigQuery Looker Studio Dataflow Firestore Pub/Sub	Sample code: Real-time credit card fraud detection Overview video: Fraudfinder: A comprehensive solution for real data science problems
Relative strength modeling on time series for Capital Markets	This pattern is particularly relevant for Capital Markets customers and their quantitative analysis departments (Quants), to track their technical indicators in real-time to make investment decisions or track indexes. It is built on a foundation of time series anomaly detection, and can easily be applied to other industries like manufacturing, to detect anomalies in relevant time-series metrics.	Vertex AI BigQuery Dataflow Pub/Sub	Sample code: Dataflow Financial Services Time-Series Example Business & Technical blog post: How to detect machine-learned anomalies in real-time foreign exchange data

Solution

Description

Products

Links

Finding anomalies in time series data by using an LSTM autoencoder

Use this reference implementation to learn how to pre-process time series data to fill gaps in the source data, then run the data through an LSTM autoencoder to identify anomalies. The autoencoder is built as a Keras model that implements an LSTM neural network.

Sample code: Processing time-series data

Real-time credit card fraud detection

Learn how to use transactions and customer data to train machine learning models in BigQuery ML that can be used in a real-time data pipeline to identify, analyze, and trigger alerts for potential credit card fraud.

Sample code: Real-time credit card fraud detection

Overview video: Fraudfinder: A comprehensive solution for real data science problems

Relative strength modeling on time series for Capital Markets

This pattern is particularly relevant for Capital Markets customers and their quantitative analysis departments (Quants), to track their technical indicators in real-time to make investment decisions or track indexes. It is built on a foundation of time series anomaly detection, and can easily be applied to other industries like manufacturing, to detect anomalies in relevant time-series metrics.

Sample code: Dataflow Financial Services Time-Series Example

Business & Technical blog post: How to detect machine-learned anomalies in real-time foreign exchange data

Environmental, social, and governance

Solution	Description	Products	Links
Calculating physical climate risk for sustainable finance	Introducing a climate risk analytics design pattern for lending and investment portfolios using cloud-native tools and granular geospatial datasets.	BigQuery Cloud Run Looker Studio	Technical overview: Portfolio climate risk analytics Bitbucket repository Overview video: Leveraging Independent ESG Data Insights Blog post: Quantifying portfolio climate risk for sustainable investing with geospatial analytics

Solution

Description

Products

Links

Calculating physical climate risk for sustainable finance

Introducing a climate risk analytics design pattern for lending and investment portfolios using cloud-native tools and granular geospatial datasets.

Technical overview: Portfolio climate risk analytics Bitbucket repository

Overview video: Leveraging Independent ESG Data Insights

Blog post: Quantifying portfolio climate risk for sustainable investing with geospatial analytics

General analytics

Solution	Description	Products	Links
Building a real-time website analytics dashboard	Learn how to build a dashboard that provides real-time metrics you can use to understand the performance of incentives or experiments on your website.	Memorystore Compute Engine Dataflow Pub/Sub	Sample code: Realtime Analytics using Dataflow and Memorystore Overview video: Level Up - Real-time analytics using Dataflow and Memorystore
Building a pipeline to transcribe and analyze speech files	Learn how to transcribe and analyze uploaded speech files, then save that data to BigQuery for use in visualizations.	BigQuery Cloud Functions Cloud Natural Language API Cloud Storage Dataflow Pub/Sub Speech-to-Text	Sample code: Speech Analysis Framework
Analyze unstructured data in object stores	Learn how to analyze unstructured data in Cloud Storage, enabling analysis with remote functions like Vertex AI Vision on images. Learn how to perform inference on unstructured data using BigQuery ML.	BigQuery Cloud Storage BigLake Vertex AI Vision BigQuery ML	Technical reference guide: Introduction to object tables Tutorial: Analyze an object table by using a remote function and Cloud Vision API Tutorial: Run inference on image object tables by using TensorFlow and BigQuery ML
Analyze unstructured document files in a data warehouse	Learn how to use BigLake object tables and remote functions to parse unstructured documents with Document AI and save the output as structured data in BigQuery.	BigQuery Cloud Storage Document AI BigLake	Sample code: Unstructured document analysis in SQL
Building an experience management data warehouse	Learn how to transform survey data into formats that can be used in a data warehouse and for deeper analytics. This pattern applies to customer experience, employee experience, and other experience-focused use cases.	BigQuery Dataprep by Trifacta Google Forms Looker Studio Looker	Technical reference guide: Driving Insight from Google Forms With a Survey Data Warehouse Sample code: Transforming and Loading Survey Data into BigQuery using Dataprep by Trifacta Blog post: Creating an Experience Management (XM) Data Warehouse with Survey Responses Overview video: Creating an Experience Management Data Warehouse with Survey Responses Tutorial: Transform and Load Google Forms Survey Responses into BigQuery
Use Google Trends data for common business needs	Learn how to use the Google Trends Public Dataset from our Google Cloud Datasets to address common business challenges like identifying trends in your retail locations, anticipating product demand, and developing new marketing campaigns.	BigQuery Cloud Functions Analytics Hub Looker	Blog post: Make Informed Decisions with Google Trends Data Overview video: The Google Trends dataset is now in BigQuery Sample code (notebook): Trends Example Notebook Sample code (SQL): Google Trends Sample Queries Sample dashboard: Top 25 Trending Google Search Terms
Understanding and optimizing your Google Cloud spend	Learn how to bring your Google Cloud Billing data into BigQuery to understand and optimize your spend and visualize actionable results in Looker or Looker Studio.	BigQuery Cloud Billing Looker Looker Studio	Blog post: Optimizing your Google Cloud spend with BigQuery and Looker Sample code: Google Cloud Billing Looker Block
Data Driven Price Optimization	Learn how to to react rapidly to market changes to remain competitive, with faster price optimization customers can offer competitive prices to their end users using Google Cloud services, thus increasing sales and their bottom line. This solution uses Dataprep by Trifacta to integrate and standarize data sources, BigQuery to manage and store your pricing models and visualize actionable results in Looker.	BigQuery Dataprep by Trifacta Looker	Blog post: Data Driven Price Optimization Tutorial: Optimizing the price of retail products Sample code: Google Cloud Billing Looker Block

Health care and life sciences

Solution	Description	Products	Links
Running a single-cell genomics analysis	Learn how to configure Dataproc with Dask, RAPIDS, GPUs and JupyterLab, then execute a single-cell genomics analysis.	Dataproc Cloud Storage	Technical overview: Running a genomics analysis with Dask, RAPIDS, and GPUs on Dataproc Sample code: Notebook Blog post: Single-cell genomic analysis accelerated by NVIDIA on Google Cloud

Solution

Description

Products

Links

Running a single-cell genomics analysis

Learn how to configure Dataproc with Dask, RAPIDS, GPUs and JupyterLab, then execute a single-cell genomics analysis.

Technical overview: Running a genomics analysis with Dask, RAPIDS, and GPUs on Dataproc

Sample code: Notebook

Blog post: Single-cell genomic analysis accelerated by NVIDIA on Google Cloud

Log analytics

Solution	Description	Products	Links
Building a pipeline to capture Dialogflow interactions	Learn how to build a pipeline to capture and store Dialogflow interactions for further analysis.	BigQuery Cloud Logging Dataflow Pub/Sub	Sample code: Dialogflow log parser

Pattern recognition

Solution Description Products Links

Solution	Description	Products	Links
Detecting objects in video clips	This solution shows you how to build a real-time video clip analytics solution for object tracking by using Dataflow and the Video Intelligence API, allowing you to analyze large volumes of unstructured data in near real time.	BigQuery Cloud Build Cloud Storage Dataflow Pub/Sub Video Intelligence API	Sample code: Video Analytics Solution Using Dataflow and the Video Intelligence API Apache Beam `Ptransform` for calling Video Intelligence API: apache_beam.ml.gcp.videointelligenceml module
Anonymize (de-identify) and re-identify PII data in your smart analytics pipeline	This series of solutions shows you how to use Dataflow, Sensitive Data Protection, BigQuery, and Pub/Sub to de-identify and re-identify personally identifiable information (PII) in a sample dataset.	BigQuery Cloud Build Sensitive Data Protection Cloud Key Management Service Cloud Storage Dataflow Pub/Sub	Technical reference guides: De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection Sample code: Migrate Sensitive Data in BigQuery Using Dataflow and Cloud Data Loss Prevention

Detecting objects in video clips

This solution shows you how to build a real-time video clip analytics solution for object tracking by using Dataflow and the Video Intelligence API, allowing you to analyze large volumes of unstructured data in near real time.

Sample code: Video Analytics Solution Using Dataflow and the Video Intelligence API

Apache Beam Ptransform for calling Video Intelligence API: apache_beam.ml.gcp.videointelligenceml module

Anonymize (de-identify) and re-identify PII data in your smart analytics pipeline

This series of solutions shows you how to use Dataflow, Sensitive Data Protection, BigQuery, and Pub/Sub to de-identify and re-identify personally identifiable information (PII) in a sample dataset.

Technical reference guides:

De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection

Sample code: Migrate Sensitive Data in BigQuery Using Dataflow and Cloud Data Loss Prevention

Predictive forecasting

Solution	Description	Products	Links
Building a demand forecasting model	Learn how to build a time series model that you can use to forecast retail demand for multiple products.	BigQuery	Blog post: How to build demand forecasting models with BigQuery ML Notebook: bqml_retail_demand_forecasting.ipynb
Building a forecasting web app	Learn how to build a web app that leverages multiple forecasting models, including BigQuery and Vertex AI forecasting, to predict product sales. Nontechnical users can use this web app to produce forecasts and explore the effects of different parameters.	BigQuery Vertex AI forecasting	Sample code: Time-series forecasting Sample web app: Time-series forecasting live demo
Building new audiences based on current customer lifetime value	Learn how to identify your most valuable current customers and then use them to develop similar audiences in Google Ads.	BigQuery Google Ads	Technical reference guide: Building new audiences based on existing customer lifetime value Sample code: Activate on LTV predictions
Forecasting from Google Sheets using BigQuery ML	Learn how to operationalize machine learning with your business processes by combining Connected Sheets with a forecasting model in BigQuery ML. In this specific example, we'll walk through the process for building a forecasting model for website traffic using Google Analytics data. This pattern can be extended to work with other data types and other machine learning models.	BigQuery Google Sheets	Blog post: How to use a machine learning model from Google Sheets using BigQuery ML Sample code: BigQuery ML Forecasting with Sheets Template: BigQuery ML Forecasting with Sheets
Propensity modeling for gaming applications	Learn how to use BigQuery ML to train, evaluate, and get predictions from several different types of propensity models. Propensity models can help you to determine the likelihood of specific users returning to your app, so you can use that information in marketing decisions.	BigQuery Google Analytics 4	Blog post: Churn prediction for game developers using Google Analytics 4 and BigQuery ML Notebook: Churn prediction for game developers using Google Analytics 4 and BigQuery ML Technical overview: Propensity modeling for gaming applications
Recommending personalized investment products	Learn how to to provide personalized investment recommendations, by ingesting, processing, and enhancing market data from public APIs using Cloud Functions, loading data in BigQuery with Dataflow, and then training and deploying multiple AutoML Tables models with Vertex AI, orchestrating these pipelines with Cloud Composer and finally deploying a basic web frontend to recommend investments to users.	Cloud Functions Vertex AI BigQuery Dataflow Cloud Composer	Blog post: Empowering consumer finance apps with highly personalized investment recommendations using Vertex AI Technical reference guide: A technical solution producing highly-personalized investment recommendations using ML Sample code: FSI design pattern Investment Products Recommendation Engine (IPRE)

Working with data lakes

Solution	Description	Products	Links
Building CI/CD pipelines for a data lake's serverless data processing services	Learn how to set up continuous integration and continuous delivery (CI/CD) for a data lake’s data processing pipelines. Implement CI/CD methods with Terraform, GitHub, and Cloud Build, using the popular GitOps methodology.	BigQuery Cloud Storage Dataflow	Technical overview: Building CI/CD pipelines for a data lake's serverless data processing services
Fine-grained access control for data stored in an object store	Learn how to use BigLake to apply fine-grained permissions (row and column level security) on files stored in an object store. Demonstrate that such security extends to other services, such as Spark run on Dataproc.	BigQuery Cloud Storage Dataproc BigLake	Sample code: Fine-grained access control on BigLake with Spark

Data analytics design patterns

Anomaly detection

Finding anomalies in time series data by using an LSTM autoencoder

Real-time credit card fraud detection

Relative strength modeling on time series for Capital Markets

Environmental, social, and governance

Calculating physical climate risk for sustainable finance

General analytics

Building a real-time website analytics dashboard

Building a pipeline to transcribe and analyze speech files

Analyze unstructured data in object stores

Analyze unstructured document files in a data warehouse

Building an experience management data warehouse

Use Google Trends data for common business needs

Understanding and optimizing your Google Cloud spend

Data Driven Price Optimization

Health care and life sciences

Running a single-cell genomics analysis

Log analytics

Building a pipeline to capture Dialogflow interactions

Pattern recognition

Detecting objects in video clips

Anonymize (de-identify) and re-identify PII data in your smart analytics pipeline

Predictive forecasting

Building a demand forecasting model

Building a forecasting web app

Building new audiences based on current customer lifetime value

Forecasting from Google Sheets using BigQuery ML

Propensity modeling for gaming applications

Recommending personalized investment products

Working with data lakes

Building CI/CD pipelines for a data lake's serverless data processing services

Fine-grained access control for data stored in an object store