Smart analytics reference patterns

This page provides links to sample code and technical reference guides for common analytics use cases. Use these resources to learn, identify best practices, and leverage sample code to build the analytics features that you need.

The reference patterns listed here are code-oriented and meant to get you quickly to implementation. To see a broader range of analytics solutions, review the list of Big data technical reference guides.

Anomaly detection

Solution Description Products Links

Building a telecom network anomaly detection application using k-means clustering

This solution shows you how to build an ML-based network anomaly detection application for telecom networks to identify cybersecurity threats by using Dataflow, BigQuery ML, and Cloud Data Loss Prevention.

Technical reference guide: Building a secure anomaly detection solution using Dataflow, BigQuery ML, and Cloud Data Loss Prevention

Sample code: Anomaly Detection in Netflow logs

Blog post: Anomaly detection using streaming analytics and AI

Overview video: Building a Secure Anomaly Detection Solution
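One common way to use k-means clusters for anomaly detection is to flag points that lie far from every cluster centroid. The following is a minimal sketch of that idea in plain Python, not a reproduction of the linked solution: the centroids, the feature values, and the threshold are hypothetical, and in practice the centroids would come from a trained model.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Return the Euclidean distance from point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def flag_anomalies(points, centroids, threshold):
    """Return the points whose nearest-centroid distance exceeds threshold."""
    return [
        p for p in points
        if nearest_centroid_distance(p, centroids) > threshold
    ]


# Hypothetical, already-normalized features and centroids (assumptions for
# illustration only; they are not taken from the linked sample code).
centroids = [(0.0, 0.0), (5.0, 5.0)]
points = [(0.1, -0.2), (5.2, 4.9), (10.0, 0.0)]

anomalies = flag_anomalies(points, centroids, threshold=2.0)
print(anomalies)
```

The threshold controls the trade-off between missed anomalies and false alarms, so it is usually tuned against known-good traffic.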

Finding anomalies in financial transactions in real time using BoostedTrees

Use this reference implementation to learn how to identify fraudulent transactions by using a TensorFlow boosted tree model with Dataflow and AI Platform.

Technical reference guide: Detecting anomalies in financial transactions by using AI Platform, Dataflow, and BigQuery

Sample code: Anomaly Detection in Financial Transactions

Finding anomalies in time series data by using an LSTM autoencoder

Use this reference implementation to learn how to pre-process time series data to fill gaps in the source data, then run the data through an LSTM autoencoder to identify anomalies. The autoencoder is built as a Keras model that implements an LSTM neural network.

Sample code: Processing time-series data
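The gap-filling step described above can be illustrated with a forward fill, which repeats the last known value for every missing time slot. This is a minimal sketch in plain Python under that assumption; the linked sample may fill gaps differently, and the `fill_gaps` helper and its inputs are hypothetical.

```python
def fill_gaps(samples, step):
    """Forward-fill missing time slots.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    step: expected spacing between consecutive timestamps.
    """
    if not samples:
        return []
    filled = [samples[0]]
    for ts, value in samples[1:]:
        last_ts, last_value = filled[-1]
        # Repeat the last known value for every slot that has no sample.
        while ts - last_ts > step:
            last_ts += step
            filled.append((last_ts, last_value))
        filled.append((ts, value))
    return filled


# Hypothetical readings taken every 10 seconds, with two readings missing.
raw = [(0, 1.0), (10, 1.2), (40, 0.9)]
print(fill_gaps(raw, step=10))
```

A gap-free, evenly spaced series matters here because a sequence model such as an LSTM assumes a fixed interval between consecutive inputs.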

Real-time credit card fraud detection

Learn how to use transactions and customer data to train machine learning models in BigQuery ML that can be used in a real-time data pipeline to identify, analyze, and trigger alerts for potential credit card fraud.

Sample code: Real-time credit card fraud detection

Technical blog post: How to build a serverless real-time credit card fraud detection solution

Overview video: How to build a serverless real-time credit card fraud detection solution

Webinar: Credit card fraud detection

Relative strength modeling on time series for Capital Markets

This pattern is particularly relevant for Capital Markets customers and their quantitative analysis departments (Quants), who track technical indicators in real time to make investment decisions or to track indexes. It is built on a foundation of time series anomaly detection and can be applied to other industries, such as manufacturing, to detect anomalies in relevant time series metrics.

Sample code: Dataflow Financial Services Time-Series Example

Business & Technical blog post: How to detect machine-learned anomalies in real-time foreign exchange data
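As an illustration of the kind of technical indicator involved, the sketch below computes a relative strength index (RSI) over a window of prices. It assumes that "relative strength" here refers to RSI, and it uses the simple-average variant rather than Wilder's smoothed version; the function name and the sample prices are hypothetical and are not taken from the linked sample code.

```python
def rsi(prices, period=14):
    """Relative strength index over the last `period` price changes.

    Uses simple averages of gains and losses (no exponential smoothing).
    """
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    # Pair each price in the window with its successor to get the changes.
    changes = [b - a for a, b in zip(prices[-period - 1:-1], prices[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: the index saturates at 100.
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical closing prices.
print(rsi([10, 11, 12, 11, 12], period=4))
```

In a streaming setting, the same calculation would be applied to each new window of prices as it closes.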

Data monetization

Solution Description Products Links

Listing your data for sale in Google Cloud Marketplace using the Datashare Toolkit

Learn how to exchange and monetize historical and real-time market data securely and easily. This reference solution works for market data publishers, aggregators, and consumers alike.

Technical overview: Datashare Toolkit Readme

Sample code: Datashare Toolkit

Overview video: Datashare Overview

Deployment (Google Cloud account needed): Datashare VMs

General analytics

Solution Description Products Links

Building a real-time website analytics dashboard

Learn how to build a dashboard that provides real-time metrics you can use to understand the performance of incentives or experiments on your website.

Sample code: Realtime Analytics using Dataflow and Memorystore

Overview video: Level Up - Real-time analytics using Dataflow and Memorystore

Building a pipeline to transcribe and analyze speech files

Learn how to transcribe and analyze uploaded speech files, then save that data to BigQuery for use in visualizations.

Sample code: Speech Analysis Framework

Building an experience management data warehouse

Learn how to transform survey data into formats that can be used in a data warehouse and for deeper analytics. This pattern applies to customer experience, employee experience, and other experience-focused use cases.

Technical reference guide: Driving Insight from Forms With a Survey Data Warehouse

Sample code: Transforming and Loading Survey Data into BigQuery using Dataprep by Trifacta

Blog post: Creating an Experience Management (XM) Data Warehouse with Survey Responses

Overview video: Creating an Experience Management Data Warehouse with Survey Responses

Tutorial: Transform and Load Forms Survey Responses into BigQuery

Demo experience: Cloud Market Research

Captioning media clips in real time

Learn how to create real-time WebVTT captions for audio or video clips by using the Speech-to-Text API in a Dataflow pipeline.

Technical reference guide: Captioning media clips in real time by using Dataflow, Pub/Sub, and the Speech-to-Text API

Sample code: Automatic WebVTT Caption From Streaming Speech-to-Text API By Using Dataflow
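To show what the caption output looks like, the sketch below formats timed transcript segments as a WebVTT document. It does not call the Speech-to-Text API or Dataflow; the `segments` list is a hypothetical stand-in for transcription results, and the helper names are assumptions made for illustration.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # A blank line separates cues.
    return "\n".join(lines)


# Hypothetical transcript segments with start and end offsets in seconds.
segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 5.0, "Let's get started."),
]
print(to_webvtt(segments))
```

Each cue pairs a time range with the text to display, which is why the transcription results need word or segment time offsets.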

Creating a Unified App Analytics Platform

Learn how to centralize your data sources in a data warehouse and dig deeper into customer behavior to make informed business decisions.

Technical reference guide: Creating a unified app analytics platform using Firebase, BigQuery, and Looker

Blog post: Creating a Unified Analytics Platform for Digital Natives

Overview video: Creating a unified app analytics platform

Sample code: Unified Application Analytics

Learn how to use the Google Trends public dataset from Google Cloud Datasets to address common business challenges like identifying trends in your retail locations, anticipating product demand, and developing new marketing campaigns.

Blog post: Make Informed Decisions with Google Trends Data

Overview video: The Google Trends dataset is now in BigQuery

Sample code (notebook): Trends Example Notebook

Sample code (SQL): Google Trends Sample Queries

Sample dashboard: Top 25 Trending Google Search Terms

Understanding and optimizing your Google Cloud spend

Learn how to bring your Google Cloud Billing data into BigQuery to understand and optimize your spend and visualize actionable results in Looker or Data Studio.

Blog post: Optimizing your Google Cloud spend with BigQuery and Looker

Sample code: Google Cloud Billing Looker Block

Health care and life sciences

Solution Description Products Links

Running a single-cell genomics analysis

Learn how to configure Dataproc with Dask, RAPIDS, GPUs, and JupyterLab, then execute a single-cell genomics analysis.

Technical overview: Running a genomics analysis with Dask, RAPIDS, and GPUs on Dataproc

Sample code: Notebook

Blog post: Single-cell genomic analysis accelerated by NVIDIA on Google Cloud

Log analytics

Solution Description Products Links

Building a pipeline to capture Dialogflow interactions

Learn how to build a pipeline to capture and store Dialogflow interactions for further analysis.

Sample code: Dialogflow log parser

Processing logs at scale using Dataflow

Learn to build analytical pipelines that process log entries from multiple sources, then combine the log data in ways that help you extract meaningful information.

Technical reference guide: Processing Logs at Scale Using Dataflow

Sample code: Processing Logs at Scale Using Dataflow

Pattern recognition

Solution Description Products Links

Detecting objects in video clips

This solution shows you how to build a real-time video clip analytics solution for object tracking by using Dataflow and the Video Intelligence API, allowing you to analyze large volumes of unstructured data in near real time.

Sample code: Video Analytics Solution Using Dataflow and the Video Intelligence API

Apache Beam PTransform for calling the Video Intelligence API: apache_beam.ml.gcp.videointelligenceml module

Processing user-generated content using the Video Intelligence API and the Cloud Vision API

This set of solutions describes the architecture for deploying a scalable system to filter image and video submissions by using the Cloud Vision API and the Video Intelligence API.

Architecture: Processing User-Generated Content Using the Video Intelligence API and the Cloud Vision API

Tutorial: Processing User-Generated Content Using the Video Intelligence API and the Cloud Vision API

Sample code: Processing User-generated Content Using the Video Intelligence API and the Cloud Vision API

Apache Beam PTransform for calling the Cloud Vision API: apache_beam.ml.gcp.visionml module

Anonymize (de-identify) and re-identify PII data in your smart analytics pipeline

This series of solutions shows you how to use Dataflow, Cloud Data Loss Prevention, BigQuery, and Pub/Sub to de-identify and re-identify personally identifiable information (PII) in a sample dataset.

Technical reference guides:

Sample code: Migrate Sensitive Data in BigQuery Using Dataflow and Cloud Data Loss Prevention

Predictive forecasting

Solution Description Products Links

Building a demand forecasting model

Learn how to build a time series model that you can use to forecast retail demand for multiple products.

Blog post: How to build demand forecasting models with BigQuery ML

Notebook: bqml_retail_demand_forecasting.ipynb

Building an e-commerce recommendation system

Learn how to build a recommendation system by using BigQuery ML to generate product or service recommendations from customer data in BigQuery. Then, learn how to make that data available to other production systems by exporting it to Google Analytics 360 or Cloud Storage, or programmatically reading it from the BigQuery table.

Technical reference guide: Building an e-commerce recommendation system by using BigQuery ML

Notebook: bqml_retail_recommendation_system.ipynb

Building a k-means clustering model for market segmentation

Learn how to segment Google Analytics 360 audience data for marketing purposes by creating k-means clusters with BigQuery ML.

Technical reference guide: Building a k-means clustering model for market segmentation by using BigQuery ML

Notebook: How to build k-means clustering models for market segmentation using BigQuery ML

Building a propensity model for financial services on Google Cloud

This solution shows how to explore data and build a scikit-learn machine learning (ML) model on Google Cloud. The use case for this solution is a predictive, propensity-to-buy model for financial services. Propensity models are widely used in the financial industry to analyze a prospective customer's inclination to make a purchase, but the best practices described in this solution can be applied to a broad range of ML use cases.

Technical reference guide: Building a propensity model for financial services on Google Cloud

Sample code: Professional Services

Building a propensity to purchase solution

Learn how to build and deploy a propensity to purchase model, use it to get predictions about customer purchasing behavior, and then build a pipeline to automate the workflow.

Technical reference guide: Predicting customer propensity to buy by using BigQuery ML and AI Platform

Sample code: How to build an end-to-end propensity to purchase solution using BigQuery ML and Kubeflow Pipelines

Blog post: How to build an end-to-end propensity to purchase solution using BigQuery ML and Kubeflow Pipelines

Building new audiences based on current customer lifetime value

Learn how to identify your most valuable current customers and then use them to develop similar audiences in Google Ads.

Technical reference guide: Building new audiences based on existing customer lifetime value

Sample code: Activate on LTV predictions

Building a time series demand forecasting model

Learn to build an end-to-end solution for forecasting demand for retail products. Use historical sales data to train a demand forecasting model using BigQuery ML, and then visualize the forecasted values in a dashboard.

Sample code: How to build a time series demand forecasting model using BigQuery ML

Creating and serving embeddings for near real-time recommendations

Learn how to create and serve embeddings to make real-time similar-item recommendations. Use BigQuery ML to create a matrix factorization model that predicts the embeddings, use the open-source ScaNN framework to build a nearest neighbor index, and then deploy the model to AI Platform Prediction for real-time similar-item matching.

Technical reference guide: Architecture of a machine learning system for item matching

Sample code: Real-time Item-to-item Recommendation BigQuery ML Matrix Factorization and ScaNN
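The similar-item lookup can be illustrated with an exact, brute-force nearest neighbor search over item embeddings. This is a minimal sketch only: ScaNN performs approximate search at a much larger scale, and the embedding values, the item names, and the choice of cosine similarity here are hypothetical rather than taken from the linked sample code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(item_id, embeddings, k=2):
    """Return the k item ids most similar to item_id, best match first."""
    query = embeddings[item_id]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in embeddings.items()
        if other != item_id
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Hypothetical item embeddings, such as those a matrix factorization
# model might produce.
embeddings = {
    "guitar": [0.9, 0.1, 0.0],
    "bass": [0.8, 0.2, 0.1],
    "drums": [0.5, 0.5, 0.2],
    "blender": [0.0, 0.1, 0.9],
}
print(most_similar("guitar", embeddings))
```

Brute-force search compares the query against every item, so an approximate index becomes worthwhile once the catalog grows beyond what can be scanned within the latency budget.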

Forecasting from Sheets using BigQuery ML

Learn how to operationalize machine learning with your business processes by combining Connected Sheets with a forecasting model in BigQuery ML. This example walks through building a forecasting model for website traffic by using Google Analytics data. You can extend the pattern to work with other data types and other machine learning models.

Blog post: How to use a machine learning model from Sheets using BigQuery ML

Sample code: BigQuery ML Forecasting with Sheets

Template: BigQuery ML Forecasting with Sheets

Predict mechanical failures using a vision analytics pipeline

This solution guides you through building a Dataflow pipeline to derive insights from large-scale image files stored in a Cloud Storage bucket. Automated visual inspection can help meet manufacturing goals, such as improving quality control processes or monitoring worker safety, while reducing costs.

Sample code: Vision Analytics Solution Using Dataflow and Cloud Vision API

Predicting customer lifetime value

This series shows you how to predict customer lifetime value (CLV) by using AI Platform and BigQuery.

Technical reference guides:

Sample code: Customer Lifetime Value Prediction on Google Cloud

Propensity modeling for gaming applications

Learn how to use BigQuery ML to train, evaluate, and get predictions from several different types of propensity models. Propensity models can help you to determine the likelihood of specific users returning to your app, so you can use that information in marketing decisions.

Blog post: Churn prediction for game developers using Google Analytics 4 and BigQuery ML

Notebook: Churn prediction for game developers using Google Analytics 4 and BigQuery ML

Technical overview: Propensity modeling for gaming applications

Real-time clickstream analytics

Solution Description Products Links

E-commerce sample application using streaming analytics and real-time AI

The e-commerce sample application illustrates common use cases and best practices for implementing streaming data analytics and real-time AI. Use it to learn how to respond dynamically to customer actions by analyzing and responding to events in real time, and how to store, analyze, and visualize that event data for longer-term insights.

Technical overview: E-commerce sample application using streaming analytics and real-time AI

Sample code: E-commerce sample application for Java

Interactive demo: Explore Google's Stream Analytics

Overview video: Activate real-time web experiences with Stream Analytics

Time series analytics

Solution Description Products Links

Processing streaming time series data

Learn about the key challenges around processing streaming time series data when using Apache Beam, and then see how the Timeseries Streaming solution addresses these challenges.

Technical overview: Processing streaming time series data: overview

Tutorial: Processing streaming time series data: tutorial

Sample code: Timeseries Streaming

Working with data lakes

Solution Description Products Links

Building CI/CD pipelines for a data lake's serverless data processing services

Learn how to set up continuous integration and continuous delivery (CI/CD) for a data lake’s data processing pipelines. Implement CI/CD methods with Terraform, GitHub, and Cloud Build, using the popular GitOps methodology.

Technical overview: Building CI/CD pipelines for a data lake's serverless data processing services