Explore Dataflow ML notebooks to integrate machine learning into your Apache Beam pipelines. These notebooks provide practical examples and guidance for common machine learning workflows.
Use these resources to:
- Preprocess data for ML models: Perform tasks like scaling data, computing vocabularies, and using MLTransform for data preparation.
- Run inference with various models and frameworks: Use the RunInference transform with PyTorch, TensorFlow, scikit-learn, Hugging Face models, Gemma models, and Vertex AI, including on GPUs with vLLM.
- Generate and manage embeddings: Create text embeddings by using Vertex AI or Hugging Face, and ingest them into databases like AlloyDB and BigQuery for vector search.
- Implement advanced ML pipeline patterns: Automatically refresh models in running pipelines, use multiple models, build ensemble models, and enrich data by using BigQuery, Bigtable, and Vertex AI Feature Store.
- Apply ML to specific use cases: Examples include anomaly detection, as well as sentiment analysis and summarization with Gemma.
All tutorials
| Category | Notebook | Link |
| --- | --- | --- |
| Core Dataflow & MLTransform Concepts | **Preprocessing with the Apache Beam DataFrames API**: Demonstrates the use of the Apache Beam DataFrames API to perform common data exploration and preprocessing steps. | View Notebook |
| Core Dataflow & MLTransform Concepts | **Preprocess data with MLTransform**: A basic introduction to using MLTransform to preprocess data for machine learning workflows. | View Notebook |
| Data enrichment & Embedding | **Vector Embedding Ingestion with Apache Beam and AlloyDB**: Demonstrates how to generate embeddings from data and ingest them into AlloyDB by using Apache Beam and Dataflow for scalable data processing. | View Notebook |
| Data enrichment & Embedding | **Use Apache Beam and BigQuery to enrich data**: Shows how to enrich data by using the Apache Beam enrichment transform with BigQuery. | View Notebook |
| Data enrichment & Embedding | **Embedding Ingestion and Vector Search with Apache Beam and BigQuery**: Demonstrates how to use the Apache Beam RAG package to generate embeddings, ingest them into BigQuery, and perform vector similarity search. | View Notebook |
| Data enrichment & Embedding | **Use Apache Beam and Bigtable to enrich data**: Shows how to enrich data by using the Apache Beam enrichment transform with Bigtable. | View Notebook |
| Data enrichment & Embedding | **Generate text embeddings by using Hugging Face Hub models**: Uses MLTransform to generate embeddings from text data by using Hugging Face's SentenceTransformers framework. | View Notebook |
| Data enrichment & Embedding | **Use Apache Beam and Vertex AI Feature Store to enrich data**: Shows how to enrich data by using the Apache Beam enrichment transform with Vertex AI Feature Store. | View Notebook |
| Data enrichment & Embedding | **Generate text embeddings by using the Vertex AI API**: Uses the Vertex AI text-embeddings API to generate text embeddings with Google's large generative AI models. | View Notebook |
| Model training & Data processing | **Update ML models in running pipelines**: Demonstrates how to perform automatic model updates without stopping your Apache Beam pipeline by using side inputs. | View Notebook |
| Model training & Data processing | **Compute and apply vocabulary on a dataset**: Shows how to use MLTransform to generate a vocabulary from input text and assign an index value to each token. | View Notebook |
| Model training & Data processing | **Run ML inference with multiple differently-trained models**: Demonstrates how to use a KeyedModelHandler to run inference in an Apache Beam pipeline with multiple different models on a per-key basis. | View Notebook |
| Model training & Data processing | **Use MLTransform to scale data**: Shows how to use MLTransform to scale data, an important preprocessing step for training machine learning (ML) models. | View Notebook |
| Model training & Data processing | **TensorFlow Model Analysis in Beam**: Shows how to use TFMA to investigate and visualize the performance of a model as part of your Apache Beam pipeline by creating and comparing two models. | View Notebook |
| Run inference | **Remote inference in Apache Beam**: Demonstrates how to implement a custom inference call in Apache Beam by using the Google Cloud Vision API. | View Notebook |
| Run inference | **Bring your own ML model to Beam RunInference**: Illustrates how to use the spaCy package to load an ML model and perform inference in an Apache Beam pipeline by using the RunInference PTransform. | View Notebook |
| Run inference | **Run inference with a Gemma open model**: Demonstrates how to load the preconfigured Gemma 2B model and then use it in an Apache Beam inference pipeline. | View Notebook |
| Run inference | **Use RunInference for Generative AI**: Shows how to use the Apache Beam RunInference transform for generative AI tasks with a large language model (LLM) from the Hugging Face Model Hub. | View Notebook |
| Run inference | **Apache Beam RunInference with Hugging Face**: Shows how to use Hugging Face models and pipelines in Apache Beam pipelines that use the RunInference transform. | View Notebook |
| Run inference | **Ensemble model using an image captioning and ranking example**: Shows how to implement a cascade model in Apache Beam by using the RunInference API for image captioning. | View Notebook |
| Run inference | **Apache Beam RunInference for PyTorch**: Demonstrates the use of the RunInference transform for PyTorch. | View Notebook |
| Run inference | **Use RunInference in Apache Beam**: Demonstrates how to use the RunInference API with three popular ML frameworks: PyTorch, TensorFlow, and scikit-learn. | View Notebook |
| Run inference | **Apache Beam RunInference for scikit-learn**: Demonstrates the use of the RunInference transform for scikit-learn. | View Notebook |
| Run inference | **Apache Beam RunInference with TensorFlow**: Shows how to use the Apache Beam RunInference transform for TensorFlow. | View Notebook |
| Run inference | **Use RunInference with TFX Basic Shared Libraries**: Demonstrates how to use the Apache Beam RunInference transform with TensorFlow and TFX Basic Shared Libraries (tfx-bsl). | View Notebook |
| Run inference | **Apache Beam RunInference with TensorFlow and TensorFlow Hub**: Shows how to use the Apache Beam RunInference transform for TensorFlow with a trained model from TensorFlow Hub. | View Notebook |
| Run inference | **Apache Beam RunInference with Vertex AI**: Shows how to use the Apache Beam RunInference transform for image classification with Vertex AI. | View Notebook |
| Run inference | **Run ML inference by using vLLM on GPUs**: Demonstrates how to run machine learning inference by using vLLM and GPUs. | View Notebook |
| Run inference | **Use TPUs in Dataflow**: Demonstrates how to configure and run two distinct Dataflow pipelines that use Tensor Processing Units (TPUs). The first pipeline performs a simple computation to confirm TPU access; the second, more complex pipeline runs inference with the Gemma-3-27b-it model. | View Notebook |
| Specialized use cases | **Anomaly Detection on Batch and Streaming Data using Apache Beam (Z-Score Method)**: Shows how to perform anomaly detection on both batch and streaming data by using the AnomalyDetection PTransform with the Z-Score algorithm. | View Notebook |
| Specialized use cases | **Use Gemma to gauge sentiment and summarize conversations**: Demonstrates how to use Gemma to gauge the sentiment of a conversation, summarize the conversation's content, and draft a reply. | View Notebook |