Dataflow ML notebooks

Explore Dataflow ML notebooks to integrate machine learning into your Apache Beam pipelines. These notebooks provide practical examples and guidance for common machine learning workflows.

Use these resources to:

  • Preprocess data for ML models: This includes tasks like scaling data, computing vocabularies, and using MLTransform for data preparation.
  • Run inference with various models and frameworks: Use the RunInference transform with PyTorch, TensorFlow, scikit-learn, Hugging Face models, Gemma models, and Vertex AI, including on GPUs with vLLM.
  • Generate and manage embeddings: Create text embeddings using Vertex AI or Hugging Face, and ingest them into databases like AlloyDB and BigQuery for vector search.
  • Implement advanced ML pipeline patterns: This includes automatic model refresh in running pipelines, using multiple models, building ensemble models, and data enrichment using BigQuery, Bigtable, and Vertex AI Feature Store.
  • Apply ML to specific use cases: Examples include anomaly detection as well as sentiment analysis and summarization with Gemma.

All tutorials

Core Dataflow & MLTransform Concepts

Preprocessing with the Apache Beam DataFrames API

Demonstrates the use of the Apache Beam DataFrames API to perform common data exploration and preprocessing steps.

View Notebook
Core Dataflow & MLTransform Concepts

Preprocess data with MLTransform

A basic introduction to using MLTransform for preprocessing data for machine learning workflows.

View Notebook
Data enrichment & Embedding

Vector Embedding Ingestion with Apache Beam and AlloyDB

Demonstrates how to generate embeddings from data and ingest them into AlloyDB using Apache Beam and Dataflow for scalable data processing.

View Notebook
Data enrichment & Embedding

Use Apache Beam and BigQuery to enrich data

Shows how to enrich data by using the Apache Beam enrichment transform with BigQuery.

View Notebook
Data enrichment & Embedding

Embedding Ingestion and Vector Search with Apache Beam and BigQuery

Demonstrates how to use the Apache Beam RAG package to generate embeddings, ingest them into BigQuery, and perform vector similarity search.
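Vector similarity search ranks stored embeddings by how close they are to a query embedding. As a standalone illustration of the idea only (plain Python with toy vectors, not the Beam RAG package or BigQuery vector search), cosine similarity can be computed like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, embeddings, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = sorted(embeddings.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy 3-dimensional "embeddings" keyed by document name.
docs = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['a', 'b']
```

In the notebook, the embeddings come from a real model and the similarity search runs inside BigQuery rather than in application code.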

View Notebook
Data enrichment & Embedding

Use Apache Beam and Bigtable to enrich data

Shows how to enrich data by using the Apache Beam enrichment transform with Bigtable.

View Notebook
Data enrichment & Embedding

Generate text embeddings by using Hugging Face Hub models

Uses MLTransform with Hugging Face's SentenceTransformers framework to generate embeddings from text data.

View Notebook
Data enrichment & Embedding

Use Apache Beam and Vertex AI Feature Store to enrich data

Shows how to enrich data by using the Apache Beam enrichment transform with Vertex AI Feature Store.

View Notebook
Data enrichment & Embedding

Generate text embeddings by using the Vertex AI API

Uses the Vertex AI text-embeddings API to generate text embeddings with Google's large generative AI models.

View Notebook
Model training & Data processing

Update ML models in running pipelines

Demonstrates how to perform automatic model updates without stopping your Apache Beam pipeline by using side inputs.

View Notebook
Model training & Data processing

Compute and apply vocabulary on a dataset

Shows how to use MLTransform to generate a vocabulary on input text and assign an index value to each token.
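The underlying idea — collect the unique tokens in a corpus and assign each an integer index — can be sketched in plain Python. This toy sketch is for illustration only; the notebook uses MLTransform, which computes the vocabulary at scale across the whole dataset:

```python
def compute_vocabulary(sentences):
    """Map each unique token to an integer index, most frequent first."""
    counts = {}
    for sentence in sentences:
        for token in sentence.split():
            counts[token] = counts.get(token, 0) + 1
    # Sort by descending frequency, breaking ties alphabetically.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {token: index for index, token in enumerate(ordered)}

def apply_vocabulary(sentence, vocab):
    """Replace each token in a sentence with its vocabulary index."""
    return [vocab[token] for token in sentence.split()]

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = compute_vocabulary(corpus)
print(vocab["the"])                           # 0 (most frequent token)
print(apply_vocabulary("the cat ran", vocab))  # [0, 1, 4]
```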

View Notebook
Model training & Data processing

Run ML inference with multiple differently trained models

Demonstrates how to use a KeyedModelHandler to run inference in an Apache Beam pipeline with multiple different models on a per-key basis.
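A KeyedModelHandler routes each (key, example) pair to the model associated with that key. A minimal standalone sketch of the routing idea, using dummy "models" that are just Python functions (the real transform loads and manages the ML models for you):

```python
# Two dummy "models": each is a function from input to prediction.
# In a real pipeline these would be differently trained ML models.
models = {
    "english": lambda text: "en:" + text.upper(),
    "spanish": lambda text: "es:" + text.lower(),
}

def run_keyed_inference(keyed_examples):
    """Dispatch each (key, example) pair to the model for that key."""
    return [(key, models[key](example)) for key, example in keyed_examples]

results = run_keyed_inference([("english", "Hola"), ("spanish", "Hola")])
print(results)  # [('english', 'en:HOLA'), ('spanish', 'es:hola')]
```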

View Notebook
Model training & Data processing

Use MLTransform to scale data

Shows how to use MLTransform to scale data, an important preprocessing step for training machine learning (ML) models.
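Scaling maps feature values into a common range so that no single feature dominates training. A toy sketch of min-max scaling to [0, 1] in plain Python; MLTransform provides scaling transforms that compute these statistics across the full dataset:

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:  # All values identical: map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```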

View Notebook
Model training & Data processing

TensorFlow Model Analysis in Beam

Shows how to use TensorFlow Model Analysis (TFMA) to investigate and visualize the performance of a model as part of your Apache Beam pipeline by creating and comparing two models.

View Notebook
Run inference

Remote inference in Apache Beam

Demonstrates how to implement a custom inference call in Apache Beam by using the Google Cloud Vision API.

View Notebook
Run inference

Bring your own ML model to Beam RunInference

Illustrates how to use the spaCy package to load a machine learning (ML) model and perform inference in an Apache Beam pipeline using the RunInference PTransform.

View Notebook
Run inference

Run inference with a Gemma open model

Demonstrates how to load the preconfigured Gemma 2B model and then use it in an Apache Beam inference pipeline.

View Notebook
Run inference

Use RunInference for Generative AI

Shows how to use the Apache Beam RunInference transform for generative AI tasks with a large language model (LLM) from the Hugging Face Model Hub.

View Notebook
Run inference

Apache Beam RunInference with Hugging Face

Shows how to use Hugging Face models and Hugging Face pipelines in Apache Beam pipelines that use the RunInference transform.

View Notebook
Run inference

Ensemble model using an image captioning and ranking example

Shows how to implement a cascade model in Apache Beam using the RunInference API for image captioning.

View Notebook
Run inference

Apache Beam RunInference for PyTorch

Demonstrates the use of the RunInference transform for PyTorch.

View Notebook
Run inference

Use RunInference in Apache Beam

Demonstrates how to use the RunInference API with three popular ML frameworks: PyTorch, TensorFlow, and scikit-learn.

View Notebook
Run inference

Apache Beam RunInference for scikit-learn

Demonstrates the use of the RunInference transform for scikit-learn.

View Notebook
Run inference

Apache Beam RunInference with TensorFlow

Shows how to use the Apache Beam RunInference transform for TensorFlow.

View Notebook
Run inference

Use RunInference with TFX Basic Shared Libraries

Demonstrates how to use the Apache Beam RunInference transform with TensorFlow and TFX Basic Shared Libraries (tfx-bsl).

View Notebook
Run inference

Apache Beam RunInference with TensorFlow and TensorFlow Hub

Shows how to use the Apache Beam RunInference transform for TensorFlow with a trained model from TensorFlow Hub.

View Notebook
Run inference

Apache Beam RunInference with Vertex AI

Shows how to use the Apache Beam RunInference transform for image classification with Vertex AI.

View Notebook
Run inference

Run ML inference by using vLLM on GPUs

Demonstrates how to run machine learning inference by using vLLM and GPUs.

View Notebook
Run inference

Use TPUs in Dataflow

Demonstrates how to configure and run two distinct Dataflow pipelines that use Tensor Processing Units (TPUs). The first pipeline performs a simple computation to confirm TPU access; the second, more complex pipeline runs inference with the Gemma-3-27b-it model.

View Notebook
Specialized use cases

Anomaly Detection on Batch and Streaming Data using Apache Beam (Z-Score Method)

Shows how to perform anomaly detection on both batch and streaming data using the AnomalyDetection PTransform with the Z-Score algorithm.
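The Z-Score method flags a point as anomalous when it lies more than a threshold number of standard deviations from the mean. A standalone sketch of the scoring rule in plain Python (the Beam AnomalyDetection PTransform applies this kind of detector over batch and streaming data for you):

```python
import math

def zscore_anomalies(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:  # Constant data has no outliers under this rule.
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]

# Twenty stable readings plus one large spike.
data = [10.0] * 20 + [10.2, 9.9, 50.0]
print(zscore_anomalies(data, threshold=3.0))  # [50.0]
```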

View Notebook
Specialized use cases

Use Gemma to gauge sentiment and summarize conversations

Demonstrates how to use Gemma to gauge the sentiment of a conversation, summarize the conversation's content, and draft a reply.

View Notebook