
Dataflow ML

Dataflow ML allows you to use Dataflow to deploy and manage end-to-end machine learning (ML) pipelines. Use ML models to do local and remote inference with batch and streaming pipelines. Use data processing tools to prepare your data for model training and to process the results of the models.
with beam.Pipeline() as pipeline:
  predictions = (
      pipeline
      | beam.ReadFromSource('a_source')  # placeholder source
      | RunInference(MODEL_HANDLER))
Using RunInference is as straightforward as adding the transform code to your pipeline. In this example, MODEL_HANDLER is the model configuration object.
Whether you want to classify images in real time, run remote inference calls, or build a custom model handler, you can find end-to-end Dataflow ML examples.
Data processing for ML training is often used in the context of MLOps frameworks. These frameworks orchestrate ML workflows, which contain various pre- and post-processing steps. Read more about how Dataflow integrates with these systems.

Prediction and inference use cases

Use a pre-trained model with PyTorch.
Use a pre-trained model with scikit-learn.
Use a pre-trained model with TensorFlow.
Vertex AI supports both online and batch prediction and has built-in integration with other ML workflows, like model registry, pipelines, model monitoring, and Explainable AI. Run inference using a model deployed on Vertex AI by using custom inference calls.
Explore all the available prediction and inference use cases.

Dataflow ML in ML workflows

Vertex AI Pipelines helps you to automate, monitor, and govern your ML systems by orchestrating your ML workflows in a serverless manner. Use Vertex AI Pipelines to orchestrate workflow DAGs defined by either TFX or KFP and to automatically track your ML artifacts using Vertex ML Metadata.
TensorFlow Extended (TFX) lets you deploy end-to-end ML pipelines by using an orchestration framework that has a built-in integration with Apache Beam and the Dataflow runner.
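Because of that integration, routing a TFX pipeline's Beam-based components to Dataflow is largely a matter of pipeline options. The following configuration fragment is a sketch; the project ID, region, and bucket are placeholders, not real values.

```python
# Beam pipeline options that send TFX's Beam-based components to the
# Dataflow runner. All resource names below are hypothetical.
beam_pipeline_args = [
    '--runner=DataflowRunner',
    '--project=my-gcp-project',            # placeholder project ID
    '--region=us-central1',
    '--temp_location=gs://my-bucket/tmp',  # placeholder bucket
]
```

You pass this list as the beam_pipeline_args argument when constructing the TFX pipeline, and TFX forwards it to each component that executes on Beam.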
Kubeflow makes deployments of ML workflows on Kubernetes simple, portable, and scalable. Kubeflow Pipelines are reusable end-to-end ML workflows built using the Kubeflow Pipelines SDK.


Using GPUs in Dataflow jobs can accelerate image processing and machine learning tasks.
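As a sketch, GPUs are requested through a Dataflow service option when you launch the job. The script name is a placeholder; the accelerator type and count are examples you would adjust for your workload.

```shell
# Launch a pipeline on Dataflow with one NVIDIA T4 GPU per worker.
# "my_pipeline.py" is a hypothetical pipeline script.
python my_pipeline.py \
  --runner=DataflowRunner \
  --dataflow_service_options="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"
```

The install-nvidia-driver sub-option tells Dataflow to install the GPU driver on the workers so your container only needs the CUDA user-space libraries.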
To run the Dataflow ML examples, you might need to configure your Google Cloud permissions. Read a detailed guide about the required permissions for Dataflow pipelines.
All of the examples and the corresponding source code are available on GitHub. In GitHub, you can also find instructions for running the examples in Colab.