Creating a machine learning pipeline

To run your machine learning (ML) process on AI Platform Pipelines, you must first define it as a pipeline. You can orchestrate your ML process as a pipeline using TensorFlow Extended (TFX) or the Kubeflow Pipelines SDK.

This document provides guidance for choosing the best option for building your pipeline, and resources for getting started.

Building pipelines using the TFX SDK

TFX is an open source project that you can use to define your ML workflow as a pipeline. TFX provides components that you can use to ingest and transform data, train and evaluate a model, and deploy a trained model for inference. By using the TFX SDK, you can compose a pipeline for your ML process from these components. Currently, TFX components can only train TensorFlow-based models.

To get started building pipelines with TFX pipeline templates:

Building pipelines using the Kubeflow Pipelines SDK

The Kubeflow Pipelines SDK is an open source SDK that you can use to build complex custom ML pipelines based on containers. You can reuse pre-built components or build custom pipeline components using the Kubeflow Pipelines SDK. At a high level, you build components and pipelines by:

  1. Developing the code for each step in your workflow using your preferred language and tools
  2. Creating a Docker container image for each step's code
  3. Defining your pipeline in Python using the Kubeflow Pipelines SDK
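
The following is a minimal conceptual sketch of the three steps above, written with only the Python standard library. It does not use the Kubeflow Pipelines SDK: the `ContainerStep` class, the `execution_order` function, the image names, and the script names are all hypothetical and exist only to illustrate the idea that each step is a container image running that step's code, and that the pipeline definition declares how the steps depend on each other.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ContainerStep:
    """One workflow step: a container image plus the command it runs.

    Hypothetical illustration only; this is not a Kubeflow Pipelines SDK class.
    """
    name: str
    image: str          # assumed image name, built from the step's code (step 2)
    command: tuple      # assumed entry point for the step's code (step 1)
    after: tuple = ()   # names of steps that must finish before this one


def execution_order(steps):
    """Return step names in an order that respects every `after` dependency."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {step.name: set(step.after) for step in steps}
    return list(TopologicalSorter(graph).static_order())


# Step 3: describe the pipeline in Python by wiring the steps together.
steps = [
    ContainerStep("preprocess", "example.com/ml/preprocess:v1",
                  ("python", "preprocess.py")),
    ContainerStep("train", "example.com/ml/train:v1",
                  ("python", "train.py"), after=("preprocess",)),
    ContainerStep("evaluate", "example.com/ml/evaluate:v1",
                  ("python", "evaluate.py"), after=("train",)),
]

order = execution_order(steps)
print(order)
```

In a real pipeline, the Kubeflow Pipelines SDK takes the place of these hypothetical helpers: you describe each containerized step and its inputs and outputs with the SDK, and the SDK and the pipelines backend handle ordering and execution.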

To get started building pipelines with the Kubeflow Pipelines SDK:

What's next