Before you can run your machine learning (ML) process on AI Platform Pipelines, you must first define your process as a pipeline. You can orchestrate your ML process as a pipeline using TensorFlow Extended (TFX) or the Kubeflow Pipelines SDK.
This document helps you choose the best option for building your pipeline and points to resources for getting started.
- If you are orchestrating a process that trains a TensorFlow model, use TFX to build your pipeline.
- If you are orchestrating a process that trains a model using a framework such as PyTorch, XGBoost, or scikit-learn, use the Kubeflow Pipelines SDK to build your pipeline.
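The rule above can be written as a small lookup. This is only an illustrative sketch: `choose_pipeline_sdk` is a hypothetical helper that is not part of TFX or the Kubeflow Pipelines SDK, and treating every non-TensorFlow framework as a Kubeflow Pipelines case is an assumption drawn from the two bullets above.

```python
def choose_pipeline_sdk(framework: str) -> str:
    """Return the SDK this guide suggests for a given training framework.

    Hypothetical helper for illustration; not part of either SDK.
    """
    # TensorFlow training processes map to TFX.
    if framework.strip().lower() == "tensorflow":
        return "TFX"
    # Assumption: any other framework (PyTorch, XGBoost, scikit-learn, ...)
    # falls under the Kubeflow Pipelines SDK recommendation.
    return "Kubeflow Pipelines SDK"


for name in ("TensorFlow", "PyTorch", "XGBoost", "scikit-learn"):
    print(name, "->", choose_pipeline_sdk(name))
```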
Building pipelines using the TFX SDK
TFX is an open source project that you can use to define your ML workflow as a pipeline. Currently, TFX components can train only TensorFlow-based models. TFX provides components for tasks such as ingesting and transforming data, training and evaluating a model, and deploying a trained model for inference. By using the TFX SDK, you can compose a pipeline for your ML process from these components.
To get started building pipelines with TFX pipeline templates:
- Follow the tutorial about TFX pipelines on Google Cloud.
- Read the TFX User Guide to learn more about TFX concepts and components.
Building pipelines using the Kubeflow Pipelines SDK
The Kubeflow Pipelines SDK is an open source SDK that you can use to build complex custom ML pipelines based on containers. You can reuse pre-built components or build custom pipeline components using the Kubeflow Pipelines SDK. At a high level, you build components and pipelines by:
- Developing the code for each step in your workflow using your preferred language and tools
- Creating a Docker container image for each step's code
- Defining your pipeline in Python with the Kubeflow Pipelines SDK
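To make the shape of these steps concrete, here is a minimal conceptual sketch that uses only the Python standard library (Python 3.9 or later for `graphlib`). It does not use the Kubeflow Pipelines SDK API. The `Step` class, the `execution_order` function, the step names, and the `example.invalid` image URIs are all hypothetical. It shows only the idea that each step is backed by a container image and that the pipeline definition records how the steps depend on each other.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """One pipeline step: its name, its container image, and its upstream steps."""
    name: str
    image: str  # hypothetical image URI, not a real registry path
    depends_on: tuple = ()


def execution_order(steps):
    """Return step names in an order that respects every declared dependency."""
    # Map each step to the set of steps that must finish before it starts.
    graph = {step.name: set(step.depends_on) for step in steps}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Illustrative three-step workflow; each step would run in its own container.
steps = [
    Step("preprocess", "example.invalid/preprocess:latest"),
    Step("train", "example.invalid/train:latest", depends_on=("preprocess",)),
    Step("evaluate", "example.invalid/evaluate:latest", depends_on=("train",)),
]

order = execution_order(steps)
print(order)
```

In a real pipeline, the Kubeflow Pipelines SDK takes over this bookkeeping. The resources listed below describe the actual SDK constructs for declaring components and wiring their inputs and outputs.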
To get started building pipelines with the Kubeflow Pipelines SDK:
- Read the Introduction to the Kubeflow Pipelines SDK.
- Learn more about Kubeflow Pipelines by exploring the Kubeflow Pipelines samples.
- Reuse pre-built components by exploring the Kubeflow pipeline components on GitHub.
What's next
- Learn how to run your ML pipelines.