Kubeflow Pipelines v2: Making ML pipelines easier, faster, and more scalable
Developer Advocate Machine Learning
Machine learning is increasingly essential for businesses of all sizes. However, building, deploying, and continuously training ML models can be complex and time-consuming. That's where Kubeflow Pipelines (KFP) comes in. KFP provides an ecosystem to compose, deploy, and manage reusable end-to-end machine learning workflows, making it a no-lock-in hybrid solution from prototyping to production. Over the years, we have seen increasing adoption of KFP, so we are excited to announce the release of KFP v2. This blog post walks you through what's new in KFP v2.
Wait, what’s Kubeflow?
The Kubeflow ecosystem was initially open-sourced by Google and partners in 2018 to extend Kubernetes for machine learning. In 2019, Kubeflow Pipelines was introduced as a standalone component of that ecosystem for defining and orchestrating MLOps workflows that continuously train models by executing a directed acyclic graph (DAG) of container images. KFP provides a Python SDK and domain-specific language (DSL) for defining pipelines, plus backend and frontend services for running and scheduling them on your Kubernetes cluster of choice. Since its launch, KFP has accrued a broad set of compatible orchestration options (e.g., Vertex AI Pipelines) and pre-built components (e.g., Google Cloud Pipeline Components).
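To make the DAG idea concrete, here is a minimal, stdlib-only Python sketch of dependency-ordered execution. It is a toy model, not KFP's implementation: the step names are hypothetical, and plain functions stand in for the container images that a real KFP backend would run on your cluster.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each plain function stands in for a containerized step.
def ingest(inputs):
    return [3, 1, 2]

def preprocess(inputs):
    return sorted(inputs["ingest"])

def train(inputs):
    # Toy "model": the mean of the preprocessed data.
    data = inputs["preprocess"]
    return sum(data) / len(data)

def evaluate(inputs):
    return {"model": inputs["train"], "n_examples": len(inputs["preprocess"])}

STEPS = {"ingest": ingest, "preprocess": preprocess, "train": train, "evaluate": evaluate}

# Each step maps to the set of upstream steps it depends on.
DEPENDENCIES = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}

def run_dag(steps, dependencies):
    """Run every step only after all of its upstream steps have finished."""
    outputs = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = steps[name](upstream)
    return order, outputs

if __name__ == "__main__":
    order, outputs = run_dag(STEPS, DEPENDENCIES)
    print(order)
    print(outputs["evaluate"])
```

Because `static_order()` raises `graphlib.CycleError` when the dependencies contain a cycle, this sketch also rejects graphs that are not acyclic, which is one reason pipelines are modeled as DAGs.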
What’s new in KFP v2?
KFP v2 has several major improvements. This section will highlight some of them.
1. An improved Python-based authoring experience for components and pipelines
The new @dsl.container_component decorator simplifies and standardizes the component authoring experience while improving readability. Additionally, Python docstrings are automatically propagated to the pipeline specification, making the pipelines you author easier to understand and reuse.
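The sketch below imitates, in plain Python, the idea of lifting a function's docstring and signature into a serializable spec. It is an illustration only: `toy_component` and the spec layout are hypothetical and do not reflect KFP's actual decorators or IR; with the KFP v2 SDK you would use the real decorators from `kfp.dsl` instead.

```python
import inspect
import json

def toy_component(func):
    """Hypothetical decorator (not KFP's): record a function's interface and docstring."""
    signature = inspect.signature(func)
    func.spec = {
        "name": func.__name__,
        # inspect.getdoc returns the docstring with indentation cleaned up.
        "description": inspect.getdoc(func) or "",
        # Assumes every parameter and the return value carry type annotations.
        "inputs": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
        "output": signature.return_annotation.__name__,
    }
    return func

@toy_component
def normalize(value: float, scale: float) -> float:
    """Divide a value by a scale factor."""
    return value / scale

if __name__ == "__main__":
    print(json.dumps(normalize.spec, indent=2))
    print(normalize(10.0, 4.0))
```

The decorated function still runs as ordinary Python, while its description travels with the spec, which is the property that makes authored pipelines self-documenting.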
2. A new Argo-independent pipeline definition that lets you compile and run anywhere: executable components, nested pipelines, and, potentially, new orchestrator options
KFP v2’s updated intermediate representation (IR) includes additional details that make the pipeline executable by any backend (not just Argo). Another big benefit is that you can now compile and run individual components (not just pipelines), and nest a pipeline as a component of a larger pipeline.
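Because the compiled definition is plain data rather than an Argo workflow, any interpreter that understands the format can run it. The toy sketch below illustrates that idea, along with running a single component on its own and nesting a pipeline as a step. The JSON layout, the `COMPONENTS` registry, `as_pipeline`, and `run_spec` are all hypothetical; KFP's real IR is a much richer protobuf-based format.

```python
import json

# Hypothetical component registry: name -> implementation.
COMPONENTS = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
}

# Hypothetical spec format. Tasks are listed in dependency order; each task
# reads one upstream value ("$input" means the pipeline's own input).
INNER = {
    "tasks": [
        {"name": "d", "component": "double", "input": "$input"},
        {"name": "i", "component": "increment", "input": "d"},
    ],
    "output": "i",
}

# The outer pipeline uses INNER as an ordinary step.
OUTER = {
    "tasks": [
        {"name": "first", "pipeline": INNER, "input": "$input"},
        {"name": "second", "component": "double", "input": "first"},
    ],
    "output": "second",
}

def as_pipeline(component_name):
    """Wrap a single component in a one-task spec so it can run on its own."""
    return {
        "tasks": [{"name": "only", "component": component_name, "input": "$input"}],
        "output": "only",
    }

def run_spec(spec, value):
    """Interpret a spec: run components directly, recurse into nested pipelines."""
    results = {"$input": value}
    for task in spec["tasks"]:
        arg = results[task["input"]]
        if "pipeline" in task:
            results[task["name"]] = run_spec(task["pipeline"], arg)
        else:
            results[task["name"]] = COMPONENTS[task["component"]](arg)
    return results[spec["output"]]

if __name__ == "__main__":
    compiled = json.dumps(OUTER)  # "compile" to a portable text format
    print(run_spec(json.loads(compiled), 5))
    print(run_spec(as_pipeline("increment"), 5))
```

Here the same interpreter handles a full pipeline, a nested sub-pipeline, and a lone component, which mirrors the flexibility described above without claiming to match how a KFP backend actually executes the IR.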
3. An upgraded workflow GUI
KFP v2 introduces several improvements to help you visualize ML workflows. KFP now surfaces input and output artifacts (e.g., datasets, models, and metrics) as first-class nodes in the DAG visualization, so users can see how artifacts are consumed and produced in the workflow, along with the metadata that describes them. Nested pipelines are now supported and represented as sub-DAGs. Users can also zoom within the workflow canvas, which greatly improves usability for large pipelines. And there’s more: KFP v2 adds run comparison, run cloning, and the ability to list Artifacts and Executions.
4. First-class support for ML Metadata (MLMD) artifacts and lineage
Previously, MLMD was an optional integration. In KFP v2, MLMD is a required dependency. This enables KFP to provide (1) rich lineage tracking and visualization out of the box and, eventually, (2) custom artifact schemas for strict type checking of component interfaces and improved visualization of custom artifacts.
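Lineage tracking boils down to recording which artifacts each execution consumed and produced, then walking those links backward. The in-memory sketch below shows that idea; `LineageStore` and the artifact names are hypothetical stand-ins, not the MLMD API, which models the same relationships with Artifacts, Executions, and Events.

```python
from dataclasses import dataclass, field

@dataclass
class LineageStore:
    """Hypothetical in-memory lineage store (not the MLMD API)."""

    # execution name -> (input artifact names, output artifact names)
    executions: dict = field(default_factory=dict)

    def record(self, execution, inputs, outputs):
        self.executions[execution] = (list(inputs), list(outputs))

    def producer_of(self, artifact):
        """Return the execution that produced `artifact`, or None if it came from outside."""
        for name, (_, outputs) in self.executions.items():
            if artifact in outputs:
                return name
        return None

    def upstream(self, artifact):
        """Return every artifact that `artifact` was transitively derived from."""
        seen = set()
        stack = [artifact]
        while stack:
            producer = self.producer_of(stack.pop())
            if producer is None:
                continue
            for parent in self.executions[producer][0]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

if __name__ == "__main__":
    store = LineageStore()
    store.record("ingest", [], ["raw.csv"])
    store.record("preprocess", ["raw.csv"], ["clean.csv"])
    store.record("train", ["clean.csv", "config.yaml"], ["model.pkl"])
    store.record("evaluate", ["model.pkl", "clean.csv"], ["metrics.json"])
    print(store.producer_of("model.pkl"))
    print(sorted(store.upstream("metrics.json")))
```

Asking for the upstream of a metrics file returns the model, the cleaned data, the config, and the raw data it ultimately came from; a metadata store that is always present is what makes this kind of query available out of the box.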
5. Improved security from upgrading upstream dependencies to their latest major versions (e.g., Argo, MinIO, MySQL, Envoy Proxy, and MLMD)
We used this major version bump as an opportunity to upgrade many upstream dependencies to their latest major versions. The KFP v2 backend is fully backward-compatible with v1 (the v1 APIs are still available), so we recommend that all KFP backend users upgrade to KFP v2.
These changes were discussed in greater depth during the 2022 Kubeflow Summit.
Have a look at this notebook to learn more about KFP v2 and how to use it with Vertex AI Pipelines.
SDK users, including Vertex AI Pipelines users, can find migration guidance at: Migrate from KFP SDK v1. Open source backend and frontend users can find upgrade instructions for their KFP clusters at: Installation.
KFP has come a long way since 2019 and has proven itself at scale in powering complex MLOps workflow orchestration. We recommend upgrading to KFP v2 without delay to enjoy all the benefits it provides. We greatly appreciate your support of KFP, your feedback via GitHub, and your contributions. You can find KFP’s contribution guide at CONTRIBUTING.md.