Kubeflow Pipelines v2: Making ML pipelines easier, faster, and more scalable

October 25, 2023

Chase Lyall

Product Manager

Erwin Huizenga

Developer Advocate Machine Learning

Machine learning is increasingly essential for businesses of all sizes. However, building, deploying, and continuously training ML models can be complex and time-consuming. That's where Kubeflow Pipelines (KFP) comes in. KFP provides an ecosystem for composing, deploying, and managing reusable end-to-end machine learning workflows, making it a no-lock-in hybrid solution from prototyping to production. Adoption of KFP has grown steadily over the years, and we are excited to announce the release of KFP v2. This blog post walks through what's new in KFP v2.

Wait, what’s Kubeflow?

The Kubeflow ecosystem was initially open-sourced by Google and partners in 2018 to extend Kubernetes for machine learning. In 2019, Kubeflow Pipelines was introduced as a standalone component of that ecosystem for defining and orchestrating MLOps workflows that continuously train models by executing a directed acyclic graph (DAG) of container images. KFP provides a Python SDK and domain-specific language (DSL) for defining a pipeline, plus backend and frontend services for running and scheduling pipelines on your Kubernetes cluster of choice. Since its launch, KFP has built up a rich ecosystem of orchestration options (e.g., Vertex AI Pipelines) and pre-built components (e.g., Google Cloud Pipeline Components).
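To make "executing a DAG" concrete, here is a toy sketch in plain Python. It is not KFP code and says nothing about how the KFP backend is implemented; the step names are hypothetical, and each step stands in for a container that an orchestrator would launch once its upstream steps have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
dag = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}


def run_order(graph):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
for step in order:
    # A real orchestrator would start the step's container here.
    print(f"running {step}")
```

Because the graph is acyclic, such an order always exists; `graphlib` raises `CycleError` if a dependency loop is introduced.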

What’s new in KFP v2?

KFP v2 has several major improvements. This section will highlight some of them.

1. An improved Python-based authoring experience for components and pipelines

The new @dsl.component and @dsl.container_component decorators simplify and standardize the component authoring experience while improving readability. Additionally, Python docstrings are automatically propagated to the pipeline specification, improving the understandability and reuse of the pipelines you author.

2. A new Argo-independent pipeline definition that enables compile and run anywhere: executable components, nested pipelines, and potentially, new orchestrator options

KFP v2’s updated intermediate representation (IR) includes additional details that make the pipeline executable by any backend (not just Argo). Another big benefit is that you can now compile and run individual components (not just pipelines), and nest a pipeline as a component of a larger pipeline.

3. An uplifted Workflow GUI

KFP v2 introduces several improvements to help visualize ML workflows. KFP now surfaces input and output artifacts (e.g., datasets, models, and metrics) as first-class nodes in the DAG visualization. This lets users see how artifacts are produced and consumed in the workflow, along with the metadata describing them. Nested pipelines are now supported and represented as a sub-DAG. Users can now zoom in on the workflow canvas, which greatly improves usability for large pipelines. And there’s more: new features for comparing runs, cloning runs, and listing Artifacts and Executions.

4. First-class support for ML Metadata (MLMD) artifacts and lineage

Previously, MLMD was an optional integration. In KFP v2, MLMD is a required dependency. This enables KFP to provide (1) rich lineage tracking and visualization out of the box and, eventually, (2) custom artifact schemas for strict type checking of component interfaces and improved visualization of custom artifacts.

5. Increased security from upgrading upstream dependencies to their latest major versions (e.g., Argo, MinIO, MySQL, Envoy Proxy, and MLMD)

We have taken the opportunity afforded by this major version bump to upgrade many upstream dependencies to their latest major versions. The KFP v2 backend is completely backwards-compatible with v1 (the v1 APIs still exist), so we recommend that all KFP backend users upgrade to KFP v2.

These changes were discussed in greater depth during the 2022 Kubeflow Summit.

What’s next?

Have a look at this notebook to learn more about KFP v2 and how to use it with Vertex AI Pipelines.

SDK users, including Vertex AI Pipelines users, can find migration guidance at: Migrate from KFP SDK v1. Open source backend and frontend users can find upgrade instructions for their KFP clusters at: Installation.

KFP has come a long way since 2019 and has proven itself at scale in powering complex MLOps workflow orchestration. We recommend upgrading to KFP v2 without delay to enjoy all the benefits it provides. We greatly appreciate your feedback via GitHub, as well as your support for and contributions to KFP. You can find KFP’s contribution guide at CONTRIBUTING.md.
