Introduction to Google Cloud Pipeline Components

The Google Cloud Pipeline Components (GCPC) SDK provides a set of prebuilt Kubeflow Pipelines components that are production quality, performant, and easy to use. You can use Google Cloud Pipeline Components to define and run ML pipelines in Vertex AI Pipelines and other ML pipeline execution backends conformant with Kubeflow Pipelines.

For example, you can use these components to complete the following:

  • Create a new dataset and load different data types into the dataset (image, tabular, text, or video).
  • Export data from a dataset to Cloud Storage.
  • Use AutoML to train a model using image, tabular, text, or video data.
  • Run a custom training job using a custom container or a Python package.
  • Upload an existing model to Vertex AI for batch prediction.
  • Create a new endpoint and deploy a model to it for online predictions.
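As a rough illustration of how several of these tasks could be chained together, the following sketch defines a pipeline that creates a tabular dataset, trains an AutoML model, and deploys it to a new endpoint. It assumes the `kfp` and `google-cloud-pipeline-components` packages are installed. The pipeline name, display names, target column, and machine type are hypothetical placeholders, and the `build_pipeline` wrapper is only an illustrative convenience, not part of either SDK.

```python
def build_pipeline():
    """Return a KFP pipeline function assembled from prebuilt components.

    The SDK imports are kept inside this function so that the module can be
    imported even where the SDKs are not installed.
    """
    from kfp import dsl
    from google_cloud_pipeline_components.v1.automl.training_job import (
        AutoMLTabularTrainingJobRunOp,
    )
    from google_cloud_pipeline_components.v1.dataset import TabularDatasetCreateOp
    from google_cloud_pipeline_components.v1.endpoint import (
        EndpointCreateOp,
        ModelDeployOp,
    )

    @dsl.pipeline(name="gcpc-intro-sketch")  # hypothetical pipeline name
    def pipeline(project: str, bq_source: str, location: str = "us-central1"):
        # Create a new tabular dataset from a BigQuery table.
        dataset_op = TabularDatasetCreateOp(
            project=project,
            location=location,
            display_name="sketch-dataset",  # placeholder
            bq_source=bq_source,
        )

        # Train an AutoML model on the dataset created above.
        training_op = AutoMLTabularTrainingJobRunOp(
            project=project,
            location=location,
            display_name="sketch-training",  # placeholder
            optimization_prediction_type="classification",
            dataset=dataset_op.outputs["dataset"],
            target_column="label",  # placeholder column name
        )

        # Create an endpoint and deploy the trained model for online predictions.
        endpoint_op = EndpointCreateOp(
            project=project,
            location=location,
            display_name="sketch-endpoint",  # placeholder
        )
        ModelDeployOp(
            model=training_op.outputs["model"],
            endpoint=endpoint_op.outputs["endpoint"],
            dedicated_resources_machine_type="n1-standard-4",  # placeholder
            dedicated_resources_min_replica_count=1,
            dedicated_resources_max_replica_count=1,
        )

    return pipeline
```

With both SDKs installed, the returned function can be compiled with `kfp.compiler.Compiler().compile(pipeline_func=build_pipeline(), package_path="pipeline.yaml")` and the resulting file submitted to Vertex AI Pipelines.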

Additionally, these prebuilt Google Cloud Pipeline Components are supported in Vertex AI Pipelines and offer the following benefits:

  • Easier debugging: The components show the underlying resources that they launch, which simplifies debugging.
  • Standardized artifact types: Provide consistent interfaces to use standard artifact types for input and output. These standard artifacts are tracked in Vertex ML Metadata, making it easier for you to analyze the lineage of your pipeline's artifacts. For more details on artifact lineage, see Tracking the lineage of pipeline artifacts.
  • Understand pipeline costs with billing labels: Resource labels are automatically propagated to the Google Cloud resources created by the Google Cloud Pipeline Components in your pipeline run. You can use billing labels along with Cloud Billing export to BigQuery to review the cost of your pipeline run. For more information about using labels to understand the cost of a pipeline run, see Understand pipeline run costs. For more information about how labels are propagated from a pipeline run to resources spawned by Google Cloud Pipeline Components, see Resource labeling by Vertex AI Pipelines.
  • Cost efficiencies: Vertex AI Pipelines optimizes the execution of these components by launching the Google Cloud resources directly, without having to launch the component's container. This reduces startup latency and reduces the cost of a busy-waiting container.
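To make the billing-label benefit more concrete, the following sketch builds a BigQuery query that sums the cost of one pipeline run per service from a Cloud Billing export table. It uses only the Python standard library. The label key `vertex-ai-pipelines-run-billing-id` and the export columns (`labels`, `cost`, `service.description`) are assumptions about your setup that you should check against your own export table, and `build_cost_query` is a hypothetical helper name.

```python
import re

# Assumed label key that Vertex AI Pipelines attaches to a pipeline run;
# verify it against the labels present in your own billing export.
RUN_LABEL_KEY = "vertex-ai-pipelines-run-billing-id"

# Accepts a fully qualified table ID of the form project.dataset.table.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_cost_query(billing_table: str) -> str:
    """Build a query that totals the cost of one pipeline run per service.

    The run's billing ID is left as the named query parameter @run_id so
    that it can be bound safely when the query is executed.
    """
    if not _TABLE_ID.match(billing_table):
        raise ValueError(f"not a fully qualified table ID: {billing_table!r}")
    return (
        "SELECT service.description AS service, SUM(cost) AS total_cost\n"
        f"FROM `{billing_table}`\n"
        "WHERE EXISTS (\n"
        "  SELECT 1 FROM UNNEST(labels) AS l\n"
        f"  WHERE l.key = '{RUN_LABEL_KEY}' AND l.value = @run_id\n"
        ")\n"
        "GROUP BY service\n"
        "ORDER BY total_cost DESC"
    )
```

The returned string can be run with any BigQuery client that supports named query parameters, binding `@run_id` to the billing ID of the pipeline run that you want to inspect.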

What's next