Machine learning operations (MLOps) is the practice of applying DevOps strategies to machine learning (ML) systems. DevOps strategies let you efficiently build and release code changes, and monitor systems to ensure you meet your reliability goals. MLOps extends this practice to help you reduce the amount of time that it takes to reliably go from data ingestion to deploying your model in production, in a way that lets you monitor and understand your ML system.
Vertex AI Pipelines helps you automate, monitor, and govern your ML systems by orchestrating your ML workflow in a serverless manner, and by storing your workflow's artifacts using Vertex ML Metadata. Storing the artifacts of your ML workflow in Vertex ML Metadata lets you analyze their lineage. For example, an ML model's lineage might include the training data, hyperparameters, and code that were used to create the model.
Understanding ML pipelines
To orchestrate your ML workflow on Vertex AI Pipelines, you must first describe your workflow as a pipeline. ML pipelines are portable and scalable ML workflows that are based on containers. ML pipelines are composed of a set of input parameters and a list of steps. Each step is an instance of a pipeline component.
You can use ML pipelines to:
- Apply MLOps strategies to automate and monitor repeatable processes.
- Experiment by running an ML workflow with different sets of hyperparameters, different numbers of training steps or iterations, and so on.
- Reuse a pipeline's workflow to train a new model.
You can use Vertex AI Pipelines to run pipelines that were built using the Kubeflow Pipelines SDK or TensorFlow Extended (TFX). Learn more about choosing between the Kubeflow Pipelines SDK and TFX.
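For orientation, the following is a minimal sketch of the end-to-end flow with the Kubeflow Pipelines SDK (kfp v2) and the Vertex AI SDK for Python: define a one-step pipeline, compile it, and submit it to Vertex AI Pipelines. The project, region, and bucket values are placeholders.

```python
# Minimal sketch: define, compile, and run a pipeline on Vertex AI Pipelines.
# Assumes `kfp` and `google-cloud-aiplatform` are installed; the project,
# region, and bucket names below are placeholders.
from kfp import compiler, dsl
from google.cloud import aiplatform


@dsl.component(base_image="python:3.10")
def say_hello(name: str) -> str:
    """A trivial component: a single step of the pipeline."""
    greeting = f"Hello, {name}!"
    print(greeting)
    return greeting


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(name: str = "Vertex"):
    """A pipeline is a set of input parameters plus a list of steps."""
    say_hello(name=name)


# Compile the pipeline into a definition file that Vertex AI Pipelines can run.
compiler.Compiler().compile(
    pipeline_func=hello_pipeline, package_path="hello_pipeline.yaml"
)

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="hello-pipeline",
    template_path="hello_pipeline.yaml",
    pipeline_root="gs://my-bucket/pipeline-root",  # placeholder bucket
    parameter_values={"name": "Vertex"},
)
job.submit()  # runs serverlessly on Vertex AI Pipelines
```

The compiled YAML file is a self-contained definition of the pipeline, which is part of what makes pipelines portable: the same file can be rerun later or submitted from a different environment.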
Understanding pipeline components
Pipeline components are self-contained sets of code that perform one part of a pipeline's workflow, such as data preprocessing, data transformation, and training a model.
Components are composed of a set of inputs, a set of outputs, and the location of a container image. A component's container image is a package that includes the component's executable code and a definition of the environment that the code runs in.
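As a sketch of this structure, the following custom component is built with the Kubeflow Pipelines SDK: the decorator declares the container image and packages that define the environment the code runs in, and the function signature declares the component's inputs and outputs. The component name, data schema, and package list are hypothetical.

```python
# Sketch of a custom component: typed inputs and outputs plus a container
# image. `base_image` and `packages_to_install` define the environment the
# code runs in; the names and data schema here are hypothetical.
from kfp import dsl
from kfp.dsl import Dataset, Input, Model, Output


@dsl.component(
    base_image="python:3.10",
    packages_to_install=["scikit-learn", "pandas"],
)
def train_model(
    training_data: Input[Dataset],   # input artifact from an upstream step
    max_depth: int,                  # input parameter
    model: Output[Model],            # output artifact written by this step
):
    # Imports live inside the function because the body runs standalone
    # inside the component's container.
    import pandas as pd
    from joblib import dump
    from sklearn.tree import DecisionTreeRegressor

    df = pd.read_csv(training_data.path)
    regressor = DecisionTreeRegressor(max_depth=max_depth)
    regressor.fit(df.drop(columns=["label"]), df["label"])
    dump(regressor, model.path)  # persist the trained model artifact
```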
You can build custom components or you can reuse prebuilt components. To use features of Vertex AI like AutoML in your pipeline, use the Google Cloud pipeline components. Learn more about using Google Cloud pipeline components in your pipeline.
Understanding pipeline workflow
Each step in a pipeline performs part of the pipeline's workflow. Because steps are instances of pipeline components, they have inputs, outputs, and a container image. A step's inputs can be set from the pipeline's inputs, or they can depend on the outputs of other steps within the pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
For example, consider a pipeline with the following steps:
- Ingest data: This step loads training data into the pipeline.
- Preprocess data: This step preprocesses the ingested training data.
- Train model: This step uses the preprocessed training data to train a model.
- Evaluate model: This step evaluates the trained model.
- Deploy: This step deploys the trained model for predictions.
When you compile your pipeline, the pipelines SDK (the Kubeflow Pipelines SDK or TFX) analyzes the data dependencies between steps to create the workflow graph.
- The ingest data step does not depend on any other step, so it can be the first step in the workflow or it can run concurrently with other steps.
- The preprocess data step relies on the data produced by the ingest data step, so preprocessing data must occur after ingesting data.
- The model training step relies on the preprocessed training data, so training a model must occur after preprocessing the data.
- Model evaluation and model deployment both rely on the trained model, so they must occur after the model training step. Because neither of these steps depends on the other, they can run concurrently.
Based on this analysis, Vertex AI Pipelines runs the ingest data, preprocess data, and model training steps sequentially, and then runs the model evaluation and deployment steps concurrently.
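The following sketch shows how that workflow might be expressed with the Kubeflow Pipelines SDK. All component names and bodies are hypothetical stubs; the point is that the graph is inferred from how outputs are passed as inputs, not declared explicitly.

```python
# Sketch of the five-step workflow above in the Kubeflow Pipelines SDK.
# Component names are hypothetical and each body is a stub; the DAG is
# inferred from the data dependencies between steps.
from kfp import dsl
from kfp.dsl import Dataset, Input, Model, Output


@dsl.component
def ingest_data(raw_data: Output[Dataset]):
    with open(raw_data.path, "w") as f:
        f.write("...")  # stand-in for loading training data


@dsl.component
def preprocess_data(raw_data: Input[Dataset], clean_data: Output[Dataset]):
    with open(raw_data.path) as src, open(clean_data.path, "w") as dst:
        dst.write(src.read())  # stand-in for real preprocessing


@dsl.component
def train_model(clean_data: Input[Dataset], model: Output[Model]):
    with open(model.path, "w") as f:
        f.write("...")  # stand-in for real training


@dsl.component
def evaluate_model(clean_data: Input[Dataset], model: Input[Model]) -> float:
    return 0.0  # stand-in for a real evaluation metric


@dsl.component
def deploy_model(model: Input[Model]):
    pass  # stand-in for deploying the model for predictions


@dsl.pipeline(name="example-workflow")
def example_workflow():
    ingest = ingest_data()
    preprocess = preprocess_data(raw_data=ingest.outputs["raw_data"])
    train = train_model(clean_data=preprocess.outputs["clean_data"])
    # Both steps below consume only the trained model, so Vertex AI
    # Pipelines can run them concurrently.
    evaluate_model(
        clean_data=preprocess.outputs["clean_data"],
        model=train.outputs["model"],
    )
    deploy_model(model=train.outputs["model"])
```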
Understanding the lineage of your ML artifacts
To understand changes in the performance or accuracy of your ML system, you must be able to analyze the metadata of pipeline runs and the lineage of ML artifacts. An artifact's lineage includes all the factors that contributed to its creation, as well as the artifacts and metadata that are derived from it. Managing this metadata in an ad hoc manner can be difficult and time-consuming.
For example, a model's lineage could include the following:
- The training, test, and evaluation data used to create the model.
- The hyperparameters used during model training.
- The code that was used to train the model.
- Metadata recorded from the training and evaluation process, such as the model's accuracy.
- Artifacts that descend from this model, such as the results of batch predictions.
When you run a pipeline using Vertex AI Pipelines, the artifacts and metadata of your pipeline run are stored using Vertex ML Metadata. You can use this metadata to help answer questions like the following (a query sketch appears after this list):
- Why did a certain pipeline run produce an especially accurate model?
- Which pipeline run produced the most accurate model, and what hyperparameters were used to train the model?
Depending on the steps in your pipeline, you may be able to use Vertex ML Metadata to answer system governance questions. For example, you could use metadata to determine which version of your model was in production at a given point in time.
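As a minimal sketch of this kind of analysis, the Vertex AI SDK for Python can load a pipeline's runs, with their parameters and metrics, into a pandas DataFrame. The pipeline name and the metric column below are assumptions about what your own pipeline logs.

```python
# Sketch: compare pipeline runs stored in Vertex ML Metadata. The pipeline
# name and the "metric.accuracy" column are assumptions; the exact columns
# depend on the parameters and metrics your pipeline records.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# One row per pipeline run, including the run's parameters and metrics.
runs = aiplatform.get_pipeline_df(pipeline="example-workflow")

# Which run produced the most accurate model, and with which hyperparameters?
best_run = runs.sort_values("metric.accuracy", ascending=False).iloc[0]
print(best_run)
```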
Learn more about visualizing pipeline runs, analyzing the lineage of your ML artifacts, and the first-party artifact types that Google Cloud Pipeline Components defines.
What's next
- Get started building pipelines.
- Learn how to run a pipeline.
- Learn best practices for implementing custom-trained ML models on Vertex AI.