Architecture for MLOps using TFX, Kubeflow Pipelines, and Cloud Build

This document describes the overall architecture of a machine learning (ML) system that uses TensorFlow Extended (TFX) libraries. It also discusses how to set up continuous integration (CI), continuous delivery (CD), and continuous training (CT) for the ML system by using Cloud Build and Kubeflow Pipelines.

In this document, the terms ML system and ML pipeline refer to ML model training pipelines, rather than model scoring or prediction pipelines.

This document is for data scientists and ML engineers who want to adapt their CI/CD practices to move ML solutions to production on Google Cloud, and who want to help ensure the quality, maintainability, and adaptability of their ML pipelines.

This document covers the following topics:

  • Understanding CI/CD and automation in ML.
  • Designing an integrated ML pipeline with TFX.
  • Orchestrating and automating the ML pipeline using Kubeflow Pipelines.
  • Setting up a CI/CD system for the ML pipeline using Cloud Build.

MLOps

To integrate an ML system in a production environment, you need to orchestrate the steps in your ML pipeline. In addition, you need to automate the execution of the pipeline for the continuous training of your models. To experiment with new ideas and features, you need to adopt CI/CD practices in the new implementations of the pipelines. The following sections give a high-level overview of CI/CD and CT in ML.

ML pipeline automation

In some use cases, the manual process of training, validating, and deploying ML models can be sufficient. This manual approach works if your team manages only a few ML models that aren't retrained or aren't changed frequently. In practice, however, models often break down when deployed in the real world because they fail to adapt to changes in the dynamics of environments, or the data that describes such dynamics.

For your ML system to adapt to such changes, you need to apply the following MLOps techniques:

  • Automate the execution of the ML pipeline to retrain new models on new data to capture any emerging patterns. CT is discussed later in this document in the ML with Kubeflow Pipelines section.
  • Set up a continuous delivery system to frequently deploy new implementations of the entire ML pipeline. CI/CD is discussed later in this document in the CI/CD setup for ML on Google Cloud section.

You can automate the ML production pipelines to retrain your models with new data. You can trigger your pipeline on demand, on a schedule, on the availability of new data, on model performance degradation, on significant changes in the statistical properties of the data, or based on other conditions.

CI/CD pipeline compared to CT pipeline

The availability of new data is one trigger to retrain the ML model. The availability of a new implementation of the ML pipeline (including new model architecture, feature engineering, and hyperparameters) is another important trigger to re-execute the ML pipeline. This new implementation of the ML pipeline serves as a new version of the model prediction service, for example, a microservice with a REST API for online serving. The difference between the two cases is as follows:

  • To train a new ML model with new data, the previously deployed CT pipeline is executed. No new pipelines or components are deployed; only a new prediction service or newly trained model is served at the end of the pipeline.
  • To train a new ML model with a new implementation, a new pipeline is deployed through a CI/CD pipeline.

To deploy new ML pipelines quickly, you need to set up a CI/CD pipeline. This pipeline is responsible for automatically deploying new ML pipelines and components when new implementations are available and approved for various environments (such as development, test, staging, pre-production, canary, and production).

The following diagram shows the relationship between the CI/CD pipeline and the ML CT pipeline.

Figure 1. CI/CD and ML CT pipelines.

The output for these pipelines is as follows:

  • Given a new implementation, a successful CI/CD pipeline deploys a new ML CT pipeline.
  • Given new data, a successful CT pipeline serves a new model prediction service.

Designing a TFX-based ML system

The following sections discuss how to design an integrated ML system using TensorFlow Extended (TFX) in order to set up a CI/CD pipeline for the ML system. Although there are several frameworks for building ML models, TFX is an integrated ML platform for developing and deploying production ML systems. A TFX pipeline is a sequence of components that implement an ML system. This TFX pipeline is designed for scalable, high-performance ML tasks. These tasks include modeling, training, validation, serving inference, and managing deployments. The key libraries of TFX are as follows:

  • TensorFlow Data Validation (TFDV): Used for data validation and anomaly detection.
  • TensorFlow Transform (TFT): Used for data preprocessing and feature engineering.
  • TensorFlow Model Analysis (TFMA): Used for model evaluation and analysis.
  • TensorFlow Serving: Used for serving the model for prediction.

TFX ML system overview

The following diagram shows how the various TFX libraries are integrated to compose an ML system.

Figure 2. A typical TFX-based ML system.

Figure 2 shows a typical TFX-based ML system. The following steps can be completed manually or by an automated pipeline:

  1. Data extraction: The first step is to extract the new training data from its data sources. The outputs of this step are data files that are used for training and evaluating the model.
  2. Data validation: TFDV validates the data against the expected (raw) data schema. The data schema is created and fixed during the development phase, before system deployment. The data validation steps detect anomalies related to both data distribution and schema skews. The outputs of this step are the anomalies (if any) and a decision on whether to execute downstream steps or not.
  3. Data transformation: After the data is validated, the data is split and prepared for the ML task by performing data transformations and feature engineering operations using TFT. The outputs of this step are data files to train and evaluate the model, usually transformed in TFRecords format. In addition, the transformation artifacts that are produced help with constructing the model inputs and with exporting the saved model after training.
  4. Model training and tuning: To implement and train the ML model, use the tf.estimator or tf.Keras APIs with the transformed data produced by the previous step. To select the parameter settings that lead to the best model, you can use Keras tuner, a hyperparameter tuning library for Keras, or you can use other services like Katib. The output of this step is a saved model that is used for evaluation, and another saved model that is used for online serving of the model for prediction.
  5. Model evaluation and validation: When the model is exported after the training step, it's evaluated on a test dataset to assess the model quality by using TFMA. TFMA evaluates the model quality as a whole, and identifies the parts of the data on which the model isn't performing well. This evaluation helps guarantee that the model is promoted for serving only if it satisfies the quality criteria. The criteria can include fair performance on various data subsets (for example, demographics and locations), and improved performance compared to previous models or a benchmark model. The output of this step is a set of performance metrics and a decision on whether to promote the model to production.
  6. Model serving for prediction: After the newly trained model is validated, it's deployed as a microservice to serve online predictions using TensorFlow Serving. The output of this step is a deployed prediction service of the trained ML model. You can replace this step by storing the trained model in a model registry, from which a separate model serving CI/CD process is launched.

For a tutorial that shows how to use the TFX libraries, see ML with TensorFlow Extended (TFX).
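
To make these steps concrete, the following sketch shows how they can map to TFX components in a pipeline definition. This is a minimal illustration only, assuming the TFX 1.x API (the tfx.v1 namespace); the data path, module files, and storage locations are hypothetical placeholders, and a real pipeline would also configure training arguments and evaluation thresholds.

```python
from tfx import v1 as tfx

# Placeholder locations; replace with your own paths (hypothetical values).
DATA_ROOT = 'gs://my-bucket/data'
TRANSFORM_MODULE = 'preprocessing.py'
TRAINER_MODULE = 'trainer.py'
SERVING_DIR = 'gs://my-bucket/serving_model'
PIPELINE_ROOT = 'gs://my-bucket/pipeline'

# Step 1: data extraction.
example_gen = tfx.components.CsvExampleGen(input_base=DATA_ROOT)

# Step 2: data validation with TFDV (statistics, schema, anomaly detection).
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'])
example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])

# Step 3: data transformation and feature engineering with TFT.
transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=TRANSFORM_MODULE)

# Step 4: model training on the transformed data.
trainer = tfx.components.Trainer(
    module_file=TRAINER_MODULE,
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'])

# Step 5: model evaluation with TFMA.
evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'])

# Step 6: push the trained model to a serving location.
pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=SERVING_DIR)))

pipeline = tfx.dsl.Pipeline(
    pipeline_name='tfx-example-pipeline',
    pipeline_root=PIPELINE_ROOT,
    components=[example_gen, statistics_gen, schema_gen, example_validator,
                transform, trainer, evaluator, pusher])
```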

TFX ML system on Google Cloud

In a production environment, the components of the system have to run at scale on a reliable platform. The following diagram shows how each step of the TFX ML pipeline runs using a managed service on Google Cloud, which ensures agility, reliability, and performance at a large scale.

Figure 3. TFX-based ML system on Google Cloud.

The following table describes the key Google Cloud services shown in figure 3:

Step | TFX library | Google Cloud service
Data extraction and validation | TensorFlow Data Validation | Dataflow
Data transformation | TensorFlow Transform | Dataflow
Model training and tuning | TensorFlow | AI Platform Training
Model evaluation and validation | TensorFlow Model Analysis | Dataflow
Model serving for prediction | TensorFlow Serving | AI Platform Prediction
Model artifacts storage | N/A | AI Hub

  • Dataflow is a fully managed, serverless, and reliable service for running Apache Beam pipelines at scale on Google Cloud. Dataflow is used to scale the following processes (a configuration sketch follows this list):

    • Computing the statistics to validate the incoming data.
    • Performing data preparation and transformation.
    • Evaluating the model on a large dataset.
    • Computing metrics on different aspects of the evaluation dataset.
  • Cloud Storage is a highly available and durable storage for binary large objects. Cloud Storage hosts artifacts produced throughout the execution of the ML pipeline, including the following:

    • Data anomalies (if any)
    • Transformed data and artifacts
    • Exported (trained) model
    • Model evaluation metrics
  • AI Hub is an enterprise-grade hosted repository for discovering, sharing, and reusing artificial intelligence (AI) and ML assets. To store trained and validated models, plus their relevant metadata, you can use AI Hub as a model registry.

  • AI Platform is a managed service to train and serve ML models at scale. AI Platform Training not only supports TensorFlow, scikit-learn, and XGBoost models, but also supports models implemented in any framework by using a user-provided custom container. In addition, a scalable, Bayesian optimization-based service for hyperparameter tuning is available. Trained models can be deployed to AI Platform Prediction as a microservice that has a REST API.
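
As a hedged illustration of how the Beam-based steps (data validation, transformation, and evaluation) can be scaled out on Dataflow, the following sketch passes Dataflow runner options to a TFX pipeline through its beam_pipeline_args argument. The project, region, and bucket values are placeholders.

```python
import os
from tfx import v1 as tfx

# Placeholder project settings; not part of the reference architecture.
GOOGLE_CLOUD_PROJECT = 'my-project'
GOOGLE_CLOUD_REGION = 'us-central1'
GCS_BUCKET = 'gs://my-bucket'

# Apache Beam options that make the TFDV, TFT, and TFMA steps run on
# Dataflow instead of the local DirectRunner.
beam_pipeline_args = [
    '--runner=DataflowRunner',
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--region=' + GOOGLE_CLOUD_REGION,
    '--temp_location=' + os.path.join(GCS_BUCKET, 'tmp'),
]

pipeline = tfx.dsl.Pipeline(
    pipeline_name='tfx-example-pipeline',
    pipeline_root=os.path.join(GCS_BUCKET, 'pipeline'),
    components=[],  # the components shown in the earlier sketch
    beam_pipeline_args=beam_pipeline_args)
```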

Orchestrating the ML system using Kubeflow Pipelines

This document has covered how to design a TFX-based ML system, and how to run each component of the system at scale on Google Cloud. However, you need an orchestrator in order to connect these different components of the system together. The orchestrator runs the pipeline in a sequence, and automatically moves from one step to another based on the defined conditions. For example, a defined condition might be executing the model serving step after the model evaluation step if the evaluation metrics meet predefined thresholds. Orchestrating the ML pipeline is useful in both the development and production phases:

  • During the development phase, orchestration helps data scientists run ML experiments without manually executing each step.
  • During the production phase, orchestration helps automate the execution of the ML pipeline based on a schedule or certain triggering conditions.

ML with Kubeflow Pipelines

Kubeflow is an open source Kubernetes framework for developing and running portable ML workloads. Kubeflow Pipelines is a Kubeflow service that lets you compose, orchestrate, and automate ML systems, where each component of the system can run on Kubeflow, Google Cloud, or other cloud platforms. The Kubeflow Pipelines platform consists of the following:

  • A user interface for managing and tracking experiments, jobs, and runs.
  • An engine for scheduling multistep ML workflows. Kubeflow Pipelines uses Argo to orchestrate Kubernetes resources.
  • A Python SDK for defining and manipulating pipelines and components.
  • Notebooks for interacting with the system using the Python SDK.
  • An ML Metadata store to save information about executions, models, datasets, and other artifacts.

The following constitutes a Kubeflow pipeline:

  • A set of containerized ML tasks, or components. A pipeline component is self-contained code that is packaged as a Docker image. A component takes input arguments, produces output files, and performs one step in the pipeline.

  • A specification of the sequence of the ML tasks, defined through a Python domain-specific language (DSL). The topology of the workflow is implicitly defined by connecting the outputs of an upstream step to the inputs of a downstream step. A step in the pipeline definition invokes a component in the pipeline. In a complex pipeline, components can execute multiple times in loops, or they can be executed conditionally.

  • A set of pipeline input parameters, whose values are passed to the components of the pipeline, including the criteria for filtering data and where to store the artifacts that the pipeline produces. (A minimal pipeline definition sketch follows this list.)
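
The following minimal sketch, based on the Kubeflow Pipelines v1 SDK, illustrates these three elements: containerized tasks, a Python DSL function whose data dependencies define the topology, and a pipeline input parameter. The container image URLs and the data path are hypothetical.

```python
import kfp
from kfp import dsl


@dsl.pipeline(
    name='example-training-pipeline',
    description='Minimal two-step pipeline sketch.')
def example_pipeline(data_path: str = 'gs://my-bucket/data.csv'):  # pipeline input parameter
    # Containerized task: validates the input data and writes any anomalies to a file.
    validate = dsl.ContainerOp(
        name='validate-data',
        image='gcr.io/my-project/validate:latest',   # hypothetical image
        arguments=['--data-path', data_path],
        file_outputs={'anomalies': '/tmp/anomalies.txt'})

    # Containerized task: trains the model. Consuming the upstream output
    # implicitly defines the topology (validate runs before train).
    train = dsl.ContainerOp(
        name='train-model',
        image='gcr.io/my-project/train:latest',       # hypothetical image
        arguments=['--data-path', data_path,
                   '--anomalies', validate.outputs['anomalies']])


if __name__ == '__main__':
    # Compile the DSL into a package that can be uploaded to Kubeflow Pipelines.
    kfp.compiler.Compiler().compile(example_pipeline, 'example_pipeline.tar.gz')
```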

The following diagram shows a sample graph of Kubeflow Pipelines.

Figure 4. A sample graph of Kubeflow Pipelines.

Kubeflow Pipelines components

For a component to be invoked in the pipeline, you need to create a component op. You can create a component op by using the following methods (a short sketch follows this list):

  • Implementing a lightweight Python component: This component doesn't require that you build a new container image for every code change, and is intended for fast iteration in a notebook environment. You can create a lightweight component from your Python function using the kfp.components.func_to_container_op function.

  • Creating a reusable component: This functionality requires that your component includes a component specification in the component.yaml file. The component specification describes the component to Kubeflow Pipelines in terms of its arguments, the URL of the Docker container image to execute, and its outputs. Component ops are automatically created from the component.yaml files using the ComponentStore.load_components function in the Kubeflow Pipelines SDK during pipeline compilation. Reusable component.yaml specifications can be shared on AI Hub for composability across different Kubeflow Pipelines projects.

  • Using predefined Google Cloud components: Kubeflow Pipelines provides predefined components that execute various managed services on Google Cloud by providing the required parameters. These components help you execute tasks using services such as BigQuery, Dataflow, Dataproc, and AI Platform. These predefined Google Cloud components are also available in AI Hub. Similar to using reusable components, these component ops are automatically created from the predefined component specifications through ComponentStore.load_components. Other predefined components are available for executing jobs in Kubeflow and other platforms.
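
The following sketch, which assumes the Kubeflow Pipelines v1 SDK, shows the first two methods: a lightweight component created from a Python function, and a reusable component op loaded from a component.yaml specification (shown here with kfp.components.load_component_from_file; the ComponentStore approach mentioned above loads the same kind of specification from configured repositories). The component file path is hypothetical.

```python
from kfp import components


# Method 1: lightweight Python component. The function is converted into a
# component op without building a dedicated container image.
def split_ratio(train_rows: int, eval_rows: int) -> float:
    return train_rows / (train_rows + eval_rows)


split_ratio_op = components.func_to_container_op(split_ratio)

# Method 2: reusable component. The op is created from a component.yaml
# specification that points at a prebuilt Docker image (path is hypothetical).
train_op = components.load_component_from_file('components/train/component.yaml')
```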

You can also use the TFX Pipeline DSL and TFX components. A TFX component encapsulates metadata functionality: the component's driver supplies metadata to the executor by querying the metadata store, and the publisher accepts the results of the executor and stores them in metadata. You can also implement a custom component that has the same integration with the metadata store. TFX provides a command-line interface (CLI) that compiles the pipeline's Python code to a YAML file that describes the Argo workflow. You can then submit the file to Kubeflow Pipelines.

The following diagram shows how in Kubeflow Pipelines, a containerized task can invoke other services such as BigQuery jobs, AI Platform (distributed) training jobs, and Dataflow jobs.

Figure 5. ML pipeline with Kubeflow Pipelines and Google Cloud managed services.

Kubeflow Pipelines lets you orchestrate and automate a production ML pipeline by executing the required Google Cloud services. In figure 5, Cloud SQL serves as the ML metadata store for Kubeflow Pipelines.

Kubeflow Pipelines components aren't limited to executing TFX-related services on Google Cloud. These components can execute any data-related and compute-related services, including Dataproc for SparkML jobs, AutoML, and other compute workloads.

Containerizing tasks in Kubeflow Pipelines has the following advantages:

  • Decouples the execution environment from your code runtime.
  • Provides reproducibility of the code between the development and production environments, because the artifacts that you test are the same artifacts that run in production.
  • Isolates each component in the pipeline; each can have its own version of the runtime, different languages, and different libraries.
  • Helps with composition of complex pipelines.
  • Integrates with ML metadata store for traceability and reproducibility.

For a comprehensive introduction to Kubeflow pipelines and an example with TFX libraries, see the Getting started with Kubeflow Pipelines blog post.

Triggering and scheduling Kubeflow Pipelines

When you deploy a Kubeflow pipeline to production, you need to automate its execution, depending on the scenarios discussed in the ML pipeline automation section.

Kubeflow Pipelines provides a Python SDK to operate the pipeline programmatically. The kfp.Client class includes APIs to create experiments, and to deploy and run pipelines. By using the Kubeflow Pipelines SDK, you can invoke Kubeflow Pipelines from other services, for example:

  • On a schedule, by using a scheduled job such as Cloud Scheduler.
  • In response to an event, such as the arrival of new data files in a Cloud Storage bucket, by using an event-driven service such as Cloud Functions with Pub/Sub.
  • As part of a larger data and process workflow, by using an orchestrator such as Cloud Composer.

Kubeflow Pipelines also provides a built-in scheduler for recurring pipeline runs.
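
As a minimal sketch of programmatic triggering, the following code uses kfp.Client to create an experiment and start a run of a compiled pipeline package. The host URL, package path, and parameter values are placeholders; the same few lines can be embedded in a scheduled job or in an event-driven function that fires when new data arrives.

```python
import kfp

KFP_HOST_URL = 'https://<your-kfp-endpoint>'   # placeholder endpoint

client = kfp.Client(host=KFP_HOST_URL)

# Create an experiment to group related runs.
experiment = client.create_experiment(name='continuous-training')

# Start a run of the compiled pipeline package with run-time parameters.
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='ct-run',
    pipeline_package_path='workflow.tar.gz',
    params={'data_path': 'gs://my-bucket/new-data/'})   # hypothetical parameter
```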

Setting up CI/CD for ML on Google Cloud

Kubeflow Pipelines enables you to orchestrate ML systems that involve multiple steps, including data preprocessing, model training and evaluation, and model deployment. In the data science exploration phase, Kubeflow Pipelines helps with rapid experimentation of the whole system. In the production phase, Kubeflow Pipelines enables you to automate the pipeline execution based on new data to train or retrain the ML model.

CI/CD architecture

The following diagram shows a high-level overview of CI/CD for ML with Kubeflow pipelines.

Figure 6. High-level overview of CI/CD for Kubeflow pipelines.

At the heart of this architecture is Cloud Build, a managed service that executes your builds on Google Cloud infrastructure. Cloud Build can import source from Cloud Source Repositories, GitHub, or Bitbucket, and then execute a build to your specifications, and produce artifacts such as Docker containers or Python tar files.

Cloud Build executes your build as a series of build steps, defined in a build configuration file (cloudbuild.yaml). Each build step is run in a Docker container. You can either use the supported build steps provided by Cloud Build, or write your own build steps.

The Cloud Build process, which performs the required CI/CD for your ML system, can be executed either manually or through automated build triggers. Triggers execute your configured build steps whenever changes are pushed to the build source. You can set a build trigger to execute the build routine on changes to the source repository, or to execute the build routine only when changes match certain criteria.

In addition, you can have build routines (Cloud Build configuration files) that are executed in response to different triggers. For example, you can have the following setup:

  • A build routine starts when commits are made to a development branch.
  • A build routine starts when commits are made to the master branch.

You can use configuration variable substitutions to define environment variables at build time. These substitutions are captured from triggered builds. These variables include $COMMIT_SHA, $REPO_NAME, $BRANCH_NAME, $TAG_NAME, and $REVISION_ID. Other non-trigger-based variables are $PROJECT_ID and $BUILD_ID. Substitutions are helpful for variables whose value isn't known until build time, or to reuse an existing build request with different variable values.

CI/CD workflow use case

A source code repository typically includes the following items:

  • The Python source code for implementing the components of Kubeflow Pipelines, including data validation, data transformation, model training, model evaluation, and model serving.
  • Python unit tests to test the methods implemented in the component.
  • Dockerfiles that are required in order to create Docker container images, one for each Kubeflow Pipelines component.
  • The component.yaml files that define the pipeline component specifications. These specifications are used to generate the component ops in the pipeline definition.
  • A Python module (for example, the pipeline.py module) where the Kubeflow Pipelines workflow is defined.
  • Other scripts and configuration files, including the cloudbuild.yaml files.
  • Notebooks used for exploratory data analysis, model analysis, and interactive experimentation on models.
  • A settings file (for example, the settings.yaml file) that includes the configurations for the pipeline input parameters.

In the following example, a build routine is triggered when a developer pushes source code to the development branch from their data science environment.

Figure 7. Example build steps performed by Cloud Build.

As shown in figure 7, Cloud Build performs the following build steps:

  1. The source code repository is copied to the Cloud Build runtime environment, under the /workspace directory.
  2. Unit tests are run.
  3. (Optional) Static code analysis is run, such as Pylint.
  4. If the tests pass, the Docker container images are built, one for each Kubeflow Pipelines component. The images are tagged with the $COMMIT_SHA parameter.
  5. The Docker container images are uploaded to the Container Registry.
  6. The image URL is updated in each of the component.yaml files with the created and tagged Docker container images.
  7. The Kubeflow Pipelines workflow is compiled to produce the workflow.tar.gz file.
  8. The workflow.tar.gz file is uploaded to Cloud Storage.
  9. The compiled pipeline is deployed to Kubeflow Pipelines (a deployment sketch follows this list), which involves the following steps:
    1. Read the pipeline parameters from the settings.yaml file.
    2. Create an experiment (or use an existing one).
    3. Deploy the pipeline to Kubeflow Pipelines (and tag its name with a version).
  10. (Optional) Run the pipeline with the parameter values as part of an integration test or production execution. The executed pipeline eventually deploys a model as an API on AI Platform.
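
The deployment step (step 9 in the preceding list) can be scripted against the Kubeflow Pipelines SDK. The following is a hedged sketch of such a script; the settings keys, the endpoint, and the naming convention are assumptions rather than a prescribed format.

```python
import kfp
import yaml

# Read the pipeline settings produced alongside the compiled workflow.
# The keys (host, experiment, pipeline_name, version) are assumed, not prescribed.
with open('settings.yaml') as f:
    settings = yaml.safe_load(f)

client = kfp.Client(host=settings['host'])

# Create an experiment (or use an existing one) to group pipeline runs.
experiment = client.create_experiment(name=settings['experiment'])

# Upload the compiled workflow, tagging the pipeline name with a version
# (for example, the $COMMIT_SHA substitution passed in by Cloud Build).
client.upload_pipeline(
    pipeline_package_path='workflow.tar.gz',
    pipeline_name='{}-{}'.format(settings['pipeline_name'], settings['version']))
```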

For a comprehensive Cloud Build example that covers most of these steps, see A Simple CI/CD Example with Kubeflow Pipelines and Cloud Build.

Additional considerations

When you set up the ML CI/CD architecture on Google Cloud, consider the following:

  • For the data science environment, you can use a local machine, or an AI Platform Notebooks instance that is based on a Deep Learning VM (DLVM).
  • You can configure the automated Cloud Build pipeline to skip triggers, for example, if only documentation files are edited, or if the experimentation notebooks are modified.
  • You can execute the pipeline for integration and regression testing as a build test. Before the pipeline is deployed to the target environment, you can use the wait_for_pipeline_completion method to execute the pipeline on a sample dataset to test the pipeline (see the sketch after this list).
  • As an alternative to using Cloud Build, you can use other build systems such as Jenkins. A ready-to-go deployment of Jenkins is available on Google Cloud Marketplace.
  • You can configure the pipeline to deploy automatically to different environments, including development, test, and staging, based on different triggers. In addition, you can deploy to particular environments manually, such as pre-production or production, typically after getting a release approval. You can have multiple build routines for different triggers or for different target environments.
  • You can use Apache Airflow, a popular orchestration and scheduling framework, for general-purpose workflows, which you can run using the fully managed Cloud Composer service. For more information on how to orchestrate data pipelines with Cloud Composer and Cloud Build, see Setting up a CI/CD pipeline for your data-processing workflow.
  • When you deploy a new version of the model to production, deploy it as a canary release to get an idea of how it will perform (CPU, memory, and disk usage). Before you configure the new model to serve all live traffic, you can also perform A/B testing. Configure the new model to serve 10% to 20% of the live traffic. If the new model performs better than the current one, you can configure the new model to serve all traffic. Otherwise, the serving system rolls back to the current model.
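
For the integration-test consideration above, the following hedged sketch runs the candidate pipeline on a sample dataset and blocks until the run completes, using the kfp.Client methods run_pipeline and wait_for_run_completion (the helper name mentioned in the list can differ by SDK version). The endpoint, experiment name, and data path are placeholders.

```python
import kfp

KFP_HOST_URL = 'https://<your-kfp-endpoint>'     # placeholder endpoint

client = kfp.Client(host=KFP_HOST_URL)
experiment = client.create_experiment(name='integration-tests')

# Run the candidate pipeline on a small sample dataset as a build test.
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='integration-test',
    pipeline_package_path='workflow.tar.gz',
    params={'data_path': 'gs://my-bucket/sample-data/'})  # hypothetical sample data

# Block until the test run finishes, then gate the deployment on its status.
result = client.wait_for_run_completion(run_id=run.id, timeout=3600)
assert result.run.status == 'Succeeded'
```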

What's next