Tabular Workflow for Wide & Deep

This document provides an overview of the Tabular Workflow for Wide & Deep pipelines and components. To learn how to train a model with Wide & Deep, see Train a model with Wide & Deep.

Wide & Deep jointly trains wide linear models and deep neural networks, combining the benefits of memorization (wide) and generalization (deep). In online experiments, Wide & Deep significantly increased app acquisitions on the Google Play store compared with wide-only and deep-only models.

Benefits

  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch predictions or deploy the model for online predictions right away.

Wide & Deep on Vertex AI Pipelines

Tabular Workflow for Wide & Deep is a managed instance of Vertex AI Pipelines.

Vertex AI Pipelines is a serverless service that runs Kubeflow pipelines. You can use pipelines to automate and monitor your machine learning and data preparation tasks. Each step in a pipeline performs part of the pipeline's workflow. For example, a pipeline can include steps to split data, transform data types, and train a model. Because steps are instances of pipeline components, each step has inputs, outputs, and a container image. Step inputs can be set from the pipeline's inputs, or they can depend on the outputs of other steps in the same pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
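
To make the directed acyclic graph idea concrete, the following is a minimal sketch in plain Python, not Vertex AI Pipelines or Kubeflow code. The step names and the `execution_order` helper are hypothetical and exist only for illustration. Each step lists the steps whose outputs it consumes, and the standard-library `graphlib` module derives an order in which the steps could run.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each step maps to the set of steps whose
# outputs it depends on (its upstream steps).
step_dependencies = {
    "split-data": set(),
    "transform-data-types": {"split-data"},
    "train-model": {"transform-data-types"},
    "evaluate-model": {"train-model", "split-data"},
}


def execution_order(dependencies):
    """Return one valid run order for the steps.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    that is, if they do not form a directed acyclic graph.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(step_dependencies)
print(order)
```

A step becomes runnable only after every step it depends on has finished, which is why a cycle in the dependencies would make the workflow impossible to schedule.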

Two versions of the Tabular Workflow for Wide & Deep are available:

  • HyperparameterTuningJob searches for the best set of hyperparameter values to use for model training.
  • CustomJob lets you specify the hyperparameter values to use for model training. If you know exactly which hyperparameter values you need, you can specify them instead of searching for them and save on training resources.

Overview of Wide & Deep CustomJob pipeline and components

The Wide & Deep CustomJob pipeline can be illustrated by the following diagram:

Pipeline for Wide & Deep CustomJob 

The pipeline components are:

  1. feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
  2. split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

    Input:

    • Materialized data materialized_data.

    Output:

    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
  3. wide-and-deep-trainer: Perform model training.

    Input:

    • Instance baseline instance_baseline.
    • Training schema training_schema.
    • Transform output transform_output.
    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.

    Output:

    • Final model
  4. automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
  5. model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
  6. condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when run_evaluation is set to true.
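
The effect of run_evaluation on the component list above can be sketched as follows. This is an illustration in plain Python, not the pipeline definition itself: the `custom_job_steps` helper is hypothetical, and it returns the component names in the order they are listed in this document. The real pipeline is a graph, so the actual scheduling may differ.

```python
def custom_job_steps(run_evaluation: bool) -> list[str]:
    """List the CustomJob pipeline components that would run.

    Hypothetical helper: names are taken from the component list in this
    document, in listed order, and do not reflect actual scheduling.
    """
    steps = [
        "feature-transform-engine",
        "split-materialized-data",
        "wide-and-deep-trainer",
        "automl-tabular-infra-validator",
        "model-upload",
    ]
    # The evaluation step is optional and runs only when requested.
    if run_evaluation:
        steps.append("condition-run-evaluation-2")
    return steps


for name in custom_job_steps(run_evaluation=True):
    print(name)
```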

Overview of Wide & Deep HyperparameterTuningJob pipeline and components

The Wide & Deep HyperparameterTuningJob pipeline can be illustrated by the following diagram:

Pipeline for Wide & Deep HyperparameterTuningJob

The pipeline components are:

  1. feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
  2. split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

    Input:

    • Materialized data materialized_data.

    Output:

    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
  3. get-wide-and-deep-study-spec-parameters: Generate the study spec based on a configuration of the training pipeline. If the user provides values for study_spec_parameters_override, use those values to override the study spec values.

    Input:

    • Optional override of study spec parameters study_spec_parameters_override.

    Output:

    • Final list of hyperparameters and their ranges for the hyperparameter tuning job.
  4. wide-and-deep-hyperparameter-tuning-job: Perform one or more trials of hyperparameter tuning.

    Input:

    • Instance baseline instance_baseline.
    • Training schema training_schema.
    • Transform output transform_output.
    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
    • List of hyperparameters and their ranges for the hyperparameter tuning job.
  5. get-best-hyperparameter-tuning-job-trial: Select the model from the best hyperparameter tuning job trial of the previous step.

    Output:

    • Final model
  6. automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
  7. model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
  8. condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when run_evaluation is set to true.
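
Two of the steps above, applying study_spec_parameters_override and selecting the best trial, can be sketched in plain Python. This is a sketch under stated assumptions, not the components' implementation. The helpers `merge_study_spec` and `best_trial`, the default ranges, the trial records, and the metric name are all hypothetical. The parameter entries are loosely modeled on Vertex AI study spec parameter fields (`parameter_id`, `double_value_spec`, `discrete_value_spec`). The sketch assumes that an override replaces the default entry with the same `parameter_id` and that an override with a new `parameter_id` is appended.

```python
def merge_study_spec(defaults, overrides):
    """Apply overrides to default parameter specs, matched by parameter_id.

    Assumption: an override fully replaces the default entry that shares
    its parameter_id; overrides with new ids are appended.
    """
    merged = {spec["parameter_id"]: spec for spec in defaults}
    for spec in overrides:
        merged[spec["parameter_id"]] = spec
    return list(merged.values())


def best_trial(trials, metric, goal="MINIMIZE"):
    """Pick the trial with the best value of `metric` for the given goal."""
    pick = min if goal == "MINIMIZE" else max
    return pick(trials, key=lambda trial: trial["metrics"][metric])


# Hypothetical default search space.
default_spec = [
    {"parameter_id": "learning_rate",
     "double_value_spec": {"min_value": 0.0001, "max_value": 0.1}},
    {"parameter_id": "batch_size",
     "discrete_value_spec": {"values": [64, 128, 256]}},
]

# Hypothetical user override that narrows the learning rate range.
override = [
    {"parameter_id": "learning_rate",
     "double_value_spec": {"min_value": 0.001, "max_value": 0.01}},
]

final_spec = merge_study_spec(default_spec, override)

# Hypothetical trial results from a tuning job.
trials = [
    {"trial_id": "1", "metrics": {"loss": 0.42}},
    {"trial_id": "2", "metrics": {"loss": 0.37}},
    {"trial_id": "3", "metrics": {"loss": 0.51}},
]
winner = best_trial(trials, metric="loss", goal="MINIMIZE")
print(final_spec)
print(winner["trial_id"])
```

Keying the merge on `parameter_id` leaves every parameter that the user did not mention at its default range, which matches the description of the override as optional.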
