This document provides an overview of the Tabular Workflow for Wide & Deep pipelines and components. To learn how to train a model with Wide & Deep, see Train a model with Wide & Deep.
Wide & Deep jointly trains wide linear models and deep neural networks, combining the benefits of memorization and generalization. In online experiments, Wide & Deep significantly increased Google store application acquisitions compared with wide-only and deep-only models.
Benefits
- Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch predictions or deploy the model for online predictions right away.
Wide & Deep on Vertex AI Pipelines
Tabular Workflow for Wide & Deep is a managed instance of Vertex AI Pipelines.
Vertex AI Pipelines is a serverless service that runs Kubeflow pipelines. You can use pipelines to automate and monitor your machine learning and data preparation tasks. Each step in a pipeline performs part of the pipeline's workflow. For example, a pipeline can include steps to split data, transform data types, and train a model. Because steps are instances of pipeline components, they have inputs, outputs, and a container image. Step inputs can be set from the pipeline's inputs, or they can depend on the output of other steps within the pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
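To make that structure concrete, the following minimal sketch uses the Kubeflow Pipelines (KFP) SDK to define a two-step pipeline. The component names and logic are illustrative placeholders, not the Wide & Deep components; the point is only that passing one step's output as another step's input creates an edge in the directed acyclic graph.

```python
from kfp import dsl


@dsl.component
def split_data(source_table: str) -> str:
    # Placeholder: a real component would write train/eval/test splits
    # to Cloud Storage and return their location.
    return f"{source_table}_train_split"


@dsl.component
def train_model(train_split: str) -> str:
    # Placeholder: a real component would launch a training job.
    return f"model trained on {train_split}"


@dsl.pipeline(name="illustrative-tabular-pipeline")
def illustrative_pipeline(source_table: str):
    split_step = split_data(source_table=source_table)
    # The trainer consumes the splitter's output, which defines a
    # dependency (an edge) in the pipeline's directed acyclic graph.
    train_model(train_split=split_step.output)
```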
Two versions of the Tabular Workflow for Wide & Deep are available:
- HyperparameterTuningJob searches for the best set of hyperparameter values to use for model training.
- CustomJob lets you specify the hyperparameter values to use for model training. If you know exactly which hyperparameter values you need, you can specify them instead of searching for them and save on training resources.
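The practical difference is what you configure: fixed hyperparameter values for CustomJob versus a search space for HyperparameterTuningJob. The sketch below is illustrative only; the hyperparameter names and values are assumptions, not the pipelines' required parameter schema.

```python
# Illustrative only: hyperparameter names and values are placeholders.

# CustomJob: you pass the exact hyperparameter values to train with.
custom_job_hyperparameters = {
    "learning_rate": 0.001,
    "batch_size": 1024,
}

# HyperparameterTuningJob: you describe ranges or candidate values,
# and the service searches for the best combination across trials.
hyperparameter_search_space = {
    "learning_rate": {"min": 1e-4, "max": 1e-2, "scale": "log"},
    "batch_size": {"candidates": [256, 512, 1024]},
}
```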
Overview of Wide & Deep CustomJob pipeline and components
The Wide & Deep CustomJob pipeline can be illustrated by the following diagram:
The pipeline components are:
- feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
- split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

  Input:
  - Materialized data (`materialized_data`).

  Output:
  - Materialized training split (`materialized_train_split`).
  - Materialized evaluation split (`materialized_eval_split`).
  - Materialized test set (`materialized_test_split`).
- wide-and-deep-trainer: Perform model training.

  Input:
  - Instance baseline (`instance_baseline`).
  - Training schema (`training_schema`).
  - Transform output (`transform_output`).
  - Materialized train split (`materialized_train_split`).
  - Materialized evaluation split (`materialized_eval_split`).
  - Materialized test set (`materialized_test_split`).

  Output:
  - Final model.
- automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
- model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
- condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when `run_evaluation` is set to `true`.
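One common way to launch this pipeline is as a Vertex AI pipeline job with the google-cloud-aiplatform SDK. The sketch below is a minimal, assumption-heavy example: the project, region, bucket, template path, and all parameter names except `run_evaluation` are placeholders, so consult the pipeline template's parameter schema before adapting it.

```python
from google.cloud import aiplatform

# Placeholder project, region, and Cloud Storage paths.
aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="wide-and-deep-custom-job",
    # Placeholder path to a compiled Wide & Deep CustomJob pipeline template.
    template_path="gs://my-bucket/wide_and_deep_custom_job_pipeline.json",
    pipeline_root="gs://my-bucket/pipeline_root",
    parameter_values={
        # Illustrative parameter names; check the template's schema.
        "target_column": "label",
        "run_evaluation": True,  # enables the condition-run-evaluation-2 step
    },
)
job.run()
```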
Overview of Wide & Deep HyperparameterTuningJob pipeline and components
The Wide & Deep HyperparameterTuningJob pipeline can be illustrated by the following diagram:
The pipeline components are:
- feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
- split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

  Input:
  - Materialized data (`materialized_data`).

  Output:
  - Materialized training split (`materialized_train_split`).
  - Materialized evaluation split (`materialized_eval_split`).
  - Materialized test set (`materialized_test_split`).
- get-wide-and-deep-study-spec-parameters: Generate the study spec based on the configuration of the training pipeline. If the user provides values for `study_spec_parameters_override`, use those values to override the study spec values (see the sketch after this list).

  Input:
  - Optional override of study spec parameters (`study_spec_parameters_override`).

  Output:
  - Final list of hyperparameters and their ranges for the hyperparameter tuning job.
- wide-and-deep-hyperparameter-tuning-job: Perform one or more trials of hyperparameter tuning.

  Input:
  - Instance baseline (`instance_baseline`).
  - Training schema (`training_schema`).
  - Transform output (`transform_output`).
  - Materialized train split (`materialized_train_split`).
  - Materialized evaluation split (`materialized_eval_split`).
  - Materialized test set (`materialized_test_split`).
  - List of hyperparameters and their ranges for the hyperparameter tuning job.
- get-best-hyperparameter-tuning-job-trial: Select the model from the best hyperparameter tuning job trial of the previous step.

  Output:
  - Final model.
- automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
- model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
- condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when `run_evaluation` is set to `true`.
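As referenced above, `study_spec_parameters_override` lets you narrow or replace the default hyperparameter search space. A minimal sketch follows; the parameter IDs, ranges, and scale types are illustrative assumptions and should be aligned with the study spec parameters the pipeline actually exposes.

```python
# Illustrative override; parameter IDs, ranges, and scale types are
# assumptions, not the pipeline's documented defaults.
study_spec_parameters_override = [
    {
        "parameter_id": "learning_rate",
        "double_value_spec": {"min_value": 1e-4, "max_value": 1e-2},
        "scale_type": "UNIT_LOG_SCALE",
    },
    {
        "parameter_id": "batch_size",
        "discrete_value_spec": {"values": [256, 512, 1024]},
    },
]

# This list would be passed as a pipeline parameter, for example:
# parameter_values={"study_spec_parameters_override": study_spec_parameters_override, ...}
```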