Tabular Workflow for TabNet

This document provides an overview of the Tabular Workflow for TabNet pipelines and components. To learn how to train a model with TabNet, see Train a model with TabNet.

TabNet uses sequential attention to choose which features to reason from at each decision step. This promotes interpretability and more efficient learning because the learning capacity is used for the most salient features.

Benefits

  • Automatically selects the appropriate hyperparameter search space based on the dataset size, prediction type, and training budget.
  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch predictions or deploy the model for online predictions right away.
  • Provides inherent model interpretability. You can get insight into which features TabNet used to make its decision.
  • Supports GPU training.

TabNet on Vertex AI Pipelines

Tabular Workflow for TabNet is a managed instance of Vertex AI Pipelines.

Vertex AI Pipelines is a serverless service that runs Kubeflow pipelines. You can use pipelines to automate and monitor your machine learning and data preparation tasks. Each step in a pipeline performs part of the pipeline's workflow. For example, a pipeline can include steps to split data, transform data types, and train a model. Since steps are instances of pipeline components, steps have inputs, outputs, and a container image. Step inputs can be set from the pipeline's inputs or they can depend on the output of other steps within this pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
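
As a minimal sketch of how step dependencies form a directed acyclic graph, the following Python snippet uses the standard library's graphlib module to compute a valid run order. The step names are hypothetical and only mirror the example steps mentioned above. This is not how Vertex AI Pipelines schedules steps internally.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each key maps to the set of steps whose
# outputs it consumes, so every entry describes incoming graph edges.
DEPENDENCIES = {
    "split_data": set(),
    "transform_data_types": {"split_data"},
    "train_model": {"transform_data_types"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

Because the graph must be acyclic, a step can never depend, directly or indirectly, on its own output.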

Two versions of the Tabular Workflow for TabNet are available:

  • HyperparameterTuningJob searches for the best set of hyperparameter values to use for model training.
  • CustomJob lets you specify the hyperparameter values to use for model training. If you know exactly which hyperparameter values you need, you can specify them instead of searching for them and save on training resources.

Overview of TabNet CustomJob pipeline and components

The TabNet CustomJob pipeline can be illustrated by the following diagram:

Figure: Pipeline for TabNet CustomJob

The pipeline components are:

  1. feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
  2. split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

    Input:

    • Materialized data materialized_data.

    Output:

    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
  3. tabnet-trainer: Perform model training.

    Input:

    • Instance baseline instance_baseline.
    • Training schema training_schema.
    • Transform output transform_output.
    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.

    Output:

    • Final model
  4. automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
  5. model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
  6. condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when run_evaluation is set to true.
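
The data flow between the first three components can be sketched by matching each input artifact to the step that produces it. The artifact names for split-materialized-data and tabnet-trainer come from the list above. Two things are assumptions made only for illustration: that feature-transform-engine produces instance_baseline, training_schema, and transform_output (the list does not say which step produces them), and the artifact name model for the final model. The helper names derive_dependencies and plan are also hypothetical.

```python
from graphlib import TopologicalSorter

STEPS = {
    "feature-transform-engine": {
        "inputs": [],
        # Assumption: the producer of these artifacts is not stated above.
        "outputs": ["materialized_data", "instance_baseline",
                    "training_schema", "transform_output"],
    },
    "split-materialized-data": {
        "inputs": ["materialized_data"],
        "outputs": ["materialized_train_split", "materialized_eval_split",
                    "materialized_test_split"],
    },
    "tabnet-trainer": {
        "inputs": ["instance_baseline", "training_schema", "transform_output",
                   "materialized_train_split", "materialized_eval_split",
                   "materialized_test_split"],
        "outputs": ["model"],  # hypothetical name for the final model
    },
}


def derive_dependencies(steps):
    """Map each step to the set of steps that produce its inputs."""
    producer = {artifact: name
                for name, spec in steps.items()
                for artifact in spec["outputs"]}
    return {name: {producer[artifact] for artifact in spec["inputs"]}
            for name, spec in steps.items()}


def plan(run_evaluation):
    """List the steps in run order; evaluation is included only on request."""
    order = list(TopologicalSorter(derive_dependencies(STEPS)).static_order())
    order += ["automl-tabular-infra-validator", "model-upload"]
    if run_evaluation:
        order.append("condition-run-evaluation-2")
    return order


print(plan(run_evaluation=False))
print(plan(run_evaluation=True))
```

The last three steps are appended in the order listed above rather than derived from artifacts, because the list does not name their inputs and outputs.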

Overview of TabNet HyperparameterTuningJob pipeline and components

The TabNet HyperparameterTuningJob pipeline can be illustrated by the following diagram:

Figure: Pipeline for TabNet HyperparameterTuningJob

The pipeline components are:

  1. feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
  2. split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

    Input:

    • Materialized data materialized_data.

    Output:

    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
  3. get-tabnet-study-spec-parameters: Generate the study spec based on a configuration of the training pipeline. If the user provides values for study_spec_parameters_override, use those values to override the study spec values.

    Input:

    • Training pipeline configuration (max_trial_count, prediction_type).
    • Dataset statistics dataset_stats.
    • Optional override of study spec parameters study_spec_parameters_override.

    Output:

    • Final list of hyperparameters and their ranges for the hyperparameter tuning job.
  4. tabnet-hyperparameter-tuning-job: Perform one or more trials of hyperparameter tuning.

    Input:

    • Instance baseline instance_baseline.
    • Training schema training_schema.
    • Transform output transform_output.
    • Materialized training split materialized_train_split.
    • Materialized evaluation split materialized_eval_split.
    • Materialized test split materialized_test_split.
    • List of hyperparameters and their ranges for the hyperparameter tuning job.
  5. get-best-hyperparameter-tuning-job-trial: Select the model from the best hyperparameter tuning job trial of the previous step.

    Output:

    • Final model
  6. automl-tabular-infra-validator: Validate the trained model by sending a prediction request and checking whether it completes successfully.
  7. model-upload: Upload the model from the user's Cloud Storage bucket to Vertex AI as a Vertex AI model.
  8. condition-run-evaluation-2: Optional. Use the test set to calculate evaluation metrics. Runs only when run_evaluation is set to true.
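
The override and best-trial steps above can be pictured with a short sketch. It assumes that an entry in study_spec_parameters_override replaces the generated entry with the same parameter_id, and that the best trial is the one with the highest (or lowest) final metric value. Both are illustrative assumptions, not the components' actual implementation. The function names, the default ranges, and the trial records are hypothetical; the parameter entries follow the general shape of a Vertex AI study spec parameter.

```python
def apply_overrides(default_params, overrides=None):
    """Replace default entries with overrides that share a parameter_id."""
    merged = {param["parameter_id"]: param for param in default_params}
    for param in overrides or []:
        merged[param["parameter_id"]] = param
    return list(merged.values())


def best_trial(trials, goal="MAXIMIZE"):
    """Pick the trial whose final metric is best for the given goal."""
    pick = max if goal == "MAXIMIZE" else min
    return pick(trials, key=lambda trial: trial["metric"])


# Illustrative defaults, not the search space the component generates.
defaults = [
    {
        "parameter_id": "learning_rate",
        "double_value_spec": {"min_value": 1e-4, "max_value": 1e-1},
        "scale_type": "UNIT_LOG_SCALE",
    },
    {
        "parameter_id": "max_steps",
        "discrete_value_spec": {"values": [1000, 5000]},
    },
]
override = [
    {"parameter_id": "max_steps", "discrete_value_spec": {"values": [2000]}},
]

study_spec = apply_overrides(defaults, override)

# Hypothetical trial records with a single final metric each.
trials = [{"id": "1", "metric": 0.81}, {"id": "2", "metric": 0.87}]
winner = best_trial(trials)
print(study_spec)
print(winner)
```

In this sketch, parameters without an override keep their generated ranges, so a user can pin a single hyperparameter and still let the rest be searched.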

What's next