Tabular Workflow for Forecasting

This document provides an overview of the Tabular Workflow for Forecasting pipeline and components. To learn how to train a model, see Train a model with Tabular Workflow for Forecasting.

Tabular Workflow for Forecasting is the complete pipeline for forecasting tasks. It is similar to the AutoML API, but it lets you choose what to control and what to automate. Instead of having controls for the whole pipeline, you have controls for every step in the pipeline. A sketch of setting these controls as pipeline parameters follows the list below. These pipeline controls include:

  • Data splitting
  • Feature engineering
  • Architecture search
  • Model training
  • Model ensembling
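
The following is a minimal sketch, assuming the google-cloud-aiplatform SDK, of passing per-step controls as pipeline parameters when you run the workflow. The template path and parameter names are illustrative placeholders, not the workflow's documented parameter list.

```python
# A minimal sketch, assuming the google-cloud-aiplatform SDK. The template path
# and parameter names below are illustrative placeholders only.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="tabular-forecasting-workflow",
    template_path="gs://my-bucket/forecasting_pipeline.yaml",  # compiled pipeline definition
    parameter_values={
        # Hypothetical per-step controls: data splitting, architecture search,
        # training hardware, and ensemble size.
        "train_fraction": 0.8,
        "skip_architecture_search": False,
        "training_machine_type": "n1-standard-16",
        "ensemble_size": 5,
    },
)
job.run()
```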

Benefits

The following are some of the benefits of Tabular Workflow for Forecasting:

  • Supports large datasets that are up to 1TB in size and have up to 200 columns.
  • Allows you to improve stability and lower training time by limiting the search space of architecture types or skipping architecture search.
  • Allows you to improve training speed by manually selecting the hardware used for training and architecture search.
  • For some model training methods, allows you to reduce model size and improve latency by changing the ensemble size.
  • Lets you inspect each component in the pipelines graph interface, including artifacts such as the transformed data tables and the evaluated model architectures.
  • Provides extended flexibility and transparency for each component, such as the ability to customize parameters and hardware and to view process status and logs (see the sketch after this list).
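
As one example of this transparency, the following sketch, which assumes the google-cloud-aiplatform SDK, lists the steps of a finished run and their states. The pipeline job resource name is a placeholder.

```python
# A minimal sketch, assuming the google-cloud-aiplatform SDK, of inspecting a
# finished pipeline run; the resource name is a placeholder.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob.get(
    "projects/my-project/locations/us-central1/pipelineJobs/my-run-id"
)

# Each task corresponds to one pipeline component; print its name and state.
for task in job.task_details:
    print(task.task_name, task.state)
```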

Forecasting on Vertex AI Pipelines

Tabular Workflow for Forecasting is a managed instance of Vertex AI Pipelines.

Vertex AI Pipelines is a serverless service that runs Kubeflow pipelines. You can use pipelines to automate and monitor your machine learning and data preparation tasks. Each step in a pipeline performs part of the pipeline's workflow. For example, a pipeline can include steps to split data, transform data types, and train a model. Because steps are instances of pipeline components, they have inputs, outputs, and a container image. Step inputs can be set from the pipeline's inputs, or they can depend on the output of other steps within the pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
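
To make the directed acyclic graph idea concrete, here is a minimal, generic sketch using the Kubeflow Pipelines SDK (kfp v2), not the forecasting workflow itself. Passing one step's output to another step's input is what creates the dependency edges; the component bodies are placeholders.

```python
# Generic kfp v2 sketch: two components wired into a two-step DAG.
from kfp import compiler, dsl


@dsl.component
def split_data(source: str) -> str:
    # Placeholder logic; a real component would write materialized splits.
    return f"{source}/train"


@dsl.component
def train_model(train_split: str) -> str:
    # Placeholder logic; a real component would train and export a model.
    return f"model trained on {train_split}"


@dsl.pipeline(name="toy-forecasting-pipeline")
def pipeline(source: str):
    split_task = split_data(source=source)
    # train_model depends on split_data because it consumes its output.
    train_model(train_split=split_task.output)


# Compile to a pipeline spec that Vertex AI Pipelines can run.
compiler.Compiler().compile(pipeline, package_path="toy_forecasting_pipeline.yaml")
```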

Overview of pipeline and components

The following diagram shows the modeling pipeline for Tabular Workflow for Forecasting:

Pipeline for Forecasting 

The pipeline components are:

  1. feature-transform-engine: Perform feature engineering. See Feature Transform Engine for details.
  2. training-configurator-and-validator: Validate the training configuration and generate the training metadata.

    Input:

    • instance_schema: Instance schema in OpenAPI specification, which describes the data types of the prediction data.
    • dataset_stats: Statistics that describe the raw dataset. For example, dataset_stats gives the number of rows in the dataset.
    • training_schema: Training data schema in OpenAPI specification, which describes the data types of the training data.
  3. split-materialized-data: Split the materialized data into a training set, an evaluation set, and a test set.

    Input:

    • materialized_data: Materialized data.

    Output:

    • materialized_train_split: Materialized training split.
    • materialized_eval_split: Materialized evaluation split.
    • materialized_test_split: Materialized test split.
  4. calculate-training-parameters-2: Calculate the expected runtime duration for automl-forecasting-stage-1-tuner.

  5. get-hyperparameter-tuning-results - Optional: If you configured the pipeline to skip the architecture search, load the hyperparameter tuning results from a previous pipeline run (a sketch of this reuse pattern follows this list).

  6. Perform model architecture search and tune hyperparameters (automl-forecasting-stage-1-tuner) or use the hyperparameter tuning results from a previous pipeline run (automl-forecasting-stage-2-tuner).

    • An architecture is defined by a set of hyperparameters.
    • Hyperparameters include the model type and the model parameters.
    • Model types considered are neural networks and boosted trees.
    • A model is trained for each architecture considered.

    Input:

    • materialized_train_split: Materialized training split.
    • materialized_eval_split: Materialized evaluation split.
    • artifact: Hyperparameter tuning results from a previous pipeline run. This artifact is an input only if you configured the pipeline to skip the architecture search.

    Output:

    • tuning_result_output: Tuning output.
  7. get-prediction-image-uri-2: Produce the correct prediction image URI based on the model type.

  8. automl-forecasting-ensemble-2: Ensemble the best architectures to produce a final model (an illustrative search-and-ensemble sketch follows this list).

    Input:

    • tuning_result_output: Tuning output.

    Output:

    • unmanaged_container_model: Output model.
  9. model-upload-2: Upload the model.

    Input:

    • unmanaged_container_model: Output model.

    Output:

    • model: Vertex AI model.
  10. should_run_model_evaluation - Optional: Use the test set to calculate evaluation metrics.
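
The following sketch, which assumes the google-cloud-aiplatform SDK, shows one way to reuse a previous run's hyperparameter tuning results (steps 5 and 6). The template path and the parameter name that passes the artifact URI are illustrative placeholders; check the workflow's parameter reference for the exact names.

```python
# A sketch, assuming the google-cloud-aiplatform SDK, of skipping architecture
# search by pointing a new run at a previous run's tuning result artifact.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

previous_run = aiplatform.PipelineJob.get(
    "projects/my-project/locations/us-central1/pipelineJobs/previous-run-id"
)

# Locate the tuning output artifact produced by the stage 1 tuner step.
tuning_artifact_uri = None
for task in previous_run.task_details:
    if task.task_name == "automl-forecasting-stage-1-tuner":
        tuning_artifact_uri = task.outputs["tuning_result_output"].artifacts[0].uri

job = aiplatform.PipelineJob(
    display_name="forecasting-skip-search",
    template_path="gs://my-bucket/forecasting_pipeline.yaml",  # placeholder
    parameter_values={
        # Assumed parameter name; the workflow's actual parameter may differ.
        "stage_1_tuning_result_artifact_uri": tuning_artifact_uri,
    },
)
job.run()
```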
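
The next sketch illustrates only the idea behind steps 6 and 8: each architecture is a set of hyperparameters that includes the model type and model parameters, one model is trained per candidate, and the best candidates are ensembled by averaging their forecasts. The candidate list, scoring, and models are simplified placeholders, not the managed components' implementation.

```python
# Illustrative only: architecture search followed by ensembling by averaging.
from statistics import mean

# Candidate architectures: each is a set of hyperparameters, including the
# model type (neural network or boosted trees) and the model parameters.
candidates = [
    {"model_type": "neural_network", "hidden_units": 64, "learning_rate": 1e-3},
    {"model_type": "neural_network", "hidden_units": 128, "learning_rate": 1e-4},
    {"model_type": "boosted_trees", "num_trees": 200, "max_depth": 6},
]


def train_and_evaluate(architecture):
    # Placeholder: a real tuner trains a model and returns its validation error
    # together with the fitted model.
    error = (hash(str(sorted(architecture.items()))) % 100) / 100  # fake error

    def model(horizon):
        # Fake model: forecasts a constant value for every step of the horizon.
        return [float(architecture.get("hidden_units", 50))] * horizon

    return error, model


# Train one model per architecture, then keep the two with the lowest error.
results = sorted((train_and_evaluate(a) for a in candidates), key=lambda r: r[0])
best_models = [model for _, model in results[:2]]


def ensemble_forecast(horizon):
    # Average the member models' forecasts at each step of the horizon.
    member_forecasts = [m(horizon) for m in best_models]
    return [mean(step) for step in zip(*member_forecasts)]


print(ensemble_forecast(horizon=3))
```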

What's next