Tabular Workflows on Vertex AI


Tabular Workflows is a set of integrated, fully managed, and scalable pipelines for end-to-end ML with tabular data. It leverages Google's technology for model development and provides you with customization options to fit your needs.

Benefits

  • Fully managed: you don't need to worry about updates, dependencies and conflicts.
  • Easy to scale: you don't need to re-engineer infrastructure as workloads or datasets grow.
  • Optimized for performance: the right hardware is automatically configured for the workflow's requirements.
  • Deeply integrated: compatibility with products in the Vertex AI MLOps suite, like Vertex AI Pipelines and Vertex AI Experiments, allows you to run many experiments in a short amount of time.

Technical Overview

Each workflow is a managed instance of Vertex AI Pipelines.

Vertex AI Pipelines is a serverless service that runs Kubeflow pipelines. You can use pipelines to automate and monitor your machine learning and data preparation tasks. To learn more about Vertex AI Pipelines, see Introduction to Vertex AI Pipelines.

Each step in a pipeline performs part of the pipeline's workflow. For example, a pipeline can include steps to split data, transform data types, and train a model. Since steps are instances of pipeline components, steps have inputs, outputs, and a container image. Step inputs can be set from the pipeline's inputs or they can depend on the output of other steps within this pipeline. These dependencies define the pipeline's workflow as a directed acyclic graph.
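The dependency structure described above can be sketched with a small topological sort. This is a framework-free illustration of how step dependencies imply an execution order; the step names are hypothetical, not actual pipeline components:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps: each step lists the steps whose
# outputs it consumes, forming a directed acyclic graph.
steps = {
    "split_data": set(),                              # reads the pipeline's input dataset
    "transform_types": {"split_data"},                # consumes the training split
    "train_model": {"transform_types"},               # consumes the transformed data
    "evaluate_model": {"train_model", "split_data"},  # needs the model and the test split
}

# A valid execution order respects every dependency edge.
order = list(TopologicalSorter(steps).static_order())
print(order)
```

Vertex AI Pipelines resolves this ordering for you at run time; the sketch only shows why the inputs and outputs of steps define a directed acyclic graph.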

Figure: Tabular workflows as a directed acyclic graph.

How to get started

In most cases, you define and run the pipeline using the Google Cloud Pipeline Components SDK. The following sample code is illustrative; your actual implementation may differ.

  # Define the pipeline and the parameters
  template_path, parameter_values = tabular_utils.get_default_pipeline_and_parameters(
     …
      optimization_objective=optimization_objective,
      data_source=data_source,
      target_column_name=target_column_name
     …)
  # Run the pipeline
  job = pipeline_jobs.PipelineJob(..., template_path=template_path, parameter_values=parameter_values)
  job.run(...)

For sample colabs and notebooks, contact your sales representative or fill out a request form.

Versioning and maintenance

Tabular Workflows are versioned, which allows updates and improvements to be released continuously without breaking changes to your applications.

Each workflow is released and updated as part of the Google Cloud Pipeline Components SDK. Updates and modifications to any workflow are released as new versions of that workflow. Previous versions of every workflow are always available through the older versions of the SDK. If the SDK version is pinned, the workflow version is also pinned.
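In practice, pinning means fixing the SDK version in your dependency file. For example, in a requirements.txt entry (the version number below is illustrative):

```
google-cloud-pipeline-components==1.0.0
```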

Available workflows

Name                          Availability
Classification and Regression
  End-to-End AutoML           Public Preview
  TabNet Training             Private Preview
  Wide & Deep Training        Private Preview
Feature Engineering
  Feature Selection           Private Preview
  Feature Transformations     Private Preview


Classification and regression workflows

End-to-End AutoML

End-to-End AutoML is the complete AutoML pipeline for classification and regression tasks. It is similar to the AutoML API, but allows you to choose what to control and what to automate. Instead of having controls for the whole pipeline, you have controls for every step in the pipeline. These pipeline controls include:

  • Data splitting
  • Feature engineering
  • Architecture search
  • Model training
  • Model ensembling
  • Model distillation
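These step-level controls are exposed as pipeline parameters. The sketch below shows the general shape of such overrides; the parameter names are illustrative and must be checked against the Google Cloud Pipeline Components SDK before use:

```python
# Hypothetical per-step overrides for an End-to-End AutoML pipeline.
# The actual keys come from the Google Cloud Pipeline Components SDK.
parameter_values = {
    "optimization_objective": "minimize-rmse",
    # Limit the architecture search space to improve stability and speed.
    "study_spec_parameters_override": [],
    # Shrink the final ensemble to reduce model size and latency.
    "stage_2_num_selected_trials": 3,
    # Optionally distill the ensemble into a single smaller model.
    "run_distillation": False,
}

print(sorted(parameter_values))
```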

Benefits

  • Supports large datasets, up to several terabytes in size and with up to 1000 columns.
  • Allows you to improve stability and lower training time by limiting the search space of architecture types or skipping architecture search.
  • Allows you to improve training speed by manually selecting the hardware used for training and architecture search.
  • Allows you to reduce model size and improve latency with distillation or by changing the ensemble size.
  • Each AutoML component can be inspected in a powerful pipelines graph interface that lets you see the transformed data tables, the evaluated model architectures, and many other details.
  • Each AutoML component offers extended flexibility and transparency: you can customize parameters and hardware, view process status and logs, and more.

Input-Output

  • Takes a BigQuery table or a CSV file from Cloud Storage as input.
  • Produces a Vertex AI model as output.
  • Intermediate outputs include dataset statistics and dataset splits.

For further information, see: End-to-End AutoML.

TabNet model training

TabNet model training is the training pipeline for TabNet model architecture. It supports classification and regression. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning because the learning capacity is used for the most salient features.

Benefits

  • Automatically selects the appropriate hyperparameter search space based on dataset size, prediction type, and training budget.
  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch prediction and online prediction right away.

Input-Output

Takes a BigQuery table or a CSV file from Cloud Storage as input and provides a Vertex AI Model as output.

Wide & Deep model training

Wide & Deep model training is the training pipeline for the Wide & Deep model architecture. It supports classification and regression. Wide & Deep jointly trains wide linear models and deep neural networks, combining the benefits of memorization and generalization. In online experiments, Wide & Deep significantly increased app acquisitions in the Google Play store compared with wide-only and deep-only models.

Benefits

  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch prediction and online prediction right away.

Input-Output

Takes a BigQuery table or a CSV file from Cloud Storage as input and provides a Vertex AI Model as output.

Feature Engineering

Feature Selection

Feature Selection is a pipeline that creates a ranked set of important features for datasets with up to 10,000 columns. It can be used together with any of the training workflows.

Input-Output

Takes a BigQuery table or a CSV file from Cloud Storage as input and produces a JSON file that contains feature rankings.
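The ranking output can then feed a training workflow's feature list. A minimal sketch, assuming a JSON layout of (feature, score) pairs; the pipeline's actual schema may differ:

```python
import json

# Hypothetical feature-ranking JSON as the pipeline might produce it;
# the real output schema may differ.
ranking_json = json.dumps([
    {"feature": "age", "score": 0.91},
    {"feature": "income", "score": 0.74},
    {"feature": "zip_code", "score": 0.12},
])

# Keep the top-k features for a downstream training workflow.
ranked = json.loads(ranking_json)
top_features = [
    r["feature"]
    for r in sorted(ranked, key=lambda r: r["score"], reverse=True)[:2]
]
print(top_features)  # the two highest-scoring features
```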

Feature Transformations

Feature Transformations workflows are used to apply feature engineering consistently during training and serving. They support both TensorFlow and non-TensorFlow frameworks.

Input-Output

Takes dataset splits (train / evaluation / test) and produces the following:

  • Transformed dataset splits
  • Artifact used to reapply the transformations during serving
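The core idea — fit transformation statistics once on the training split, save them as an artifact, and reapply the identical transformation at serving time — can be sketched in plain Python (no actual workflow APIs; the column name and statistics are illustrative):

```python
import json

def fit_transform_artifact(train_rows):
    """Compute normalization statistics from the training split only."""
    values = [row["amount"] for row in train_rows]
    return {"amount_mean": sum(values) / len(values)}

def apply_transform(row, artifact):
    """Reapply the saved transformation identically at train and serve time."""
    return {"amount_centered": row["amount"] - artifact["amount_mean"]}

train = [{"amount": 10.0}, {"amount": 30.0}]
artifact = fit_transform_artifact(train)

# The artifact is serialized alongside the model and loaded at serving time,
# guaranteeing train/serve consistency.
serving_artifact = json.loads(json.dumps(artifact))

print(apply_transform({"amount": 25.0}, serving_artifact))  # {'amount_centered': 5.0}
```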

What's next