Tabular data overview

Vertex AI lets you perform machine learning with tabular data using simple processes and interfaces. You can create the following model types for your tabular data problems:

  • Binary classification models predict a binary outcome (one of two classes). Use this model type for yes or no questions. For example, you might want to build a binary classification model to predict whether a customer would buy a subscription. Generally, a binary classification problem requires less data than other model types.
  • Multi-class classification models predict one class from three or more discrete classes. Use this model type for categorization. For example, as a retailer, you might want to build a multi-class classification model to segment customers into different personas.
  • Regression models predict a continuous value. For example, as a retailer, you might want to build a regression model to predict how much a customer will spend next month.
  • Forecasting models predict a sequence of values. For example, as a retailer, you might want to forecast daily demand of your products for the next 3 months so that you can appropriately stock product inventories in advance.
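
The distinctions above can be sketched as a small helper that suggests a model type from a sample of target column values. This is an illustrative heuristic only, not part of Vertex AI: the function name, the `is_time_series` flag, and the `max_classes` threshold are all assumptions made for the example.

```python
def suggest_model_type(target_values, is_time_series=False, max_classes=20):
    """Suggest a tabular model type from a sample of target values.

    Hypothetical helper: the name, flag, and threshold are illustrative
    assumptions, not Vertex AI behavior.
    """
    if is_time_series:
        # A sequence of future values is a forecasting problem.
        return "forecasting"

    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("the target column needs at least two distinct values")
    if len(distinct) == 2:
        # Exactly two outcomes: a yes or no question.
        return "binary classification"

    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    has_fraction = any(isinstance(v, float) and not v.is_integer() for v in distinct)
    if numeric and (has_fraction or len(distinct) > max_classes):
        # Fractional or high-cardinality numbers are treated as continuous.
        return "regression"

    # Three or more discrete labels.
    return "multi-class classification"
```

For example, `suggest_model_type(["yes", "no", "yes"])` suggests binary classification, while a target column of spending amounts such as `[12.5, 30.0, 7.25]` suggests regression.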

For an introduction to machine learning with tabular data, see Introduction to Tabular Data. For further information about Vertex AI solutions, see Vertex AI solutions for classification and regression and Vertex AI solutions for forecasting.

A note about fairness

Google is committed to making progress in following responsible AI practices. To this end, our ML products, including AutoML, are designed around core principles such as fairness and human-centered machine learning. For more information about best practices for mitigating bias when building your own ML system, see Inclusive ML guide - AutoML.

Vertex AI solutions for classification and regression

Vertex AI offers the following solutions for classification and regression:

Tabular Workflow for End-to-End AutoML

Tabular Workflow for End-to-End AutoML is a complete AutoML pipeline for classification and regression tasks. It is similar to the AutoML API, but allows you to choose what to control and what to automate. Instead of having controls for the whole pipeline, you have controls for every step in the pipeline. These pipeline controls include:

  • Data splitting
  • Feature engineering
  • Architecture search
  • Model training
  • Model ensembling
  • Model distillation

Benefits

  • Supports large datasets that are multiple TB in size and have up to 1000 columns.
  • Allows you to improve stability and lower training time by limiting the search space of architecture types or skipping architecture search.
  • Allows you to improve training speed by manually selecting the hardware used for training and architecture search.
  • Allows you to reduce model size and improve latency with distillation or by changing the ensemble size.
  • Lets you inspect each AutoML component in a pipelines graph interface that shows the transformed data tables, evaluated model architectures, and many more details.
  • Gives each AutoML component extended flexibility and transparency: you can customize parameters and hardware, and view process status, logs, and more.

To learn more about Tabular Workflows, see Tabular Workflows on Vertex AI. To learn more about Tabular Workflow for End-to-End AutoML, see Tabular Workflow for End-to-End AutoML.

Tabular Workflow for TabNet

Tabular Workflow for TabNet is a pipeline that you can use to train classification or regression models. TabNet uses sequential attention to choose which features to reason from at each decision step. This promotes interpretability and more efficient learning because the learning capacity is used for the most salient features.
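
To make sequential attention more concrete, the following is a minimal sketch of how a feature mask could be computed at each decision step, following the formulation in the TabNet paper (a sparsemax over prior-scaled scores, with a relaxation factor `gamma`). The per-step scores are hard-coded here as an assumption. In a trained TabNet model they come from learned layers, and this sketch does not reflect how the Vertex AI pipeline is implemented.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex; many entries become exactly 0."""
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(ordered, start=1):
        cumulative += z
        if 1 + k * z > cumulative:
            # Keep the threshold from the largest k that satisfies the condition.
            tau = (cumulative - 1) / k
    return [max(z - tau, 0.0) for z in scores]


def attentive_masks(step_scores, gamma=1.5):
    """Return one feature mask per decision step.

    step_scores: per-step feature scores (hypothetical stand-ins for the
    outputs of learned layers). gamma: relaxation factor; values near 1
    discourage reusing a feature in later steps.
    """
    prior = [1.0] * len(step_scores[0])
    masks = []
    for scores in step_scores:
        mask = sparsemax([p * s for p, s in zip(prior, scores)])
        masks.append(mask)
        # Features that were already used get a smaller prior in later steps.
        prior = [p * (gamma - m) for p, m in zip(prior, mask)]
    return masks
```

Each mask sums to 1, and features with a zero weight are ignored at that step. Inspecting the masks is what gives insight into which features were used.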

Benefits

  • Automatically selects the appropriate hyperparameter search space based on the dataset size, prediction type, and training budget.
  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch predictions or deploy the model for online predictions right away.
  • Provides inherent model interpretability. You can get insight into which features TabNet used to make its decision.
  • Supports GPU training.

To learn more about Tabular Workflows, see Tabular Workflows on Vertex AI. To learn more about Tabular Workflow for TabNet, see Tabular Workflow for TabNet.

Tabular Workflow for Wide & Deep

Tabular Workflow for Wide & Deep is a pipeline that you can use to train classification or regression models. Wide & Deep jointly trains wide linear models and deep neural networks, combining the benefits of memorization and generalization. In online experiments, Wide & Deep significantly increased Google Play store application acquisitions compared with wide-only and deep-only models.
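
The way the two parts combine can be sketched as a single forward pass for a binary classifier: the wide (linear) logit and the deep (neural network) logit are summed before the sigmoid, so one loss can train both parts together. This is a minimal illustration with one hidden layer. The parameter names and the `params` dictionary layout are assumptions for the example, not the pipeline's implementation.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def wide_and_deep_predict(wide_x, deep_x, params):
    """Return the predicted probability for one example.

    params is a hypothetical dict with keys wide_w, hidden_w, hidden_b,
    deep_out_w, and bias.
    """
    # Wide part: a linear model over (typically crossed or sparse) features.
    wide_logit = sum(w * x for w, x in zip(params["wide_w"], wide_x))

    # Deep part: one ReLU hidden layer over dense features.
    hidden = [max(0.0, h) for h in dense(deep_x, params["hidden_w"], params["hidden_b"])]
    deep_logit = sum(w * h for w, h in zip(params["deep_out_w"], hidden))

    # Both logits feed a single sigmoid.
    logit = wide_logit + deep_logit + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))
```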

Benefits

  • Integrated with Vertex AI. The trained model is a Vertex AI model. You can run batch predictions or deploy the model for online predictions right away.

To learn more about Tabular Workflows, see Tabular Workflows on Vertex AI. To learn more about Tabular Workflow for Wide & Deep, see Tabular Workflow for Wide & Deep.

Classification and regression with AutoML

Vertex AI offers integrated, fully managed pipelines for end-to-end classification or regression tasks. Vertex AI searches for the optimal set of hyperparameters, trains multiple models with multiple sets of hyperparameters, and then creates a single, final model from an ensemble of the top models. Vertex AI considers neural networks and boosted trees for the model types.
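
The idea of building one final model from the top candidates can be sketched as follows. This is a simplified illustration that averages the predictions of the best-scoring models. The function name, the `(score, predict_fn)` tuple layout, and the choice of plain averaging are assumptions, because this page does not describe the ensembling method that Vertex AI actually uses.

```python
def build_ensemble(candidates, top_k=3):
    """Combine the top_k candidates into one prediction function.

    candidates: list of (validation_score, predict_fn) pairs, where a
    higher score is better. Hypothetical layout for illustration.
    """
    if not candidates:
        raise ValueError("at least one trained candidate is required")

    # Keep only the best-scoring models.
    best = sorted(candidates, key=lambda c: c[0], reverse=True)[:top_k]
    members = [predict_fn for _, predict_fn in best]

    def ensemble_predict(row):
        # Plain average of the member predictions (an assumed strategy).
        return sum(predict(row) for predict in members) / len(members)

    return ensemble_predict
```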

Benefits

  • Easy to use: model type, model parameters, and hardware are chosen for you.

For further information, see Classification and Regression Overview.

Vertex AI solutions for forecasting

Vertex AI offers the following solutions for forecasting:

Tabular Workflow for Forecasting

Tabular Workflow for Forecasting is the complete pipeline for forecasting tasks. It is similar to the AutoML API, but allows you to choose what to control and what to automate. Instead of having controls for the whole pipeline, you have controls for every step in the pipeline. These pipeline controls include:

  • Data splitting
  • Feature engineering
  • Architecture search
  • Model training
  • Model ensembling

Benefits

  • Supports large datasets that are up to 1 TB in size and have up to 200 columns.
  • Allows you to improve stability and lower training time by limiting the search space of architecture types or skipping architecture search.
  • Allows you to improve training speed by manually selecting the hardware used for training and architecture search.
  • For some model training methods, allows you to reduce model size and improve latency by changing the ensemble size.
  • Lets you inspect each component in a pipelines graph interface that shows the transformed data tables, evaluated model architectures, and many more details.
  • Gives each component extended flexibility and transparency: you can customize parameters and hardware, and view process status, logs, and more.

To learn more about Tabular Workflows, see Tabular Workflows on Vertex AI. To learn more about Tabular Workflow for Forecasting, see Tabular Workflow for Forecasting.

Forecasting with AutoML

Vertex AI offers an integrated, fully managed pipeline for end-to-end forecasting tasks. Vertex AI searches for the optimal set of hyperparameters, trains multiple models with multiple sets of hyperparameters, and then creates a single, final model from an ensemble of the top models. You can choose between Time series Dense Encoder (TiDE), Temporal Fusion Transformer (TFT), AutoML (L2L), and Seq2Seq+ for your model training method. Vertex AI considers only neural networks for the model type.

Benefits

  • Easy to use: model parameters and hardware are chosen for you.

For further information, see Forecasting Overview.

Forecasting with BigQuery ML ARIMA_PLUS

BigQuery ML ARIMA_PLUS is a univariate forecasting model. As a statistical model, it is faster to train than a model based on neural networks. We recommend training a BigQuery ML ARIMA_PLUS model if you need to perform many quick iterations of model training or if you need an inexpensive baseline to measure other models against.
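
For orientation, the following sketch builds the kind of BigQuery ML `CREATE MODEL` statement that trains an ARIMA_PLUS model directly in BigQuery. It only assembles the SQL text. The helper name and the table, model, and column names in the usage note are hypothetical, and the Vertex AI training pipeline might configure the model differently.

```python
def arima_plus_create_model_sql(model_name, source_table, timestamp_col,
                                value_col, series_id_col=None):
    """Build a CREATE MODEL statement for a univariate ARIMA_PLUS model.

    Hypothetical helper for illustration; it returns SQL text and does
    not call BigQuery.
    """
    options = [
        "model_type = 'ARIMA_PLUS'",
        f"time_series_timestamp_col = '{timestamp_col}'",
        # Univariate: exactly one value column is forecast.
        f"time_series_data_col = '{value_col}'",
    ]
    columns = [timestamp_col, value_col]
    if series_id_col is not None:
        # Optional: fit one time series per ID value.
        options.append(f"time_series_id_col = '{series_id_col}'")
        columns.append(series_id_col)

    lines = [
        f"CREATE OR REPLACE MODEL `{model_name}`",
        "OPTIONS(",
        "  " + ",\n  ".join(options),
        ") AS",
        f"SELECT {', '.join(columns)}",
        f"FROM `{source_table}`",
    ]
    return "\n".join(lines)
```

For example, `arima_plus_create_model_sql("my_dataset.sales_arima", "my_dataset.daily_sales", "date", "units_sold", "store_id")` produces a statement that you could review before running it in BigQuery.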

Like Prophet, BigQuery ML ARIMA_PLUS attempts to decompose each time series into trends, seasons, and holidays, producing a forecast using the aggregation of these models' predictions. One of the many differences, however, is that BigQuery ML ARIMA_PLUS uses ARIMA to model the trend component, while Prophet attempts to fit a curve using a piecewise logistic or linear model.

Google Cloud offers a pipeline for training a BigQuery ML ARIMA_PLUS model and a pipeline for getting batch predictions from a BigQuery ML ARIMA_PLUS model. Both pipelines are instances of Vertex AI Pipelines from Google Cloud Pipeline Components (GCPC).

Benefits

  • Easy to use: model parameters and hardware are chosen for you.
  • Fast: model training gives a low-cost baseline to compare other models against.

For further information, see Forecasting with ARIMA+.

Forecasting with Prophet

Prophet is a forecasting model maintained by Meta. See the Prophet paper for algorithm details and the documentation for more information about the library.

Like BigQuery ML ARIMA_PLUS, Prophet attempts to decompose each time series into trends, seasons, and holidays, producing a forecast using the aggregation of these models' predictions. An important difference, however, is that BigQuery ML ARIMA_PLUS uses ARIMA to model the trend component, while Prophet attempts to fit a curve using a piecewise logistic or linear model.
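
The decomposition described above can be sketched in a few lines. The example assumes an additive combination of components and a piecewise linear trend whose slope changes at given changepoints. All function and parameter names are illustrative, the components are supplied by hand rather than fitted, and the Prophet library itself offers more options than this sketch covers.

```python
def piecewise_linear_trend(t, base_rate, offset, changepoints, rate_deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    changepoints: times at which the slope changes.
    rate_deltas: slope adjustment applied from each changepoint onward.
    """
    rate = base_rate
    intercept = offset
    for s, delta in zip(changepoints, rate_deltas):
        if t >= s:
            rate += delta
            # Shift the intercept so the line stays continuous at s.
            intercept -= s * delta
    return rate * t + intercept


def additive_forecast(t, trend, seasonality, holiday_effect):
    """Aggregate the component predictions (additive form assumed)."""
    return trend(t) + seasonality(t) + holiday_effect(t)
```

For example, with a base slope of 1 that drops by 0.5 at t = 10, the trend reaches 10 at the changepoint and grows half as fast afterward. Seasonal and holiday effects are then added on top of that trend.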

Google Cloud offers a pipeline for training a Prophet model and a pipeline for getting batch predictions from a Prophet model. Both pipelines are instances of Vertex AI Pipelines from Google Cloud Pipeline Components (GCPC).

Integration of Prophet with Vertex AI means that you can do the following:

Although Prophet is a multivariate model, Vertex AI supports only a univariate version of it.

Benefits

  • Flexible: you can improve training speed by selecting the hardware used for training.

For further information, see Forecasting with Prophet.
