Data Analytics

AutoML Tables is now generally available in BigQuery ML

June 29, 2021

https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_xUQd32s.max-2000x2000.jpg

Steve Walker

Customer Engineer, Machine Learning

Google’s cloud data warehouse, BigQuery, has enabled organizations around the world to accelerate their digital transformation and empower their data analysts to unlock actionable insights from their data. Using BigQuery ML, data analysts can create sophisticated machine learning models with just SQL and uncover predictive insights from their data much faster.

Today we are excited to announce the addition of the AutoML Tables model type to the list of supported ML models within BigQuery ML. The AutoML Tables model type, now generally available, integrates directly with our Vertex AI AutoML Tables offering, and enables teams to automatically build and deploy state-of-the-art machine learning models on structured data at greater speed and scale.

BigQuery ML can also improve AutoML models, because it transforms input variables into features for AutoML Tables by standardizing numeric columns, one-hot encoding non-numeric columns, extracting components from timestamps, and expanding array and struct columns. It also imputes missing values, with separate approaches for numerical, categorical, and timestamp columns.

How does AutoML Tables build powerful, sophisticated models? Behind the scenes, AutoML does quite a lot of machine learning magic:

data preprocessing
automatic feature engineering
model architecture search
model tuning
cross validation
automatic model selection and ensembling

https://storage.googleapis.com/gweb-cloudblog-publish/images/image5_lmPOzPu.max-1000x1000.png

Walking through an example:

Linear Regression

Using the new_york_taxi_trips.tlc_yellow_trips_2018 dataset that is part of BigQuery’s public datasets, you can try using AutoML Tables to predict the taxi ride tip amount. As a first iteration, since we are trying to predict a continuous dependent variable, we will build a linear regression model using just SQL:
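A minimal sketch of such a statement is shown here. The dataset name `mydataset`, the model name `taxi_tip_linreg`, the choice of feature columns, the filters, and the row limit are all assumptions made for illustration, not the exact query used to produce the metrics in this post.

```sql
-- Sketch only: `mydataset` and the model name are hypothetical placeholders,
-- and the feature columns are an illustrative subset of the public table.
CREATE OR REPLACE MODEL `mydataset.taxi_tip_linreg`
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['tip_amount']  -- the column we want to predict
) AS
SELECT
  passenger_count,
  trip_distance,
  fare_amount,
  tolls_amount,
  payment_type,
  tip_amount
FROM
  `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
WHERE
  tip_amount IS NOT NULL
  AND trip_distance > 0
LIMIT 1000000;  -- assumed cap to keep the sketch inexpensive to run
```

Because the label column is listed in `input_label_cols`, every other selected column is treated as an input feature.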

You can see the evaluation metrics for this linear regression model below. Note the low R2 value of .35.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Table_2.max-1000x1000.jpg
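Metrics like these can be retrieved with the ML.EVALUATE function. The query below is a sketch that assumes the linear regression model was saved under the hypothetical name `mydataset.taxi_tip_linreg`.

```sql
-- Sketch only: the model name is a hypothetical placeholder.
-- With no input table, ML.EVALUATE reports metrics on the data
-- that BigQuery ML held out for evaluation during training.
SELECT
  mean_absolute_error,
  mean_squared_error,
  r2_score
FROM
  ML.EVALUATE(MODEL `mydataset.taxi_tip_linreg`);
```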

You can then use the model to do predictions:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image7_cEPpwXC.max-1200x1200.png
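A prediction query might look like the following sketch. The model name and the selected columns are assumptions carried over for illustration; the input rows must supply the same feature columns the model was trained on.

```sql
-- Sketch only: model name and feature columns are illustrative assumptions.
-- ML.PREDICT returns the input columns plus predicted_<label_column>.
SELECT
  predicted_tip_amount,
  tip_amount,
  trip_distance,
  fare_amount
FROM
  ML.PREDICT(
    MODEL `mydataset.taxi_tip_linreg`,
    (
      SELECT
        passenger_count,
        trip_distance,
        fare_amount,
        tolls_amount,
        payment_type,
        tip_amount
      FROM
        `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
      WHERE
        tip_amount IS NOT NULL
        AND trip_distance > 0
      LIMIT 10
    )
  );
```

Keeping the actual `tip_amount` in the output makes it easy to compare each prediction with the observed value.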

AutoML Tables

To try to improve on the R2 metric, you can use the AutoML Tables model type:
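A minimal sketch of such a statement follows. As before, the dataset name, model name, feature columns, and row limit are hypothetical choices for illustration rather than the exact query behind the results in this post.

```sql
-- Sketch only: `mydataset` and the model name are hypothetical placeholders,
-- and the feature columns are an illustrative subset of the public table.
CREATE OR REPLACE MODEL `mydataset.taxi_tip_automl`
OPTIONS (
  model_type = 'AUTOML_REGRESSOR',   -- predict a continuous value
  input_label_cols = ['tip_amount'],
  budget_hours = 1.0                 -- upper bound on training time, in hours
) AS
SELECT
  passenger_count,
  trip_distance,
  fare_amount,
  tolls_amount,
  payment_type,
  tip_amount
FROM
  `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
WHERE
  tip_amount IS NOT NULL
  AND trip_distance > 0
LIMIT 1000000;  -- assumed cap to keep the sketch inexpensive to run
```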

There are a couple of things to note here:

1: The model_type is ‘AUTOML_REGRESSOR’, because the goal is to predict a number (continuous dependent variable). To predict a category or class, you can instead use the ‘AUTOML_CLASSIFIER’ model type.

2: The ‘budget_hours’ parameter tells AutoML Tables to train the model for a maximum of one hour, compress the model if necessary, and then stop.

Here are the evaluation metrics with AutoML Tables:

https://storage.googleapis.com/gweb-cloudblog-publish/images/Table_1.max-1000x1000.jpg
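To view the two models side by side, their evaluation metrics can be combined in one query. This is a sketch that assumes the hypothetical model names `mydataset.taxi_tip_linreg` and `mydataset.taxi_tip_automl`.

```sql
-- Sketch only: both model names are hypothetical placeholders.
SELECT
  'linear_reg' AS model,
  r2_score,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL `mydataset.taxi_tip_linreg`)
UNION ALL
SELECT
  'automl_regressor' AS model,
  r2_score,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL `mydataset.taxi_tip_automl`);
```

Called this way, each model is scored on its own held-out evaluation data. For a stricter comparison, the same evaluation table can be passed as a second argument to both ML.EVALUATE calls.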

The AutoML Tables model has improved the R2 from .35 to .41, an increase of .06, or roughly 17% in relative terms based on these rounded values. That is outstanding!

Using the AutoML model for prediction:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_vtspaQk.max-1200x1200.png

The power of AutoML Tables lies in the ability to feed the model any and all of your data, let Google’s machine learning perform the feature engineering, model selection, and hyperparameter tuning, and ensemble a state-of-the-art model for you. Whether you use AutoML to create an initial benchmark for your data science team or apply it directly to your machine learning problems, one thing is clear: you can save time and reduce complexity by relying on AutoML Tables. That leaves you more time to solve your next business problem!
