Jump to Content
Developers & Practitioners

How to use a machine learning model from a Google Sheet using BigQuery ML

February 25, 2021
Karl Weinmeister

Head of Cloud Product DevRel

Spreadsheets are everywhere! They are one of the most useful productivity tools available. They make organizing, calculating, and presenting data a breeze. Google Sheets is the spreadsheet application included in Google Workspace, which has over 2 billion users

Machine learning, or ML for short, has also become an essential business tool. Making predictions with data at low cost and high accuracy has transformed industries. The adoption of machine learning in business is estimated to be growing at over 40% a year.

Doesn't it make sense to bring the power of machine learning to all the data out there in spreadsheets? I definitely think so! Now we have the tools to make this happen. Let's take a look in this blog post.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/BQML_1.gif

The Big Picture

BigQuery ML, built into BigQuery, enables users to create machine learning models using standard SQL queries. In this blog post, we'll discuss how to create a time series forecasting model with BigQuery ML. The input for a time series model is a historical sequence of values, and the output is a sequence of future predicted values.

I've picked this particular type of model because time series data is very common in spreadsheets. You can imagine a variety of scenarios that might have an ordered set of dates along with numeric values: sales, staffing, operational metrics, etc. For an in-depth look at a demand forecasting reference pattern using BigQuery ML, I recommend this blog post.

What you'll learn in this blog post will apply to any type of machine learning model. BigQuery ML supports a wide variety of model types, from neural networks, AutoML models, or even TensorFlow models. You don't need to know how the model is built--you just need to bring your data and choose the appropriate options. BigQuery ML will build and host the model for you.

To build the model, you need training data to learn patterns from. Fortunately, that data is right in your spreadsheet! If you use Connected Sheets, you can even access massive amounts of data directly from BigQuery, ensuring that you're working with the most recent, secured data at all times.

To execute BigQuery ML queries from Sheets, we'll use Apps Script, a cloud-based, JavaScript platform to extend Google Workspace. The Apps Script code will extract input data from the spreadsheet; execute BigQuery ML queries for training and predicting; and update the spreadsheet with predictions from the model.

The Problem

As an example scenario, let's forecast visits to an e-commerce site using Google Analytics data from BigQuery Public Datasets. The dataset consists of 12 months of traffic, content, and transaction data. Let's look at a chart of hourly website visits:
https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_2.max-1200x1200.png

This dataset contains some complex patterns that the robust forecasting capabilities in BigQuery ML can handle well. For example, we see a repeating pattern over each day, as well as over each week (daily and weekly seasonality). Also, there are some spikes that could potentially throw off a forecasting algorithm, but BigQuery ML provides automatic outlier detection to manage these events.

Using Data from BigQuery

Our example code will work with your data, wherever it comes from. If you do have access to business data in BigQuery, Connected Sheets is the recommended way to access it from Sheets. Let's take a look.

From the Data menu, you'd choose to Connect to BigQuery:
https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_3.max-300x300.png

Select the project, dataset, and table, and your spreadsheet is now connected!

https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_4.max-1300x1300.png

Connected Sheets allows you to unlock big data insights with features you’re already familiar with in Sheets such as pivot tables, charts and formulas,that can be automatically refreshed with new data from BiqQuery. For our scenario, we'd want to create a model trained with a snapshot from a certain point in time. That's as easy as selecting the data, and then selecting Extract:

https://storage.googleapis.com/gweb-cloudblog-publish/images/BGML_5.max-1500x1500.png

Using a BigQuery ML Model from Sheets

Let's now look into how we can execute BigQuery ML commands from the Sheets user interface. We'll walk through a code sample from the Google Workspace ml-integration-samples to show how this can be done.

By opening up the Script Editor from your sheet (Tools > Script Editor), you'll be able to write your Apps Script code and configure integrations. This sample has enabled two Advanced Services, BigQuery and Sheets, to support the use of these APIs in the script.
https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_6.max-600x600.png

Let's first look at the menu that's added to Sheets, and see how we linked it to code in the project:

https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_7.max-500x500.png

The onOpen() is triggered when the spreadsheet is opened. You can see that it creates several menu items, which are each linked to functions. For example, the "Train" menu item will invoke the train() function when it is selected.

Loading...

Model Training

Let's now explore how the training function works with any custom data provided. The user selects a range of data in the sheet, each containing a date and the data to forecast. The training code extracts these values that are populated in the Range object.
https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_8.max-400x400.png

The rows are then inserted into a temporary BigQuery table. Finally, a query is executed to create a time series model with the data in the table.

Loading...

Let's now look at an example query that could be generated by the training code. It specifies a model type of "ARIMA" (for time series) and the timestamp and data columns from the temporary table. Additional options are available in BigQuery ML, such as holidays, but are not included in the sample.

Loading...

Forecasting

To make a forecast, you would select a number of rows in the spreadsheet, and then select Forecast in the menu. The script will invoke the ML.FORECAST() function on the model, specifying the horizon as the number of rows selected.
https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_9.max-400x400.png

Two fields are extracted from the forecast, for each time period:

  • The forecast timestamp formatted as a string with the date, time, and time zone.
  • The forecast value for that timestamp.

The query might look like this, assuming 3 rows are selected:

Loading...

Now, let's look at an actual forecast! Here, we see the last week of data in blue, along with a forecast for the last 3 days in red.

https://storage.googleapis.com/gweb-cloudblog-publish/images/BQML_10.max-1200x1200.png

Putting it all together

By combining Sheets with BigQuery ML, you can bring the power of machine learning to any data in your spreadsheet. You just need to have enough high-quality data to extract meaningful patterns from.

The code sample shows how the training process can work with all types of data, as long as it has a date and a numeric value. Depending on your needs, the sample could be modified to work with different model types; use different options; or work with different spreadsheet layouts. Then, business users can simply use the menu to build new models and forecast with them.

A template sheet, pre-installed with the code sample, can help you get started. You can also follow the instructions in the code sample to install it yourself. If you want to get more hands-on experience with BigQuery ML, I recommend this introductory codelab. With Sheets and BigQuery ML, I hope you are able to solve more problems than ever before!

Posted in