Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

How to crunch your business data from Sheets in BigQuery

Thursday, September 7, 2017

By Dan McClary, Product Manager

Love them or hate them, spreadsheets play an undeniable role in the operations of many businesses. They offer users a convenient way to manipulate data and a scratchpad for programming with small amounts of data. Given the importance of spreadsheets, incorporating them in your data warehouse — while still maintaining the ability to change the spreadsheet — can enable new collaborations and quicker insights.

Fortunately, Google BigQuery can query Google Sheets just like any other table, allowing exactly this kind of collaboration. Here’s how you can query your Sheets using BigQuery, plus tips on how to easily maintain sharing controls to keep your data protected.

A Bay Area example

Let’s consider the public dataset for movies and TV filmed in San Francisco — a simple table, but we can learn some interesting things from it. If we want to know how many movies or TV shows were filmed in San Francisco each year, we can perform the following query:

#standardSQL
SELECT
  release_year,
  COUNT(DISTINCT title) AS num_movies
FROM
  `bigquery-public-data.san_francisco.film_locations`
GROUP BY
  release_year
ORDER BY
  release_year DESC

But how do you combine that with a spreadsheet? Suppose you’re a nerd and keep a spreadsheet of movies with ratings like this one. (Yes, these are actual movies and TV shows I’ve rated on a certain video streaming service.)

Adding this Sheet to BigQuery is easy! Here’s how to do it:

  1. In the BigQuery Web UI, click on a dataset and choose to create a table.
  2. Choose the source as Google Drive.
  3. Paste the URL for your Sheet into the location bar. Note: Make sure you copy the URL from the worksheet in Sheets that you want to serve as the table.
  4. Choose either CSV or Sheets as the format. Note: CSV format will allow you to check “Auto-detect Schema.” Sheets’ format will allow you to specify the column names and types.


Once that’s done, it’s easy to write queries to join your spreadsheet data with data stored and managed by BigQuery. Suppose we want to know fun facts and locations for the movies we’ve rated that are set in San Francisco, we could write a query like this:

#standardSQL
SELECT
  release_year,
  sf.title,
  locations,
  fun_facts
FROM
  `bigquery-public-data.san_francisco.film_locations` sf
INNER JOIN
  `sheets-and-bigquery.my_movies.movies_i_rated` me
ON
  LOWER(sf.title) = LOWER(me.Title)

Or what if we want to take visiting friends and family on a movie-themed tour of San Francisco? We can use Array SQL to create a concise crib sheet for each of the locations we’d visit. 

#standardSQL
SELECT
  locations,
  ARRAY_AGG(sf.title) AS titles,
  ARRAY_AGG(release_year) as release_years,
  fun_facts
FROM
  `bigquery-public-data.san_francisco.film_locations` sf
INNER JOIN
  `sheets-and-bigquery.my_movies.movies_i_rated` me
ON
  LOWER(sf.title) = LOWER(me.Title)
GROUP BY
  locations, fun_facts

Query the most up-to-date data

Unlike most tables in a data warehouse, we expect spreadsheets to change often, and not through SQL statements. Fortunately, BigQuery takes this into account when you create tables backed by Sheets. You can edit a sheet and relevant changes appear immediately in BigQuery. For example, we can add Ant-Man to our sheet and it automatically populates in the query.

This is great, but what about updating the new column rating? If we add the field to the BigQuery table definition, Rating will start to show up. If we don’t, BigQuery will ignore it. Let’s take a look:

#standardSQL
SELECT
  Date_reviewed,
  Title
FROM
  `sheets-and-bigquery.my_movies.movies_i_rated`
ORDER BY
  Date_reviewed DESC
LIMIT
  10

The new data is picked up where relevant, and adding a column doesn’t break the existing query.

Control share settings


Your data needs to stay protected in both systems, which is why we’ve created built-in security measures for both BigQuery and Sheets. There’s a way to bridge access between the two, to help you manage access for groups of users — just create a Google Group to serve as an access control group. Here’s an example where we create a group called “MyMovieSpreadsheets.” Once we’ve created the group, we:
  1. Give MyMovieSpreadsheets view access to the Sheet.
  1. Share the BigQuery dataset with MyMovieSpreadsheets.

Now, you have a convenient way to manage access to both the spreadsheet and the dataset. If you want to try these queries yourself, just join this public Google Group.

Wrapping up


It's important for you to be able to work with tools you're comfortable with in the workplace, and to gain valuable data insights for your team all in one spot. Test out BigQuery and Sheets queries at https://bigquery.cloud.google.com and try these examples.
  • Big Data Solutions

  • Product deep dives, technical comparisons, how-to's and tips and tricks for using the latest data processing and machine learning technologies.

  • Learn More

12 Months FREE TRIAL

Try BigQuery, Machine Learning and other cloud products and get $300 free credit to spend over 12 months.

TRY IT FREE

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.