How to crunch your business data from Sheets in BigQuery
Love them or hate them, spreadsheets play an undeniable role in the operations of many businesses. They offer users a convenient way to manipulate data and a scratchpad for programming with small amounts of data. Given the importance of spreadsheets, incorporating them in your data warehouse — while still maintaining the ability to change the spreadsheet — can enable new collaborations and quicker insights.
A Bay Area exampleLet’s consider the public dataset for movies and TV filmed in San Francisco — a simple table, but we can learn some interesting things from it. If we want to know how many movies or TV shows were filmed in San Francisco each year, we can perform the following query:
COUNT(DISTINCT title) AS num_movies
But how do you combine that with a spreadsheet? Suppose you’re a nerd and keep a spreadsheet of movies with ratings like this one. (Yes, these are actual movies and TV shows I’ve rated on a certain video streaming service.)
Adding this Sheet to BigQuery is easy! Here’s how to do it:
- In the BigQuery Web UI, click on a dataset and choose to create a table.
- Choose the source as Google Drive.
- Paste the URL for your Sheet into the location bar. Note: Make sure you copy the URL from the worksheet in Sheets that you want to serve as the table.
- Choose either CSV or Sheets as the format. Note: CSV format will allow you to check “Auto-detect Schema.” Sheets’ format will allow you to specify the column names and types.
Once that’s done, it’s easy to write queries to join your spreadsheet data with data stored and managed by BigQuery. Suppose we want to know fun facts and locations for the movies we’ve rated that are set in San Francisco, we could write a query like this:
LOWER(sf.title) = LOWER(me.Title)
Or what if we want to take visiting friends and family on a movie-themed tour of San Francisco? We can use Array SQL to create a concise crib sheet for each of the locations we’d visit.
ARRAY_AGG(sf.title) AS titles,
ARRAY_AGG(release_year) as release_years,
LOWER(sf.title) = LOWER(me.Title)
Query the most up-to-date dataUnlike most tables in a data warehouse, we expect spreadsheets to change often, and not through SQL statements. Fortunately, BigQuery takes this into account when you create tables backed by Sheets. You can edit a sheet and relevant changes appear immediately in BigQuery. For example, we can add Ant-Man to our sheet and it automatically populates in the query.
This is great, but what about updating the new column rating? If we add the field to the BigQuery table definition, Rating will start to show up. If we don’t, BigQuery will ignore it. Let’s take a look:
The new data is picked up where relevant, and adding a column doesn’t break the existing query.
Control share settings
Your data needs to stay protected in both systems, which is why we’ve created built-in security measures for both BigQuery and Sheets. There’s a way to bridge access between the two, to help you manage access for groups of users — just create a Google Group to serve as an access control group. Here’s an example where we create a group called “MyMovieSpreadsheets.” Once we’ve created the group, we:
1. Give MyMovieSpreadsheets view access to the Sheet.
2. Share the BigQuery dataset with MyMovieSpreadsheets.
Now, you have a convenient way to manage access to both the spreadsheet and the dataset.