How to crunch your business data from Sheets in BigQuery
By Dan McClary, Product Manager
Love them or hate them, spreadsheets play an undeniable role in the operations of many businesses. They offer users a convenient way to manipulate data and a scratchpad for programming with small amounts of data. Given the importance of spreadsheets, incorporating them in your data warehouse — while still maintaining the ability to change the spreadsheet — can enable new collaborations and quicker insights.
Fortunately, Google BigQuery can query Google Sheets just like any other table, allowing exactly this kind of collaboration. Here’s how you can query your Sheets using BigQuery, plus tips on how to easily maintain sharing controls to keep your data protected.
A Bay Area example
Let’s consider the public dataset for movies and TV filmed in San Francisco — a simple table, but we can learn some interesting things from it. If we want to know how many movies or TV shows were filmed in San Francisco each year, we can perform the following query:
#standardSQL SELECT release_year, COUNT(DISTINCT title) AS num_movies FROM `bigquery-public-data.san_francisco.film_locations` GROUP BY release_year ORDER BY release_year DESC
But how do you combine that with a spreadsheet? Suppose you’re a nerd and keep a spreadsheet of movies with ratings like this one. (Yes, these are actual movies and TV shows I’ve rated on a certain video streaming service.)
Adding this Sheet to BigQuery is easy! Here’s how to do it:
- In the BigQuery Web UI, click on a dataset and choose to create a table.
- Choose the source as Google Drive.
- Paste the URL for your Sheet into the location bar. Note: Make sure you copy the URL from the worksheet in Sheets that you want to serve as the table.
- Choose either CSV or Sheets as the format. Note: CSV format will allow you to check “Auto-detect Schema.” Sheets’ format will allow you to specify the column names and types.
Once that’s done, it’s easy to write queries to join your spreadsheet data with data stored and managed by BigQuery. Suppose we want to know fun facts and locations for the movies we’ve rated that are set in San Francisco, we could write a query like this:
#standardSQL SELECT release_year, sf.title, locations, fun_facts FROM `bigquery-public-data.san_francisco.film_locations` sf INNER JOIN `sheets-and-bigquery.my_movies.movies_i_rated` me ON LOWER(sf.title) = LOWER(me.Title)
Or what if we want to take visiting friends and family on a movie-themed tour of San Francisco? We can use Array SQL to create a concise crib sheet for each of the locations we’d visit.
#standardSQL SELECT locations, ARRAY_AGG(sf.title) AS titles, ARRAY_AGG(release_year) as release_years, fun_facts FROM `bigquery-public-data.san_francisco.film_locations` sf INNER JOIN `sheets-and-bigquery.my_movies.movies_i_rated` me ON LOWER(sf.title) = LOWER(me.Title) GROUP BY locations, fun_facts
Query the most up-to-date data
Unlike most tables in a data warehouse, we expect spreadsheets to change often, and not through SQL statements. Fortunately, BigQuery takes this into account when you create tables backed by Sheets. You can edit a sheet and relevant changes appear immediately in BigQuery. For example, we can add Ant-Man to our sheet and it automatically populates in the query.
This is great, but what about updating the new column rating? If we add the field to the BigQuery table definition, Rating will start to show up. If we don’t, BigQuery will ignore it. Let’s take a look:
#standardSQL SELECT Date_reviewed, Title FROM `sheets-and-bigquery.my_movies.movies_i_rated` ORDER BY Date_reviewed DESC LIMIT 10
The new data is picked up where relevant, and adding a column doesn’t break the existing query.
Control share settings
Your data needs to stay protected in both systems, which is why we’ve created built-in security measures for both BigQuery and Sheets. There’s a way to bridge access between the two, to help you manage access for groups of users — just create a Google Group to serve as an access control group. Here’s an example where we create a group called “MyMovieSpreadsheets.” Once we’ve created the group, we:
- Give MyMovieSpreadsheets view access to the Sheet.
- Share the BigQuery dataset with MyMovieSpreadsheets.
Now, you have a convenient way to manage access to both the spreadsheet and the dataset. If you want to try these queries yourself, just join this public Google Group.
It's important for you to be able to work with tools you're comfortable with in the workplace, and to gain valuable data insights for your team all in one spot. Test out BigQuery and Sheets queries at https://bigquery.cloud.google.com and try these examples.