A top-10 list of Google BigQuery user experiences in 2016
If 2016 is any evidence, the use cases for BigQuery are becoming more prevalent and diverse than ever. And if one proxy for measuring adoption is how enthusiastic the community is about sharing its positive experiences, then 2016 was a milestone for Google BigQuery, our managed service for petabyte-scale data analytics and data warehousing on the public cloud.
Here are our 10 favorite BigQuery stories of 2016, listed in chronological order. They span a range of industries and use cases, but they all illustrate BigQuery’s role in a data analytics pipeline that begins with storage and ingestion and extends to machine learning. (Note: unless otherwise indicated by the author, assume that these posts represent personal, not company, points of view.)
- Our trip with BigQuery, by Peter Mueller
This CTO describes querying 7 years’ worth of client data, and how and why his company eventually integrated BigQuery into its own product back-end.
- Creating a serverless ETL nirvana using Google BigQuery, by Graham Polley
In this post, the author explains how he combined the BigQuery federated sources feature, which supports queries across Google Cloud Storage and Google Drive, with UDFs to build a serverless ETL pipeline.
- A billion taxi rides on Google's BigQuery, by Mark Litwintschik
This big data consultant evaluated BigQuery to see how fast it could query metadata collected from 1.1 billion taxi trips. The answer? Really fast.
- Getting your feet wet in the data lake: Analytics 360 in BigQuery, by Hazem Mahsoub
For this analyst, BigQuery’s federated sources feature, which includes support for integrating JSON and CSV data for analysis, proved to be a game changer.
- Real time fraud detection with BigQuery, by Stephen Whitworth
In this use case, BigQuery serves as the analytics back-end for real-time fraud detection. The author shows that BigQuery has properties that can obviate the need for a traditional time-series database to power a data warehouse.
- BigQuery at WePay, by Chris Riccomini
This post describes WePay’s use of Apache Airflow (incubating) to orchestrate data transfer from an operational MySQL database to Google Cloud Storage, and from Cloud Storage to BigQuery, for analysis.
- Google BigQuery hits the gym and beefs up!, by Graham Polley
In this follow-up to his “serverless ETL nirvana” post (see above), the author offers a wrap-up of new or alpha/beta features in BigQuery, including a new columnar storage format that can accelerate queries, Apache Avro support, ANSI SQL 2011 compatibility, and partitioned tables.
- Calibrating temperature forecasts with machine learning, Google BigQuery, and reforecasts, by Francisco M. Alvarez
For data scientists, improving the accuracy of weather forecasts (or any prediction, really) can present an irresistible challenge. Here, the author explains how he used BigQuery to collect, explore and process historical weather data (provided as a sample BigQuery dataset), and then brought it and other data into Google Cloud Datalab as a pandas DataFrame for machine-learning purposes.
- Demystifying educational MOOC data using Google BigQuery: The Person-Course dataset, by Glenn Lopez
This Harvard University administration data scientist used BigQuery to get insights about potential optimizations based on how thousands of daily online learners interact with Harvard MOOC content.
- Google BigQuery, and why big data is about to have its Gmail moment, by Mark Rittman
In this post, the author provides some personal observations based on his development of a pipeline that ingests IoT data via Google Cloud Pub/Sub, processes it using Google Cloud Dataflow, and then streams that data into BigQuery for analysis.
If you know of others, please post links in the comments so everyone can benefit from them. We value the BigQuery user community immensely, and look forward to seeing all the wonderful things you produce in 2017!