BigQuery goes wide: this week on Google Cloud Platform
One of the many tools that set Google Cloud Platform apart from other cloud providers is Google BigQuery, a managed data warehouse service that allows users to query petabyte-scale datasets with a familiar SQL-like interface.
Last month, we announced that BigQuery is now integrated with Google Drive, our online file storage space, and Google Sheets, our spreadsheet app. That’s on top of an existing integration with Google Data Analytics Studio, as discussed by Felipe Hoffa here. We’ve also highlighted some fun and interesting ways to use BigQuery, for example, to forecast demand, or to visualize campaign contributions to the 2016 presidential elections.
Now it seems the broader cloud community is getting wind of how useful and usable BigQuery can be, and is working on ways to use it with workloads and datasets outside of GCP. This week, we read an interesting blog post from Dominic Woodman about how he uses BigQuery for large-scale SEO processes, such as auditing a large number of internal links. The article is a must-read for more than just online marketers, though: it’s relevant to anyone who makes heavy use of Microsoft Excel. “What do you do when Excel fails?” Woodman writes. “Excel is a fantastic tool, but that doesn’t mean it’s what we should use for everything.”
We also heard from Gareth Jones at Shine Technologies, a digital consulting company, about exporting HTTP request logs from AWS into BigQuery for analysis.
“Now, we could use Splunk or fluentd or logstash or some other great service for doing this, but our client is familiar with BigQuery, they like the SQL interface, and they have other datasets stored there already. As a bonus, they could run their own reports instead of having to talk to developers (and nobody wants to do that, not even developers).”
You should still read this article even if you’re not in the business of analyzing lots of request logs. That’s because along the way, Jones also introduces a useful hack for avoiding egress charges when moving data out of AWS. These charges would have come to $350/month, and avoiding extra charges is something that anyone working with multiple cloud providers can get behind.