We are launching this blog because we have a lot to share with you about new applications powered by modern large-scale data processing technologies. While Google has been instrumental to the development of big data technologies, we still have a lot more to contribute. We’re just getting started, and so is the big data party.
We’re a group of engineers, developer advocates, product managers, technical writers, technical program managers, and support engineers who see how big data powers Google. We have been working for several years to bring these capabilities to everyone via open source contributions, publications, and Cloud services.
We of course want to write about Google Cloud services, but this blog is about a lot more. Just last Wednesday, we submitted the Dataflow SDK to the Apache Software Foundation (ASF), alongside fellow developers from Cloudera, data Artisans, Talend, Cask, Slack and PayPal. There is a huge amount of energy around making stream processing mainstream, and we’re excited to contribute a portable programming model that unifies batch and stream processing. You can be sure that this effort at the ASF will yield plenty of insights on this blog, many of which will not be tied to Google Cloud.
This blog is for anyone who cares about the bleeding edge of data processing technology. Some posts will be very deep, like this series (part 1 and part 2) about modern stream processing recently posted by Google’s Tyler Akidau at O’Reilly. You can expect to read more from Tyler on this blog very soon. But don’t worry, the posts on this blog won’t all ask you to set aside 20 minutes and a thermos of coffee like those amazing deep dives. There will be plenty of shorter posts when the point can be made succinctly.
This blog is also about nerdy fun. For example, I will try to get one of the MapReduce engineers (now a Dataflow engineer) to tell the story of what happened when he got access, for a few days, to a brand new data center which hadn’t yet gone “online” and was entirely his to play with. A story that so far has only been told over beers, but deserves to be recorded for posterity.
This blog is also for anyone who gets giddy when talking about processing terabytes in seconds, especially when doing so requires zero setup and costs only a few dollars, or about processing several petabytes in a single SQL query (for a bit more than a few dollars, but orders of magnitude less than it would cost to assemble a system capable of doing this outside of BigQuery).
We hope that this blog will show that scale is a solved problem, and that the interesting issues are now about development productivity, ease of use, automation, operational excellence, and applying technology in a smart way to solve real-world problems.
Posted by: William Vambenepe, Lead Product Manager, Big Data, Google Cloud Platform