5 can’t-miss sessions about big data and analytics at Google Cloud NEXT ‘17
William Vambenepe
Lead Product Manager, Big Data, Google Cloud Platform
For data engineers, scientists and analysts, Google Cloud NEXT ‘17 is a required stop on the conference circuit.
Google Cloud NEXT ‘17 (March 8-10 in San Francisco) is the new landmark conference for people interested in the future of cloud computing, as well as those using (or considering) Google Cloud as an implementation platform today. The design and management of scalable data analytics infrastructure has been a core competency for Google for much of its history, as evidenced by the publication of influential research papers about Bigtable, Dremel, MapReduce and the like over the years. With the subjects of many of those papers now externalized for use by customers as Google Cloud Platform (GCP) managed services, Google Cloud NEXT is the place to be in March for machine-learning engineers and data engineers/architects, scientists and analysts.
With more than 50 codelabs, bootcamps and sessions in the Big Data and Machine Learning area, the conference offers something for everybody across roles and expertise levels. Here I want to highlight a few examples of data-related sessions that I find particularly interesting. (Machine-learning sessions will be covered in a separate post.)
So, listed in current chronological order:
- For data engineers: Serverless data processing with Google Cloud Dataflow (March 8, 1:20pm - Sergei Sokolenko, Product Manager, Google Cloud)

Automation is the special sauce for data engineers on GCP. Here Sergei describes the role and value of dynamic work rebalancing, batch and streaming autoscaling and other key auto-features in Cloud Dataflow.

- For data engineers and analysts: Data modeling and querying in BigQuery with Standard SQL (March 8, 4pm - Dan McClary, Product Manager, Google Cloud)

Dan will explain how to re-fit standard enterprise data warehouse schemas (Star, Snowflake, Snowstorm) as BigQuery models to ensure optimal performance. (For a deep dive on SQL analytics in particular, you should also consider participating in the “From Data to Insights with BigQuery” bootcamp on March 6 or 7.)

- For data engineers and analysts: "Instant insights": how to turn raw data into business intelligence in real time (March 8, 4pm - Rafael Fernandez, Engineering Manager, and Slava Chernyak, Software Engineer, Google Cloud)

Learn how to build a real-time, event-driven data processing and analysis pipeline on GCP using Google Cloud Pub/Sub, Google Cloud Dataflow, Google Cloud Bigtable, Google BigQuery and Google Cloud Machine Learning.

- For data engineers: Moving your Apache Spark and Apache Hadoop workloads to Google Cloud Platform (March 9, 11:20am - James Malone, Product Manager, Google Cloud)

Here you’ll learn how to use Google Cloud's managed Spark and Hadoop service, Google Cloud Dataproc (clusters in 90 seconds!), to take advantage of existing investments in those platforms (and how to migrate).

- For data engineers, analysts and scientists: Using Apache Beam for parallel data processing (March 10, 2:40pm - Frances Perry, Software Engineer, Google Cloud, and Beam PMC)

Beam offers a consistent programming model and API for building batch or streaming data processing pipelines in multiple languages, portable across execution platforms. In this session, you'll get an overview of Beam and its use with multiple “runners” (Cloud Dataflow, Apache Flink, Apache Spark and so on).