Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Google Cloud Platform for Data Scientists: Using R with Google BigQuery, Part 2 (storing and retrieving data frames)

Thursday, July 20, 2017

Learn how to create an R data frame and stash it in BigQuery using bigrquery.

Moving Thumbtack’s data infrastructure to Google Cloud Platform

Tuesday, July 18, 2017

Learn how Thumbtack ramped up GCP usage from a few BigQuery tables to include all of its data infrastructure, a move resulting in big productivity gains.

How to aggregate data for BigQuery using Apache Airflow

Tuesday, July 11, 2017

Users of Google BigQuery, the cloud-native data warehouse service from GCP, have access to an ever-expanding range of public datasets for exploration.

After Lambda: Exactly-once processing in Cloud Dataflow, Part 3 (sources and sinks)

Thursday, July 6, 2017

The series concludes with a description of how exactly-once processing in Cloud Dataflow is supported by sources and sinks.

Counting uniques faster in BigQuery with HyperLogLog++

Wednesday, July 5, 2017

Learn how BigQuery uses HyperLogLog++, Google’s internal implementation of the HyperLogLog algorithm for cardinality estimation.

Get on track to becoming a Google Certified Professional Data Engineer

Friday, June 30, 2017

Get tips on preparing for the Google Certified Professional Data Engineer exam. Show prospective employers you have the skills to build and scale on GCP.

Cloud Machine Learning Perception services updates: Cloud Video Intelligence enters beta and Cloud Vision gets new features

Thursday, June 29, 2017

Cloud Video Intelligence beta is now open to all. Google Cloud Platform users can now use the Cloud Video Intelligence API to understand their video content.

Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines

Tuesday, June 27, 2017

Learn how the new service-based Shuffle feature brings significant performance improvements to your Cloud Dataflow pipelines.

How Qubit deduplicates streaming data at scale with Google Cloud Platform

Monday, June 26, 2017

Learn how Qubit uses GCP to dedupe messages at scale, with no self-managed components.

Guide to common Cloud Dataflow use-case patterns, Part 1

Friday, June 16, 2017

In this open-ended series, we'll describe the most common Dataflow use-case patterns, including description, example, solution and pseudocode.

Training an object detector using Cloud Machine Learning Engine

Thursday, June 15, 2017

Announcing the TensorFlow Object Detection API, a new open source framework for object detection that makes model development and research easier.

Visualization and large-scale processing of historical weather radar (NEXRAD Level II) data

Thursday, June 15, 2017

The historical archive of NEXRAD network weather radar data is now available as a public dataset on Google Cloud Storage.

Build your own machine-learning-powered robot arm using TensorFlow and Google Cloud

Tuesday, June 13, 2017

Learn about the Find Your Candy robot arm powered by machine learning.

U.S. EPA and OpenAQ air quality data now available in BigQuery

Wednesday, June 7, 2017

Using these new public datasets in BigQuery is a great way to understand air quality in your community.

Fastest track to Apache Hadoop and Spark success: using job-scoped clusters on cloud-native architecture

Tuesday, June 6, 2017

A combination of rapid startup time, per-minute billing, and cloud-native architecture is transformative for operators.

Cloud Dataflow 2.0 SDK goes GA

Monday, June 5, 2017

Learn what “Beam-first” design means for the new Cloud Dataflow 2.0 SDKs for Java and Python.

Life of a BigQuery streaming insert

Thursday, June 1, 2017

Understanding how BigQuery streaming inserts work makes it easier to build real-time applications.

After Lambda: Exactly-once processing in Cloud Dataflow, Part 2 (Ensuring low latency)

Tuesday, May 30, 2017

Learn about graph optimization and Bloom filters in Cloud Dataflow.

Introducing Ads Data Hub: Next generation insights and reporting

Wednesday, May 24, 2017

With Ads Data Hub, advertisers can access detailed, impression-level data about their cross-device media campaigns in a more secure, privacy-safe environment.

New healthcare and population datasets now available in Google BigQuery

Friday, May 19, 2017

Query newly added publicly available healthcare datasets now on Google BigQuery.

Try Google BigQuery today: Now with 10GB of free storage

Wednesday, May 17, 2017

Getting started with BigQuery keeps getting easier, and for more people.

An in-depth look at Google’s first Tensor Processing Unit (TPU)

Friday, May 12, 2017

There’s a common thread that connects Google services such as Google Search, Street View, Google Photos, Google Translate: they all use Google’s Tensor

Designing ETL architecture for a cloud-native data warehouse on Google Cloud Platform

Thursday, May 11, 2017

Learn how to build an ETL solution for Google BigQuery using Google Cloud Dataflow, Google Cloud Pub/Sub and Google App Engine Cron as building blocks.

After Lambda: Exactly-once processing in Google Cloud Dataflow, Part 1

Wednesday, May 10, 2017

Learn the meaning of “exactly once” processing in Dataflow, its importance for stream processing overall, and its implementation in the streaming shuffle phase.

Announcing general availability of GPUs for Cloud Machine Learning Engine

Tuesday, May 9, 2017

Cloud GPUs are generally available for use with Cloud Machine Learning Engine. See how Airbus and Global Fishing Watch used GPUs.
