Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

How Aucnet leveraged TensorFlow to transform their IT engineers into machine learning engineers

Thursday, August 17, 2017

Learn how Aucnet uses deep learning to build a real-time car image recognition system powered by TensorFlow.

Easier integration with Apache Spark and Hadoop via Google Cloud Dataproc Job IDs and Labels

Tuesday, August 15, 2017

Learn best practices for using Google Cloud Dataproc Job IDs and Labels to integrate your apps with Apache Spark and Hadoop.

Hyperparameter tuning in Cloud Machine Learning Engine using Bayesian Optimization

Thursday, August 10, 2017

Learn about HyperTune, hyperparameter tuning as a service, in Cloud Machine Learning Engine.

When art meets big data: Analyzing 200,000 items from The Met collection in BigQuery

Monday, August 7, 2017

This new public dataset allows you to build a custom machine-learning model, create an app for sorting and visualizing the images, and more.

Traveloka’s journey to stream analytics on Google Cloud Platform

Thursday, August 3, 2017

Travel technology company Traveloka talks about migrating its streaming data processing pipeline to a multi-cloud solution including GCP data analytics.

How WePay uses stream analytics for real-time fraud detection using GCP and Apache Kafka

Tuesday, August 1, 2017

Learn how WePay built a new stream analytics pipeline for real-time fraud detection using Apache Kafka and Google Cloud Platform.

Life of a Cloud Dataflow service-based shuffle

Monday, July 31, 2017

Learn the practical impact of Google Cloud Dataflow's Shuffle on data pipelines using the Opinion Analysis project as an example.

Running external libraries with Cloud Dataflow for grid-computing workloads

Friday, July 28, 2017

Learn how Cloud Dataflow, used in conjunction with other GCP services, can unlock parallel workloads.

Cloud Dataproc is now even faster and easier to use for running Apache Spark and Apache Hadoop

Wednesday, July 26, 2017

Learn about Cloud Dataproc 1.2, which includes software component updates, environment configuration changes and YARN changes.

New hands-on labs for scientific data processing on Google Cloud Platform

Monday, July 24, 2017

Try out seven labs that teach scientists how to use Google Cloud products and services to support their professional goals.

Google Cloud Platform for Data Scientists: Using R with Google BigQuery, Part 2 (storing and retrieving data frames)

Thursday, July 20, 2017

Learn how to create an R data frame and stash it in BigQuery using bigrquery.

Moving Thumbtack’s data infrastructure to Google Cloud Platform

Tuesday, July 18, 2017

Learn how Thumbtack ramped up GCP usage from a few BigQuery tables to include all of its data infrastructure, a move resulting in big productivity gains.

How to aggregate data for BigQuery using Apache Airflow

Tuesday, July 11, 2017

Users of Google BigQuery, the cloud-native data warehouse service from GCP, have access to an ever-expanding range of public datasets for exploration.

After Lambda: Exactly-once processing in Cloud Dataflow, Part 3 (sources and sinks)

Thursday, July 6, 2017

The series concludes with a description of how exactly-once processing in Cloud Dataflow is supported by sources and sinks.

Counting uniques faster in BigQuery with HyperLogLog++

Wednesday, July 5, 2017

Learn how BigQuery uses HyperLogLog++, Google’s internal implementation of the HyperLogLog algorithm for cardinality estimation.

Get on track to becoming a Google Certified Professional Data Engineer

Friday, June 30, 2017

Get tips on preparing for the exam to become a Google Certified Professional Data Engineer. Show prospective employers you have the skills to build and scale on GCP.

Cloud Machine Learning Perception services updates: Cloud Video Intelligence enters beta and Cloud Vision gets new features

Thursday, June 29, 2017

The Cloud Video Intelligence beta is now open to all. Google Cloud Platform users can now use the Cloud Video Intelligence API to understand their video content.

Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines

Tuesday, June 27, 2017

Learn how the new service-based Shuffle feature brings significant performance improvements to your Cloud Dataflow pipelines.

How Qubit deduplicates streaming data at scale with Google Cloud Platform

Monday, June 26, 2017

Learn how Qubit uses GCP to dedupe messages at scale, with no self-managed components.

Guide to common Cloud Dataflow use-case patterns, Part 1

Friday, June 16, 2017

In this open-ended series, we'll describe the most common Dataflow use-case patterns, including description, example, solution and pseudocode.

Training an object detector using Cloud Machine Learning Engine

Thursday, June 15, 2017

Announcing the TensorFlow Object Detection API, a new open source framework for object detection that makes model development and research easier.

Visualization and large-scale processing of historical weather radar (NEXRAD Level II) data

Thursday, June 15, 2017

The historical archive of NEXRAD network weather radar data is now available as a public dataset on Google Cloud Storage.

Build your own machine-learning-powered robot arm using TensorFlow and Google Cloud

Tuesday, June 13, 2017

Learn about the Find Your Candy robot arm powered by machine learning.

U.S. EPA and OpenAQ air quality data now available in BigQuery

Wednesday, June 7, 2017

Using these new public datasets in BigQuery is a great way to understand air quality in your community.

Fastest track to Apache Hadoop and Spark success: using job-scoped clusters on cloud-native architecture

Tuesday, June 6, 2017

A combination of rapid startup time, per-minute billing, and cloud-native architecture is transformative for operators.
