Google Cloud Big Data and Machine Learning Blog
Innovation in data processing and machine learning technology
BigQuery lazy data loading: SQL data languages (DDL and DML), partitions, and half a trillion Wikipedia pageviews
Learn how to use new BigQuery features to query federated tables, and work with DDL, DML, data partitions, and a massive Wikipedia dataset.
Serving real-time scikit-learn and XGBoost predictions
Cloud ML Engine now supports scikit-learn and XGBoost in beta. Learn how to start using online prediction with these two additional frameworks.
Stretching Elastic’s capabilities with historical analysis, backups, and cross-cloud monitoring on Google Cloud Platform
Elastic is partnering with Google Cloud Platform to enable Elasticsearch and X-Pack functionality, as well as BigQuery and Stackdriver integration.
Using BigDL for deep learning with Apache Spark and Google Cloud Dataproc
Learn how to use Intel's BigDL to scale machine learning workloads with Apache Spark across multiple nodes and try out this workflow on the MNIST dataset.
Architecting live NCAA predictions: from archives to insights
Learn how the NCAA uses Google Cloud to build a predictive data analytics workflow that helps them get real-time insights from college hoops game data.
Simplifying machine learning on open hybrid clouds with Kubeflow
Cisco and Google Cloud are now collaborating to provide a hybrid architecture for Kubeflow, permitting flexible transfer of TensorFlow jobs to the cloud.
Predicting community engagement on Reddit using TensorFlow, GDELT, and Cloud Dataflow: Part 3
In part 3 of a 3-part series, learn how to use NLP, TensorFlow, GDELT, and Cloud Dataflow to automatically predict subreddit categorization of news posts.
Testing future Apache Spark releases and changes on Google Kubernetes Engine and Cloud Dataproc
Learn how to test out upcoming changes and versions of Apache Spark in Google Kubernetes Engine, preferably on test rather than production data.
How Tokopedia modernized its data warehouse and analytics processes with BigQuery and Cloud Dataflow
Learn how Tokopedia, an Indonesian online marketplace, migrated to a Google Cloud data warehouse and analytics platform with BigQuery and Cloud Dataflow.
AutoML Vision in action: from ramen to branded goods
Learn how AutoML Vision identifies which Tokyo ramen shop a bowl of ramen came from based on its photo, and how Mercari identifies the brands sold on its marketplace app.
Pre-built Cloud Dataflow templates: KISS for data movement
Get started with simple templates for Cloud Dataflow. Learn how to do simple per-element filters and transforms in JavaScript.
Public datasets: how nonprofits can drive social impact with planetary-scale data
BigQuery Public Datasets, plus Kaggle Datasets and Kernels, make massive datasets available to nonprofits, so that they can help solve global problems.
Joining and shuffling very large datasets using Cloud Dataflow
Learn how to use Cloud Dataflow on tera-scale datasets to shuffle and join efficiently. We also describe Dataflow pricing adjustments in greater detail.
Predicting community engagement on Reddit using TensorFlow, GDELT, and Cloud Dataflow: Part 2
In part 2 of a 3-part series, learn how to use NLP, TensorFlow, GDELT, and Cloud Dataflow to automatically predict subreddit categorization of news posts.
Predicting community engagement on Reddit using TensorFlow, GDELT, and Cloud Dataflow: Part 1
In part 1 of a 3-part series, learn how to use NLP, TensorFlow, GDELT, and Cloud Dataflow to automatically predict subreddit categorization of news posts.
Hyperparameter tuning on Google Cloud Platform is now faster and smarter
Learn how to save time and budget while tuning your TensorFlow model's hyperparameters. Eliminate retraced steps, and minimize your training time.
The switch to self-service marketing analytics at zulily: best practices for using Tableau with BigQuery
zulily explains best practices for building a marketing analytics workflow with Tableau and BigQuery.
Comparing regression and classification on US elections data with TensorFlow Estimators
Apply two fundamental ML techniques, regression and classification, to the Elections 2016 dataset from Kaggle. Discover demographic voting trends.
How Color uses the new Variant Transforms tool for breakthrough clinical data science with BigQuery
Google Cloud customer Color explains how Variant Transforms enables novel genomics and clinical conclusions from within BigQuery.
Cloud poetry: training and hyperparameter tuning custom text models on Cloud ML Engine
Learn how to train a TensorFlow model to suggest the next line of poetry using Tensor2Tensor on Cloud ML Engine.
Google Cloud and NCAA® team up for a unique March Madness® competition hosted on Kaggle
Google Cloud and NCAA® are announcing the annual March Madness Machine Learning Competition on Kaggle, which helps you predict a winning bracket with AI.
How to handle mutating JSON schemas in a streaming pipeline, with Square Enix
Learn to process mutating JSON with Cloud Pub/Sub, Cloud Dataflow, and BigQuery. Square Enix engineers explain how they handle a changing game dataflow.
Practice makes perfect: the Professional Data Engineer Practice Exam is now live
Google Cloud Certified now offers an online practice exam designed to help Professional Data Engineer exam takers check their test readiness.
Easy distributed training with TensorFlow using tf.estimator.train_and_evaluate on Cloud ML Engine
Learn how to quickly and simply distribute your training workload with TensorFlow 1.4 and Cloud Machine Learning Engine.
Bitcoin in BigQuery: blockchain analytics on public data
Learn how to access the Bitcoin blockchain via a new public Google BigQuery dataset. Learn how to visualize transactions in Google Data Studio.