Jump to Content
Data Analytics

Wrapping up the summer: A host of new stories and announcements from Data Analytics

September 10, 2021
Sudhir Hasbe

Sr. Director of Product Management, Google Cloud

Try Google Cloud

Start building on Google Cloud with $300 in free credits and 20+ always free products.

Free trial

August is the time to sit back, relax, and enjoy the last of the summer. Or, for those in the southern hemisphere, August is the month you start looking at your swimsuits and sunglasses with interest as the weather warms. But regardless of where you live, August also seems to be when Google produces a lot of interesting Data Analytics reading.

In this monthly recap, I’ll divide last month’s most interesting articles into three groups: New Features and Announcements, Customer Stories, and  How-Tos. You can read through in order or skip to the section that’s most interesting to you.

Features and announcements

Datasets and Demo Queries - Recursion is a powerful topic, but it’s also a marvelous metaphor.  During August, we launched our most self-referential dataset yet; Google Cloud Release Notes are now in BigQuery. Find the product and feature announcements you need and do it fast using the new Google Cloud Release Notes Dataset. Additionally, we also launched the Top 25 topics in Google Trends (Looker Dashboard).

Save Messages, money and time with Pub/Sub topic retention. When you enable topic retention, all messages sent to the topic within the chosen retention window are accessible to all the topic's subscriptions —without increasing your storage costs when you add subscriptions. Additionally, messages will be retained and available for replay even if there are no subscriptions attached to the topic at the time the messages are published, allowing subscribers to see the entire history of messages sent to the topic.

Extend your Dataflow templates with UDFs. Google provides a set of Dataflow templates that customers commonly use for frequent data tasks but also as reference data pipelines that developers can extend. But what if you want to customize a Dataflow template without modifying or maintaining the Dataflow template code itself? With JavaScript user-defined functions (UDFs), customers can now extend certain Dataflow templates with custom logic to transform records on the fly. This is especially helpful for users who want to customize a pipeline’s output format without having to re-compile or maintain the template code itself.

The diagram below shows the process flow for UDF enabled Dataflow Templates

https://storage.googleapis.com/gweb-cloudblog-publish/images/UDF_enabled_Dataflow_Templates.max-800x800.jpg

Customer stories

Renault uses BigQuery to improve its Industrial Data platform
Last month, Renault described their journey and the impact of establishing an industrial IoT analytics system using Dataflow, Pub/Sub, and BigQuery for traceability of tools and products, as well as measure operational efficiency.  By consolidating multiple data silos into BigQuery the IT Infrastructure team was able to reduce storage costs by 50% even while processing several times more data than on their legacy system.

The chart below shows the multifold growth in data that Renault processed each month (blue bars) along with the corresponding drop in cost (red line) between the previous system (shaded area) and the Google solution.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Renault_growth.max-1300x1300.jpg

Constellation Brands chose Google Cloud to power Direct to Consumer (DTC) shift

Ryan Mason, Director and Head of Direct to Consumer Strategy from Constellation Brands authored a piece on the business value of DTC channels as well as the method and impact of how he implemented his pipelines. This story explained how to gather multiple streams from the Google Marketing platform (Analytics 360, Tag Manager 360) and land them in BigQuery.

A key differentiator for us is that all Google Marketing Platform data is natively accessible for analysis within BigQuery.

From there, Constellation Brands can calculate key performance indicators (KPIs), such as customer acquisition cost (CAC), Customer Lifetime Value (CLV), and Net Promoter Score (NPS), and broadcast them across the company using Looker dashboards. In this way, the entire organization can track the health of the business through common access to the same KPI’s.

The operational impact of Looker [dashboards] is also substantial: our team estimates that the number of hours needed to reach critical business decisions has been reduced by nearly 60%.

https://storage.googleapis.com/gweb-cloudblog-publish/images/looker_dashboard_o83pQJs.max-1300x1300.jpg

How-Tos

Our DevRel Rockstar Leigha Jarett has published four very useful articles in her continuing series on the BigQuery Administration Reference Guide. This month she covered: Monitoring, API Landscape, Data Governance, and Query Optimization.

I highly recommend reading the article on query optimization. It’s packed with good tips, from the most commonly used — partitioning and clustering to reduce the amount of data that the query has to scan – to some lesser-known tips like proper ordering of expressions.

This article on workload management in BigQuery has some very useful tips and explains the key constructs of Commitments, Reservations, and Assignments.

https://storage.googleapis.com/gweb-cloudblog-publish/images/workload_management_in_BigQuery.max-900x900.jpg

If streaming is your game then this post from Zeeshan Kahn on handling duplicate data in streaming pipelines using DF and PS will come in handy.

Zooming up a few thousand feet, Googlers Firat Tekiner and Susan Pierce offer some high level insights as they discuss the convergence of data lakes and data warehouses. After all, who wants to manage two sets of infrastructure?   A unified data platform is the way to go.

And that’s how we do a relaxing summer month here at Google Cloud! Stay tuned for more announcements, how-to blogs, and customer stories as we ramp up for Google Cloud Next coming in October.

Posted in