
Introducing Datastream for BigQuery

September 15, 2022
Andi Gutmans

GM & VP of Engineering, Databases

Etai Margolin

Product Manager

In today’s competitive environment, organizations need to quickly and easily make decisions based on real-time data. That’s why we’re announcing Datastream for BigQuery, now available in preview, featuring seamless replication from operational database sources such as AlloyDB for PostgreSQL, PostgreSQL, MySQL, and Oracle, directly into BigQuery, Google Cloud’s serverless data warehouse. Datastream for BigQuery is Google’s next big step towards realizing our vision for the unified data cloud, combining databases, analytics, and machine learning into a single platform that offers the scale, speed, security, and simplicity that modern businesses need. With a serverless, auto-scaling architecture, Datastream allows you to easily set up an ELT (Extract, Load, Transform) pipeline for low-latency data replication enabling real-time insights. 

Consider the case of a large supermarket chain with hundreds of stores spread out across the region. Each individual store runs its own local point-of-sale and stock management systems, collecting data throughout the day about transactions and stock levels in the store. To provide visibility and help streamline the chain’s daily operations, the IT department set up a nightly batch process to collect and consolidate all of the data from the stores into a central data warehouse, so that reports on the stores’ performance could be ready for review in the morning. Maintaining this process took time and resources from the data engineering team, and as the chain grew and more data needed to be processed, the process ended up taking so long that reports would only be ready late in the day. Organizations like this are looking for a modern solution that allows effortless replication of operational data to their data warehouse, enabling real-time decision making; Datastream for BigQuery is that solution.

Datastream accelerates data-driven decision making in BigQuery

Developed in close partnership with Google Cloud’s BigQuery team, Datastream for BigQuery delivers a unique, truly seamless and easy-to-use experience that enables real-time insights in BigQuery with just a few steps. 

Using BigQuery’s newly developed Change Data Capture (CDC) and the Storage Write API’s UPSERT functionality, Datastream efficiently replicates updates directly from source systems into BigQuery tables in real time. You no longer have to waste valuable resources building and managing complex data pipelines, self-managed staging tables, tricky DML merge logic, or manual conversion from database-specific data types into BigQuery data types. Just configure your source database, connection type, and destination in BigQuery and you’re all set. Datastream for BigQuery will backfill historical data and continuously replicate new changes as they happen. And as database schemas shift, Datastream seamlessly handles schema changes and automatically adds new tables and columns to BigQuery.
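To make the idea of UPSERT-based change data capture concrete, here is a minimal, purely illustrative sketch in plain Python. It is not Datastream's or BigQuery's implementation and calls no Google Cloud API; the event shape (an "op" field plus a "row" dictionary) and the function name apply_change_events are assumptions made up for this example. It only shows the semantics: each change event is applied by primary key, so the destination always reflects the latest state of the source without a hand-written merge step.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change_events(
    table: Dict[Any, Row],
    events: Iterable[Dict[str, Any]],
    key: str = "id",
) -> Dict[Any, Row]:
    """Apply change events, in order, to an in-memory table keyed by primary key.

    Hypothetical event shape (an assumption for this sketch):
        {"op": "UPSERT" | "DELETE", "row": {...}}
    """
    for event in events:
        row = event["row"]
        pk = row[key]
        if event["op"] == "DELETE":
            # Remove the row if it exists; deleting a missing row is a no-op.
            table.pop(pk, None)
        elif event["op"] == "UPSERT":
            # Insert a new row, or replace the existing row with the same key.
            table[pk] = dict(row)
        else:
            raise ValueError(f"unknown operation: {event['op']!r}")
    return table


# Example: two inserts, one update to an existing key, one delete.
changes = [
    {"op": "UPSERT", "row": {"id": 1, "qty": 5}},
    {"op": "UPSERT", "row": {"id": 2, "qty": 3}},
    {"op": "UPSERT", "row": {"id": 1, "qty": 4}},
    {"op": "DELETE", "row": {"id": 2}},
]
replica = apply_change_events({}, changes)
```

After the four events, replica holds a single row for key 1 with the updated quantity, which is the behavior staging tables and merge logic would otherwise have to reproduce by hand.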

New volume-based tiered pricing

We are also excited to announce the launch of volume-based tiered pricing that makes it more affordable for customers moving larger volumes of data. Tiered pricing is applied automatically based on actual usage.
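As a rough illustration of how volume-based tiers can lower the effective rate as usage grows, the sketch below computes a bill under a graduated scheme, where each tier's rate applies only to the volume that falls inside that tier. This is an assumption about the general shape of tiered pricing, not a description of Datastream's actual billing: the tier boundaries and per-GiB rates here are invented for the example, and the function name tiered_cost is hypothetical.

```python
from typing import List, Optional, Tuple

# Each tier is (upper bound in GiB, or None for "no limit", price per GiB).
Tier = Tuple[Optional[float], float]

# Made-up tiers for illustration only; these are NOT Datastream's prices.
EXAMPLE_TIERS: List[Tier] = [
    (1000, 2.0),  # first 1,000 GiB
    (5000, 1.5),  # next 4,000 GiB
    (None, 1.0),  # everything above 5,000 GiB
]


def tiered_cost(gib: float, tiers: List[Tier]) -> float:
    """Return the total cost of `gib` of volume under graduated tiers.

    Tiers must be sorted by ascending upper bound, with the last bound None.
    """
    cost = 0.0
    lower = 0.0
    for upper, price in tiers:
        if upper is None or gib < upper:
            # The remaining volume ends inside this tier.
            cost += (gib - lower) * price
            break
        # The whole tier is consumed; move on to the next one.
        cost += (upper - lower) * price
        lower = upper
    return cost


small = tiered_cost(500, EXAMPLE_TIERS)
large = tiered_cost(6000, EXAMPLE_TIERS)
```

With these invented numbers, the larger volume pays a lower average price per GiB than the smaller one, which is the effect a volume-based scheme is meant to produce.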

Klook, a leading travel and leisure e-commerce platform for experiences and services, processes vast amounts of data across a range of applications and databases. Using BigQuery, Klook’s data team produces daily reports and analysis for their management team to help drive better business decisions. “Dealing with complex data environments and ingesting data from different sources into our data warehouse is very challenging,” says Stacy Zhu, Senior Manager for Data at Klook. “Prior to adopting Datastream, we had a team of data engineers dedicated to the task of ingesting data into BigQuery, and we spent a lot of time and effort making sure that the data was accurate. With Datastream, our data analysts can have accurate data readily available to them in BigQuery with a simple click. We enjoy Datastream’s ease of use, and its performance helps us achieve large-scale ELT data processing.”

https://storage.googleapis.com/gweb-cloudblog-publish/images/Datastream.max-1400x1400.jpg

Achievers, an award-winning employee engagement software and platform, is another customer that recently adopted Datastream. “Achievers had been heavily using Google Cloud VMs (GCE) and Google Kubernetes Engine (GKE),” says Daljeet Saini, Lead Data Architect at Achievers. “With the help of Datastream, Achievers will be streaming our data into BigQuery and enabling our analysts and data scientists to start using BigQuery for smart analytics, helping us take the data warehouse to the next level.”

Start using Datastream today

You can get started today with Datastream, available to all customers in all Google Cloud regions. For more information on Datastream for BigQuery, please check out the product page.