
Datastream’s PostgreSQL source and BigQuery destination now generally available

April 3, 2023
Etai Margolin

Product Manager

Last year, we announced the preview of Datastream for BigQuery, which provides seamless replication of data from operational databases directly into BigQuery, Google Cloud’s serverless data warehouse, so organizations can quickly and easily make decisions based on real-time data. We’re happy to announce that Datastream for BigQuery is now generally available.

Overview

Datastream for BigQuery delivers a seamless, easy-to-use experience that enables real-time insights in BigQuery in just a few steps. Using BigQuery’s newly developed change data capture (CDC) functionality and the Storage Write API’s UPSERT functionality, Datastream efficiently replicates updates directly from source systems into BigQuery tables in real time. You no longer have to spend valuable resources building and managing complex data pipelines, self-managed staging tables, tricky DML merge logic, or manual conversion from database-specific data types into BigQuery data types. Just configure your source database, connection type, and destination in BigQuery, and you’re all set. Datastream for BigQuery backfills historical data and then continuously replicates new changes as they happen.
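To make the UPSERT idea concrete, here is a minimal, in-memory Python sketch of what a CDC-maintained table conceptually does: backfill a snapshot, then apply change records in order, where an UPSERT replaces the row for a primary key and a DELETE removes it. The `Change` record, its `seq` ordering field, and the helper names are hypothetical; `change_type` only mirrors the UPSERT/DELETE values of BigQuery's `_CHANGE_TYPE` pseudocolumn. This is an illustration of the merge logic you no longer have to write yourself, not Datastream's or BigQuery's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]
Table = Dict[Any, Row]  # primary-key value -> current row


@dataclass
class Change:
    # Hypothetical change record; change_type mirrors _CHANGE_TYPE values.
    change_type: str           # "UPSERT" or "DELETE"
    seq: int                   # hypothetical ordering key (e.g. a log position)
    key: Any                   # primary-key value of the affected row
    row: Optional[Row] = None  # full row image, required for UPSERT


def backfill(snapshot: Iterable[Row], pk: str) -> Table:
    """Load historical rows into a table keyed by primary key."""
    return {r[pk]: dict(r) for r in snapshot}


def apply_changes(table: Table, changes: Iterable[Change]) -> Table:
    """Apply changes in sequence order: UPSERT replaces a row, DELETE removes it."""
    for c in sorted(changes, key=lambda ch: ch.seq):
        if c.change_type == "UPSERT":
            if c.row is None:
                raise ValueError(f"UPSERT for key {c.key!r} has no row image")
            table[c.key] = dict(c.row)
        elif c.change_type == "DELETE":
            table.pop(c.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown change type: {c.change_type!r}")
    return table


if __name__ == "__main__":
    orders = backfill([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}], pk="id")
    events = [
        Change("UPSERT", 2, 1, {"id": 1, "status": "shipped"}),
        Change("DELETE", 3, 2),
        Change("UPSERT", 1, 3, {"id": 3, "status": "new"}),
    ]
    apply_changes(orders, events)
    print(orders)  # {1: {'id': 1, 'status': 'shipped'}, 3: {'id': 3, 'status': 'new'}}
```

Sorting each batch by `seq` is a simplification; a real pipeline also has to cope with late-arriving changes, for example by keeping the highest sequence value seen per key.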

How customers are using Datastream for BigQuery

Falabella, Latin America's largest retail platform, has a physical presence in 100 locations and an online store. To monitor and continuously improve their business, Falabella relies on day-to-day data analytics for a range of use cases, including:

  • Customer analytics: monitor customer behavior, preferences, and purchasing habits to optimize marketing efforts and improve customer experience.

  • Seller analytics: monitor seller performance, track sales and revenue data, and identify trends or issues that may impact the business.

  • Logistics analytics: monitor and improve shipping and delivery processes.

  • Sales and revenue management: monitor sales and revenue data, especially during sales events.

“Previously, data was replicated using full database snapshots which took hours to generate and load to BigQuery,” says René Delgado, Head of Data Solutions at Falabella. “This process was orchestrated using some custom tools that were created internally. When something failed we had to do manual checks in many places, and these custom tools were difficult to debug and repair. The first immediate benefit of Datastream is that we no longer have to maintain/monitor these custom data tools: ‘best code = no code!’”

In other cases, data scientists would spin up expensive database replicas to run their analytics queries. “With all the data now available in BigQuery, simply eliminating the need to create and manage these databases helped save Falabella ~$10,000 USD/month,” says Delgado.

“With Datastream, we have a single tool to perform seamless, near real-time replication of our operational data to BigQuery. Datastream helps us get much quicker insights on our operational data. This enables us to deliver more stable data products and to better address our business needs.”

New PostgreSQL source

We are also excited to announce the general availability of Datastream’s PostgreSQL source. With the PostgreSQL source, Datastream can now ingest changes from a range of PostgreSQL databases, including AlloyDB for PostgreSQL, Cloud SQL for PostgreSQL, Amazon RDS for PostgreSQL, and self-hosted PostgreSQL. Datastream’s PostgreSQL source reads from PostgreSQL’s write-ahead log (WAL) using logical decoding, which gives you more flexibility and has minimal impact on the database server’s load.
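Logical decoding on a self-managed PostgreSQL server typically requires `wal_level = logical`, a publication that lists the tables to replicate, and a logical replication slot using the built-in `pgoutput` plugin. The Python sketch below only generates those standard PostgreSQL statements; the function name, publication name, and slot name are hypothetical, and managed services such as Cloud SQL or Amazon RDS enable logical decoding through their own configuration flags, so check the Datastream source configuration guide for your platform before running anything.

```python
import re
from typing import List, Optional, Sequence

# PostgreSQL allows only lower-case letters, digits, and underscores in
# replication slot names (max 63 bytes with the default NAMEDATALEN).
_SLOT_NAME = re.compile(r"^[a-z0-9_]{1,63}$")


def quote_ident(name: str) -> str:
    """Quote one PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_qualified(name: str) -> str:
    """Quote a possibly schema-qualified name such as 'public.orders'.

    Assumes no identifier part itself contains a dot.
    """
    return ".".join(quote_ident(part) for part in name.split("."))


def logical_replication_setup(
    publication: str, slot: str, tables: Optional[Sequence[str]] = None
) -> List[str]:
    """Return hypothetical setup statements for a logical decoding source."""
    if not _SLOT_NAME.match(slot):
        raise ValueError(f"invalid replication slot name: {slot!r}")
    if tables:
        target = "FOR TABLE " + ", ".join(quote_qualified(t) for t in tables)
    else:
        target = "FOR ALL TABLES"
    return [
        "SHOW wal_level;",  # must report 'logical' (changing it needs a restart)
        f"CREATE PUBLICATION {quote_ident(publication)} {target};",
        f"SELECT pg_create_logical_replication_slot('{slot}', 'pgoutput');",
    ]


if __name__ == "__main__":
    for stmt in logical_replication_setup("datastream_pub", "datastream_slot", ["public.orders"]):
        print(stmt)
    # SHOW wal_level;
    # CREATE PUBLICATION "datastream_pub" FOR TABLE "public"."orders";
    # SELECT pg_create_logical_replication_slot('datastream_slot', 'pgoutput');
```

Validating the slot name up front matters because it is interpolated into a string literal; the publication and table names are instead quoted as identifiers.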

What we learned during the preview

Since our preview announcement, many customers have used Datastream to move data from PostgreSQL and other databases into BigQuery. They repeatedly praised Datastream's ease of use, describing how quickly they were able to replicate data and contrasting their experience with other solutions that took weeks or even months to achieve the same task. For example, one customer stated: “It’s brilliant. I can do an entire proof of concept in one week and be ready for production the next week.” Customers also highlighted Datastream’s robustness, noting how easily and transparently it handles common scenarios such as source database upgrades, database restarts, and failovers.

Getting started

Check out our quickstart for a detailed guide to creating a new Datastream stream. You can also try the Skills Boost lab for a step-by-step walkthrough of replicating data from PostgreSQL to BigQuery.
