Datastream for BigQuery (Preview)

Seamless replication from relational databases directly to BigQuery, enabling near real-time insights on operational data.

  • Low-latency replication to enable near real-time insights in BigQuery

  • Access to streaming data from MySQL, PostgreSQL, AlloyDB, and Oracle databases

  • Serverless platform that scales automatically, with no resources to provision or manage

  • Easy setup of ELT (extract, load, transform) pipelines with built-in secure connectivity

Benefits

Replicate operational data with minimal latency

Seamlessly replicate data from MySQL, PostgreSQL, AlloyDB, and Oracle databases directly into BigQuery, with low latency and without impacting source performance.

Scale up and down with a serverless architecture

Eliminate operational overhead with a serverless approach that scales automatically with no infrastructure for you to manage.

Get up and running in minutes

A simplified setup experience allows you to start replicating data from your operational databases to BigQuery in just a few steps.

Key features

Replication of operational data into BigQuery

Datastream uses BigQuery’s Change Data Capture (CDC) functionality and Storage Write API to efficiently replicate updates directly from source systems in near real time. You no longer need replication solutions that waste valuable resources on complex data pipelines, self-managed staging tables, tricky merge logic, or manual data type conversion.

Simplified setup

Datastream allows you to start replicating data into BigQuery in a few steps. Just configure your source database, connection type, and destination in BigQuery, and you’re all set. Datastream for BigQuery will backfill historical data and continuously replicate new changes as they happen.

Streaming data from relational databases

Datastream reads and delivers every change (insert, update, and delete) from your MySQL, PostgreSQL, AlloyDB, and Oracle databases into BigQuery with minimal latency. The source database can be hosted on-premises, on Google Cloud services such as Cloud SQL or Bare Metal Solution for Oracle, or on any other cloud. Datastream is an agentless, Google-native service built specifically for BigQuery, and it reliably streams every event as it happens.
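
To make the idea of applying insert, update, and delete events concrete, here is a minimal Python sketch of the kind of merge logic that a CDC pipeline has to perform somewhere. It is an illustration only and not how Datastream or BigQuery implement it: the event shape (`op`, `row`), the `id` key, and the function name `apply_change_events` are all hypothetical.

```python
from typing import Any


def apply_change_events(table: dict[Any, dict], events: list[dict], key: str = "id") -> dict[Any, dict]:
    """Apply change events, in order, to an in-memory table keyed by primary key.

    The event format here is hypothetical: {"op": ..., "row": {...}}.
    """
    for event in events:
        op = event["op"]
        row = event["row"]
        pk = row[key]
        if op in ("insert", "update"):
            # Upsert: new column values overwrite the existing row, if any.
            table[pk] = {**table.get(pk, {}), **row}
        elif op == "delete":
            # Deleting a row that is already absent is treated as a no-op.
            table.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


events = [
    {"op": "insert", "row": {"id": 1, "status": "new"}},
    {"op": "insert", "row": {"id": 2, "status": "new"}},
    {"op": "update", "row": {"id": 1, "status": "paid"}},
    {"op": "delete", "row": {"id": 2}},
]
replica = apply_change_events({}, events)
print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

The order of events matters: replaying the same list in a different order could leave the replica in a different state, which is one reason hand-written merge logic tends to be error-prone.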

Schema drift resolution

As source schemas change, Datastream seamlessly handles schema drift and automatically replicates new columns and tables added in the source to BigQuery.
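
As a rough illustration of what handling new columns involves, the following Python sketch compares a source schema with a destination schema and adds whatever columns are missing. It is a simplified model under stated assumptions, not Datastream's actual mechanism: schemas are represented as plain dictionaries, and `reconcile_schema` and the column names are hypothetical.

```python
def reconcile_schema(destination: dict[str, str], source: dict[str, str]) -> list[str]:
    """Add columns that exist in the source but not in the destination.

    Both schemas are hypothetical {column_name: type_name} mappings.
    Returns the names of the columns that were added, in source order.
    """
    added = []
    for column, column_type in source.items():
        if column not in destination:
            destination[column] = column_type
            added.append(column)
    return added


destination_schema = {"id": "INT64", "status": "STRING"}
source_schema = {"id": "INT64", "status": "STRING", "coupon_code": "STRING"}

new_columns = reconcile_schema(destination_schema, source_schema)
print(new_columns)  # ['coupon_code']
```

This sketch only covers added columns, which is the case the text describes. Dropped or retyped columns would need separate handling and are not modeled here.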

Security by design

Datastream supports multiple secure, private connectivity methods to protect data in transit. Data is also encrypted at rest.

Customers

Customers use Datastream and BigQuery to enable real-time insights

Use cases

Serverless replication to BigQuery

Datastream reads change events (inserts, updates, and deletes) from source databases and writes them to BigQuery tables in near real time. This lets you enrich existing BigQuery data warehouses and ML models with transactional data, such as retail purchases, to build a more complete end-to-end picture of your data. Datastream backfills historical data, continuously replicates new changes as they happen, and seamlessly handles schema changes.

Compare features

Compare options for streaming data from operational databases into BigQuery

Datastream for BigQuery

Fully managed solution for replicating data from transactional databases into BigQuery

Key benefits

  • Easiest option for replicating operational data to BigQuery

  • Serverless architecture that automatically scales up and down

  • Single interface for end-to-end visibility and monitoring of replication pipelines

Datastream and Dataflow

Customizable solution for replicating changes in data sources

Key benefits

  • Customizable solution with additional flexibility

  • Pre-built templates supported by Google for a range of destinations

  • Integration of additional features such as data quality and data masking

Datastream and Data Fusion

Code-free wizard that is part of a fully managed ETL service

Key benefits

  • Simple interface for ETL developers and data analysts

  • Identification of potential issues and gaps in replication in advance

  • Near real-time insights into replication performance

You can also stream data from operational databases into BigQuery with partner ETL/ELT solutions, Kafka, or batch jobs. Compared to these options, Datastream typically has the advantages of serverless architecture, ease of integration, and low latency.

Pricing

Datastream pricing

Datastream pricing is based on actual data processed. Volume-based tiered pricing is available, which makes it more affordable if you're moving larger volumes of data. Additional pricing details are available on the Datastream pricing page.
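
To show why tiered pricing lowers the average cost at higher volumes, here is a small Python sketch of the arithmetic. The tier boundaries and per-GB rates are placeholder numbers invented for illustration and are not Datastream's actual prices. The sketch also assumes that each rate applies only to the volume that falls inside its tier, which this page does not confirm; the pricing page is the authority on both points.

```python
from typing import Optional

# Hypothetical tiers: (upper bound in GB, or None for unbounded; price per GB).
# These numbers are placeholders, not real Datastream prices.
HYPOTHETICAL_TIERS = [(100, 2.0), (500, 1.5), (None, 1.0)]


def tiered_cost(volume_gb: float, tiers: list[tuple[Optional[float], float]]) -> float:
    """Compute the cost when each rate applies only to the volume inside its tier."""
    cost = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if volume_gb <= lower:
            break
        # Portion of the volume that falls inside this tier.
        top = volume_gb if upper is None else min(volume_gb, upper)
        cost += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return cost


print(tiered_cost(50, HYPOTHETICAL_TIERS))   # 100.0
print(tiered_cost(600, HYPOTHETICAL_TIERS))  # 900.0
```

With these placeholder rates, 600 GB costs 900.0 (100 × 2.0 + 400 × 1.5 + 100 × 1.0), compared with 1200.0 if the first-tier rate applied to the whole volume.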

Additional resources such as BigQuery, Cloud Storage, and Dataflow are billed per those services' pricing.