Datastream is a serverless and easy-to-use change data capture (CDC) and replication service. It allows you to synchronize data across heterogeneous databases and applications reliably, and with minimal latency and downtime.
Datastream supports streaming data from Oracle, MySQL, and PostgreSQL databases into BigQuery and Cloud Storage. In addition to these destinations, the service offers streamlined integration by using Dataflow templates to build custom workflows for loading data into BigQuery for analytics. You can also use Datastream to replicate your databases into Cloud SQL or Cloud Spanner for database synchronization, or use the event stream directly from Cloud Storage to build event-driven architectures.
Benefits of Datastream include:
- Seamless setup of ELT (Extract, Load, Transform) pipelines for low-latency data replication to enable near real-time insights in BigQuery.
- Being serverless, so there are no resources to provision or manage; the service scales up and down automatically as needed, with minimal downtime.
- Easy-to-use setup and monitoring experiences that shorten time-to-value.
- Integration with the rest of Google Cloud's data services portfolio, including Dataflow, Cloud Data Fusion, Pub/Sub, BigQuery, and more.
- Synchronizing and unifying data streams across heterogeneous databases and applications.
- Being secure, with private connectivity options and the protections you expect from Google Cloud.
- Being accurate and reliable, with transparent status reporting and robust processing flexibility in the face of data and schema changes.
- Supporting multiple use cases, including analytics, database replication, and synchronization for migrations and hybrid-cloud configurations, and for building event-driven architectures.
The streaming capabilities of Datastream enable a variety of use cases:
Replicating and synchronizing data across your organization with minimal latency
You can synchronize data across heterogeneous databases and applications reliably, with low latency, and with minimal impact on the performance of your source. Unlock the power of data streams for analytics, database replication, cloud migration, and event-driven architectures across hybrid environments.
Scale up or down seamlessly with a serverless architecture
Get up and running fast with a serverless and easy-to-use service that scales seamlessly as your data volumes shift. Focus on deriving up-to-date insights from your data and responding to high-priority issues, instead of managing infrastructure, performance tuning, or resource provisioning.
Integrate with Google Cloud's data integration suite
Connect data across your organization with Google Cloud's data integration suite of products. Datastream uses Dataflow templates to load data into BigQuery, Cloud Spanner, and Cloud SQL, and powers Cloud Data Fusion's CDC Replicator connectors to simplify building data pipelines.
Datastream comprises three main elements:
- Private connectivity configurations enable Datastream to communicate with a data source over a private network (internally within Google Cloud, or with external sources connected over VPN or Interconnect). This communication happens through a Virtual Private Cloud (VPC) peering connection.
- Connection profiles represent the connectivity information for a source or a destination. A stream uses this information to connect to each end.
- Streams use the information in the connection profiles to transfer CDC and backfill data from the source to the destination.
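
To make the relationship between these three elements concrete, here is a minimal sketch in plain Python. It is a conceptual model only, not the Datastream API or client library: the class names, field names, the `describe` method, and the `SOURCES` and `DESTINATIONS` sets are all hypothetical and chosen for illustration. The sets list only the sources and direct destinations named in this overview.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only the systems named in this overview are modeled.
SOURCES = {"oracle", "mysql", "postgresql"}
DESTINATIONS = {"bigquery", "cloud_storage"}


@dataclass(frozen=True)
class PrivateConnectivityConfig:
    """Hypothetical model of a private connectivity configuration."""
    name: str
    vpc_network: str  # VPC that the peering connection is made with


@dataclass(frozen=True)
class ConnectionProfile:
    """Hypothetical model of connectivity information for one endpoint."""
    name: str
    system: str  # e.g. "mysql" for a source, "bigquery" for a destination
    private_connectivity: Optional[PrivateConnectivityConfig] = None


@dataclass(frozen=True)
class Stream:
    """Hypothetical model of a stream tying two connection profiles together."""
    name: str
    source: ConnectionProfile
    destination: ConnectionProfile
    backfill: bool = True

    def __post_init__(self) -> None:
        # Reject endpoints that this overview does not list.
        if self.source.system not in SOURCES:
            raise ValueError(f"unsupported source: {self.source.system}")
        if self.destination.system not in DESTINATIONS:
            raise ValueError(f"unsupported destination: {self.destination.system}")

    def describe(self) -> str:
        data = "CDC and backfill data" if self.backfill else "CDC data"
        route = (
            "private connectivity"
            if self.source.private_connectivity is not None
            else "no private connectivity"
        )
        return (
            f"{self.name}: {data} from {self.source.system} "
            f"to {self.destination.system} ({route})"
        )


if __name__ == "__main__":
    peering = PrivateConnectivityConfig(name="example-peering", vpc_network="example-vpc")
    source = ConnectionProfile("example-source", "mysql", private_connectivity=peering)
    destination = ConnectionProfile("example-destination", "bigquery")
    stream = Stream("example-stream", source, destination)
    print(stream.describe())
```

The sketch mirrors the dependency order described above: a private connectivity configuration is optional and attaches to a connection profile, and a stream references one source profile and one destination profile.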