Cloud Data Fusion

Fully managed, cloud-native data integration at any scale.

New customers get $300 in free credits to spend on Data Fusion. All customers get the first 120 hours of pipeline development free per month, per account, not charged against your credits.

Visual point-and-click interface enabling code-free deployment of ETL/ELT data pipelines
Broad library of 150+ preconfigured connectors and transformations, at no additional cost
Native integration with best-in-class Google Cloud services
End-to-end data lineage for root cause and impact analysis
Built with an open source core (CDAP) for pipeline portability

Video: Introduction to Cloud Data Fusion (1:54)

Benefits

Avoid technical bottlenecks and lift productivity

Data Fusion’s intuitive drag-and-drop interface, pre-built connectors, and self-service model of code-free data integration remove bottlenecks caused by scarce technical expertise and accelerate time to insight.

Lower total cost of pipeline ownership

Data Fusion takes a serverless approach and builds on the scalability and reliability of Google services like Managed Service for Apache Spark, so it offers full data integration capabilities at a lower total cost of ownership.

Build with a data governance foundation

With built-in features like end-to-end data lineage, integration metadata, and cloud-native security and data protection services, Data Fusion assists teams with root cause or impact analysis and compliance.

Key features

Open core, delivering hybrid and multi-cloud integration

Data Fusion is built on the open source project CDAP, and this open core ensures data pipeline portability for users. CDAP’s broad integration with on-premises and public cloud platforms gives Cloud Data Fusion users the ability to break down silos and deliver insights that were previously inaccessible.

Integrated with Google’s industry-leading big data tools

Data Fusion’s integration with Google Cloud simplifies data security and ensures data is immediately available for analysis. Whether you’re curating a data lake with Cloud Storage and Managed Service for Apache Spark, moving data into BigQuery for data warehousing, or transforming data to land it in a relational store like Spanner, Cloud Data Fusion’s integration makes development and iteration fast and easy.

Data integration through collaboration and standardization

Cloud Data Fusion offers pre-built transformations for both batch and real-time processing. It provides the ability to create an internal library of custom connections and transformations that can be validated, shared, and reused across teams. It lays the foundation of collaborative data engineering and improves productivity. That means less waiting for ETL developers and data engineers and, importantly, less sweating about code quality.

The Economic Benefits of Data Fusion and its Data Integration Alternatives

Download the report

Customers

Learn from customers using Cloud Data Fusion

Blog post

LiveRamp scales identity data management with Cloud Data Fusion

5-min read

Case study

Star Media Group transforms into an engagement business with Cloud Data Fusion.

5-min read

What's new

Explore the latest updates

Video

How to bring data from SAP to Google Cloud

Video

Embedded data wrangling with Data Fusion

Blog post

Lower TCO for managing data pipelines by 80% with Cloud Data Fusion

Blog post

Bridge Data Silos with Data Fusion

Blog post

Real-time Change Data Capture for data replication into BigQuery

Blog post

Better together: orchestrating your Data Fusion pipelines with Managed Service for Apache Airflow

Documentation

Tutorial

Enabling Cloud Data Fusion

Learn how to enable the Cloud Data Fusion API for your Google Cloud project.

Tutorial

Cloud Data Fusion concepts overview

Learn about Cloud Data Fusion concepts and features.

Tutorial

Exploring data lineage

This tutorial shows how to use Cloud Data Fusion to explore data lineage: the data's origins and its movement over time.

Tutorial

Using JDBC drivers with Cloud Data Fusion

Discover how to use Java Database Connectivity (JDBC) drivers with Cloud Data Fusion pipelines.

Tutorial

Data engineering on Google Cloud

Learn firsthand how to design and build data processing systems on Google Cloud with this four-day instructor-led class.

Release notes

Read about the latest releases for Cloud Data Fusion

Use cases

Use case

Modern, more secure data lakes on Google Cloud

Cloud Data Fusion helps users build scalable, distributed data lakes on Google Cloud by integrating data from siloed on-premises platforms. Customers can leverage the scale of the cloud to centralize data and drive more value out of their data as a result. The self-service capabilities of Cloud Data Fusion increase process visibility and lower the overall cost of operational support.

Use case

Agile data warehouses with BigQuery

Cloud Data Fusion can help organizations better understand their customers by breaking down data silos and enabling development of agile, cloud-based data warehouse solutions in BigQuery. A trusted, unified view of customer engagement and behavior unlocks the ability to drive a better customer experience, which leads to higher retention and higher revenue per customer.

Use case

Unified analytics environment

Many users today want to establish a unified analytics environment across a myriad of expensive, on-premises data marts. Employing a wide range of disconnected tools and stop-gap measures creates data quality and security challenges. Cloud Data Fusion’s wide variety of connectors, visual interfaces, and abstractions centered around business logic help lower total cost of ownership (TCO), promote self-service and standardization, and reduce repetitive work.

All features

Code-free self-service	Remove bottlenecks by enabling nontechnical users through a code-free graphical interface that delivers point-and-click data integration.
Collaborative data engineering	Cloud Data Fusion offers the ability to create an internal library of custom connections and transformations that can be validated, shared, and reused across an organization.
Google Cloud-native	Fully managed Google Cloud-native architecture unlocks the scalability, reliability, security, and privacy features of Google Cloud.
Real-time data integration	Replicate transactional and operational databases such as SQL Server, Oracle and MySQL directly into BigQuery with just a few clicks using Data Fusion’s replication feature. Integration with Datastream allows you to deliver change streams into BigQuery for continuous analytics. Use feasibility assessment for faster development iterations and performance/health monitoring for observability.
Batch integration	Design, run, and operate high volumes of data pipelines on a recurring schedule, with support for popular data sources including file systems and object stores, relational and NoSQL databases, SaaS systems, and mainframes.
Enterprise-grade security	Integration with Cloud Identity and Access Management (IAM), Private IP, VPC Service Controls (VPC-SC), and customer-managed encryption keys (CMEK) provides enterprise security and reduces risk by supporting compliance and data protection.
Integration metadata and lineage	Search integrated datasets by technical and business metadata. Track lineage for all integrated datasets at the dataset and field level.
Seamless operations	REST APIs, time-based schedules, pipeline state-based triggers, logs, metrics, and monitoring dashboards make it easy to operate in mission-critical environments.
Comprehensive integration toolkit	Built-in connectors to a variety of modern and legacy systems, code-free transformations, conditionals and pre/post processing, alerting and notifications, and error processing provide a comprehensive data integration experience.
Hybrid enablement	Open source provides the flexibility and portability required to build standardized data integration solutions across hybrid and multi-cloud environments.

Pricing

Cloud Data Fusion pricing is broken down by:

1. Design cost: Based on the number of hours an instance is running, not on the number of pipelines being developed and run. The Basic edition offers the first 120 hours per month per account at no cost.

2. Processing cost: The cost of Managed Service for Apache Spark clusters used to run the pipelines.

Edition	Price per Cloud Data Fusion instance hour	Number of simultaneous pipelines supported	Number of users supported
Developer	US$0.35	2 (Recommended)	2 (Recommended)
Basic	US$1.80	Unlimited	Unlimited
Enterprise	US$4.20	Unlimited	Unlimited
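
As a rough illustration of the design-cost component only, the sketch below multiplies billable instance hours by the hourly rate from the table above. It assumes that the 120 free hours per month apply to the Basic edition alone, as item 1 states, and it leaves out processing cost entirely. The function name `design_cost` and the 730-hour month used in the example are illustrative choices, not part of the product.

```python
from decimal import Decimal

# Hourly rates in USD, copied from the pricing table above.
RATE_PER_HOUR = {
    "Developer": Decimal("0.35"),
    "Basic": Decimal("1.80"),
    "Enterprise": Decimal("4.20"),
}

# Assumption: only the Basic edition gets 120 free hours per month per account.
FREE_HOURS_PER_MONTH = {"Basic": Decimal("120")}


def design_cost(edition: str, instance_hours: float) -> Decimal:
    """Estimate the monthly design cost in USD for one account.

    Processing cost (the clusters that run the pipelines) is not included.
    """
    if edition not in RATE_PER_HOUR:
        raise ValueError(f"unknown edition: {edition}")
    hours = Decimal(str(instance_hours))
    if hours < 0:
        raise ValueError("instance_hours must be non-negative")
    free = FREE_HOURS_PER_MONTH.get(edition, Decimal("0"))
    billable = max(Decimal("0"), hours - free)
    return (billable * RATE_PER_HOUR[edition]).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Illustrative month: one instance running continuously for about 730 hours.
    for name in RATE_PER_HOUR:
        print(name, design_cost(name, 730))
```

Under these assumptions, a Basic instance running 730 hours in a month is billed for 610 hours, which comes to US$1,098.00 in design cost before any processing cost is added.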

Take the next step

Start building on Google Cloud with $300 in free credits and 20+ always free products.
