What is data integration?

Big data, the Internet of Things (IoT), software as a service (SaaS), cloud activity, and more are causing an explosion in both the number of data sources and the sheer volume of data in the world. But most of this data has been collected and stored in stand-alone silos or separate data stores. Data integration is the process that brings these separate data collections together to extract more value and insight from them.

Data integration is especially important as your business pursues digital transformation strategies, since your ability to improve operations, boost customer satisfaction, and compete in an increasingly digital world requires insight into all your data.

Google Cloud's data integration solution is Cloud Data Fusion, a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines.

Data integration defined

Data integration is the process of bringing together data from different sources to gain a unified and more valuable view of it, so that your business can make faster and better decisions.  

Data integration can consolidate all kinds of data—structured, unstructured, batch, and streaming—to do everything from basic querying of inventory databases to complex predictive analytics.

What are the challenges of data integration?

Difficulty of using data integration platforms

Experienced data professionals are difficult to find and expensive to hire, yet most data integration platforms require them for deployment. Business analysts who need the data to make business decisions often depend on these experts, which lengthens the time to value of data analytics.

High capex and opex of data integration infrastructure

Both capital and operational expenses add up when procuring, deploying, maintaining, and managing the necessary infrastructure for an enterprise-class data integration initiative. Cloud-based data integration as a managed service addresses this cost issue directly.

Data that’s tightly coupled with applications

Previously, data was so tied to and dependent on specific applications that you couldn’t retrieve and use it elsewhere in your business. Today, we’re seeing application and data layers being decoupled so your data can be used more flexibly.

Data semantic issues

Data that means the same thing can be organized or formatted differently from one source to another. For example, one system may store a date numerically as dd/mm/yy while another stores it as month, day, year. The “transform” step of ETL and master data management tools both address this challenge.
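As a minimal sketch of how a transform step might reconcile the date formats described above, the following Python function tries a short list of formats and returns one canonical ISO 8601 string. The function name `normalize_date` and the `KNOWN_FORMATS` list are assumptions made for illustration, not part of any particular product.

```python
from datetime import datetime

# Hypothetical list of formats seen across source systems; adjust to your data.
KNOWN_FORMATS = ["%d/%m/%y", "%B %d, %Y", "%Y-%m-%d"]


def normalize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in order."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")


print(normalize_date("25/12/21"))
print(normalize_date("December 25, 2021"))
```

Both calls should print the same canonical value. Note that a string such as 03/04/21 is ambiguous between dd/mm/yy and mm/dd/yy, so a real pipeline needs to know which convention each source uses rather than guessing.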

What are data integration tools?

Data integration platforms generally include many of the following tools:

  • Data ingestion tools: These tools allow you to obtain and import data, to use immediately or to store for later use
  • ETL tools: ETL stands for extract, transform, and load—the most common data integration method 
  • Data catalogs: These help businesses find and inventory data assets scattered through multiple data silos
  • Data governance tools: Tools that ensure the availability, security, usability, and integrity of data
  • Data cleansing tools: Tools that clean up dirty data by replacing, modifying, or deleting it
  • Data migration tools: These tools move data between computers, storage systems, or application formats
  • Master data management tools: Tools that help businesses adhere to common data definitions and achieve a single source of truth  
  • Data connectors: These tools move data from one database to another and can also perform transformations
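To make the extract, transform, and load steps listed above concrete, here is a small sketch that uses only the Python standard library. The CSV content, the `inventory` table, and the function names are hypothetical; a managed service would replace most of this with configuration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API, or database.
SOURCE_CSV = """sku,name,quantity
A1, Widget ,10
B2,Gadget,
A1, Widget ,10
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, default missing quantities to 0, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        record = (row["sku"].strip(), row["name"].strip(), int(row["quantity"] or 0))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory (sku TEXT, name TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a real target
load(transform(extract(SOURCE_CSV)), conn)
rows = conn.execute("SELECT sku, name, quantity FROM inventory ORDER BY sku").fetchall()
print(rows)
```

The transform step here also does light data cleansing (trimming, defaulting, deduplicating), which is why ETL and cleansing tools often overlap in practice.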

What is data integration used for?

Data integration is commonly used to do the following:

Data lake development

Data integration moves data from siloed on-premises platforms into data lakes in order to increase data value.

Data warehousing

Data integration combines data from various sources into a data warehouse to analyze for business purposes. 
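The sketch below illustrates the idea on a tiny scale: records from two hypothetical source systems (a CRM and a billing system, with different field names for the customer key) are loaded into one store and then queried together. An in-memory SQLite database stands in for a real data warehouse, and all names and values are made up for the example.

```python
import sqlite3

# Hypothetical records from two separate source systems.
crm_customers = [
    {"customer_id": 1, "full_name": "Ada Lovelace"},
    {"customer_id": 2, "full_name": "Alan Turing"},
]
billing_orders = [
    {"cust": 1, "amount": 120.0},
    {"cust": 1, "amount": 80.0},
    {"cust": 2, "amount": 50.0},
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
warehouse.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, full_name TEXT)")
warehouse.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")

# Map each source's field names onto the shared warehouse schema while loading.
warehouse.executemany("INSERT INTO customers VALUES (:customer_id, :full_name)", crm_customers)
warehouse.executemany("INSERT INTO orders VALUES (:cust, :amount)", billing_orders)

# Once the data lives in one place, a single query can span both sources.
totals = warehouse.execute(
    """
    SELECT c.full_name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()
print(totals)
```

The query at the end is the payoff: neither source system alone could answer "how much has each named customer spent?"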

Marketing

Data integration moves all your marketing data—such as customer demographic, social networking, and web-analytics data—into one place for analysis and action.

IoT

Data integration helps collect data from multiple IoT sources into a single place so that you can get value from it.

Database replication

Data integration is a central part of replicating data from a source database like Oracle, MongoDB, or MySQL into a cloud data warehouse.
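One simple way to picture replication is an incremental copy driven by a watermark: each run copies only the rows that changed since the previous run. The sketch below is a simplified illustration under that assumption, using two in-memory SQLite databases in place of a real source and warehouse. The `items` table, its `version` column, and the `replicate` function are hypothetical, and deletes are not handled.

```python
import sqlite3


def replicate(source, target, last_version):
    """Copy rows changed since last_version from source to target; return the new watermark."""
    changed = source.execute(
        "SELECT id, name, version FROM items WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    target.executemany(
        "INSERT OR REPLACE INTO items (id, name, version) VALUES (?, ?, ?)", changed
    )
    target.commit()
    return max((row[2] for row in changed), default=last_version)


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")

source.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "widget", 1), (2, "gadget", 2)])
watermark = replicate(source, target, 0)  # initial run copies everything

source.execute("UPDATE items SET name = 'widget v2', version = 3 WHERE id = 1")
watermark = replicate(source, target, watermark)  # second run copies only the changed row

replicated = target.execute("SELECT id, name, version FROM items ORDER BY id").fetchall()
print(replicated, watermark)
```

Production replication tools typically read the source database's change log instead of polling a version column, but the goal is the same: keep the target in step with the source without recopying everything.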

Historically, data integration tools have required technical teams skilled in data mining, merging, cleansing, and analysis to produce valuable data products such as a data lake or data warehouse. Google has removed this barrier, one of the biggest in data integration.

Code-free development of ETL/ELT data pipelines is available with Cloud Data Fusion, a managed, cloud-native data ingestion and integration service that can bring the capabilities of a seasoned data engineer to any team—whether they know a little code or none at all.