Introducing SAP Integration with Cloud Data Fusion
Chaitanya (Chai) Pydimukkala
Head of Products
Businesses today have a growing demand for data analysis and insight-based action. More often than not, the valuable data driving these actions sits in mission-critical operational systems. SAP is the leading provider of ERP software, and Google Cloud is introducing an integration with SAP to help you unlock the value of SAP data quickly and easily.
Cloud Data Fusion, Google Cloud's native data integration platform, can now extract data from SAP Business Suite, SAP ERP and S/4HANA. Cloud Data Fusion is a fully managed, cloud-native data integration and ingestion service that helps ETL developers, data engineers and business analysts efficiently build and manage ETL/ELT pipelines. These pipelines accelerate the building of data warehouses, data marts, and data lakes on BigQuery, or of operational reporting systems on CloudSQL, Spanner or other systems.
To simplify the unlocking of SAP data, today we’re announcing the public launch of the SAP Table Batch Source. With this capability, you can use Cloud Data Fusion to integrate SAP application data and gain insights via Looker. You can also use the machine learning products on Google Cloud to gain insight into your business by combining SAP data with other datasets. Examples include predictive maintenance that runs machine learning on IoT data joined with ERP transactional data, application-to-application integration between SAP and CloudSQL-based applications, fraud detection, spend analytics, and demand forecasting.
Let’s take a closer look at the benefits of the SAP Table Batch Source in Cloud Data Fusion:
Because Cloud Data Fusion is a complete, visual environment, you can use the Pipeline Studio to quickly design pipelines that read from SAP ECC or S/4HANA. With Data Fusion’s prebuilt transformations, you can join data from SAP and non-SAP systems and perform complex transformations such as data cleansing, aggregations, data preparation, and lookups to rapidly get insights from the data.
Time to Value
In traditional approaches, users are forced to define models on data warehousing systems up front. In Cloud Data Fusion, this is done for you automatically when you use BigQuery. After you design and execute a data pipeline that writes to BigQuery, Data Fusion automatically generates the schema in BigQuery. Because you don’t need to prebuild models, you get insight into your data faster, which improves productivity for your organization.
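To make the idea of a generated schema concrete, here is a minimal sketch of deriving column names and types from one sample record. It is not how Data Fusion derives the BigQuery schema internally; the function name `infer_schema`, the type mapping, and the sample field names are assumptions made up for illustration.

```python
# Illustrative mapping from Python value types to BigQuery standard SQL type names.
# bool is listed before int because bool is a subclass of int in Python.
_TYPE_NAMES = [
    (bool, "BOOL"),
    (int, "INT64"),
    (float, "FLOAT64"),
    (str, "STRING"),
]


def infer_schema(record):
    """Return a list of (field_name, type_name) pairs for one sample record."""
    schema = []
    for name, value in record.items():
        for py_type, type_name in _TYPE_NAMES:
            if isinstance(value, py_type):
                schema.append((name, type_name))
                break
        else:
            # No mapping matched; fail loudly instead of guessing a type.
            raise TypeError(f"unsupported type for field {name!r}")
    return schema


if __name__ == "__main__":
    # Hypothetical record, not real SAP data.
    sample = {"id": 1, "name": "Acme", "amount": 2.5, "active": True}
    for field, type_name in infer_schema(sample):
        print(field, type_name)
```

The point of the sketch is only that the target table's shape follows from the data flowing through the pipeline, so no model has to be written by hand first.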
Performance and Scalability
Cloud Data Fusion scales horizontally to execute pipelines. You can run pipelines on ephemeral clusters or on dedicated clusters. When it extracts data from your SAP systems, the SAP Table Batch Source plugin automatically tunes the data pipelines for optimal performance, based on both SAP application server resources and Cloud Data Fusion runtime resources. If parallelism is misconfigured, a failsafe mechanism in the plugin prevents any issues in your source system.
How does SAP Table Batch Source work?
Transfer full table data from SAP to BigQuery or other systems
In the Pipeline Studio, you can add multiple SAP source tables to a data pipeline and then join them with joiner transformations. Because the joiner is executed in the Cloud Data Fusion processing layer, the join places no additional load on the SAP system. For example, to create a Customer Master data mart, you can read all relevant tables from SAP using the plugin, join them, and then build complex pipelines for that data in Cloud Data Fusion's Pipeline Studio.
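The following sketch illustrates why joining after extraction costs the source system nothing extra: each table is read once, and the join itself runs on rows already held by the processing layer. It is a plain in-memory hash join, not the joiner transformation's actual implementation. The sample rows are made up and only loosely modeled on the SAP customer master tables KNA1 (general data) and KNB1 (company code data).

```python
def inner_join(left_rows, right_rows, key):
    """Inner hash join of two lists of dicts on a shared key column."""
    # Index the right side by key so each left row is matched in constant time.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            # On a column name collision, the left row's value wins.
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    # Made-up rows standing in for two already-extracted tables.
    general = [
        {"KUNNR": "100", "NAME1": "Acme"},
        {"KUNNR": "200", "NAME1": "Globex"},
    ]
    company = [
        {"KUNNR": "100", "BUKRS": "1000"},
        {"KUNNR": "100", "BUKRS": "2000"},
    ]
    for row in inner_join(general, company, "KUNNR"):
        print(row)
```

Customer 200 has no company code row in the sample, so an inner join drops it; a data mart that needs every customer would use an outer join instead.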
Extract table records in parallel
To extract records in parallel, you can configure the SAP Table Batch Source plugin using the Number of Splits to Generate property. If this property is left blank, the system determines the appropriate value for optimal performance.
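As a rough mental model of what a split count means, the sketch below divides a table of known size into near-equal row ranges that independent workers could fetch. This is an assumption for illustration only: the article does not describe how the plugin partitions a table, and `split_ranges` is a hypothetical helper, not part of the plugin.

```python
def split_ranges(total_rows, num_splits):
    """Divide total_rows into at most num_splits contiguous (offset, count) ranges."""
    if total_rows < 0 or num_splits < 1:
        raise ValueError("total_rows must be >= 0 and num_splits must be >= 1")

    # Never create more splits than there are rows; keep one split for an empty table.
    num_splits = min(num_splits, total_rows) or 1
    base, extra = divmod(total_rows, num_splits)

    ranges = []
    offset = 0
    for i in range(num_splits):
        # The first `extra` splits take one additional row each.
        count = base + (1 if i < extra else 0)
        ranges.append((offset, count))
        offset += count
    return ranges


if __name__ == "__main__":
    for offset, count in split_ranges(10, 3):
        print(f"rows {offset}..{offset + count - 1} ({count} rows)")
```

More splits mean more concurrent reads against the SAP application server, which is why leaving the property blank and letting the system choose is the safer default.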
Extract records based on conditions
The SAP Table Batch Source plugin lets you specify filter conditions in the Filter Options property. You write the conditions in OpenSQL syntax, and the plugin applies them as a SQL WHERE clause to filter the table. Records can be extracted based on conditions such as certain columns having a defined set of values or a range of values. You can also combine multiple conditions with AND or OR (e.g. TIMESTAMP >= '20210130100000' AND TIMESTAMP <= '20210226000000').
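If filter strings are generated by a script rather than typed by hand, a small helper keeps the timestamp format consistent. The sketch below builds a range condition and a set-of-values condition as plain strings to paste into Filter Options. The helper names are hypothetical, the column name STATUS is made up, and rejecting values that contain a quote is a cautious assumption rather than a documented rule.

```python
from datetime import datetime

# 14-digit YYYYMMDDHHMMSS form, matching the example in the text.
_TS_FORMAT = "%Y%m%d%H%M%S"


def timestamp_range_filter(column, start, end):
    """Condition selecting rows whose timestamp lies in [start, end]."""
    return (
        f"{column} >= '{start.strftime(_TS_FORMAT)}' "
        f"AND {column} <= '{end.strftime(_TS_FORMAT)}'"
    )


def any_of_filter(column, values):
    """Condition selecting rows where column equals one of the given values."""
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        # Refuse quotes instead of guessing at escaping rules.
        if "'" in value:
            raise ValueError(f"value must not contain a quote: {value!r}")
    return "(" + " OR ".join(f"{column} = '{v}'" for v in values) + ")"


if __name__ == "__main__":
    window = timestamp_range_filter(
        "TIMESTAMP", datetime(2021, 1, 30, 10, 0, 0), datetime(2021, 2, 26)
    )
    # STATUS is a made-up column name used only to show AND composition.
    print(window + " AND " + any_of_filter("STATUS", ["A", "B"]))
```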
Limit the number of records to be extracted
You can also limit the number of records extracted from the specified table by using the Number of Rows to Fetch property. This is particularly useful in development and testing scenarios.
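One way to apply this is to keep a single set of source properties and cap the row count only outside production. The sketch below uses the property display names from this article as dictionary keys. The keys in an exported pipeline may differ, and the `table` key, the environment names, and the default limit of 1000 are all assumptions.

```python
def source_properties(table, environment, filter_options="", dev_row_limit=1000):
    """Build a property set for the source, capping rows in dev and test."""
    props = {
        "table": table,  # hypothetical key for the SAP table name
        "Filter Options": filter_options,
        # Left blank so the system determines the value, as described above.
        "Number of Splits to Generate": "",
    }
    if environment in ("dev", "test"):
        # Small extracts keep development and test runs fast.
        props["Number of Rows to Fetch"] = str(dev_row_limit)
    return props


if __name__ == "__main__":
    print(source_properties("KNA1", "dev"))
    print(source_properties("KNA1", "prod"))
```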
Maximizing the returns on data
With Google Cloud Platform, you can already scale and process huge amounts of social, operational, transactional and IoT data to extract value and gain rapid insights. Cloud Data Fusion provides many connectors to existing enterprise applications and data warehouses. With the native capability to bring SAP data into BigQuery through Cloud Data Fusion, you can now go a step further and drive rapid, intelligent decision making.
Ready to try out the SAP Table Batch Source? Create a new instance of Data Fusion and deploy the SAP plugin from the Hub. Please refer to the SAP Table Batch Source user guide for additional details. To learn more about how leading companies are powering innovation with our data solutions, including data integration, check out Google Cloud’s Data Cloud Summit on May 26th.