With the Replication Flow feature of SAP Datasphere, you can replicate data from SAP S/4HANA to BigQuery.
This guide explains how to replicate data from SAP S/4HANA to BigQuery through SAP Datasphere when you're using Core Data Services (CDS)-based replication for SAP S/4HANA.
The high-level steps are as follows:
- Connect SAP Datasphere to the SAP S/4HANA source system.
- Connect SAP Datasphere to the Google Cloud project that contains the target BigQuery dataset.
- Create a replication flow.
- Run the replication flow.
- Validate the replicated data in BigQuery.
For information about setting up SLT-based replication, see Set up SLT-based replication: SAP S/4HANA to BigQuery through SAP Datasphere.
Before you begin
Before you begin, make sure that you or your administrators have completed the following prerequisites:
In the Tenant Configuration page of your SAP Datasphere tenant, enable the Premium Outbound Integration blocks. For information about how to do this, see the SAP documentation Configure the Size of Your SAP Datasphere Tenant.
Validate the latest considerations and limitations of SAP Datasphere replication flows provided in the SAP Note 3297105 - Important considerations for SAP Datasphere Replication Flows.
Review the information about the required SAP software versions, recommended system landscape, considerations for the supported source objects, and more, provided in the SAP Note 2890171 - SAP Data Intelligence / SAP Datasphere - ABAP Integration.
Make sure that the CDS views that you're planning to use are enabled for extraction.
You have a Google Cloud account and project.
Billing is enabled for your project. For more information, see how to confirm that billing is enabled for your project.
Make sure that the BigQuery API is enabled in your Google Cloud project.
Connect SAP Datasphere to the SAP S/4HANA source system
This section provides instructions to establish a connection between SAP Datasphere and the SAP S/4HANA source system.
Install SAP Cloud Connector
To securely connect your SAP Datasphere tenant to the SAP S/4HANA source system, SAP Cloud Connector is required when your SAP S/4HANA source system is running on-premises, hosted on any cloud environment, or if you're using the SAP S/4HANA Cloud Private Edition. However, if you're using the SAP S/4HANA Cloud Public Edition, then the SAP Cloud Connector is not needed. In that case, skip the SAP Cloud Connector installation and configuration, and move to Create a connection to the SAP S/4HANA source system.
If your SAP S/4HANA source system is running on-premises or hosted on any cloud environment, then you need to install and configure the SAP Cloud Connector on your operating system (OS). For information about OS-specific requirements and instructions to install SAP Cloud Connector, see the SAP documentation Preparing Cloud Connector Connectivity.
If you're using the SAP S/4HANA Cloud Private Edition, then the SAP Cloud Connector is pre-installed as part of the SAP S/4HANA setup. In that case, skip the SAP Cloud Connector installation, and move to Configure SAP Cloud Connector.
Configure SAP Cloud Connector
You configure SAP Cloud Connector to specify the SAP Datasphere subaccount, the mapping to the SAP S/4HANA source system in your network, and the accessible resources.
This section highlights the most important steps involved in configuring SAP Cloud Connector. For detailed information about configuring SAP Cloud Connector, see the SAP documentation Configure Cloud Connector.
The most important steps are as follows:
In your web browser, access the SAP Cloud Connector administration UI by using the host where SAP Cloud Connector is installed and its administration port. For example: https://localhost:8443.
Log in to SAP Cloud Connector. If you're logging in for the first time after installing SAP Cloud Connector, then use the following default credentials:
- Username: Administrator
- Password: manage
Before proceeding, change the default password. For more information, see the SAP documentation Initial Configuration.
Specify the following details to connect your SAP Cloud Connector to your SAP BTP subaccount:
- Details about your SAP Datasphere subaccount, including the subaccount name, region, and subaccount user. For more information about these fields, see the SAP documentation Configure Cloud Connector.
- For the specified subaccount, a location ID that identifies the location of your SAP Cloud Connector.
To provide access to the SAP S/4HANA source system, add the system mapping information, including information about the internal host and the virtual host system.
For accessing data using CDS view extraction, you must specify the following resources:
- DHAMB_ (prefix)
- DHAPE_ (prefix)
- RFC_FUNCTION_SEARCH
Save your configuration.
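The difference between the prefix resources and the exact-name resource listed above can be illustrated with a short Python sketch. It isn't part of SAP Cloud Connector. The `is_exposed` helper and the sample function names are hypothetical, and the sketch assumes that a prefix resource covers any function name that starts with the prefix, while an exact resource covers only the full name.

```python
# Illustrative only: mimics how prefix and exact-name resources are matched.
# The resource names come from the list above; everything else is hypothetical.
PREFIX_RESOURCES = ("DHAMB_", "DHAPE_")
EXACT_RESOURCES = ("RFC_FUNCTION_SEARCH",)


def is_exposed(function_name: str) -> bool:
    """Return True if the function name is covered by the configured resources."""
    if function_name in EXACT_RESOURCES:
        return True
    # str.startswith accepts a tuple and checks each prefix in turn.
    return function_name.startswith(PREFIX_RESOURCES)


if __name__ == "__main__":
    # Hypothetical function names, used only to show the matching behavior.
    for name in ("DHAMB_EXAMPLE", "RFC_FUNCTION_SEARCH", "RFC_READ_TABLE"):
        print(name, is_exposed(name))
```

Under these assumptions, a name such as `DHAMB_EXAMPLE` is covered by the prefix entry, while a name that isn't listed and matches no prefix is not.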
Create a connection to the SAP S/4HANA source system
In SAP Datasphere, create a local connection to use the SAP S/4HANA source system for data access. You use this connection to create replication flows.
To create a connection to the SAP S/4HANA source system, perform the following steps:
In SAP Datasphere, go to Connections, and select your space.
Create a local connection to the ABAP system of the connection type SAP S/4HANA On-Premise:
- Configure the connection properties according to your SAP Cloud Connector configuration.
- Enter the virtual host details that you defined during the SAP Cloud Connector configuration.
For information about the SAP S/4HANA On-Premise connection type, see the SAP documentation SAP S/4HANA On-Premise Connections.
To validate the connection between SAP Datasphere and SAP S/4HANA, select your connection, and click the Validate Connection icon.
For more information about how to create a connection between SAP Datasphere and SAP S/4HANA, see the SAP documentation Create a Connection.
Before you can use the connection for replication flows, check the SAP Notes relevant to replication flows and implement any required notes on your SAP S/4HANA system. For more information about the required SAP Notes, see:
- SAP Notes listed under the section Replication Flows.
- SAP Notes listed under the section Source Systems for SAP Data Intelligence.
Connect SAP Datasphere to Google Cloud project
This section provides instructions to establish a connection between SAP Datasphere and your Google Cloud project that contains the target BigQuery dataset.
Create a service account
For the authentication and authorization of SAP Datasphere, you need an IAM service account in your Google Cloud project. You grant the service account roles that contain the permissions to interact with BigQuery.
You also need to create a JSON key for the service account. You upload the JSON key to SAP Datasphere to authenticate with Google Cloud.
To create a service account, perform the following steps:
In the Google Cloud console, go to the IAM & Admin Service accounts page.
If prompted, select your Google Cloud project.
Click Create Service Account.
Specify a name for the service account and, optionally, a description.
Click Create and Continue.
In the Grant this service account access to project panel, select the following roles:
- BigQuery Data Owner
- BigQuery Job User
Click Continue.
Click Done. The service account appears in the list of service accounts for the project.
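For reference, the two roles selected in the steps above correspond to the role IDs `roles/bigquery.dataOwner` and `roles/bigquery.jobUser`. The following Python sketch builds the equivalent IAM policy bindings as plain data, which might help when you review or script the grant. The service account email is a placeholder, `build_bindings` is a hypothetical helper, and the sketch doesn't call any Google Cloud API.

```python
from __future__ import annotations

# Role IDs for the roles selected in the console steps above.
REQUIRED_ROLES = ("roles/bigquery.dataOwner", "roles/bigquery.jobUser")


def build_bindings(service_account_email: str) -> list[dict]:
    """Return IAM policy bindings that grant the required roles."""
    member = f"serviceAccount:{service_account_email}"
    return [{"role": role, "members": [member]} for role in REQUIRED_ROLES]


if __name__ == "__main__":
    # Placeholder email; replace it with your service account's address.
    email = "datasphere-sa@example-project.iam.gserviceaccount.com"
    for binding in build_bindings(email):
        print(binding)
```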
Download JSON key for the service account
To download a JSON key for the service account, perform the following steps:
- Click the email address of the service account that you want to create a key for.
- Click the Keys tab.
- Click the Add key drop-down menu, then select Create new key.
- Select JSON as the Key type and click Create.
Clicking Create downloads a service account key file. Make sure to store the key file securely, because it can be used to authenticate as your service account. For more information, see Create and delete service account keys.
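Before you upload the key file to SAP Datasphere, you might want to confirm that the download is a complete service account key. The following Python sketch checks for a few fields that a service account JSON key normally contains. The helper names and the choice of fields to check are assumptions made for illustration; the sketch reads the file locally and doesn't contact Google Cloud.

```python
from __future__ import annotations

import json
from pathlib import Path

# Fields that a service account JSON key is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def find_key_problems(data: object) -> list[str]:
    """Return a list of problems found in parsed key data (empty if none)."""
    if not isinstance(data, dict):
        return ["key file is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems


def check_key_file(path: str) -> list[str]:
    """Load a key file from disk and report any problems."""
    return find_key_problems(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `check_key_file("key.json")` returns an empty list when all of the checked fields are present, where `key.json` is a placeholder path. Avoid printing or logging the file contents, because the key includes the private key.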
Create a BigQuery dataset
To create a BigQuery dataset, your user account must have the proper IAM permissions for BigQuery. For more information, see Required permissions.
To create a BigQuery dataset, perform the following steps:
In the Google Cloud console, go to the BigQuery page.
Next to your project ID, click the View actions icon, and then click Create dataset.
In the Dataset ID field, enter a unique name. For more information, see Name datasets.
In the Location type field, choose the geographic location where you plan to store the dataset. After a dataset is created, the location can't be changed.
For more information about how to create BigQuery datasets, see Create datasets.
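As an alternative to the console steps above, the dataset can be created programmatically. The following Python sketch assumes that the `google-cloud-bigquery` package is installed and that application default credentials are available. It also assumes that dataset IDs can contain only letters, digits, and underscores, up to 1,024 characters; see Name datasets for the authoritative rules. The helper names and all argument values are placeholders.

```python
import re

# Assumed rule: letters, digits, and underscores only, up to 1,024 characters.
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Check a dataset ID against the assumed naming rule."""
    return _DATASET_ID.fullmatch(dataset_id) is not None


def create_dataset(project_id: str, dataset_id: str, location: str) -> None:
    """Create the dataset if it doesn't already exist."""
    if not is_valid_dataset_id(dataset_id):
        raise ValueError(f"invalid dataset ID: {dataset_id!r}")
    # Imported lazily because it's a third-party dependency.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(f"{project_id}.{dataset_id}")
    dataset.location = location  # can't be changed after the dataset is created
    client.create_dataset(dataset, exists_ok=True)
```

A call such as `create_dataset("example-project", "sap_replication", "US")` would create the dataset in the `US` multi-region; all three values are examples only.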
Upload SSL certificates to SAP Datasphere
To encrypt the data transmitted between SAP and Google Cloud, you need to upload the required Google SSL certificates to SAP Datasphere.
To upload the SSL certificates, perform the following steps:
From the Google Trust Services repository, download the following certificates:
- GTS Root R1
- GTS CA 1C3
In SAP Datasphere, go to System > Configuration > Security.
Click Add Certificate.
Browse your local directory and select the certificates that you downloaded from the Google Trust Services repository.
Click Upload.
For more information from SAP about uploading certificates to SAP Datasphere, see Manage Certificates for Connections.
Upload the driver for BigQuery to SAP Datasphere
The BigQuery ODBC driver acts as a bridge between SAP Datasphere and BigQuery for replication flows. To enable access to BigQuery, you need to upload the required ODBC driver files to SAP Datasphere.
For more information from SAP about uploading the required ODBC driver files to SAP Datasphere, see Upload Third-Party ODBC Drivers (Required for Data Flows).
To upload the driver files, perform the following steps:
From ODBC and JDBC drivers for BigQuery, download the required BigQuery ODBC driver.
In SAP Datasphere, go to System > Configuration > Data Integration.
Go to Third-Party Drivers and click Upload.
Browse your local directory and select the driver file that you downloaded from ODBC and JDBC drivers for BigQuery.
Click Upload.
Click Sync to synchronize the driver changes. After the synchronization is finished, you can use data flows with the connection.
Create a connection to Google Cloud project
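To replicate data from your SAP S/4HANA source system to the target BigQuery dataset, you first need to create a connection to your Google Cloud project in your SAP Datasphere tenant. You use this connection as the target when you create the replication flow.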
To create a connection to Google Cloud project, perform the following steps:
In SAP Datasphere, go to Connections, and create a new connection in your space.
Choose the connection type as Google BigQuery.
In the Connection details section, specify the following:
- Project ID: enter your Google Cloud project ID in lowercase.
- Location: enter your Google Cloud project location.
In the Credential section, upload the JSON key file that is used for authentication. For more information, see Download JSON key for the service account.
To validate the connection between SAP Datasphere and BigQuery, select your connection, and click the Validate Connection icon.
For more information from SAP about connecting to and accessing data from BigQuery, see Google BigQuery Connections.
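Because the Project ID field expects the ID in lowercase, a quick format check before you enter it can catch a display name or mixed-case value pasted by mistake. The following Python sketch assumes the common project ID format: 6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. The `is_valid_project_id` helper is hypothetical, and older domain-scoped project IDs aren't covered.

```python
import re

# Assumed format: 6-30 characters; lowercase letters, digits, and hyphens;
# starts with a letter and doesn't end with a hyphen.
_PROJECT_ID = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Check a project ID against the assumed format."""
    return _PROJECT_ID.fullmatch(project_id) is not None


if __name__ == "__main__":
    # Placeholder values, used only to show the check.
    for candidate in ("example-sap-project", "Example-SAP-Project"):
        print(candidate, is_valid_project_id(candidate))
```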
Create a replication flow
You create a replication flow to copy SAP data from your SAP S/4HANA source system to the target BigQuery dataset.
To create a replication flow through CDS, perform the following steps:
In SAP Datasphere, go to Data Builder, and click New Replication Flow.
Specify the source for your replication flow:
Select the source connection of type SAP S/4HANA On-Premise that you have created in the section Create a connection to the SAP S/4HANA source system.
Select CDS_EXTRACTION - CDS Views Enabled for Data Extraction as a source container.
Add source objects as required.
For more information, see the SAP documentation Add a Source.
Specify the target environment for your replication flow:
Select the connection to the Google Cloud project that contains the target BigQuery dataset.
Select the container, which is the dataset in BigQuery, to which you want to replicate your data.
For more information, see the SAP documentation Add a Target.
Create mappings to specify how the source data is to be changed on its way into the target. For more information, see the SAP documentation Define Mapping.
Save the replication flow.
Deploy the replication flow.
For more information, see the SAP documentation Creating a Replication Flow.
Run the replication flow
Once your replication flow is configured and deployed, you can run it.
To run a replication flow, select the replication flow, and click Run.
Once completed, the Run Status section in the Property panel is updated. For more information, see the SAP documentation Running a flow.
Monitor replication flow status
You can view and monitor the execution details of replication flows.
To monitor replication flow status, perform the following steps:
In SAP Datasphere, go to Data Integration Monitor > Flows.
Select a flow run in the left panel to view its details.
For more information, see the SAP documentation Monitoring flows.
Validate the replicated data in BigQuery
After the replication flow run is complete, validate the replicated table and data in BigQuery.
To validate the replicated data in BigQuery, perform the following steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer section, expand your project to view the dataset and its tables.
Select the required table. The table information is displayed under a tab in the content pane on the right side of the page.
In the table information section, click the following headings to view the SAP data:
- Preview: shows the data replicated from the SAP S/4HANA source system.
- Details: shows the table size, the total number of rows, and other details.
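In addition to the console checks above, you can compare row counts programmatically, for example against the record count of the source CDS view. The following Python sketch assumes that the `google-cloud-bigquery` package is installed and that application default credentials with access to the dataset are available. The helper names and the project, dataset, and table names are placeholders.

```python
def count_query(project_id: str, dataset_id: str, table_name: str) -> str:
    """Build a row-count query for a fully qualified table."""
    return f"SELECT COUNT(*) AS row_count FROM `{project_id}.{dataset_id}.{table_name}`"


def replicated_row_count(project_id: str, dataset_id: str, table_name: str) -> int:
    """Run the row-count query and return the result."""
    # Imported lazily because it's a third-party dependency.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    rows = client.query(count_query(project_id, dataset_id, table_name)).result()
    # The query returns exactly one row with a single row_count column.
    return next(iter(rows))["row_count"]


if __name__ == "__main__":
    # Placeholder names; replace them with your project, dataset, and table.
    print(replicated_row_count("example-project", "sap_replication", "example_table"))
```

A matching count doesn't prove that every field was replicated correctly, so treat this as a first check alongside the Preview and Details tabs.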