This guide describes how to deploy, configure, and run data pipelines that use the SAP Table Batch Source plugin.
The SAP Table Batch Source plugin enables bulk data integration from SAP applications with Cloud Data Fusion. You can configure and execute bulk data transfers from SAP tables and views without any coding.
See the FAQ for supported SAP applications, tables, and views for extraction. For more information about SAP on Google Cloud, see the Overview of SAP on Google Cloud.
Objectives
- Configure the SAP ERP system (activate DataSources in SAP).
- Deploy the plugin in your Cloud Data Fusion environment.
- Download the SAP transport from Cloud Data Fusion and install it in SAP.
- Use Cloud Data Fusion and SAP Table Batch Source to create data pipelines for integrating SAP data.
Before you begin
To use this plugin, you will need domain knowledge in the following areas:
- Building pipelines in Cloud Data Fusion
- Access management with IAM
- Configuring SAP Cloud and on-premises enterprise resource planning (ERP) systems
User roles
The tasks on this page are performed by people with the following roles in Google Cloud or in their SAP system:
User type | Description |
---|---|
Google Cloud Admin | Users assigned this role are administrators of Google Cloud accounts. |
Cloud Data Fusion User | Users assigned this role are authorized to design and run data pipelines. They are granted, at minimum, the Data Fusion Viewer (`roles/datafusion.viewer`) role. If you are using role-based access control, you might need additional roles. |
SAP Admin | Users assigned this role are administrators of the SAP system. They have access to download software from the SAP service site. It is not an IAM role. |
SAP User | Users assigned this role are authorized to connect to an SAP system. It is not an IAM role. |
Configure the SAP ERP system
The SAP Table Batch Source uses a Remote Function Module (RFM), which must be installed on each SAP server from which data is extracted. This RFM is delivered as an SAP transport.
To configure your SAP system, follow these steps:
- The Cloud Data Fusion user must download the zip file containing the SAP transport and provide it to the SAP Admin. For more information, see Set up Cloud Data Fusion.
- The SAP Admin must import the SAP transport into the SAP system and verify the objects created. For more information about installation, see Install SAP transport.
Install the SAP transport
The SAP components needed to design and run data pipelines in Cloud Data Fusion are delivered as SAP transport files, which are archived in a zip file. The download is available when you deploy the plugin in the Cloud Data Fusion Hub.
Download SAP Table transport zip file
The SAP transport request IDs and associated files are provided in the following table:
Transport ID | Cofile | Data file | Content |
---|---|---|---|
DE1K900204 | K900204.DE1 | R900204.DE1 | RFC-enabled function modules |
After the transport is imported into the SAP system, verify the creation of the RFC-enabled function module `/GOOG/RFC_READ_TABLE`.
To install the SAP transport, follow these steps:
Step 1: Upload the transport request files
- Log in to the operating system of the SAP instance.
- Use the SAP transaction code `AL11` to get the path for the `DIR_TRANS` folder. Typically, the path is `/usr/sap/trans/`.
- Copy the cofiles to the `DIR_TRANS/cofiles` folder.
- Copy the data files to the `DIR_TRANS/data` folder.
- Set the user and group of the data file and cofile to `<sid>adm` and `sapsys`, as shown in the example after this list.
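The following is a minimal shell sketch of these copy and ownership steps. It assumes the transport file names from this guide, a `DIR_TRANS` path of `/usr/sap/trans/`, and the placeholder `<sid>adm` user; adjust all of these for your system.

```
# Example only: copy the SAP Table transport files and set their ownership.
# Assumes DIR_TRANS is /usr/sap/trans/ and uses the file names from this guide.
cp K900204.DE1 /usr/sap/trans/cofiles/
cp R900204.DE1 /usr/sap/trans/data/

# Replace <sid>adm with your system's administration user (for example, de1adm).
chown <sid>adm:sapsys /usr/sap/trans/cofiles/K900204.DE1
chown <sid>adm:sapsys /usr/sap/trans/data/R900204.DE1
```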
Step 2: Import the transport request files
The SAP administrator can import the transport request files by using one of the following options:
Option 1: Import the transport request files by using the SAP transport management system
- Log in to the SAP system as an SAP administrator.
- Enter the transaction STMS.
- Click Overview > Imports.
- Double-click the current SID in the Queue column.
- Click Extras > Other Requests > Add.
- Select the transport request ID and click Continue.
- Select the transport request in the import queue, and then click Request > Import.
- Enter the Client number.
- On the Options tab, select Overwrite Originals and Ignore Invalid Component Version (if available).
- Optional: To schedule a reimport of the transports for a later time, select Leave Transports Requests in Queue for Later Import and Import Transport Requests Again. This is useful for SAP system upgrades and backup restorations.
- Click Continue.
- To verify the import, use transactions such as `SE80` and `SU01`.
Option 2: Import the transport request files at the operating system level
- Log in to the SAP system as an SAP system administrator.
- Add the appropriate requests to the import buffer by running the following command:

  ```
  tp addtobuffer TRANSPORT_REQUEST_ID SID
  ```

  For example: `tp addtobuffer IB1K903958 DD1`

- Import the transport requests by running the following command:

  ```
  tp import TRANSPORT_REQUEST_ID SID client=NNN U1238
  ```

  Replace `NNN` with the client number. For example: `tp import IB1K903958 DD1 client=800 U1238`

- Verify that the function module and authorization roles were imported successfully by using transactions such as `SE80` and `SU01`.
Required SAP authorizations
To run a data pipeline in Cloud Data Fusion, you need an SAP user. The SAP user must be of the `Communications` or `Dialog` type. The `Communications` type is recommended to avoid using SAP dialog resources. Users can be created by using SAP transaction code `SU01`.
To create the Authorization Role in SAP, follow these steps:
- In the SAP GUI, enter the transaction code PFCG to open the Role Maintenance window.
- In the Role field, enter a name for the role. For example: `zcdf_role`.
- Click Single Role. The Create Roles window opens.
- In the Description field, enter a description and click Save. For example: `Authorizations for Cloud Data Fusion SAP Table plugin`.
- Click the Authorizations tab. The title of the window changes to Change Roles.
- Under Edit Authorization Data and Generate Profiles, click Change Authorization Data. The Choose Template window opens.
- Click Do not select templates. The Change role: Authorizations window opens.
- Click Manually.
- Provide the authorizations shown in the following SAP Authorization table.
- Click Save.
- To activate the authorization role, click the Generate icon.
Table 3: SAP Authorizations
Object Class | Object Class Text | Authorization object | Authorization object Text | Authorization | Text | Value |
---|---|---|---|---|---|---|
AAAB | Cross-application Authorization Objects | S_RFC | Authorization Check for RFC Access | RFC_TYPE | Type of RFC object to which access is to be allowed | FUNC |
AAAB | Cross-application Authorization Objects | S_RFC | Authorization Check for RFC Access | RFC_NAME | Name of RFC object to which access is allowed | DDIF_FIELDINFO_GET, RFCPING, RFC_GET_FUNCTION_INTERFACE, /GOOG/RFC_READ_TABLE, SAPTUNE_GET_SUMMARY_STATISTIC, TH_WPINFO |
AAAB | Cross-application Authorization Objects | S_RFC | Authorization Check for RFC Access | ACTVT | Activity | 16 |
AAAB | Cross-application Authorization Objects | S_TCODE | Transaction Code Check at Transaction Start | TCD | Transaction Code | SM50 |
BC_A | Basis: Administration | S_TABU_NAM | Table Access by Generic Standard Tools | ACTVT | Activity | 03 |
BC_A | Basis: Administration | S_TABU_NAM | Table Access by Generic Standard Tools | TABLE | Table Name | * |
BC_A | Basis: Administration | S_ADMI_FCD | System Authorizations | S_ADMI_FCD | System administration function | ST0R |
Set up Cloud Data Fusion
Communication must be enabled between the Cloud Data Fusion instance and the SAP server. Follow the network peering steps for private Cloud Data Fusion instances.
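For private instances, this typically means setting up VPC Network Peering between your network and the tenant project network of the Cloud Data Fusion instance. The following gcloud command is an illustrative sketch only; the peering name, network names, and tenant project are placeholders, and the actual values come from the network peering steps linked above.

```
# Example only: peer your VPC network with the Cloud Data Fusion tenant project network.
# TENANT_PROJECT and DATA_FUSION_NETWORK are placeholders; obtain the actual values
# as described in the network peering guide for private instances.
gcloud compute networks peerings create cdf-sap-peering \
    --project=PROJECT_ID \
    --network=YOUR_VPC_NETWORK \
    --peer-project=TENANT_PROJECT \
    --peer-network=DATA_FUSION_NETWORK
```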
To configure your Cloud Data Fusion environment for the plugin, follow these steps:
The Cloud Data Fusion user must download the SAP Table Batch Source plugin from the Hub. The plugin is available starting in version 6.3.0. To find the download, see Install the SAP transport.
Optional: If you created a 6.3.0 instance before March 22, 2021, you might not see the plugin in the Hub. To enable it, run the following command.
Use the `HUB_URLS` variable for the SAP Hub. If you're using the Healthcare accelerator, include its `HUB_URLS` variable (see the comments in the command).

```
# Enter values for these variables
PROJECT=PROJECT_ID
REGION=REGION_CODE
INSTANCE=INSTANCE

# Select one of the following HUB_URLS
HUB_URLS="https://hub-cdap-io.storage.googleapis.com/sap-hub"
# HUB_URLS="https://hub-cdap-io.storage.googleapis.com/sap-hub+https://storage.googleapis.com/b999ec76-9e36-457b-bf30-753cb13a8c98" # Uncomment this line if the Healthcare accelerator is enabled

# Run these commands (NOTE: This restarts your instance after the update)
curl -X PATCH -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://datafusion.googleapis.com/v1/projects/$PROJECT/locations/$REGION/instances/$INSTANCE \
  -d "{ 'options':{'market.base.urls':\"$HUB_URLS\"}}"

sleep 300 # Wait for the update operation to succeed

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://datafusion.googleapis.com/v1/projects/$PROJECT/locations/$REGION/instances/$INSTANCE:restart
```
Click Hub > SAP > SAP Table.
Click SAP Table Batch Source > Deploy SAP Source.
Optional: You can download the SAP transport file, if needed.
On the Studio page, the SAP Table plugin appears in the Source menu.
The SAP Admin must download the following JCo artifacts from the SAP Support site and give them to the Google Cloud Admin.
The minimum JCo version supported is 3.1.2.
- One platform-independent (sapjco3.jar)
- One platform-dependent (libsapjco3.so)
To download the files, follow these steps:
- Go to SAP connectors.
- Click SAP Java Connector/Tools and Services. You can find platform-specific links for the download.
- Choose the platform that your Cloud Data Fusion instance runs on:
- If you use standard Google Cloud images for the VMs in your cluster (the default for Cloud Data Fusion), select Linux for Intel compatible processors 64-bit x86.
- If you use a custom image, select the corresponding platform.
The Google Cloud Admin must copy the JCo files to a readable Cloud Storage bucket. Provide the bucket path to the Cloud Data Fusion user.
The Google Cloud Admin must grant read access for the two files to the Cloud Data Fusion service account for the design environment and the Dataproc service account for the execution environment. For more information, see Cloud Data Fusion service accounts.
Provide this bucket path in the corresponding plugin property, SAP JCo Library GCS Path, as shown in the sketch that follows.
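The following is a minimal sketch of the upload and access-grant steps, assuming a placeholder bucket named `my-cdf-sap-jco` and placeholder service account addresses; substitute your own bucket and the service accounts listed in Cloud Data Fusion service accounts.

```
# Example only: upload the JCo files and grant read access to the service accounts.
# The bucket name and service account addresses below are placeholders.
gcloud storage cp sapjco3.jar libsapjco3.so gs://my-cdf-sap-jco/

# Design environment: the Cloud Data Fusion service account
gcloud storage buckets add-iam-policy-binding gs://my-cdf-sap-jco \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# Execution environment: the Dataproc (Compute Engine default) service account
gcloud storage buckets add-iam-policy-binding gs://my-cdf-sap-jco \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
```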
Configure the plugin
The SAP Table Batch Source plugin reads the content of an SAP table or view (see Support details). Various record filtering options are available.
You can configure the following properties for the SAP Table Batch Source.
Basic
Reference Name: Name used to uniquely identify this source for lineage, annotating metadata, and so on.

SAP Client: The SAP client to use (e.g., `100`).

SAP Language: SAP logon language (e.g., `EN`).

Connection Type: SAP connection type (Direct or Load Balanced). Selecting one type changes the available fields that follow.

For Direct connection:

SAP Application Server Host: The SAP server name or IP address.

SAP System Number: The SAP system number (e.g., `00`).

SAP Router: The router string.

For Load Balanced connection:

SAP Message Server Host: The SAP Message Server host name or IP address.

SAP Message Server Service or Port Number: The SAP Message Server service or port number (e.g., `sapms02`).

SAP System ID (SID): The SAP system ID (e.g., `N75`).

SAP Logon Group Name: The SAP logon group name (e.g., `PUBLIC`).

SAP Table/View Name: The SAP table or view name (e.g., `MARA`).
Credentials
SAP Logon Username: SAP user name. Recommended: If the SAP Logon Username changes periodically, use a macro.

SAP Logon Password: SAP user password. Recommended: Use secure macros for sensitive values like the user password.
SAP JCo details
GCP Project ID: The Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud console.

SAP JCo Library GCS Path: The Cloud Storage path that contains the user-uploaded SAP JCo library files.
Get Schema: The plugin generates a schema based on the metadata from SAP, with automatic mapping of SAP data types to corresponding Cloud Data Fusion data types (same functionality as the Validate button).
Advanced
Filter Options: Conditions specified in OpenSQL syntax that are used as filtering conditions in the SQL `WHERE` clause (e.g., `KEY6 LT '25'`). Records can be extracted based on conditions such as certain columns having a defined set of values or a range of values.

Number of Rows to Fetch: Use this to limit the number of extracted records. Enter a positive whole number. If `0` or left blank, the plugin extracts all records from the specified table. If the value is greater than the number of records available based on the Filter Options, only the available records are extracted.

Number of Splits to Generate: Use this to create partitions to extract table records in parallel. Enter a positive whole number. The runtime engine creates the specified number of partitions (and SAP connections) while extracting the table records. Use caution when setting this property to a number greater than `16`, because higher parallelism increases simultaneous connections with SAP. Values between `8` and `16` are recommended.

If the value is `0` or left blank, the system automatically chooses an appropriate value based on the number of available SAP work processes, the records to be extracted, and the package size.

Package Size: The number of records to be extracted in a single SAP network call. This is the number of records SAP stores in memory during every network extract call. Multiple data pipelines extracting data can peak the memory usage and may result in failures due to `Out of Memory` errors. Use caution when setting this property.

Enter a positive whole number. If `0` or left blank, the plugin uses a standard default value calculated from the available SAP system resources.
Schema projection
Schema projection lets you select a subset of the table columns for extraction. After you validate the plugin properties, click Delete next to the columns that don't need to be extracted. Deleting them reduces the network bandwidth requirements and the resource consumption in the SAP system.
Data Type Mapping
Table 4: SAP data types mapping to Cloud Data Fusion types
ABAP Type | Description (SAP) | Cloud Data Fusion data type |
---|---|---|
Numeric | | |
b | 1-byte Integer (INT1) | int |
s | 2-byte Integer (INT2) | int |
i | 4-byte Integer (INT4) | int |
8 (int8) | 8-byte Integer (INT8) | long |
p | Packed number in BCD format (DEC) | decimal |
a (decfloat16) | Decimal floating point, 8 bytes, IEEE 754r (DF16_DEC, DF16_RAW) | decimal |
e (decfloat34) | Decimal floating point, 16 bytes, IEEE 754r (DF34_DEC, DF34_RAW) | decimal |
f | Binary floating-point number (FLTP) | double |
Character | | |
c | Character string (CHAR/LCHR) | string |
string | Character string (SSTRING, GEOM_EWKB) | string |
string | Character string CLOB (STRING) | bytes |
n | Numeric Text (NUMC/ACCP) | string |
Byte | | |
x | Binary Data (RAW/LRAW) | bytes |
xstring | Byte string BLOB (RAWSTRING) | bytes |
Date/Time | | |
d | Date (DATS) | date |
t | Time (TIMS) | time |
utclong/utcl | Timestamp | timestamp |
Validation
Click Validate on the top right or Get Schema.
The plugin generates a schema based on the metadata from SAP. It automatically maps SAP data types to corresponding Cloud Data Fusion data types.
Run a data pipeline
- After deploying the pipeline, click Configure on the top center panel.
- Select Resources.
- If needed, change the Executor CPU and Memory based on the overall data size and the number of transformations used in the pipeline.
- Click Save.
- To start the data pipeline, click Run.
Optimize performance
The plugin uses Cloud Data Fusion's parallelization capabilities. The following guidelines will help you configure the runtime environment so that you provide sufficient resources to the runtime engine to achieve the intended degree of parallelism and performance.
SAP configuration
Recommended: Use an SAP Communication user rather than a Dialog user (this uses less SAP system resources). Also, if a Message Server is available in your landscape, use a Load Balanced SAP connection rather than a Direct connection.
If you specify values for Number of Splits and Package Size, the plugin may adjust these values so as not to exhaust the available SAP work processes and memory. These are the upper bounds of the SAP resources used:
- 50% of available work processes
- 70% of available memory per work process
Plugin configuration
Recommended: Leave the Number of Splits to Generate and Package Size blank, unless you are familiar with your SAP system's memory settings. By default, these values are automatically tuned for better performance.
Use the following properties to ensure optimal performance when you run the pipeline:
Number of Splits to Generate: This directly controls the parallelism on the Cloud Data Fusion side. The runtime engine creates the specified number of partitions (and SAP connections) while extracting the table records. Values between 8 and 16 are recommended, but you can increase the value up to 30 or even 64 with the appropriate configuration on the SAP side (allocating sufficient memory resources for the work processes in SAP).

If the value is `0` or left blank, the system automatically chooses an appropriate value based on the number of available SAP work processes, the records to be extracted, and the package size.

Package Size: The number of records to be extracted in a single SAP network call. This is the number of records SAP stores in memory during every extract call. Increase this value (from the default of `70000`) if your SAP system allocates sufficient memory for the work processes. In most default configurations, you can increase it up to `100000`, but larger sizes might require reconfiguring the SAP system.
Cloud Data Fusion resource settings
Recommended: Use 1 CPU and 4 GB of memory per Executor (this value applies to each Executor process). Set these in the Configure > Resources dialog.
Dataproc cluster settings
Recommended: At minimum, allocate a total number of CPUs across all workers that is greater than the intended number of splits (see the Plugin configuration section). For example, if you have 16 splits, define 20 or more CPUs in total across all workers (there is an overhead of 4 CPUs used for coordination).
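As an illustration only, the following gcloud sketch creates a Dataproc cluster with 24 worker CPUs (3 workers x 8 vCPUs), which covers 16 splits plus the coordination overhead. The cluster name, region, and machine types are placeholders, and your environment might instead provision ephemeral clusters through a Cloud Data Fusion compute profile.

```
# Example only: a statically sized Dataproc cluster for roughly 16 splits.
# 3 workers x 8 vCPUs = 24 worker CPUs, above the 20-CPU guideline in this section.
gcloud dataproc clusters create cdf-sap-extract \
    --region=REGION \
    --num-workers=3 \
    --worker-machine-type=n1-standard-8 \
    --master-machine-type=n1-standard-4
```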
Support details
Supported SAP products and versions
Supported sources are SAP ERP6/NW7.5 and SAP S4HANA 1909 and 2020.
Supported SAP Tables and Views for extraction
The plugin supports SAP transparent tables and views, CDS views, and HANA views. The objects are read through the SAP application layer, not the database layer. Pool and cluster tables are not supported.
Limits on the volume of data or record width
The row width supported is limited to 30 KB. There is no limit on the number of records extracted or table size.
Expected plugin throughput
For an appropriately configured environment, the plugin can extract around 9000 rows/sec from a medium-sized table like EKKO and 6500 rows/sec from a large table like ACDOCA.
Delta extraction
Delta (data since the last run) extraction is not supported directly in the plugin. You can define data pipelines that filter records based on timestamp fields in transactional tables (e.g., field `TIMESTAMP` in table `ACDOCA`, field `AEDAT` in table `EKKO`). Use the plugin property Filter Options to specify the filtering condition.

Example: `TIMESTAMP >= '20210130100000' AND TIMESTAMP <= '20210226000000'` (selects records in table `ACDOCA` with `TIMESTAMP` between 30 Jan 2021 10:00 UTC and 26 Feb 2021 00:00 UTC).
Error scenarios
At runtime, the plugin writes log entries in the Cloud Data Fusion data
pipeline log. These entries are prefixed with CDF_SAP
for easy identification.
At design-time, when the user validates the plugin settings, the messages are displayed in the Properties area, highlighted in red. Some specific property validation messages are displayed immediately below the properties' user entry box, highlighted in red. These property validation error messages do not have a specific Message ID.
The following table lists some common error messages (the placeholder text is replaced by actual values at runtime):
Message ID | Message | Recommended Action |
---|---|---|
N/A (UI) | Required property UI_CONNECTION_PROPERTY_LABEL for connection type UI_CONNECTION_TYPE_RADIO_OPTION. | Enter an actual value or a macro variable. |
N/A (UI) | Invalid value for property UI_ADVANCED_OPTIONAL_PROPERTY_LABEL. | Enter a non-negative whole number (0 or greater, without a decimal) or a macro variable. |
CDF_SAP_01412 | One or more SAP JCo library files are missing or of incompatible version. | Make sure the required JCo library (sapjco3.jar) and its associated OS-dependent shared library (e.g., libsapjco3.so) correspond to the same version and were uploaded to Cloud Data Fusion as documented in the user guide. |
CDF_SAP_01500 | Unable to retrieve SAP destination from Destination Manager. Cannot initiate connectivity test with SAP. Root Cause: SAP_ERROR_CODE - SAP_ROOT_CAUSE_MESSAGE | Check the root cause displayed in the message and take appropriate action. |
CDF_SAP_01404 | SAP connection test failed. Please verify the connection parameters. Root Cause: SAP_ERROR_CODE - SAP_ROOT_CAUSE_MESSAGE | Check the root cause displayed in the message and take appropriate action. |
CDF_SAP_01512 | Unable to retrieve JCo Repository from SAP Destination. Root Cause: SAP_ERROR_CODE - SAP_ROOT_CAUSE_MESSAGE | Check the root cause displayed in the message and take appropriate action. |
CDF_SAP_01513 | Unable to retrieve JCo Function for SAP_RFM_NAME from SAP Repository. Root Cause: SAP_ERROR_CODE - SAP_ROOT_CAUSE_MESSAGE | Check the root cause displayed in the message and take appropriate action. |
CDF_SAP_01501 | RFM SAP_RFM_NAME could not be found. | Verify that the appropriate transport request is correctly imported in SAP. |
CDF_SAP_01406 | Error while executing RFM SAP_RFM_NAME. Root Cause: SAP_ERROR_CODE - SAP_ROOT_CAUSE_MESSAGE | Verify that the appropriate authorizations are assigned to the SAP user. |
CDF_SAP_01516 | Table or View SAP_TABLE/VIEW_NAME could not be found. | Ensure that the table or view exists and is active in SAP. |
CDF_SAP_01517 | SAP_TABLE/VIEW_NAME is not of type table or view. | Ensure that it is a valid table or view, and not a structure, in SAP. |
CDF_SAP_1532 | Filter Options syntax is not valid. | Verify that correct OpenSQL syntax is used in the filter conditions. |
CDF_SAP_1534 | Data buffer in SAP exceeded while extracting records from table/view SAP_TABLE/VIEW_NAME. | Decrease the package size and/or the number of splits. Alternatively, notify the SAP Admin to increase the memory resources available on the SAP server. |
CDF_SAP_1403 | User is not authorized to access SAP_TABLE/VIEW_NAME table/view data in SAP. | Verify that the appropriate read authorization on the table or view is assigned to the SAP user. |
CDF_SAP_1535 | Query for the table/view SAP_TABLE/VIEW_NAME failed to execute successfully. | Verify that valid column names are used in the filter condition. |
CDF_SAP_01520 | Failed to extract records #FROM_RECORD_INDEX to #TO_RECORD_INDEX, even after MAX_RETRY_COUNT retries. | Communication error with the SAP server. Check your network connectivity and the accessibility of the SAP server from Cloud Data Fusion. |
What's next
- Learn more about Cloud Data Fusion.
- Learn more about SAP on Google Cloud.
- Refer to the CDAP documentation.