SAP ODP plugin overview

This guide describes how to deploy, configure, and run data pipelines that use the SAP ODP plugin. You can use SAP as a source for batch-based and delta-based data extraction in Cloud Data Fusion through Operational Data Provisioning (ODP).

This plugin enables bulk data integration from SAP applications with Cloud Data Fusion. You can configure and execute bulk data transfers from SAP DataSources and core data services (CDS) views without any coding.

For supported SAP applications and DataSources for extraction, see the Support details. For more information about SAP on Google Cloud, see the Overview of SAP on Google Cloud.

Objectives

  • Configure the SAP ERP system (activate DataSources in SAP).
  • Deploy the plugin in your Cloud Data Fusion environment.
  • Download the SAP transport from Cloud Data Fusion and install it in SAP.
  • Use Cloud Data Fusion and SAP ODP to create data pipelines for integrating SAP data.

Before you begin

To use this plugin, you need domain knowledge in the following areas:

  • Building pipelines in Cloud Data Fusion
  • Access management with IAM
  • Configuring SAP Cloud and on-premises enterprise resource planning (ERP) systems

User roles

The tasks on this page are performed by people with the following roles in Google Cloud or in their SAP system:

User type Description
Google Cloud Admin Users assigned this role are administrators of Google Cloud accounts.
Cloud Data Fusion User Users assigned this role are authorized to design and run data pipelines. They are granted, at minimum, the Data Fusion Viewer (roles/datafusion.viewer) role. If you are using role-based access control, you might need additional roles (see the example command after this table).
SAP Admin Users assigned this role are administrators of the SAP system. They have access to download software from the SAP service site. It is not an IAM role.
SAP User Users assigned this role are authorized to connect to an SAP system. It is not an IAM role.
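
For example, a Google Cloud Admin can grant the Data Fusion Viewer role by using the Google Cloud CLI. The following command is a sketch only; the project ID and user email address are placeholders:

    # Grant the Data Fusion Viewer role to a pipeline designer (placeholder values).
    gcloud projects add-iam-policy-binding example-project \
        --member="user:pipeline-designer@example.com" \
        --role="roles/datafusion.viewer"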

Prerequisites for ODP extraction

  1. Activate the DataSource or CDS view in the SAP system.

  2. Populate the DataSource or CDS view with data.

  3. Enable ODP extraction in the DataSource or CDS view. To check whether it's enabled, follow these steps:

DataSource

To check if the DataSource is exposed for ODP data extraction, follow these steps:

  1. Log in to the SAP system. Go to t-code SE16N.
  2. Provide the table name ROOSATTR and enter the DataSource name in OLTPSOURCE.
  3. Click Execute or press F8.
  4. If the EXPOSE_EXTERNAL field is marked as X, the DataSource can be used for ODP extraction.

If the DataSource is not listed in this table, or the EXPOSE_EXTERNAL field is blank, follow these steps to expose the DataSource for ODP extraction:

  1. Log in to the SAP system. Go to t-code SA38.
  2. Provide the program name RODPS_OS_EXPOSE and click Execute.
  3. Provide the DataSource name and click Release DataSource.
  4. Save the changes in the transport.

CDS view

To check if a SAP CDS view is exposed for ODP data extraction, follow these steps:

  1. Open the CDS view in the SAP CDS Editor.
  2. In the CDS view, look for the following annotations:
  • @Analytics.dataCategory
  • @Analytics.dataExtraction.enabled

If the CDS view has both of these annotations, then it's exposed for ODP data extraction. Without them, it's not exposed.

Data extraction modes

The plugin supports these data extraction modes:

  • Full: Extracts all data.
  • Sync: Determines whether to use full (all data), delta (incremental changes), or recovery (repeat last execution) extraction mode for the current execution, based on the status of the previous execution in SAP.

DataSource filterable columns

Only some DataSource columns can be used for filter conditions (this is an SAP limitation by design).

To obtain the field information, follow these steps:

  1. Log in to the SAP system. Go to t-code RSA3.
  2. Provide the DataSource name and press Enter.

    You can use fields shown in the Selections section as filters. Supported operations are Equal and Between (Range).

Configure the SAP ERP system

The SAP ODP plugin uses a Remote Function Module (RFM), which must be installed on each SAP server from which data is extracted. This RFM is delivered as an SAP transport.

To configure your SAP system, follow these steps:

  1. The Cloud Data Fusion user must download the zip file containing the SAP transport and provide it to the SAP Admin. To download it, use the link provided with the plugin in the Hub. See Set up Cloud Data Fusion.
  2. The SAP Admin must import the SAP transport into the SAP system and verify the objects created. For more information, see Installing SAP transport.
  3. The SAP user can either import authorization transport or create the authorization role based on the authorization object. For more information about authorization objects, see Required SAP authorization.

Activate the DataSource

To extract the data, a DataSource must be activated in the source SAP system. To activate a DataSource in SAP, follow these steps:

  1. Go to transaction code RSA5.
  2. Expand the DataSources list.
  3. Click Search.
  4. Provide the DataSource name and click Enter.
  5. If the search is successful, the DataSource appears in the result list.

    Select the DataSource name and click Enter.

  6. Select the DataSource and click Activate DataSources.

  7. In the Create Object Directory Entry dialog, enter the Package name and click Save.

  8. In the Prompt for transportable workbench request dialog, enter the Transport Number in the Request field. Click Enter.

    The selected DataSource is activated in SAP.

Install the SAP transport files

The SAP components required to design and run data pipelines in Cloud Data Fusion are delivered as SAP transport files, which are archived in a zip file. The download is available when you deploy the plugin in the Cloud Data Fusion Hub.

After the transport is imported into the SAP system, the following SAP objects are created:

  • RFC-enabled function modules:
    • /GOOG/ODP_DS_EXTRACT_DATA
    • /GOOG/ODP_DS_FETCH_DATA
    • /GOOG/ODP_DS_METADATA
    • /GOOG/ODP_REPL_CLEANUP
  • Authorization Role: /GOOG/ODP_AUTH

To install the SAP transport, follow these steps:

Step 1: Upload the transport request files

  1. Log in to the operating system of the SAP instance.
  2. Use the SAP transaction code AL11 to get the path of the DIR_TRANS folder. Typically, the path is /usr/sap/trans/.
  3. Copy the cofiles to the DIR_TRANS/cofiles folder.
  4. Copy the data files to the DIR_TRANS/data folder.
  5. Set the user and group of the data file and cofile to <sid>adm and sapsys (see the example commands after these steps).
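
For example, on a Linux-based SAP host, steps 3 through 5 might look like the following commands. This is a sketch only; the transport file names and the /usr/sap/trans path are placeholders, so use your actual transport files and the DIR_TRANS path from AL11:

    # Copy the transport cofile and data file into the transport directory (placeholder names).
    cp K9XXXXX.SID /usr/sap/trans/cofiles/
    cp R9XXXXX.SID /usr/sap/trans/data/

    # Set the owner and group so that the SAP transport tools can read the files.
    chown <sid>adm:sapsys /usr/sap/trans/cofiles/K9XXXXX.SID
    chown <sid>adm:sapsys /usr/sap/trans/data/R9XXXXX.SID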

Step 2: Import the transport request files

The SAP administrator can import the transport request files by using one of the following options:

Option 1: Import the transport request files by using the SAP transport management system
  1. Log in to the SAP system as an SAP administrator.
  2. Enter the transaction STMS.
  3. Click Overview > Imports.
  4. In the Queue column, double-click the current SID.
  5. Click Extras > Other Requests > Add.
  6. Select the transport request ID and click Continue.
  7. Select the transport request in the import queue and click Request > Import.
  8. Enter the Client number.
  9. On the Options tab, select Overwrite Originals and Ignore Invalid Component Version (if available).

    Optional: To schedule a reimport of the transports for a later time, select Leave Transport Requests in Queue for Later Import and Import Transport Requests Again. This is useful for SAP system upgrades and backup restorations.

  10. Click Continue.

  11. To verify the import, use any transactions, such as SE80 and SU01.

Option 2: Import the transport request files at the operating system level
  1. Log in to the SAP system as an SAP system administrator.
  2. Add the appropriate requests to the import buffer by running the following command:

    tp addtobuffer TRANSPORT_REQUEST_ID SID
    

    For example: tp addtobuffer IB1K903958 DD1

  3. Import the transport requests by running the following command:

    tp import TRANSPORT_REQUEST_ID SID client=NNN U1238
    

    Replace NNN with the client number. For example: tp import IB1K903958 DD1 client=800 U1238

  4. Verify that the function module and authorization roles were imported successfully by using any appropriate transactions, such as SE80 and SU01.

Required SAP authorizations

To run a data pipeline in Cloud Data Fusion, you need an SAP user. The SAP user must be of type Communications or Dialog. To avoid using SAP dialog resources, the Communications type is recommended. You can create users by using SAP transaction code SU01.

Assign an Authorization Role for the SAP user to design and run data pipelines in Cloud Data Fusion. You can either assign the Authorization Role /GOOG/ODP_AUTH, which is included in the SAP transports provided with the plugin, or create the Authorization Role manually in SAP.

To create the Authorization Role manually, follow these steps:

  1. In the SAP GUI, enter the transaction code PFCG to open the Role Maintenance window.
  2. In the Role field, enter a name for the role.

    For example: zcdf_role

  3. Click Single Role.

    The Create Roles window opens.

  4. In the Description field, enter a description and click Save.

    For example: Authorizations for SAP ODP plugin.

  5. Click the Authorizations tab. The title of the window changes to Change Roles.

  6. Under Edit Authorization Data and Generate Profiles, click Change Authorization Data.

    The Choose Template window opens.

  7. Click Do not select templates.

    The Change role: Authorizations window opens.

  8. Click Manually.

  9. Provide the authorizations shown in the following SAP Authorization table.

  10. Click Save.

  11. To activate the Authorization Role, click the Generate icon.

Table 3: SAP Authorizations

Object Class Object Class Text Authorization object Authorization object Text Authorization Text Value
AAAB Cross-application Authorization Objects S_RFC Authorization Check for RFC Access ACTVT Activity 16
AAAB Cross-application Authorization Objects S_RFC Authorization Check for RFC Access RFC_NAME Name of RFC object to which access is allowed /GOOG/CDF_ODP_FG,
/GOOG/ODP_DS_EXTRACT_DATA,
/GOOG/ODP_DS_FETCH_DATA,
/GOOG/ODP_DS_METADATA,
DDIF_FIELDINFO_GET,
RFCPING,
RFC_GET_FUNCTION_INTERFACE,
RODPS_REPL_ODP_CLOSE,
RODPS_REPL_SOURCES_GET_LIST,
SAPTUNE_GET_SUMMARY_STATISTIC,
TH_WPINFO
AAAB Cross-application Authorization Objects S_RFC Authorization Check for RFC Access RFC_TYPE Type of RFC object to which access is allowed FUGR
FUNC
AAAB Cross-application Authorization Objects S_TCODE Transaction Code Check at Transaction Start TCD Transaction Code SM50
BC_A Basis: Administration S_ADMI_FCD System Authorizations S_ADMI_FCD System administration function PADM,
ST0R
BC_A Basis: Administration S_BTCH_ADM Background Processing: Background Administrator BTCADMIN Background Administrator ID Y
BC_A Basis: Administration S_BTCH_JOB Background Processing: Operations on Background Jobs JOBACTION Job operations RELE
BC_A Basis: Administration S_BTCH_JOB Background Processing: Operations on Background Jobs JOBGROUP Summary of jobs for a group ''
MM_E Materials Management: Purchasing M_BEST_BSA Document Type in Purchase Order ACTVT Activity 03
MM_E Materials Management: Purchasing M_BEST_BSA Document Type in Purchase Order BSART Purchasing Document Type *
RO Authorizations: BW Service API S_RO_OSOA SAP DataSource Authorizations ACTVT Activity 03
RO Authorizations: BW Service API S_RO_OSOA SAP DataSource Authorizations OLTPSOURCE DataSource (OSOA/OSOD) *
RO Authorizations: BW Service API S_RO_OSOA SAP DataSource Authorizations OSOAAPCO Application Component of a DataSource (OSOA/OSOD) *
RO Authorizations: BW Service API S_RO_OSOA SAP DataSource Authorizations OSOAPART Subobject for DataSource DATA
*To restrict a user to running ODP pipelines with specific DataSources only, do not use an asterisk (*) for the authorization object S_RO_OSOA. Instead, provide the required DataSource names in OLTPSOURCE (for example, 2LIS_02_ITM, 0MATERIAL_ATTR).

Set up Cloud Data Fusion

Ensure that communication is enabled between the Cloud Data Fusion instance and the SAP server. For private instances, set up VPC network peering. After network peering is established with the project where the SAP systems are hosted, no additional configuration is required to connect to your Cloud Data Fusion instance. Both the SAP system and the Cloud Data Fusion instance must be in the same project.
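
For example, for a private instance you can create the peering by using the Google Cloud CLI. The following command is a sketch only; the peering name is arbitrary, and the tenant project and network values are placeholders that you obtain from your Cloud Data Fusion instance details:

    # Peer your VPC network with the Cloud Data Fusion tenant network (placeholder values).
    gcloud compute networks peerings create cdf-sap-peering \
        --network=YOUR_VPC_NETWORK \
        --peer-project=CDF_TENANT_PROJECT_ID \
        --peer-network=CDF_TENANT_NETWORK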

Cloud Data Fusion user steps

To configure your Cloud Data Fusion environment for the plugin:

  1. Go to the instance details:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. Click Instances, and then click the instance's name to go to the Instance details page.

  2. Check that the instance has been upgraded to version 6.4.0 or later. If the instance is in an earlier version, you need to upgrade it.

  3. Open the instance. When the Cloud Data Fusion web interface opens, click Hub.

  4. Select the SAP tab > SAP ODP.

    If the SAP tab is not visible, see Troubleshooting SAP integrations.

  5. Click Deploy SAP ODP Plugins.

    The plugin now appears in the Source menu on the Studio page.

SAP Admin and Google Cloud Admin steps

The SAP Admin downloads the following JCo artifacts from the SAP Support site and gives them to the Google Cloud Admin.

  • One platform-independent (sapjco3.jar)
  • One platform-dependent (libsapjco3.so on Unix)

To download the files:

  1. Go to the SAP Connectors page.

  2. Click SAP Java Connector/Tools and Services. You can select platform-specific links for the download.

  3. Select the platform that your Cloud Data Fusion instance runs on:

    1. If you use standard Google Cloud images for the VMs in your cluster (the default for Cloud Data Fusion), select Linux for Intel compatible processors 64-bit x86.
    2. If you use a custom image, select the corresponding platform.
  4. The Google Cloud Admin must copy the JCo files to a readable Cloud Storage bucket. Provide the bucket path to the Cloud Data Fusion user, who enters it in the corresponding plugin property in Cloud Data Fusion: SAP JCo Library GCS Path (see Configure the plugin).

  5. The Google Cloud Admin must grant read access for the two files to the Cloud Data Fusion service account for the design environment and the Dataproc service account for the execution environment, as shown in the example commands after these steps. For more information, see Cloud Data Fusion service accounts.
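
For example, the Google Cloud Admin can stage the JCo files and grant read access with the following commands. This is a sketch only; the bucket name and service account addresses are placeholders:

    # Upload the JCo files to a Cloud Storage bucket (placeholder bucket name).
    gsutil cp sapjco3.jar libsapjco3.so gs://example-bucket/sap-jco/

    # Grant read access to the Cloud Data Fusion and Dataproc service accounts (placeholder addresses).
    gsutil iam ch serviceAccount:CDF_SERVICE_ACCOUNT:objectViewer gs://example-bucket
    gsutil iam ch serviceAccount:DATAPROC_SERVICE_ACCOUNT:objectViewer gs://example-bucket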

Configure the plugin

The SAP ODP plugin reads the content of an SAP DataSource or CDS view.

You can configure the following properties for the SAP ODP source, including optional filters that limit which records are read.

Basic properties

Property Description
Reference Name Name used to uniquely identify the source for lineage, annotating metadata, etc.
SAP Client The SAP client to use (for example 100).
SAP Language SAP logon language (for example EN).
Connection Type SAP Connection type (Direct or Load Balanced). Selecting one type changes the following available fields:
For a direct connection:
  • SAP Application Server Host: The SAP server name or IP address.
  • SAP System Number: The SAP system number (for example, 00).
  • SAP Router: The router string.

For a load balanced connection:
  • SAP Message Server Host: The SAP Message Host name or IP address.
  • SAP Message Server Service or Port Number: The SAP Message Server Service or Port Number (for example, sapms02).
  • SAP System ID (SID): The SAP System ID (for example, N75).
  • SAP Logon Group Name: The SAP logon group name (for example, PUBLIC).
Object Type Select DataSources / Extractors or ABAP Core Data Services.
SAP ODP Name The SAP DataSource or CDS view name (for example, 2LIS_02_ITM).
Get Schema button Generates a schema based on the metadata from SAP, with automatic mapping of SAP data types to corresponding Cloud Data Fusion data types (same functionality as the Validate button).
Extract Type The plugin supports the following two types of data extraction:
  • Full (All Data): Extracts all available data.
  • Sync (Automatic selection based on previous execution): Determines whether full, delta (incremental), or recovery (recover data from the last execution) mode should be run, based on the previous execution type and status available in SAP. It extracts full data in the initial pipeline execution (ODP mode F) and changed data in subsequent pipeline executions (ODP modes D, R).
    For more information, see Extract types.

Credential properties

Property Description
SAP Logon Username SAP User name.
Recommended: If the SAP Logon Username changes periodically, use a macro.
SAP Logon Password SAP User password. Recommended: Use secure macros for sensitive values like User password.

SAP JCo properties

Property Description
GCP Project ID Google Cloud project ID, which uniquely identifies a project. You can find it in the Google Cloud console.
SAP JCo Library GCS Path The Cloud Storage path that contains the user-uploaded SAP JCo library files.

Advanced properties

Property Description
SAP ODP Subscriber Name Identifies a valid ODP Subscriber for the data extraction from a valid DataSource or CDS view. This name must be a maximum of 32 characters, without any spaces, and can only contain a-z, A-Z, 0-9, _, or /. It must be unique for different pipelines extracting data from the same DataSource. If blank or not specified, the execution framework uses the default combination of the Project ID, Namespace, and Pipeline name. If this default value is longer than 32 characters, the plugin truncates it. This field lets you reuse a previous subscription, such as one created by a third-party tool.
Filter Options (Equal) Defines the value a field must have in order to be read. It's a list of metadata field names and values to use as filter options and specifies the filter condition to apply when reading data from a DataSource. Only records that satisfy the filter are extracted. The filter key corresponds to a field in the schema and must be of a simple type (not ARRAY, RECORD, or UNION). For example values, see the example after this table.
Filter Options (Range) Defines the low and high bounds within which a field value must fall in order to be read, in the format low AND high. It's a list of metadata field names and values to use as filter options and specifies the filter condition to apply when reading data from a DataSource. Only records that satisfy the filter are extracted. The filter key corresponds to a field in the schema and must be of a simple type (not ARRAY, RECORD, or UNION).
Filter Options (Less Equal) Defines the value that a field must be less than or equal to in order to be read. It's a comma-separated list of key-value pairs, where each pair is separated by a colon (:) and specifies the filter condition to apply when reading data from a DataSource or CDS view. Only records that satisfy the filters are extracted. The filter key corresponds to a field in the schema and must be a simple type (not an ARRAY, ENUM, MAP, RECORD, or UNION).
Filter Options (Greater Equal) Defines the value that a field must be greater than or equal to in order to be read. It's a comma-separated list of key-value pairs, where each pair is separated by a colon (:) and specifies the filter condition to apply when reading data from a DataSource or CDS view. Only records that satisfy the filters are extracted. The filter key corresponds to a field in the schema and must be a simple type (not an ARRAY, ENUM, MAP, RECORD, or UNION).
Filter Options (Not Equal) Defines the value that a field must not be equal to in order to be read. It's a comma-separated list of key-value pairs, where each pair is separated by a colon (:) and specifies the filter condition to apply when reading data from a DataSource or CDS view. Only records that satisfy the filters are extracted. The filter key corresponds to a field in the schema and must be a simple type (not an ARRAY, ENUM, MAP, RECORD, or UNION).
Number of Splits to Generate Creates partitions to extract table records in parallel. The runtime engine creates the specified number of partitions (and SAP connections) while extracting the table records. Use caution when setting this property to a number greater than 16, because higher parallelism increases the number of simultaneous connections with SAP. Values between 8 and 16 are recommended. If the value is 0 or left blank, the system chooses an appropriate value based on the number of available executors, the number of records to extract, and the package size.
Package Size (in KB) Number of records to extract in a single SAP network call. It's the number of records SAP stores in memory during every network extract call. Multiple data pipelines extracting data can cause memory usage to peak and result in failures due to Out of Memory errors. Use caution when setting this property.
Enter a positive, whole number. If the value is 0 or left blank, the plugin uses a standard value of 70000, or an appropriately calculated value if the number of records to extract is fewer than 70,000.
If the data pipeline fails with Out of Memory errors, either decrease the package size or increase the memory available for your SAP work processes.
Additional SAP Connection Properties Set additional SAP JCo properties to override the SAP JCo defaults. For example, setting jco.destination.pool_capacity = 10 overrides the default connection pool capacity.
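
For example, hypothetical filter values in the plugin properties might look like the following. This is a sketch only; the field names (BUKRS, WERKS, BUDAT) and values are placeholders and must correspond to simple-type fields that your DataSource exposes for selection:

    Filter Options (Equal):  BUKRS:1000,WERKS:1710
    Filter Options (Range):  BUDAT:20230101 AND 20231231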

Configure a pipeline

For large datasets (for example, a few million records) with a large number of splits (above 16), the SAP system might send duplicate records. To prevent this, it's recommended that you use one of the following deduplication methods in your Cloud Data Fusion pipeline.

In both methods, you use the Key Fields of the DataSource to perform the deduplication.

  • If you use a BigQuery sink in the pipeline, use the Upsert mode in the BigQuery sink. Provide the Key Fields in the Table Key section of the BigQuery sink plugin.

  • If you do not use a BigQuery sink in the pipeline, use the Deduplicate plugin, inserted in the pipeline after the SAP ODP source plugin. Provide the key fields in the Unique Fields section of the Deduplicate plugin.

Data type mapping

Table 4: SAP data types mapping to Cloud Data Fusion types

SAP data type ABAP type Description (SAP) Cloud Data Fusion data type
Numeric
INT1 b 1-byte integer int
INT2 s 2-byte integer int
INT4 i 4-byte integer int
INT8 8 8-byte integer long
DEC p Packed number in BCD format (DEC) decimal
DF16_DEC, DF16_RAW a Decimal floating point 8 bytes IEEE 754r decimal
DF34_DEC, DF34_RAW e Decimal floating point 16 bytes IEEE 754r decimal
FLTP f Binary floating point number double
Character
CHAR, LCHR c Character string string
SSTRING, GEOM_EWKB string Character string string
STRING string Character string (CLOB) bytes
NUMC, ACCP n Numeric text string
Byte
RAW, LRAW x Binary data bytes
RAWSTRING xstring Byte string (BLOB) bytes
Date/Time
DATS d Date date
TIMS t Time time
TIMS utcl (Utclong), TimeStamp timestamp

Validation

In the top-right corner, click Validate or Get Schema.

The plugin validates the properties and generates a schema based on the metadata from SAP. It automatically maps SAP data types to corresponding Cloud Data Fusion data types.

Run a data pipeline

  1. After deploying the pipeline, click Configure on the top center panel.
  2. Select Resources.
  3. If needed, change the Executor CPU and Memory based on the overall data size and the number of transformations used in the pipeline.
  4. Click Save.
  5. To start the data pipeline, click Run.

Optimize performance

The plugin uses Cloud Data Fusion's parallelization capabilities. The following guidelines can help you configure the runtime environment so that you provide sufficient resources to the runtime engine to achieve the intended degree of parallelism and performance.

Optimize SAP configuration

Recommended: Use an SAP Communication user rather than a Dialog user (this uses fewer SAP system resources). Also, if a Message Server is available in your landscape, use a Load Balanced SAP connection rather than a Direct connection.

If you specify values for Number of Splits and Package Size, the plugin might adjust these values so that it doesn't exhaust the available SAP work processes and memory. These are the upper bounds of the SAP resources used:

  • 50% of available work processes
  • 70% of available memory per work process

Optimize plugin configuration

Recommended: Leave the Number of Splits to Generate and Package Size blank, unless you are familiar with your SAP system's memory settings. By default, these values are automatically tuned for better performance.

Use the following properties for optimal performance when you run the pipeline:

  • Number of Splits to Generate: This directly controls the parallelism on the Cloud Data Fusion side. The runtime engine creates the specified number of partitions (and SAP connections) while extracting the table records. Values between 8 and 16 are recommended but can increase up to 32 or even 64 with the appropriate configuration on the SAP side (allocating appropriate memory resources for the work processes in SAP).

    If the value is 0 or left blank, then the system automatically chooses an appropriate value based on the number of available SAP work processes, records to extract, and the package size.

  • Package Size: The size of each package of data, in bytes, to fetch in every network call to SAP. A smaller size causes frequent network calls, each repeating the associated overhead. A large package size (greater than 100 MB) can slow down data retrieval. If the value is 0 or left blank, it defaults to 50 MB.

Extract types

  • If the Extract Type is Full, the plugin always requests full data from the DataSource or CDS view.
  • If the Extract Type is Sync, the plugin first checks the status of the previous execution in SAP:
    • If there is no previous execution, the current execution runs in Full (F) mode.
    • If the previous execution type was Full (F):
      • If that execution completed successfully, the current execution runs in Delta (D) mode.
      • Otherwise, the current execution runs in Full (F) mode, which recovers the data that previously ended in error.
    • If the previous execution type was Delta (D) or Recovery (R):
      • If that execution completed successfully, the current execution runs in Delta (D) mode.
      • Otherwise, the current execution runs in Recovery (R) mode, which recovers the previous delta data that ended in error.

Multiple pipeline extraction from the same DataSource

This feature is not currently supported. Only one pipeline can extract data from one DataSource at a time.

Recommended: Keep the SAP ODP Subscriber Name field blank and do not run multiple pipelines extracting data from the same DataSource.

Cloud Data Fusion resource settings

Recommended: Use 1 CPU and 4 GB of memory per Executor (this value applies to each Executor process). Set these in the Configure > Resources dialog.

Dataproc cluster settings

Recommended: At a minimum, allocate a total number of CPUs (across all workers) greater than the intended number of splits (see Plugin configuration). For example, if you have 16 splits, define 20 or more CPUs in total across all workers (there is an overhead of 4 CPUs used for coordination).

Recommended: Use a persistent Dataproc cluster to reduce the data pipeline runtime (this eliminates the Provisioning step which might take a few minutes or more). Set this in the Compute Engine configuration section.

Sample configurations and throughput

Sample development configurations:

  • Dataproc cluster with 8 workers, each with 4 CPUs and 26 GB memory. Use Number of Splits up to 28.
  • Dataproc cluster with 2 workers, each with 8 CPUs and 52 GB memory. Use Number of Splits up to 12.

Sample Production configurations and throughput:

  • Dataproc cluster with 8 workers, each with 8 CPUs and 32 GB memory. Use Number of Splits up to 32 (half the available total CPUs).
  • Dataproc cluster with 16 workers, each with 8 CPUs and 32 GB memory. Use Number of Splits up to 64 (half of the available CPUs).

The following table shows the sample throughput for an SAP S/4HANA 1909 production source system.

DataSource name Number of columns Clusters Package size Splits Extract type Extracted records Throughput
2LIS_11_VAITM 127 16 workers Default (50 MB) 0 Full 43 M 38.35 GB/Hour
2LIS_11_VAITM 127 16 workers 10 MB 64 Full 43 M 36.78 GB/Hour
0FI_GL_14 232 16 workers 100 MB 64 Full 306 M 22.92 GB/Hour
0FI_GL_4 89 8 workers Default (50 MB) 0 Full 303 M 35.90 GB/Hour

Support details

Supported SAP products and versions

Supported sources:

  • SAP S/4HANA 1909 and later.
  • SAP ERP6 NW 7.31 SP16 and later. Import SAP note 2232584 to enable additional DataSources on the system.
  • SAP ERP systems based on NW 7.31 SP16 or later.

Supported SAP deployment models

The plugin was tested with SAP servers deployed on Google Cloud.

Supported SAP DataSources for extraction

The plugin supports the following DataSource types:

  • Transaction data
  • Master data
  • Attributes
  • Texts
  • Hierarchies

Required SAP notes

If you need to enable additional DataSources, implement the following note for ERP6 systems: 2232584: Release of SAP extractors for ODP replication (ODP SAPI). This external site requires an SAP login.

Active background jobs in SAP when the Cloud Data Fusion pipeline has an error

If the Cloud Data Fusion pipeline fails, for example because of an error in the sink, the ODP plugin attempts to clean up any active SAP-side processes related to the extraction by calling the custom cleanup RFM, /GOOG/ODP_REPL_CLEANUP. When there are no errors, the plugin calls the standard RFM that closes the queue, RODPS_REPL_ODP_CLOSE.

Limits on the volume of data or record width

There is no defined limit to the number of rows extracted or the size of the DataSource. We have tested with up to 306 million rows extracted in one pipeline run, with a record width of 1 KB.

Expected plugin throughput

For an environment configured according to the guidelines in the Optimize performance section, the plugin can extract around 38 GB per hour. Actual performance might vary with the Cloud Data Fusion and SAP system load or network traffic.

What's next