This page describes best practices for improving performance when you use a Salesforce batch source in Cloud Data Fusion.
Improve performance with PK chunking
PK chunking breaks up large datasets into smaller datasets, or chunks.
Enabling PK chunking in the Salesforce batch source plugin has the following benefits:
- It improves performance, especially for large datasets
- It reduces the load on the server
- It increases scalability
To use PK chunking, follow these steps:
- Go to the Cloud Data Fusion web interface and open your pipeline on the Studio page.
- Optional: If you haven't added a Salesforce node to your pipeline, add one. In the Source menu, click Salesforce. The Salesforce node appears in your pipeline. If you don't see the Salesforce source on the Studio page, deploy the Salesforce plugins from the Cloud Data Fusion Hub.
- To configure the source, go to the Salesforce node and click Properties.
- Turn on Enable PK chunking.
- In the Chunk size field, enter the number of records per chunk. The default value is 100000 records. The maximum is 250000 records.
- Click Validate.
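To get a feel for how the Chunk size value affects the number of chunks, the following minimal Python sketch estimates a chunk count from a record count. It is only an approximation: the function name `estimate_chunk_count` is hypothetical and not part of the plugin, and it assumes records divide evenly by count, whereas PK chunking splits on record ID ranges, so real chunk counts can differ. The default and maximum values are the ones stated in the steps above.

```python
import math

DEFAULT_CHUNK_SIZE = 100_000  # default chunk size stated in the steps above
MAX_CHUNK_SIZE = 250_000      # maximum chunk size stated in the steps above


def estimate_chunk_count(total_records: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Roughly estimate how many chunks a dataset is split into.

    Hypothetical helper for illustration only. Actual PK chunking
    splits on record ID ranges, so real counts may be different.
    """
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
    # Round up so that a partial final chunk is counted.
    return math.ceil(total_records / chunk_size)


if __name__ == "__main__":
    for size in (DEFAULT_CHUNK_SIZE, MAX_CHUNK_SIZE):
        print(size, estimate_chunk_count(1_000_000, size))
```

Under these assumptions, a larger chunk size means fewer, larger chunks, and a smaller chunk size means more, smaller chunks. Which setting performs better depends on your data and pipeline, so treat the estimate as a starting point for testing.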
Use SObject query filters or SOQL queries
To reduce the number of API calls in Salesforce, retrieve records with SObject query filters or SOQL queries.
- SObject query filters: Configure the filter in the Salesforce plugin properties in the SObject name field. For more information, see Configure the plugin.
- SOQL queries: Configure the queries in the Salesforce plugin properties in the SOQL query field. For more information, see SOQL queries for the Salesforce source.
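As an illustration of a SOQL query that narrows the records retrieved, the following Python sketch assembles a query string that you could paste into the SOQL query field. The helper `build_soql` is hypothetical and not part of the plugin. The example uses the standard `Account` object with its `Id`, `Name`, and `LastModifiedDate` fields and the SOQL date literal `LAST_N_DAYS:30`; substitute the objects and fields from your own Salesforce org.

```python
from typing import Optional, Sequence


def build_soql(sobject: str, fields: Sequence[str], where: Optional[str] = None) -> str:
    """Assemble a simple SOQL query string (hypothetical helper).

    Selecting only the fields you need and adding a WHERE clause
    limits how much data the source has to retrieve.
    """
    if not fields:
        raise ValueError("at least one field is required")
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if where:
        query += f" WHERE {where}"
    return query


if __name__ == "__main__":
    # Example: accounts modified within the last 30 days.
    print(build_soql("Account", ["Id", "Name"], "LastModifiedDate >= LAST_N_DAYS:30"))
```

The helper does no escaping or validation of its inputs, so use it only with values you control.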
What's next
- Learn about configuring the Salesforce batch source in Cloud Data Fusion.
- Work through a Salesforce plugin tutorial.