Third party transfers for BigQuery Data Transfer Service allow you to automatically schedule and manage recurring load jobs for external data sources such as Salesforce CRM, Adobe Analytics, and Facebook Ads.
Before you begin
Before you create a third party data transfer:
- Verify that you have completed all actions required to enable the BigQuery Data Transfer Service.
- Create a BigQuery dataset to store the data.
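A destination dataset can be created with the bq command-line tool before you configure the transfer. This is a minimal sketch that assumes the Cloud SDK is installed and authenticated; the dataset name, location, and description are placeholder values:

```shell
# Create a dataset to receive the transferred data.
# "my_transfer_dataset" and the US location are placeholders --
# substitute your own dataset ID and preferred location.
bq mk --dataset \
    --location=US \
    --description="Destination for third party transfer data" \
    my_transfer_dataset
```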
Ensure that the person creating the transfer has the following required permissions in BigQuery:
- bigquery.transfers.update permissions to create the transfer
- bigquery.datasets.update permissions on the target dataset
The bigquery.admin predefined, project-level IAM role includes bigquery.transfers.update and bigquery.datasets.update permissions. For more information on IAM roles in BigQuery, see Access control.
Consult the documentation for your third party data source to ensure you have configured any permissions necessary to enable the transfer.
Transfer run notifications are currently in alpha. If you intend to set up transfer run notifications for Cloud Pub/Sub, you must have pubsub.topics.setIamPolicy permissions. Cloud Pub/Sub permissions are not required if you set up only email notifications. For more information, see BigQuery Data Transfer Service run notifications.
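The permissions above can be granted at the project level with gcloud. This is a sketch, not a definitive recipe; PROJECT_ID and the user email are placeholder values, and you may prefer narrower, dataset-level grants in practice:

```shell
# Grant the project-level bigquery.admin role, which includes the
# bigquery.transfers.update and bigquery.datasets.update permissions.
# PROJECT_ID and the member email are placeholders.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:transfer-creator@example.com" \
    --role="roles/bigquery.admin"

# If you plan to use Cloud Pub/Sub run notifications (alpha), the same
# account also needs pubsub.topics.setIamPolicy, which is included in
# the roles/pubsub.admin role.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:transfer-creator@example.com" \
    --role="roles/pubsub.admin"
```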
Third party transfers are subject to the following limitations:
- You must create or update a third party transfer by using the BigQuery web UI in the GCP Console. Third party transfers cannot be configured by using the classic web UI.
- Currently, you cannot configure or update third party transfers by using the command-line tool.
Setting up a third party data transfer
To create a third party data transfer by using the GCP Console:
Go to the Google Cloud Platform Marketplace.
Click the appropriate third party provider.
On the documentation page for the third party provider, click Enroll. The enrollment process may take a moment.
After the enrollment is complete, click Configure Transfer.
On the Create Transfer page:
For Source, choose the appropriate third party data source. You can click Explore Data Sources to see the list of third party providers in the Google Cloud Platform Marketplace.
For Display name, enter a name for the transfer such as
My Transfer. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
For Schedule, leave the default value (Start now) or click Start at a set time.
For Repeats, choose an option for how often to run the transfer. Options include:
- Daily (default)
If you choose an option other than Daily, additional options are available. For example, if you choose Weekly, an option appears for you to select the day of the week.
For Start date and run time, enter the date and time to start the transfer. If you choose Start now, this option is disabled.
For Destination dataset, choose the dataset you created to store your data.
(Optional) In the Notification options section:
- Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
- For Select a Cloud Pub/Sub topic, choose your topic name or click Create a topic to create one. This option configures Cloud Pub/Sub run notifications for your transfer. Transfer run notifications are currently in alpha.
Click Connect Source.
When prompted, click Accept to give the BigQuery Data Transfer Service permission to connect to the data source and to manage your data in BigQuery.
Follow the instructions in the subsequent pages to configure the connection to your external data source.
After you complete the configuration steps, click Save.
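Although third party transfers cannot be created or updated from the command line, the bq tool can list existing transfer configurations and their runs, which is a quick way to confirm that the transfer you just saved is in place. The location value and the config resource name below are placeholders:

```shell
# List transfer configurations to confirm the new transfer was created.
# --transfer_location must match the transfer's region ("us" here is a
# placeholder -- use your own location).
bq ls --transfer_config --transfer_location=us

# Show runs for a specific transfer. The full resource name is printed
# by the command above; PROJECT, LOCATION, and CONFIG_ID are placeholders.
bq ls --transfer_run \
    projects/PROJECT/locations/LOCATION/transferConfigs/CONFIG_ID
```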
Troubleshooting third party transfer setup
If you are having issues setting up your transfer, consult the appropriate third party vendor. Contact information is available on the transfer's documentation page in the Google Cloud Platform Marketplace.
Querying your data
When your data is transferred to BigQuery, the data is written to ingestion-time partitioned tables. For more information, see Introduction to partitioned tables.
If you query your tables directly instead of using the auto-generated views, you must use the _PARTITIONTIME pseudo-column in your query. For more information, see Querying partitioned tables.
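As a sketch of what such a query looks like, the following uses the _PARTITIONTIME pseudo-column to restrict a direct table query to a range of ingestion-time partitions. The dataset and table names, and the date range, are placeholder values:

```shell
# Query a partitioned table directly, filtering on _PARTITIONTIME so
# only the named ingestion-time partitions are scanned.
# "my_transfer_dataset.my_table" and the dates are placeholders.
bq query --use_legacy_sql=false '
SELECT
  _PARTITIONTIME AS partition_time,
  *
FROM
  `my_transfer_dataset.my_table`
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP("2019-01-01")
                     AND TIMESTAMP("2019-01-07")
'
```

Filtering on _PARTITIONTIME also prunes partitions, so the query scans (and bills for) only the matching partitions rather than the whole table.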