The BigQuery Data Transfer Service for Amazon S3 allows you to automatically schedule and manage recurring load jobs from Amazon S3 into BigQuery.
Supported file formats
The BigQuery Data Transfer Service currently supports loading data from Amazon S3 in one of the following formats:
- Comma-separated values (CSV)
- JSON (newline-delimited)
Supported compression types
The BigQuery Data Transfer Service for Amazon S3 supports loading compressed data. The compression types supported by BigQuery Data Transfer Service are the same as the compression types supported by BigQuery load jobs. For more information, see Loading compressed and uncompressed data.
Amazon S3 prerequisites
To load data from an Amazon S3 data source, you must:
- Provide the Amazon S3 URI for your source data
- Have your access key ID
- Have your secret access key
- Set, at a minimum, the AWS managed policy AmazonS3ReadOnlyAccess on your Amazon S3 source data
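The checklist above can be expressed as a small pre-flight check before you configure a transfer. This is only a sketch: the class and field names (`S3TransferPrerequisites`, `data_path`, `access_key_id`, `secret_access_key`) are hypothetical and chosen for illustration. The attached IAM policy cannot be verified locally, so the check does not cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3TransferPrerequisites:
    """Hypothetical container for the values you need before creating a transfer."""
    data_path: str          # Amazon S3 URI, for example "s3://bucket/folder1"
    access_key_id: str
    secret_access_key: str


def missing_prerequisites(prereqs: S3TransferPrerequisites) -> list[str]:
    """Return the names of prerequisites that are empty or obviously malformed."""
    problems = []
    scheme = "s3://"
    # The URI must use the s3:// scheme and name at least a bucket.
    if not prereqs.data_path.startswith(scheme) or len(prereqs.data_path) <= len(scheme):
        problems.append("data_path")
    if not prereqs.access_key_id.strip():
        problems.append("access_key_id")
    if not prereqs.secret_access_key.strip():
        problems.append("secret_access_key")
    # Note: whether AmazonS3ReadOnlyAccess is attached must be checked in AWS itself.
    return problems
```

An empty list means the locally checkable prerequisites are in place. In practice, read the keys from a secret store rather than hard-coding them.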
Amazon S3 URIs
When you supply the Amazon S3 URI, the path must be in the following format:
s3://bucket/folder1/folder2/...
Only the top-level bucket name is required. Folder names are optional. If you specify a URI that includes only the bucket name, all files in the bucket are transferred and loaded into BigQuery.
The Amazon S3 URI and the destination table can both be parameterized, allowing you to load data from Amazon S3 buckets organized by date. Note that currently, the bucket portion of the URI cannot be parameterized. The parameters used by Amazon S3 transfers are the same as those used by Cloud Storage transfers.
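As a rough illustration of the URI format and of parameterization, the sketch below splits a URI into its bucket and path, then substitutes a `{run_date}` parameter in the path only. It assumes `{run_date}` expands to the run date in UTC as YYYYMMDD, as it does for Cloud Storage transfers. The function names are hypothetical, and the service performs the real substitution itself; this only approximates the outcome.

```python
from datetime import datetime, timezone


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/folder1/...' into (bucket, path); the path may be empty."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an Amazon S3 URI: {uri!r}")
    bucket, _, path = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError("the bucket name is required")
    return bucket, path


def resolve_run_date(uri: str, run_time: datetime) -> str:
    """Replace {run_date} in the path with the run date (YYYYMMDD, UTC).

    run_time is expected to be timezone-aware.
    """
    bucket, path = split_s3_uri(uri)
    # The bucket portion of the URI cannot be parameterized.
    if "{" in bucket:
        raise ValueError("the bucket portion of the URI cannot be parameterized")
    run_date = run_time.astimezone(timezone.utc).strftime("%Y%m%d")
    resolved = path.replace("{run_date}", run_date)
    return f"s3://{bucket}/{resolved}" if resolved else f"s3://{bucket}"
```

For a run on 5 March 2024 (UTC), `s3://bucket/logs/{run_date}/data.csv` would resolve to `s3://bucket/logs/20240305/data.csv` under these assumptions.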
Wildcard support for Amazon S3 URIs
If your source data is separated into multiple files that share a common base-name, you can use a wildcard in the URI when you load the data.
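To add a wildcard to the URI, append an asterisk (*) to the base-name. For example, if you have two files that share the base-name fed-sample, such as fed-sample000002.csv, the URI takes the form s3://bucket/fed-sample*, where bucket stands for your bucket name.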
You can use only one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.
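The wildcard rules above can be sketched as a validator plus a simple matcher. The function names are hypothetical, and the matcher assumes that the asterisk stands for any sequence of characters in the object name; the service's exact matching behavior may differ.

```python
def validate_wildcard_uri(uri: str) -> tuple[str, str]:
    """Return (bucket, object_pattern), or raise ValueError if a wildcard rule is broken."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an Amazon S3 URI: {uri!r}")
    bucket, _, pattern = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError("the bucket name is required")
    if "*" in bucket:
        raise ValueError("a wildcard in the bucket name is unsupported")
    if pattern.count("*") > 1:
        raise ValueError("only one wildcard is allowed in the object name")
    return bucket, pattern


def matches(pattern: str, object_name: str) -> bool:
    """Match an object name against a pattern holding at most one '*'."""
    if "*" not in pattern:
        return object_name == pattern
    head, _, tail = pattern.partition("*")
    # The name must be long enough that head and tail do not overlap.
    return (
        len(object_name) >= len(head) + len(tail)
        and object_name.startswith(head)
        and object_name.endswith(tail)
    )
```

For instance, the pattern `fed-sample*` matches `fed-sample000002.csv`, and so does `fed-*.csv`, which places the wildcard inside the object name.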
AWS access keys
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
Consistency considerations
When you transfer data from Amazon S3, it is possible that some of your data will not be transferred to BigQuery, particularly if the files were added to the bucket very recently. It should take approximately 10 minutes for a file to become available to the BigQuery Data Transfer Service after it is added to the bucket.
In some cases, however, it may take longer than 10 minutes. To reduce the possibility of missing data, schedule your Amazon S3 transfers to occur at least 10 minutes after your files are added to the bucket. For more information on the Amazon S3 consistency model, see Amazon S3 data consistency model in the Amazon S3 documentation.
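The scheduling advice above comes down to simple time arithmetic. The sketch below uses hypothetical helper names and treats the 10-minute figure as an approximate lower bound rather than a guarantee. It computes the earliest run time after the last upload and lists which objects are old enough to be picked up by a run at a given time.

```python
from datetime import datetime, timedelta, timezone

# Approximate delay before a new file becomes visible to the transfer service.
AVAILABILITY_DELAY = timedelta(minutes=10)


def earliest_safe_run(last_upload: datetime,
                      margin: timedelta = AVAILABILITY_DELAY) -> datetime:
    """Return the earliest time a transfer should start after the last upload."""
    return last_upload + margin


def likely_available(upload_times: dict[str, datetime],
                     run_time: datetime,
                     margin: timedelta = AVAILABILITY_DELAY) -> list[str]:
    """Return the object names uploaded at least `margin` before `run_time`."""
    return sorted(
        name for name, uploaded in upload_times.items()
        if run_time - uploaded >= margin
    )
```

Because availability can take longer than 10 minutes in some cases, passing a larger `margin` is a reasonable precaution when missing data would be costly.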
Pricing
For information on BigQuery Data Transfer Service pricing, see the Pricing page.
Note that costs can be incurred outside of Google by using this service. Please review the Amazon S3 pricing page for details.