Overview of Cloud Storage transfers

The BigQuery Data Transfer Service for Cloud Storage lets you schedule recurring data loads from Cloud Storage to BigQuery. The Cloud Storage path and the destination table can both be parameterized, allowing you to load data from Cloud Storage buckets organized by date.

Supported file formats

The BigQuery Data Transfer Service currently supports loading data from Cloud Storage in one of the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro
  • Parquet
  • ORC
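
These are the same formats accepted by BigQuery load jobs. For illustration, the following minimal sketch loads a CSV file with the google-cloud-bigquery Python client; the bucket, file, project, dataset, and table names are hypothetical placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Each supported format has a corresponding SourceFormat constant:
    # CSV, NEWLINE_DELIMITED_JSON, AVRO, PARQUET, and ORC.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # Skip a header row in CSV data.
        autodetect=True,      # Infer the schema from the data.
    )

    load_job = client.load_table_from_uri(
        "gs://mybucket/myfile.csv",      # Hypothetical source file.
        "my-project.mydataset.mytable",  # Hypothetical destination table.
        job_config=job_config,
    )
    load_job.result()  # Block until the load job completes.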

Supported compression types

The BigQuery Data Transfer Service for Cloud Storage supports loading compressed data. The compression types supported by BigQuery Data Transfer Service are the same as the compression types supported by BigQuery load jobs. For more information, see Loading compressed and uncompressed data.
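
For example, a gzip-compressed CSV file loads the same way as an uncompressed one; no extra job configuration is needed for a .csv.gz source. A minimal sketch with hypothetical names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Gzip-compressed CSV and JSON files can be loaded directly. Note that
    # compressed files cannot be read in parallel, so large datasets are
    # usually faster to load uncompressed.
    load_job = client.load_table_from_uri(
        "gs://mybucket/myfile.csv.gz",   # Hypothetical compressed file.
        "my-project.mydataset.mytable",  # Hypothetical destination table.
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
        ),
    )
    load_job.result()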

Cloud Storage URI

To load data from a Cloud Storage data source, you must provide the Cloud Storage URI.

The Cloud Storage URI comprises your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the URI would be gs://mybucket/myfile.csv. If your data is separated into multiple files, you can use a wildcard in the URI. For more information, see Cloud Storage Request URIs.

BigQuery Data Transfer Service does not support source URIs that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery Data Transfer Service converts multiple consecutive slashes into a single slash. For example, the following source URI, though valid in Cloud Storage, does not work in BigQuery Data Transfer Service: gs://bucket/my//object//name.
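
The following sketch illustrates this conversion; it mirrors the documented normalization, not the service's actual implementation:

    import re

    def normalize_source_uri(uri: str) -> str:
        """Collapse runs of slashes in the object name, as the BigQuery
        Data Transfer Service does for Cloud Storage source URIs."""
        rest = uri.removeprefix("gs://")
        return "gs://" + re.sub(r"/{2,}", "/", rest)

    # Valid in Cloud Storage, but rewritten before the transfer runs:
    print(normalize_source_uri("gs://bucket/my//object//name"))
    # gs://bucket/my/object/name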

To retrieve the Cloud Storage URI:

  1. Open the Cloud Storage console.

  2. Browse to the location of the object (file) that contains the source data.

  3. At the top of the Cloud Storage console, note the path to the object. To compose the URI, replace gs://bucket/file with the appropriate path, for example, gs://mybucket/myfile.json, where bucket is the Cloud Storage bucket name and file is the name of the object (file) that contains the data.

Wildcard support for Cloud Storage URIs

If your Cloud Storage data is separated into multiple files that share a common base name, you can use a wildcard in the URI when you load the data.

To add a wildcard to the Cloud Storage URI, you append an asterisk (*) to the base name. For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the URI would be gs://mybucket/fed-sample*. This wildcard URI can then be used when you set up the transfer in the web UI or with the bq command-line tool.

You can use multiple wildcards for objects (filenames) within buckets, and a wildcard can appear anywhere inside the object name.

Wildcards do not expand into subdirectories. For example, gs://bucket/dir/* matches files in the directory dir but does not match files in the subdirectory gs://bucket/dir/subdir/.

Prefixes without wildcards do not match anything beneath them either. For example, gs://bucket/dir matches neither gs://bucket/dir/file.csv nor gs://bucket/file.csv.

However, you can combine multiple wildcards to match files across subdirectories. For example, gs://bucket/dir/*/*.csv matches gs://bucket/dir/subdir/file.csv.
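
The rules above can be summarized in a small sketch that translates a wildcard URI into a regular expression in which each asterisk matches any characters except a slash; this illustrates the documented matching behavior, not the service's implementation:

    import re

    def wildcard_to_regex(pattern: str) -> re.Pattern:
        # Each '*' matches any run of characters except '/'.
        parts = (re.escape(p) for p in pattern.split("*"))
        return re.compile("[^/]*".join(parts) + r"\Z")

    def matches(pattern: str, uri: str) -> bool:
        return wildcard_to_regex(pattern).match(uri) is not None

    print(matches("gs://bucket/dir/*", "gs://bucket/dir/file.csv"))         # True
    print(matches("gs://bucket/dir/*", "gs://bucket/dir/sub/file.csv"))     # False: no subdirectories
    print(matches("gs://bucket/dir", "gs://bucket/dir/file.csv"))           # False: no wildcard
    print(matches("gs://bucket/dir/*/*.csv", "gs://bucket/dir/sub/f.csv"))  # True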

For examples of wildcard support in combination with parameterized table names, see Using runtime parameters in transfers.
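
As a combined illustration, the following sketch creates a recurring transfer whose data path uses a wildcard and whose destination table name is parameterized with {run_date}. It uses the google-cloud-bigquery-datatransfer Python client; the project, bucket, dataset, schedule, and params values shown are hypothetical placeholders to adapt to your own setup:

    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="mydataset",   # Hypothetical dataset.
        display_name="Daily fed-sample load",
        data_source_id="google_cloud_storage",
        params={
            # Wildcard source path, as described above (hypothetical bucket).
            "data_path_template": "gs://mybucket/fed-sample*.csv",
            # {run_date} is substituted at run time, e.g. mytable_20180915.
            "destination_table_name_template": "mytable_{run_date}",
            "file_format": "CSV",
            "skip_leading_rows": "1",
            # Uncomment to delete source files after a successful transfer:
            # "delete_source_files": "true",
        },
        schedule="every 24 hours",
    )

    transfer_config = client.create_transfer_config(
        parent=client.common_project_path("my-project"),  # Hypothetical project.
        transfer_config=transfer_config,
    )
    print("Created transfer config:", transfer_config.name)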

Location considerations

When you choose a location for your data, consider the following:

  • Colocate your Cloud Storage buckets for loading data (a colocation check is sketched after this list).
    • If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're loading must be a regional or multi-regional bucket in the same location. For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be a regional or multi-regional bucket in the EU.
    • If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • Exception: If your dataset is in the US multi-regional location, you can load data from a Cloud Storage bucket in any regional or multi-regional location.
  • Develop a data management plan.
    • If you choose a regional storage resource such as a BigQuery dataset or a Cloud Storage bucket, develop a plan for geographically managing your data.
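
A minimal colocation check using the google-cloud-bigquery and google-cloud-storage Python clients; the project, dataset, and bucket names are hypothetical, and the US multi-region exception described above is handled explicitly:

    from google.cloud import bigquery, storage

    bq = bigquery.Client()
    gcs = storage.Client()

    dataset = bq.get_dataset("my-project.mydataset")  # Hypothetical dataset.
    bucket = gcs.get_bucket("mybucket")               # Hypothetical bucket.

    # Location strings differ in case (e.g. "EU" vs. "eu", or
    # "asia-northeast1" vs. "ASIA-NORTHEAST1"), so compare case-insensitively.
    colocated = dataset.location.lower() == bucket.location.lower()

    # Exception: a dataset in the US multi-region can load from any location.
    if dataset.location.upper() == "US":
        colocated = True

    print(f"dataset={dataset.location} bucket={bucket.location} ok={colocated}")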

For more information about Cloud Storage locations, see Bucket locations in the Cloud Storage documentation.

For more information about using Cloud Storage to store and move large datasets, see Using Cloud Storage with big data.

Pricing

  • Standard BigQuery Quotas & limits on load jobs apply.

  • After data is transferred to BigQuery, standard BigQuery storage and query pricing applies.

  • Data is not automatically deleted from your Cloud Storage bucket after it is uploaded to BigQuery, unless you enable deletion when you set up the transfer. See Setting up a Cloud Storage transfer.

  • See the BigQuery Data Transfer Service pricing page for details.

Quotas and limits

The BigQuery Data Transfer Service uses load jobs to load Cloud Storage data into BigQuery.

All BigQuery Quotas and limits on load jobs apply to recurring Cloud Storage load jobs.

What's next

  • Learn how to set up a recurring transfer in Setting up a Cloud Storage transfer.
