This page provides an overview of loading data from Cloud Storage into BigQuery.
When you load data from Cloud Storage into BigQuery, your data can be in any of the following formats:
- Comma-separated values (CSV)
- JSON (newline-delimited)
- Datastore exports
- Firestore exports
Recurring loads from Cloud Storage into BigQuery are supported by the BigQuery Data Transfer Service.
BigQuery supports loading data from any of the following Cloud Storage storage classes:
When you choose a location for your data, consider the following:
- Colocate your Cloud Storage buckets for loading data.
- If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're loading must be in a regional or multi-regional bucket in the same location. For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be in a regional or multi-regional bucket in the EU.
- If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
- Exception: If your dataset is in the US multi-regional location, you can load data from a Cloud Storage bucket in any regional or multi-regional location.
- Develop a data management plan.
- If you choose a regional storage resource such as a BigQuery dataset or a Cloud Storage bucket, develop a plan for geographically managing your data.
For more information on Cloud Storage locations, see Bucket locations in the Cloud Storage documentation.
Moving BigQuery data between locations
You cannot change the location of a dataset after it is created, but you can make a copy of the dataset. You cannot move a dataset from one location to another, but you can manually move (recreate) a dataset.
To see steps for copying a dataset, including across regions, see Copying datasets.
Moving a dataset
To manually move a dataset from one location to another, follow this process:
Export the data from your BigQuery tables to a regional or multi-region Cloud Storage bucket in the same location as your dataset. For example, if your dataset is in the EU multi-region location, export your data into a regional or multi-region bucket in the EU.
Copy or move the data from your Cloud Storage bucket to a regional or multi-region bucket in the new location. For example, if you are moving your data from the US multi-region location to the Tokyo regional location, you would transfer the data to a regional bucket in Tokyo. For information on transferring Cloud Storage objects, see Renaming, copying, and moving objects in the Cloud Storage documentation.
Note that transferring data between regions incurs network egress charges in Cloud Storage.
After you transfer the data to a Cloud Storage bucket in the new location, create a new BigQuery dataset (in the new location). Then, load your data from the Cloud Storage bucket into BigQuery.
You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.
You can also use Cloud Composer to move and copy large datasets programmatically.
For more information on using Cloud Storage to store and move large datasets, see Using Cloud Storage with big data.
You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:
- If your dataset's location is set to a value other than
US, the regional or multi-regional Cloud Storage bucket must be in the same region as the dataset.
- BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
Depending on the format of your Cloud Storage source data, there may be additional limitations. For more information, see:
- CSV limitations
- JSON limitations
- Datastore export limitations
- Firestore export limitations
- Limitations on nested and repeated data
Retrieving the Cloud Storage URI
To load data from a Cloud Storage data source, you must provide the Cloud Storage URI.
The Cloud Storage URI comprises your bucket name and your object (filename).
For example, if the Cloud Storage bucket is named
mybucket and the data
file is named
myfile.csv, the bucket URI would be
If your data is separated into multiple files you can use a
wildcard in the URI. For more information, see Cloud Storage
BigQuery does not support source URIs that include multiple
consecutive slashes after the initial double slash. Cloud Storage object
names can contain multiple consecutive slash ("/") characters. However,
BigQuery converts multiple consecutive slashes into a single
slash. For example, the following source URI, though valid in Cloud Storage,
does not work in BigQuery:
To retrieve the Cloud Storage URI:
Open the Cloud Storage console.
Browse to the location of the object (file) that contains the source data.
At the top of the Cloud Storage console, note the path to the object. To compose the URI, replace
gs://bucket/filewith the appropriate path, for example,
gs://mybucket/myfile.json. bucket is the Cloud Storage bucket name and file is the name of the object (file) containing the data.
When you load data into BigQuery, you need permissions to run a load job and permissions that let you load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need permissions to access to the bucket that contains your data.
At a minimum, the following permissions are required to load data into BigQuery. These permissions are required if you are loading data into a new table or partition, or if you are appending or overwriting a table or partition.
The following predefined IAM roles include both
The following predefined IAM roles include
In addition, if a user has
bigquery.datasets.create permissions, when that
user creates a dataset, they are granted
bigquery.dataOwner access to it.
bigquery.dataOwner access lets the user create and
update tables in the dataset by using a load job.
For more information on IAM roles and permissions in BigQuery, see Access control.
Cloud Storage permissions
To load data from a Cloud Storage bucket, you must be granted
storage.objects.get permissions. If you are using a URI wildcard,
you must also have
The predefined IAM role
can be granted to provide both
Cloud Storage access and storage logs
Cloud Storage provides access and storage log files in CSV format, which can be directly imported into BigQuery for analysis. For more information on loading and analyzing Cloud Storage logs, see Access logs & storage logs in the Cloud Storage documentation.
Wildcard support for Cloud Storage URIs
If your Cloud Storage data is separated into multiple files that share a common base-name, you can use a wildcard in the URI when you load the data.
To add a wildcard to the Cloud Storage URI, you append an asterisk (
the base-name. For example, if you have two files named
fed-sample000002.csv, the bucket URI is
You can then use this wildcard URI in the Cloud Console, the classic web UI,
bq command-line tool, the API, or the client libraries.
You can use only one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.
For Google Datastore exports, only one URI can be specified, and it
must end with
The asterisk wildcard character isn't allowed when you do the following:
- Create external tables linked to Datastore or Firestore exports.
- Load Datastore or Firestore export data from Cloud Storage.
To learn how to load data from Cloud Storage into BigQuery, see the documentation for your data format:
To learn about recurring loads from Cloud Storage into BigQuery, see Cloud Storage transfers.