This page provides an overview of loading data from Cloud Storage into BigQuery.
When you load data from Google Cloud Storage into BigQuery, your data can be in any of the following formats:
- Comma-separated values (CSV)
- JSON (newline-delimited)
- Parquet (Beta)
- Google Cloud Datastore backups
BigQuery supports loading data from any of the following Cloud Storage storage classes:
- Multi-Regional
- Regional
- Nearline
- Coldline
Retrieving the Google Cloud Storage URI
To load data from a Google Cloud Storage data source, you must provide the Cloud Storage URI.
The Cloud Storage URI comprises your bucket name and your object
(filename). For example, if the Cloud Storage bucket is named
mybucket and the
data file is named
myfile.csv, the bucket URI would be
gs://mybucket/myfile.csv. If your data is separated into multiple files you
can use a wildcard in the URI. For more information, see
Cloud Storage Request URIs.
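As a minimal sketch, composing a Cloud Storage URI from a bucket name and an object name can be expressed like this (using the example names from this page):

```python
def gcs_uri(bucket: str, obj: str) -> str:
    """Compose a gs:// URI from a bucket name and an object (file) name."""
    return f"gs://{bucket}/{obj}"

# The example from this page: bucket "mybucket", object "myfile.csv".
print(gcs_uri("mybucket", "myfile.csv"))  # gs://mybucket/myfile.csv
```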
BigQuery does not support source URIs that include multiple consecutive
slashes after the initial double slash. Cloud Storage object names can contain
multiple consecutive slash ("/") characters; however, BigQuery converts multiple
consecutive slashes into a single slash, so a source URI that contains them,
though valid in Cloud Storage, does not work in BigQuery.
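The slash-collapsing behavior described above can be illustrated with a small sketch (this mimics the described behavior for illustration; it is not BigQuery's actual implementation):

```python
import re

def collapse_slashes(uri: str) -> str:
    """Illustrate how consecutive slashes in the object name collapse
    into a single slash (the gs:// scheme's double slash is preserved)."""
    scheme, _, rest = uri.partition("://")
    return scheme + "://" + re.sub(r"/+", "/", rest)

# An object name with consecutive slashes is valid in Cloud Storage,
# but BigQuery treats the URI as if the slashes were collapsed:
print(collapse_slashes("gs://mybucket/my//file.csv"))  # gs://mybucket/my/file.csv
```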
To retrieve the Cloud Storage URI:
1. Open the Cloud Storage web UI.
2. Browse to the location of the object (file) that contains the source data.
3. At the top of the Cloud Storage web UI, note the path to the object. To compose the URI, replace gs://[BUCKET]/[FILE] with the appropriate path, for example, gs://mybucket/myfile.csv, where [BUCKET] is the Cloud Storage bucket name and [FILE] is the name of the object (file) containing the data.
When you load data into BigQuery, you need project-level or dataset-level permissions that allow you to load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need access to the bucket that contains your data.
When you are loading data into BigQuery from Cloud Storage, you
must be granted the
bigquery.dataEditor role at the
project level or at the dataset level. This role grants users and groups
permission to load data into a new table or to append to or overwrite an existing table.
Granting the role at the project level gives the user or group permission to load data into tables in every dataset in the project. Granting the role at the dataset level gives the user or group the ability to load data only into tables in that dataset.
Cloud Storage permissions
In order to load data from a Cloud Storage bucket, you must be granted
storage.objects.get permissions at the project level or on that individual
bucket. If you are using a URI wildcard, you must also have
storage.objects.list permissions. The predefined IAM role
storage.objectViewer can be granted to provide both
storage.objects.get and storage.objects.list permissions.
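As an illustrative sketch only (this mapping is hand-written for this page, not fetched from the IAM API), the relationship between a predefined role and the individual permissions it grants can be modeled like this:

```python
# Hand-written, illustrative subset of Cloud Storage IAM data; the real
# role/permission catalog comes from the IAM service, not from this table.
ROLE_PERMISSIONS = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
}

def role_grants(role: str, permission: str) -> bool:
    """Return True if the (illustrative) role includes the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

# Loading with a wildcard URI needs both get and list on the objects:
print(role_grants("roles/storage.objectViewer", "storage.objects.list"))  # True
```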
Cloud Storage access and storage logs
Google Cloud Storage provides access and storage log files in CSV format, which can be directly imported into BigQuery for analysis. For more information on loading and analyzing Cloud Storage logs, see Access Logs & Storage Logs in the Cloud Storage documentation.
Wildcard support for Cloud Storage URIs
If your Google Cloud Storage data is separated into multiple files that share a common base-name, you can use a wildcard in the URI when you load the data.
To add a wildcard to the Cloud Storage URI, you append an asterisk (*) to the
base-name. For example, if you have two files named
fed-sample000001.csv and
fed-sample000002.csv, the bucket URI would be
gs://mybucket/fed-sample*.
This wildcard URI can then be used in the web UI, CLI, or API.
You can use only one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.
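The matching behavior of a single trailing wildcard can be illustrated with Python's fnmatch, whose * has similar semantics (the object names are hypothetical; this only mimics the pattern matching, not the BigQuery service):

```python
from fnmatch import fnmatch

# Hypothetical object names in a bucket:
objects = ["fed-sample000001.csv", "fed-sample000002.csv", "other.csv"]

# One wildcard appended to the shared base-name selects both sample files:
pattern = "fed-sample*"
matched = [name for name in objects if fnmatch(name, pattern)]
print(matched)  # ['fed-sample000001.csv', 'fed-sample000002.csv']
```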
For Google Cloud Datastore backups, only one URI can be specified, and it must end
with .backup_info or .export_metadata. The
* wildcard character is not
allowed when creating external tables linked to Cloud Datastore backups or when
loading Cloud Datastore backup data from Cloud Storage.
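The Cloud Datastore restriction above can be captured in a small, illustrative check (a hypothetical helper written for this page, not part of any Google client library; the filenames are hypothetical too):

```python
def valid_datastore_backup_uris(uris: list[str]) -> bool:
    """Check the rules stated above for Cloud Datastore backups:
    exactly one source URI, with no '*' wildcard anywhere in it."""
    return len(uris) == 1 and "*" not in uris[0]

print(valid_datastore_backup_uris(["gs://mybucket/backup.backup_info"]))  # True
print(valid_datastore_backup_uris(["gs://mybucket/backup*"]))             # False
```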
To learn how to load data from Cloud Storage into BigQuery, see the documentation for your data format.