Cross-cloud transfer from Azure Blob Storage
This document explains how to transfer data from Azure Blob Storage into a US multi-regional BigQuery standard table by using the
LOAD DATA statement.
BigQuery Omni lets you analyze data stored in Amazon S3 buckets
and Azure Blob Storage by using queries and joins. If you want to join
the data present in Amazon S3 or Azure Blob Storage with the data present in
Google Cloud regions, or if you want to use
BigQuery ML capabilities,
use the LOAD DATA SQL statement. This SQL statement lets you transfer
data from BigQuery Omni accessible Azure Blob Storage into
BigQuery standard tables.
Quotas and limits
For information about quotas and limits, see load jobs quotas and limits.
You are billed for the bytes that are transferred across clouds. Cross-cloud transfer also requires slots in the BigQuery Omni regions to scan the Amazon S3 and Azure Blob Storage files that are loaded. For pricing information, see BigQuery Omni pricing.
Before you begin
To provide Google Cloud with read access to the files that need to be processed, ensure that you have a connection to access Azure Blob Storage data.
To get the permissions that you need to run the
LOAD DATA statement,
ask your administrator to grant you the
BigQuery Data Editor (
roles/bigquery.dataEditor) IAM role on the dataset.
For more information about granting roles, see Manage access.
This predefined role contains
the permissions required to run the
LOAD DATA statement.
For more information about IAM roles in BigQuery, see Predefined roles and permissions.
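As an illustration, dataset-level access can also be granted with a SQL DCL statement; the project, dataset, and user names below are placeholders:

GRANT `roles/bigquery.dataEditor`
ON SCHEMA `myproject.mydataset`
TO "user:jane@example.com";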
- Loading data into a destination table with an ingestion-time partition is not supported.
- LOAD DATA jobs don't run on reservations. Jobs use on-demand slots that are managed by Google Cloud.
- You can only transfer data from the
AZURE_EASTUS2 BigQuery Omni region to the US multi-region.
- The connection and the destination dataset must belong to the same project. Loading data across projects is not supported.
To load data using the
LOAD DATA statement, your source data must be in one of the following formats:
- CSV (both compressed and uncompressed)
- JSON (both compressed and uncompressed)
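For example, a newline-delimited JSON file can be loaded with the same statement shape as the Parquet and CSV examples in this document; the table, container, and connection names here are placeholders:

LOAD DATA INTO `mydataset.test_json`
FROM FILES (
  uris = ['azure://test.blob.core.windows.net/container/sample.json'],
  format = 'NEWLINE_DELIMITED_JSON')
WITH CONNECTION `azure-eastus2.test-connection`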
Use the following
LOAD DATA [INTO|OVERWRITE] SQL statement
to load data into BigQuery from Azure Blob Storage:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement, replacing the variables with your values:
LOAD DATA INTO [project_name.]dataset_name.table_name
  [(column_list)]
  [PARTITION BY partition_expression]
  [CLUSTER BY clustering_column_list]
  [OPTIONS (table_option_list)]
FROM FILES (external_table_option_list)
WITH CONNECTION connection_name;
For more information about how to run queries, see Running interactive queries.
The following sample SQL query loads a Parquet file named sample.parquet into the
testparquet table with schema auto-detection:
LOAD DATA INTO `mydataset.testparquet`
FROM FILES (
  uris = ['azure://test.blob.core.windows.net/container/sample.parquet'],
  format = 'PARQUET')
WITH CONNECTION `azure-eastus2.test-connection`
The following sample SQL query loads CSV files with the prefix sampled into the
test_csv table with predefined columns, partitioned by the Time column:
LOAD DATA INTO `mydataset.test_csv` (Number INT64, Name STRING, Time DATE)
PARTITION BY Time
FROM FILES (
  format = 'CSV',
  uris = ['azure://test.blob.core.windows.net/container/sampled*'],
  skip_leading_rows = 1)
WITH CONNECTION `azure-eastus2.test-connection`
The following sample SQL query overwrites the existing
testparquet table with data from a file named
sample.parquet, with schema auto-detection:
LOAD DATA OVERWRITE `mydataset.testparquet`
FROM FILES (
  uris = ['azure://test.blob.core.windows.net/container/sample.parquet'],
  format = 'PARQUET')
WITH CONNECTION `azure-eastus2.test-connection`
- Avoid transferring multiple small files that are smaller than 5 MB. Instead, create an external table on your files and export the query result to Azure Blob Storage to create a larger file. This method helps to improve the transfer time of your data.
- If your source data is in a gzip-compressed file, then while creating the external table, set the compression option to GZIP.
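As a sketch, an external table over a gzip-compressed CSV file in Azure Blob Storage might look like the following; the table, container, and connection names are placeholders:

CREATE EXTERNAL TABLE `mydataset.sample_gzip_csv`
WITH CONNECTION `azure-eastus2.test-connection`
OPTIONS (
  uris = ['azure://test.blob.core.windows.net/container/sample.csv.gz'],
  format = 'CSV',
  compression = 'GZIP'
);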