Create an external table for Azure Storage data
BigQuery Omni does not manage data stored in Azure Storage. To access Azure Storage data, define an external table.
Create a dataset
To create a dataset, use the Google Cloud console, the bq command-line tool, or the client library:
Google Cloud console
In the Google Cloud console, open the BigQuery page.
In the navigation panel, in the Resources section, select your project.
In the details panel, click Create dataset.
On the Create dataset page, enter the following information:
- For Dataset ID, enter a unique dataset name.
- For Data location, choose azure-eastus2.
- For Default table expiration, choose one of the following options:
- Never: (Default) Tables that you create in the dataset are never automatically deleted. You must delete them manually.
- Number of days after table creation: This value determines when a newly created table in the dataset is deleted. This value is applied if you do not set a table expiration when the table is created. Data in Azure is not deleted when the table expires.
Click Create dataset.
bq
To create a new dataset, use the bq mk command with the --location flag. Optional parameters include --default_table_expiration and --description.
To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET_NAME.
bq --location=LOCATION mk \
    --dataset \
    --default_table_expiration INTEGER1 \
    --description DESCRIPTION \
    PROJECT_ID:DATASET_NAME
Replace the following:
LOCATION: The dataset's location. Replace with azure-eastus2. After you create a dataset, you can't change its location. You can set a default value for the location by using the .bigqueryrc file.
INTEGER1: The default lifetime (in seconds) for newly created tables. The minimum value is 3600 (1 hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deleted INTEGER1 seconds after the table's creation time. If you do not set a table expiration when you create the table, then this value is applied.
DESCRIPTION: A description of the dataset in quotation marks.
PROJECT_ID: Your project ID.
DATASET_NAME: The name of the dataset that you're creating.
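For example, the following command creates a dataset in the azure-eastus2 location with a 10-hour default table expiration. The project ID, dataset name, and description shown here are placeholders:
bq --location=azure-eastus2 mk \
    --dataset \
    --default_table_expiration 36000 \
    --description "Azure Storage external tables" \
    my_project:my_azure_dataset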
Create an external table
Required permissions
To create an external table in BigQuery, you need the following Identity and Access Management (IAM) permissions:
bigquery.tables.create
bigquery.tables.getData
bigquery.jobs.create
bigquery.connections.delegate
The predefined IAM role roles/bigquery.admin includes the permissions that you need to create an external table. For more information, see Access control with IAM.
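For example, one way to grant that role on a project is with the Google Cloud CLI; the project ID and principal shown here are placeholders:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:user@example.com" \
    --role="roles/bigquery.admin"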
Creating an external table
To create an external table, follow these steps:
Google Cloud console
In the Google Cloud console, open the BigQuery page.
In the navigation panel, in the Resources section, expand your project and select a dataset.
Expand the Actions option and click Open.
In the details panel, click Create table.
On the Create table page, in the Source section, do the following:
- For Create table from, select Azure Blob Storage.
- For Select Azure Blob Storage path, enter an Azure Blob Storage path using the following format:
azure://AZURE_STORAGE_ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/FILE_PATH
Replace the following:
AZURE_STORAGE_ACCOUNT_NAME: The name of the Azure Storage account. The account's region should be the same as the dataset's region.
CONTAINER_NAME: The name of the Azure container.
FILE_PATH: The data path that points to the Azure Storage data. For example, for a single CSV file, FILE_PATH can be myfile.csv.
- For File format, select the data format in Azure. Supported formats are AVRO, PARQUET, ORC, CSV, and NEWLINE_DELIMITED_JSON.
Optional: If the data in Azure Blob Storage uses hive partitioning, then do the following:
- Select the Source data partitioning checkbox.
- In the Select Azure Blob Storage path field, enter the Azure Blob Storage path using wildcard format. For example, azure://account_name.blob.core.windows.net/container/*.
- In the Select Source URI Prefix field, enter the URI prefix. The URI prefix is the part of the URI that appears before the partition encoding. For example, azure://account_name.blob.core.windows.net/container/.
- Select a Partition inference mode. If you select Provide my own, then enter schema information for the partition keys.
On the Create table page, in the Destination section, do the following:
- For Dataset name, choose the appropriate dataset.
- In the Table name field, enter the name of the table that you're creating.
- Verify that Table type is set to External table.
- For Connection ID, choose the appropriate connection ID from the dropdown.
Click Create table.
SQL
To create an external table, use the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE DATASET_NAME.TABLE_NAME
WITH CONNECTION `AZURE_LOCATION.CONNECTION_NAME`
OPTIONS(
    format="DATA_FORMAT",
    uris=["abs://AZURE_STORAGE_ACCOUNT_NAME/CONTAINER_NAME/FILE_PATH"]
)
Replace the following:
DATASET_NAME: the name of the dataset you created.
TABLE_NAME: the name you want to give to this table.
AZURE_LOCATION: an Azure location in Google Cloud (for example, azure-eastus2).
CONNECTION_NAME: the name of the connection you created.
DATA_FORMAT: any of the supported BigQuery federated formats (such as AVRO or CSV).
AZURE_STORAGE_ACCOUNT_NAME: the name of the Azure Storage account.
CONTAINER_NAME: the name of the Azure container.
FILE_PATH: the data path that points to the Azure Storage data.
Example:
CREATE EXTERNAL TABLE absdataset.abstable
WITH CONNECTION `azure-eastus2.abs-read-conn`
OPTIONS(
    format="CSV",
    uris=["abs://account/container/path/file.csv"]
)
bq
Create a table definition file:
bq mkdef \
    --source_format=DATA_FORMAT \
    --connection_id=AZURE_LOCATION.CONNECTION_NAME \
    "azure://AZURE_STORAGE_ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/FILE_PATH" > table_def
Replace the following:
DATA_FORMAT: any of the supported BigQuery federated formats (such as AVRO, CSV, or PARQUET). Make sure that the format is written in all caps.
AZURE_LOCATION: an Azure location in Google Cloud (for example, azure-eastus2).
CONNECTION_NAME: the name of the connection that you created.
AZURE_STORAGE_ACCOUNT_NAME: the name of the Azure Storage account.
CONTAINER_NAME: the name of the Azure container.
FILE_PATH: the data path that points to the Azure Storage data.
Next, create the external table:
bq mk --external_table_definition=table_def DATASET_NAME.TABLE_NAME
Replace the following:
DATASET_NAME: the name of the dataset that you created.
TABLE_NAME: the name that you want to give to this table.
For example, the following commands create a new external table, my_dataset.my_table, which can query your Azure Storage data that's stored at the path azure://account_name.blob.core.windows.net/container/path and has a read connection in the location azure-eastus2:
bq mkdef \
    --source_format=AVRO \
    --connection_id=azure-eastus2.read-conn \
    "azure://account_name.blob.core.windows.net/container/path" > table_def

bq mk \
    --external_table_definition=table_def my_dataset.my_table
Hive partitioning
If the data in Azure Blob Storage uses hive partitioning, then create the table definition file as follows:
bq mkdef \
    --source_format=DATA_FORMAT \
    --connection_id=AZURE_LOCATION.CONNECTION_NAME \
    --hive_partitioning_mode=HIVE_PARTITIONING_MODE \
    --hive_partitioning_source_uri_prefix=HIVE_PARTITIONING_URI_PREFIX \
    "azure://AZURE_STORAGE_ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/FILE_PATH" > table_def
Replace the following:
HIVE_PARTITIONING_MODE: The partition schema detection mode.
HIVE_PARTITIONING_URI_PREFIX: The URI prefix, which is the part of the URI that appears before the partition encoding. For example, azure://account_name.blob.core.windows.net/container/.
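For example, the following command (with placeholder account, container, and connection names) builds a table definition for Hive-partitioned Parquet data and lets BigQuery detect the partition key types automatically:
bq mkdef \
    --source_format=PARQUET \
    --connection_id=azure-eastus2.abs-read-conn \
    --hive_partitioning_mode=AUTO \
    --hive_partitioning_source_uri_prefix=azure://account_name.blob.core.windows.net/container/ \
    "azure://account_name.blob.core.windows.net/container/*" > table_def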
Query the table
BigQuery Omni lets you query the external table like any BigQuery table. The maximum result size for interactive queries is 2 MB. If your query result is larger, consider exporting it to Azure Storage. The query result is stored in a BigQuery anonymous table.
To run a query, use the Google Cloud console or the bq command-line tool.
Google Cloud console
In the Google Cloud console, open the BigQuery page.
Enter a standard SQL query in the Query editor box. Standard SQL is the default syntax in the Google Cloud console.
SELECT * FROM DATASET_NAME.TABLE_NAME
Replace the following:
DATASET_NAME: the name of the dataset that you created.
TABLE_NAME: the name of the external table that you created.
Click Run.
bq
Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag.
To run the query, enter the following:
bq query --nouse_legacy_sql \
    "SELECT * FROM DATASET_NAME.TABLE_NAME"
Replace the following:
DATASET_NAME: the name of the dataset that you created.
TABLE_NAME: the name of the external table that you created.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
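The following is a minimal sketch that uses the BigQuery client library for Java to create an external table over a CSV file in Azure Blob Storage. The dataset, table, connection ID, and URI values are placeholders for illustration:
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class CreateAzureExternalTable {
  public static void main(String[] args) {
    // Placeholder values; replace with your dataset, table, connection, and path.
    String datasetName = "absdataset";
    String tableName = "abstable";
    // Connection in the form project.location.connection_id (placeholder values).
    String connectionId = "my-project.azure-eastus2.abs-read-conn";
    String sourceUri = "azure://account_name.blob.core.windows.net/container/path/file.csv";

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Describe the external data: CSV format, schema autodetected, read through the connection.
    CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();
    ExternalTableDefinition externalTable =
        ExternalTableDefinition.newBuilder(sourceUri, csvOptions)
            .setConnectionId(connectionId)
            .setAutodetect(true)
            .build();

    // Create the external table in the dataset that you created earlier.
    bigquery.create(TableInfo.of(TableId.of(datasetName, tableName), externalTable));
    System.out.println("External table " + datasetName + "." + tableName + " created.");
  }
}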
View resource metadata with INFORMATION_SCHEMA
When you query the JOBS_BY_*, JOBS_TIMELINE_BY_*, and RESERVATION* BigQuery INFORMATION_SCHEMA views, you must specify the query's processing location as us-east4 (Northern Virginia), which is colocated with the table's region. The job fails if the location is not specified. For all other system tables, specifying the query job location is optional.
For information about the system tables that BigQuery Omni supports, see Limitations.
To query JOBS_* and RESERVATION* system tables, specify the processing location as follows:
Console
Go to the BigQuery page in the Google Cloud console.
If the Editor tab isn't visible, then click Compose new query.
Click More > Query settings. The Query settings dialog opens.
In the Query settings dialog, for Additional settings > Data location, select us-east4 (Northern Virginia).
Select the remaining fields and click Save.
bq
Use the --location flag to set the job's processing location to us-east4.
Example
bq query --use_legacy_sql=false --location=us-east4 \
    "SELECT * FROM region-azure-eastus2.INFORMATION_SCHEMA.JOBS limit 10;"
API
If you are running jobs programmatically, ensure that you set the location argument to us-east4.
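For example, with the BigQuery client library for Java you can set the job location when querying the system tables. This is a sketch; the query and result handling are illustrative:
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class QueryOmniJobsMetadata {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT * FROM `region-azure-eastus2`.INFORMATION_SCHEMA.JOBS LIMIT 10")
            .build();

    // Run the metadata query in us-east4, which is colocated with the azure-eastus2 tables.
    JobId jobId = JobId.newBuilder().setLocation("us-east4").build();
    TableResult results = bigquery.query(queryConfig, jobId);

    results.iterateAll().forEach(row -> System.out.println(row));
  }
}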
What's next
- To learn about saving query results, see Export query results to Azure Storage.