Introduction to external tables
This document describes how to work with data stored outside of BigQuery in external tables. To work with external data sources, you can also use External datasets.
Non-BigLake external tables let you query structured data in external data stores. To query a non-BigLake external table, you must have permissions to both the external table and the external data source. For example, to query a non-BigLake external table that uses a data source in Cloud Storage, you must have the following permissions:
bigquery.tables.getData
bigquery.jobs.create
storage.buckets.get
storage.objects.get
Supported data stores
You can use non-BigLake external tables with the following data stores:
Temporary table support
You can query an external data source in BigQuery by using a permanent table or a temporary table. A permanent table is a table that is created in a dataset and is linked to your external data source. Because the table is permanent, you can use access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.
When you query an external data source using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source. When you use a temporary table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.
Multiple source files
If you create a non-BigLake external table based on Cloud Storage, then you can use multiple external data sources, provided those data sources have the same schema. This isn't supported for non-BigLake external table based on Bigtable or Google Drive.
Limitations
The following limitations apply to external tables:
- BigQuery does not guarantee data consistency for external data tables. Changes to the underlying data while a query is running can result in unexpected behavior.
- Query performance for external tables might be slow compared to querying data in a standard BigQuery table. If query speed is a priority, load the data into BigQuery instead of setting up an external data source. The performance of a query that includes an external table depends on the external storage type. For example, querying data stored in Cloud Storage is faster than querying data stored in Google Drive. In general, the query performance for an external table should be equivalent to reading the data directly from the data source.
- You cannot modify external data tables using DML or other methods. External tables are read-only for BigQuery.
- You cannot use the
TableDataList
JSON API method to retrieve data from external tables. For more information, seetabledata.list
. To work around this limitation, you can save query results in a destination table. You can then use theTableDataList
method on the results table. - You cannot run a BigQuery job that exports data from an external table. To work around this limitation, you can save query results in a destination table. Then, run an export job against the results table.
- You cannot reference an external table in a wildcard table query.
- External tables don't support clustering. They support partitioning in limited ways. For details, see Querying externally partitioned data.
- When you query an external data source other than Cloud Storage, the results are not cached. (GoogleSQL queries on Cloud Storage are supported.) You are charged for each query against an external table even if you issue the same query multiple times. If you need to repeatedly issue a query against an external table that does not change frequently, consider writing the query results to a permanent table and run the queries against the permanent table instead.
- You are limited to 16 concurrent queries against a Bigtable external data source.
- A dry run of a federated query that uses an external table might report a lower bound of 0 bytes of data, even if rows are returned. This is because the amount of data processed from the external table can't be determined until the actual query completes. Running the federated query incurs a cost for processing this data.
- You can't use
_object_metadata
as a column name in external tables. It is reserved for internal use. - BigQuery doesn't support the display of table storage statistics for external tables.
- External tables don't support flexible column names.
Location considerations
When you choose a location for your data, consider the following:
Cloud Storage
You can interact with Cloud Storage data using BigQuery in the following ways:
- Query Cloud Storage data using BigLake or non-BigLake external tables
- Load Cloud Storage data into BigQuery
- Export data from BigQuery into Cloud Storage
Query Cloud Storage data
When you query data in Cloud Storage by using a BigLake or a non-BigLake external table, the data you query must be colocated with your BigQuery dataset. For example:
Single region bucket: If your BigQuery dataset is in the Warsaw (
europe-central2
) region, the corresponding Cloud Storage bucket must also be in the Warsaw region, or any Cloud Storage dual-region that includes Warsaw. If your BigQuery dataset is in theUS
multi-region, then Cloud Storage bucket can be in theUS
multi-region, the Iowa (us-central1
) single region, or any dual-region that includes Iowa. Queries from any other single region fails, even if the bucket is in a location that is contained within the multi-region of the dataset. For example, if the external tables are in theUS
multi-region and the Cloud Storage bucket is in Oregon (us-west1
), the job fails.If your BigQuery dataset is in the
EU
multi-region, then Cloud Storage bucket can be in theEU
multi-region, the Belgium (europe-west1
) single region, or any dual-region that includes Belgium. Queries from any other single region fails, even if the bucket is in a location that is contained within the multi-region of the dataset. For example, if the external tables are in theEU
multi-region and the Cloud Storage bucket is in Warsaw (europe-central2
), the job fails.Dual-region bucket: If your BigQuery dataset is in the Tokyo (
asia-northeast1
) region, the corresponding Cloud Storage bucket must be in the Tokyo region, or in a dual-region that includes Tokyo, like theASIA1
dual-region.If the Cloud Storage bucket is in the
NAM4
dual-region or any dual-region that includes the Iowa(us-central1
) region, the corresponding BigQuery dataset can be in theUS
multi-region or in the Iowa (us-central1
).If Cloud Storage bucket is in the
EUR4
dual-region or any dual-region that includes the Belgium(europe-west1
) region, the corresponding BigQuery dataset can be in theEU
multi-region or in the Belgium(europe-west1
).Multi-region bucket: Using multi-region dataset locations with multi-region Cloud Storage buckets is not recommended for external tables, because external query performance depends on minimal latency and optimal network bandwidth.
If your BigQuery dataset is in the
US
multi-region, the corresponding Cloud Storage bucket must be in theUS
multi-region, in a dual-region that includes Iowa (us-central1
), like theNAM4
dual-region, or in a custom dual-region that includes Iowa (us-central1
).If your BigQuery dataset is in the
EU
multi-region, the corresponding Cloud Storage bucket must be in theEU
multi-region, in a dual-region that includes Belgium (europe-west1
), like theEUR4
dual-region, or in a custom dual-region that includes Belgium.
For more information about supported Cloud Storage locations, see Bucket locations in the Cloud Storage documentation.
Load data from Cloud Storage
When you load data from Cloud Storage by using a BigLake or a non-BigLake external table, the data you load must be colocated with your BigQuery dataset.
You can load data from a Cloud Storage bucket located in any location if your BigQuery dataset is located in the
US
multi-region.- Multi-region bucket: If the
Cloud Storage bucket that you want to load from is located in a multi-region bucket, then your
BigQuery dataset can be in the same multi-region bucket or any single region that is included in the same multi-region bucket.
For example, if the Cloud Storage bucket is in the
EU
region, then your BigQuery dataset can be in theEU
multi-region or any single region in theEU
. Dual-region bucket: If the Cloud Storage bucket that you want to load from is located in a dual-region bucket, then your BigQuery dataset can be located in regions that are included in the dual-region bucket, or in a multi-region that includes the dual-region. For example, if your Cloud Storage bucket is located in the
EUR4
region, then your BigQuery dataset can be located in either the Finland (europe-north1
) single-region, the Netherlands (europe-west4
) single-region, or theEU
multi-region.Single region bucket: If your Cloud Storage bucket that you want to load from is in a single-region, your BigQuery dataset can be in the same single-region, or in the multi-region that includes the single-region. For example, if you Cloud Storage bucket is in the Finland (
europe-north1
) region, your BigQuery dataset can be in the Finland or theEU
multi-region.One exception is that if your BigQuery dataset is located in the
asia-northeast1
region, then your Cloud Storage bucket can be located in theEU
multi-region.
For more information, see Batch loading data.
Export data into Cloud Storage
Colocate your Cloud Storage buckets for exporting data:- If your BigQuery dataset is in the
EU
multi-region, the Cloud Storage bucket containing the data that you export must be in the same multi-region or in a location that is contained within the multi-region. For example, if your BigQuery dataset is in theEU
multi-region, the Cloud Storage bucket can be located in theeurope-west1
Belgium region, which is within the EU.If your dataset is in the
US
multi-region, you can export data into a Cloud Storage bucket in any location. - If your dataset is in a region, your Cloud Storage bucket must be in the same region. For
example, if your dataset is in the
asia-northeast1
Tokyo region, your Cloud Storage bucket cannot be in theASIA
multi-region.
For more information, see Exporting table data.
Bigtable
When you query data in Bigtable through a BigQuery external table, your Bigtable instance must be in the same location as your BigQuery dataset:
- Single region: If your BigQuery dataset is in the Belgium
(
europe-west1
) regional location, the corresponding Bigtable instance must be in the Belgium region. - Multi-region: Because external query performance depends on minimal latency and optimal network bandwidth, using multi-region dataset locations is not recommended for external tables on Bigtable.
For more information about supported Bigtable locations, see Bigtable locations.
Google Drive
Location considerations don't apply to Google Drive external data sources.
Data management
-
Develop a data management plan:
- If you choose a regional storage resource such as a BigQuery dataset or a Cloud Storage bucket, develop a plan for geographically managing your data.
Moving data between locations
To manually move a dataset from one location to another, follow these steps:
-
Export the data from your BigQuery tables to a Cloud Storage bucket in either the same location as your dataset or in a location contained within your dataset's location. For example, if your dataset is in the
EU
multi-region location, you could export your data to theeurope-west1
Belgium location, which is part of the EU.There are no charges for exporting data from BigQuery, but you do incur charges for storing the exported data in Cloud Storage. BigQuery exports are subject to the limits on export jobs.
-
Copy or move the data from your export Cloud Storage bucket to a new bucket you created in the destination location. For example, if you are moving your data from the
US
multi-region to theasia-northeast1
Tokyo region, you would transfer the data to a bucket that you created in Tokyo. For information about transferring Cloud Storage objects, see Copy, rename, and move objects in the Cloud Storage documentation.Transferring data between regions incurs network egress charges in Cloud Storage.
-
Create a new BigQuery dataset in the new location, and then load your data from the Cloud Storage bucket into the new dataset.
You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the load jobs limits.
You can also use Cloud Composer to move and copy large datasets programmatically.
For more information about using Cloud Storage to store and move large datasets, see Use Cloud Storage with big data.
Pricing
When querying an external table from BigQuery, you are charged for running the query and the applicable bytes read if using BigQuery on-demand (per TiB) pricing or slot consumption if using BigQuery capacity (per slot-hour) pricing.
If your data is stored in ORC or Parquet on Cloud Storage, see Data size calculation.
You are also charged for storing the data and any resources used by the source application, subject to the application's pricing guidelines:
- For information on Cloud Storage pricing, see Cloud Storage Pricing.
- For information on Bigtable pricing, see Pricing.
- For information on Drive pricing, see Pricing.
What's next
- Learn how to create a Bigtable external table.
- Learn how to create a Cloud Storage external table.
- Learn how to create a Drive external table.
- Learn how to schedule and run data quality checks with Dataplex.