This page provides an overview of using external tables to query data stored outside of BigQuery.
An external table is a table that acts like a standard BigQuery table. The table metadata, including the table schema, is stored in BigQuery storage, but the data itself resides in the external source.
External tables can be temporary or permanent. A permanent external table is contained inside a dataset, and you manage it in the same way that you manage a standard BigQuery table. For example, you can view the table properties, set access controls, and so forth. You can query the table and join it with other tables.
You can use external tables with the following data sources:
External table limitations
External data source limitations include the following:
- BigQuery does not guarantee data consistency for external data tables. Changes to the underlying data while a query is running can result in unexpected behavior.
- Query performance for external tables may not be as high as querying data in a native BigQuery table. If query speed is a priority, load the data into BigQuery instead of setting up an external data source. The performance of a query that includes an external table depends on the external storage type. For example, querying data stored in Cloud Storage is faster than querying data stored in Google Drive. In general, the query performance for an external table should be equivalent to reading the data directly from the data source.
- You cannot modify external data tables using DML or other methods. External tables are read-only for BigQuery.
You cannot use the
TableDataListJSON API method to retrieve data from external tables. For more information, see
To work around this limitation, you can save query results in a destination table. You can then use the
TableDataListmethod on the results table.
You cannot run a BigQuery job that exports data from an external table.
To work around this limitation, you can save query results in a destination table. You can then run an export job against the results table.
You cannot reference an external table in a wildcard table query.
External tables do not support clustering. They support partitioning in limited ways. For details, see Querying externally partitioned data.
When you query an external data source other than Cloud Storage, the results are not cached. (BigQuery standard SQL queries on Cloud Storage are supported.) You are charged for each query against an external table even if you issue the same query multiple times. If you need to repeatedly issue a query against an external table that does not change frequently, consider writing the query results to a permanent table and run the queries against the permanent table instead.
You are limited to 4 concurrent queries against a Cloud Bigtable external data source.
A dry run of a federated query that uses an external table might report a lower bound of 0 bytes of data, even if rows are returned. This is because the amount of data processed from the external table can't be determined until the actual query completes. Running the federated query will still incur a cost for processing this data.
When you choose a location for your data, consider the following:
- Colocate your BigQuery dataset when using external data sources.
- Cloud Storage: When you query data in Cloud Storage through a BigQuery external table, the data you query must be colocated with your BigQuery dataset. For example:
- Single region: If your BigQuery dataset is in the Warsaw (`europe-central2`) region, the corresponding Cloud Storage bucket must also be in the Warsaw region because there is currently no Cloud Storage dual-region that includes Warsaw.
- Dual-region: If your BigQuery dataset is in the Tokyo (`asia-northeast1`) region, the corresponding Cloud Storage bucket must be a bucket in the Tokyo region or in the `ASIA1` dual-region (which includes Tokyo).
- Multi-region: Because external query performance depends on minimal latency and optimal network bandwidth, using multi-region dataset locations with multi-region Cloud Storage buckets is not recommended for external tables.
- Cloud Bigtable: When you query data in
Cloud Bigtable through a BigQuery
external table, your
Cloud Bigtable instance must be in the same location as your BigQuery dataset.
- Single region: If your BigQuery dataset is in the Belgium (europe-west1) regional location, the corresponding Cloud Bigtable instance must be in the Belgium region.
- Multi-region: Because external query performance depends on minimal latency and optimal network bandwidth, using multi-region dataset locations is not recommended for external tables on Cloud Bigtable.
- Google Drive: Location considerations do not apply to Google Drive external data sources.
- Develop a data management plan.
- If you choose a regional storage resource such as a BigQuery dataset or a Cloud Storage bucket, develop a plan for geographically managing your data.
For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.
To manually move a dataset from one location to another, follow this process:
Export the data from your BigQuery tables to a Cloud Storage bucket in either the same location as your dataset or in a location contained within your dataset's location. For example, if your dataset is in the `EU` multi-region location, you could export your data to the `europe-west1` Belgium location, which is part of the EU.
Copy or move the data from your export Cloud Storage bucket to a new bucket you created in the destination location. For example, if you are moving your data from the `US` multi-region to the `asia-northeast1` Tokyo region, you would transfer the data to a bucket you created in Tokyo. For information on transferring Cloud Storage objects, see Copying, renaming, and moving objects in the Cloud Storage documentation.
Note that transferring data between regions incurs network egress charges in Cloud Storage.
After you transfer the data to a Cloud Storage bucket in the new location, create a new BigQuery dataset (in the new location). Then, load your data from the Cloud Storage bucket into BigQuery.
You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.
You can also use Cloud Composer to move and copy large datasets programmatically.
For more information on using Cloud Storage to store and move large datasets, see Using Cloud Storage with big data.
External data source pricing
When querying an external table from BigQuery, you are charged for running the query.
If your data is stored in ORC or Parquet on Cloud Storage, see Data size calculation.
You are also charged for storing the data and any resources used by the source application, subject to the application's pricing guidelines:
- For information on Cloud Storage pricing, see Cloud Storage Pricing.
- For information on Cloud Bigtable pricing, see Pricing.
- For information on Google Drive pricing, see Pricing.
- Learn how to query data stored in Cloud Bigtable
- Learn how to query data stored in Cloud Storage
- Learn how to query data stored in Google Drive