This page provides an overview of querying data stored outside of BigQuery.
An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source.
BigQuery offers support for querying data directly from:
Use cases for external data sources include:
- Loading and cleaning your data in one pass by querying the data from an external data source (a location external to BigQuery) and writing the cleaned result into BigQuery storage.
- Having a small amount of frequently changing data that you join with other tables. As an external data source, the frequently changing data does not need to be reloaded every time it is updated.
External data source limitations
External data source limitations include the following:
- BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
- Query performance for external data sources may not be as high as querying data in a native BigQuery table. If query speed is a priority, load the data into BigQuery instead of setting up an external data source. The performance of a query that includes an external data source depends on the external storage type. For example, querying data stored in Google Cloud Storage is faster than querying data stored in Google Drive. In general, query performance for external data sources should be equivalent to reading the data directly from the external storage.
You cannot use the
TableDataListJSON API method to retrieve data from tables that reside in an external data source. For more information, see Tabledata: list.
To work around this limitation, you can save query results in a destination table. You can then use the
TableDataListmethod on the results table.
You cannot run a BigQuery job that exports data from an external data source.
To work around this limitation, you can save query results in a destination table. You can then run an export job against the results table.
You cannot reference an external data source in a wildcard table query.
- Currently, you cannot query external data stored in Parquet or ORC format.
- When you query an external data source, the results are not cached. You are charged for each query against an external table even if you issue the same query multiple times. If you need to repeatedly issue a query against an external table that does not change frequently, consider writing the query results to a permanent table and run the queries against the permanent table instead.
- You cannot create an external table that references data stored in a region different from the dataset's location. For example, if you are creating an external table in a US-based dataset, the Cloud Storage bucket containing the data must be a regional or multi-regional bucket in the US.
- Querying data in Cloud Bigtable is currently only available in the
following regions and zones:
Region Zone(s) us-central1
- You are limited to 4 concurrent queries against a Cloud Bigtable external data source.
External data source quotas
The limits for external data sources are the same as the limits for load jobs, as described in the Load jobs section on the Quota Policy page.
External data source pricing
When querying an external data source from BigQuery, you are charged for the number of bytes read by the query. For more information, see Query pricing.
You are also charged for storing the data and any resources used by the source application, subject to the application's pricing guidelines:
- For information on Google Cloud Storage pricing, see Cloud Storage Pricing.
- For information on Google Cloud Bigtable pricing, see Pricing.
- For information on Google Drive pricing, see Pricing.