Creating and Querying Federated Data Sources

This page describes how to query data stored outside of BigQuery.

A federated data source is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source. You can create a permanent federated table or a temporary table.

BigQuery offers support for querying data directly from Google Cloud Storage and Google Drive. Use cases include:

  • Loading and cleaning your data in one pass by querying the data from a federated data source (a location external to BigQuery) and writing the cleaned result into BigQuery storage, as shown in the sketch after this list.
  • Having a small amount of frequently changing data that you join with other tables. As a federated data source, the frequently changing data does not need to be reloaded every time it is updated.
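
For example, the following minimal sketch (assuming a federated table named mydataset.fed_sample, which this page shows how to create, and a hypothetical destination table mydataset.cleaned) queries the external data and writes the cleaned result into BigQuery storage in one pass:

bq query --destination_table=mydataset.cleaned 'select id, name from mydataset.fed_sample where name is not null;'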

Limitations include the following:

  • BigQuery does not guarantee data consistency for federated data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
  • If query speed is a priority, load the data into BigQuery instead of setting up a federated data source.

BigQuery supports the following data-source formats:

Google Cloud Storage:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro files
  • Google Cloud Datastore backup files

Google Drive:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro files
  • Google Sheets (first tab only)

To directly query a federated data source, provide the Cloud Storage or Google Drive URI path to your data and create an external table that references that data source. For CSV and JSON files, you can also include the table schema. If you omit the schema, BigQuery attempts to automatically detect the schema. For Avro and Cloud Datastore backup files, BigQuery reads the schema from the data source.

Before you begin

  • Identify a data file that you plan to use as your federated data source.
  • If you plan to use the bq command-line tool instead of the web UI, ensure that you have activated BigQuery and installed the Google Cloud SDK. For more information, see the Before you begin section of the command-line tool quickstart.
  • The command-line tool examples on this page assume that you have set a default project so that you can omit the project_id flag. For information on how to set a default project, see the Cloud SDK documentation for gcloud config set.

Creating federated tables

A federated table is a permanent table that references an external data source, which means that the data is not stored in BigQuery. To create a federated table, use the external table type. Because the table is permanent, you can share the table with others who also have access to the underlying external data source.

Creating a federated table using Cloud Storage

Web UI

  1. Go to the BigQuery web UI.

  2. In the navigation, hover over a dataset ID, click the down arrow icon next to the ID, and then click Create new table.

  3. Under Source data, select Google Cloud Storage for the Location, and then enter the URI to your Cloud Storage file. For example, enter gs://mybucket/yob2010.csv for a file named yob2010.csv in the mybucket Cloud Storage bucket.

  4. Choose the format of your data. Valid formats include the following:

    • Comma-separated values (CSV)
    • JSON (newline delimited)
    • Avro
    • Cloud Datastore backup
  5. Under Destination Table, input a table ID, and then select External table for the Table type field.

  6. For CSV and JSON files, input the table schema. For Cloud Datastore backups and Avro files, you can omit the table schema and let BigQuery detect the schema. Click the Submit button to complete the process.

You can then run a query against the table as if it were a native BigQuery table, subject to the limitations of federated data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Google Sheets. See Download, save, and export data for more information.

CLI

To follow along with the example below:

  1. Create a CSV file named fed-sample.csv that contains the following sample data:

    1,Alice,10,one
    2,Alex,11,one
    3,Bob,11,two
    
  2. Upload the file to a Cloud Storage bucket using the gsutil tool (replace [MY_BUCKET] with the name of a Cloud Storage bucket):

    gsutil cp fed-sample.csv gs://[MY_BUCKET]/fed-sample.csv
    

To create an external table:

  1. Create the table by running the following command with your own values for the dataset ID ([DATASET_ID]):

     bq mk --external_table_definition=[SCHEMA]@[SOURCE_FORMAT]=[CLOUD_STORAGE_URI] [DATASET_ID].[TABLE_NAME]
    

    Where [SCHEMA] is either an inline schema definition or the path to a JSON schema file, and [SOURCE_FORMAT] is one of: CSV, NEWLINE_DELIMITED_JSON, AVRO, or DATASTORE_BACKUP.

    For example, the following command uses the fed-sample.csv file to create a table named fed_sample in a dataset named mydataset:

    bq mk --external_table_definition=id:integer,name:string,num:integer,ord:string@CSV=gs://[MY_BUCKET]/fed-sample.csv mydataset.fed_sample
    

    Where [MY_BUCKET] is a valid Cloud Storage bucket name. If your schema is more complicated, you can store it in a separate file in JSON format. For example, the following JSON array represents the same schema as the inline definition for the fed-sample.csv data.

    [
      {
        "name": "id",
        "type": "integer",
        "mode": "nullable"
      },
      {
        "name": "name",
        "type": "string",
        "mode": "nullable"
      },
      {
        "name": "num",
        "type": "integer",
        "mode": "nullable"
      },
      {
        "name": "ord",
        "type": "string",
        "mode": "nullable"
      }
    ]
    

    If the JSON schema is stored in a file named fed-sample-schema.json in your /tmp directory, you can refer to that file instead of supplying the schema inline:

    bq mk --external_table_definition=/tmp/fed-sample-schema.json@CSV=gs://[MY_BUCKET]/fed-sample.csv mydataset.fed_sample
    
  2. Run a query against the table by running the following command:

    bq query 'select name from [DATASET_ID].fed_sample where name contains "Alex";'
    

    Where [DATASET_ID] is a valid dataset ID.

API

  • The sourceUris must be fully qualified, in the format gs://[BUCKET]/[OBJECT].

  • Specify the data format by setting the externalDataConfiguration.sourceFormat property. Valid values include:

    • CSV
    • NEWLINE_DELIMITED_JSON
    • AVRO
    • DATASTORE_BACKUP
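
As a minimal sketch of such a call (not a complete recipe), you can create the table with a direct REST request to the tables.insert endpoint. An OAuth access token is assumed in the TOKEN environment variable, and the bracketed values are placeholders:

curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://www.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/datasets/[DATASET_ID]/tables" \
  -d '{
    "tableReference": {
      "projectId": "[PROJECT_ID]",
      "datasetId": "[DATASET_ID]",
      "tableId": "fed_sample"
    },
    "externalDataConfiguration": {
      "sourceFormat": "CSV",
      "sourceUris": ["gs://[MY_BUCKET]/fed-sample.csv"],
      "schema": {
        "fields": [
          {"name": "id", "type": "INTEGER"},
          {"name": "name", "type": "STRING"},
          {"name": "num", "type": "INTEGER"},
          {"name": "ord", "type": "STRING"}
        ]
      }
    }
  }'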

Creating a federated table using Google Drive

Before you begin, ensure that you have enabled Google Drive access.

To specify a Google Drive file, use one of these URI formats:

CSV, JSON, Avro, or Google Sheets:

  https://drive.google.com/open?id=[FILE_ID]

  To access a Google Drive file's URI in this format:

  1. Go to Google Drive.
  2. Right-click the file and select Get shareable link.

Google Sheets:

  https://docs.google.com/spreadsheets/d/[FILE_ID]/edit#gid=[IDENTIFIER]

  To access a Google Drive file's URI in this format:

  1. Go to Google Sheets.
  2. Open the file.
  3. Copy the URL from your browser's address bar.

Web UI

  1. Go to the BigQuery web UI.

  2. In the navigation, hover over a dataset ID, click the down arrow icon next to the ID, and then click Create new table.

  3. Under Source data, select Google Drive for the Location, and then enter the URI to your Google Drive file.

  4. Select the format of your data. For Google Sheets, select the CSV option. Valid formats include the following:

    • Comma-separated values (CSV)
    • JSON (newline delimited)
    • Avro
    • Google Sheets (first tab only)
  5. Under Destination Table, input a table ID, and then select External table for the Table type field.

  6. For CSV and JSON files, input the table schema. For Avro files, you can omit the table schema and let BigQuery detect the schema. Click the Submit button to complete the process.

You can then run a query against the table as if it were a native BigQuery table, subject to the limitations of federated data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Google Sheets. See Download, save, and export data for more information.

CLI

To follow along with the example below:

  1. Create a CSV file named fed-sample.csv that contains the following sample data:

    1,Alice,10,one
    2,Alex,11,one
    3,Bob,11,two
    
  2. Upload the file to Google Drive.

To create an external table:

  1. Create the table by running the following command with your own values:

    bq mk --external_table_definition=[SCHEMA]@[SOURCE_FORMAT]=[GOOGLE_DRIVE_URL] [DATASET_ID].[TABLE_NAME]
    

    Where [SOURCE_FORMAT] is one of: CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS.

    For example, the following command uses the fed-sample.csv file to create a table named fed_sample in a dataset named mydataset:

    bq mk --external_table_definition=id:integer,name:string,num:integer,ord:string@CSV=[GOOGLE_DRIVE_URL] mydataset.fed_sample
    
  2. Run a query against the table by running the following command:

    bq query 'select name from [DATASET_ID].fed_sample where name contains "Alex";'
    

    Where [DATASET_ID] is a valid dataset ID.

API

  • The sourceUris must be fully qualified, in the format https://drive.google.com/open?id=[FILE_ID] or https://docs.google.com/spreadsheets/d/[FILE_ID]/edit#gid=[IDENTIFIER].

  • Specify the data format by setting the externalDataConfiguration.sourceFormat property. Valid values include:

    • CSV
    • NEWLINE_DELIMITED_JSON
    • AVRO
    • GOOGLE_SHEETS
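
The request shape matches the Cloud Storage API sketch earlier on this page; only the external configuration changes. As an illustrative fragment (placeholders throughout, with schema auto-detection shown as one option), the externalDataConfiguration for a Google Sheets source might look like the following:

{
  "sourceFormat": "GOOGLE_SHEETS",
  "sourceUris": [
    "https://docs.google.com/spreadsheets/d/[FILE_ID]/edit#gid=[IDENTIFIER]"
  ],
  "autodetect": true
}

The request must also be authorized with the Google Drive OAuth scope, as described in Enabling Google Drive access.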

Querying external data using temporary tables

To directly query an external data source without creating a permanent table, run a query that creates a temporary table. Querying a temporary table is useful for one-time, ad-hoc queries over remote data, or for extract, transform, and load (ETL) processes.

The example below creates a table definition file that contains the schema. The example then uses the definition file and the data source URIs to create and query a temporary table.

Querying data from Cloud Storage

CLI

  1. Create a table definition by running one of the following commands, which writes the table definition to the file /tmp/wikipedia:

    CSV
    bq mkdef --source_format=CSV "gs://[BUCKET_URI]" id:integer,name:string,category:integer,depth:string > /tmp/wikipedia

    The table is in CSV format and has four fields: id, name, category, and depth.

    JSON
    bq mkdef --source_format=NEWLINE_DELIMITED_JSON "gs://[BUCKET_URI]" id:integer,name:string,category:integer,depth:string > /tmp/wikipedia

    The table is in JSON (newline-delimited) format and has four fields: id, name, category, and depth.

    Avro
    bq mkdef --source_format=AVRO "gs://[BUCKET_URI]" > /tmp/wikipedia

    If you use an Avro file as your federated data source, BigQuery can detect the schema if you don't specify one.

    Cloud Datastore
    bq mkdef --source_format=DATASTORE_BACKUP "gs://[BUCKET_URI]" > /tmp/wikipedia

    If you use a Cloud Datastore backup as your federated data source, BigQuery can auto-detect the schema if you don't specify one.

    The [BUCKET_URI] comprises your bucket name and your filename. For example, if the Cloud Storage bucket is named mybucket and the data file is named fed-sample.csv, the bucket URI would be gs://mybucket/fed-sample.csv.

    If your table is separated into multiple files that share a common base-name, append an asterisk (*) to the base-name and enclose the bucket and filename in quotes. For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the bucket URI would be "gs://mybucket/fed-sample*".

  2. Run a query against the table by running the following command:

    bq query --external_table_definition=wikipedia::/tmp/wikipedia 'select name from wikipedia where name contains "Alex";'

    This command runs a query that references the table as wikipedia and supplies the table definition from the file created in the previous step.

API

If you are using the BigQuery API, define the temporary table as part of your query job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Querying data from Google Drive

Before you begin, ensure that you have enabled Google Drive access.

To specify a Google Drive file, use one of these URI formats:

CSV, JSON, Avro, or Google Sheets:

  https://drive.google.com/open?id=[FILE_ID]

  To access a Google Drive file's URI in this format:

  1. Go to Google Drive.
  2. Right-click the file and select Get shareable link.

Google Sheets:

  https://docs.google.com/spreadsheets/d/[FILE_ID]/edit#gid=[IDENTIFIER]

  To access a Google Drive file's URI in this format:

  1. Go to Google Sheets.
  2. Open the file.
  3. Copy the URL from your browser's address bar.

CLI

  1. Create a table definition by running one of the following commands, which writes the table definition to the file /tmp/wikipedia:

    CSV
    bq mkdef --source_format=CSV "[GOOGLE_DRIVE_URI]" id:integer,name:string,category:integer,depth:string > /tmp/wikipedia

    The table is in CSV format and has four fields: id, name, category, and depth.

    JSON
    bq mkdef --source_format=NEWLINE_DELIMITED_JSON "[GOOGLE_DRIVE_URI]" id:integer,name:string,category:integer,depth:string > /tmp/wikipedia

    The table is in JSON (newline-delimited) format and has four fields: id, name, category, and depth.

    Avro
    bq mkdef --source_format=AVRO "[GOOGLE_DRIVE_URI]" > /tmp/wikipedia

    If you use an Avro file as your federated data source, BigQuery can detect the schema if you don't specify one.

    Google Sheets
    bq mkdef --source_format=GOOGLE_SHEETS "[GOOGLE_DRIVE_URI]" > /tmp/wikipedia

    The source is the first tab of a Google Sheets file.

    Where [GOOGLE_DRIVE_URI] is the URI for your Google Drive file, in one of the formats described above. Wildcard URIs are not supported for Google Drive sources.

  2. Run a query against the table by running the following command:

    bq query --external_table_definition=wikipedia::/tmp/wikipedia 'select name from wikipedia where name contains "Alex";'

    This command runs a query that references the table as wikipedia and supplies the table definition from the file created in the previous step.

API

If you are using the BigQuery API, define the temporary table as part of your query job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Creating table definition files

A table definition file contains metadata about a table, such as the table's source format and related properties. Table definition files can be used to create permanent federated tables or temporary external tables. The command-line tool and the API support the creation and use of table definition files.

To create and use a table definition:

CLI

  1. Use the command-line tool's mkdef command to create a table definition or create the table definition manually. The mkdef command generates a table definition file in JSON format.

  2. If necessary, manually edit the table definition file to modify, add, or delete custom parameters.

  3. Create a federated table by using bq mk, or query a temporary external table by using bq query, passing the table definition file as the value of the --external_table_definition flag.
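
For example, the following sketch strings the steps together, reusing the fed-sample.csv placeholders from earlier on this page (adjust the bucket, dataset, and schema to match your data):

bq mkdef --source_format=CSV "gs://[MY_BUCKET]/fed-sample.csv" id:integer,name:string,num:integer,ord:string > /tmp/fed-sample-def
# Optionally edit /tmp/fed-sample-def, then create a permanent federated table:
bq mk --external_table_definition=/tmp/fed-sample-def mydataset.fed_sample
# Or query a temporary external table directly from the definition file:
bq query --external_table_definition=fed_sample::/tmp/fed-sample-def 'select name from fed_sample;'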

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.
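
As a hedged sketch of that job configuration (bracketed values are placeholders, and an OAuth access token is assumed in the TOKEN environment variable), a query job that defines a temporary external table named fed_sample might be submitted as follows:

curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://www.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/jobs" \
  -d '{
    "configuration": {
      "query": {
        "query": "select name from fed_sample where name contains \"Alex\"",
        "tableDefinitions": {
          "fed_sample": {
            "sourceFormat": "CSV",
            "sourceUris": ["gs://[MY_BUCKET]/fed-sample.csv"],
            "schema": {
              "fields": [
                {"name": "id", "type": "INTEGER"},
                {"name": "name", "type": "STRING"},
                {"name": "num", "type": "INTEGER"},
                {"name": "ord", "type": "STRING"}
              ]
            }
          }
        }
      }
    }
  }'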

Many table definition configuration settings apply to more than one source format. For detailed information about settings and source formats, see the API Reference descriptions for each of the following settings:

General table definition settings
autodetect

Type: boolean

Default value: true

Description: See CSV and JSON schema auto-detection.

API Reference: configuration.query.tableDefinitions.(key).autodetect

compression

Type: string

Default value: NONE

Valid values: GZIP and NONE.

API Reference: configuration.query.tableDefinitions.(key).compression

ignoreUnknownValues

Type: boolean

Default value: false

API Reference: configuration.query.tableDefinitions.(key).ignoreUnknownValues

maxBadRecords

Type: integer

Default value: 0

API Reference: configuration.query.tableDefinitions.(key).maxBadRecords

schema

Type: nested object

API Reference: configuration.query.tableDefinitions.(key).schema

sourceFormat

Type: string

API Reference: configuration.query.tableDefinitions.(key).sourceFormat

sourceUris

Type: list

API Reference: configuration.query.tableDefinitions.(key).sourceUris

For the complete list of table definition configuration settings, see configuration.query.tableDefinitions in the API Reference.

Table definitions based on CSV source files

The following command creates a table definition based on a CSV file named fed-sample.csv in a Cloud Storage bucket named [BUCKET], where the schema is stored in /tmp/fed-sample-schema.json and the generated definition file is stored in /tmp/fed-sample-def:

bq mkdef --source_format=CSV 'gs://[BUCKET]/fed-sample.csv' /tmp/fed-sample-schema.json > /tmp/fed-sample-def

The contents of fed-sample-def (schema values omitted for brevity):

{
  "csvOptions": {
    "allowJaggedRows": false,
    "allowQuotedNewlines": false,
    "encoding": "UTF-8",
    "fieldDelimiter": ",",
    "quote": "\"",
    "skipLeadingRows": 0
  },
  "schema": {
    "fields": [
      ...
    ]
  },
  "sourceFormat": "CSV",
  "sourceUris": [
    "gs://[BUCKET]/fed-sample.csv"
  ]
}

You can modify a table definition file so that subsequent calls to bq mk or bq query use the updated values. For example, if you decide that you want to allow jagged rows, you can modify that property in fed-sample-def:

{
  "csvOptions": {
    "allowJaggedRows": true,
  ...
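
Subsequent commands that use the modified file then pick up the new value; for example (a sketch reusing the earlier placeholder names):

bq query --external_table_definition=fed_sample::/tmp/fed-sample-def 'select name from fed_sample;'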

The following options are available only for table definitions based on CSV files:

CSV table definition options
allowJaggedRows

Type: boolean

Default value: false

API Reference: configuration.query.tableDefinitions.(key).csvOptions.allowJaggedRows

allowQuotedNewlines

Type: boolean

Default value: false

Description: CSV files with newline characters are difficult to process in parallel. Setting this to true can reduce the performance gains associated with parallel processing.

API Reference: configuration.query.tableDefinitions.(key).csvOptions.allowQuotedNewlines

encoding

Type: string

Default value: UTF-8

Valid values: UTF-8 and ISO_8859_1.

API Reference: configuration.query.tableDefinitions.(key).csvOptions.encoding

fieldDelimiter

Type: string

Default value: ","

API Reference: configuration.query.tableDefinitions.(key).csvOptions.fieldDelimiter

quote

Type: string

Default value: " (quotation mark)

API Reference: configuration.query.tableDefinitions.(key).csvOptions.quote

skipLeadingRows

Type: long

Default value: 0

API Reference: configuration.query.tableDefinitions.(key).csvOptions.skipLeadingRows

Table definitions based on JSON source files

The following command creates a table definition based on a JSON file named fed-sample.json in a Cloud Storage bucket named [BUCKET], where the schema is stored in /tmp/fed-sample-schema.json and the generated definition file is stored in /tmp/fed-sample-json-def:

bq mkdef --source_format=NEWLINE_DELIMITED_JSON 'gs://[BUCKET]/fed-sample.json' /tmp/fed-sample-schema.json > /tmp/fed-sample-json-def

The contents of fed-sample-json-def (schema values omitted for brevity):

{
  "autodetect": true,
  "schema": {
    "fields": [
      ...
    ]
  },
  "sourceFormat": "NEWLINE_DELIMITED_JSON",
  "sourceUris": [
    "gs://[BUCKET]/fed-sample.json"
  ]
}

There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see General table definition settings.

Table definitions based on Google Sheets source files

The following command creates a table definition based on a Google Sheet stored on Google Drive specified by [GOOGLE_DRIVE_URI], where the schema is auto-detected and the generated definition file is stored in /tmp/google-sheet-def:

bq mkdef --source_format=GOOGLE_SHEETS "[GOOGLE_DRIVE_URI]" > /tmp/google-sheet-def

The contents of google-sheet-def:

{
  "autodetect": true,
  "sourceFormat": "GOOGLE_SHEETS",
  "sourceUris": [
    "[GOOGLE_DRIVE_URI]"
  ]
}

You can modify the table definition to skip leading rows by adding a new googleSheetsOptions section. The following example modifies the table definition so that it skips the first 3 rows:

{
  "autodetect": true,
  "sourceFormat": "GOOGLE_SHEETS",
  "sourceUris": [
    "[GOOGLE_DRIVE_URI]"
  ],
  "googleSheetsOptions": {
    "skipLeadingRows": 3
  }
}
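
After saving the modified definition, you can query the sheet through it; for example (a sketch, where sheet is an arbitrary name for the temporary table):

bq query --external_table_definition=sheet::/tmp/google-sheet-def 'select * from sheet limit 10;'
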
Google Sheets table definition settings
autodetect

Type: boolean

Default value: true

API Reference: configuration.query.tableDefinitions.(key).autodetect

skipLeadingRows

Type: long

Default value: 0

API Reference: configuration.query.tableDefinitions.(key).googleSheetsOptions.skipLeadingRows

For a list of general configuration settings, see General table definition settings.

Enabling Google Drive access

Web UI


Follow the web-based authentication steps when creating the table in the web UI.

CLI


To enable Google Drive access:

  1. Use the Google Cloud Platform Console to enable the Google Drive API for the project you plan to use.

  2. Ensure that you have the latest version of the command-line tool:

    gcloud components update
    
  3. Authenticate with Google Drive:

    gcloud auth login --enable-gdrive-access
    

API


If you are using the BigQuery API:

  1. Use the Google Cloud Platform Console to enable the Google Drive API for the project making the API call.

  2. Request the OAuth scope for Google Drive in addition to the scope for BigQuery.

Avro and Cloud Datastore backup schema detection

If you use an Avro file or a Cloud Datastore backup as a federated data source, BigQuery detects the schema for you.

When BigQuery detects schemas, it might, on rare occasions, change a field name to make it compatible with BigQuery SQL syntax.

To see the detected schema, create an external table and use the bq show command. Alternatively, you can use the BigQuery web UI to see the schema.
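
For example, the following sketch (the Avro file and table names are hypothetical placeholders) creates an external table and then displays the detected schema:

bq mk --external_table_definition=@AVRO=gs://[MY_BUCKET]/fed-sample.avro mydataset.fed_sample_avro
bq show mydataset.fed_sample_avro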

For information about data type conversion, see:

CSV and JSON schema auto-detection

If you specify a CSV or JSON file without including a schema description, BigQuery makes a best-effort attempt to automatically infer the schema. Automatic detection is currently available in the command-line tool and the BigQuery API.

BigQuery starts the inference process by selecting a random file in the data set and scanning up to 100 rows of the file to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in that sample.

Schema auto-detection is enabled by default. To disable auto-detection, you can pass the --noautodetect flag when creating the table definition using mkdef or modify an existing table definition file by changing the value of the autodetect property to false. For example, the following excerpt from a table definition JSON file disables auto-detection:

{
  "autodetect": false,
  ...
}
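
For example, the following command (reusing the placeholders from earlier examples) creates a CSV table definition with auto-detection disabled:

bq mkdef --noautodetect --source_format=CSV "gs://[MY_BUCKET]/fed-sample.csv" id:integer,name:string,num:integer,ord:string > /tmp/fed-sample-def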

Other automatic detection details include:

Compression
BigQuery recognizes gzip-compatible file compression when opening a file.
CSV Delimiter
BigQuery detects the following delimiters:
  • comma (,)
  • pipe (|)
  • tab (\t)
CSV Header
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
CSV Quoted new lines
BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.
Timestamps
BigQuery detects a wide array of timestamp formats, including, but not limited to:
  • `yyyy-mm-dd` in any order
  • `yyyy-mm-dd hh:mm:ss`
  • `yyyy-mm-dd hh:mm:ss.sss`
Other timestamp details:
  • Date separators can be "-", "/", or "."
  • A time zone can be appended using an offset or name

The _FILE_NAME pseudo column

Tables based on federated data sources provide a pseudo column named _FILE_NAME. This column contains the fully qualified path to the file to which the row belongs. This column is available only in external tables that reference Google Cloud Storage and Google Drive.

The _FILE_NAME column name is reserved, which means that you cannot create a column by that name in any of your tables. To select the value of _FILE_NAME, you must use an alias. The following example demonstrates selecting _FILE_NAME by assigning the alias fn to the pseudo column.

  1. Create a table definition for the table wikipedia using the URI for the files from a Cloud Storage bucket. For example:

    bq mkdef --source_format=CSV "gs://[BUCKET_URI]" id:integer,name:string,category:integer,depth:string > /tmp/wikipedia

    The [BUCKET_URI] comprises your bucket name and your filename. For example, if the Cloud Storage bucket is named mybucket and the data file is named fed-sample.csv, the bucket URI would be gs://mybucket/fed-sample.csv.

    If your table is separated into multiple files that share a common base-name, append an asterisk (*) to the base-name and enclose the bucket and filename in quotes. For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the bucket URI would be "gs://mybucket/fed-sample*".

  2. Query the _FILE_NAME pseudo column:

    bq query --external_table_definition=wikipedia::/tmp/wikipedia 'select name, _FILE_NAME as fn from wikipedia where name contains "Alex";'

Limits

The limits for federated data sources are the same as the limits for load jobs, as described in the Load jobs section on the Quota Policy page.
