This page describes how to create a table definition file for an external data source. An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery.
Table definition files
A table definition file contains an external table's schema definition and metadata, such as the table's data format and related properties. When you create a table definition file, you can use schema auto-detection to define the schema for an external data source, you can provide the schema inline (on the command line), or you can provide a JSON file containing the schema definition.
Table definition files are used with the BigQuery CLI. The
properties in a table definition file also apply to creating an
ExternalDataConfiguration
when you use the REST API. You do not use table definition files when you create
an external table by using the Cloud Console or the classic BigQuery web UI.
You can create table definition files for the following external data sources:
Cloud Storage
Google Drive
- Comma-separated values (CSV)
- Newline-delimited JSON
- Avro files
- Google Sheets
Cloud Bigtable
Before you begin
To create a table definition file, you need the URI for your data source:
- For a Google Drive data source, you need the Google Drive URI
- For a Cloud Storage data source, you need the Cloud Storage URI
- For a Cloud Bigtable data source, you need the Cloud Bigtable URI
Permanent versus temporary external tables
You can query an external data source in BigQuery by using a permanent table or a temporary table. A permanent table is a table that is created in a dataset and is linked to your external data source. Because the table is permanent, you can use dataset-level access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.
When you query an external data source using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source. When you use a temporary table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.
You can use a table definition file to describe a permanent or temporary external table.
Creating a table definition using schema auto-detect
If you specify a CSV, JSON, or Google Sheets file without including an inline
schema description or a schema file, you can use the --autodetect
flag
to set the "autodetect"
option to true
in the table definition file. When
auto-detect is enabled, BigQuery makes a best-effort attempt to
automatically infer the schema. See
Schema auto-detection in BigQuery
for more information.
You can use a schema auto-detect when creating table definitions for:
- JSON files stored in Cloud Storage or Google Drive
- CSV files stored in Cloud Storage or Google Drive
- Google Sheets files stored in Google Drive
Cloud Storage
To use the CLI to create a table definition for a Cloud Storage data source:
Use the command-line tool's
mkdef
command with the--autodetect
flag to create a table definition. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- source_format with your file format:
NEWLINE_DELIMITED_JSON
,CSV
, orGOOGLE_SHEETS
. - file_name with the name of your table definition file.
- bucket_uri with your Cloud Storage URI,
for example,
gs://mybucket/myfile
.
bq mkdef \ --autodetect \ --source_format=source_format \ "bucket_uri" > /tmp/file_name
- source_format with your file format:
(Optional) Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following for a CSV external data source. Notice"autodetect"
is set totrue
.{ "autodetect": true, "csvOptions": { "allowJaggedRows": false, "allowQuotedNewlines": false, "encoding": "UTF-8", "fieldDelimiter": ",", "quote": "\"", "skipLeadingRows": 0 }, "sourceFormat": "CSV", "sourceUris": [ "bucket_uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to JSON source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.
Externally partitioned data
To configure a table definition file for externally partitioned data on Cloud Storage, see Querying externally partitioned data.
Google Drive
To use the CLI to create a table definition for a Google Drive data source:
Use the command-line tool's
mkdef
command with the--autodetect
flag to create a table definition. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- source_format with your file format:
NEWLINE_DELIMITED_JSON
,CSV
, orGOOGLE_SHEETS
. - file_name with the name of your table definition file.
- drive_uri with your Google Drive URI,
for example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd
.
bq mkdef \ --autodetect \ --source_format=source_format \ "drive_uri" > /tmp/file_name
- source_format with your file format:
Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following for a Google Sheets external data source. Notice"autodetect"
is set totrue
.{ "autodetect": true, "sourceFormat": "GOOGLE_SHEETS", "sourceUris": [ "drive_uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to JSON source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.To specify a particular sheet or a cell range in a Google Sheets file, add the
range
property to the table definition file. To query a particular sheet, specifiy the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id
, for example,"Sheet1!A1:B20"
. If therange
parameter is not specified, the first sheet in the file is used.
Creating a table definition using an inline schema
If you do not want to use schema auto-detect, you can create a table definition
file by providing an inline schema definition. To provide an inline schema
definition, list the fields and data types on the command line in the following
format: field:data_type,field:data_type
.
You can use an inline schema definition when creating table definition files for:
- JSON files stored in Cloud Storage or Google Drive
- CSV files stored in Cloud Storage or Google Drive
- Google Sheets files stored in Google Drive
To use the CLI to create a table definition for a Cloud Storage data source by using an inline schema definition:
Use the command-line tool's
mkdef
command with the--noautodetect
flag to create a table definition. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- source_format with your file format:
NEWLINE_DELIMITED_JSON
,CSV
, orGOOGLE_SHEETS
. - uri with your Cloud Storage URI
or your Google Drive URI. For
example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd
for Google Drive orgs://mybucket/myfile
for Cloud Storage. field:data_type,field:data_type
with your schema definition, for example,Name:STRING,Address:STRING, ...
.- file_name with the name of your table definition file.
bq mkdef \ --noautodetect \ --source_format=source_format \ "uri" \ field:data_type,field:data_type > /tmp/file_name
- source_format with your file format:
(Optional) Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following. Notice"autodetect"
is not enabled, and the schema information is written to the table definition file.{ "schema": { "fields": [ { "name": "field", "type": "data_type" }, { "name": "field", "type": "data_type" } ... ] }, "sourceFormat": "NEWLINE_DELIMITED_JSON", "sourceUris": [ "uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to JSON source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.
Creating a table definition using a JSON schema file
If you do not want to use auto-detect or provide an inline schema definition, you can create a JSON schema file and reference it when creating your table definition file. You must create the JSON schema file manually, and it must be on your local machine. Referencing a JSON schema file stored in Cloud Storage or in Google Drive is not supported.
You can use a JSON schema file when creating table definitions for:
- JSON files stored in Cloud Storage or Google Drive
- CSV files stored in Cloud Storage or Google Drive
- Google Sheets files stored in Google Drive
Cloud Storage
To use the CLI to create a table definition for a Cloud Storage data source using a JSON schema file:
Use the command-line tool's
mkdef
command with the--noautodetect
flag to create a table definition. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- source_format with your file format:
NEWLINE_DELIMITED_JSON
,CSV
, orGOOGLE_SHEETS
. - file_name with the name of your table definition file.
- bucket_uri with your
Cloud Storage URI, for
example,
gs://mybucket/myfile
. - path_to_schema_file with the location of the JSON schema file on your local machine.
bq mkdef \ --noautodetect \ --source_format=source_format \ "bucket_uri" \ path_to_schema_file > /tmp/file_name
- source_format with your file format:
(Optional) Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in
nano. The file should look like the following. Notice"autodetect"
is not enabled, and the schema information is written to the table definition file.{ "schema": { "fields": [ { "name": "field", "type": "data_type" }, { "name": "field", "type": "data_type" } ... ] }, "sourceFormat": "NEWLINE_DELIMITED_JSON", "sourceUris": [ "bucket_uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to JSON source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.
Google Drive
To use the CLI to create a table definition for a Google Drive data source using a JSON schema file:
Use the command-line tool's
mkdef
command with the--noautodetect
flag to create a table definition. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- source_format with your file format:
NEWLINE_DELIMITED_JSON
,CSV
, orGOOGLE_SHEETS
. - drive_uri with your Google Drive URI,
for example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd
. - path_to_schema_file with the location of the JSON schema file on your local machine.
- file_name with the name of your table definition file.
bq mkdef \ --noautodetect \ --source_format=source_format \ "drive_uri" \ path_to_schema_file > /tmp/file_name
- source_format with your file format:
Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following. Notice"autodetect"
is not enabled, and the schema information is written to the table definition file.{ "schema": { "fields": [ { "name": "field", "type": "data_type" }, { "name": "field", "type": "data_type" } ... ] }, "sourceFormat": "GOOGLE_SHEETS", "sourceUris": [ "drive_uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to JSON source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.To specify a particular sheet or a cell range in a Google Sheets file, add the
range
property to the table definition file. To query a particular sheet, specifiy the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id
, for example,"Sheet1!A1:B20"
. If therange
parameter is not specified, the first sheet in the file is used.
Creating Avro table definitions
If you use an Avro file as an external data source, BigQuery automatically retrieves the schema using the source data. When creating a table definition for Avro files, you do not need to use schema auto-detect, and you do not need to provide an inline schema definition or schema file.
You can create a table definition file for Avro data stored in Cloud Storage or Google Drive.
To use the CLI to create a table definition file for Avro data:
Use the command-line tool's
mkdef
command to create a table definition. You do not need to use the--noautodetect
flag with Avro files. Schema auto-detect is disabled for Avro files. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- uri with your Cloud Storage URI
or your Google Drive URI. For
example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd
for Google Drive orgs://mybucket/myfile
for Cloud Storage. - file_name with the name of your table definition file.
bq mkdef \ --source_format=AVRO \ "uri" > /tmp/file_name
- uri with your Cloud Storage URI
or your Google Drive URI. For
example,
(Optional) Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following. Notice there is no need for the"autodetect"
setting.{ "sourceFormat": "AVRO", "sourceUris": [ "uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete general settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to Avro source files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.
Creating Datastore and Cloud Firestore export table definitions
If you use a Datastore or Cloud Firestore export as an external data source, BigQuery automatically retrieves the schema using the self-describing source data. When creating a table definition for Datastore and Cloud Firestore backup files, you do not need to use schema auto-detect, and you do not need to provide an inline schema definition or schema file.
You can create a table definition file for Datastore and Cloud Firestore export data stored in Cloud Storage.
To use the CLI to create a table definition file for Datastore or Cloud Firestore export data:
Use the command-line tool's
mkdef
command to create a table definition. You do not need to use the--noautodetect
flag with Datastore or Cloud Firestore backup files. Schema auto-detect is disabled for these file types. Themkdef
command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file:/tmp/file_name
.In the command, replace:
- bucket_uri with your Cloud Storage URI.
- file_name with the name of your table definition file.
Note that the
DATASTORE_BACKUP
source format is used for both Datastore and Cloud Firestore.bq mkdef \ --source_format=DATASTORE_BACKUP \ "uri" > /tmp/file_name
(Optional) Open the table definition file in a text editor. For example, the command
nano /tmp/file_name
opens the file in nano. The file should look like the following. Notice there is no need for the"autodetect"
setting.{ "sourceFormat": "DATASTORE_BACKUP", "sourceUris": [ "gs://bucket_uri" ] }
(Optional) Manually edit the table definition file to modify, add, or delete settings such as
maxBadRecords
andignoreUnknownValues
. There are no configuration settings that are specific to Datastore and Cloud Firestore export files, but there are settings that apply to CSV and Google Sheets files. For more information, seeExternalDataConfiguration
in the API reference.
Creating Cloud Bigtable table definitions
When you create a table definition file for Cloud Bigtable, you manually
generate the file in JSON format. Using the mkdef
command to create a table
definition is not currently supported for Cloud Bigtable data sources. Schema
auto-detect is also unsupported for Cloud Bigtable. For a list of
Cloud Bigtable table definition options, see
BigtableOptions
in
the REST API reference.
A JSON table definition file for Cloud Bigtable looks like the following. Using this table definition file, BigQuery reads data from a single column family, interpreting the values as binary encoded integers.
{ "sourceFormat": "BIGTABLE", "sourceUris": [ "https://googleapis.com/bigtable/projects/project_id/instances/instance_id/tables/table_name" ], "bigtableOptions": { "columnFamilies" : [ { "familyId": "family_int", "type": "INTEGER", "encoding": "BINARY" } ], } }
Wildcard support for table definition files
If your Cloud Storage data is separated into multiple files that share a
common base-name, you can use a wildcard in the URI in the table definition
file. You append an asterisk (*) to the base-name and enclose the bucket and
filename in quotes. For example, if you have two files named
fed-sample000001.csv
and fed-sample000002.csv
, the bucket URI would be
"gs://mybucket/fed-sample*"
.
You can only use one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.
For Cloud Bigtable data, only one URI can be specified and it has to be a
fully specified and valid HTTPS URL for a Cloud Bigtable table. For
Datastore backups, only one URI can be specified, and it must end
with .backup_info
.
The *
wildcard character is not allowed when creating table definition files
for:
- Cloud Bigtable data sources
- Datastore exports stored in Cloud Storage
- Cloud Firestore exports stored in Cloud Storage
- Data stored in Google Drive