Schema Auto-Detection for External Data Sources

Schema auto-detection

Schema auto-detection is available when you load data into BigQuery, and when you query an external data source.

When auto-detection is enabled, BigQuery starts the inference process by selecting a random file in the data source and scanning up to 100 rows of data to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in the sample.

To see the detected schema for a table:

  • Use the command-line tool's bq show command
  • Use the BigQuery web UI to view the table's schema

When BigQuery detects schemas, it might, on rare occasions, change a field name to make it compatible with BigQuery SQL syntax.

For information about data type conversion, see:

Schema auto-detection for external data sources

To enable schema auto-detection in the web UI when you create a table that is linked to an external data source, check the Automatically detect option. When enabled, BigQuery makes a best-effort attempt to automatically infer the schema for CSV and JSON external data sources.

Currently, you cannot enable schema auto-detection for Google Sheets external data sources by using the web UI. Also, schema auto-detection is not used with Avro or Google Cloud Datastore backup files. When you create a table that is linked to Avro or Cloud Datastore backup files, BigQuery automatially retrieves the schema by using the source data.

Using the CLI, you can enable schema auto-detection when you create a table definition file for CSV, JSON, or Google Sheets data. When using the CLI to create a table definition file, you can pass the --autodetect flag to the mkdefcommand to enable schema auto-detection, or you can pass the --noautodetect flag to disable auto-detection.

When you use the --autodetect flag, the "autodetect" setting is set to true in the table definition file. When you use the --noautodetect flag, the "autodetect" setting is set to false. If you do not provide a schema definition for the external data source when you create a table definition, and you do not use the --noautodetect or --autodetect flag, the "autodetect" setting defaults to true.

When you create a table definition file by using the API, set the value of the "autodetect" property to true or false. Setting autodetect to true enables auto-detection. Setting autodetect to false disables it.

Auto-detection details

In addition to detecting schema details, auto-detection recognizes the following:

Compression
BigQuery recognizes gzip-compatible file compression when opening a file.
CSV Delimiter
BigQuery detects the following delimiters:
  • comma ( , )
  • pipe ( | )
  • tab ( \t )
CSV Header
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
CSV Quoted new lines
BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.
Timestamps
BigQuery detects a wide array of timestamp formats, including, but not limited to:
  • yyyy-mm-dd in any order
  • yyyy-mm-dd hh:mm:ss
  • yyyy-mm-dd hh:mm:ss.sss
Other timestamp details:
  • Date separators can be "-", "/", or "."
  • A time zone can be appended using an offset or name

Examples

The following are examples of timestamp formats auto-detected by BigQuery:

  • 253402300799
  • 2017-07-05 12:54:00 UTC
  • 2017-08-19 07:11:35.220 -05:00
  • 2017-08-19 12:11:35.220 UTC
  • 2017-08-19T12:11:35.220Z
  • 2.53402300799e11
  • 2017-08-19 12:11:35.220000
  • 2017-08-19 12:11:35.220

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...