Schema Auto-Detection in BigQuery

This page describes schema auto-detection in BigQuery.

Schema auto-detection

Schema auto-detection is available when you load data into BigQuery, and when you query an external data source.

When auto-detection is enabled, BigQuery starts the inference process by selecting a random file in the data source and scanning up to 100 rows of data to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in the sample.

To see the detected schema for a table:

  • Use the command-line tool's bq show command
  • Use the BigQuery web UI to view the table's schema

When BigQuery detects schemas, it might, on rare occasions, change a field name to make it compatible with BigQuery SQL syntax.

For information about data type conversion, see:

Loading data using schema auto-detection

To enable schema auto-detection in the web UI when loading data, in the Schema section, check the Automatically detect option. When enabled, BigQuery makes a best-effort attempt to automatically infer the schema for CSV and JSON files.

auto detect link

Currently, you cannot enable schema auto-detection for Google Sheets data sources using the web UI. Also, schema auto-detection is not used with Avro or Google Cloud Datastore backup files. When you load Avro or Cloud Datastore backup data, BigQuery automatially retrieves the schema using the source data.

To enable schema auto-detection in the CLI when loading data, pass the --autodetect flag to the bq load command.

Schema auto-detection for external data sources

To enable schema auto-detection in the web UI when creating a table linked to an external data source, check the Automatically detect option. When enabled, BigQuery makes a best-effort attempt to automatically infer the schema for CSV and JSON external data sources.

Currently, you cannot enable schema auto-detection for Google Sheets data sources using the web UI. Also, schema auto-detection is not used with Avro or Google Cloud Datastore backup files. When you create a table linked to Avro or Cloud Datastore backup files, BigQuery automatially retrieves the schema using the source data.

Using the CLI, you can enable schema auto-detection when you create a table definition file for CSV, JSON, or Google Sheets data. When using the CLI to create a table definition file, you can pass the --autodetect flag to the the mkdefcommand to enable schema auto-detection, or you can pass the --noautodetect flag to disable auto-detection.

When you use the --autodetect flag, the "autodetect" setting is set to true in the table definition file. When you use the --noautodetect flag, the "autodetect" setting is set to false. If you do not provide a schema definition for the external data source when you create a table definition, and you do not use the --noautodetect or --autodetect flag, the "autodetect" setting defaults to true.

When creating a table definition file using the API, set the value of the "autodetect" property to true or false. Setting autodetect to true enables auto-detection. Setting autodetect to false disables it.

Auto-detection details

In addition to detecting schema details, auto-detection recognizes the following:

Compression
BigQuery recognizes gzip-compatible file compression when opening a file.
CSV Delimiter
BigQuery detects the following delimiters:
  • comma ( , )
  • pipe ( | )
  • tab ( \t )
CSV Header
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
CSV Quoted new lines
BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.
Timestamps
BigQuery detects a wide array of timestamp formats, including, but not limited to:
  • yyyy-mm-dd in any order
  • yyyy-mm-dd hh:mm:ss
  • yyyy-mm-dd hh:mm:ss.sss
Other timestamp details:
  • Date separators can be "-", "/", or "."
  • A time zone can be appended using an offset or name

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...