Loading Data Using Schema Auto-Detection

Schema auto-detection

Schema auto-detection is available when you load data into BigQuery, and when you query an external data source.

When auto-detection is enabled, BigQuery starts the inference process by selecting a random file in the data source and scanning up to 100 rows of data to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in the sample.

To see the detected schema for a table:

  • Use the command-line tool's bq show command
  • Use the BigQuery web UI to view the table's schema

When BigQuery detects schemas, it might, on rare occasions, change a field name to make it compatible with BigQuery SQL syntax.

For information about data type conversion, see:

Loading data using schema auto-detection

To enable schema auto-detection when loading data:

  • BigQuery web UI: In the Schema section, check the Automatically detect option.
  • CLI: Use the bq load command with the --autodetect parameter.

When enabled, BigQuery makes a best-effort attempt to automatically infer the schema for CSV and JSON files.

Schema auto-detection is not used with Avro or Google Cloud Datastore backup files. When you load Avro or Cloud Datastore backup data, BigQuery automatially retrieves the schema using the source data.

To use schema auto-detection when you load JSON or CSV data:

Web UI

  1. Go to the BigQuery web UI.
    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your dataset name in the navigation and click Create new table.

    Note:In the UI, the process for loading data is the same as creating a table.

  3. On the Create table page:

  4. For Source Data, click Create from source.

  5. For Destination Table, choose your dataset and enter the table name in the Destination table name field.
  6. For Schema, click Automatically detect to determine the schema.

    auto detect link

  7. Click Create Table.

CLI

Issue the bq load command with the --autodetect parameter.

The following command loads a file using schema auto-detect:

bq load --autodetect --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]

Where:

  • [FORMAT] is either NEWLINE_DELIMITED_JSON or CSV.
  • [DATASET] is the dataset that contains the table into which you're loading data.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is the location of the CSV or JSON file.

For example, the following command loads myfile.csv from your local machine into a table named mytable which is stored in a dataset named mydataset:

bq load --autodetect --source_format=CSV mydataset.mytable ./myfile.csv

API

  1. Create a load job that points to the source data. For information about creating jobs, see Running jobs programmatically.

  2. Specify the data format by setting the configuration.load.sourceFormat property. To use schema autodetection, this value must be set to NEWLINE_DELIMITED_JSON or CSV.

  3. Set schema autodetection to true using the configuration.load.autodetect property.

Auto-detection details

In addition to detecting schema details, auto-detection recognizes the following:

Compression
BigQuery recognizes gzip-compatible file compression when opening a file.
CSV Delimiter
BigQuery detects the following delimiters:
  • comma ( , )
  • pipe ( | )
  • tab ( \t )
CSV Header
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
CSV Quoted new lines
BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.
Timestamps
BigQuery detects a wide array of timestamp formats, including, but not limited to:
  • yyyy-mm-dd in any order
  • yyyy-mm-dd hh:mm:ss
  • yyyy-mm-dd hh:mm:ss.sss
Other timestamp details:
  • Date separators can be "-", "/", or "."
  • A time zone can be appended using an offset or name

Examples

The following are examples of timestamp formats auto-detected by BigQuery:

  • 253402300799
  • 2017-07-05 12:54:00 UTC
  • 2017-08-19 07:11:35.220 -05:00
  • 2017-08-19 12:11:35.220 UTC
  • 2017-08-19T12:11:35.220Z
  • 2.53402300799e11
  • 2017-08-19 12:11:35.220000
  • 2017-08-19 12:11:35.220

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...