Specifying a Schema

BigQuery allows you to specify a table's schema when you load data into a table, and when you create an empty table. Alternatively, you can use schema auto-detection for supported data formats. When you load Avro or Cloud Datastore backup files, the schema is inferred from the data. After loading data or creating a table, you can update the schema definition.

When you load data, you can specify a table's schema in one of the following ways:

  • Add the schema manually in the BigQuery web UI.
  • Specify the schema inline using the CLI.
  • Provide a schema file in JSON format.
  • Specify the configuration.load.schema property when inserting a load job using the API or Client Libraries.

When you specify the schema, you must consider the following items:

Schema components

When you specify a table's schema, you must identify each field's:

  • Name
  • Data type

You can optionally specify:

  • Field descriptions
  • Modes (if no mode is specified, the default is NULLABLE)

Field names and descriptions

A field name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum field name length is 128 characters.

Each field can include a description. The description is a string with a maximum length of 1,024 characters.

You can specify field descriptions in the BigQuery web UI as a JSON array or using a JSON schema file. You cannot add a description using the Add Field option in the UI, but you can manually add field descriptions in the UI after you load your data.

Standard SQL data types

BigQuery standard SQL allows you to specify the following data types in your schema:

Type Description
Integer Numeric values without fractional components
Floating point Approximate numeric values with fractional components
Boolean TRUE or FALSE (case insensitive)
String Variable-length character (Unicode) data
Bytes Variable-length binary data
Date A logical calendar date
Datetime A year, month, day, hour, minute, second, and subsecond
Time A time, independent of a specific date
Timestamp An absolute point in time, with microsecond precision
Struct (Record) Container of ordered fields each with a type (required) and field name (optional)

For more information on data types in standard SQL, see Data Types.

You can also declare an array type when you query data. For more information, see Working with arrays.

Field modes

Standard SQL supports the following field modes in your schema:

Mode Description
Nullable Field allows NULL values (default)
Required NULL values are not allowed
Repeated Field contains an array of values of the specified type

For more information on field modes, see schema.fields.mode in the REST API reference.

Manually specifying schemas

When you load data into BigQuery, you can manually specify a table's schema using the BigQuery web UI or the CLI. Manually specifying a schema is supported for CSV and JSON (newline delimited) files. When you load Avro or Cloud Datastore backup data, BigQuery automatially retrieves the schema using the source data.

To manually specify a table schema:

Web UI

In the BigQuery web UI, you can specify a schema using the Add Field option or the Edit as Text option. You cannot specify a description when you use the Add Field option. To add field descriptions, use Edit as Text and specify the schema as a JSON array.

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your dataset name in the navigation and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create table page:

    • For Source Data, click Create from source.
    • For Destination Table, choose your dataset and enter the table name in the Destination table name field.
    • For Schema, choose one of the following options to specify a schema:

      • Option 1: Use Add Field and specify each field's Name, Type, and Mode. In the BigQuery web UI, you cannot add a field description when you use the Add Field option.

        Add schema using add fields

      • Option 2: Click Edit as Text and paste the schema in the form of a JSON array.

        Add schema as JSON array

    • Click Create Table.

Command-line

Use the bq load command and supply the schema inline (in the format [FIELD]:[DATA_TYPE],[FIELD]:[DATA_TYPE]).

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.

To load data into a table using an inline schema definition, enter:

bq load --source_format=[FORMAT] [DATASET].[TABLE_NAME] [PATH_TO_SOURCE] [SCHEMA]

Where:

  • [FORMAT] is NEWLINE_DELIMITED_JSON or CSV.
  • [DATASET] is the dataset that contains the table into which you're loading data.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is the location of the CSV or JSON file.
  • [SCHEMA] is the schema definition (in the format [FIELD]:[DATA_TYPE],[FIELD]:[DATA_TYPE]).

For example, the following command loads data from a local CSV file named myfile.csv into mydataset.mytable. The schema is manually specified inline:

bq load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING

API

Call jobs.insert() and specify the load configuration. The schema object is specified using the configuration.load.schema property.

Specifying a JSON schema file

If you prefer not to specify your schema manually, you can create a JSON schema file to use with the CLI. A JSON schema file consists of a JSON array that contains the following:

  • A description (optional) of the field
  • The field name
  • The field's data type
  • The field's mode (optional — if unspecified, defaults to NULLABLE)

Creating a JSON schema file

To create a JSON schema file, enter the following using an appropriate text editor:

[
 {
   "description": "[DESCRIPTION]"
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 },
 {
   "description": "[DESCRIPTION]"
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 }
]

Where:

  • [DESCRIPTION] is a string that represents the field's description.
  • [NAME] is a string that represents the field name.
  • [TYPE] is the field's data type.
  • [MODE] is the field mode.

The JSON array is indicated by the brackets []. Each field entry must be separated by a comma — },.

You can write an existing table schema to a file by entering the following command:

bq show --format=prettyjson [DATASET].[TABLE] > myschema.json

You can use the output file as a starting point for your own JSON schema file. If you use this approach, ensure the file contains only the JSON array that represents the table's schema.

For example, the following JSON array represents a basic JSON schema. This schema has 3 fields: qtr (REQUIRED STRING), rep (NULLABLE STRING), and sales (NULLABLE FLOAT).

[
  {
    "description": "quarter",
    "mode": "REQUIRED",
    "name": "qtr",
    "type": "STRING"
  },
  {
    "description": "sales representative",
    "mode": "NULLABLE",
    "name": "rep",
    "type": "STRING"
  },
  {
    "description": "total sales",
    "mode": "NULLABLE",
    "name": "sales",
    "type": "FLOAT"
  }
]

Using a JSON schema file

After creating your JSON schema file, you can specify it on the command line. You cannot use a schema file with the BigQuery web UI or the API, but you can specify the schema as a JSON array.

In the CLI, use the bq load command and supply the schema file inline. When you supply a JSON schema file, it must be stored in a locally readable location. You cannot specify a JSON schema file stored in Cloud Storage or Google Drive.

The following command loads data into a table using the schema definition in a JSON file:

bq load --source_format=[FORMAT] [DATASET].[TABLE_NAME] [PATH_TO_SOURCE] [SCHEMA_FILE]

Where:

  • [FORMAT] is NEWLINE_DELIMITED_JSON or CSV.
  • [DATASET] is the dataset that contains the table into which you're loading data.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is the location of the CSV or JSON file.
  • [SCHEMA_FILE] is the path to the schema file on your local machine.

For example, the following command loads data from a local CSV file named myfile.csv into mydataset.mytable. The schema is manually specified in myschema.json:

bq load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...