Specifying a Schema

BigQuery allows you to specify a table's schema when you load data into a table, and when you create an empty table. Alternatively, you can use schema auto-detection for supported data formats. When you load Avro, Parquet, ORC, or Cloud Datastore backup files, the schema is inferred from the data.

You can specify a table's schema in one of the following ways:

  • Manually specify the schema:
    • Using the BigQuery web UI.
    • Inline using the CLI.
  • Create a schema file in JSON format.
  • Call the jobs.insert method and configure the configuration.load.schema property.
  • Call the tables.insert method and configure the schema in the table resource using the schema property.

After loading data or creating an empty table, you can modify the table's schema definition.

Schema components

When you specify a table schema, you must supply each column's name and data type. You may optionally supply a column's description and mode.

Column names

A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 128 characters. A column name cannot use any of the following prefixes:

  • _TABLE_
  • _FILE_
  • _PARTITION

Duplicate column names are not allowed even if the case differs. For example, a column named Column1 is considered identical to a column named column1.

Column descriptions

Each column can include an optional description. The description is a string with a maximum length of 1,024 characters.

Standard SQL data types

BigQuery standard SQL allows you to specify the following data types in your schema. Data type is required.

Data type Description
Integer Numeric values without fractional components
Floating point Approximate numeric values with fractional components
Numeric Exact numeric values with fractional components
Boolean TRUE or FALSE (case insensitive)
String Variable-length character (Unicode) data
Bytes Variable-length binary data
Date A logical calendar date
Datetime A year, month, day, hour, minute, second, and subsecond
Time A time, independent of a specific date
Timestamp An absolute point in time, with microsecond precision
Struct (Record) Container of ordered fields each with a type (required) and field name (optional)

For more information on data types in standard SQL, see Standard SQL Data Types.

You can also declare an array type when you query data. For more information, see Working with arrays.

Modes

BigQuery supports the following modes for your columns. Mode is optional. If the mode is unspecified, the column defaults to NULLABLE.

Mode Description
Nullable Column allows NULL values (default)
Required NULL values are not allowed
Repeated Column contains an array of values of the specified type

For more information on modes, see schema.fields.mode in the API reference.

Manually specifying schemas

When you load data or create an empty table, you can manually specify the table's schema using the BigQuery web UI or the command-line tool. Manually specifying a schema is supported when you load CSV and JSON (newline delimited) files. When you load Avro, Parquet, ORC, or Cloud Datastore backup data, BigQuery infers the schema from the source data.

To manually specify a table schema:

Web UI

In the BigQuery web UI, you can specify a schema using the Add Field option or the Edit as Text option.

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your dataset name in the navigation and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create table page:

    • For Source Data, click Create from source if you are loading data or click Create empty table.
    • For Destination Table, choose your dataset and enter the table name in the Destination table name field.
    • For Schema, choose one of the following options to manually specify a schema:

      • Option 1: Use Add Field and specify each field's Name, Type, and Mode. In the BigQuery web UI, you cannot add a field description when you use the Add Field option, but you can manually add field descriptions in the UI after you load your data.

        Add schema using add fields

      • Option 2: Click Edit as Text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file.

        Add schema as JSON array

    • Click Create Table.

Command-line

Manually supply the schema inline in the format [FIELD]:[DATA_TYPE],[FIELD]:[DATA_TYPE] using the:

  • load command if you are loading data
  • mk command if you are creating an empty table

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.

To load data into a table using an inline schema definition, enter the load command and specify the data format using the --source_format flag. If you are loading data into a table in a project other than your default project, include the project ID in the following format: [PROJECT_ID]:[DATASET].[TABLE_NAME].

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] load --source_format=[FORMAT] [PROJECT_ID]:[DATASET].[TABLE_NAME] [PATH_TO_SOURCE] [SCHEMA]

Where:

  • [LOCATION] is the name of your location. The --location flag is optional if your data is in the US or the EU multi-region location. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • [FORMAT] is NEWLINE_DELIMITED_JSON or CSV.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the dataset that contains the table into which you're loading data.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is the location of the CSV or JSON data file on your local machine or in Google Cloud Storage.
  • [SCHEMA] is the inline schema definition.

Examples:

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. The schema is manually specified inline. mydataset was created in the US multi-region location.

bq --location=US load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. mydataset was created in the asia-northeast1 region. The schema is manually specified inline:

bq --location=asia-northeast1 load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING

For more information on loading data into BigQuery, see Introduction to Loading Data.

To specify an inline schema definition when you create an empty table, enter the mk command with the --table or -t flag. If you are creating a table in a project other than your default project, add the project ID to the command in the following format: [PROJECT_ID]:[DATASET].[TABLE].

bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [SCHEMA]

Where:

  • [PROJECT_ID] is your project ID.
  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.
  • [SCHEMA] is an inline schema definition.

For example, the following command creates an empty table named mytable in your default project. The schema is manually specified inline.

bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING

For more information on creating an empty table, see Creating an empty table with a schema definition.

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To specify a table's schema when you load data into a table:
var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.json";

var dataset = client.CreateDataset(datasetId);

var schema = new TableSchemaBuilder {
    { "name", BigQueryDbType.String },
    { "post_abbr", BigQueryDbType.String }
}.Build();

var jobOptions = new CreateLoadJobOptions()
{
    SourceFormat = FileFormat.NewlineDelimitedJson
};

var loadJob = client.CreateLoadJob(gcsURI, dataset.GetTableReference(tableId),
    schema, jobOptions);                                  

loadJob.PollUntilCompleted();

To specify a schema when you create an empty table:

public void CreateTable(string datasetId, string tableId, BigQueryClient client)
{
    var dataset = client.GetDataset(datasetId);
    // Create schema for new table.
    var schema = new TableSchemaBuilder
    {
        { "title", BigQueryDbType.String },
        { "unique_words", BigQueryDbType.Int64 }
    }.Build();
    // Create the table if it doesn't exist.
    BigQueryTable table = dataset.GetOrCreateTable(tableId, schema);
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To specify a table's schema when you load data into a table:
gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")
gcsRef.SourceFormat = bigquery.JSON
gcsRef.Schema = bigquery.Schema{
	{Name: "name", Type: bigquery.StringFieldType},
	{Name: "post_abbr", Type: bigquery.StringFieldType},
}
loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
loader.WriteDisposition = bigquery.WriteEmpty

job, err := loader.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}

if status.Err() != nil {
	return fmt.Errorf("Job completed with error: %v", status.Err())
}

To specify a schema when you create an empty table:

sampleSchema := bigquery.Schema{
	{Name: "full_name", Type: bigquery.StringFieldType},
	{Name: "age", Type: bigquery.IntegerFieldType},
}

metaData := &bigquery.TableMetadata{
	Schema:         sampleSchema,
	ExpirationTime: time.Now().AddDate(1, 0, 0), // Table will be automatically deleted in 1 year.
}
tableRef := client.Dataset(datasetID).Table(tableID)
if err := tableRef.Create(ctx, metaData); err != nil {
	return err
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To specify a table's schema when you load data into a table, configure the LoadJobConfig.schema property.
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.json'

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    location='US',  # Location must match that of the destination dataset.
    job_config=job_config)  # API request

assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert client.get_table(dataset_ref.table('us_states')).num_rows > 0

To specify a schema when you create an empty table, configure the Table.schema property.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

schema = [
    bigquery.SchemaField('full_name', 'STRING', mode='REQUIRED'),
    bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED'),
]
table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

assert table.table_id == 'my_table'

Specifying a JSON schema file

If you prefer not to specify your schema manually, you can create a JSON schema file to use with the CLI. A JSON schema file consists of a JSON array that contains the following:

  • (Optional) The column's description
  • The column name
  • The column's data type
  • (Optional) The column's mode (if unspecified, mode defaults to NULLABLE)

Creating a JSON schema file

To create a JSON schema file, enter the following using an appropriate text editor:

[
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 },
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 }
]

The JSON array is indicated by the beginning and ending brackets []. Each column entry must be separated by a comma — },.

You can write an existing table schema to a local file by entering the following command:

bq show --schema --format=prettyjson [PROJECT_ID]:[DATASET].[TABLE] > [PATH_TO_FILE]

You can use the output file as a starting point for your own JSON schema file. If you use this approach, ensure the file contains only the JSON array that represents the table's schema.

For example, the following JSON array represents a basic table schema. This schema has 3 columns: qtr (REQUIRED STRING), rep (NULLABLE STRING), and sales (NULLABLE FLOAT).

[
  {
    "description": "quarter",
    "mode": "REQUIRED",
    "name": "qtr",
    "type": "STRING"
  },
  {
    "description": "sales representative",
    "mode": "NULLABLE",
    "name": "rep",
    "type": "STRING"
  },
  {
    "description": "total sales",
    "mode": "NULLABLE",
    "name": "sales",
    "type": "FLOAT"
  }
]

Using a JSON schema file

After creating your JSON schema file, you can specify it on the command line. You cannot use a schema file with the BigQuery web UI or the API.

Manually supply the schema file using the:

  • load command if you are loading data
  • mk command if you are creating an empty table

When you supply a JSON schema file, it must be stored in a locally readable location. You cannot specify a JSON schema file stored in Cloud Storage or Google Drive.

Specifying a schema file when you load data

The following command loads data into a table using the schema definition in a JSON file:

bq --location=[LOCATION] load --source_format=[FORMAT] [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_DATA_FILE] [PATH_TO_SCHEMA_FILE]

Where:

  • [LOCATION] is the name of your location. The --location flag is optional if your data is in the US or the EU multi-region location. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • [FORMAT] is NEWLINE_DELIMITED_JSON or CSV.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the dataset that contains the table into which you're loading data.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_DATA_FILE] is the location of the CSV or JSON data file on your local machine or in Google Cloud Storage.
  • [SCHEMA_FILE] is the path to the schema file on your local machine.

Examples:

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. mydataset was created in the US multi-region location. The schema is specified in myschema.json.

bq --location=US load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. mydataset was created in the asia-northeast1 region. The schema is specified in myschema.json.

bq --location=asia-northeast1 load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json

Specifying a schema file when you create a table

The following command creates an empty table in an existing dataset using the schema definition in a JSON file:

bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_SCHEMA_FILE]

Where:

  • [PROJECT_ID] is your project ID.
  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.
  • [PATH_TO_SCHEMA_FILE] is the path to the schema file on your local machine.

For example, the following command creates a table named mytable in mydataset in your default project. The schema is specified in myschema.json:

bq mk --table mydataset.mytable ./myschema.json

Specifying a schema in the API

To specify a table schema using the API:

  • To specify a schema when you load data, call the jobs.insert method and configure the configuration.load.schema property. Specify your region in the location property in the jobReference section.
  • To specify a schema when you create a table, call the tables.insert method and configure the schema in the table resource using the schema property.

Specifying a schema using the API is similar to the process for Creating a JSON schema file.

Next steps

Was this page helpful? Let us know how we did:

Send feedback about...