Specifying a schema

BigQuery lets you specify a table's schema when you load data into a table, and when you create an empty table. Alternatively, you can use schema auto-detection for supported data formats.

When you load Avro, Parquet, ORC, Firestore export files, or Datastore export files, the schema is automatically retrieved from the self-describing source data.

You can specify a table's schema in the following ways:

  • Manually specify the schema:
    • Using the Cloud Console.
    • Inline using the bq command-line tool.
  • Create a schema file in JSON format.
  • Call the jobs.insert method and configure the schema property in the load job configuration.
  • Call the tables.insert method and configure the schema in the table resource using the schema property.

After loading data or creating an empty table, you can modify the table's schema definition.

Schema components

When you specify a table schema, you must supply each column's name and data type. You can also supply a column's description and mode.

Column names

A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 128 characters. A column name cannot use any of the following prefixes:

  • _TABLE_
  • _FILE_
  • _PARTITION

Duplicate column names are not allowed even if the case differs. For example, a column named Column1 is considered identical to a column named column1.

Column descriptions

Each column can include an optional description. The description is a string with a maximum length of 1,024 characters.

Standard SQL data types

BigQuery standard SQL lets you specify the following data types in your schema. Data type is required.

Name Data type Description
Integer INT64 Numeric values without fractional components
Floating point FLOAT64 Approximate numeric values with fractional components
Numeric NUMERIC Exact numeric values with fractional components
Boolean BOOL TRUE or FALSE (case insensitive)
String STRING Variable-length character (Unicode) data
Bytes BYTES Variable-length binary data
Date DATE A logical calendar date
Date/Time DATETIME A year, month, day, hour, minute, second, and subsecond
Time TIME A time, independent of a specific date
Timestamp TIMESTAMP An absolute point in time, with microsecond precision
Struct (Record) STRUCT Container of ordered fields each with a type (required) and field name (optional)
Geography GEOGRAPHY A pointset on the Earth's surface (a set of points, lines and polygons on the WGS84 reference spheroid, with geodesic edges)

For more information about data types in standard SQL, see Standard SQL data types.

You can also declare an array type when you query data. For more information, see Working with arrays.

Modes

BigQuery supports the following modes for your columns. Mode is optional. If the mode is unspecified, the column defaults to NULLABLE.

Mode Description
Nullable Column allows NULL values (default)
Required NULL values are not allowed
Repeated Column contains an array of values of the specified type

For more information about modes, see mode in the TableFieldSchema.

Manually specifying schemas

When you load data or create an empty table, you can manually specify the table's schema using the Cloud Console or the bq command-line tool. Manually specifying a schema is supported when you load CSV and JSON (newline delimited) files. When you load Avro, Parquet, ORC, Firestore export data, or Datastore export data, the schema is automatically retrieved from the self-describing source data.

To manually specify a table schema:

Console

In the Cloud Console, you can specify a schema using the Add field option or the Edit as text option.

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Select a dataset from the Resources section of the navigation panel.

  3. Click Create table on the right side of the window.

    Create table.

  4. On the Create table page, in the Source section, select Empty table.

  5. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset

      Select dataset.

    • In the Table name field, enter the name of the table you're creating.

    • Verify that Table type is set to Native table.

  6. In the Schema section, enter the schema definition.

    • Option 1: Use Add field and specify each field's Name, Type, and Mode. In the Cloud Console, you cannot add a field description when you use the Add field option, but you can manually add field descriptions after you load your data.
    • Option 2: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file.
  7. Click Create table.

bq

Manually supply the schema inline in the format field:data_type,field:data_type using one of the following commands:

  • If you're loading data, use the bq load command.
  • If you're creating an empty table, use the bq mk command.

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.

To load data into a table using an inline schema definition, enter the load command and specify the data format using the --source_format flag. If you are loading data into a table in a project other than your default project, include the project ID in the following format: project_id:dataset.table_name.

(Optional) Supply the --location flag and set the value to your location.

bq --location=location load \
--source_format=format \
project_id:dataset.table_name \
path_to_source \
schema

Replace the following:

  • location: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • format: NEWLINE_DELIMITED_JSON or CSV.
  • project_id: your project ID.
  • dataset: the dataset that contains the table into which you're loading data.
  • table_name: the name of the table into which you're loading data.
  • path_to_source: the location of the CSV or JSON data file on your local machine or in Cloud Storage.
  • schema: the inline schema definition.

Example:

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. The schema is manually specified inline.

bq load \
--source_format=CSV \
mydataset.mytable \
./myfile.csv \
qtr:STRING,sales:FLOAT,year:STRING

For more information about loading data into BigQuery, see Introduction to loading data.

To specify an inline schema definition when you create an empty table, enter the bq mk command with the --table or -t flag. If you are creating a table in a project other than your default project, add the project ID to the command in the following format: project_id:dataset.table.

bq mk --table project_id:dataset.table schema

Replace the following:

  • project_id: your project ID.
  • dataset: a dataset in your project.
  • table: the name of the table you're creating.
  • schema: an inline schema definition.

For example, the following command creates an empty table named mytable in your default project. The schema is manually specified inline.

bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING

For more information about creating an empty table, see Creating an empty table with a schema definition.

C#

To specify a table's schema when you load data into a table:


using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsJson
{
    public void LoadTableGcsJson(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
        var dataset = client.GetDataset(datasetId);
        var schema = new TableSchemaBuilder {
            { "name", BigQueryDbType.String },
            { "post_abbr", BigQueryDbType.String }
        }.Build();
        TableReference destinationTableRef = dataset.GetTableReference(
            tableId: "us_states");
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            SourceFormat = FileFormat.NewlineDelimitedJson
        };
        // Create and run job
        BigQueryJob loadJob = client.CreateLoadJob(
            sourceUri: gcsURI, destination: destinationTableRef,
            schema: schema, options: jobOptions);
        loadJob.PollUntilCompleted();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}

To specify a schema when you create an empty table:


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryCreateTable
{
    public BigQueryTable CreateTable(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var dataset = client.GetDataset(datasetId);
        // Create schema for new table.
        var schema = new TableSchemaBuilder
        {
            { "full_name", BigQueryDbType.String },
            { "age", BigQueryDbType.Int64 }
        }.Build();
        // Create the table
        return dataset.CreateTable(tableId: "your_table_id", schema: schema);
    }
}

Go

To specify a table's schema when you load data into a table:

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importJSONExplicitSchema demonstrates loading newline-delimited JSON data from Cloud Storage
// into a BigQuery table and providing an explicit schema for the data.
func importJSONExplicitSchema(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")
	gcsRef.SourceFormat = bigquery.JSON
	gcsRef.Schema = bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
	}
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}

To specify a schema when you create an empty table:

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// createTableExplicitSchema demonstrates creating a new BigQuery table and specifying a schema.
func createTableExplicitSchema(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "full_name", Type: bigquery.StringFieldType},
		{Name: "age", Type: bigquery.IntegerFieldType},
	}

	metaData := &bigquery.TableMetadata{
		Schema:         sampleSchema,
		ExpirationTime: time.Now().AddDate(1, 0, 0), // Table will be automatically deleted in 1 year.
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metaData); err != nil {
		return err
	}
	return nil
}

Java

To specify a table's schema when you load data into a table:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to load JSON data from Cloud Storage into a new BigQuery table
public class LoadJsonFromGCS {

  public static void runLoadJsonFromGCS() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadJsonFromGCS(datasetName, tableName, sourceUri, schema);
  }

  public static void loadJsonFromGCS(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.json())
              .setSchema(schema)
              .build();

      // Load data from a GCS JSON file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("Json from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}

To specify a schema when you create an empty table:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class CreateTable {

  public static void runCreateTable() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    Schema schema =
        Schema.of(
            Field.of("stringField", StandardSQLTypeName.STRING),
            Field.of("booleanField", StandardSQLTypeName.BOOL));
    createTable(datasetName, tableName, schema);
  }

  public static void createTable(String datasetName, String tableName, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      TableDefinition tableDefinition = StandardTableDefinition.of(schema);
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Table was not created. \n" + e.toString());
    }
  }
}

Python

To specify a table's schema when you load data into a table, configure the LoadJobConfig.schema property.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the destination dataset location.
    job_config=job_config,
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

To specify a schema when you create an empty table, configure the Table.schema property.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

Specifying a JSON schema file

If you prefer not to specify your schema manually, you can create a JSON schema file to use with the bq command-line tool. A JSON schema file consists of a JSON array that contains the following:

  • (Optional) The column's description
  • The column name
  • The column's data type
  • (Optional) The column's mode (if unspecified, mode defaults to NULLABLE)

Creating a JSON schema file

To create a JSON schema file, enter the following using an appropriate text editor:

[
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 },
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 }
]

The JSON array is indicated by the beginning and ending brackets []. Each column entry must be separated by a comma: },.

You can write an existing table schema to a local file by entering the following command:

bq show \
--schema \
--format=prettyjson \
project_id:dataset.table > path_to_file

You can use the output file as a starting point for your own JSON schema file. If you use this approach, ensure the file contains only the JSON array that represents the table's schema.

For example, the following JSON array represents a basic table schema. This schema has three columns: qtr (REQUIRED STRING), rep (NULLABLE STRING), and sales (NULLABLE FLOAT).

[
  {
    "description": "quarter",
    "mode": "REQUIRED",
    "name": "qtr",
    "type": "STRING"
  },
  {
    "description": "sales representative",
    "mode": "NULLABLE",
    "name": "rep",
    "type": "STRING"
  },
  {
    "description": "total sales",
    "mode": "NULLABLE",
    "name": "sales",
    "type": "FLOAT"
  }
]

Using a JSON schema file

After you create your JSON schema file, you can specify it using the bq command-line tool. You can't use a schema file with the Cloud Console or the API.

Manually supply the schema file:

  • If you're loading data, use the bq load command.
  • If you're creating an empty table, use the bq mk command.

When you supply a JSON schema file, it must be stored in a locally readable location. You cannot specify a JSON schema file stored in Cloud Storage or Drive.

Specifying a schema file when you load data

The following command loads data into a table using the schema definition in a JSON file:

bq --location=location load \
--source_format=format \
project_id:dataset.table \
path_to_data_file \
path_to_schema_file

Replace the following:

  • location: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • format: NEWLINE_DELIMITED_JSON or CSV.
  • project_id: your project ID.
  • dataset: the dataset that contains the table into which you're loading data.
  • table: the name of the table into which you're loading data.
  • path_to_data_file: the location of the CSV or JSON data file on your local machine or in Cloud Storage.
  • path_to_schema_file: the path to the schema file on your local machine.

Example:

Enter the following command to load data from a local CSV file named myfile.csv into mydataset.mytable in your default project. The schema is specified in myschema.json.

bq load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json

Specifying a schema file when you create a table

The following command creates an empty table in an existing dataset using the schema definition in a JSON file:

bq mk --table project_id:dataset.table path_to_schema_file

Replace the following:

  • project_id: your project ID.
  • dataset: a dataset in your project.
  • table: the name of the table you're creating.
  • path_to_schema_file: the path to the schema file on your local machine.

For example, the following command creates a table named mytable in mydataset in your default project. The schema is specified in myschema.json:

bq mk --table mydataset.mytable ./myschema.json

Specifying a schema in the API

Specify a table schema using the API:

Specifying a schema using the API is similar to the process for Creating a JSON schema file.

Next steps