Specifying a schema
BigQuery lets you specify a table's schema when you load data into a table, and when you create an empty table. Alternatively, you can use schema auto-detection for supported data formats.
When you load Avro, Parquet, ORC, Firestore export files, or Datastore export files, the schema is automatically retrieved from the self-describing source data.
You can specify a table's schema in the following ways:
- Manually specify the schema:
- Using the console.
- Inline using the
bq
command-line tool. - Using the
CREATE TABLE
SQL statement.
- Create a schema file in JSON format.
- Call the
jobs.insert
method and configure theschema
property in theload
job configuration. - Call the
tables.insert
method and configure the schema in the table resource using theschema
property.
After loading data or creating an empty table, you can modify the table's schema definition.
Schema components
When you specify a table schema, you must supply each column's name and data type. You can also supply a column's description and mode.
Column names
A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 300 characters. A column name cannot use any of the following prefixes:
_TABLE_
_FILE_
_PARTITION
_ROW_TIMESTAMP
__ROOT__
_COLIDENTIFIER
Duplicate column names are not allowed even if the case differs. For example,
a column named Column1
is considered identical to a column named column1
.
Column descriptions
Each column can include an optional description. The description is a string with a maximum length of 1,024 characters.
Standard SQL data types
BigQuery standard SQL lets you specify the following data types in your schema. Data type is required.
Name | Data type | Description |
---|---|---|
Integer | INT64 |
Numeric values without fractional components |
Floating point | FLOAT64 |
Approximate numeric values with fractional components |
Numeric | NUMERIC |
Exact numeric values with fractional components |
BigNumeric | BIGNUMERIC |
Exact numeric values with fractional components |
Boolean | BOOL |
TRUE or FALSE (case-insensitive) |
String | STRING |
Variable-length character (Unicode) data |
Bytes | BYTES |
Variable-length binary data |
Date | DATE |
A logical calendar date |
Date/Time | DATETIME |
A year, month, day, hour, minute, second, and subsecond |
Time | TIME |
A time, independent of a specific date |
Timestamp | TIMESTAMP |
An absolute point in time, with microsecond precision |
Struct (Record) | STRUCT |
Container of ordered fields each with a type (required) and field name (optional) |
Geography | GEOGRAPHY |
A pointset on the Earth's surface (a set of points, lines and polygons on the WGS84 reference spheroid, with geodesic edges) |
For more information about data types in standard SQL, see Standard SQL data types.
You can also declare an array type when you query data. For more information, see Working with arrays.
Modes
BigQuery supports the following modes for your columns. Mode is
optional. If the mode is unspecified, the column defaults to NULLABLE
.
Mode | Description |
---|---|
Nullable | Column allows NULL values (default) |
Required | NULL values are not allowed |
Repeated | Column contains an array of values of the specified type |
For more information about modes, see mode
in the TableFieldSchema
.
Manually specifying schemas
When you load data or create an empty table, you can manually specify the
table's schema using the console or the bq
command-line tool. Manually
specifying a schema is supported when you load CSV and JSON (newline delimited)
files. When you load Avro, Parquet, ORC, Firestore export data, or
Datastore export data, the schema is automatically retrieved from the
self-describing source data.
To manually specify a table schema:
Console
In the console, you can specify a schema using the Add field option or the Edit as text option.
In the console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Expand the
Actions option and click Open.In the details panel, click Create table
.On the Create table page, in the Source section, select Empty table.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset
In the Table name field, enter the name of the table you're creating.
Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
- Option 1: Use Add field and specify each field's Name, Type, and Mode. In the console, you cannot add a field description when you use the Add field option, but you can manually add field descriptions after you load your data.
- Option 2: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file.
Click Create table.
SQL
Use the
CREATE TABLE
statement.
Specify the schema using the
column
option.
The following example creates a new table named newtable
with columns
x, y, z of types integer, string, and boolean:
In the console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE TABLE IF NOT EXISTS mydataset.newtable (x INT64, y STRING, z BOOL) OPTIONS( description = 'My example table');
Click
Run.
For more information about how to run queries, see Running interactive queries.
bq
Manually supply the schema inline in the format
field:data_type,field:data_type
using one of the
following commands:
- If you're loading data, use the
bq load
command. - If you're creating an empty table, use the
bq mk
command.
When you specify the schema on the command line, you cannot include a
RECORD
(STRUCT
)
type, you cannot include a column description, and you
cannot specify the column's mode. All modes default to NULLABLE
. To
include descriptions, modes, and RECORD
types, supply a
JSON schema file instead.
To load data into a table using an inline schema definition, enter the
load
command and specify the data format using the --source_format
flag.
If you are loading data into a table in a project other than your default
project, include the project ID in the following format:
project_id:dataset.table_name
.
(Optional) Supply the --location
flag and set the value to your
location.
bq --location=location load \ --source_format=format \ project_id:dataset.table_name \ path_to_source \ schema
Replace the following:
location
: the name of your location. The--location
flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value toasia-northeast1
. You can set a default value for the location using the .bigqueryrc file.format
:NEWLINE_DELIMITED_JSON
orCSV
.project_id
: your project ID.dataset
: the dataset that contains the table into which you're loading data.table_name
: the name of the table into which you're loading data.path_to_source
: the location of the CSV or JSON data file on your local machine or in Cloud Storage.schema
: the inline schema definition.
Example:
Enter the following command to load data from a local CSV file named
myfile.csv
into mydataset.mytable
in your default project. The schema is
manually specified inline.
bq load \
--source_format=CSV \
mydataset.mytable \
./myfile.csv \
qtr:STRING,sales:FLOAT,year:STRING
For more information about loading data into BigQuery, see Introduction to loading data.
To specify an inline schema definition when you create an empty table, enter
the bq mk
command with the --table
or -t
flag. If you are creating
a table in a project other than your default project, add the project ID to
the command in the following format:
project_id:dataset.table
.
bq mk --table project_id:dataset.table schema
Replace the following:
project_id
: your project ID.dataset
: a dataset in your project.table
: the name of the table you're creating.schema
: an inline schema definition.
For example, the following command creates an empty table named mytable
in
your default project. The schema is manually specified inline.
bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING
For more information about creating an empty table, see Creating an empty table with a schema definition.
C#
To specify a table's schema when you load data into a table:
To specify a schema when you create an empty table:
Go
To specify a table's schema when you load data into a table:
To specify a schema when you create an empty table:
Java
To specify a table's schema when you load data into a table:
To specify a schema when you create an empty table:
Python
To specify a table's schema when you load data into a table, configure the LoadJobConfig.schema property.
To specify a schema when you create an empty table, configure the Table.schema property.
Specifying a JSON schema file
If you prefer, you can specify the schema using a JSON schema file instead of using an inline schema definition. A JSON schema file consists of a JSON array that contains the following:
- (Optional) The column's description
- The column name
- The column's data type
- (Optional) The column's mode (if unspecified, mode defaults to
NULLABLE
)
Creating a JSON schema file
To create a JSON schema file, enter the following using an appropriate text editor:
[ { "description": "[DESCRIPTION]", "name": "[NAME]", "type": "[TYPE]", "mode": "[MODE]" }, { "description": "[DESCRIPTION]", "name": "[NAME]", "type": "[TYPE]", "mode": "[MODE]" } ]
The JSON array is indicated by the beginning and ending brackets []
. Each
column entry must be separated by a comma: },
.
You can write an existing table schema to a local file by entering the following command:
bq show \ --schema \ --format=prettyjson \ project_id:dataset.table > path_to_file
You can use the output file as a starting point for your own JSON schema file. If you use this approach, ensure the file contains only the JSON array that represents the table's schema.
For example, the following JSON array represents a basic table schema. This
schema has three columns: qtr
(REQUIRED
STRING
), rep
(NULLABLE
STRING
),
and sales
(NULLABLE
FLOAT
).
[ { "description": "quarter", "mode": "REQUIRED", "name": "qtr", "type": "STRING" }, { "description": "sales representative", "mode": "NULLABLE", "name": "rep", "type": "STRING" }, { "description": "total sales", "mode": "NULLABLE", "name": "sales", "type": "FLOAT" } ]
Using a JSON schema file
After you create your JSON schema file, you can specify it using the bq
command-line tool.
You can't use a schema file with the console or the API.
Manually supply the schema file:
- If you're loading data, use the
bq load
command. - If you're creating an empty table, use the
bq mk
command.
When you supply a JSON schema file, it must be stored in a locally readable location. You cannot specify a JSON schema file stored in Cloud Storage or Drive.
Specifying a schema file when you load data
The following command loads data into a table using the schema definition in a JSON file:
bq --location=location load \ --source_format=format \ project_id:dataset.table \ path_to_data_file \ path_to_schema_file
Replace the following:
location
: the name of your location. The--location
flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value toasia-northeast1
. You can set a default value for the location using the .bigqueryrc file.format
:NEWLINE_DELIMITED_JSON
orCSV
.project_id
: your project ID.dataset
: the dataset that contains the table into which you're loading data.table
: the name of the table into which you're loading data.path_to_data_file
: the location of the CSV or JSON data file on your local machine or in Cloud Storage.path_to_schema_file
: the path to the schema file on your local machine.
Example:
Enter the following command to load data from a local CSV file named
myfile.csv
into mydataset.mytable
in your default project. The schema is
specified in myschema.json
in the current directory.
bq load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json
Specifying a schema file when you create a table
The following command creates an empty table in an existing dataset using the schema definition in a JSON file:
bq mk --table project_id:dataset.table path_to_schema_file
Replace the following:
project_id
: your project ID.dataset
: a dataset in your project.table
: the name of the table you're creating.path_to_schema_file
: the path to the schema file on your local machine.
For example, the following command creates a table named mytable
in
mydataset
in your default project. The schema is specified in myschema.json
in the current directory:
bq mk --table mydataset.mytable ./myschema.json
Specifying a schema in the API
Specify a table schema using the API:
To specify a schema when you load data, call the
jobs.insert
method and configure theschema
property in theJobConfigurationLoad
resource.To specify a schema when you create a table, call the
tables.insert
method and configure theschema
property in theTable
resource.
Specifying a schema using the API is similar to the process for Creating a JSON schema file.
Table security
To control access to tables in BigQuery, see Introduction to table access controls.
Next steps
- Learn how to specify nested and repeated columns in a schema definition.
- Learn about schema auto-detection.
- Learn about loading data into BigQuery.
- Learn about creating and using tables.