Specifying a schema
BigQuery lets you specify a table's schema when you load data into a table, and when you create an empty table. Alternatively, you can use schema auto-detection for supported data formats.
When you load Avro, Parquet, ORC, Firestore export files, or Datastore export files, the schema is automatically retrieved from the self-describing source data.
You can specify a table's schema in the following ways:
- Use the Google Cloud console.
- Use the
CREATE TABLE
SQL statement. - Inline using the bq command-line tool.
- Create a schema file in JSON format.
- Call the
jobs.insert
method and configure theschema
property in theload
job configuration. - Call the
tables.insert
method and configure the schema in the table resource using theschema
property.
After loading data or creating an empty table, you can modify the table's schema definition.
Schema components
When you specify a table schema, you must supply each column's name and data type. You can also supply a column's description, mode, and default value.
Column names
A column name can contain letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. If you use flexible column names, BigQuery supports starting a column name with a number. Exercise caution when starting columns with a number, since using flexible column names with the BigQuery Storage Read API or BigQuery Storage Write API requires special handling. For more information about flexible column name support, see flexible column names.
Column names have a maximum length of 300 characters. Column names can't use any of the following prefixes:
_TABLE_
_FILE_
_PARTITION
_ROW_TIMESTAMP
__ROOT__
_COLIDENTIFIER
Duplicate column names are not allowed even if the case differs. For example, a
column named Column1
is considered identical to a column named column1
. To
learn more about column naming rules, see Column
names in the
GoogleSQL reference.
If a table name (for example, test
) is the same as one of its column names
(for example, test
), the SELECT
expression interprets the test
column as
a STRUCT
containing all other table columns. To avoid this collision, use
one of the following methods:
Avoid using the same name for a table and its columns.
Assign the table a different alias. For example, the following query assigns a table alias
t
to the tableproject1.dataset.test
:SELECT test FROM project1.dataset.test AS t;
Include the table name when referencing a column. For example:
SELECT test.test FROM project1.dataset.test;
Flexible column names
You have more flexibility in what you name columns, including expanded access to characters in languages other than English as well as additional symbols.
Flexible column names support the following characters:
- Any letter in any language, as represented by the Unicode regular expression
\p{L}
. - Any numeric character in any language as represented by the Unicode regular
expression
\p{N}
. - Any connector punctuation character, including underscores, as represented
by the Unicode regular expression
\p{Pc}
. - A hyphen or dash as represented by the Unicode regular expression
\p{Pd}
. - Any mark intended to accompany another character as represented by the
Unicode regular expression
\p{M}
. For example, accents, umlauts, or enclosing boxes. - The following special characters:
- An ampersand (
&
) as represented by the Unicode regular expression\u0026
. - A percent sign (
%
) as represented by the Unicode regular expression\u0025
. - An equals sign (
=
) as represented by the Unicode regular expression\u003D
. - A plus sign (
+
) as represented by the Unicode regular expression\u002B
. - A colon (
:
) as represented by the Unicode regular expression\u003A
. - An apostrophe (
'
) as represented by the Unicode regular expression\u0027
. - A less-than sign (
<
) as represented by the Unicode regular expression\u003C
. - A greater-than sign (
>
) as represented by the Unicode regular expression\u003E
. - A number sign (
#
) as represented by the Unicode regular expression\u0023
. - A vertical line (
|
) as represented by the Unicode regular expression\u007c
. - Whitespace.
- An ampersand (
Flexible column names don't support the following special characters:
- An exclamation mark (
!
) as represented by the Unicode regular expression\u0021
. - A quotation mark (
"
) as represented by the Unicode regular expression\u0022
. - A dollar sign (
$
) as represented by the Unicode regular expression\u0024
. - A left parenthesis (
(
) as represented by the Unicode regular expression\u0028
. - A right parenthesis (
)
) as represented by the Unicode regular expression\u0029
. - An asterisk (
*
) as represented by the Unicode regular expression\u002A
. - A comma (
,
) as represented by the Unicode regular expression\u002C
. - A period (
.
) as represented by the Unicode regular expression\u002E
. - A slash (
/
) as represented by the Unicode regular expression\u002F
. - A semicolon (
;
) as represented by the Unicode regular expression\u003B
. - A question mark (
?
) as represented by the Unicode regular expression\u003F
. - An at sign (
@
) as represented by the Unicode regular expression\u0040
. - A left square bracket (
[
) as represented by the Unicode regular expression\u005B
. - A backslash (
\
) as represented by the Unicode regular expression\u005C
. - A right square bracket (
]
) as represented by the Unicode regular expression\u005D
. - A circumflex accent (
^
) as represented by the Unicode regular expression\u005E
. - A grave accent (
`
) as represented by the Unicode regular expression\u0060
. - A left curly bracket {
{
) as represented by the Unicode regular expression\u007B
. - A right curly bracket (
}
) as represented by the Unicode regular expression\u007D
. - A tilde (
~
) as represented by the Unicode regular expression\u007E
.
For additional guidelines, see Column names.
The expanded column characters are supported by both the BigQuery Storage Read API
and the BigQuery Storage Write API. To use the expanded list of Unicode characters
with the BigQuery Storage Read API, you must set a flag. You can use the
displayName
attribute to retrieve the column name. The following example
shows how to set a flag with the Python client:
from google.cloud.bigquery_storage import types
requested_session = types.ReadSession()
#set avro serialization options for flexible column.
options = types.AvroSerializationOptions()
options.enable_display_name_attribute = True
requested_session.read_options.avro_serialization_options = options
To use the expanded list of Unicode characters with the BigQuery Storage Write API,
you must provide the schema with column_name
notation, unless you are using
the JsonStreamWriter
writer object. The following example shows how to
provide the schema:
syntax = "proto2";
package mypackage;
// Source protos located in github.com/googleapis/googleapis
import "google/cloud/bigquery/storage/v1/annotations.proto";
message FlexibleSchema {
optional string item_name_column = 1
[(.google.cloud.bigquery.storage.v1.column_name) = "name-列"];
optional string item_description_column = 2
[(.google.cloud.bigquery.storage.v1.column_name) = "description-列"];
}
In this example, item_name_column
and item_description_column
are
placeholder names which need to be compliant with the
protocol buffer naming
convention. Note that column_name
annotations always take precedence over
placeholder names.
- Loading Parquet data does not support flexible column names by default. To enroll in this preview complete the enrollment form. Note that after enrolling in the preview, any invalid column names (for example, column name collation) returns an error. For projects not enrolled, the load request replaces the invalid characters with underscores instead of returning an error.
- Loading CSV data using schema autodetection does not support flexible column names by default. To enroll in this preview complete the enrollment form. Note that after enrolling in the preview, any invalid column names (for example, column name collation) returns an error. For projects not enrolled, the load request replaces the invalid characters with underscores instead of returning an error.
Limitations
Flexible column names are not supported with external tables.
Column descriptions
Each column can include an optional description. The description is a string with a maximum length of 1,024 characters.
Default values
The default value for a column must be a literal or one of the following functions:
CURRENT_DATE
CURRENT_DATETIME
CURRENT_TIME
CURRENT_TIMESTAMP
GENERATE_UUID
RAND
SESSION_USER
ST_GEOGPOINT
GoogleSQL data types
GoogleSQL lets you specify the following data types in your schema. Data type is required.
Name | Data type | Description |
---|---|---|
Integer | INT64 |
Numeric values without fractional components |
Floating point | FLOAT64 |
Approximate numeric values with fractional components |
Numeric | NUMERIC |
Exact numeric values with fractional components |
BigNumeric | BIGNUMERIC |
Exact numeric values with fractional components |
Boolean | BOOL |
TRUE or FALSE (case-insensitive) |
String | STRING |
Variable-length character (Unicode) data |
Bytes | BYTES |
Variable-length binary data |
Date | DATE |
A logical calendar date |
Date/Time | DATETIME |
A year, month, day, hour, minute, second, and subsecond |
Time | TIME |
A time, independent of a specific date |
Timestamp | TIMESTAMP |
An absolute point in time, with microsecond precision |
Struct (Record) | STRUCT |
Container of ordered fields each with a type (required) and field name (optional) |
Geography | GEOGRAPHY |
A pointset on the Earth's surface (a set of points, lines and polygons on the WGS84 reference spheroid, with geodesic edges) |
JSON | JSON |
Represents JSON, a lightweight data-interchange format |
RANGE | RANGE |
A range of DATE , DATETIME , or TIMESTAMP values |
For more information about data types in GoogleSQL, see GoogleSQL data types.
You can also declare an array type when you query data. For more information, see Work with arrays.
Modes
BigQuery supports the following modes for your columns. Mode is
optional. If the mode is unspecified, the column defaults to NULLABLE
.
Mode | Description |
---|---|
Nullable | Column allows NULL values (default) |
Required | NULL values are not allowed |
Repeated | Column contains an array of values of the specified type |
For more information about modes, see mode
in the TableFieldSchema
.
Rounding mode
When a column is a NUMERIC
or BIGNUMERIC
type, you can set the
rounding_mode
column option,
which determines how values in that column are rounded when written to the
table. You can set the rounding_mode
option on a top-level column or a STRUCT
field. The following rounding modes are supported:
"ROUND_HALF_AWAY_FROM_ZERO"
: This mode (default) rounds halfway cases away from zero."ROUND_HALF_EVEN"
: This mode rounds halfway cases towards the nearest even digit.
You cannot set the rounding_mode
option for a column that is not a NUMERIC
or BIGNUMERIC
type. To learn more about these types, see
decimal types.
The following example creates a table and inserts values that are rounded based on the rounding mode of the column:
CREATE TABLE mydataset.mytable ( x NUMERIC(5,2) OPTIONS (rounding_mode='ROUND_HALF_EVEN'), y NUMERIC(5,2) OPTIONS (rounding_mode='ROUND_HALF_AWAY_FROM_ZERO') ); INSERT mydataset.mytable (x, y) VALUES (NUMERIC "1.025", NUMERIC "1.025"), (NUMERIC "1.0251", NUMERIC "1.0251"), (NUMERIC "1.035", NUMERIC "1.035"), (NUMERIC "-1.025", NUMERIC "-1.025");
The table mytable
looks like the following:
+-------+-------+ | x | y | +-------+-------+ | 1.02 | 1.03 | | 1.03 | 1.03 | | 1.04 | 1.04 | | -1.02 | -1.03 | +-------+-------+
For more information, see roundingMode
in the
TableFieldSchema
.
Specify schemas
When you load data or create an empty table, you can specify the table's schema using the Google Cloud console or the bq command-line tool. Specifying a schema is supported when you load CSV and JSON (newline delimited) files. When you load Avro, Parquet, ORC, Firestore export data, or Datastore export data, the schema is automatically retrieved from the self-describing source data.
To specify a table schema:
Console
In the Google Cloud console, you can specify a schema using the Add field option or the Edit as text option.
In the Google Cloud console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Expand the
Actions option and click Open.In the details panel, click Create table
.On the Create table page, in the Source section, select Empty table.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset
In the Table name field, enter the name of the table you're creating.
Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
- Option 1: Use Add field and specify each field's name, type, and mode.
- Option 2: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file.
Click Create table.
SQL
Use the
CREATE TABLE
statement.
Specify the schema using the
column
option.
The following example creates a new table named newtable
with columns
x, y, z of types integer, string, and boolean:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE TABLE IF NOT EXISTS mydataset.newtable (x INT64, y STRING, z BOOL) OPTIONS( description = 'My example table');
Click
Run.
For more information about how to run queries, see Run an interactive query.
bq
Supply the schema inline in the format
field:data_type,field:data_type
using one of the
following commands:
- If you're loading data, use the
bq load
command. - If you're creating an empty table, use the
bq mk
command.
When you specify the schema on the command line, you can't include
RECORD
(STRUCT
)
or RANGE
types, you can't include a column description, and you
can't specify the column's mode. All modes default to NULLABLE
. To
include descriptions, modes, RECORD
types, and RANGE
types, supply a
JSON schema file instead.
To load data into a table using an inline schema definition, enter the
load
command and specify the data format using the --source_format
flag.
If you are loading data into a table in a project other than your default
project, include the project ID in the following format:
project_id:dataset.table_name
.
(Optional) Supply the --location
flag and set the value to your
location.
bq --location=location load \ --source_format=format \ project_id:dataset.table_name \ path_to_source \ schema
Replace the following:
location
: the name of your location. The--location
flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value toasia-northeast1
. You can set a default value for the location using the .bigqueryrc file.format
:NEWLINE_DELIMITED_JSON
orCSV
.project_id
: your project ID.dataset
: the dataset that contains the table into which you're loading data.table_name
: the name of the table into which you're loading data.path_to_source
: the location of the CSV or JSON data file on your local machine or in Cloud Storage.schema
: the inline schema definition.
Example:
Enter the following command to load data from a local CSV file named
myfile.csv
into mydataset.mytable
in your default project. The schema is
specified inline.
bq load \
--source_format=CSV \
mydataset.mytable \
./myfile.csv \
qtr:STRING,sales:FLOAT,year:STRING
For more information about loading data into BigQuery, see Introduction to loading data.
To specify an inline schema definition when you create an empty table, enter
the bq mk
command with the --table
or -t
flag. If you are creating
a table in a project other than your default project, add the project ID to
the command in the following format:
project_id:dataset.table
.
bq mk --table project_id:dataset.table schema
Replace the following:
project_id
: your project ID.dataset
: a dataset in your project.table
: the name of the table you're creating.schema
: an inline schema definition.
For example, the following command creates an empty table named mytable
in
your default project. The schema is specified inline.
bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING
For more information about creating an empty table, see Creating an empty table with a schema definition.
C#
To specify a table's schema when you load data into a table:
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To specify a schema when you create an empty table:
Go
To specify a table's schema when you load data into a table:
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To specify a schema when you create an empty table:
Java
To specify a table's schema when you load data into a table:
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To specify a schema when you create an empty table:
Python
To specify a table's schema when you load data into a table, configure the LoadJobConfig.schema property.
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To specify a schema when you create an empty table, configure the Table.schema property.
Specifying a JSON schema file
If you prefer, you can specify the schema using a JSON schema file instead of using an inline schema definition. A JSON schema file consists of a JSON array that contains the following:
- The column's name
- The column's data type
- Optional: The column's mode (if unspecified, mode defaults to
NULLABLE
) - Optional: The column's fields if it is a
STRUCT
type - Optional: The column's description
- Optional: The column's policy tags, used for field-level access control
- Optional: The column's maximum length of values for
STRING
orBYTES
types - Optional: The column's precision
for
NUMERIC
orBIGNUMERIC
types - Optional: The column's scale
for
NUMERIC
orBIGNUMERIC
types - Optional: The column's collation
for
STRING
types - Optional: The column's default value
- Optional: The column's rounding mode, if the column is a
NUMERIC
orBIGNUMERIC
type
Creating a JSON schema file
To create a JSON schema file, enter a
TableFieldSchema
for each column. The name
and type
fields are required. All other fields are
optional.
[ { "name": string, "type": string, "mode": string, "fields": [ { object (TableFieldSchema) } ], "description": string, "policyTags": { "names": [ string ] }, "maxLength": string, "precision": string, "scale": string, "collation": string, "defaultValueExpression": string, "roundingMode": string }, { "name": string, "type": string, ... } ]
If the column is a RANGE<T>
type, use the rangeElementType
field to
describe T
, where T
must be one of DATE
, DATETIME
, or TIMESTAMP
.
[ { "name": "duration", "type": "RANGE", "mode": "NULLABLE", "rangeElementType": { "type": "DATE" } } ]
The JSON array is indicated by the beginning and ending brackets []
. Each
column entry must be separated by a comma: },
.
To write an existing table schema to a local file, do the following:
bq
bq show \ --schema \ --format=prettyjson \ project_id:dataset.table > path_to_file
Replace the following:
project_id
: your project ID.dataset
: a dataset in your project.table
: the name of an existing table schema.path_to_file
: the location of the local file into which you are writing table schema.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To write a schema JSON file from a table using the Python client library, call the Client.schema_to_json method.You can use the output file as a starting point for your own JSON schema file. If you use this approach, ensure the file contains only the JSON array that represents the table's schema.
For example, the following JSON array represents a basic table schema. This
schema has three columns: qtr
(REQUIRED
STRING
), rep
(NULLABLE
STRING
),
and sales
(NULLABLE
FLOAT
).
[ { "name": "qtr", "type": "STRING", "mode": "REQUIRED", "description": "quarter" }, { "name": "rep", "type": "STRING", "mode": "NULLABLE", "description": "sales representative" }, { "name": "sales", "type": "FLOAT", "mode": "NULLABLE", "defaultValueExpression": "2.55" } ]
Using a JSON schema file
After you create your JSON schema file, you can specify it using the bq command-line tool. You can't use a schema file with the Google Cloud console or the API.
Supply the schema file:
- If you're loading data, use the
bq load
command. - If you're creating an empty table, use the
bq mk
command.
When you supply a JSON schema file, it must be stored in a locally readable location. You cannot specify a JSON schema file stored in Cloud Storage or Google Drive.
Specifying a schema file when you load data
To load data into a table using a JSON schema definition, do the following:
bq
bq --location=location load \ --source_format=format \ project_id:dataset.table \ path_to_data_file \ path_to_schema_file
Replace the following:
location
: the name of your location. The--location
flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value toasia-northeast1
. You can set a default value for the location using the .bigqueryrc file.format
:NEWLINE_DELIMITED_JSON
orCSV
.project_id
: your project ID.dataset
: the dataset that contains the table into which you're loading data.table
: the name of the table into which you're loading data.path_to_data_file
: the location of the CSV or JSON data file on your local machine or in Cloud Storage.path_to_schema_file
: the path to the schema file on your local machine.
Example:
Enter the following command to load data from a local CSV file named
myfile.csv
into mydataset.mytable
in your default project. The schema is
specified in myschema.json
in the current directory.
bq load --source_format=CSV mydataset.mytable ./myfile.csv ./myschema.json
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To load a table schema from a JSON file using the Python client library, call the schema_from_json method.Specifying a schema file when you create a table
To create an empty table in an existing dataset using a JSON schema file, do the following:
bq
bq mk --table project_id:dataset.table path_to_schema_file
Replace the following:
project_id
: your project ID.dataset
: a dataset in your project.table
: the name of the table you're creating.path_to_schema_file
: the path to the schema file on your local machine.
For example, the following command creates a table named mytable
in
mydataset
in your default project. The schema is specified in myschema.json
in the current directory:
bq mk --table mydataset.mytable ./myschema.json
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To load a table schema from a JSON file using the Python client library, call the schema_from_json method.Specifying a schema in the API
Specify a table schema using the API:
To specify a schema when you load data, call the
jobs.insert
method and configure theschema
property in theJobConfigurationLoad
resource.To specify a schema when you create a table, call the
tables.insert
method and configure theschema
property in theTable
resource.
Specifying a schema using the API is similar to the process for Creating a JSON schema file.
Table security
To control access to tables in BigQuery, see Introduction to table access controls.
What's next
- Learn how to specify nested and repeated columns in a schema definition.
- Learn about schema auto-detection.
- Learn about loading data into BigQuery.
- Learn about creating and using tables.