BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema.
You can specify nested or nested and repeated data in the Cloud Console
or a JSON schema file. To specify nested or nested and repeated columns, you use
the RECORD
(STRUCT
) data type.
Specifying nested and repeated columns
BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON files, Avro files, Firestore export files, and Datastore export files.
For example, a relational database used to track library books would likely keep
all author information in a separate table. A key such as author_id
would be
used to link the book to the authors.
In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author's first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.
To create a column with nested and repeated data, set the data type of the
column to RECORD
. A RECORD
is stored as a STRUCT
and can be accessed as a STRUCT
in standard SQL. A STRUCT
is a container of
ordered fields each with a type (required) and a name (optional). To make the
column nested, you add child fields to the RECORD
(parent). To make the column
repeated, change the mode to REPEATED
.
Limitations
Nested and repeated schemas are subject to the following limitations:
- When you load nested and repeated data, your schema cannot contain more than 15 levels of nested
STRUCT
s (RECORD
types). - BigQuery supports columns of type
STRUCT
(orRECORD
). ASTRUCT
is a complex type that can be used to represent an object that has multiple child columns. In aSTRUCT
column, you can also define one or more of the child columns asSTRUCT
types (referred to as nested or embeddedSTRUCT
s). When you nestSTRUCTS
, BigQuery enforces a nested depth limit of 15 levels. The nested depth limit is independent of whether theSTRUCT
s are scalar or array-based.
Example
Example schema
The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:
id
first_name
last_name
dob
(date of birth)addresses
(a nested and repeated field)addresses.status
(current or previous)addresses.address
addresses.city
addresses.state
addresses.zip
addresses.numberOfYears
(years at the address)
The JSON data file would look like the following. Notice that the addresses
column contains an array of values (indicated by [ ]
). The multiple addresses
in the array are the repeated data. The multiple fields within each address are
the nested data.
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
The schema for this table would look like the following:
[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "addresses", "type": "RECORD", "mode": "REPEATED", "fields": [ { "name": "status", "type": "STRING", "mode": "NULLABLE" }, { "name": "address", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "state", "type": "STRING", "mode": "NULLABLE" }, { "name": "zip", "type": "STRING", "mode": "NULLABLE" }, { "name": "numberOfYears", "type": "STRING", "mode": "NULLABLE" } ] } ]
Specifying the nested and repeated columns in the example
Console
To specify the nested and repeated addresses
column in the
Cloud Console:
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and select a dataset.
In the details view, click Create table.
On the Create table page:
- For Source, select Empty table.
- For Destination, choose your dataset and enter the table name in the Table name field.
For Schema, add a field:
- In the Name field, type
addresses
. - For Type, select
RECORD
. For Mode, choose
REPEATED
.To the right of
addresses
, click the plus icon to add a nested field.- In the Name field, type
status
. Notice the field is pre-populated withaddresses.
indicating it's a nested field. - For Type, choose
STRING
. For Mode, leave the value set to
NULLABLE
.Repeat these steps to add
address
(NULLABLE
STRING
),city
(NULLABLE
STRING
),state
(NULLABLE
STRING
),zip
(NULLABLE
STRING
), andnumberOfYears
(NULLABLE
STRING
).
- In the Name field, type
- In the Name field, type
Alternatively, click Edit as text and specify the schema as a JSON array.
bq
To specify the nested and repeated addresses
column in a JSON schema file,
enter the following using a text editor:
[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "addresses", "type": "RECORD", "mode": "REPEATED", "fields": [ { "name": "status", "type": "STRING", "mode": "NULLABLE" }, { "name": "address", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "state", "type": "STRING", "mode": "NULLABLE" }, { "name": "zip", "type": "STRING", "mode": "NULLABLE" }, { "name": "numberOfYears", "type": "STRING", "mode": "NULLABLE" } ] } ]
After creating your JSON schema file, you can supply it via the command line.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Modifying nested and repeated columns
After you add a nested column or a nested and repeated column to a table's schema definition, you can modify the column as you would any other type of column. BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode. For more information, see Modifying table schemas.
Additionally, you can manually modify a schema definition that includes nested and repeated columns. For more information, see Manually changing table schemas.