Specifying nested and repeated columns

BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema.

You can specify nested or nested and repeated data in the UI or a JSON schema file. To specify nested or nested and repeated columns, you use the RECORD (STRUCT) data type.

Specifying nested and repeated columns

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON files, Avro files, Cloud Firestore export files, and Cloud Datastore export files.

For example, a relational database used to track library books would likely keep all author information in a separate table. A key such as author_id would be used to link the book to the authors.

In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author's first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.

To create a column with nested and repeated data, set the data type of the column to RECORD. A RECORD is stored as a STRUCT and can be accessed as a STRUCT in standard SQL. A STRUCT is a container of ordered fields each with a type (required) and a name (optional). To make the column nested, you add child fields to the RECORD (parent). To make the column repeated, change the mode to REPEATED.

Limitations

Nested and repeated schemas are subject to the following limitations:

  • When you load nested and repeated data, your schema cannot contain more than 15 levels of nested STRUCTs (RECORD types).
    • BigQuery supports columns of type STRUCT (or RECORD). A STRUCT is a complex type that can be used to represent an object that has multiple child columns. In a STRUCT column, you can also define one or more of the child columns as STRUCT types (referred to as nested or embedded STRUCTs). When you nest STRUCTS, BigQuery enforces a nested depth limit of 15 levels. The nested depth limit is independent of whether the STRUCTs are scalar or array-based.

Example

Example schema

The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the addresses column contains an array of values (indicated by [ ]). The multiple addresses in the array are the repeated data. The multiple fields within each address are the nested data.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table would look like the following:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

Specifying the nested and repeated columns in the example

Web UI

To specify the nested and repeated addresses column in the BigQuery web UI:

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your dataset name in the navigation and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create table page:

    • For Source Data, click Create from source.
    • For Destination Table, choose your dataset and enter the table name in the Destination table name field.
    • For Schema:

      • In the Name field, type addresses.
      • For Type, select RECORD.
      • For Mode, choose REPEATED.
      • To the right of RECORD, click the plus icon plus icon to add a nested field.

        • In the Name field, type status. Notice the field is pre-populated with addresses. indicating it's a nested field.
        • For Type, choose STRING.
        • For Mode, leave the value set to NULLABLE.
        • Repeat these steps to add address (NULLABLE STRING), city (NULLABLE STRING), state (NULLABLE STRING), zip (NULLABLE STRING), and numberOfYears (NULLABLE STRING).

        Nested field in the UI

    • Alternatively, click Edit as Text and specify the schema as a JSON array.

When you examine the schema in the BigQuery web UI, the addresses field looks like the following:

Nested field on the Schema tab

Command-line

To specify the nested and repeated addresses column in a JSON schema file, enter the following using a text editor:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

After creating your JSON schema file, you can supply it via the command line.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Go API reference documentation .

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client.  For example:
// import "cloud.google.com/go/bigquery"
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")
sampleSchema := bigquery.Schema{
	{Name: "id", Type: bigquery.StringFieldType},
	{Name: "first_name", Type: bigquery.StringFieldType},
	{Name: "last_name", Type: bigquery.StringFieldType},
	{Name: "dob", Type: bigquery.DateFieldType},
	{Name: "addresses",
		Type:     bigquery.RecordFieldType,
		Repeated: true,
		Schema: bigquery.Schema{
			{Name: "status", Type: bigquery.StringFieldType},
			{Name: "address", Type: bigquery.StringFieldType},
			{Name: "city", Type: bigquery.StringFieldType},
			{Name: "state", Type: bigquery.StringFieldType},
			{Name: "zip", Type: bigquery.StringFieldType},
			{Name: "numberOfYears", Type: bigquery.StringFieldType},
		}},
}

metaData := &bigquery.TableMetadata{
	Schema: sampleSchema,
}
tableRef := client.Dataset(datasetID).Table(tableID)
if err := tableRef.Create(ctx, metaData); err != nil {
	return err
}
fmt.Printf("created table %s\n", tableRef.FullyQualifiedName())

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

schema = [
    bigquery.SchemaField('id', 'STRING', mode='NULLABLE'),
    bigquery.SchemaField('first_name', 'STRING', mode='NULLABLE'),
    bigquery.SchemaField('last_name', 'STRING', mode='NULLABLE'),
    bigquery.SchemaField('dob', 'DATE', mode='NULLABLE'),
    bigquery.SchemaField('addresses', 'RECORD', mode='REPEATED', fields=[
        bigquery.SchemaField('status', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('address', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('city', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('state', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('zip', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('numberOfYears', 'STRING', mode='NULLABLE'),
    ]),
]
table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

print('Created table {}'.format(table.full_table_id))

Modifying nested and repeated columns

After you add a nested column or a nested and repeated column to a table's schema definition, you can modify the column as you would any other type of column. BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode. For more information, see Modifying Table Schemas.

Additionally, you can manually modify a schema definition that includes nested and repeated columns. For more information, see Manually Changing Table Schemas.

Was this page helpful? Let us know how we did:

Send feedback about...

Need help? Visit our support page.