Specifying Nested and Repeated Fields

BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated fields. Nested and repeated fields can maintain relationships without the performance impact of preserving a relational (normalized) schema.

You can specify nested or nested and repeated data in the UI or a JSON schema file. To specify nested or nested and repeated data, you use the RECORD (STRUCT) data type.

Specifying nested and repeated fields

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, and Cloud Datastore backup files.

For example, a relational database used to track library books would likely keep all author information in a separate table. A key such as author_id would be used to link the book to the authors.

In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author field, and you nest fields within it. If a book has multiple authors, you can make the nested author field repeated.

To create a field with nested values, set the data type of the parent field to RECORD. A RECORD is stored as a STRUCT and can be accessed as a STRUCT in standard SQL. A STRUCT is a container of ordered fields each with a type (required) and field name (optional). To add nested values, you create child fields that are stored as an array of fields within the RECORD (parent).

Example

The following example shows sample nested/repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the address field contains an array of values (indicated by [ ]).

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table would look like the following:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

Specifying nested and repeated fields in the UI

To specify the nested and repeated addresses field in the BigQuery web UI:

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your dataset name in the navigation and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create table page:

    • For Source Data, click Create from source.
    • For Destination Table, choose your dataset and enter the table name in the Destination table name field.
    • For Schema:

      • In the Name field, type addresses.
      • For Type, select RECORD.
      • For Mode, choose REPEATED.
      • To the right of RECORD, click the plus icon plus icon to add a nested field.
      • In the Name field, type status. Notice the field is pre-populated with addresses. indicating it's a nested field.
      • For Type, choose STRING.
      • For Mode, leave the value set to NULLABLE.
      • Repeat these steps to add address (NULLABLE STRING), city (NULLABLE STRING), state (NULLABLE STRING), zip (NULLABLE STRING), and numberOfYears (NULLABLE STRING).

        Nested field in the UI

    • Alternatively, click Edit as Text and specify the schema as a JSON array.

Viewing the schema in the UI

When you examine the schema in the BigQuery web UI, the address field looks like the following:

Nested field on the Schema tab

Specifying nested and repeated fields in a JSON schema file

To specify the nested author field in a JSON schema file, enter the following using a text editor:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

After creating your JSON schema file, you can supply it via the command line.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...