Specify nested and repeated columns in table schemas

This page describes how to define a table schema with nested and repeated columns in BigQuery. For an overview of table schemas, see Specifying a schema.

Define nested and repeated columns

To create a column with nested data, set the data type of the column to RECORD in the schema. A RECORD can be accessed as a STRUCT type in standard SQL. A STRUCT is a container of ordered fields.

To create a column with repeated data, set the mode of the column to REPEATED in the schema. A repeated field can be accessed as an ARRAY type in standard SQL.

A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY. An array cannot contain another array directly. For more information, see Declaring an ARRAY type.

Limitations

Nested and repeated schemas are subject to the following limitations:

A schema cannot contain more than 15 levels of nested RECORD types.
Columns of type RECORD can contain nested RECORD types, also called child records. The maximum nested depth limit is 15 levels. This limit is independent of whether the RECORDs are scalar or array-based (repeated).

Example schema

The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the addresses column contains an array of values (indicated by [ ]). The multiple addresses in the array are the repeated data. The multiple fields within each address are the nested data.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table looks like the following:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

Specifying the nested and repeated columns in the example

To create a new table with the previous nested and repeated columns, select one of the following options:

Console

To specify the nested and repeated addresses column in the Google Cloud console:

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select a dataset.

  3. In the details panel, click Create table.

  4. On the Create table page, specify the following details:

    • For Source, in the Create table from field, select Empty table.
    • In the Destination section, specify the following fields:

      • For Dataset, select the dataset in which you want to create the table.
      • For Table, enter the name of the table that you want to create.
    • For Schema, click Add field and enter the following table schema:

      • For Field name, enter addresses.
      • For Type, select RECORD.
      • For Mode, choose REPEATED.

        Addresses schema

      • Specify the following fields for a nested field:

        • In the Field name field, enter status.
        • For Type, choose STRING.
        • For Mode, leave the value set to NULLABLE.
        • Click Add field to add the following fields:

          Field name Type Mode
          address STRING NULLABLE
          city STRING NULLABLE
          state STRING NULLABLE
          zip STRING NULLABLE
          numberOfYears STRING NULLABLE

        Alternatively, click Edit as text and specify the schema as a JSON array.

SQL

Use the CREATE TABLE statement. Specify the schema using the column option:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE TABLE IF NOT EXISTS mydataset.mytable (
      id STRING,
      first_name STRING,
      last_name STRING,
      dob DATE,
      addresses
        ARRAY<
          STRUCT<
            status STRING,
            address STRING,
            city STRING,
            state STRING,
            zip STRING,
            numberOfYears STRING>>
    ) OPTIONS (
        description = 'Example name and addresses table');
    

  3. Click Run.

For more information about how to run queries, see Running interactive queries.

bq

To specify the nested and repeated addresses column in a JSON schema file, use a text editor to create a new file. Paste in the example schema definition shown above.

After you create your JSON schema file, you can provide it through the bq command-line tool. For more information, see Using a JSON schema file.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
)

// createTableComplexSchema demonstrates creating a BigQuery table and specifying a complex schema that includes
// an array of Struct types.
func createTableComplexSchema(w io.Writer, projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "id", Type: bigquery.StringFieldType},
		{Name: "first_name", Type: bigquery.StringFieldType},
		{Name: "last_name", Type: bigquery.StringFieldType},
		{Name: "dob", Type: bigquery.DateFieldType},
		{Name: "addresses",
			Type:     bigquery.RecordFieldType,
			Repeated: true,
			Schema: bigquery.Schema{
				{Name: "status", Type: bigquery.StringFieldType},
				{Name: "address", Type: bigquery.StringFieldType},
				{Name: "city", Type: bigquery.StringFieldType},
				{Name: "state", Type: bigquery.StringFieldType},
				{Name: "zip", Type: bigquery.StringFieldType},
				{Name: "numberOfYears", Type: bigquery.StringFieldType},
			}},
	}

	metaData := &bigquery.TableMetadata{
		Schema: sampleSchema,
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metaData); err != nil {
		return err
	}
	fmt.Fprintf(w, "created table %s\n", tableRef.FullyQualifiedName())
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Field.Mode;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class NestedRepeatedSchema {

  public static void runNestedRepeatedSchema() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    createTableWithNestedRepeatedSchema(datasetName, tableName);
  }

  public static void createTableWithNestedRepeatedSchema(String datasetName, String tableName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      Schema schema =
          Schema.of(
              Field.of("id", StandardSQLTypeName.STRING),
              Field.of("first_name", StandardSQLTypeName.STRING),
              Field.of("last_name", StandardSQLTypeName.STRING),
              Field.of("dob", StandardSQLTypeName.DATE),
              // create the nested and repeated field
              Field.newBuilder(
                      "addresses",
                      StandardSQLTypeName.STRUCT,
                      Field.of("status", StandardSQLTypeName.STRING),
                      Field.of("address", StandardSQLTypeName.STRING),
                      Field.of("city", StandardSQLTypeName.STRING),
                      Field.of("state", StandardSQLTypeName.STRING),
                      Field.of("zip", StandardSQLTypeName.STRING),
                      Field.of("numberOfYears", StandardSQLTypeName.STRING))
                  .setMode(Mode.REPEATED)
                  .build());

      TableDefinition tableDefinition = StandardTableDefinition.of(schema);
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Table with nested and repeated schema created successfully");
    } catch (BigQueryException e) {
      System.out.println("Table was not created. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function nestedRepeatedSchema() {
  // Creates a new table named "my_table" in "my_dataset"
  // with nested and repeated columns in schema.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";
  // const schema = [
  //   {name: 'Name', type: 'STRING', mode: 'REQUIRED'},
  //   {
  //     name: 'Addresses',
  //     type: 'RECORD',
  //     mode: 'REPEATED',
  //     fields: [
  //       {name: 'Address', type: 'STRING'},
  //       {name: 'City', type: 'STRING'},
  //       {name: 'State', type: 'STRING'},
  //       {name: 'Zip', type: 'STRING'},
  //     ],
  //   },
  // ];

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    schema: schema,
    location: 'US',
  };

  // Create a new table in the dataset
  const [table] = await bigquery
    .dataset(datasetId)
    .createTable(tableId, options);

  console.log(`Table ${table.id} created.`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# project = client.project
# dataset_ref = bigquery.DatasetReference(project, 'my_dataset')

schema = [
    bigquery.SchemaField("id", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("first_name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("last_name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("dob", "DATE", mode="NULLABLE"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("status", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("address", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("city", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("state", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("zip", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("numberOfYears", "STRING", mode="NULLABLE"),
        ],
    ),
]
table_ref = dataset_ref.table("my_table")
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

print("Created table {}".format(table.full_table_id))

Modify nested and repeated columns

After you add a nested column or a nested and repeated column to a table's schema definition, you can modify the column as you would any other type of column. BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode. For more information, see Modifying table schemas.

Additionally, you can manually modify a schema definition that includes nested and repeated columns. For more information, see Manually changing table schemas.

When to use nested and repeated columns

BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema.

For example, a relational database used to track library books would likely keep all author information in a separate table. A key such as author_id would be used to link the book to the authors.

In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author's first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON files, Avro files, Firestore export files, and Datastore export files.

Table security

To control access to tables in BigQuery, see Introduction to table access controls.

Next steps