Create table partitions and clusters

This document shows you how to use Dataform core to create table partitions and clusters.

BigQuery supports partitioned tables and table clustering. For more information about partitions and clusters in BigQuery, see Introduction to partitioned tables and Creating and using clustered tables.

Before you begin

  1. In the Cloud Console, go to the Dataform page.

    Go to the Dataform page

  2. Select or create a repository.

  3. Select or create a development workspace.

  4. Create a table.

Required roles

To get the permissions that you need to configure table partitions and clusters, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a table partition

To create a table partition, add a BigQuery partition_expression to the bigquery block in a table definition SQLX file.

To create a table partition, follow these steps:

  1. Go to your development workspace.
  2. In the Files pane, expand definitions/.
  3. Open a table definition SQLX file.
  4. In the config block, add the bigquery block below the table type declaration in the following format:

    config {
      type: "table",
      bigquery: {
      }
    }
    
  5. In the bigquery block, enter the following code snippet:

        partitionBy: "PARTITION_EXPRESSION"
    

    Replace PARTITION_EXPRESSION with an expression for partitioning the table.

  6. Optional: Click Format.

The following code sample shows partitioning a table by hour in a table definition SQLX file:

config {
  type: "table",
  bigquery: {
    partitionBy: "DATETIME_TRUNC(<timestamp_column>, HOUR)"
  }
}

The following code sample shows partitioning a table by an integer value in a table definition SQLX file:

config {
  type: "table",
  bigquery: {
    partitionBy: "RANGE_BUCKET(<integer_column>, GENERATE_ARRAY(0, 1000000, 1000))"
  }
}

Set a partition filter

For more information on the partition filter in BigQuery, see Setting the require partition filter attribute on a partitioned table.

To set a partition filter, follow these steps:

  1. Go to your development workspace.
  2. In the Files pane, expand definitions/.
  3. Open a partitioned table definition SQLX file.
  4. In the bigquery block, enter the following code snippet:

    requirePartitionFilter : true
    
  5. Optional: Click Format.

The following code sample shows a partition filter set in the bigquery block of a partitioned table SQLX file:

config {
  type: "table",
  bigquery: {
    partitionBy: "DATE(ts)",
    requirePartitionFilter : true
  }
}
SELECT CURRENT_TIMESTAMP() AS ts

Set a retention period for partitions

To control retention of all partitions in a partitioned table, follow these steps:

  1. Go to your development workspace.
  2. In the Files pane, expand definitions/.
  3. Open a partitioned table definition SQLX file.
  4. In the bigquery block, enter the following code snippet:

    partitionExpirationDays: NUMBER_OF_DAYS
    

    Replace NUMBER_OF_DAYS with the number of days that you want to retain partitions for.

  5. Optional: Click Format.

The following code sample shows a retention period for partitions set to 14 days in the bigquery block of a partitioned table SQLX file:

config {
  type: "table",
  bigquery: {
    partitionBy: "DATE(ts)",
    partitionExpirationDays: 14,
  }
}
SELECT CURRENT_TIMESTAMP() AS ts

Create a table cluster

To create a table cluster, add a BigQuery clustering_column_list to the bigquery block in a table definition SQLX file.

To create a table cluster, follow these steps:

  1. Go to your development workspace.
  2. In the Files pane, expand definitions/.
  3. Open a table definition SQLX file.
  4. In the bigquery block, enter the following code snippet:

        clusterBy: ["CLUSTER_COLUMN"]
    

    Replace CLUSTER_COLUMN with the name of the column by which you want to cluster the table.

  5. Optional: Click Format.

The following code sample shows a partitioned table clustered by name and revenue columns:

config {
  type: "table",
  bigquery: {
    partitionBy: "DATE(ts)",
    clusterBy: ["name", "revenue"]
  }
}
SELECT CURRENT_TIMESTAMP() as ts, name, revenue

What's next