Apply partition and cluster recommendations
This document shows how to apply partition and cluster recommendations to your BigQuery tables.
Limitations
The partitioning and clustering recommender does not support BigQuery tables that are queried with legacy SQL. When generating a recommendation, the recommender excludes any legacy SQL queries from its analysis. Additionally, applying partition recommendations to a table breaks any legacy SQL workflows that use that table.
Before you apply partition recommendations, migrate your legacy SQL workflows into GoogleSQL.
The partitioning and clustering recommender does not support resources that are stored in the following regions:
- europe-central2
- europe-west8
- europe-west9
- europe-west12
- europe-north1
- europe-southwest1
- us-east1
- us-east5
- us-south1
- me-central1
- me-central2
- me-west1
- australia-southeast2
- southamerica-west1
Before you begin
- Ensure that the Recommender API is enabled.
- Ensure that you have the necessary Identity and Access Management (IAM) permissions.
Apply cluster recommendations
You can apply cluster recommendations by applying clusters to a copy of the original table, applying them directly to the original table, or using materialized views.
Apply cluster recommendations to a copied table
When you apply cluster recommendations to a BigQuery table, you can first copy the original table and then apply the recommendation to the copied table. This method ensures that your original data is preserved if you need to roll back the change to the clustering configuration.
You can use this method to apply cluster recommendations to both unpartitioned and partitioned tables.
In the Google Cloud console, go to the BigQuery page.
In the query editor, create an empty table that has the same metadata (including the clustering specification) as the original table by using the LIKE operator:
CREATE TABLE DATASET.COPIED_TABLE LIKE DATASET.ORIGINAL_TABLE
Replace the following:
- DATASET: the name of the dataset that contains the table, for example mydataset
- COPIED_TABLE: a name for your copied table, for example copy_mytable
- ORIGINAL_TABLE: the name of your original table, for example mytable
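For example, using the sample names from this guide, the statement might look like the following:
CREATE TABLE mydataset.copy_mytable LIKE mydataset.mytable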
In the Google Cloud console, open the Cloud Shell Editor.
In the Cloud Shell Editor, update the clustering specification of the copied table to match the recommended clustering by using the bq update command:
bq update --clustering_fields=CLUSTER_COLUMN DATASET.COPIED_TABLE
Replace CLUSTER_COLUMN with the column that you are clustering on, for example mycolumn.
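For example, using the sample names from this guide, the command might look like the following:
bq update --clustering_fields=mycolumn mydataset.copy_mytable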
You can also call the tables.update or tables.patch API method to modify the clustering specification.
In the query editor, retrieve the table schema with the partitioning and clustering configuration of the original table, if any partitioning or clustering exists. You can retrieve the schema by viewing the INFORMATION_SCHEMA.TABLES view of the original table:
SELECT ddl FROM DATASET.INFORMATION_SCHEMA.TABLES WHERE table_name = 'ORIGINAL_TABLE'
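For example, using the sample names from this guide, the query might look like the following:
SELECT ddl FROM mydataset.INFORMATION_SCHEMA.TABLES WHERE table_name = 'mytable'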
The output is the full data definition language (DDL) statement of ORIGINAL_TABLE, including the PARTITION BY clause. For more information about the arguments in your DDL output, see the CREATE TABLE statement.
The DDL output indicates the type of partitioning in the original table:
- Not partitioned: the PARTITION BY clause is absent.
- Partitioned by a table column: the output contains a clause such as PARTITION BY c0, PARTITION BY DATE(c0), or PARTITION BY DATETIME_TRUNC(c0, MONTH).
- Partitioned by ingestion time: the output contains a clause such as PARTITION BY _PARTITIONDATE or PARTITION BY DATETIME_TRUNC(_PARTITIONTIME, MONTH).
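For illustration, the DDL output for a table that is partitioned by a table column might look similar to the following, where the project name, column names, and types are hypothetical:
CREATE TABLE `myproject.mydataset.mytable` (c0 TIMESTAMP, c1 STRING) PARTITION BY DATE(c0)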
Ingest data into the copied table. The process that you use is based on the partition type.
- If the original table is non-partitioned or partitioned by a table column, ingest the data from the original table into the copied table:
INSERT INTO DATASET.COPIED_TABLE SELECT * FROM DATASET.ORIGINAL_TABLE
- If the original table is partitioned by ingestion time, follow these steps:
Retrieve the list of columns to form the data ingestion expression by using the INFORMATION_SCHEMA.COLUMNS view:
SELECT ARRAY_TO_STRING(( SELECT ARRAY( SELECT column_name FROM DATASET.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'ORIGINAL_TABLE')), ", ")
The output is a comma-separated list of column names.
Ingest the data from the original table into the copied table:
INSERT DATASET.COPIED_TABLE (COLUMN_NAMES, _PARTITIONTIME) SELECT *, _PARTITIONTIME FROM DATASET.ORIGINAL_TABLE
Replace COLUMN_NAMES with the comma-separated list of columns from the preceding step, for example col1, col2, col3. For a filled-in example, see the statement after these steps.
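As a sketch, assuming the original table has exactly the columns col1, col2, and col3 from the example output, the ingestion statement might look like the following:
INSERT mydataset.copy_mytable (col1, col2, col3, _PARTITIONTIME) SELECT *, _PARTITIONTIME FROM mydataset.mytable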
You now have a clustered copy table with the same data as the original table. In the next steps, you replace your original table with a newly clustered table.
Rename the original table to a backup table:
ALTER TABLE DATASET.ORIGINAL_TABLE RENAME TO DATASET.BACKUP_TABLE
Replace BACKUP_TABLE with a name for your backup table, for example backup_mytable.
Rename the copied table to the original table name:
ALTER TABLE DATASET.COPIED_TABLE RENAME TO DATASET.ORIGINAL_TABLE
Your original table is now clustered according to the cluster recommendation.
Copying a table doesn't preserve the following properties and artifacts of the original table:
- Access and permissions, such as IAM permissions, row-level access, or column-level access.
- Table artifacts such as table clones, table snapshots, or search indexes.
- The status of any ongoing table processes, such as any materialized views or any jobs that ran when you copied the table.
- The ability to access historical table data using time travel.
- Any metadata associated with the original table, for example table_option_list or column_option_list. For more information, see Data definition language statements.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the clustered table, you can optionally delete the backup table with the following command:
DROP TABLE DATASET.BACKUP_TABLE
Apply cluster recommendations directly
You can apply cluster recommendations directly to an existing BigQuery table. This method is quicker than applying recommendations to a copied table, but it does not preserve a backup table.
Follow these steps to apply a new clustering specification to unpartitioned or partitioned tables.
In the bq tool, update the clustering specification of your table to match the new clustering:
bq update --clustering_fields=CLUSTER_COLUMN DATASET.ORIGINAL_TABLE
Replace the following:
- CLUSTER_COLUMN: the column that you are clustering on, for example mycolumn
- DATASET: the name of the dataset that contains the table, for example mydataset
- ORIGINAL_TABLE: the name of your original table, for example mytable
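For example, using the sample names from this guide, the command might look like the following:
bq update --clustering_fields=mycolumn mydataset.mytable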
You can also call the tables.update or tables.patch API method to modify the clustering specification.
To cluster all rows according to the new clustering specification, run the following UPDATE statement:
UPDATE DATASET.ORIGINAL_TABLE SET CLUSTER_COLUMN=CLUSTER_COLUMN WHERE true
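For example, using the sample names from this guide, the statement might look like the following:
UPDATE mydataset.mytable SET mycolumn = mycolumn WHERE true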
Apply cluster recommendations using materialized views
You can create a materialized view of the table to store data from the original table with the recommendation applied. Using materialized views to apply recommendations ensures that the clustered data is kept up to date through automatic refreshes. There are pricing considerations when you query, maintain, and store materialized views. To learn how to create a clustered materialized view, see Clustered materialized views.
Apply partition recommendations
You can apply partition recommendations by applying partitions to a copy of the original table.
Apply partition recommendations to a copied table
When you apply partition recommendations to a BigQuery table, you can first copy the original table and then apply the recommendation to the copied table. This approach ensures that your original data is preserved if you need to roll back a partition.
The following procedure uses an example recommendation to partition a table by the partition time unit DAY.
Create a copied table using the partition recommendations:
CREATE TABLE DATASET.COPIED_TABLE PARTITION BY DATE_TRUNC(PARTITION_COLUMN, DAY) AS SELECT * FROM DATASET.ORIGINAL_TABLE
Replace the following:
- DATASET: the name of the dataset that contains the table, for example mydataset
- COPIED_TABLE: a name for your copied table, for example copy_mytable
- PARTITION_COLUMN: the column that you are partitioning on, for example mycolumn
For more information about creating partitioned tables, see Creating partitioned tables.
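For example, using the sample names from this guide, and assuming that mycolumn is a DATE column, the statement might look like the following:
CREATE TABLE mydataset.copy_mytable PARTITION BY DATE_TRUNC(mycolumn, DAY) AS SELECT * FROM mydataset.mytable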
Rename the original table to a backup table:
ALTER TABLE DATASET.ORIGINAL_TABLE RENAME TO DATASET.BACKUP_TABLE
Replace BACKUP_TABLE with a name for your backup table, for example backup_mytable.
Rename the copied table to the original table name:
ALTER TABLE DATASET.COPIED_TABLE RENAME TO DATASET.ORIGINAL_TABLE
Your original table is now partitioned according to the partition recommendation.
Copying a table doesn't preserve the following properties and artifacts of the original table:
- Access and permissions, such as IAM permissions, row-level access, or column-level access.
- Table artifacts such as table clones, table snapshots, or search indexes.
- The status of any ongoing table processes, such as any materialized views or any jobs that ran when you copied the table.
- The ability to access historical table data using time travel.
- Any metadata associated with the original table, for example table_option_list or column_option_list. For more information, see Data definition language statements.
- The ability to use legacy SQL to write query results into partitioned tables. Legacy SQL is not fully supported for partitioned tables. One solution is to migrate your legacy SQL workflows to GoogleSQL before applying a partition recommendation.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the partitioned table, you can optionally delete the backup table with the following command:
DROP TABLE DATASET.BACKUP_TABLE
Pricing
When you apply a recommendation to a table, you can incur the following costs:
- Processing costs. When you apply a recommendation, you run a data definition language (DDL) or data manipulation language (DML) query in your BigQuery project.
- Storage costs. If you use the method of copying a table, you use extra storage for the copied (or backup) table.
Standard processing and storage charges apply depending on the billing account that's associated with the project. For more information, see BigQuery pricing.