Updating partitioned table data using DML

This page provides an overview of data manipulation language (DML) support for partitioned tables.

For more information on DML, see:

Tables used in examples

The following JSON schema definitions represent the tables used in the examples on this page.

mytable: an ingestion-time partitioned table

    [
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
    ]

mytable2: a standard (non-partitioned) table

    [
      {"name": "id", "type": "INTEGER"},
      {"name": "ts", "type": "TIMESTAMP"}
    ]

mycolumntable: a partitioned table that is partitioned by using the ts TIMESTAMP column

    [
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
      {"name": "field3", "type": "BOOLEAN"}
      {"name": "ts", "type": "TIMESTAMP"}
    ]

In examples where COLUMN_ID appears, replace it with the name of the column you wish to operate on.

Inserting data

You use a DML INSERT statement to add rows to a partitioned table.

Inserting data into ingestion-time partitioned tables

When you use a DML statement to add rows to an ingestion-time partitioned table, you can specify the partition to which the rows should be added. You reference the partition using the _PARTITIONTIME pseudocolumn.

For example, the following INSERT statement adds a row to the May 1, 2017 partition of mytable — “2017-05-01”.

INSERT INTO
  project_id.dataset.mytable (_PARTITIONTIME,
    field1,
    field2)
SELECT
  TIMESTAMP("2017-05-01"),
  1,
  "one"

Only timestamps that correspond to exact date boundaries can be used. For example, the following DML statement returns an error:

INSERT INTO
  project_id.dataset.mytable (_PARTITIONTIME,
    field1,
    field2)
SELECT
  TIMESTAMP("2017-05-01 21:30:00"),
  1,
  "one"

Inserting data into partitioned tables

Inserting data into a partitioned table using DML is the same as inserting data into a non-partitioned table.

For example, the following INSERT statement adds rows to partitioned table mycolumntable by selecting data from mytable2 (a non-partitioned table).

INSERT INTO
  project_id.dataset.mycolumntable (ts,
    field1)
SELECT
  ts,
  id
FROM
  project_id.dataset.mytable2

Deleting data

You use a DML DELETE statement to delete rows from a partitioned table.

Deleting data in ingestion-time partitioned tables

The following DELETE statement deletes all rows from the June 1, 2017 partition ("2017-06-01") of mytable where field1 is equal to 21. You reference the partition using the _PARTITIONTIME pseudocolumn.

DELETE
  project_id.dataset.mytable
WHERE
  field1 = 21
  AND _PARTITIONTIME = "2017-06-01"

Deleting data in partitioned tables

Deleting data in a partitioned table using DML is the same as deleting data from a non-partitioned table.

For example, the following DELETE statement deletes all rows from the June 1, 2017 partition ("2017-06-01") of mycolumntable where field1 is equal to 21.

DELETE
  project_id.dataset.mycolumntable
WHERE
  field1 = 21
  AND DATE(ts) = "2017-06-01"

Using DML DELETE to delete partitions

If a qualifying DELETE statement covers all rows in a partition, BigQuery removes the entire partition. This removal is done without scanning bytes or consuming slots. The following example of a DELETE statement covers the entire partition of a filter on the _PARTITIONDATE pseudocolumn:

DELETE mydataset.mytable
WHERE _PARTITIONDATE IN ('2076-10-07', '2076-03-06');

Common disqualifications

Queries with the following characteristics may not benefit from the optimization:

Partial partition coverage
References to non-partitioning columns
Recently ingested data through the BigQuery Storage Write API or the legacy streaming API
Filters with subqueries or unsupported predicates

Eligibility for optimization can vary with the type of partitioning, the underlying storage metadata, and the filter predicates. As a best practice, perform a dry run to verify that the query results in 0 bytes processed.

Multi-statement transaction

This optimization works within a multi-statement transaction. The following query example replaces a partition with data from another table in a single transaction, without scanning the partition for the DELETE statement.

DECLARE REPLACE_DAY DATE;
BEGIN TRANSACTION;

-- find the partition which we want to replace
SET REPLACE_DAY = (SELECT MAX(d) FROM mydataset.mytable_staging);

-- delete the entire partition from mytable
DELETE FROM mydataset.mytable WHERE part_col = REPLACE_DAY;

-- insert the new data into the same partition in mytable
INSERT INTO mydataset.mytable
SELECT * FROM mydataset.mytable_staging WHERE part_col = REPLACE_DAY;

COMMIT TRANSACTION;

Updating data

You use an UPDATE statement to update rows in a partitioned table.

Updating data in ingestion-time partitioned tables

The following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition (“2017-05-01”) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition (“2017-06-01”).

UPDATE
  project_id.dataset.mytable
SET
  _PARTITIONTIME = "2017-06-01"
WHERE
  _PARTITIONTIME = "2017-05-01"
  AND field1 = 21

Updating data in partitioned tables

Updating data in a partitioned table using DML is the same as updating data from a non-partitioned table. For example, the following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition (“2017-05-01”) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition (“2017-06-01”).

UPDATE
  project_id.dataset.mycolumntable
SET
  ts = "2017-06-01"
WHERE
  DATE(ts) = "2017-05-01"
  AND field1 = 21

DML in hourly, monthly, and yearly partitioned tables

You can use DML statements to modify an hourly, monthly, or yearly partitioned table. Provide the hour, month, or year range of the relevant dates/timestamps/datetimes, as in the following example for monthly partitioned tables:

    bq query --nouse_legacy_sql 'DELETE FROM my_dataset.my_table WHERE
    TIMESTAMP_TRUNC(ts_column, MONTH) = "2020-01-01 00:00:00";'

Or another example for partitioned tables with DATETIME column:

    bq query --nouse_legacy_sql 'DELETE FROM my_dataset.my_table WHERE
    dt_column BETWEEN DATETIME("2020-01-01")
    AND DATETIME("2020-05-01");'

Using a `MERGE` statement

You use a DML MERGE statement to combine INSERT, UPDATE, and DELETE operations for a partitioned table into one statement and perform them atomically.

Pruning partitions when using a `MERGE` statement

When you run a MERGE statement against a partitioned table, you can limit which partitions are scanned by including the partitioning column in either a subquery filter, a search_condition filter, or a merge_condition filter. Pruning can occur when scanning the source table or the target table, or both.

Each of the examples below queries an ingestion-time partitioned table using the _PARTITIONTIME pseudocolumn as a filter.

Using a subquery to filter source data

In the following MERGE statement, the subquery in the USING clause filters on the _PARTITIONTIME pseudocolumn in the source table.

MERGE dataset.target T
USING (SELECT * FROM dataset.source WHERE _PARTITIONTIME = '2018-01-01') S
ON T.COLUMN_ID = S.COLUMN_ID
WHEN MATCHED THEN
  DELETE

Looking at the query execution plan, the subquery runs first. Only the rows in the '2018-01-01' partition in the source table are scanned. Here is the relevant stage in the query plan:

READ $10:name, $11:_PARTITIONTIME
FROM temp.source
WHERE equal($11, 1514764800.000000000)

Using a filter in the `search_condition` of a `when_clause`

If a search_condition contains a filter, then the query optimizer attempts to prune partitions. For example, in the following MERGE statement, each WHEN MATCHED and WHEN NOT MATCHED clause contains a filter on the _PARTITIONTIME pseudocolumn.

MERGE dataset.target T
USING dataset.source S
ON T.COLUMN_ID = S.COLUMN_ID
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-01' THEN
  UPDATE SET COLUMN_ID = S.COLUMN_ID
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-02' THEN
  UPDATE SET COLUMN_ID = S.COLUMN_ID + 10
WHEN NOT MATCHED BY SOURCE AND T._PARTITIONTIME = '2018-01-03' THEN
  DELETE

During the join stage, only the following partitions are scanned in the target table: '2018-01-01', '2018-01-02', and '2018-01-03' — that is, the union of all the search_condition filters.

From the query execution plan:

READ
$1:COLUMN_ID, $2:_PARTITIONTIME, $3:$file_temp_id, $4:$row_temp_id
FROM temp.target
WHERE or(equal($2, 1514764800.000000000), equal($2, 1514851200.000000000), equal($2, 1514937600.000000000))

However, in the following example, the WHEN NOT MATCHED BY SOURCE clause does not have a filter expression:

MERGE dataset.target T
USING dataset.source S
ON T.COLUMN_ID = S.COLUMN_ID
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-01' THEN
  UPDATE SET COLUMN_ID = S.COLUMN_ID
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET COLUMN_ID = COLUMN_ID + 1

This query must scan the entire target table to compute the WHEN NOT MATCHED BY SOURCE clause. As a result, no partitions are pruned.

Using a constant false predicate in a `merge_condition`

If you use the WHEN NOT MATCHED and WHEN NOT MATCHED BY SOURCE clauses together, then BigQuery usually performs a full outer join, which cannot be pruned. However, if the merge condition uses a constant false predicate, then BigQuery can use the filter condition for partition pruning. For more information about the use of constant false predicates, see the description of the merge_condition clause in the MERGE statement documentation.

The following example scans only the '2018-01-01' partition in both the target and source tables.

MERGE dataset.target T
USING dataset.source S
ON FALSE
WHEN NOT MATCHED AND _PARTITIONTIME = '2018-01-01' THEN
  INSERT(COLUMN_ID) VALUES(COLUMN_ID)
WHEN NOT MATCHED BY SOURCE AND _PARTITIONTIME = '2018-01-01' THEN
  DELETE

Using a filter in a `merge_condition`

The query optimizer attempts to use a filter in a merge_condition to prune partitions. The query optimizer might or might not be able to push the predicate down to the table scanning stage, depending on the type of join.

In the following example, the merge_condition is used as a predicate to join the source and target tables. The query optimizer can push this predicate down when it scans both tables. As a result, the query only scans the '2018-01-01' partition in both the target and source tables.

MERGE dataset.target T
USING dataset.source S
ON T.COLUMN_ID = S.COLUMN_ID AND
  T._PARTITIONTIME = '2018-01-01' AND
  S._PARTITIONTIME = '2018-01-01'
WHEN MATCHED THEN
  UPDATE SET COLUMN_ID = NEW_VALUE

In the next example, the merge_condition does not contain a predicate for the source table, so no partition pruning can be performed on the source table. The statement does contain a predicate for the target table, but the statement uses a WHEN NOT MATCHED BY SOURCE clause, rather than a WHEN MATCHED clause. That means the query has to scan the entire target table for the rows that don't match.

MERGE dataset.target T
USING dataset.source S
ON T.COLUMN_ID = S.COLUMN_ID AND T._PARTITIONTIME = '2018-01-01'
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET COLUMN_ID = NEW_VALUE

Limitations

For information about DML limitations, see Limitations on the DML reference page.

Quotas

For information about DML quota information, see DML statements on the Quotas and limits page.

Pricing

For information about DML pricing, see DML pricing for partitioned tables.

Table security

To control access to tables in BigQuery, see Control access to resources with IAM.

What's next

Learn how to create partitioned tables
Learn how to query partitioned tables
Get an introduction to DML
Learn how to compose DML statements using DML syntax