Updating Partitioned Table Data Using DML Statements

This page provides an overview of Data Manipulation Language (DML) support for partitioned tables.

For more information on DML, see:

Tables used in examples

The following JSON schema definitions represent the tables used in the examples on this page.

mytable — an ingestion-time partitioned table

    [
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
    ]

mytable2 — a standard (non-partitioned) table

    [
      {"name": "id", "type": "INTEGER"},
      {"name": "ts", "type": "TIMESTAMP"}
    ]

mycolumntable — a partitioned table that is partitioned by using the ts TIMESTAMP column

    [
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
      {"name": "field3", "type": "BOOLEAN"}
      {"name": "ts", "type": "TIMESTAMP"}
    ]

Inserting data

You use a DML INSERT statement to add rows to a partitioned table.

Inserting data into ingestion-time partitioned tables

When you use a DML statement to add rows to an ingestion-time partitioned table, you can specify the partition to which the rows should be added. You reference the partition using the _PARTITIONTIME pseudo column.

For example, the following INSERT statement adds a row to the May 1, 2017 partition of mytable“2017-05-01”.

INSERT INTO PROJECT_ID.DATASET.mytable (_PARTITIONTIME, field1, field2) AS SELECT TIMESTAMP(“2017-05-01”), 1, “one”

Only timestamps that correspond to exact date boundaries can be used. For example, the following DML statement returns an error:

INSERT INTO PROJECT_ID.DATASET.mytable (_PARTITIONTIME, field1, field2) AS SELECT TIMESTAMP(“2017-05-01 21:30:00”), 1, “one”

Inserting data into partitioned tables

Inserting data into a partitioned table using DML is the same as inserting data into a non-partitioned table.

For example, the following INSERT statement adds rows to partitioned table mycolumntable by selecting data from mytable2 (a non-partitioned table).

INSERT INTO PROJECT_ID.DATASET.mycolumntable (ts, field1) AS SELECT ts, id from PROJECT_ID.DATASET.mytable2

Deleting data

You use a DML DELETE statement to delete rows from a partitioned table.

Deleting data in ingestion-time partitioned tables

The following DELETE statement deletes all rows from the June 1, 2017 partition (“2017-06-01”) or mytable where field1 is equal to 21. You reference the partition using the _PARTITIONTIME pseudo column.

DELETE PROJECT_ID.DATASET.mytable WHERE field1 = 21 AND _PARTITIONTIME = “2017-05-01”

Deleting data in partitioned tables

Deleting data in a partitioned table using DML is the same as deleting data from a non-partitioned table.

For example, the following DELETE statement deletes all rows from the June 1, 2017 partition (“2017-06-01”) or mycolumntable where field1 is equal to 21.

DELETE PROJECT_ID.DATASET.mycolumntable WHERE field1 = 21 AND DATE(ts) = “2017-05-01”

Updating data

You use an UPDATE statement to update rows in a partitioned table.

Updating data in ingestion-time partitioned tables

The following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition (“2017-05-01”) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition (“2017-06-01”).

UPDATE PROJECT_ID.DATASET.mytable SET _PARTITIONTIME = “2017-06-01”  WHERE _PARTITIONTIME = “2017-05-01” AND field1 = 21

Updating data in partitioned tables

Updating data in a partitioned table using DML is the same as updating data from a non-partitioned table. For example, the following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition (“2017-05-01”) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition (“2017-06-01”).

UPDATE PROJECT_ID.DATASET.mycolumntable SET ts = “2017-06-01”  WHERE DATE(ts) = “2017-05-01” AND field1 = 21

Using a MERGE statement

You use a DML MERGE statement to combine INSERT, UPDATE, and DELETE operations for a partitioned table into one statement and perform them atomically.

Pruning partitions when using a MERGE statement

When you run a MERGE statement against a partitioned table, you can limit the partitions involved in the statement by using the _PARTITIONTIME pseudo column (for ingestion-time partitioned tables) or by using the date or timestamp column (for partitioned tables). Pruning partitions reduces cost and improves query performance.

You can use partition pruning conditions in the following places: in a subquery filter, a search_condition filter, or a merge_condition filter.

Each of the examples below queries an ingestion-time partitioned table using the _PARTITIONTIME pseudo column.

Using a subquery to filter source data

You can use a filter in a subquery to prune partitions. For example, in the following MERGE statement, only the rows in the '2018-01-01' partition in the source table are scanned.

MERGE dataset.target T
USING (SELECT * FROM dataset.source WHERE _PARTITIONTIME = '2018-01-01') S
ON T.c1 = S.c1
WHEN MATCHED THEN
  DELETE

Using a filter in the search_condition of a when_clause

The query optimizer attempts to use a filter in a search_condition to prune partitions. For example, in the following MERGE statement, only the rows in the following partitions are scanned in the target table: '2018-01-01', '2018-01-02', and '2018-01-03'.

MERGE dataset.target T
USING dataset.source S
ON T.c1 = S.c1
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-01' THEN
  UPDATE SET c1 = S.c1
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-02' THEN
  UPDATE SET c1 = c1 + 10
WHEN NOT MATCHED BY SOURCE AND T._PARTITIONTIME = '2018-01-03' THEN
  DELETE

In the following example, the WHEN NOT MATCHED BY SOURCE clause needs all data from the target table. As a result, all partitions are scanned, and you are charged for the bytes read in all partitions.

MERGE dataset.target T
USING dataset.source S
ON T.c1 = S.c1
WHEN MATCHED AND T._PARTITIONTIME = '2018-01-01' THEN
  UPDATE SET c1 = S.c1
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET c1 = c1 + 1

In general, when you use the WHEN NOT MATCHED and WHEN NOT MATCHED BY SOURCE clauses together, BigQuery assumes a FULL OUTER JOIN between the source and target tables. Normally, partitions cannot be pruned in a FULL OUTER JOIN. However, if a constant false predicate is used, the filter condition can be used for partition pruning. The following query uses partition pruning to scan only the '2018-01-01' partition in both the target and source tables.

MERGE dataset.target T
USING dataset.source S
ON FALSE
WHEN NOT MATCHED AND _PARTITIONTIME = '2018-01-01' THEN
  INSERT(c1) VALUES(c1)
WHEN NOT MATCHED BY SOURCE AND _PARTITIONTIME = '2018-01-01' THEN
  DELETE

Using a filter in a merge_condition

The query optimizer attempts to use a filter in a merge_condition to prune partitions. For example, following query will only scan the '2018-01-01' partition in both the target and source tables.

MERGE dataset.target T
USING dataset.source S
ON T.c1 = S.c1 AND
  T._PARTITIONTIME = '2018-01-01' AND
  S._PARTITIONTIME = '2018-01-01'
WHEN MATCHED THEN
  UPDATE SET c1 = S.c1

In this example, the merge_condition is used as a predicate to join the source and target tables. The query optimizer may or may not be able to use predicate push down (depending on the type of join).

In the following example, the MERGE statement does not allow partition pruning because the partition filter is a predicate in a join condition that cannot be applied directly on the table.

MERGE dataset.target T
USING dataset.source S
ON T.c1 = S.c1 AND T._PARTITIONTIME = '2018-01-01'
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET c1 = S.c1

Limitations

For information on DML limitations, see Limitations on the Data Manipulation Language page.

Quotas

For information on DML quota information, see DML statements on the Quotas and Limits page.

Pricing

For information on DML pricing, see DML pricing for partitioned tables.

Next steps

Was this page helpful? Let us know how we did:

Send feedback about...

Need help? Visit our support page.