The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables.
You can execute DML statements just as you would a SELECT
statement, with the
following conditions:
- You must use standard SQL. To enable standard SQL, see Switching SQL dialects.
You cannot specify a destination table. For example, in the Cloud Console you must have Destination Table set to No table selected.
Limitations
Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement. There is no support for multi-statement transactions.
Rows that were written to a table recently by using streaming (the
tabledata.insertall
method) cannot be modified withUPDATE
,DELETE
, orMERGE
statements. Recent writes are typically those that occur within the last 30 minutes. Note that all other rows in the table remain modifiable by usingUPDATE
,DELETE
, orMERGE
statements.Correlated subqueries within a
when_clause
,search_condition
,merge_update_clause
ormerge_insert_clause
are not supported forMERGE
statements.Queries that contain DML statements cannot use a wildcard table as the target of the query. For example, a wildcard table can be used in the
FROM
clause of anUPDATE
query, but a wildcard table cannot be used as the target of theUPDATE
operation.DML statements are subject to the rate limits on table metadata update operations. For more information, see Data Manipulation Language statements in the Quotas and limits document.
Concurrent jobs
BigQuery manages the concurrency of DML statements that add or modify or delete rows in a table.
INSERT DML concurrency
During any 24 hour period, the first 1000 statements that INSERT
into a table
run concurrently. After this limit is reached, the concurrency of INSERT
statements that write to a table is limited to 10. Any INSERT
DML jobs beyond
10 are queued in PENDING state. After a previous job
finishes, the next PENDING job is dequeued and run. Currently, up to
100 INSERT
DML statements can be queued against a table at any given
time.
UPDATE, DELETE, MERGE DML concurrency
We refer to UPDATE
, DELETE
, and MERGE
DML statements as mutating DML
statements. If you submit one or more mutating DML statements on a table while
other mutating DML jobs on it are still running (or pending),
BigQuery will run them concurrently, up to a fixed number of
concurrent mutating DML statements. If you reach the concurrency limit,
BigQuery automatically queues any additional mutating DML jobs in
a PENDING state. When a previously running job finishes, the next PENDING job is
dequeued and run.
INTERACTIVE priority DML jobs that are queued for more than 6 hours fail.
DML statement conflicts
Concurrently running mutating DML statements on a table might fail due to conflicts in the changes they make. BigQuery retries these failures.
An
INSERT
DML statement that inserts rows to a table doesn't conflict with any other concurrently running DML statement.MERGE
DML statements that contain only anINSERT
clause with noUPDATE
orDELETE
clauses don't conflict with any other concurrently running DML statement.MERGE
DML statements withUPDATE
orDELETE
clauses don't conflict with any other concurrently running DML statement as long theMERGE
statement does not lead to update or deletion of existing rows.
Pricing
For information on DML pricing, see Data Manipulation Language pricing on the Pricing page.
Best practices
For best performance, we recommend the following patterns:
Avoid submitting large numbers of individual row updates or insertions. Instead, group DML operations together when possible. For more information, see DML statements that update or insert single rows.
If updates or deletions generally happen on older data, or within a particular range of dates, consider partitioning your tables. Partitioning ensures that the changes are limited to specific partitions within the table.
Avoid partitioning tables if the amount of data in each partition is small and each update modifies a large fraction of the partitions.
If you often update rows where one or more columns fall within a narrow range of values, consider using clustered tables. Clustering ensures that changes are limited to specific sets of blocks, reducing the amount of data that needs to be read and written. The following is an example of an
UPDATE
statement that filters on a range of column values:UPDATE s = "some string" WHERE id BETWEEN 54 AND 75
Here is a similar example that filters on a small list of column values:
UPDATE s = "some string" WHERE id IN (54, 57, 60)
Consider clustering on the
id
column in these cases.If you need OLTP functionality, consider using Cloud SQL federated queries, which enable BigQuery to query data that resides in Cloud SQL.
What's next
- See DML syntax and samples on the DML syntax page.