Transform data with data manipulation language (DML)

The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables.

You can execute DML statements just as you would a SELECT statement, with the following conditions:

  • You must use GoogleSQL. To enable GoogleSQL, see Switching SQL dialects.
  • You cannot specify a destination table for the query.

For a list of BigQuery DML statements and examples of how to use them, see Data manipulation language statements in GoogleSQL.

Limitations

  • Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement.

  • Rows that were recently written using the tabledata.insertall streaming method can't be modified with data manipulation language (DML), such as UPDATE, DELETE, MERGE, or TRUNCATE statements. The recent writes are those that occurred within the last 30 minutes. All other rows in the table remain modifiable by using UPDATE, DELETE, MERGE, or TRUNCATE statements. The streamed data can take up to 90 minutes to become available for copy operations.

    Alternatively, rows that were recently written using the Storage Write API can be modified using UPDATE, DELETE, or MERGE statements (Preview). For more information, see Use data manipulation language (DML) with recently streamed data.

  • Correlated subqueries within a when_clause, search_condition, merge_update_clause or merge_insert_clause are not supported for MERGE statements.

  • Queries that contain DML statements cannot use a wildcard table as the target of the query. For example, a wildcard table can be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.

Concurrent jobs

BigQuery manages the concurrency of DML statements that add, modify, or delete rows in a table.

INSERT DML concurrency

During any 24 hour period, the first 1500 INSERT statements run immediately after they are submitted. After this limit is reached, the concurrency of INSERT statements that write to a table is limited to 10. Additional INSERT statements are added to a PENDING queue. Up to 100 INSERT statements can be queued against a table at any given time. When an INSERT statement completes, the next INSERT statement is removed from the queue and run.

UPDATE, DELETE, MERGE DML concurrency

The UPDATE, DELETE, and MERGE DML statements are called mutating DML statements. If you submit one or more mutating DML statements on a table while other mutating DML jobs on it are still running (or pending), BigQuery runs up to 2 of them concurrently, after which up to 20 are queued as PENDING. When a previously running job finishes, the next pending job is dequeued and run. Queued mutating DML statements share a per-table queue with maximum length 20. Additional statements past the maximum queue length for each table fail with the error message: Resources exceeded during query execution: Too many DML statements outstanding against table PROJECT_ID:TABLE, limit is 20.

Interactive priority DML jobs that are queued for more than 6 hours fail with the following error message:

DML statement has been queued for too long

DML statement conflicts

Mutating DML statements that run concurrently on a table cause DML statement conflicts when the statements try to mutate the same partition. The statements succeed as long as they don't modify the same partition. BigQuery tries to rerun failed statements up to three times.

  • An INSERT DML statement that inserts rows to a table doesn't conflict with any other concurrently running DML statement.

  • A MERGE DML statement does not conflict with other concurrently running DML statements as long as the statement only inserts rows and does not delete or update any existing rows. This can include MERGE statements with UPDATE or DELETE clauses, as long as those clauses aren't invoked when the query runs.

Pricing

For information on DML pricing, see Data Manipulation Language pricing on the Pricing page.

Best practices

For best performance, Google recommends the following patterns:

  • Avoid submitting large numbers of individual row updates or insertions. Instead, group DML operations together when possible. For more information, see DML statements that update or insert single rows.

  • If updates or deletions generally happen on older data, or within a particular range of dates, consider partitioning your tables. Partitioning ensures that the changes are limited to specific partitions within the table.

  • Avoid partitioning tables if the amount of data in each partition is small and each update modifies a large fraction of the partitions.

  • If you often update rows where one or more columns fall within a narrow range of values, consider using clustered tables. Clustering ensures that changes are limited to specific sets of blocks, reducing the amount of data that needs to be read and written. The following is an example of an UPDATE statement that filters on a range of column values:

    UPDATE mydataset.mytable
    SET string_col = 'some string'
    WHERE id BETWEEN 54 AND 75;
    

    Here is a similar example that filters on a small list of column values:

    UPDATE mydataset.mytable
    SET string_col = 'some string'
    WHERE id IN (54, 57, 60);
    

    Consider clustering on the id column in these cases.

  • If you need OLTP functionality, consider using Cloud SQL federated queries, which enable BigQuery to query data that resides in Cloud SQL.

For best practices to optimize query performance, see Introduction to optimizing query performance.

What's next

  • For DML syntax information and samples, see DML syntax.
  • For information about using DML statements in scheduled queries, see Scheduling queries.