Multi-statement transactions

BigQuery supports multi-statement transactions inside scripts. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows, on one or more tables, and either commit or roll back the changes atomically.

Uses for multi-statement transactions include:

  • Performing DML mutations on multiple tables as a single transaction. The tables can span multiple datasets or projects.
  • Performing mutations on a single table in several stages, based on intermediate computations.

Transactions guarantee ACID properties and support snapshot isolation. During a transaction, all reads return a consistent snapshot of the tables referenced in the transaction. If a statement in a transaction modifies a table, the changes are visible to subsequent statements within the same transaction.

Transaction scope

A transaction must be contained in a single SQL script. A script can contain multiple transactions, but they cannot be nested.

To start a transaction, use the BEGIN TRANSACTION statement. The transaction ends when any of the following occur:

  • The script executes a COMMIT TRANSACTION statement. This statement atomically commits all changes made inside the transaction.
  • The script executes a ROLLBACK TRANSACTION statement. This statement abandons all changes made inside the transaction.
  • The script ends before reaching either of these two statements. In that case, BigQuery automatically rolls back the transaction.

If an error occurs during a transaction and the script has an exception handler, then BigQuery transfers control to the exception handler. Inside the exception block, can choose whether to commit or roll back the transaction.

If an error occurs during a transaction and there is no exception handler, then the script fails and BigQuery automatically rolls back the transaction.

The following example shows an exception handler that rolls back a transaction:

BEGIN

  BEGIN TRANSACTION;
  INSERT INTO mydataset.NewArrivals
    VALUES ('top load washer', 100, 'warehouse #1');
  -- Trigger an error.
  SELECT 1/0;
  COMMIT TRANSACTION;

EXCEPTION WHEN ERROR THEN
  -- Roll back the transaction inside the exception handler.
  SELECT @@error.message;
  ROLLBACK TRANSACTION;
END;

Statements supported in transactions

The following statement types are supported in transactions:

  • Query statements: SELECT
  • DML statements: INSERT, UPDATE, DELETE, and MERGE
  • DDL statements on temporary entities:

    • CREATE TEMP TABLE
    • CREATE TEMP FUNCTION
    • DROP TABLE on a temporary table
    • DROP FUNCTION on a temporary function

DDL statements that create or drop permanent entities, such as datasets, tables, and functions, are not supported inside transactions.

Date/time functions in transactions

Within a transaction, the following date/time functions have special behaviors:

Example of a transaction

This example assumes there are two tables named Inventory and NewArrivals, created as follows:

CREATE OR REPLACE TABLE mydataset.Inventory
(
 product string,
 quantity int64,
 supply_constrained bool
);

CREATE OR REPLACE TABLE mydataset.NewArrivals
(
 product string,
 quantity int64,
 warehouse string
);

INSERT mydataset.Inventory (product, quantity)
VALUES('top load washer', 10),
     ('front load washer', 20),
     ('dryer', 30),
     ('refrigerator', 10),
     ('microwave', 20),
     ('dishwasher', 30);

INSERT mydataset.NewArrivals (product, quantity, warehouse)
VALUES('top load washer', 100, 'warehouse #1'),
     ('dryer', 200, 'warehouse #2'),
     ('oven', 300, 'warehouse #1');

The Inventory table contains information about current inventory, and NewArrivals contains information about newly arrived items.

The following transaction updates Inventory with new arrivals and deletes the corresponding records from NewArrivals. Assuming that all statements complete successfully, the changes in both tables are committed atomically as a single transaction.

BEGIN TRANSACTION;

-- Create a temporary table that holds new arrivals from 'warehouse #1'.
CREATE TEMP TABLE tmp
  AS SELECT * FROM mydataset.NewArrivals WHERE warehouse = 'warehouse #1';

-- Delete the matching records from the NewArravals table.
DELETE mydataset.NewArrivals WHERE warehouse = 'warehouse #1';

-- Merge the records from the temporary table into the Inventory table.
MERGE mydataset.Inventory AS I
USING tmp AS T
ON I.product = T.product
WHEN NOT MATCHED THEN
 INSERT(product, quantity, supply_constrained)
 VALUES(product, quantity, false)
WHEN MATCHED THEN
 UPDATE SET quantity = I.quantity + T.quantity;

-- Drop the temporary table and commit the transaction.
DROP TABLE tmp;

COMMIT TRANSACTION;

Transaction concurrency

If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits.

Operations that read or append new rows can run concurrently with the transaction. For example, any of the following operations can be performed concurrently on a table while a transaction mutates data in the same table:

  • SELECT statements
  • BigQuery Storage Read API read operations
  • Queries from BigQuery BI Engine
  • INSERT statements
  • Load jobs that use WRITE_APPEND disposition to append rows
  • Streaming writes

If a transaction only reads a table or appends new rows to it, any operation can be performed concurrently on that table.

Viewing transaction information

BigQuery assigns a transaction ID to each multi-statement transaction. The transaction ID is attached to each query that executes inside the transaction. To view the transaction IDs for your jobs, query the INFORMATION_SCHEMA.JOBS_BY_* views for the transaction_id column.

When a script runs, BigQuery creates a child job for each statement in the script. For a given transaction, every child job that is associated with that transaction has the same transaction_id value.

The following examples show how to find information about your transactions.

Find all committed or rolled back transactions

The following query returns all transactions that were successfully committed.

SELECT transaction_id, parent_job_id, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = "COMMIT_TRANSACTION" AND error_result IS NULL

The following query returns all transactions that were successfully rolled back.

SELECT
  transaction_id, parent_job_id, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = "ROLLBACK_TRANSACTION" AND error_result IS NULL

Find the start and end time of a transaction

The following query returns the starting and ending times for a specified transaction ID.

SELECT transaction_id, start_time, end_time, statement_type
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE transaction_id = "TRANSACTION_ID"
AND statement_type IN
  ("BEGIN_TRANSACTION", "COMMIT_TRANSACTION", "ROLLBACK_TRANSACTION")
ORDER BY start_time;

Find the transaction in which a job is running

The following query gets the transaction associated with a specified job ID. It returns NULL if the job is not running within a multi-statement transaction.

SELECT transaction_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_id = 'JOB_ID';

Find the current job running within a transaction

The following query returns information about the job that is currently running within a specified transaction, if any.

SELECT job_id, query, start_time, total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE transaction_id = 'TRANSACTION_ID' AND state = RUNNING;

Find the active transactions that affect a table

The following query returns the active transactions that affect a specified table. For each active transaction, it also returns the ID of the parent job — that is, the job for the script.

WITH running_transactions AS (
  SELECT DISTINCT transaction_id
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    EXCEPT DISTINCT
    select transaction_id FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE statement_type = "COMMIT_TRANSACTION" OR
    statement_type = "ROLLBACK_TRANSACTION"
)

SELECT
  transaction_id, parent_job_id, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS jobs, running_transactions
WHERE destination_table = ("PROJECT_NAME", "DATASET_NAME", "TABLE_NAME")
  AND jobs.transaction_id = running_transactions.transaction_id

Find the active transactions running in a script

The following query returns the active transactions for a particular job, specified by the ID of the job that is running the script.

SELECT DISTINCT transaction_id
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE parent_job_id = "JOB_ID"
  EXCEPT DISTINCT
  select transaction_id FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE parent_job_id = "JOB_ID" AND
    (statement_type = "COMMIT_TRANSACTION" OR statement_type = "ROLLBACK_TRANSACTION")

Limitations

  • A transaction cannot span multiple scripts.
  • Transactions cannot use DDL statements that affect permanent entities.
  • Within a transaction, materialized views are interpreted as logical views. You can still query a materialized view inside a transaction, but it doesn't result in any performance improvement or cost reduction compared with the equivalent logical view.