Announcing the GA of BigQuery multi-statement transactions
BigQuery Product Manager
BigQuery Software Engineer
Try Google Cloud
Start building on Google Cloud with $300 in free credits and 20+ always free products.Free trial
Transactions are mission critical for modern enterprises supporting payments, logistics, and a multitude of business operations. And in today’s modern analytics-first and data-driven era, the need for the reliable processing of complex transactions extends beyond just the traditional OLTP database; today businesses also have to trust that their analytics environments are processing transactional data in an atomic, consistent, isolated, and durable (ACID) manner. So BigQuery set out to support DML statements spanning large numbers of tables in a single transaction and commit the associated changes atomically (all at once) if successful or rollback atomically upon failure. And today, we’d like to highlight the recent general availability launch of multi-statement transactions within BigQuery and the new business capabilities it unlocks.
While in preview, BigQuery multi-statement transactions were tremendously effective for customer use cases, such as keeping BigQuery synchronized with data stored in OLTP environments, the complex post processing of events pre-ingested into BigQuery, complying with GDPR’s right to be forgotten, etc. One of our customers, PLAID, leverages these multi-statement transactions within their customer experience platform KARTE to analyze the behavior and emotions of website visitors and application users, enabling businesses to deliver relevant communications in real time and further PLAID’s mission to Maximize the Value of People with the Power of Data.
“We see multi-statement transactions as a valuable feature for achieving expressive and fast analytics capabilities. For developers, it keeps queries simple and less hassle in error handling, and for users, it always gives reliable results.”—Takuya Ogawa, Lead Product Engineer
The general availability of multi-statement transactions not only provides customers with a production ready means of handling their business critical transactions in a comprehensive manner within a single transaction, but now also provides customers with far greater scalability compared to what was offered during the preview. At GA, multi-statement transactions increase support for mutating up to 100,000 table partitions and modifying up to 100 tables per transaction. This 10x scale in the number of table partitions and 2x scale in the number of tables was made possible by a careful re-design of our transaction commit protocol which optimizes the size of the transactionally committed metadata.
The GA of multi-statement transactions also introduces full compatibility with BigQuery sessions and procedural language scripting. Sessions are useful because they store state and enable the use of temporary tables and variables, which then can be run across multiple queries when combined with multi-statement transactions. Procedural language scripting provides users the ability to run multiple statements in a sequence with shared state and with complex logic using programming constructs such as IF … THEN and WHILE loops.
For instance, let’s say we wanted to enhance the current multi-statement transaction example, which uses transactions to atomically manage the existing inventory and supply of new arrivals of a retail company. Since we’re a retailer monitoring our current inventory on hand, we would now also like to add functionality to automatically suggest to our Sales team which items we should promote with sales offers when our inventory becomes too large. To do this, it would be useful to include a simple procedural IF statement, which monitors the current inventory and supply of new arrivals and modifies a new PromotionalSales table based on total inventory levels. And let’s validate the results ourselves before committing them as one single transaction to our sales team by using sessions. Let’s see how we’d do this via SQL.
First, we’ll create our tables using DDL statements:
Then, we’ll insert some values into our Inventory and NewArrivals tables:
Now, we’ll use a multi-statement transaction and procedural language scripting to atomically merge our NewArrivals table with the Inventory table while taking excess inventory into account to build out our PromotionalSales table. We’ll also create this within a session, which will allow us to validate the tables ourselves before committing the statement to everyone else.
From the results of the SELECT statements, we can see the warehouse #1 arrivals were successfully added to our inventory and the PromotionalSales table correctly reflects what excess inventory we have. Looks like these transactions are ready to be committed.
However, just in case there were some issues with our expected results, if others were to query the tables outside the session we created, the changes wouldn’t have taken effect. Thus, we have the ability to validate our results and could roll them back if needed without impacting others.
Going back to our configured session, since we’ve validated our Inventory, NewArrivals, and PromotionalSales tables are correct, we can go ahead and commit the multi-statement transaction within the session, which will propagate the changes outside the session too.
And now that the PromotionalSales table has been updated for all users, our sales team has some ideas of what products they should promote due to our excess inventory.
As you can tell, using multi-statement transactions is simple, scalable, and quite powerful, especially combined with other BigQuery Features. Give them a try yourself and see what’s possible.