Scripts and stored procedures

A script is a collection of SQL statements that you can execute in one request. Scripts can use variables and control-flow statements, and can have side effects. You can use scripts to:

  • Run multiple queries in a sequence, with shared state.
  • Automate management tasks such as creating or dropping tables.
  • Implement more complex logic using programming constructs such as IF and WHILE.

A procedure is a persistent script that you can invoke from inside a SQL statement. A procedure can take input arguments and return values as output.

The following example shows a script that sets a variable, runs an INSERT statement, and displays the result as a formatted text string. You can run this script in the same way as a query, for example in the Google Cloud Console or using the bq command-line tool. You can also save the script as a query.

DECLARE id STRING;
SET id = GENERATE_UUID();

INSERT INTO mydataset.customers (customer_id)
   VALUES(id);

SELECT FORMAT("Created customer ID %s", id);

Here is the same script converted into a procedure:

CREATE OR REPLACE PROCEDURE mydataset.create_customer()
BEGIN
  DECLARE id STRING;
  SET id = GENERATE_UUID();
  INSERT INTO mydataset.customers (customer_id)
    VALUES(id);
  SELECT FORMAT("Created customer %s", id);
END

In the preceding example, the name of the procedure is mydataset.create_customer, and the body of procedure appears between BEGIN and END statements.

To call the procedure, use the CALL statement:

CALL mydataset.create_customer();

Writing a script

A script consists of one or more SQL statements separated by semicolons. Any valid SQL statement can be used in a script. Scripts can also include scripting statements, which let you use variables or implement control flow with your SQL statements. The following example declares a variable and uses the variable inside an IF statement:

DECLARE day INT64;
SET day = (SELECT EXTRACT(DAYOFWEEK from CURRENT_DATE));
if day = 1 or day = 7 THEN
  SELECT 'Weekend';
ELSE
  SELECT 'Weekday';
END IF

Scripts are executed in BigQuery using jobs.insert, similar to any other query, with the multi-statement script specified as the query text. When a script runs, additional jobs, known as child jobs, are created for each statement in the script. You can enumerate the child jobs of a script by calling jobs.list, passing in the script’s job ID as the parentJobId parameter.

The jobs.getQueryResults method returns the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed. To get the results of all statements in the script, enumerate the child jobs and call jobs.getQueryResults on each of them.

BigQuery interprets any request with multiple statements as a script, unless the statements consist entirely of CREATE TEMP FUNCTION statements followed by a single query statement. For example, the following is not considered a script:

CREATE TEMP FUNCTION Add(x INT64, y INT64) AS (x + y);
SELECT Add(3, 4);

Variables

Variables must be declared either at the start of the script or at the start of a BEGIN block. Variables declared at the start of the script are in scope for the entire script. Variables declared inside a BEGIN block have scope for the block. They go out of scope after the corresponding END statement. The maximum size of a variable is 1 MB, and the maximum size of all variables used in a script is 10 MB.

DECLARE x INT64;

BEGIN
DECLARE y INT64;
-- Here you can reference x and y
END;

-- Here you can reference x, but not y

If a variable and column share the same name, the column takes precedence.

This returns column x + column x:

DECLARE x INT64 DEFAULT 0;
SET x = 10;

WITH Numbers AS (SELECT 50 AS x)
SELECT (x+x) AS result FROM Numbers;
+--------+
| result |
+--------+
| 100    |
+--------+

This returns column y + variable x:

DECLARE x INT64 DEFAULT 0;
SET x = 10;

WITH Numbers AS (SELECT 50 AS y)
SELECT (y+x) AS result FROM Numbers;
+--------+
| result |
+--------+
| 60     |
+--------+

Temporary tables

Temporary tables let you save intermediate results to a table. These temporary tables exist at the session level, so you don't need to save or maintain them in a dataset. They are automatically deleted some time after the script completes.

The following example creates a temporary table to store the results of a query and uses the temporary table in a subquery:

-- Find the top 100 names from the year 2017.
CREATE TEMP TABLE top_names(name STRING)
AS
 SELECT name
 FROM `bigquery-public-data`.usa_names.usa_1910_current
 WHERE year = 2017
 ORDER BY number DESC LIMIT 100
;
-- Which names appear as words in Shakespeare's plays?
SELECT
 name AS shakespeare_name
FROM top_names
WHERE name IN (
 SELECT word
 FROM `bigquery-public-data`.samples.shakespeare
);

Other than the use of TEMP or TEMPORARY, the syntax is identical to the CREATE TABLE syntax.

When you create a temporary table, don't use a project or dataset qualifier in the table name. The table is automatically created in a special dataset.

You can refer to a temporary table by name for the duration of the current script. This includes temporary tables created by a procedure within the script. You cannot share temporary tables, and they are not visible using any of the standard list or other table manipulation methods.

After a script finishes, the temporary table exists for up to 24 hours. It is not saved using the name you gave it, however, but is assigned a random name instead. To view the table structure and data, go to the BigQuery console, click Query history, and choose the query that created the temporary table. Then, in the Destination table row, click Temporary table. You are not charged for storing temporary tables.

You can delete a temporary table explicitly before the script completes by using theDROP TABLE statement:

CREATE TEMP TABLE table1(x INT64);
SELECT * FROM table1;  -- Succeeds
DROP TABLE table1;
SELECT * FROM table1;  -- Results in an error

When temporary tables are used together with a default dataset, unqualified table names refer to a temporary table if one exists, or a table in the default dataset. The exception is for CREATE TABLE statements, where the target table is considered a temporary table if and only if the TEMP or TEMPORARY keyword is present.

For example, consider the following script:

-- Create table t1 in the default dataset
CREATE TABLE t1 (x INT64);

-- Create temporary table t1.
CREATE TEMP TABLE t1 (x INT64);

-- This statement selects from the temporary table.
SELECT * FROM t1;

-- Drop the temporary table
DROP TABLE t1;

-- Now that the temporary table is dropped, this statement selects from the
-- table in the default dataset.
SELECT * FROM t1;

You can explicitly indicate that you are referring to a temporary table by qualifying the table name with _SESSION:

-- Create a temp table
CREATE TEMP TABLE t1 (x INT64);

-- Create a temp table using the `_SESSION` qualifier
CREATE TEMP TABLE _SESSION.t2 (x INT64);

-- Select from a temporary table using the `_SESSION` qualifier
SELECT * FROM _SESSION.t1;

If you use the _SESSION qualifier for a query of a temporary table that does not exist, the script throws an error indicating the table does not exist. For example, if there is no temporary table named t3, the script throws an error even if a table named t3 exists in the default dataset.

You cannot use _SESSION to create a non-temporary table:

CREATE TABLE _SESSION.t4 (x INT64);  -- Fails

Debugging a script

Here are some tips for debugging scripts and stored procedures:

  • Use the ASSERT statement to assert that a Boolean condition is true.

  • Use BEGIN...EXCEPTION to catch errors and display the error message and stack trace.

  • Use SELECT FORMAT("....") to show intermediate results.

  • When you run a script in the Google Cloud Console, you can view the output of each statement in the script. The bq command-line tool's 'bq query` command also shows the results of each step when you run a script.

  • In the Google Cloud Console, you can select an individual statement inside the query editor and run it.

Writing a procedure

To create a procedure, use the CREATE PROCEDURE statement. The body of the procedure appears between BEGIN and END statements:

CREATE PROCEDURE dataset_name.procedure_name
BEGIN
-- statements here
END

Parameters

A procedure can take a list of named parameters. Each parameter has a data type.

CREATE OR REPLACE PROCEDURE mydataset.create_customer(name STRING)
BEGIN
DECLARE id STRING;
SET id = GENERATE_UUID();
INSERT INTO mydataset.customers (customer_id, name)
  VALUES(id, name);
SELECT FORMAT("Created customer %s (%s)", id, name);
END

A procedure can have output parameters. An output parameter returns a value from the procedure, but does not allow input for the procedure. To create an output parameter, use the OUT keyword before the name of the parameter.

For example, this version of the procedure returns the new customer ID through the id parameter:

CREATE OR REPLACE PROCEDURE mydataset.create_customer(name STRING, OUT id STRING)
BEGIN
SET id = GENERATE_UUID();
INSERT INTO mydataset.customers (customer_id, name)
  VALUES(id, name);
SELECT FORMAT("Created customer %s (%s)", id, name);
END

To call this procedure, you must use a variable to receive the output value:

--- Create a new customer record.
DECLARE id STRING;
CALL mydataset.create_customer("alice",id);

--- Display the record.
SELECT * FROM temp.customers
WHERE customer_id = id;

A procedure can also have input/output parameters. An input/output parameter returns a value from the procedure and also accepts input for the procedure. To create an input/output parameter, use the INOUT keyword before the name of the parameter. For more information, see Argument mode.

Pricing

If you use on-demand billing, BigQuery charges for scripting based on the number of bytes processed during execution of the script.

For more information, see Query size calculation.