User-defined aggregate functions

For support during the preview, email bigquery-sql-preview-support@google.com.

This document describes how to create, call, and delete user-defined aggregate functions (UDAFs) in BigQuery.

A UDAF lets you create an aggregate function by using an expression that contains code. A UDAF accepts columns of input, performs a calculation on a group of rows at a time, and then returns the result of that calculation as a single value.

Create a SQL UDAF

This section describes the various ways that you can create a persistent or temporary SQL UDAF in BigQuery.

Create a persistent SQL UDAF

You can create a SQL UDAF that is persistent, meaning that you can reuse the UDAF across multiple queries. Persistent UDAFs are safe to call when they are shared between owners. UDAFs can't mutate data, talk to external systems, or send logs to Google Cloud Observability or similar applications.

To create a persistent UDAF, use the CREATE AGGREGATE FUNCTION statement without the TEMP or TEMPORARY keyword. You must include the dataset in the function path.

For example, the following query creates a persistent UDAF that's called ScaledAverage:

CREATE AGGREGATE FUNCTION myproject.mydataset.ScaledAverage(
  dividend FLOAT64,
  divisor FLOAT64)
RETURNS FLOAT64
AS (
  AVG(dividend / divisor)
);

Create a temporary SQL UDAF

You can create a SQL UDAF that is temporary, meaning that the UDAF only exists in the scope of a single query, script, session, or procedure.

To create a temporary UDAF, use the CREATE AGGREGATE FUNCTION statement with the TEMP or TEMPORARY keyword.

For example, the following query creates a temporary UDAF that's called ScaledAverage:

CREATE TEMP AGGREGATE FUNCTION ScaledAverage(
  dividend FLOAT64,
  divisor FLOAT64)
RETURNS FLOAT64
AS (
  AVG(dividend / divisor)
);

Use aggregate and non-aggregate parameters

You can create a SQL UDAF that has both aggregate and non-aggregate parameters.

UDAFs normally aggregate function parameters across all rows in a group. However, you can specify a function parameter as non-aggregate with the NOT AGGREGATE keyword.

A non-aggregate function parameter is a scalar function parameter with a constant value for all rows in a group. A valid non-aggregate function parameter must be a literal. Inside the UDAF definition, aggregate function parameters can only appear as function arguments to aggregate function calls. References to non-aggregate function parameters can appear anywhere in the UDAF definition.

For example, the following function contains an aggregate parameter that's called dividend, and a non-aggregate parameter called divisor:

-- Create the function.
CREATE TEMP AGGREGATE FUNCTION ScaledSum(
  dividend FLOAT64,
  divisor FLOAT64 NOT AGGREGATE)
RETURNS FLOAT64
AS (
  SUM(dividend) / divisor
);

Use the default project in the function body

In the body of a SQL UDAF, any references to BigQuery entities, such as tables or views, must include the project ID unless the entity resides in the same project that contains the UDAF.

For example, consider the following statement:

CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage(
  dividend FLOAT64,
  divisor FLOAT64)
RETURNS FLOAT64
AS (
  ( SELECT AVG(dividend / divisor) FROM dataset_a.my_table )
);

If you run the preceding statement in the project1 project, the statement succeeds because my_table exists in project1. However, if you run the preceding statement from a different project, the statement fails. To correct the error, include the project ID in the table reference:

CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage(
  dividend FLOAT64,
  divisor FLOAT64)
RETURNS FLOAT64
AS (
  ( SELECT AVG(dividend / divisor) FROM project1.dataset_a.my_table )
);

You can also reference an entity in a different project or dataset from the one where you create the function:

CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage(
  dividend FLOAT64,
  divisor FLOAT64)
RETURNS FLOAT64
AS (
  ( SELECT AVG(dividend / divisor) FROM project2.dataset_c.my_table )
);

Call a UDAF

This section describes the various ways that you can call a persistent or temporary UDAF after you create it in BigQuery.

Call a persistent UDAF

You can call a persistent UDAF in the same way that you call a built-in aggregate function. For more information, see Aggregate function calls. You must include the dataset in the function path.

In the following example, the query calls a persistent UDAF that's called WeightedAverage:

SELECT my_project.my_dataset.WeightedAverage(item, weight, 2) AS weighted_average
FROM (
  SELECT 1 AS item, 2.45 AS weight UNION ALL
  SELECT 3 AS item, 0.11 AS weight UNION ALL
  SELECT 5 AS item, 7.02 AS weight
);

A table with the following results is produced:

/*------------------*
 | weighted_average |
 +------------------+
 | 4.5              |
 *------------------*/

Call a temporary UDAF

You can call a temporary UDAF in the same way that you call a built-in aggregate function. For more information, see Aggregate function calls.

The temporary function must be included in a multi-statement query or procedure that contains the UDAF function call.

In the following example, the query calls a temporary UDAF that's called WeightedAverage:

CREATE TEMP AGGREGATE FUNCTION WeightedAverage(...)

-- Temporary UDAF function call
SELECT WeightedAverage(item, weight, 2) AS weighted_average
FROM (
  SELECT 1 AS item, 2.45 AS weight UNION ALL
  SELECT 3 AS item, 0.11 AS weight UNION ALL
  SELECT 5 AS item, 7.02 AS weight
);

A table with the following results is produced:

/*------------------*
 | weighted_average |
 +------------------+
 | 4.5              |
 *------------------*/

Delete a UDAF

This section describes the various ways that you can delete a persistent or temporary UDAF after you created it in BigQuery.

Delete a persistent UDAF

To delete a persistent UDAF, use the DROP FUNCTION statement. You must include the dataset in the function path.

In the following example, the query deletes a persistent UDAF that's called WeightedAverage:

DROP FUNCTION IF EXISTS my_project.my_dataset.WeightedAverage;

Delete a temporary UDAF

To delete a temporary UDAF, use the DROP FUNCTION statement.

In the following example, the query deletes a temporary UDAF that's called WeightedAverage:

DROP FUNCTION IF EXISTS WeightedAverage;

A temporary UDAF expires as soon as the query finishes. The UDAF doesn't need to be deleted unless you want to remove it early from a multi-statement query or procedure.

List UDAFs

UDAFs are a type of routine. To list all of the routines in a dataset, see List routines.

Limitations

  • UDAFs have the same limitations that apply to UDFs. For details, see UDF limitations.

  • Only literals, query parameters, and script variables can be passed in as non-aggregate arguments for a UDAF.

Pricing

UDAFs are billed using the standard BigQuery pricing model.

Quotas and limits

UDAFs have the same quotas and limits that apply to UDFs. For information about UDF quotas, see Quotas and limits.