BigQuery supports user-defined functions (UDFs). A UDF lets you create a function by using another SQL expression or JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value. For information on user-defined functions in legacy SQL, see User-defined functions in legacy SQL.
UDFs can either be persistent or temporary. You can reuse persistent UDFs across multiple queries and temporary UDFs in a single query.
UDF syntax
To create a persistent UDF, use the following syntax:
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [[project_name.]dataset_name.]function_name ([named_parameter[, ...]]) [RETURNS data_type] { sql_function_definition | javascript_function_definition }
To create a temporary UDF, use the following syntax:
CREATE [OR REPLACE] {TEMPORARY | TEMP} FUNCTION [IF NOT EXISTS] function_name ([named_parameter[, ...]]) [RETURNS data_type] { sql_function_definition | javascript_function_definition }
named_parameter: param_name param_type sql_function_definition: AS (sql_expression) javascript_function_definition: [determinism_specifier] LANGUAGE js [OPTIONS (library = library_array)] AS javascript_code determinism_specifier: { DETERMINISTIC | NOT DETERMINISTIC }
This syntax consists of the following components:
CREATE { FUNCTION | OR REPLACE FUNCTION | FUNCTION IF NOT EXISTS }. Creates or updates a function. To replace any existing function with the same name, use the
OR REPLACE
keyword. To treat the query as successful and take no action if a function with the same name already exists, use theIF NOT EXISTS
clause.project_name is the name of the project where you are creating the function. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks
`
(example:`google.com:my_project`
).dataset_name is the name of the dataset where you are creating the function. Defaults to the
defaultDataset
in the request.named_parameter. Consists of a comma-separated
param_name
andparam_type
pair. The value ofparam_type
is a BigQuery data type. For a SQL UDF, the value ofparam_type
can also beANY TYPE
.determinism_specifier. Applies only to JavaScript user-defined functions. Provides a hint to BigQuery as to whether the query result can be cached. Can be one of the following values:
DETERMINISTIC
: The function always returns the same result when passed the same arguments. The query result is potentially cacheable. For example, if the functionadd_one(i)
always returnsi + 1
, the function is deterministic.NOT DETERMINISTIC
: The function does not always return the same result when passed the same arguments, and therefore is not cacheable. For example, ifadd_random(i)
returnsi + rand()
, the function is not deterministic and BigQuery will not use cached results.If all of the invoked functions are DETERMINISTIC, BigQuery will try to cache the result, unless the results can't be cached for other reasons. For more information, see Using cached query results.
[RETURNS data_type]. Specifies the data type that the function returns.
- If the function is defined in SQL, then the
RETURNS
clause is optional. If theRETURNS
clause is omitted, then BigQuery infers the result type of the function from the SQL function body when a query calls the function. - If the function is defined in JavaScript, then the
RETURNS
clause is required. For more information about allowed values fordata_type
, see Supported JavaScript UDF data types.
- If the function is defined in SQL, then the
AS (sql_expression). Specifies the SQL expression that defines the function.
[OPTIONS (library = library_array)]. For a JavaScript UDF, specifies an array of JavaScript libraries to include in the function definition.
AS javascript_code. Specifies the definition of a JavaScript function.
javascript_code
is a string literal.- If the code includes quotes and backslashes, it must be either escaped
or represented as a raw string. For example, the code
return "\n";
can be represented as one of the following:- Quoted string
"return \"\\n\";"
. Both quotes and backslashes need to be escaped. - Triple quoted string:
"""return "\\n";"""
. Backslashes need to be escaped while quotes do not. - Raw string:
r"""return "\n";"""
. No escaping is needed.
- Quoted string
- If the code includes quotes and backslashes, it must be either escaped
or represented as a raw string. For example, the code
To delete a persistent user-defined function, use the following syntax:
DROP FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name
Temporary UDFs expire as soon as the query finishes, so DROP FUNCTION
statements are only supported for temporary UDFs in
scripts and
procedures.
SQL UDFs
Create SQL UDFs using the following syntax:
CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS] [[`project_name`.]dataset_name.]function_name ([named_parameter[, ...]]) [RETURNS data_type] AS (sql_expression) named_parameter: param_name param_type
Templated SQL UDF parameters
A templated parameter with param_type
= ANY TYPE
can match more than one argument type when the function is called.
- If more than one parameter has type
ANY TYPE
, BigQuery does not enforce any type relationship between these arguments. - The function return type cannot be
ANY TYPE
. It must be either omitted, which means to be automatically determined based onsql_expression
, or an explicit type. - Passing the function arguments of types that are incompatible with the function definition results in an error at call time.
SQL UDF examples
The following example shows a UDF that employs a SQL function.
CREATE TEMP FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);
WITH numbers AS
(SELECT 1 as val
UNION ALL
SELECT 3 as val
UNION ALL
SELECT 4 as val
UNION ALL
SELECT 5 as val)
SELECT val, addFourAndDivide(val, 2) AS result
FROM numbers;
+-----+--------+
| val | result |
+-----+--------+
| 1 | 2.5 |
| 3 | 3.5 |
| 4 | 4 |
| 5 | 4.5 |
+-----+--------+
The following example shows a SQL UDF that uses a templated parameter. The resulting function accepts arguments of various types.
CREATE TEMP FUNCTION addFourAndDivideAny(x ANY TYPE, y ANY TYPE) AS (
(x + 4) / y
);
SELECT addFourAndDivideAny(3, 4) AS integer_output,
addFourAndDivideAny(1.59, 3.14) AS floating_point_output;
+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75 | 1.7802547770700636 |
+----------------+-----------------------+
The following example shows a SQL UDF that uses a templated parameter to return the last element of an array of any type.
CREATE TEMP FUNCTION lastArrayElement(arr ANY TYPE) AS (
arr[ORDINAL(ARRAY_LENGTH(arr))]
);
SELECT
names[OFFSET(0)] AS first_name,
lastArrayElement(names) AS last_name
FROM (
SELECT ['Fred', 'McFeely', 'Rogers'] AS names UNION ALL
SELECT ['Marie', 'Skłodowska', 'Curie']
);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Fred | Rogers |
| Marie | Curie |
+------------+-----------+
The following example shows a SQL UDF that uses a scalar subquery to count the number of users with a given age in a user table.
CREATE TEMP TABLE users
AS SELECT 1 id, 10 age
UNION ALL SELECT 2, 30
UNION ALL SELECT 3, 10;
CREATE TEMP FUNCTION countUserByAge(userAge INT64)
AS ((SELECT COUNT(1) FROM users WHERE age = userAge));
SELECT countUserByAge(10) AS count_user_age_10,
countUserByAge(20) AS count_user_age_20,
countUserByAge(30) AS count_user_age_30;
+-------------------+-------------------+-------------------+
| count_user_age_10 | count_user_age_20 | count_user_age_30 |
+-------------------+-------------------+-------------------+
| 2 | 0 | 1 |
+-------------------+-------------------+-------------------+
JavaScript UDFs
Create JavaScript UDFs using the following structure.
CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS] [[`project_name`.]dataset_name.]function_name ([named_parameter[, ...]]) RETURNS data_type [DETERMINISTIC | NOT DETERMINISTIC] LANGUAGE js [OPTIONS (library = library_array)] AS javascript_code
Supported JavaScript UDF data types
Some SQL types have a direct mapping to JavaScript types, but others do not. BigQuery represents types in the following manner:
BigQuery data type | JavaScript data type |
---|---|
ARRAY | ARRAY |
BOOL | BOOLEAN |
BYTES | base64-encoded STRING |
FLOAT64 | NUMBER |
NUMERIC, BIGNUMERIC | If a NUMERIC or BIGNUMERIC value can be represented exactly as an IEEE 754 floating-point value and has no fractional part, it is encoded as a Number. These values are in the range [-253, 253]. Otherwise, it is encoded as a String. |
STRING | STRING |
STRUCT | OBJECT where each STRUCT field is a named field |
TIMESTAMP | DATE with a microsecond field containing the microsecond
fraction of the timestamp |
DATE | DATE |
Because JavaScript does not support a 64-bit integer type,
INT64
is unsupported as an input type for JavaScript
UDFs. Instead, use FLOAT64
to represent integer
values as a number, or STRING
to represent integer
values as a string.
BigQuery does support INT64
as a return type
in JavaScript UDFs. In this case, the JavaScript function body can return either
a JavaScript Number or a String. BigQuery then converts either of
these types to INT64
.
If the return value of the JavaScript UDF is a
Promise
, BigQuery waits for the Promise
until it is settled. If the Promise
settles into a
fulfilled state, BigQuery returns its result. If the Promise
settles into a rejected state, BigQuery returns an error.
Quoting rules
You must enclose JavaScript code in quotes. For simple, one line code snippets, you can use a standard quoted string:
CREATE TEMP FUNCTION plusOne(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS "return x+1;";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;
+-----------+-----------+
| val | result |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
+-----------+-----------+
In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:
CREATE TEMP FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS r"""
var d = new Date();
if (d.getHours() < 12) {
return 'Good Morning, ' + a + '!';
} else {
return 'Good Evening, ' + a + '!';
}
""";
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max! |
| Good Morning, Jakob! |
+-----------------------+
Including JavaScript libraries
You can extend your JavaScript UDFs using the OPTIONS
section.
This section lets you specify external code libraries for the UDF.
CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://my-bucket/path/to/lib1.js", "gs://my-bucket/path/to/lib2.js"]
)
AS
r"""
// Assumes 'doInterestingStuff' is defined in one of the library files.
return doInterestingStuff(a, b);
""";
SELECT myFunc(3.14, 'foo');
In the preceding example, code in lib1.js
and lib2.js
is available to any code in the [external_code]
section of the UDF.
JavaScript UDF examples
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
return x*y;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;
+-----+-----+--------------+
| x | y | product |
+-----+-----+--------------+
| 1 | 5 | 5 |
| 2 | 10 | 20 |
| 3 | 15 | 45 |
+-----+-----+--------------+
You can pass the result of a UDF as input to another UDF. For example:
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
return x*y;
""";
CREATE TEMP FUNCTION divideByTwo(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
return x/2;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x,
y,
multiplyInputs(divideByTwo(x), divideByTwo(y)) as half_product
FROM numbers;
+-----+-----+--------------+
| x | y | half_product |
+-----+-----+--------------+
| 1 | 5 | 1.25 |
| 2 | 10 | 5 |
| 3 | 15 | 11.25 |
+-----+-----+--------------+
The following example sums the values of all
fields named foo
in the given JSON string.
CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
RETURNS FLOAT64
LANGUAGE js AS r"""
function SumFoo(obj) {
var sum = 0;
for (var field in obj) {
if (obj.hasOwnProperty(field) && obj[field] != null) {
if (typeof obj[field] == "object") {
sum += SumFoo(obj[field]);
} else if (field == "foo") {
sum += obj[field];
}
}
}
return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";
WITH Input AS (
SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
SELECT NULL, 4 AS foo UNION ALL
SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
TO_JSON_STRING(t) AS json_row,
SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;
+---------------------------------------------------------------------+---------+
| json_row | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10} | 14.14 |
| {"s":null,"foo":4} | 4 |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59 |
+---------------------------------------------------------------------+---------+
For more information on how BigQuery data types map to JavaScript types, see Supported JavaScript UDF data types
Best practices for JavaScript UDFs
Prefilter your input
If your input can be easily filtered down before being passed to a JavaScript UDF, your query will likely be faster and cheaper.
Avoid persistent mutable state
Do not store or access mutable state across JavaScript UDF calls.
Use memory efficiently
The JavaScript processing environment has limited memory available per query. JavaScript UDF queries that accumulate too much local state might fail due to memory exhaustion.
Running a query with a UDF
Using the Cloud Console
You can use the Cloud Console to run queries using one or more UDFs.
- Click Compose new query.
-
In the Query Editor pane, enter the UDF statement. For example:
CREATE TEMPORARY FUNCTION timesTwo(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS r""" return x*2; """;
-
Below the UDF statement, enter your SQL query. For example:
SELECT timesTwo(numbers) AS doubles FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
-
Click Run query. The query results display underneath the buttons.
Using the bq
command-line tool
You can use the bq
command-line tool from the
Cloud SDK to run a query containing one or more UDFs.
Use the following syntax to run a query with a UDF:
bq query <statement_with_udf_and_query>
Authorized UDFs
An authorized UDF is a UDF that is authorized to access a particular dataset. The UDF can query tables in the dataset, even if the user who calls the UDF does not have access to those tables.
Authorized UDFs let you share query results with particular users or groups without giving those users or groups access to the underlying tables. For example, an authorized UDF can compute an aggregation over data or look up a table value and use that value in a computation.
To authorize a UDF, you can use the Google Cloud Console, the REST API, or the
bq
command-line tool:
Console
Go to the BigQuery page in the Cloud Console.
In the navigation panel, in the Resources section, expand your project and select a dataset.
In the details panel, click Authorize Routines.
In the Authorized routines page, in the Authorize routine section, select the project ID, dataset ID, and routine ID for the UDF that you want to authorize.
Click Add authorization.
API
Call the
datasets.get
method to fetch the dataset that you want the UDF to access. The response body contains a representation of theDataset
resource.Add the following JSON object to the
access
array in theDataset
resource:{ "routine": { "datasetId": "DATASET_NAME", "projectId": "PROJECT_ID", "routineId": "ROUTINE_NAME" } }
Where:
- DATASET_NAME is the name of the dataset that contains the UDF.
- PROJECT_ID is the project ID of the project that contains the UDF.
- ROUTINE_NAME is the name of the UDF.
Call the
dataset.update
method with the modifiedDataset
representation.
bq
Use the
bq show
command to get the JSON representation of the dataset that you want the UDF to access. The output from the command is a JSON representation of theDataset
resource. Save the result to a local file.bq show --format=prettyjson TARGET_DATASET > dataset.json
Replace TARGET_DATASET with the name of the dataset that the UDF will have access to.
Edit the file to add the following JSON object to the
access
array in theDataset
resource:{ "routine": { "datasetId": "DATASET_NAME", "projectId": "PROJECT_ID", "routineId": "ROUTINE_NAME" } }
Where:
- DATASET_NAME is the name of the dataset that contains the UDF.
- PROJECT_ID is the project ID of the project that contains the UDF.
- ROUTINE_NAME is the name of the UDF.
Use the
bq update
command to update the dataset.bq update --source dataset.json TARGET_DATASET
Authorized UDF example
The following is an end-to-end example of creating and using an authorized UDF.
Create two datasets named
private_dataset
andpublic_dataset
. For more information about creating a dataset, see Creating a dataset.Run the following statement to create a table named
private_table
inprivate_dataset
:CREATE OR REPLACE TABLE private_dataset.private_table AS SELECT key FROM UNNEST(['key1', 'key1','key2','key3']) key;
Run the following statement to create a UDF named
count_key
inpublic_dataset
. The UDF includes aSELECT
statement onprivate_table
.CREATE OR REPLACE FUNCTION public_dataset.count_key(input_key STRING) RETURNS INT64 AS ((SELECT COUNT(1) FROM private_dataset.private_table t WHERE t.key = input_key));
Grant the
bigquery.dataViewer
role to a user on thepublic_dataset
dataset. This role includes thebigquery.routines.get
permission, which lets the user call the function. For information about how to assign access controls to a dataset, see Controlling access to datasets.At this point, the user has permission to call the
count_key
function but cannot access the table inprivate_dataset
. If the user tries to call the function, they get an error message similar to the following:Access Denied: Table myproject:private_dataset.private_table: User does not have permission to query table myproject:private_dataset.private_table.
Using the
bq
command-line tool, run theshow
command as follows:bq show --format=prettyjson private_dataset > dataset.json
The output is saved to a local file named
dataset.json
.Edit
dataset.json
to add the following JSON object to theaccess
array:{ "routine": { "datasetId": "public_dataset", "projectId": "PROJECT_ID", "routineId": "count_key" } }
Replace PROJECT_ID with the project ID for
public_dataset
.Using the
bq
command-line tool, run theupdate
command as follows:bq update --source dataset.json private_dataset
To verify that the UDF has access to
private_dataset
, the user can run the following query:SELECT public_dataset.count_key('key1');
Adding descriptions to UDFs
To add a description to a UDF, follow these steps:
Console
Go to the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and dataset, then select the function.
In the Details pane, click the pencil icon next to Description to edit the description text.
In the dialog, enter a description in the box or edit the existing description. Click Update to save the new description text.
Alternatively, you can use a Standard SQL query to update the description
using the description
parameter of the OPTIONS
field. In the Query
editor box, enter your function definition, then add the following line:
OPTIONS (description="DESCRIPTION") AS """
Replace DESCRIPTION with the description you would like to add.
bq
Using the bq query
syntax from UDFs and
the bq command line tool, you can edit a function's description from the
command line. Specify standard SQL with a --nouse_legacy_sql
or --
use_legacy_sql=false
flag, then enter your function definition. Add the
following line to your definition to set the description
parameter in the
OPTIONS
field:
OPTIONS (description="DESCRIPTION") AS """
Replace DESCRIPTION with the description you would like to add.
Limitations
The following limitations apply to temporary and persistent user-defined functions:
- The DOM objects
Window
,Document
, andNode
, and functions that require them, are not supported. - JavaScript functions that rely on native code are not supported.
- A JavaScript UDF can time out and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JavaScript function are.
- Bitwise operations in JavaScript handle only the most significant 32 bits.
- UDFs are subject to certain rate limits and quota limits. For more information, see UDF limits.
The following limitations apply to persistent user-defined functions:
- Each dataset can only contain one persistent UDF with the same name. However, you can create a UDF whose name is the same as the name of a table in the same dataset.
- When referencing a persistent UDF from another persistent UDF or a
logical view, you must qualify the name with the dataset. For example:
CREATE FUNCTION mydataset.referringFunction() AS (mydataset.referencedFunction());
The following limitations apply to temporary user-defined functions.
- When creating a temporary UDF,
function_name
cannot contain periods. - Views and persistent UDFs cannot reference temporary UDFs.