Standard SQL user-defined functions

A user-defined function (UDF) lets you create a function by using a SQL expression or JavaScript code. A UDF accepts columns of input, performs actions on the input, and returns the result of those actions as a value.

You can define a UDFs as either persistent or temporary. You can reuse persistent UDFs across multiple queries, while temporary UDFs only exist in the scope of a single query.

To create a UDF, use the CREATE FUNCTION statement. To delete a persistent user-defined function, use the DROP FUNCTION statement. Temporary UDFs expire as soon as the query finishes. The DROP FUNCTION statement is only supported for temporary UDFs in scripts and procedures.

For information on UDFs in legacy SQL, see User-defined functions in legacy SQL.

SQL UDFs

The following example creates a temporary SQL UDF named AddFourAndDivide and calls it from within a SELECT statement:

CREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)
  RETURNS FLOAT64
  AS ((x + 4) / y);


SELECT val, AddFourAndDivide(val, 2)
  FROM UNNEST([2,3,5,8]) AS val;

This example produces the following output:

+-----+-----+
| val | f0_ |
+-----+-----+
|   2 | 3.0 |
|   3 | 3.5 |
|   5 | 4.5 |
|   8 | 6.0 |
+-----+-----+

The next example creates the same function as a persistent UDF:

CREATE FUNCTION mydataset.AddFourAndDivide(x INT64, y INT64)
  RETURNS FLOAT64
  AS ((x + 4) / y);

Because this UDF is persistent, you must specify a dataset for the function (mydataset in this example). After you run the CREATE FUNCTION statement, you can call the function from a query:

SELECT val, mydataset.AddFourAndDivide(val, 2)
  FROM UNNEST([2,3,5,8,12]) AS val;

Templated SQL UDF parameters

A parameter with a type equal to ANY TYPE can match more than one argument type when the function is called.

  • If more than one parameter has type ANY TYPE, then BigQuery doesn't enforce any type relationship between these arguments.
  • The function return type cannot be ANY TYPE. It must be either omitted, which means to be automatically determined based on sql_expression, or an explicit type.
  • Passing the function arguments of types that are incompatible with the function definition will result in an error at call time.

The following example shows a SQL UDF that uses a templated parameter.

CREATE TEMP FUNCTION addFourAndDivideAny(x ANY TYPE, y ANY TYPE) 
AS (
  (x + 4) / y
);

SELECT addFourAndDivideAny(3, 4) AS integer_output,
       addFourAndDivideAny(1.59, 3.14) AS floating_point_output;

This example produces the following output:

+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75           | 1.7802547770700636    |
+----------------+-----------------------+

The next example uses a templated parameter to return the last element of an array of any type:

CREATE TEMP FUNCTION lastArrayElement(arr ANY TYPE) AS (
  arr[ORDINAL(ARRAY_LENGTH(arr))]
);

SELECT lastArrayElement(x) as last_element
  FROM (SELECT [2,3,5,8,13] as x)

This example produces the following output:

+--------------+
| last_element |
+--------------+
| 13           |
+--------------+

Scalar subqueries

A SQL UDF can return the value of a scalar subquery. A scalar subquery must select a single column.

The following example shows a SQL UDF that uses a scalar subquery to count the number of users with a given age in a user table:

CREATE TEMP TABLE users
AS SELECT 1 id, 10 age
UNION ALL SELECT 2, 30
UNION ALL SELECT 3, 10;

CREATE TEMP FUNCTION countUserByAge(userAge INT64)
AS ((SELECT COUNT(1) FROM users WHERE age = userAge));

SELECT countUserByAge(10) AS count_user_age_10,
       countUserByAge(20) AS count_user_age_20,
       countUserByAge(30) AS count_user_age_30;

This example produces the following output:

+-------------------+-------------------+-------------------+
| count_user_age_10 | count_user_age_20 | count_user_age_30 |
+-------------------+-------------------+-------------------+
|                 2 |                 0 |                 1 |
+-------------------+-------------------+-------------------+

Default project in SQL expressions

In the body of a SQL UDF, any references to BigQuery entities, such as tables or views, must include the project ID, unless the entity resides in the same project that runs the CREATE FUNCTION statement.

For example, consider the following statement:

CREATE FUNCTION project1.mydataset.myfunction()
  AS ((SELECT COUNT(*) FROM mydataset.mytable))

If you run this statement from project1 and mydataset.mytable exists in project1, then the statement succeeds. However, if you run this statement from a different project, then the statement fails. To correct the error, include the project ID in the table reference:

CREATE FUNCTION project1.mydataset.myfunction()
  AS ((SELECT COUNT(*) FROM project1.mydataset.mytable))

You can also reference an entity in a different project or dataset from the one where you create the function:

CREATE FUNCTION project1.mydataset.myfunction()
  AS ((SELECT COUNT(*) FROM project2.another_dataset.another_table))

JavaScript UDFs

A JavaScript UDF lets you call code written in JavaScript from a SQL query.

The following example shows a JavaScript UDF. The JavaScript code is quoted within a raw string.

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
  return x*y;
""";

WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

This example produces the following output:

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

The next example sums the values of all fields named foo in the given JSON string.

CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
  RETURNS FLOAT64
  LANGUAGE js AS r"""
function SumFoo(obj) {
  var sum = 0;
  for (var field in obj) {
    if (obj.hasOwnProperty(field) && obj[field] != null) {
      if (typeof obj[field] == "object") {
        sum += SumFoo(obj[field]);
      } else if (field == "foo") {
        sum += obj[field];
      }
    }
  }
  return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";

WITH Input AS (
  SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
  SELECT NULL, 4 AS foo UNION ALL
  SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
  TO_JSON_STRING(t) AS json_row,
  SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;

The example produces the following output:

+---------------------------------------------------------------------+---------+
| json_row                                                            | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10}       | 14.14   |
| {"s":null,"foo":4}                                                  | 4       |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59    |
+---------------------------------------------------------------------+---------+

Supported JavaScript UDF data types

Some SQL types have a direct mapping to JavaScript types, but others do not. BigQuery represents types in the following manner:

BigQuery data type JavaScript data type
ARRAY ARRAY
BOOL BOOLEAN
BYTES base64-encoded STRING
FLOAT64 NUMBER
NUMERIC, BIGNUMERIC If a NUMERIC or BIGNUMERIC value can be represented exactly as an IEEE 754 floating-point value and has no fractional part, it is encoded as a Number. These values are in the range [-253, 253]. Otherwise, it is encoded as a String.
STRING STRING
STRUCT OBJECT where each STRUCT field is a named field
TIMESTAMP DATE with a microsecond field containing the microsecond fraction of the timestamp
DATE DATE

Because JavaScript does not support a 64-bit integer type, INT64 is unsupported as an input type for JavaScript UDFs. Instead, use FLOAT64 to represent integer values as a number, or STRING to represent integer values as a string.

BigQuery does support INT64 as a return type in JavaScript UDFs. In this case, the JavaScript function body can return either a JavaScript Number or a String. BigQuery then converts either of these types to INT64.

If the return value of the JavaScript UDF is a Promise, BigQuery waits for the Promise until it is settled. If the Promise settles into a fulfilled state, BigQuery returns its result. If the Promise settles into a rejected state, BigQuery returns an error.

Quoting rules

You must enclose JavaScript code in quotes. For simple, one line code snippets, you can use a standard quoted string:

CREATE TEMP FUNCTION plusOne(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS "return x+1;";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;

+-----------+-----------+
| val       | result    |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 4         |
| 4         | 5         |
| 5         | 6         |
+-----------+-----------+

In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:

CREATE TEMP FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS r"""
  var d = new Date();
  if (d.getHours() < 12) {
    return 'Good Morning, ' + a + '!';
  } else {
    return 'Good Evening, ' + a + '!';
  }
  """;
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone              |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max!    |
| Good Morning, Jakob!  |
+-----------------------+

Including JavaScript libraries

You can extend your JavaScript UDFs using the OPTIONS section. This section lets you specify external code libraries for the UDF.

CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
  RETURNS STRING
  LANGUAGE js
  OPTIONS (
    library=["gs://my-bucket/path/to/lib1.js", "gs://my-bucket/path/to/lib2.js"]
  )
  AS
r"""
    // Assumes 'doInterestingStuff' is defined in one of the library files.
    return doInterestingStuff(a, b);
""";

SELECT myFunc(3.14, 'foo');

In the preceding example, code in lib1.js and lib2.js is available to any code in the [external_code] section of the UDF.

Best practices for JavaScript UDFs

Prefilter your input

If your input can be easily filtered down before being passed to a JavaScript UDF, your query will likely be faster and cheaper.

Avoid persistent mutable state

Do not store or access mutable state across JavaScript UDF calls.

Use memory efficiently

The JavaScript processing environment has limited memory available per query. JavaScript UDF queries that accumulate too much local state might fail due to memory exhaustion.

Authorized UDFs

An authorized UDF is a UDF that is authorized to access a particular dataset. The UDF can query tables in the dataset, even if the user who calls the UDF does not have access to those tables.

Authorized UDFs let you share query results with particular users or groups without giving those users or groups access to the underlying tables. For example, an authorized UDF can compute an aggregation over data or look up a table value and use that value in a computation.

To authorize a UDF, you can use the Google Cloud Console, the REST API, or the bq command-line tool:

Console

  1. Go to the BigQuery page in the Cloud Console.

    Go to BigQuery

  2. In the navigation panel, in the Resources section, expand your project and select a dataset.

  3. In the details panel, click Authorize Routines.

  4. In the Authorized routines page, in the Authorize routine section, select the project ID, dataset ID, and routine ID for the UDF that you want to authorize.

  5. Click Add authorization.

API

  1. Call the datasets.get method to fetch the dataset that you want the UDF to access. The response body contains a representation of the Dataset resource.

  2. Add the following JSON object to the access array in the Dataset resource:

    {
     "routine": {
       "datasetId": "DATASET_NAME",
       "projectId": "PROJECT_ID",
       "routineId": "ROUTINE_NAME"
     }
    }
    

    Where:

    • DATASET_NAME is the name of the dataset that contains the UDF.
    • PROJECT_ID is the project ID of the project that contains the UDF.
    • ROUTINE_NAME is the name of the UDF.
  3. Call the dataset.update method with the modified Dataset representation.

bq

  1. Use the bq show command to get the JSON representation of the dataset that you want the UDF to access. The output from the command is a JSON representation of the Dataset resource. Save the result to a local file.

    bq show --format=prettyjson TARGET_DATASET > dataset.json
    

    Replace TARGET_DATASET with the name of the dataset that the UDF will have access to.

  2. Edit the file to add the following JSON object to the access array in the Dataset resource:

    {
     "routine": {
       "datasetId": "DATASET_NAME",
       "projectId": "PROJECT_ID",
       "routineId": "ROUTINE_NAME"
     }
    }
    

    Where:

    • DATASET_NAME is the name of the dataset that contains the UDF.
    • PROJECT_ID is the project ID of the project that contains the UDF.
    • ROUTINE_NAME is the name of the UDF.
  3. Use the bq update command to update the dataset.

    bq update --source dataset.json TARGET_DATASET
    

Authorized UDF example

The following is an end-to-end example of creating and using an authorized UDF.

  1. Create two datasets named private_dataset and public_dataset. For more information about creating a dataset, see Creating a dataset.

  2. Run the following statement to create a table named private_table in private_dataset:

    CREATE OR REPLACE TABLE private_dataset.private_table
    AS SELECT key FROM UNNEST(['key1', 'key1','key2','key3']) key;
    
  3. Run the following statement to create a UDF named count_key in public_dataset. The UDF includes a SELECT statement on private_table.

    CREATE OR REPLACE FUNCTION public_dataset.count_key(input_key STRING)
    RETURNS INT64
    AS
    ((SELECT COUNT(1) FROM private_dataset.private_table t WHERE t.key = input_key));
    
  4. Grant the bigquery.dataViewer role to a user on the public_dataset dataset. This role includes the bigquery.routines.get permission, which lets the user call the function. For information about how to assign access controls to a dataset, see Controlling access to datasets.

  5. At this point, the user has permission to call the count_key function but cannot access the table in private_dataset. If the user tries to call the function, they get an error message similar to the following:

    Access Denied: Table myproject:private_dataset.private_table: User does
    not have permission to query table myproject:private_dataset.private_table.
    
  6. Using the bq command-line tool, run the show command as follows:

    bq show --format=prettyjson private_dataset > dataset.json
    

    The output is saved to a local file named dataset.json.

  7. Edit dataset.json to add the following JSON object to the access array:

    {
     "routine": {
       "datasetId": "public_dataset",
       "projectId": "PROJECT_ID",
       "routineId": "count_key"
     }
    }
    

    Replace PROJECT_ID with the project ID for public_dataset.

  8. Using the bq command-line tool, run the update command as follows:

    bq update --source dataset.json private_dataset
    
  9. To verify that the UDF has access to private_dataset, the user can run the following query:

    SELECT public_dataset.count_key('key1');
    

Adding descriptions to UDFs

To add a description to a UDF, follow these steps:

Console

  1. Go to the BigQuery page in the Cloud Console.

    Go to BigQuery

  2. In the Explorer panel, expand your project and dataset, then select the function.

  3. In the Details pane, click the pencil icon next to Description to edit the description text.

  4. In the dialog, enter a description in the box or edit the existing description. Click Update to save the new description text.

Alternatively, you can use a Standard SQL query to update the description using the description parameter of the OPTIONS field. In the Query editor box, enter your function definition, then add the following line:

OPTIONS (description="DESCRIPTION") AS """

Replace DESCRIPTION with the description you would like to add.

bq

Using the bq query syntax from UDFs and the bq command line tool, you can edit a function's description from the command line. Specify standard SQL with a --nouse_legacy_sql or -- use_legacy_sql=false flag, then enter your function definition. Add the following line to your definition to set the description parameter in the OPTIONS field:

OPTIONS (description="DESCRIPTION") AS """

Replace DESCRIPTION with the description you would like to add.

Limitations

The following limitations apply to temporary and persistent user-defined functions:

  • The DOM objects Window, Document, and Node, and functions that require them, are not supported.
  • JavaScript functions that rely on native code can fail, for example, if they make restricted system calls.
  • A JavaScript UDF can time out and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JavaScript function are.
  • Bitwise operations in JavaScript handle only the most significant 32 bits.
  • UDFs are subject to certain rate limits and quota limits. For more information, see UDF limits.

The following limitations apply to persistent user-defined functions:

  • Each dataset can only contain one persistent UDF with the same name. However, you can create a UDF whose name is the same as the name of a table in the same dataset.
  • When referencing a persistent UDF from another persistent UDF or a logical view, you must qualify the name with the dataset. For example:
    CREATE FUNCTION mydataset.referringFunction() AS (mydataset.referencedFunction());

The following limitations apply to temporary user-defined functions.

  • When creating a temporary UDF, function_name cannot contain periods.
  • Views and persistent UDFs cannot reference temporary UDFs.