Standard SQL User-Defined Functions

BigQuery supports user-defined functions (UDFs). A UDF enables you to create a function using another SQL expression or JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value. For information on user-defined functions in legacy SQL, see User-Defined Functions in Legacy SQL.

UDFs can either be persistent or temporary. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query.

UDF Syntax

To create a persistent UDF, use the following syntax:

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
    [`project_name`.]dataset_name.function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  { sql_function_definition | javascript_function_definition }

To create a temporary UDF, use the following syntax:

CREATE [OR REPLACE] {TEMPORARY | TEMP} FUNCTION [IF NOT EXISTS]
    function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  { sql_function_definition | javascript_function_definition }

named_parameter:
  param_name param_type

sql_function_definition:
  AS (sql_expression)

javascript_function_definition:
  LANGUAGE js
  [OPTIONS (library = library_array)]
  AS javascript_code

This syntax consists of the following components:

  • CREATE { FUNCTION | OR REPLACE FUNCTION | FUNCTION IF NOT EXISTS }. Creates or updates a function. To replace any existing function with the same name, use the OR REPLACE keyword. To treat the query as successful and take no action if a function with the same name already exists, use the IF NOT EXISTS clause.
  • named_parameter. Consists of a comma-separated param_name and param_type pair. The value of param_type is a BigQuery data type. For a SQL UDF, the value of param_type may also be ANY TYPE.
  • [RETURNS data_type]. Specifies the data type that the function returns.
    • If the function is defined in SQL, then the RETURNS clause is optional. If the RETURNS clause is omitted, then BigQuery infers the result type of the function from the SQL function body when a query calls the function.
    • If the function is defined in JavaScript, then the RETURNS clause is required. See Supported JavaScript UDF data types for more information about allowed values for data_type.
  • AS (sql_expression). Specifies the SQL expression that defines the function.
  • [OPTIONS (library = library_array)]. For a JavaScript UDF, specifies an array of JavaScript libraries to include in the function definition.
  • AS javascript_code. Specifies the definition of a JavaScript function. javascript_code is a string literal.

To delete a persistent user-defined function, use the following syntax:

DROP FUNCTION [IF EXISTS] [`project_name`.]dataset_name.function_name

Temporary user-defined functions expire as soon as the query finishes, and thus do not support DROP FUNCTION statements except in scripts and procedures.

SQL UDF structure

Create SQL UDFs using the following syntax:

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[`project_name`.]dataset_name.]function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)

named_parameter:
  param_name param_type

Templated SQL UDF parameters

A templated parameter with param_type = ANY TYPE can match more than one argument type when the function is called.

  • If more than one paramater has type ANY TYPE, BigQuery does not enforce any type relationship between these arguments.
  • The function return type cannot be ANY TYPE. It must be either omitted, which means to be automatically determined based on sql_expression, or an explicit type.
  • Passing the function arguments of types that are incompatible with the function definition will result in an error at call time.

SQL UDF examples

The following example shows a UDF that employs a SQL function.

CREATE TEMP FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);
WITH numbers AS
  (SELECT 1 as val
  UNION ALL
  SELECT 3 as val
  UNION ALL
  SELECT 4 as val
  UNION ALL
  SELECT 5 as val)
SELECT val, addFourAndDivide(val, 2) AS result
FROM numbers;

+-----+--------+
| val | result |
+-----+--------+
| 1   | 2.5    |
| 3   | 3.5    |
| 4   | 4      |
| 5   | 4.5    |
+-----+--------+

The following example shows a SQL UDF that uses a templated parameter. The resulting function accepts arguments of various types.

CREATE TEMP FUNCTION addFourAndDivideAny(x ANY TYPE, y ANY TYPE) AS (
  (x + 4) / y
);
SELECT addFourAndDivideAny(3, 4) AS integer_output,
       addFourAndDivideAny(1.59, 3.14) AS floating_point_output;

+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75           | 1.7802547770700636    |
+----------------+-----------------------+

The following example shows a SQL UDF that uses a templated parameter to return the last element of an array of any type.

CREATE TEMP FUNCTION lastArrayElement(arr ANY TYPE) AS (
  arr[ORDINAL(ARRAY_LENGTH(arr))]
);
SELECT
  names[OFFSET(0)] AS first_name,
  lastArrayElement(names) AS last_name
FROM (
  SELECT ['Fred', 'McFeely', 'Rogers'] AS names UNION ALL
  SELECT ['Marie', 'Skłodowska', 'Curie']
);

+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Fred       | Rogers    |
| Marie      | Curie     |
+------------+-----------+

JavaScript UDF structure

Create JavaScript UDFs using the following structure.

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[`project_name`.]dataset_name.]function_name
    ([named_parameter[, ...]])
  RETURNS data_type
  LANGUAGE js
  [OPTIONS (library = library_array)]
  AS javascript_code

JavaScript UDF examples

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x*y;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

You can pass the result of a UDF as input to another UDF. For example:

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x*y;
""";
CREATE TEMP FUNCTION divideByTwo(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x/2;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  multiplyInputs(divideByTwo(x), divideByTwo(y)) as half_product
FROM numbers;

+-----+-----+--------------+
| x   | y   | half_product |
+-----+-----+--------------+
| 1   | 5   | 1.25         |
| 2   | 10  | 5            |
| 3   | 15  | 11.25        |
+-----+-----+--------------+

The following example sums the values of all fields named "foo" in the given JSON string.

CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
  RETURNS FLOAT64
  LANGUAGE js AS """
function SumFoo(obj) {
  var sum = 0;
  for (var field in obj) {
    if (obj.hasOwnProperty(field) && obj[field] != null) {
      if (typeof obj[field] == "object") {
        sum += SumFoo(obj[field]);
      } else if (field == "foo") {
        sum += obj[field];
      }
    }
  }
  return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";

WITH Input AS (
  SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
  SELECT NULL, 4 AS foo UNION ALL
  SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
  TO_JSON_STRING(t) AS json_row,
  SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;
+---------------------------------------------------------------------+---------+
| json_row                                                            | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10}       | 14.14   |
| {"s":null,"foo":4}                                                  | 4       |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59    |
+---------------------------------------------------------------------+---------+

See Supported JavaScript UDF data types for information on how BigQuery data types map to JavaScript types.

Supported JavaScript UDF data types

Some SQL types have a direct mapping to JavaScript types, but others do not. BigQuery represents types in the following manner:

BigQuery Data Type JavaScript Data Type
ARRAY ARRAY
BOOL BOOLEAN
BYTES base64-encoded STRING
FLOAT64 NUMBER
NUMERIC If a NUMERIC value can be represented exactly as an IEEE 754 floating-point value and has no fractional part, it is encoded as a Number. These values are in the range [-253, 253]. Otherwise, it is encoded as a String.
STRING STRING
STRUCT OBJECT where each STRUCT field is a named field
TIMESTAMP DATE with a microsecond field containing the microsecond fraction of the timestamp
DATE DATE

Because JavaScript does not support a 64-bit integer type, INT64 is unsupported as an input type for JavaScript UDFs. Instead, use FLOAT64 to represent integer values as a number, or STRING to represent integer values as a string.

BigQuery does support INT64 as a return type in JavaScript UDFs. In this case, the JavaScript function body can return either a JavaScript Number or a String. BigQuery then converts either of these types to INT64.

If the return value of the JavaScript UDF is a Promise, BigQuery waits for the Promise until it is settled. If the Promise settles into a fulfilled state, BigQuery returns its result. If the Promise settles into a rejected state, BigQuery returns an error.

Quoting rules

You must enclose JavaScript code in quotes. For simple, one line code snippets, you can use a standard quoted string:

CREATE TEMP FUNCTION plusOne(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS "return x+1;";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;

+-----------+-----------+
| val       | result    |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 4         |
| 4         | 5         |
| 5         | 6         |
+-----------+-----------+

In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:

CREATE TEMP FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  var d = new Date();
  if (d.getHours() < 12) {
    return 'Good Morning, ' + a + '!';
  } else {
    return 'Good Evening, ' + a + '!';
  }
  """;
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone              |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max!    |
| Good Morning, Jakob!  |
+-----------------------+

Including JavaScript libraries

You can extend your JavaScript UDFs using the OPTIONS section. This section allows you to specify external code libraries for the UDF.

CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
  RETURNS STRING
  LANGUAGE js
  OPTIONS (
    library=["gs://my-bucket/path/to/lib1.js", "gs://my-bucket/path/to/lib2.js"]
  )
  AS
"""
    // Assumes 'doInterestingStuff' is defined in one of the library files.
    return doInterestingStuff(a, b);
""";

SELECT myFunc(3.14, 'foo');

In the preceding example, code in lib1.js and lib2.js is available to any code in the [external_code] section of the UDF.

UDFs and the Web UI

You can use the BigQuery web UI to run queries using one or more UDFs.

Running a query with a UDF

  1. Click the COMPOSE QUERY button.
  2. Click the Query Editor tab.
  3. Click the Show Options button.
  4. Uncheck the Use Legacy SQL checkbox.
  5. Type the UDF statement into the Query Editor text area. For example:

    CREATE TEMPORARY FUNCTION timesTwo(x FLOAT64)
    RETURNS FLOAT64
      LANGUAGE js AS """
      return x*2;
    """;
  6. Below the UDF statement, type your query. For example:

    SELECT timesTwo(numbers) AS doubles
    FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
  7. Click the RUN QUERY button. The query results display underneath the buttons.

UDFs and the bq command-line tool

You can use the bq Command-Line Tool from the Google Cloud SDK to run a query containing one or more UDFs.

Use the following syntax to run a query with a UDF:

bq query <statement_with_udf_and_query>

Best practices for JavaScript UDFs

Pre-filter your input

If your input can be easily filtered down before being passed to a JavaScript UDF, your query will likely be faster and cheaper.

Avoid persistent mutable state

Do not store or access mutable state across JavaScript UDF calls.

Use memory efficiently

The JavaScript processing environment has limited memory available per query. JavaScript UDF queries that accumulate too much local state may fail due to memory exhaustion.

Limits

  • The amount of data that your JavaScript UDF outputs when processing a single row — approximately 5 MB or less.
  • Concurrent rate limit for legacy SQL queries that contain user-defined functions (UDFs) — 6 concurrent queries.
  • The concurrent rate limit for legacy SQL queries that contain UDFs includes both interactive and batch queries. Interactive queries that contain UDFs also count toward the concurrent rate limit for interactive queries. This limit does not apply to standard SQL queries.

  • Maximum number of JavaScript UDF resources, such as inline code blobs or external files, in a query job — 50
  • Maximum size of each inline code blob — 32 KB.
  • Maximum size of each external code resource — 1 MB.

The following limits apply to persistent user-defined functions.
  • Maximum length of a function name — 256 characters
  • Maximum number of arguments — 256
  • Maximum length of a argument name — 128 characters
  • Maximum depth of a user-defined function reference chain — 16
  • Maximum depth of argument or output of type STRUCT — 15
  • Maximum number of fields in argument or output of type STRUCT per UDF — 1024
  • Maximum number of unique UDF plus table references per query — 1000 After full expansion, each UDF can reference up to 1000 combined unique tables and UDFs.
  • Maximum number of JavaScript libraries in CREATE FUNCTION statement — 50
  • Max length of included JavaScript library paths — 5000 characters
  • Maximum update rate per UDF — 5 per 10 seconds After function creation, you can update each function up to 5 times per 10 seconds.
  • Each inline code blob is limited to a maximum size of 32 KB.
  • Each JavaScript code resource is limited to a maximum size of 1 MB.

Limitations

The following limitations apply to temporary and persistent user-defined functions:

  • The DOM objects Window, Document, and Node, and functions that require them, are unsupported.
  • JavaScript functions that rely on native code are unsupported.
  • A JavaScript UDF can timeout and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JS function are.
  • Because of their non-deterministic nature, queries that invoke JavaScript user-defined functions cannot use cached results.
  • Bitwise operations in JavaScript handle only the most significant 32 bits.
  • You cannot reference a table in a UDF.

The following limitations apply to persistent user-defined functions:

  • Each dataset can only contain one persistent UDF with the same name. However, you can create a UDF whose name is the same as the name of a table in the same dataset.
  • When referencing a persistent UDF from another persistent UDF, you must qualify the name with the dataset. For example:
    CREATE FUNCTION mydataset.referringFunction() AS (mydataset.referencedFunction());
  • When referencing a persistent UDF from a logical view, you must fully qualify the name with the project and dataset. For example:
    CREATE VIEW mydataset.sample_view AS SELECT `my-project`.mydataset.referencedFunction();

The following limitations apply to temporary user-defined functions.

  • When creating a temporary UDF, function_name cannot contain periods.
  • Logical views and persistent UDFs cannot reference temporary UDFs.

क्या यह पेज उपयोगी था? हमारे काम के बारे में अपनी राय दें:

निम्न के बारे में फ़ीडबैक भेजें...

सहायता चाहिए? हमारे सहायता पेज पर जाएं.