Standard SQL User-Defined Functions

User-Defined Functions in Standard SQL

BigQuery supports user-defined functions (UDFs). A UDF enables you to create a function using another SQL expression or another programming language, such as JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value. For information on user-defined functions in legacy SQL, see User-Defined Functions in Legacy SQL.

UDFs are temporary. This means you can only use them for the current query or command-line session. You do not use the UDF Editor tab in the web UI when creating a user-defined function for use with standard SQL queries. The UDF Editor tab is for use with legacy SQL.

General UDF Syntax

User-defined functions in BigQuery use the following general syntax:

CREATE  { TEMPORARY | TEMP }  FUNCTION
  function_name ([named_parameter[, ...]])
  [RETURNS data_type]
  { [LANGUAGE language AS """body"""] | [AS (function_definition)] };

named_parameter:
  param_name param_type

This syntax consists of the following components:

  • CREATE { TEMPORARY | TEMP } FUNCTION. Creates a new function. A function can contain zero or more named_parameters. You must include either TEMPORARY or TEMP when you create a UDF.
  • named_parameter. Consists of a comma-separated param_name and param_type pair. The value of param_type is a BigQuery data type. For a SQL UDF, the value of param_type may also be ANY TYPE.
  • [RETURNS data_type]. Specifies the data type that the function returns. If the function is defined in SQL, then the RETURNS clause is optional and BigQuery infers the result type of the function from the SQL function body. If the function is defined in an external language, then the RETURNS clause is required. See Supported UDF data types for more information about allowed values for data_type.
  • [LANGUAGE language AS """body"""]. Specifies the external language for the function and the code that defines the function.
  • AS (function_definition). Specifies the SQL code that defines the function. function_definition is a SQL expression.

External UDF structure

Create external UDFs using the following structure.

CREATE  { TEMPORARY | TEMP }  FUNCTION function_name ([named_parameter[, ...]])
  [RETURNS data_type]
  [LANGUAGE language]
  AS external_code

External UDF examples

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x*y;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

You can create multiple UDFs before a query. For example:

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x*y;
""";
CREATE TEMP FUNCTION divideByTwo(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x / 2;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  multiplyInputs(x, y) as product,
  divideByTwo(x) as half_x,
  divideByTwo(y) as half_y
FROM numbers;

+-----+-----+--------------+--------+--------+
| x   | y   | product      | half_x | half_y |
+-----+-----+--------------+--------+--------+
| 1   | 5   | 5            | 0.5    | 2.5    |
| 2   | 10  | 20           | 1      | 5      |
| 3   | 15  | 45           | 1.5    | 7.5    |
+-----+-----+--------------+--------+--------+

You can pass the result of a UDF as input to another UDF. For example:

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x*y;
""";
CREATE TEMP FUNCTION divideByTwo(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x/2;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  multiplyInputs(divideByTwo(x), divideByTwo(y)) as half_product
FROM numbers;

+-----+-----+--------------+
| x   | y   | half_product |
+-----+-----+--------------+
| 1   | 5   | 1.25         |
| 2   | 10  | 5            |
| 3   | 15  | 11.25        |
+-----+-----+--------------+

The following example sums the values of all fields named "foo" in the given JSON string.

CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
  RETURNS FLOAT64
  LANGUAGE js AS """
function SumFoo(obj) {
  var sum = 0;
  for (var field in obj) {
    if (obj.hasOwnProperty(field) && obj[field] != null) {
      if (typeof obj[field] == "object") {
        sum += SumFoo(obj[field]);
      } else if (field == "foo") {
        sum += obj[field];
      }
    }
  }
  return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";

WITH Input AS (
  SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
  SELECT NULL, 4 AS foo UNION ALL
  SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
  TO_JSON_STRING(t) AS json_row,
  SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;
+---------------------------------------------------------------------+---------+
| json_row                                                            | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10}       | 14.14   |
| {"s":null,"foo":4}                                                  | 4       |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59    |
+---------------------------------------------------------------------+---------+

Supported external UDF languages

External UDFs support code written in JavaScript, which you specify using js as the LANGUAGE. For example:

CREATE TEMP FUNCTION greeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  return "Hello, " + a + "!";
  """;
SELECT greeting(name) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS name;

+----------------+
| everyone       |
+----------------+
| Hello, Hannah! |
| Hello, Max!    |
| Hello, Jakob!  |
+----------------+

See SQL type encodings in JavaScript for information on how BigQuery data types map to JavaScript types.

Supported external UDF data types

For external UDFs, BigQuery supports the following data types:

  • ARRAY
  • BOOL
  • BYTES
  • DATE
  • FLOAT64
  • NUMERIC
  • STRING
  • STRUCT
  • TIMESTAMP

SQL type encodings in JavaScript

Some SQL types have a direct mapping to JavaScript types, but others do not.

Because JavaScript does not support a 64-bit integer type, INT64 is unsupported as an input type for JavaScript UDFs. Instead, use FLOAT64 to represent integer values as a number, or STRING to represent integer values as a string.

BigQuery does support INT64 as a return type in JavaScript UDFs. In this case, the JavaScript function body can return either a JavaScript Number or a String. BigQuery then converts either of these types to INT64.

BigQuery represents types in the following manner:

BigQuery Data Type JavaScript Data Type
ARRAY ARRAY
BOOL BOOLEAN
BYTES base64-encoded STRING
FLOAT64 NUMBER
NUMERIC If a NUMERIC value can be represented exactly as an IEEE 754 floating-point value and has no fractional part, it is encoded as a Number. These values are in the range [-253, 253]. Otherwise, it is encoded as a String.
STRING STRING
STRUCT OBJECT where each STRUCT field is a named field
TIMESTAMP DATE with a microsecond field containing the microsecond fraction of the timestamp
DATE DATE

Quoting rules

You must enclose external code in quotes. For simple, one line code snippets, you can use a standard quoted string:

CREATE TEMP FUNCTION plusOne(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS "return x+1;";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;

+-----------+-----------+
| val       | result    |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 4         |
| 4         | 5         |
| 5         | 6         |
+-----------+-----------+

In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:

CREATE TEMP FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  var d = new Date();
  if (d.getHours() < 12) {
    return 'Good Morning, ' + a + '!';
  } else {
    return 'Good Evening, ' + a + '!';
  }
  """;
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone              |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max!    |
| Good Morning, Jakob!  |
+-----------------------+

SQL UDF structure

Create SQL UDFs using the following syntax:

CREATE  { TEMPORARY | TEMP }  FUNCTION function_name ([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)

named_parameter:
  param_name param_type

Templated SQL UDF parameters

A templated parameter can match more than one argument type at function call time. If a function signature includes a templated parameter, BigQuery allows function calls to pass one of several argument types to the function.

SQL user-defined function signatures can contain the following templated param_type value:

  • ANY TYPE. The function will accept an input of any type for this argument. If more than one parameter has the type ANY TYPE, BigQuery does not enforce any relationship between these arguments at the time of function creation. However, passing the function arguments of types that are incompatible with the function definition will result in an error at call time.

SQL UDF examples

The following example shows a UDF that employs a SQL function.

CREATE TEMP FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);
WITH numbers AS
  (SELECT 1 as val
  UNION ALL
  SELECT 3 as val
  UNION ALL
  SELECT 4 as val
  UNION ALL
  SELECT 5 as val)
SELECT val, addFourAndDivide(val, 2) AS result
FROM numbers;

+-----+--------+
| val | result |
+-----+--------+
| 1   | 2.5    |
| 3   | 3.5    |
| 4   | 4      |
| 5   | 4.5    |
+-----+--------+

The following example shows a SQL UDF that uses a templated parameter. The resulting function accepts arguments of various types.

CREATE TEMP FUNCTION addFourAndDivideAny(x ANY TYPE, y ANY TYPE) AS (
  (x + 4) / y
);
SELECT addFourAndDivideAny(3, 4) AS integer_output,
       addFourAndDivideAny(1.59, 3.14) AS floating_point_output;

+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75           | 1.7802547770700636    |
+----------------+-----------------------+

The following example shows a SQL UDF that uses a templated parameter to return the last element of an array of any type.

CREATE TEMP FUNCTION lastArrayElement(arr ANY TYPE) AS (
  arr[ORDINAL(ARRAY_LENGTH(arr))]
);
SELECT
  names[OFFSET(0)] AS first_name,
  lastArrayElement(names) AS last_name
FROM (
  SELECT ['Fred', 'McFeely', 'Rogers'] AS names UNION ALL
  SELECT ['Marie', 'Skłodowska', 'Curie']
);

+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Fred       | Rogers    |
| Marie      | Curie     |
+------------+-----------+

Including external libraries

You can extend your external UDFs using the OPTIONS section. This section allows you to specify external code libraries for the UDF.

CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
  RETURNS STRING
  LANGUAGE js AS
"""
    // Assumes 'doInterestingStuff' is defined in one of the library files.
    return doInterestingStuff(a, b);
"""
OPTIONS (
  library="gs://my-bucket/path/to/lib1.js",
  library=["gs://my-bucket/path/to/lib2.js", "gs://my-bucket/path/to/lib3.js"]
);

SELECT myFunc(3.14, 'foo');

In the preceding example, code in lib1.js, lib2.js, and lib3.js is available to any code in the [external_code] section of the UDF. Notice that you can specify library files using single-element or array syntax.

UDFs and the Web UI

You can use the BigQuery web UI to run queries using one or more UDFs.

Running a query with a UDF

  1. Click the COMPOSE QUERY button.
  2. Click the Query Editor tab.
  3. Click the Show Options button.
  4. Uncheck the Use Legacy SQL checkbox.
  5. Type the UDF statement into the Query Editor text area. For example:

    CREATE TEMPORARY FUNCTION timesTwo(x FLOAT64)
    RETURNS FLOAT64
      LANGUAGE js AS """
      return x*2;
    """;
  6. Below the UDF statement, type your query. For example:

    SELECT timesTwo(numbers) as doubles
    FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
  7. Click the RUN QUERY button. The query results display underneath the buttons.

UDFs and the bq command-line tool

You can use the bq Command-Line Tool from the Google Cloud SDK to run a query containing one or more UDFs.

Use the following syntax to run a query with a UDF:

bq query <statement_with_udf_and_query>

Best practices for JavaScript UDFs

Pre-filter your input

If your input can be easily filtered down before being passed to a JavaScript UDF, your query will likely be faster and cheaper.

Avoid persistent mutable state

Do not store or access mutable state across JavaScript UDF calls.

Use memory efficiently

The JavaScript processing environment has limited memory available per query. JavaScript UDF queries that accumulate too much local state may fail due to memory exhaustion.

Limits

  • The amount of data that your JavaScript UDF outputs when processing a single row should be approximately 5 MB or less.
  • Each user is limited to running approximately 6 JavaScript UDF queries in a specific project at the same time. If you receive an error that you're over the concurrent query limit, wait a few minutes and try again.
  • A JavaScript UDF can timeout and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JS function are.
  • A query job can have a maximum of 50 JavaScript UDF resources (inline code blobs or external files).
  • Each inline code blob is limited to a maximum size of 32 KB.
  • Each external code resource is limited to a maximum size of 1 MB.

Limitations

  • For temporary UDFs, function_name cannot contain periods.
  • The DOM objects Window, Document and Node, and functions that require them, are unsupported.
  • JavaScript functions that rely on native code are unsupported.
  • Bitwise operations in JavaScript handle only the most significant 32 bits.
  • Because of their non-deterministic nature, queries that invoke user-defined functions cannot use cached results.
  • You cannot reference a table in a UDF.
Cette page vous a-t-elle été utile ? Évaluez-la :

Envoyer des commentaires concernant…