User-Defined Functions

User-Defined Functions

BigQuery supports user-defined functions (UDFs). A UDF enables you to create a function using another SQL expression, or another programming language, such as JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value. See Supported UDF data types for a list of data types you can use with UDFs.

UDFs are temporary. This means you can only use them for the current query or command-line session.

UDF examples

CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
  return x*y;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

A given query can be preceded by multiple CREATE TEMPORARY FUNCTION statements. For example:

CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
  return x*y;
""";
CREATE TEMPORARY FUNCTION divideByTwo(x INT64)
RETURNS INT64
LANGUAGE js AS """
  return x/2;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  multiplyInputs(x, y) as product,
  divideByTwo(x) as half_x,
  divideByTwo(y) as half_y
FROM numbers;

+-----+-----+--------------+--------+--------+
| x   | y   | product      | half_x | half_y |
+-----+-----+--------------+--------+--------+
| 1   | 5   | 5            | 1      | 3      |
| 2   | 10  | 20           | 1      | 5      |
| 3   | 15  | 45           | 2      | 8      |
+-----+-----+--------------+--------+--------+

A query statement can use the result of a UDF as the parameter in another UDF. For example:

CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
  return x*y;
""";
CREATE TEMPORARY FUNCTION divideByTwo(x INT64)
RETURNS INT64
LANGUAGE js AS """
  return x/2;
""";
WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  multiplyInputs(divideByTwo(x), divideByTwo(y)) as half_product
FROM numbers;

+-----+-----+--------------+
| x   | y   | half_product |
+-----+-----+--------------+
| 1   | 5   | 3            |
| 2   | 10  | 5            |
| 3   | 15  | 16           |
+-----+-----+--------------+

The following example shows a UDF that employs a SQL function.

CREATE TEMPORARY FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);
WITH numbers AS
  (SELECT 1 as val
  UNION ALL
  SELECT 3 as val
  UNION ALL
  SELECT 4 as val
  UNION ALL
  SELECT 5 as val)
SELECT val, addFourAndDivide(val, 2) AS result
FROM numbers;

+-----+--------+
| val | result |
+-----+--------+
| 1   | 2.5    |
| 3   | 3.5    |
| 4   | 4      |
| 5   | 4.5    |
+-----+--------+

External UDF structure

You create external UDFs using the following structure.

CREATE [TEMPORARY | TEMP] FUNCTION function_name ([named_parameter[, ...]])
  RETURNS [data_type]
  LANGUAGE [language]
  AS [external_code]

Each UDF consists of the following components:

  • CREATE [TEMPORARY | TEMP ] FUNCTION. Creates a new temporary function. A function can contain zero or more named_parameters, each consisting of comma-separated param_name param_type pairs.
  • RETURNS [data_type]. Specifies the data type that the temporary function returns. See Supported UDF data types for more information.
  • Language [language]. Specifies the language for the temporary function. See Supported UDF languages for more information.
  • AS [external_code]. Specifies the code that the temporary function runs. See Quoting rules for more information about how to add code to a UDF.

Supported UDF data types

For external UDF functions, BigQuery supports the following data types:

  • ARRAY
  • BOOL
  • BYTES
  • FLOAT64
  • INT64
  • STRING
  • STRUCT
  • TIMESTAMP

Supported UDF languages

UDFs support code written in JavaScript, which you specify using js. For example:

CREATE TEMPORARY FUNCTION greeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  return "Hello, " + a + "!";
  """;
SELECT greeting(name) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS name;

+----------------+
| everyone       |
+----------------+
| Hello, Hannah! |
| Hello, Max!    |
| Hello, Jakob!  |
+----------------+

See SQL type encodings in JavaScript for information on how BigQuery data types map to JavaScript types.

SQL type encodings in JavaScript

Some SQL types have a direct mapping to JavaScript types, but others do not. BigQuery represents types in the following manner:

BigQuery Data Type JavaScript Data Type
BOOL BOOLEAN
INT64 decimal STRING
FLOAT64 NUMBER
STRING STRING
BYTES base64-encoded STRING
TIMESTAMP DATE with a microsecond field containing the microsecond fraction of the timestamp
ARRAY ARRAY
STRUCT OBJECT where each STRUCT field is a named field

To treat INT64 inputs as integers, use a function, such as PARSEINT(). This will truncate large numbers to the most-significant 53 bits.

Quoting rules

You must enclose external code in quotes. For simple, one line code snippets, you can use a standard quoted string:

CREATE TEMPORARY FUNCTION plusOne(x INT64)
RETURNS INT64
LANGUAGE js
AS "return parseInt(x)+1";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;

+-----------+-----------+
| val       | result    |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 4         |
| 4         | 5         |
| 5         | 6         |
+-----------+-----------+

In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:

CREATE TEMPORARY FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  var d = new Date();
  if (d.getHours() < 12) {
    return 'Good Morning, ' + a + '!';
  } else {
    return 'Good Evening, ' + a + '!';
  }
  """;
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone              |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max!    |
| Good Morning, Jakob!  |
+-----------------------+

SQL UDF structure

You create SQL UDFs using the following structure.

CREATE [TEMPORARY | TEMP] FUNCTION function_name ([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)

Each UDF consists of the following components:

  • CREATE [TEMPORARY | TEMP ] FUNCTION. Creates a new temporary function. A function can contain zero or more named_parameters, each consisting of comma-separated param_name param_type pairs.
  • [RETURNS data_type]. Optional. Specifies the data type that the temporary function returns.
  • AS [sql_expression]. Specifies the SQL expression that the temporary function evaluates and returns.

Including external libraries

You can extend your UDF statements using the OPTIONS section. This section allows you to specify external code libraries for the UDF.

CREATE TEMP FUNCTION myFunc(a INT64, b STRING)
  RETURNS STRING
  LANGUAGE js AS
"""
    // Assumes 'doInterestingStuff' is defined in one of the library files.
    return doInterestingStuff(a, b);
"""
OPTIONS (
  library="gs://my-bucket/path/to/lib1.js",
  library=["gs://my-bucket/path/to/lib2.js", "gs://my-bucket/path/to/lib3.js"]
)

In the preceding example, code in example.js and example-common.js is available to any code in the [external_code] section of the UDF. Notice that you can specify library files using single-element or array syntax.

UDFs and the Web UI

You can use the BigQuery web UI to run queries using one or more UDFs.

Prerequisites

To use the BigQuery web UI, your account needs to have access to a BigQuery-enabled project in the Google Cloud Platform Console.

  1. If you haven't used the Cloud Platform Console before, go to the console, accept the terms of service, and create a new project.
  2. Navigate to the web UI.

Running a query with a UDF

  1. Click the COMPOSE QUERY button.
  2. Click the Query Editor tab.
  3. Click the Show Options button.
  4. Uncheck the Use Legacy SQL checkbox.
  5. Type the UDF statement into the Query Editor text area. For example:

    CREATE TEMPORARY FUNCTION timesTwo(x INT64)
    RETURNS INT64
      LANGUAGE js AS """
      return x*2;
    """;
  6. Below the UDF statement, type your query. For example:

    SELECT timesTwo(numbers) as doubles
    FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
  7. Click the RUN QUERY button. The query results display underneath the buttons.

UDFs and the bq command-line tool

You can use the bq Command-Line Tool from the Google Cloud SDK to run a query containing one or more UDFs.

Use the following syntax to run a query with a UDF:

bq query <statement_with_udf_and_query>

Best practices

Pre-filter your input

If your input can be easily filtered down before being passed to a UDF, your query will likely be faster and cheaper.

Avoid persistent mutable state

Do not store or access mutable state across UDF calls.

Use memory efficiently

The JavaScript processing environment has limited memory available per query. UDF queries that accumulate too much local state may fail due to memory exhaustion.

Limits

  • The amount of data that your UDF outputs when processing a single row should be approxiately 5 Mb or less.
  • Each user is limited to running approximately 6 UDF queries in a specific project at the same time. If you receive an error that you're over the concurrent query limit, wait a few minutes and try again.
  • A UDF can timeout and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JS function are.
  • A query job can have a maximum of 50 UDF resources (inline code blobs or external files).
  • Each inline code blob is limited to a maximum size of 32 KB.
  • Each external code resource is limited to a maximum size of 1 MB.

Limitations

  • The DOM objects Window, Document and Node, and functions that require them, are unsupported.
  • JavaScript functions that rely on native code are unsupported.
  • Bitwise operations in JavaScript handle only the most significant 32 bits.

Known issues

Performance

Standard SQL UDFs are currently slower than their legacy SQL counterparts.

Disable query cache

BigQuery does not take the contents of CREATE TEMPORARY FUNCTION statements into account when determining whether to return results from the query cache. Consequently, changing the external code for a UDF still allows cached results to be returned.

To work around this issue, disable Use cached results in your query options before running queries with UDFs.

INT64 arguments

Arguments declared as type INT64 are represented as strings in the JavaScript UDF environment. This creates the possibility for subtle errors because JavaScript interprets the + operator to mean “concatenation” for strings.

To have integer values imported into the JavaScript environment as native JavaScript numbers, declare the argument type as FLOAT64 in your CREATE TEMPORARY FUNCTION statement. This triggers the conversion from integer to JavaScript number automatically.

You can also cast integer values to STRING in your SQL query, and declare the arguments as type STRING. This requires that you convert the values to a JavaScript number or other type, such as goog.Math.Long in your JavaScript code.

Error message line numbers are sometimes inaccurate

Line numbers in messages related to code in your CREATE TEMP FUNCTION statement are off by one.

Error messages are missing file/module name

Error messages for Standard SQL UDFs are currently lacking context for the file or module in which the error occurred.

フィードバックを送信...