User-Defined Functions
BigQuery supports user-defined functions (UDFs). A UDF enables you to create a function using another SQL expression, or another programming language, such as JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value. See Supported UDF data types for a list of data types you can use with UDFs.
UDFs are temporary. This means you can only use them for the current query or command-line session.
UDF examples
CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
return x*y;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;
+-----+-----+--------------+
| x | y | product |
+-----+-----+--------------+
| 1 | 5 | 5 |
| 2 | 10 | 20 |
| 3 | 15 | 45 |
+-----+-----+--------------+
A given query can be preceded by multiple CREATE TEMPORARY FUNCTION
statements. For example:
CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
return x*y;
""";
CREATE TEMPORARY FUNCTION divideByTwo(x INT64)
RETURNS INT64
LANGUAGE js AS """
return x/2;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x,
y,
multiplyInputs(x, y) as product,
divideByTwo(x) as half_x,
divideByTwo(y) as half_y
FROM numbers;
+-----+-----+--------------+--------+--------+
| x | y | product | half_x | half_y |
+-----+-----+--------------+--------+--------+
| 1 | 5 | 5 | 1 | 3 |
| 2 | 10 | 20 | 1 | 5 |
| 3 | 15 | 45 | 2 | 8 |
+-----+-----+--------------+--------+--------+
A query statement can use the result of a UDF as the parameter in another UDF. For example:
CREATE TEMPORARY FUNCTION multiplyInputs(x INT64, y INT64)
RETURNS INT64
LANGUAGE js AS """
return x*y;
""";
CREATE TEMPORARY FUNCTION divideByTwo(x INT64)
RETURNS INT64
LANGUAGE js AS """
return x/2;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x,
y,
multiplyInputs(divideByTwo(x), divideByTwo(y)) as half_product
FROM numbers;
+-----+-----+--------------+
| x | y | half_product |
+-----+-----+--------------+
| 1 | 5 | 3 |
| 2 | 10 | 5 |
| 3 | 15 | 16 |
+-----+-----+--------------+
The following example shows a UDF that employs a SQL function.
CREATE TEMPORARY FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);
WITH numbers AS
(SELECT 1 as val
UNION ALL
SELECT 3 as val
UNION ALL
SELECT 4 as val
UNION ALL
SELECT 5 as val)
SELECT val, addFourAndDivide(val, 2) AS result
FROM numbers;
+-----+--------+
| val | result |
+-----+--------+
| 1 | 2.5 |
| 3 | 3.5 |
| 4 | 4 |
| 5 | 4.5 |
+-----+--------+
External UDF structure
You create external UDFs using the following structure.
CREATE [TEMPORARY | TEMP] FUNCTION function_name ([named_parameter[, ...]])
RETURNS [data_type]
LANGUAGE [language]
AS [external_code]
Each UDF consists of the following components:
- CREATE [TEMPORARY | TEMP ] FUNCTION. Creates a new temporary function. A
function can contain zero or more
named_parameter
s, each consisting of comma-separatedparam_name
param_type
pairs. - RETURNS [data_type]. Specifies the data type that the temporary function returns. See Supported UDF data types for more information.
- Language [language]. Specifies the language for the temporary function. See Supported UDF languages for more information.
- AS [external_code]. Specifies the code that the temporary function runs. See Quoting rules for more information about how to add code to a UDF.
Supported UDF data types
For external UDF functions, BigQuery supports the following data types:
- ARRAY
- BOOL
- BYTES
- FLOAT64
- INT64
- STRING
- STRUCT
- TIMESTAMP
Supported UDF languages
UDFs support code written in JavaScript, which you specify using js
.
For example:
CREATE TEMPORARY FUNCTION greeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
return "Hello, " + a + "!";
""";
SELECT greeting(name) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS name;
+----------------+
| everyone |
+----------------+
| Hello, Hannah! |
| Hello, Max! |
| Hello, Jakob! |
+----------------+
See SQL type encodings in JavaScript for information on how BigQuery data types map to JavaScript types.
SQL type encodings in JavaScript
Some SQL types have a direct mapping to JavaScript types, but others do not. BigQuery represents types in the following manner:
BigQuery Data Type | JavaScript Data Type |
---|---|
BOOL | BOOLEAN |
INT64 | decimal STRING |
FLOAT64 | NUMBER |
STRING | STRING |
BYTES | base64-encoded STRING |
TIMESTAMP | DATE with a microsecond field containing the microsecond
fraction of the timestamp |
ARRAY | ARRAY |
STRUCT | OBJECT where each STRUCT field is a named field |
To treat INT64 inputs as integers, use a function, such as PARSEINT()
.
This will truncate large numbers to the most-significant 53 bits.
Quoting rules
You must enclose external code in quotes. For simple, one line code snippets, you can use a standard quoted string:
CREATE TEMPORARY FUNCTION plusOne(x INT64)
RETURNS INT64
LANGUAGE js
AS "return parseInt(x)+1";
SELECT val, plusOne(val) AS result
FROM UNNEST([1, 2, 3, 4, 5]) AS val;
+-----------+-----------+
| val | result |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
+-----------+-----------+
In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:
CREATE TEMPORARY FUNCTION customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
var d = new Date();
if (d.getHours() < 12) {
return 'Good Morning, ' + a + '!';
} else {
return 'Good Evening, ' + a + '!';
}
""";
SELECT customGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max! |
| Good Morning, Jakob! |
+-----------------------+
SQL UDF structure
You create SQL UDFs using the following structure.
CREATE [TEMPORARY | TEMP] FUNCTION function_name ([named_parameter[, ...]])
[RETURNS data_type]
AS (sql_expression)
Each UDF consists of the following components:
- CREATE [TEMPORARY | TEMP ] FUNCTION. Creates a new temporary function. A
function can contain zero or more
named_parameter
s, each consisting of comma-separatedparam_name
param_type
pairs. - [RETURNS data_type]. Optional. Specifies the data type that the temporary function returns.
- AS [sql_expression]. Specifies the SQL expression that the temporary function evaluates and returns.
Including external libraries
You can extend your UDF statements using the OPTIONS
section. This section
allows you to specify external code libraries for the UDF.
CREATE TEMP FUNCTION myFunc(a INT64, b STRING)
RETURNS STRING
LANGUAGE js AS
"""
// Assumes 'doInterestingStuff' is defined in one of the library files.
return doInterestingStuff(a, b);
"""
OPTIONS (
library="gs://my-bucket/path/to/lib1.js",
library=["gs://my-bucket/path/to/lib2.js", "gs://my-bucket/path/to/lib3.js"]
)
In the preceding example, code in example.js
and example-common.js
is
available to any code in the [external_code]
section of the UDF. Notice that
you can specify library files using single-element or array syntax.
UDFs and the Web UI
You can use the BigQuery web UI to run queries using one or more UDFs.
Prerequisites
To use the BigQuery web UI, your account needs to have access to a BigQuery-enabled project in the Google Cloud Platform Console.
- If you haven't used the Cloud Platform Console before, go to the console, accept the terms of service, and create a new project.
- Navigate to the web UI.
Running a query with a UDF
- Click the COMPOSE QUERY button.
- Click the Query Editor tab.
- Click the Show Options button.
- Uncheck the Use Legacy SQL checkbox.
-
Type the UDF statement into the Query Editor text area. For example:
CREATE TEMPORARY FUNCTION timesTwo(x INT64) RETURNS INT64 LANGUAGE js AS """ return x*2; """;
-
Below the UDF statement, type your query. For example:
SELECT timesTwo(numbers) as doubles FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
-
Click the RUN QUERY button. The query results display underneath the buttons.
UDFs and the bq command-line tool
You can use the bq Command-Line Tool from the Google Cloud SDK to run a query containing one or more UDFs.
Use the following syntax to run a query with a UDF:
bq query <statement_with_udf_and_query>
Best practices
Pre-filter your input
If your input can be easily filtered down before being passed to a UDF, your query will likely be faster and cheaper.
Avoid persistent mutable state
Do not store or access mutable state across UDF calls.
Use memory efficiently
The JavaScript processing environment has limited memory available per query. UDF queries that accumulate too much local state may fail due to memory exhaustion.
Limits
- The amount of data that your UDF outputs when processing a single row should be approxiately 5 Mb or less.
- Each user is limited to running approximately 6 UDF queries in a specific project at the same time. If you receive an error that you're over the concurrent query limit, wait a few minutes and try again.
- A UDF can timeout and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JS function are.
- A query job can have a maximum of 50 UDF resources (inline code blobs or external files).
- Each inline code blob is limited to a maximum size of 32 KB.
- Each external code resource is limited to a maximum size of 1 MB.
Limitations
- The DOM objects
Window
,Document
andNode
, and functions that require them, are unsupported. - JavaScript functions that rely on native code are unsupported.
- Bitwise operations in JavaScript handle only the most significant 32 bits.
Known issues
Performance
Standard SQL UDFs are currently slower than their legacy SQL counterparts.
Disable query cache
BigQuery does not take the contents of CREATE TEMPORARY FUNCTION
statements
into account when determining whether to return results from the query cache.
Consequently, changing the external code for a UDF still allows cached results
to be returned.
To work around this issue, disable Use cached results in your query options before running queries with UDFs.
INT64 arguments
Arguments declared as type INT64 are represented as strings in the JavaScript
UDF environment. This creates the possibility for subtle errors because
JavaScript interprets the +
operator to mean “concatenation” for strings.
To have integer values imported into the JavaScript environment as native
JavaScript numbers, declare the argument type as FLOAT64 in your CREATE
TEMPORARY FUNCTION
statement. This triggers the conversion from integer
to JavaScript number automatically.
You can also cast integer values to STRING in your SQL query, and declare the arguments as type STRING. This requires that you convert the values to a JavaScript number or other type, such as goog.Math.Long in your JavaScript code.
Error message line numbers are sometimes inaccurate
Line numbers in messages related to code in your CREATE TEMP FUNCTION
statement are off by one.
Error messages are missing file/module name
Error messages for Standard SQL UDFs are currently lacking context for the file or module in which the error occurred.