The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:
vector1: an ARRAY value that represents the first vector, in one of the following forms:
- ARRAY<Numerical type>
- ARRAY<STRUCT<STRING, Numerical type>>
- ARRAY<STRUCT<INT64, Numerical type>>

where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC. For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.

When a vector is expressed as ARRAY<Numerical type>, each element of the array denotes one dimension of the vector. An example of a four-dimensional vector is [0.0, 1.0, 1.0, 0.0].

When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item denotes one dimension of the vector. An example of a three-dimensional vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].

The initial INT64 or STRING value in the STRUCT is used as an identifier to match the STRUCT values in vector2. The ordering of data in the array doesn't matter; the values are matched by the identifier rather than by their position in the array. If either vector has any STRUCT values with duplicate identifiers, running this function returns an error.

vector2: an ARRAY value that represents the second vector. vector2 must have the same type as vector1.

For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an ARRAY<STRUCT<STRING, FLOAT64>> column.

When vector1 and vector2 are ARRAY<Numerical type> columns, they must have the same array length.

type: a STRING value that specifies the type of distance to calculate. Valid values are EUCLIDEAN, MANHATTAN, and COSINE. If this argument isn't specified, the default value is EUCLIDEAN.
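The identifier-matching rule for STRUCT vectors can be illustrated with a short Python sketch. This is a hypothetical model, not BigQuery code: a list of (identifier, value) tuples stands in for an ARRAY<STRUCT<...>> value, the helper names to_dict and sparse_euclidean are invented for illustration, and only the EUCLIDEAN distance is shown. The text above doesn't say what happens when an identifier appears in only one of the two vectors; the sketch assumes such a dimension counts as 0 in the other vector.

```python
import math
from typing import Dict, List, Tuple, Union

Key = Union[str, int]                 # stands in for the STRING or INT64 identifier
SparseVector = List[Tuple[Key, float]]  # stands in for ARRAY<STRUCT<key, value>>


def to_dict(vector: SparseVector) -> Dict[Key, float]:
    """Index a STRUCT-style vector by identifier, rejecting duplicate identifiers."""
    indexed: Dict[Key, float] = {}
    for key, value in vector:
        if key in indexed:
            # Mirrors the documented rule: duplicate identifiers are an error.
            raise ValueError(f"duplicate identifier: {key!r}")
        indexed[key] = value
    return indexed


def sparse_euclidean(vector1: SparseVector, vector2: SparseVector) -> float:
    """Hypothetical Euclidean distance between two identifier-keyed vectors.

    Values are paired by identifier, not by array position.
    ASSUMPTION: an identifier present in only one vector is treated as 0 in the other.
    """
    a, b = to_dict(vector1), to_dict(vector2)
    keys = a.keys() | b.keys()  # union of identifiers from both vectors
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


if __name__ == "__main__":
    v1 = [("a", 0.0), ("b", 1.0), ("c", 1.0)]
    # Same dimensions in a different order: the distance is 0.0.
    print(sparse_euclidean(v1, [("c", 1.0), ("a", 0.0), ("b", 1.0)]))
```

Reordering the items leaves the distance unchanged, which is the practical consequence of matching by identifier rather than by position.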
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between the vectors. It returns NULL if either vector1 or vector2 is NULL.
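As a rough model of the documented behavior for ARRAY<Numerical type> inputs, the following Python sketch computes the three distance types, returns None (standing in for NULL) when either input is None, rejects arrays of different lengths, and defaults to EUCLIDEAN. The function name ml_distance_sketch is hypothetical. The text above names the distance types but doesn't give their formulas, so the sketch assumes the standard definitions: Euclidean is the square root of the summed squared differences, Manhattan is the sum of absolute differences, and cosine distance is 1 minus the cosine similarity.

```python
import math
from typing import Optional, Sequence


def ml_distance_sketch(
    vector1: Optional[Sequence[float]],
    vector2: Optional[Sequence[float]],
    distance_type: str = "EUCLIDEAN",
) -> Optional[float]:
    """Hypothetical Python stand-in for ML.DISTANCE on ARRAY<Numerical type> inputs."""
    # NULL input (None here) yields NULL output.
    if vector1 is None or vector2 is None:
        return None
    # Dense vectors must have the same array length.
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(vector1, vector2))
    if distance_type == "EUCLIDEAN":
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs))
    if distance_type == "MANHATTAN":
        return float(sum(abs(x - y) for x, y in pairs))
    if distance_type == "COSINE":
        # ASSUMPTION: cosine distance = 1 - cosine similarity.
        dot = sum(x * y for x, y in pairs)
        norm1 = math.sqrt(sum(x * x for x in vector1))
        norm2 = math.sqrt(sum(y * y for y in vector2))
        if norm1 == 0.0 or norm2 == 0.0:
            # Behavior for zero vectors isn't specified above; the sketch rejects them.
            raise ValueError("cosine distance is undefined for a zero vector")
        return 1.0 - dot / (norm1 * norm2)
    raise ValueError(f"unsupported distance type: {distance_type!r}")
```

For instance, ml_distance_sketch([4.1, 0.5, 1.0], [3.0, 0.0, 2.5]) returns approximately 1.926136, in line with the output shown in the Example section.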
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:
Create the table t1:
CREATE TABLE mydataset.t1 (
  v1 ARRAY<FLOAT64>,
  v2 ARRAY<FLOAT64>
);
Populate t1:
INSERT INTO mydataset.t1 (v1, v2)
VALUES ([4.1, 0.5, 1.0], [3.0, 0.0, 2.5]);
Calculate the Euclidean distance between v1 and v2:
SELECT
  v1,
  v2,
  ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output
FROM mydataset.t1;
This query produces the following output:
+---------------+---------------+-------------------+
| v1            | v2            | output            |
+---------------+---------------+-------------------+
| [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
+---------------+---------------+-------------------+
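As a check on this output, the same value follows by hand from the standard Euclidean formula (assumed here, since the formula isn't spelled out above), applied element by element to v1 and v2:

```latex
\begin{aligned}
d(v_1, v_2) &= \sqrt{(4.1 - 3.0)^2 + (0.5 - 0.0)^2 + (1.0 - 2.5)^2} \\
            &= \sqrt{1.21 + 0.25 + 2.25} \\
            &= \sqrt{3.71} \approx 1.926136
\end{aligned}
```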
What's next
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.