The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:

- vector1: an ARRAY value that represents the first vector, in one of the following forms:

  - ARRAY<Numerical type>
  - ARRAY<STRUCT<STRING, Numerical type>>
  - ARRAY<STRUCT<INT64, Numerical type>>

  where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC. For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.

  When a vector is expressed as ARRAY<Numerical type>, each element of the array denotes one dimension of the vector. An example of a four-dimensional vector is [0.0, 1.0, 1.0, 0.0].

  When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item denotes one dimension of the vector. An example of a three-dimensional vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].

  The initial INT64 or STRING value in the STRUCT is used as an identifier to match the STRUCT values in vector2. The ordering of data in the array doesn't matter; the values are matched by the identifier rather than by their position in the array. If either vector has any STRUCT values with duplicate identifiers, running this function returns an error.

- vector2: an ARRAY value that represents the second vector. vector2 must have the same type as vector1.

  For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an ARRAY<STRUCT<STRING, FLOAT64>> column.

  When vector1 and vector2 are ARRAY<Numerical type> columns, they must have the same array length.

- type: a STRING value that specifies the type of distance to calculate. Valid values are EUCLIDEAN, MANHATTAN, and COSINE. If this argument isn't specified, the default value is EUCLIDEAN.
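The following sketch illustrates the type argument and identifier matching described above. It assumes that array literals are accepted in place of table columns. The literal values and the column aliases are made up for illustration.

```sql
-- The same pair of dense vectors under each supported distance type.
-- Omitting the third argument falls back to the default, 'EUCLIDEAN'.
SELECT
  ML.DISTANCE([4.1, 0.5, 1.0], [3.0, 0.0, 2.5]) AS euclidean_default,
  ML.DISTANCE([4.1, 0.5, 1.0], [3.0, 0.0, 2.5], 'MANHATTAN') AS manhattan,
  ML.DISTANCE([4.1, 0.5, 1.0], [3.0, 0.0, 2.5], 'COSINE') AS cosine;

-- STRUCT form: dimensions are paired by identifier, not by position,
-- so both columns are expected to hold the same value.
SELECT
  ML.DISTANCE(
    [('a', 0.0), ('b', 1.0), ('c', 1.0)],
    [('c', 0.5), ('a', 2.0), ('b', 1.0)]) AS matched_by_id,
  ML.DISTANCE([0.0, 1.0, 1.0], [2.0, 1.0, 0.5]) AS dense_equivalent;
```

In the second query, the STRUCT arrays list their dimensions in different orders. Because values are matched by identifier, the result should equal that of the dense vectors written in a, b, c order.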
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between the vectors. Returns NULL if either vector1 or vector2 is NULL.
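A minimal sketch of the NULL behavior, assuming a typed NULL can be passed as a literal argument. The aliases are illustrative only.

```sql
-- A NULL vector on either side is expected to yield NULL.
SELECT
  ML.DISTANCE(CAST(NULL AS ARRAY<FLOAT64>), [1.0, 2.0]) AS null_first,
  ML.DISTANCE([1.0, 2.0], CAST(NULL AS ARRAY<FLOAT64>)) AS null_second;
```

If downstream logic needs a number in every row, one option is to filter out rows with NULL vectors before calling the function.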
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:

1. Create the table t1:

   CREATE TABLE mydataset.t1 ( v1 ARRAY<FLOAT64>, v2 ARRAY<FLOAT64> )

2. Populate t1:

   INSERT mydataset.t1 (v1,v2) VALUES ([4.1,0.5,1.0], [3.0,0.0,2.5])

3. Calculate the Euclidean distance between v1 and v2:

   SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output FROM mydataset.t1
This query produces the following output:
+---------------+---------------+-------------------+
| v1            | v2            | output            |
+---------------+---------------+-------------------+
| [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
+---------------+---------------+-------------------+
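As a cross-check, the same number can be recomputed from the usual definition of Euclidean distance, the square root of the sum of squared per-dimension differences. This sketch hard-codes the values from the example row. The alias manual_euclidean is illustrative.

```sql
-- Recompute the distance by hand for v1 = [4.1, 0.5, 1.0]
-- and v2 = [3.0, 0.0, 2.5]: sqrt(1.1^2 + 0.5^2 + (-1.5)^2) = sqrt(3.71).
SELECT
  SQRT(
    POW(4.1 - 3.0, 2) +
    POW(0.5 - 0.0, 2) +
    POW(1.0 - 2.5, 2)) AS manual_euclidean;
```

The result should agree with the output column above up to floating-point rounding, at roughly 1.9261.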
What's next
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.