In Vertex AI Matching Engine, you can restrict vector matching searches to a subset of the index by using Boolean rules. Boolean predicates tell Matching Engine which vectors in the index to ignore.
Vector attributes
In a vector similarity search over a database of vectors, each vector is described by zero or more attributes (or tokens) from each of several attribute categories (or namespaces).
In the following example application, vectors are tagged with a color and a shape:
- color and shape are namespaces.
- red and blue are tokens from the color namespace.
- square and circle are tokens from the shape namespace.
Specify vector attributes
The following code examples identify vector attributes in the example application:
- To specify a "red circle": {color: red}, {shape: circle}.
- To specify a "red and blue square": {color: red, blue}, {shape: square}.
- To specify an object with no color, omit the "color" namespace in the restricts field.
For information about the schema used to specify this data, see Specify namespaces and tokens in the input data.
Queries
- Queries express an AND logical operator across namespaces and an OR logical operator within each namespace. A query that specifies {color: red, blue}, {shape: square, circle} matches all database points that satisfy (red || blue) && (square || circle).
- A query that specifies {color: red} matches all red objects of any kind, with no restriction on shape.
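The AND-across-namespaces, OR-within-a-namespace rule can be sketched in Python. This is only an illustrative model of the rule described above, not Matching Engine's implementation. The function name `matches` and the dictionary representation of queries and datapoints are assumptions made for this example.

```python
from typing import Dict, Set

# Hypothetical representation: namespace name -> set of tokens.
Tokens = Dict[str, Set[str]]


def matches(query: Tokens, datapoint: Tokens) -> bool:
    """Return True if the datapoint satisfies every namespace in the query."""
    for namespace, wanted in query.items():
        if not wanted:
            # A namespace with no tokens places no restriction.
            continue
        # OR within a namespace: one shared token is enough.
        if not (wanted & datapoint.get(namespace, set())):
            return False
    # AND across namespaces: every restricted namespace matched.
    return True


red_square = {"color": {"red"}, "shape": {"square"}}
green_circle = {"color": {"green"}, "shape": {"circle"}}
query = {"color": {"red", "blue"}, "shape": {"square", "circle"}}

print(matches(query, red_square))               # True
print(matches(query, green_circle))             # False
print(matches({"color": {"red"}}, red_square))  # True
```

The last call shows that a query naming only the color namespace leaves shape unrestricted.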
Denylist
To enable more advanced scenarios, Google supports a form of negation known as denylist tokens. When a query denylists a token, matches are excluded for any datapoint that has the denylisted token. If a query namespace has only denylisted tokens, all points that are not explicitly denylisted match, in exactly the same way that an empty namespace matches all points.
Datapoints can also denylist a token, excluding matches with any query specifying that token.
For example, define the following data points with the specified tokens:
A: {}                  // empty set matches everything
B: {red}               // only a 'red' token
C: {blue}              // only a 'blue' token
D: {orange}            // only an 'orange' token
E: {red, blue}         // multiple tokens
F: {red, !blue}        // deny the 'blue' token
G: {red, blue, !blue}  // a weird edge-case
H: {!blue}             // deny-only (similar to empty-set)
The system behaves as follows:
- Empty query namespaces are match-all wildcards. For example, Q:{} matches DB:{color:red}.
- Empty datapoint namespaces are not match-all wildcards. For example, Q:{color:red} doesn't match DB:{}.
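The denylist rules for a single namespace can be modeled with a short Python sketch. It is an interpretation of the rules stated in this section, not the service's actual logic. The `Restrict` type and the helpers `r`, `namespace_matches`, and `hits` are hypothetical names introduced for illustration. Point G is left out because the text does not state how that edge case behaves.

```python
from typing import FrozenSet, List, NamedTuple


class Restrict(NamedTuple):
    """Tokens for one namespace (hypothetical helper type)."""
    allow: FrozenSet[str] = frozenset()
    deny: FrozenSet[str] = frozenset()


def r(allow=(), deny=()) -> Restrict:
    """Build a Restrict from plain iterables of tokens."""
    return Restrict(frozenset(allow), frozenset(deny))


def namespace_matches(query: Restrict, point: Restrict) -> bool:
    # A token denied by the query excludes any datapoint that carries it.
    if query.deny & point.allow:
        return False
    # A token denied by the datapoint excludes any query that specifies it.
    if point.deny & query.allow:
        return False
    # Empty or deny-only query namespace: acts as a wildcard.
    if not query.allow:
        return True
    # Otherwise at least one allowed token must be shared.
    return bool(query.allow & point.allow)


points = {
    "A": r(),
    "B": r(["red"]),
    "C": r(["blue"]),
    "D": r(["orange"]),
    "E": r(["red", "blue"]),
    "F": r(["red"], ["blue"]),
    "H": r(deny=["blue"]),
}


def hits(query: Restrict) -> List[str]:
    """Names of the sample points that this model matches to the query."""
    return [name for name, p in points.items() if namespace_matches(query, p)]


print(hits(r(["red"])))        # ['B', 'E', 'F']
print(hits(r(["blue"])))       # ['C', 'E']
print(hits(r(deny=["blue"])))  # ['A', 'B', 'D', 'F', 'H']
```

In this model the empty datapoint A never matches a query that names a token, which follows the rule that empty datapoint namespaces are not wildcards.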
Specify namespaces and tokens in the input data
For information about how to structure your input data overall, see Input data format and structure.
The following tabs show how to specify the namespaces and tokens associated with each input vector.
JSON
For each vector's record, add a field called restricts, to contain an array of objects, each of which is a namespace.
- Each object must have a field named namespace. This field is the TokenNamespace.namespace namespace.
- The value of the field allow, if present, is an array of strings. This array of strings is the TokenNamespace.string_tokens list.
- The value of the field deny, if present, is an array of strings. This array of strings is the TokenNamespace.string_denylist_tokens list.
The following are two example records in JSON format:
{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]}, {"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]}, {"namespace": "category", "allow": ["canine"]}]}
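Records in this shape can be generated programmatically. The following Python sketch uses only the standard library. The helper name `make_record` and its parameters are hypothetical, and leaving out the restricts field when a datapoint has no tokens is an assumption of this example rather than a documented requirement.

```python
import json
from typing import Dict, List, Optional, Sequence


def make_record(
    datapoint_id: str,
    embedding: Sequence[float],
    allow: Optional[Dict[str, List[str]]] = None,
    deny: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Serialize one datapoint as a single JSON line (hypothetical helper)."""
    allow = allow or {}
    deny = deny or {}
    restricts = []
    # One object per namespace, in first-seen order.
    for namespace in list(allow) + [n for n in deny if n not in allow]:
        entry = {"namespace": namespace}
        if allow.get(namespace):
            entry["allow"] = list(allow[namespace])
        if deny.get(namespace):
            entry["deny"] = list(deny[namespace])
        restricts.append(entry)
    record = {"id": datapoint_id, "embedding": list(embedding)}
    if restricts:
        # Assumption: the field is left out when there are no tokens at all.
        record["restricts"] = restricts
    return json.dumps(record)


print(make_record(
    "42",
    [0.5, 1.0],
    allow={"class": ["cat", "pet"], "category": ["feline"]},
))
```

The call above produces a record equivalent to the first example. Passing a `deny` mapping adds a deny array to the matching namespace object.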
Avro
Avro records use the following schema:
{
  "type": "record",
  "name": "FeatureVector",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "embedding",
      "type": {
        "type": "array",
        "items": "float"
      }
    },
    {
      "name": "restricts",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "Restrict",
            "fields": [
              {
                "name": "namespace",
                "type": "string"
              },
              {
                "name": "allow",
                "type": [
                  "null",
                  {
                    "type": "array",
                    "items": "string"
                  }
                ]
              },
              {
                "name": "deny",
                "type": [
                  "null",
                  {
                    "type": "array",
                    "items": "string"
                  }
                ]
              }
            ]
          }
        }
      ]
    },
    {
      "name": "crowding_tag",
      "type": [
        "null",
        "string"
      ]
    }
  ]
}