Filter vector matches

In Vertex AI Matching Engine, you can restrict vector matching searches to a subset of the index by using Boolean rules. Boolean predicates tell Matching Engine which vectors in the index to ignore.

Vector attributes

In a vector similarity search over a database of vectors, each vector is described by zero or more attributes (or tokens) from each of several attribute categories (or namespaces).

In the following example application, vectors are tagged with a color and a shape:

  • color and shape are namespaces.
  • red and blue are tokens from the color namespace.
  • square and circle are tokens from the shape namespace.

Specify vector attributes

The following examples show how to specify vector attributes in the example application:

  • To specify a "red circle": {color: red}, {shape: circle}.
  • To specify a "red and blue square": {color: red, blue}, {shape: square}.
  • To specify an object with no color, omit the "color" namespace in the restricts field.
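As an illustration, the three examples above can be held in memory as a mapping from namespace to a set of tokens and converted into the restricts layout described under Specify namespaces and tokens in the input data. This is a minimal sketch: the `Attributes` alias and the `to_restricts` helper are hypothetical names, not part of any Matching Engine client library.

```python
from typing import Dict, List, Set

# Hypothetical in-memory model: namespace -> set of tokens.
Attributes = Dict[str, Set[str]]

red_circle: Attributes = {"color": {"red"}, "shape": {"circle"}}
red_and_blue_square: Attributes = {"color": {"red", "blue"}, "shape": {"square"}}
# An object with no color: the "color" namespace is simply omitted.
no_color_square: Attributes = {"shape": {"square"}}


def to_restricts(attrs: Attributes) -> List[dict]:
    """Build a restricts array: one object per namespace, tokens under "allow"."""
    return [
        {"namespace": ns, "allow": sorted(tokens)}
        for ns, tokens in sorted(attrs.items())
    ]


print(to_restricts(red_and_blue_square))
# [{'namespace': 'color', 'allow': ['blue', 'red']}, {'namespace': 'shape', 'allow': ['square']}]
```

Sorting namespaces and tokens is only there to make the output deterministic; the order carries no meaning for matching.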

For information about the schema used to specify this data, see Specify namespaces and tokens in the input data.

Queries

  • Queries express an AND logical operator across namespaces and an OR logical operator within each namespace. A query that specifies {color: red, blue}, {shape: square, circle} matches all database points that satisfy (red || blue) && (square || circle).
  • A query that specifies {color: red} matches all red objects of any kind, with no restriction on shape.
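The AND-across, OR-within rule can be written as a short predicate. The following Python function is a sketch of the semantics described above, not Matching Engine's implementation; the name `matches` and the dict-of-sets representation are assumptions made for illustration. Denylist tokens are not handled here.

```python
from typing import Dict, Set

Tokens = Dict[str, Set[str]]  # namespace -> tokens


def matches(query: Tokens, datapoint: Tokens) -> bool:
    """AND across the query's namespaces, OR within each namespace."""
    for namespace, wanted in query.items():
        if not wanted:
            # An empty query namespace places no restriction.
            continue
        # OR: at least one requested token must be on the datapoint.
        # A datapoint without this namespace has no tokens here, so it fails.
        if not wanted & datapoint.get(namespace, set()):
            return False
    return True


query = {"color": {"red", "blue"}, "shape": {"square", "circle"}}
print(matches(query, {"color": {"blue"}, "shape": {"circle"}}))    # True
print(matches(query, {"color": {"orange"}, "shape": {"square"}}))  # False
print(matches({"color": {"red"}}, {"color": {"red"}, "shape": {"circle"}}))  # True
```

Because a namespace that the query doesn't mention is never inspected, {color: red} places no restriction on shape.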

Denylist

To enable more advanced scenarios, Matching Engine supports a form of negation known as denylist tokens. When a query denylists a token, any datapoint that has the denylisted token is excluded from the matches. If a query namespace contains only denylisted tokens, every point that isn't explicitly denylisted matches, exactly as an empty namespace matches all points.

Datapoints can also denylist a token, excluding matches with any query specifying that token.

For example, define the following datapoints with the specified tokens:

{}                  // empty set matches everything
{red}               // only a 'red' token
{blue}              // only a 'blue' token
{orange}            // only an 'orange' token
{red, blue}         // multiple tokens
{red, !blue}        // deny the 'blue' token
{red, blue, !blue}  // a weird edge-case
{!blue}             // deny-only (similar to empty-set)

The system behaves as follows:

  • Empty query namespaces are match-all wildcards. For example, Q:{} matches DB:{color:red}.
  • Empty datapoint namespaces are not match-all wildcards. For example, Q:{color:red} doesn't match DB:{}.

    [Figure: Query and database points.]
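The denylist rules can be modeled by extending the per-namespace check with two exclusion steps. This is a minimal sketch based only on the rules stated in this section, not Matching Engine's implementation. The names `Namespace`, `parse`, `namespace_matches`, and `matches` are hypothetical, the "!token" notation follows the example list above, and edge cases such as {red, blue, !blue} may not resolve the same way the service resolves them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Namespace:
    """Tokens for one namespace: allowed tokens and denylisted (!) tokens."""
    allow: FrozenSet[str] = frozenset()
    deny: FrozenSet[str] = frozenset()


def parse(*tokens: str) -> Namespace:
    """Parse the notation used above: "red" allows a token, "!blue" denies it."""
    allow = frozenset(t for t in tokens if not t.startswith("!"))
    deny = frozenset(t[1:] for t in tokens if t.startswith("!"))
    return Namespace(allow, deny)


def namespace_matches(q: Namespace, d: Namespace) -> bool:
    # A token denylisted by the query excludes datapoints that have it.
    if q.deny & d.allow:
        return False
    # A token denylisted by the datapoint excludes queries that specify it.
    if d.deny & q.allow:
        return False
    # No allowed tokens in the query namespace: behaves like a wildcard.
    if not q.allow:
        return True
    # Otherwise at least one allowed token must be shared (OR).
    return bool(q.allow & d.allow)


def matches(query: Dict[str, Namespace], point: Dict[str, Namespace]) -> bool:
    """AND across every namespace that the query mentions."""
    return all(
        namespace_matches(q, point.get(name, Namespace()))
        for name, q in query.items()
    )


print(namespace_matches(parse("!blue"), parse("red")))          # True
print(namespace_matches(parse("!blue"), parse("red", "blue")))  # False
print(namespace_matches(parse("blue"), parse("red", "!blue")))  # False
print(namespace_matches(parse("red"), parse()))                  # False
print(namespace_matches(parse(), parse("red")))                  # True
```

The last two calls reproduce the two behaviors listed above: an empty query namespace matches everything, while an empty datapoint namespace does not satisfy a query that asks for a token.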

Specify namespaces and tokens in the input data

For information about how to structure your input data overall, see Input data format and structure.

The following sections show how to specify the namespaces and tokens associated with each input vector.

JSON

  • For each vector's record, add a field named restricts that contains an array of objects, one object per namespace.

    • Each object must have a field named namespace. The value of this field is the TokenNamespace.namespace value.
    • The optional field allow, if present, is an array of strings. This array is the TokenNamespace.string_tokens list.
    • The optional field deny, if present, is an array of strings. This array is the TokenNamespace.string_denylist_tokens list.
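Before uploading, it can help to check that each restricts array has the shape described above. The following Python helper is a hypothetical sketch that checks only these structural rules (a string namespace, and optional allow and deny arrays of strings); it is not an official validator, and the service may enforce additional constraints.

```python
from typing import Any, List


def validate_restricts(restricts: Any) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    if not isinstance(restricts, list):
        return ["restricts must be an array"]
    problems: List[str] = []
    for i, entry in enumerate(restricts):
        if not isinstance(entry, dict):
            problems.append(f"restricts[{i}] must be an object")
            continue
        if not isinstance(entry.get("namespace"), str):
            problems.append(f"restricts[{i}] needs a string 'namespace'")
        # 'allow' and 'deny' are optional, but must be string arrays if present.
        for key in ("allow", "deny"):
            if key in entry:
                value = entry[key]
                if not (isinstance(value, list) and all(isinstance(t, str) for t in value)):
                    problems.append(f"restricts[{i}].{key} must be an array of strings")
    return problems


print(validate_restricts([{"namespace": "class", "allow": ["cat", "pet"]}]))  # []
print(validate_restricts([{"allow": ["cat"]}]))
```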

The following are two example records in JSON format:

{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]}, {"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]}, {"namespace": "category", "allow": ["canine"]}]}
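Records like these can be generated rather than written by hand. The following Python function is a minimal sketch using only the standard `json` module; the name `make_record` and its parameters are hypothetical. It emits one record per call as a single line and keeps namespaces in the order they are first given.

```python
import json
from typing import Dict, List, Optional


def make_record(
    record_id: str,
    embedding: List[float],
    allow: Dict[str, List[str]],
    deny: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Serialize one datapoint as a single JSON line with a restricts array."""
    deny = deny or {}
    restricts = []
    # Preserve first-seen order of namespaces across allow and deny.
    for ns in dict.fromkeys([*allow, *deny]):
        entry: Dict[str, object] = {"namespace": ns}
        if ns in allow:
            entry["allow"] = allow[ns]
        if ns in deny:
            entry["deny"] = deny[ns]
        restricts.append(entry)
    record = {"id": record_id, "embedding": embedding, "restricts": restricts}
    # json.dumps without indent keeps the whole record on one line.
    return json.dumps(record)


print(make_record("42", [0.5, 1.0], {"class": ["cat", "pet"], "category": ["feline"]}))
print(make_record("43", [0.6, 1.0], {"class": ["dog", "pet"], "category": ["canine"]}))
```

The two calls print records equivalent to the examples above (differing only in whitespace between objects).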

Avro

Avro records use the following schema:

{
  "type": "record",
  "name": "FeatureVector",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "embedding",
     "type": {"type": "array", "items": "float"}},
    {"name": "restricts",
     "type": [
       "null",
       {"type": "array",
        "items": {
          "type": "record",
          "name": "Restrict",
          "fields": [
            {"name": "namespace", "type": "string"},
            {"name": "allow", "type": ["null", {"type": "array", "items": "string"}]},
            {"name": "deny", "type": ["null", {"type": "array", "items": "string"}]}
          ]}}]},
    {"name": "crowding_tag", "type": ["null", "string"]}
  ]
}
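Because the nullable fields in this schema declare no defaults, a conservative approach is to set every field explicitly, using None for the "null" branch of each union. The following Python function is a hypothetical sketch that shapes a datapoint to match the FeatureVector schema; it doesn't write Avro itself. Writing the file requires an Avro library (for example, fastavro's `parse_schema` and `writer`), which isn't shown here.

```python
from typing import Any, Dict, List, Optional


def to_avro_datum(
    record_id: str,
    embedding: List[float],
    restricts: Optional[List[Dict[str, Any]]] = None,
    crowding_tag: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a dict with every field that the FeatureVector schema declares.

    Absent optional values become None, which corresponds to the "null"
    branch of the schema's unions.
    """
    normalized = None
    if restricts is not None:
        normalized = [
            {
                "namespace": r["namespace"],
                "allow": r.get("allow"),  # None if the namespace has no allow list
                "deny": r.get("deny"),    # None if the namespace has no deny list
            }
            for r in restricts
        ]
    return {
        "id": record_id,
        "embedding": [float(x) for x in embedding],  # schema items are "float"
        "restricts": normalized,
        "crowding_tag": crowding_tag,
    }


print(to_avro_datum("42", [0.5, 1.0], [{"namespace": "class", "allow": ["cat", "pet"]}]))
```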

What's next