Query indexes to get nearest neighbors

After you've created and deployed the index, you can run queries against it to find the nearest neighbors of a query vector.

Each DeployedIndex has a DEPLOYED_INDEX_SERVER_IP, which you can retrieve by listing IndexEndpoints. To query a DeployedIndex, connect to its DEPLOYED_INDEX_SERVER_IP at port 10000 and call the Match or BatchMatch method.

The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server. In the first example, you send a single query by using the Match method:

./grpc_cli call ${DEPLOYED_INDEX_SERVER_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match "deployed_index_id: \"${DEPLOYED_INDEX_ID}\", float_val: [-0.1,..]"

In the second example, you combine two separate queries into the same BatchMatch request.

./grpc_cli call ${DEPLOYED_INDEX_SERVER_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch "requests: [{deployed_index_id: \"${DEPLOYED_INDEX_ID}\", requests: [{deployed_index_id: \"${DEPLOYED_INDEX_ID}\", float_val: [-0.1,..]}, {deployed_index_id: \"${DEPLOYED_INDEX_ID}\", float_val: [-0.2,..]}]}]"
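The query vectors in these commands are abbreviated with "..", but a real request must list every dimension of the vector. As a convenience, the following Python sketch formats full query vectors into the protobuf text-format requests that the grpc_cli examples use. The helper names (format_match_request, format_batch_match_request, grpc_cli_command) are hypothetical and not part of any Google library; only the service name, port, and field names come from the examples above.

```python
import shlex
from typing import Sequence


def format_match_request(deployed_index_id: str, query: Sequence[float]) -> str:
    """Render one MatchRequest in protobuf text format (hypothetical helper)."""
    values = ", ".join(repr(float(x)) for x in query)
    return f'deployed_index_id: "{deployed_index_id}", float_val: [{values}]'


def format_batch_match_request(deployed_index_id: str,
                               queries: Sequence[Sequence[float]]) -> str:
    """Group several queries against one deployed index into a BatchMatch request."""
    inner = ", ".join("{" + format_match_request(deployed_index_id, q) + "}"
                      for q in queries)
    return (f'requests: [{{deployed_index_id: "{deployed_index_id}", '
            f'requests: [{inner}]}}]')


def grpc_cli_command(server_ip: str, method: str, request: str) -> str:
    """Build a shell-quoted grpc_cli call to a MatchService method on port 10000."""
    service = "google.cloud.aiplatform.container.v1.MatchService"
    return " ".join(["./grpc_cli", "call", f"{server_ip}:10000",
                     f"{service}.{method}", shlex.quote(request)])
```

For example, grpc_cli_command(server_ip, "BatchMatch", format_batch_match_request(index_id, [q1, q2])) produces a command with the same nesting as the BatchMatch example, where server_ip, index_id, q1, and q2 are your own values. shlex.quote is used so that the double quotes around the index ID survive the shell.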

You must make calls to these APIs from a client running in the same VPC that the service was peered with.

To run these queries, you can also use the Python Cloud Client Library for Vertex AI. To learn more, see Client libraries explained.

Tuning the index

Tuning the index requires setting the configuration parameters that impact the performance of deployed indexes, especially recall and latency. These parameters are set when you first create the index. You can use brute force indexes to measure recall.

Configuration parameters that impact recall and latency

  1. distanceMeasureType

    The following values are supported:

    • SQUARED_L2_DISTANCE: Squared Euclidean (L2) distance
    • L1_DISTANCE: Manhattan L1 distance
    • COSINE_DISTANCE: Cosine distance defined as '1 - cosine similarity'
    • DOT_PRODUCT_DISTANCE: Dot product distance, defined as the negative of the dot product. This is the default value.

    In most cases, the embedding vectors used for similarity matching are computed by metric learning models (such as Siamese networks or two-tower models). These models use a distance metric to compute the contrastive loss function. Ideally, the distanceMeasureType of the matching index matches the distance measure used by the model that produced the embedding vectors.

  2. approximateNeighborsCount

    The default number of neighbors to find by using approximate search before exact reordering is performed. Exact reordering is a procedure in which the results returned by an approximate search algorithm are reordered by using a more expensive distance computation. Increasing this value can increase recall, but it can also cause a proportionate increase in latency.

  3. treeAhConfig.leafNodesToSearchPercent

    The percentage of leaves to search for each query. Increasing this value can increase recall, but it can also cause a proportionate increase in latency. The default value is 10, which means that 10% of the leaves are searched.

  4. treeAhConfig.leafNodeEmbeddingCount

    The number of embeddings for each leaf node. By default, this number is set to 1000.

    This parameter doesn't correlate linearly with recall: increasing or decreasing the value of treeAhConfig.leafNodeEmbeddingCount doesn't always increase or decrease recall, so experiment to find the optimal value. Changing this parameter generally has less effect than changing the other parameters.
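To build intuition for how these parameters interact, the following Python sketch implements the four distance measures and a deliberately simplified partitioned search. It searches only the closest leaf_nodes_to_search_percent of the leaves, shortlists approximate_neighbors_count candidates by using a cheap approximate score, and then performs exact reordering on that shortlist. This is not the Tree-AH algorithm that Vertex AI uses: the rounding-based quantize function is an assumption standing in for Tree-AH's compressed scoring, the leaves come from caller-supplied centroids, and every function name here is hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


# The four distanceMeasureType values; for each, a smaller value means "closer".
def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l1(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    norms = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    # Sketch convention: treat a zero vector as orthogonal to everything.
    return 1.0 if norms == 0 else 1.0 - dot(a, b) / norms


def dot_product_distance(a: Vector, b: Vector) -> float:
    return -dot(a, b)


def quantize(v: Vector, step: float = 0.25) -> List[float]:
    # Crude scalar quantization standing in for Tree-AH's compressed scoring.
    return [round(x / step) * step for x in v]


def build_leaves(vectors: Sequence[Vector], centroids: Sequence[Vector],
                 distance: Distance) -> List[List[int]]:
    """Assign each vector ID to the leaf whose centroid is closest."""
    leaves: List[List[int]] = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        best = min(range(len(centroids)), key=lambda c: distance(v, centroids[c]))
        leaves[best].append(vid)
    return leaves


def search(query: Vector, vectors: Sequence[Vector], centroids: Sequence[Vector],
           leaves: List[List[int]], distance: Distance, k: int = 10,
           approximate_neighbors_count: int = 150,
           leaf_nodes_to_search_percent: float = 10) -> List[int]:
    # 1. Search only the closest leaf_nodes_to_search_percent of leaves (at least one).
    n_leaves = max(1, math.ceil(len(leaves) * leaf_nodes_to_search_percent / 100))
    nearest = sorted(range(len(centroids)), key=lambda c: distance(query, centroids[c]))
    candidates = [vid for c in nearest[:n_leaves] for vid in leaves[c]]
    # 2. Score candidates approximately and keep approximate_neighbors_count of them.
    shortlist = sorted(candidates,
                       key=lambda vid: distance(query, quantize(vectors[vid])))
    shortlist = shortlist[:approximate_neighbors_count]
    # 3. Exact reordering: rescore the shortlist with full-precision vectors.
    return sorted(shortlist, key=lambda vid: distance(query, vectors[vid]))[:k]
```

In this toy model, a query that sits near a leaf boundary can miss true neighbors that live in an unsearched leaf. Raising leaf_nodes_to_search_percent or approximate_neighbors_count recovers them at the cost of scoring more vectors, which mirrors the recall and latency trade-off described above.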

Using a brute force index to measure recall

To get the exact nearest neighbors, use indexes with the brute force algorithm. The brute force algorithm provides 100% recall at the expense of higher latency. A brute force index is usually not a good choice for production serving, but it is useful as a ground truth when you evaluate the recall of various indexing options offline.
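One common way to quantify recall is recall@k: for each query, the fraction of the brute force index's top-k neighbors that the approximate index also returns in its own top k, averaged over a set of queries. The following Python sketch assumes that you have already collected neighbor IDs from both indexes for the same queries (for example, by sending identical Match requests to each deployed index); recall_at_k is a hypothetical helper, not part of the Vertex AI API.

```python
from typing import Hashable, Sequence


def recall_at_k(approximate: Sequence[Sequence[Hashable]],
                exact: Sequence[Sequence[Hashable]], k: int) -> float:
    """Mean fraction of each query's exact top-k neighbors that the
    approximate index also returned in its own top k."""
    if len(approximate) != len(exact) or not exact:
        raise ValueError("expected one result list per query from each index")
    total = 0.0
    for approx_ids, exact_ids in zip(approximate, exact):
        # The brute force results are treated as ground truth.
        truth = set(exact_ids[:k])
        if not truth:
            raise ValueError("brute force results must contain at least one neighbor")
        total += len(truth.intersection(approx_ids[:k])) / len(truth)
    return total / len(exact)
```

Comparing recall_at_k across indexes built with different approximateNeighborsCount or leafNodesToSearchPercent values, together with their measured latencies, gives you the data to pick a configuration.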

To create an index with the brute force algorithm, specify an empty bruteForceConfig (brute_force_config) under algorithmConfig in the index metadata:

curl -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer `gcloud auth print-access-token`" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/indexes \
-d '{
    displayName: "'${DISPLAY_NAME}'",
    description: "'${DESCRIPTION}'",
    metadata: {
       contentsDeltaUri: "'${INPUT_DIR}'",
       config: {
          dimensions: 100,
          approximateNeighborsCount: 150,
          distanceMeasureType: "DOT_PRODUCT_DISTANCE",
          featureNormType: "UNIT_L2_NORM",
          algorithmConfig: {
             bruteForceConfig: {}