Query PSA or PSC indexes to get nearest neighbors

After you create a private services access (PSA) or Private Service Connect (PSC) index, you can query it to get the nearest neighbors of a vector or data point.

About querying PSC indexes

To query a PSC index, send requests to the compute address that you created for it. In the examples that follow, replace TARGET_IP with that compute address.

About querying PSA indexes

Each DeployedIndex has a TARGET_IP, which you can retrieve by listing IndexEndpoints.
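As a rough sketch of that lookup, the snippet below pulls the gRPC address of each deployed index out of an already-fetched list of IndexEndpoints. It assumes the JSON shape of the REST representation, where each entry in deployedIndexes carries a privateEndpoints.matchGrpcAddress field; verify the field names against the API reference. The helper name grpc_targets and the sample data (including the IP address) are made up for illustration.

```python
def grpc_targets(index_endpoints):
    """Map each deployed index ID to its gRPC address (the TARGET_IP).

    `index_endpoints` is a list of dicts shaped like the assumed REST
    representation of IndexEndpoint. Deployed indexes that have no
    private endpoint address are skipped.
    """
    targets = {}
    for endpoint in index_endpoints:
        for deployed in endpoint.get("deployedIndexes", []):
            address = deployed.get("privateEndpoints", {}).get("matchGrpcAddress")
            if address:
                targets[deployed["id"]] = address
    return targets


# Hypothetical sample response; the names and the IP are placeholders.
sample = [
    {
        "name": "projects/123/locations/us-central1/indexEndpoints/456",
        "deployedIndexes": [
            {"id": "my_index", "privateEndpoints": {"matchGrpcAddress": "10.0.0.5"}},
            {"id": "not_ready"},
        ],
    }
]

targets = grpc_targets(sample)
print(targets)
```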

Query an index

To query a DeployedIndex, connect to its TARGET_IP at port 10000 and call the Match or BatchMatch method. You can also query by DOC_ID, the ID of a data point that is already in the index, instead of supplying a vector.

The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server.


In the first example, you send a single query using the Match method.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]'
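In the command above the request payload is single-quoted, so a POSIX shell does not expand ${DEPLOYED_INDEX_ID} inside it; you have to substitute the value yourself. One way to avoid quoting mistakes is to assemble the argument list in Python. This is a minimal sketch: build_match_command is a hypothetical helper, and the IP address, index ID, and two-dimensional vector are placeholders rather than real values.

```python
import shlex


def build_match_command(target_ip, deployed_index_id, query_vector, port=10000):
    """Return the grpc_cli argument list for a single Match call."""
    floats = ", ".join(repr(float(x)) for x in query_vector)
    # Text-format request body, mirroring the fields used in the example above.
    payload = f'deployed_index_id: "{deployed_index_id}", float_val: [{floats}]'
    return [
        "./grpc_cli",
        "call",
        f"{target_ip}:{port}",
        "google.cloud.aiplatform.container.v1.MatchService.Match",
        payload,
    ]


# Placeholder values for illustration only.
cmd = build_match_command("10.0.0.5", "my_index", [-0.1, 0.2])

# shlex.join quotes the payload so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing cmd as a list to subprocess.run would skip the shell entirely, so no extra quoting is needed in that case.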

In the second example, you combine two separate queries into the same BatchMatch request.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch 'requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]}, {deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.2,..]}]}]'
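The nesting in the BatchMatch payload is easy to get wrong by hand: the outer requests list holds one entry per deployed index, and each entry holds its own requests list of individual queries. The sketch below builds that string for a single deployed index. The helper names and sample vectors are hypothetical; only the field layout is taken from the command above.

```python
def match_request(deployed_index_id, query_vector):
    """Render one query as a braced text-format message."""
    floats = ", ".join(repr(float(x)) for x in query_vector)
    return f'{{deployed_index_id: "{deployed_index_id}", float_val: [{floats}]}}'


def batch_match_payload(deployed_index_id, query_vectors):
    """Wrap several queries for one deployed index in a BatchMatch body."""
    inner = ", ".join(match_request(deployed_index_id, v) for v in query_vectors)
    return (
        f'requests: [{{deployed_index_id: "{deployed_index_id}", '
        f"requests: [{inner}]}}]"
    )


# Two placeholder queries combined into one request body.
payload = batch_match_payload("my_index", [[-0.1, 0.3], [-0.2, 0.4]])
print(payload)
```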

You must make calls to these APIs from a client running in the same [VPC that the service was peered with](#vpc-network-peering-setup).

To run a query using a DOC_ID, use the following example.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "test_index1", embedding_id: "606431"'

In this example, you send a query using token and numeric restricts.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [1, 1], "sparse_embedding": {"values": [111.0,111.1,111.2], "dimensions": [10,20,30]}, numeric_restricts: [{name: "double-ns", value_double: 0.3, op: LESS_EQUAL}, {name: "double-ns", value_double: -1.2, op: GREATER}, {name: "double-ns", value_double: 0., op: NOT_EQUAL}], restricts: [{name: "color", allow_tokens: ["red"]}]'
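In the sparse_embedding part of that request, values and dimensions are parallel lists: the i-th value belongs to the i-th dimension. If your sparse vector lives in a dictionary, a small helper can keep the two lists aligned. This is only a sketch: sparse_embedding_field is a hypothetical name, the field names are written unquoted like the other fields in the command, and sorting by dimension is done here purely to get a stable output, not because the service is known to require it.

```python
def sparse_embedding_field(weights):
    """Render {dimension: value} as a sparse_embedding text-format field."""
    if not weights:
        raise ValueError("sparse embedding needs at least one dimension")
    dims = sorted(weights)
    # Build values in the same order as dims so the lists stay parallel.
    values = [float(weights[d]) for d in dims]
    dims_txt = ", ".join(str(d) for d in dims)
    vals_txt = ", ".join(repr(v) for v in values)
    return f"sparse_embedding: {{values: [{vals_txt}], dimensions: [{dims_txt}]}}"


# Same numbers as the example above, supplied in arbitrary order.
field = sparse_embedding_field({30: 111.2, 10: 111.0, 20: 111.1})
print(field)
```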

To learn more, see Client libraries explained.


Use these instructions to query a VPC index from the console.

  1. In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section, and then select Vector Search.

  2. Select the VPC index you want to query. The Index info page opens.
  3. Scroll down to the Deployed indexes section and select the deployed index you want to query. The Deployed index info page opens.
  4. From the Query index section, select your query parameters. You can choose to query by a vector, or a specific data point.
  5. Execute the query using the open source tool grpc_cli, or by using the Vertex AI SDK for Python.

Query-time settings that impact performance

The following query-time parameters can affect latency, availability, and cost when using Vector Search. This guidance applies to most cases. However, always experiment with your configurations to make sure that they work for your use case.

For parameter definitions, see Index configuration parameters.

Parameter About Performance impact

approximateNeighborsCount Tells the algorithm the number of approximate results to retrieve from each shard.

The value of approximateNeighborsCount should always be greater than the value of setNeighborCount. If the value of setNeighborCount is small, set approximateNeighborsCount to about 10 times that value. For larger setNeighborCount values, you can use a smaller multiplier.

Increasing the value of approximateNeighborsCount can affect performance in the following ways:

  • Recall: Increased
  • Latency: Potentially increased
  • Availability: No impact
  • Cost: Can increase because more data is processed during a search

Decreasing the value of approximateNeighborsCount can affect performance in the following ways:

  • Recall: Decreased
  • Latency: Potentially decreased
  • Availability: No impact
  • Cost: Can decrease because less data is processed during a search

setNeighborCount Specifies the number of results that you want the query to return.

Values less than or equal to 300 remain performant in most use cases. For larger values, test for your specific use case.

fractionLeafNodesToSearch Controls the fraction of leaf nodes to visit when searching for nearest neighbors. It interacts with leafNodeEmbeddingCount: the more embeddings each leaf node holds, the more data is examined for every leaf that is visited.

Increasing the value of fractionLeafNodesToSearch can affect performance in the following ways:

  • Recall: Increased
  • Latency: Increased
  • Availability: No impact
  • Cost: Can increase because higher latency occupies more machine resources

Decreasing the value of fractionLeafNodesToSearch can affect performance in the following ways:

  • Recall: Decreased
  • Latency: Decreased
  • Availability: No impact
  • Cost: Can decrease because lower latency occupies fewer machine resources
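To make the guidance in this table concrete, the sketch below turns it into two small calculations: a starting value for approximateNeighborsCount given setNeighborCount, and a back-of-the-envelope count of leaf nodes visited for a given fractionLeafNodesToSearch. The cutoff of 50 for a "small" neighbor count and the multiplier of 3 for larger ones are illustrative assumptions, not documented values, and the leaf estimate assumes every leaf holds exactly leafNodeEmbeddingCount embeddings. Treat the results as starting points for experiments.

```python
import math


def suggest_approximate_neighbors(set_neighbor_count,
                                  small_threshold=50,   # assumed cutoff
                                  large_multiplier=3):  # assumed multiplier
    """Pick a starting approximateNeighborsCount above setNeighborCount."""
    if set_neighbor_count < 1:
        raise ValueError("set_neighbor_count must be at least 1")
    # 10x for small counts (per the table); a smaller multiplier otherwise.
    multiplier = 10 if set_neighbor_count <= small_threshold else large_multiplier
    return set_neighbor_count * multiplier


def leaves_visited(num_datapoints, leaf_node_embedding_count, fraction):
    """Roughly estimate how many leaf nodes one query visits."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    total_leaves = math.ceil(num_datapoints / leaf_node_embedding_count)
    return math.ceil(total_leaves * fraction)


print(suggest_approximate_neighbors(10))
print(suggest_approximate_neighbors(200))
print(leaves_visited(1_000_000, 1000, 0.25))
```

Doubling the fraction doubles the estimated number of leaves visited, which is why latency and cost move in the same direction as recall for this parameter.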

What's next