Update and rebuild an active index


When you serve search queries at scale, keeping your indexes up to date is important for returning accurate results. You can update your Vertex AI Matching Engine indexes with a batch update, which lets you insert and delete data points on a batch schedule.

Update index content with Batch Updates

To update the content of an existing Index, use the IndexService.UpdateIndex method.

To replace the content of an existing Index:

  • Set Index.metadata.contentsDeltaUri to the Cloud Storage URI that includes the vectors you want to update.
  • Set isCompleteOverwrite to true.

Note: If you set the contentsDeltaUri field when calling IndexService.UpdateIndex, no other index fields (such as displayName, description, or userLabels) can be updated in the same call.

gcloud

  1. Update your index metadata file (a sample file is shown after the replacement list below).
  2. Use the gcloud ai indexes update command:
gcloud ai indexes update INDEX_ID \
  --metadata-file=LOCAL_PATH_TO_METADATA_FILE \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ID: The ID of the index.
  • LOCAL_PATH_TO_METADATA_FILE: The local file path to the metadata file.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.
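
For reference, a minimal metadata file for a content-only update might look like the following sketch (an assumption, not the full schema: the file mirrors the Index.metadata fields described above, and the Cloud Storage path is a placeholder):

{
  "contentsDeltaUri": "gs://YOUR_BUCKET/path/to/updated/vectors/",
  "isCompleteOverwrite": true
}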

REST & CMD LINE

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • PROJECT: Your project ID.
  • INDEX_ID: The ID of the index.
  • INPUT_DIR: The Cloud Storage directory path of the index content.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

PATCH https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexes/INDEX_ID

Request JSON body:

{
  "metadata": {
    "contentsDeltaUri": "INPUT_DIR",
    "isCompleteOverwrite": true
  }
}

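For example, save the request body in a file named request.json, and then send the request with curl:

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexes/INDEX_ID"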

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.UpdateIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-12T23:56:14.480948Z",
      "updateTime": "2022-01-12T23:56:14.480948Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
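
For example, you can check the operation status with a GET request on the operation name returned in the response:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID/operations/OPERATION_ID"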

If the Index has any associated deployments (see the Index.deployed_indexes field), then when certain changes are made to the original Index, the DeployedIndex is automatically updated asynchronously in the background to reflect those changes.

To check whether the change has been propagated, compare the update index operation finish time and the DeployedIndex.index_sync_time.
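
For example (a sketch, assuming the index is deployed to an index endpoint with the ID INDEX_ENDPOINT_ID), you can read the sync time from the endpoint's deployed indexes:

gcloud ai index-endpoints describe INDEX_ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION
# Check deployedIndexes[].indexSyncTime in the output.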

Update an index using Streaming Updates

Previously, Vertex AI Matching Engine supported only Batch Updates for processing updates and refreshing deployments. With Streaming Updates, you can update your index and query the new data within a few seconds. During the Preview launch period, using Streaming Updates does not incur a cost. After the general availability launch, you are charged $0.45 per GB used for Streaming Updates. To learn more about Vertex AI Matching Engine pricing, see the Vertex AI pricing page.

At this time, you can't enable Streaming Updates on an existing index; you must create a new index. See Create an index for Streaming Update to learn more.
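
As a sketch, the key difference from a batch-update index is setting indexUpdateMethod to STREAM_UPDATE when you create the index (the request below is abbreviated; INDEX_NAME and the config contents are placeholders, and the full metadata schema is described in the create-index guide):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexes" \
  -d '{
    "displayName": "INDEX_NAME",
    "indexUpdateMethod": "STREAM_UPDATE",
    "metadata": {
      "contentsDeltaUri": "INPUT_DIR",
      "config": { "dimensions": 100 }
    }
  }'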

Upsert Datapoints

You can upsert up to 20 data points in one request. If the data point ID exists in the index, the embedding is updated; otherwise, a new embedding is added.
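
The following examples assume shell variables like these (placeholder values; ENDPOINT follows the regional endpoint format shown in the REST section above):

# Placeholder values; substitute your own.
PROJECT_ID=YOUR_PROJECT_ID
REGION=us-central1
ENDPOINT="${REGION}-aiplatform.googleapis.com"
INDEX_ID=YOUR_INDEX_ID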

# IDs of the data points to upsert; set these to your own values.
DATAPOINT_ID_1=
DATAPOINT_ID_2=

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${REGION}/indexes/${INDEX_ID}:upsertDatapoints \
  -d '{"datapoints": [{"datapoint_id": "'${DATAPOINT_ID_1}'", "feature_vector": [...]},
      {"datapoint_id": "'${DATAPOINT_ID_2}'", "feature_vector": [...]}]}'

Upserting with restricts

Restricts tag a data point with per-namespace allow and deny lists, which queries can then use to filter matches. Note that restricts is a repeated field, so it takes a list:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${REGION}/indexes/${INDEX_ID}:upsertDatapoints \
  -d '{
  "datapoints": [
    {
      "datapoint_id": "'${DATAPOINT_ID_1}'",
      "feature_vector": [...],
      "restricts": [{"namespace": "color", "allow_list": ["red"], "deny_list": ["blue"]}]
    }
  ]}'

Upserting with crowding

Crowding tags let a query limit how many of its results share the same crowding_attribute, which increases the diversity of results:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${REGION}/indexes/${INDEX_ID}:upsertDatapoints \
  -d '{
  "datapoints": [
    {
      "datapoint_id": "'${DATAPOINT_ID_1}'",
      "feature_vector": [...],
      "crowding_tag": {"crowding_attribute": "dog"}
    }
  ]}'

Remove datapoints

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${REGION}/indexes/${INDEX_ID}:removeDatapoints \
  -d '{"datapoint_ids": ["'${DATAPOINT_ID_1}'", "'${DATAPOINT_ID_2}'"]}'

Compaction

Streaming Updates are applied directly to the deployed indexes in memory, so they are reflected in query results after a short delay. Periodically, the index is rebuilt with all the updates made since the last rebuild to improve query performance and reliability. These rebuilds are referred to as "compactions". Compactions are transparent to you, but you are billed for the cost of rebuilding the index at the same rate as a batch update, in addition to the Streaming Update costs.

Rebuild and query your index

You can send Match and BatchMatch requests as usual with the gRPC CLI, a client library, or the Python SDK, and you can expect to see your updates within a few seconds. To learn more, see Query an index.