Manage indexes

The following sections describe how to configure, create, list, and delete your indexes.

Index overview

An index is a file or files consisting of your embedding vectors. These vectors are made from large amounts of data you want to deploy and query with Vector Search. With Vector Search, you can create two types of indexes, depending on how you plan to update them with your data. You can create an index designed for batch updates, or an index designed for streaming your updates.

A batch index is for when you want to update your index in a batch, with data which has been stored over a set amount of time, like systems which are processed weekly or monthly. A streaming index is when you want index data to be updated as new data is added to your datastore, for instance, if you have a bookstore and want to show new inventory online as soon as possible. Which type you choose is important, since setup and requirements are different.

Configure index parameters

Before you create an index, configure the parameters for your index.

For example, create a file named index_metadata.json:

{
  "contentsDeltaUri": "gs://BUCKET_NAME/path",
  "config": {
    "dimensions": 100,
    "approximateNeighborsCount": 150,
    "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
    "shardSize": "SHARD_SIZE_MEDIUM",
    "algorithm_config": {
      "treeAhConfig": {
        "leafNodeEmbeddingCount": 5000,
        "fractionLeafNodesToSearch": 0.03
      }
    }
  }
}

You can find the definition for each of these fields in Index configuration parameters.

Create an index

Index size

Index data is split into equal parts called shards for processing. When you create an index, you must specify the size of the shards to use. The supported sizes are as follows:

SHARD_SIZE_SMALL: 2 GiB per shard.
SHARD_SIZE_MEDIUM: 20 GiB per shard.
SHARD_SIZE_LARGE: 50 GiB per shard.

The machine types that you can use to deploy your index (using public endpoints or using VPC endpoints) depends on the shard size of the index. The following table shows the shard sizes that each machine type supports:

Machine type	`SHARD_SIZE_SMALL`	`SHARD_SIZE_MEDIUM`	`SHARD_SIZE_LARGE`
`n1-standard-16`
`n1-standard-32`
`e2-standard-2`	(default)
`e2-standard-16`		(default)
`e2-highmem-16`			(default)
`n2d-standard-32`

To learn how shard size and machine type affect pricing, see the Vertex AI pricing page. To learn how shard size impacts performance, see Configuration parameters that impact performance.

Create an index for batch update

Use these instructions to create and deploy your index. If you don't have your embeddings ready yet, you can skip to Create an empty batch index. With this option, no embeddings data is required at index creation time.

To create an index:

gcloud

Before using any of the command data below, make the following replacements:

LOCAL_PATH_TO_METADATA_FILE: The local path to the metadata file.
INDEX_NAME: Display name for the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud ai indexes create \
    --metadata-file=LOCAL_PATH_TO_METADATA_FILE \
    --display-name=INDEX_NAME \
    --region=LOCATION \
    --project=PROJECT_ID

Windows (PowerShell)

gcloud ai indexes create `
    --metadata-file=LOCAL_PATH_TO_METADATA_FILE `
    --display-name=INDEX_NAME `
    --region=LOCATION `
    --project=PROJECT_ID

Windows (cmd.exe)

gcloud ai indexes create ^
    --metadata-file=LOCAL_PATH_TO_METADATA_FILE ^
    --display-name=INDEX_NAME ^
    --region=LOCATION ^
    --project=PROJECT_ID

You should receive a response similar to the following:

You can poll for the status of the operation for the response
to include "done": true. Use the following example to poll the status.

  $ gcloud ai operations describe 1234567890123456789 --project=my-test-project --region=us-central1

See gcloud ai operations to learn more about the describe command.

REST

Before using any of the request data, make the following replacements:

INPUT_DIR: The Cloud Storage directory path of the index content.
INDEX_NAME: Display name for the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes

Request JSON body:

{
  "display_name": "INDEX_NAME",
  "metadata": {
    "contentsDeltaUri": "INPUT_DIR",
    "config": {
      "dimensions": 100,
      "approximateNeighborsCount": 150,
      "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
      "algorithm_config": {
        "treeAhConfig": {
          "leafNodeEmbeddingCount": 500,
          "leafNodesToSearchPercent": 7
        }
      }
    }
  }
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-08T01:21:10.147035Z",
      "updateTime": "2022-01-08T01:21:10.147035Z"
    }
  }
}

Terraform

The following sample uses the google_vertex_ai_index Terraform resource to create an index for batch updates.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

# Cloud Storage bucket name must be unique
resource "random_id" "bucket_name_suffix" {
  byte_length = 8
}

# Create a Cloud Storage bucket
resource "google_storage_bucket" "bucket" {
  name                        = "vertex-ai-index-bucket-${random_id.bucket_name_suffix.hex}"
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

# Create index content
resource "google_storage_bucket_object" "data" {
  name    = "contents/data.json"
  bucket  = google_storage_bucket.bucket.name
  content = <<EOF
{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]},{"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]},{"namespace": "category", "allow": ["canine"]}]}
EOF
}

resource "google_vertex_ai_index" "default" {
  region       = "us-central1"
  display_name = "sample-index-batch-update"
  description  = "A sample index for batch update"
  labels = {
    foo = "bar"
  }

  metadata {
    contents_delta_uri = "gs://${google_storage_bucket.bucket.name}/contents"
    config {
      dimensions                  = 2
      approximate_neighbors_count = 150
      distance_measure_type       = "DOT_PRODUCT_DISTANCE"
      algorithm_config {
        tree_ah_config {
          leaf_node_embedding_count    = 500
          leaf_nodes_to_search_percent = 7
        }
      }
    }
  }
  index_update_method = "BATCH_UPDATE"

  timeouts {
    create = "2h"
    update = "1h"
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def vector_search_create_index(
    project: str, location: str, display_name: str, gcs_uri: Optional[str] = None
) -> aiplatform.MatchingEngineIndex:
    """Create a vector search index.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        display_name (str): Required. The index display name
        gcs_uri (str): Optional. The Google Cloud Storage uri for index content

    Returns:
        The created MatchingEngineIndex.
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create Index
    index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
        display_name=display_name,
        contents_delta_uri=gcs_uri,
        description="Matching Engine Index",
        dimensions=100,
        approximate_neighbors_count=150,
        leaf_node_embedding_count=500,
        leaf_nodes_to_search_percent=7,
        index_update_method="BATCH_UPDATE",  # Options: STREAM_UPDATE, BATCH_UPDATE
        distance_measure_type=aiplatform.matching_engine.matching_engine_index_config.DistanceMeasureType.DOT_PRODUCT_DISTANCE,
    )

    return index

Console

Use these instructions to create an index for batch updates.

In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search
Go to Vector Search
Click Create new index to open the Index pane. The Create a new index pane appears.
In the Display name field, provide a name to uniquely identify your index.
In the Description field, provide a description for what the index is for.
In the Region field, select a region from the drop-down.
In the Cloud Storage field, search and select the Cloud Storage folder where your vector data is stored.
In the Algorithm type drop-down, select the algorithm type that Vector Search uses for efficient search. If you select the treeAh algorithm, enter the approximate neighbors count.
In the Dimensions field, enter the number of dimensions of your input vectors.
In the Update method field, select Batch.
In the Shard size field, select from the drop-down the shard size you want.
Click Create. Your new index appears in your list of indexes once it's ready. Note: Build time can take up to an hour to complete.

Create an empty batch index

To create and deploy your index right away, you can create an empty batch index. With this option, no embeddings data is required at index creation time.

To create an empty index, the request is almost identical to creating an index for batch updates. The difference is you remove the contentsDeltaUri field, since you aren't linking a data location. Here's an empty batch index example:

Empty index request example

{
  "display_name": INDEX_NAME,
  "indexUpdateMethod": "BATCH_UPDATE",
  "metadata": {
    "config": {
      "dimensions": 100,
      "approximateNeighborsCount": 150,
      "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
      "algorithm_config": {
        "treeAhConfig": {
          "leafNodeEmbeddingCount": 500,
          "leafNodesToSearchPercent": 7
        }
      }
    }
  }
}

Create an index for streaming updates

Use these instructions to create and deploy your streaming index. If you don't have your embeddings ready yet, skip to Create an empty index for streaming updates. With this option, no embeddings data is required at index creation time.

REST

Before using any of the request data, make the following replacements:

INDEX_NAME: Display name for the index.
DESCRIPTION: A description of the index.
INPUT_DIR: The Cloud Storage directory path of the index content.
DIMENSIONS: Number of dimensions of the embedding vector.
PROJECT_ID: Your Google Cloud project ID.
PROJECT_NUMBER: Your project's automatically generated project number.
LOCATION: The region where you are using Vertex AI.

HTTP method and URL:

POST https://ENDPOINT-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes

Request JSON body:

{
  displayName: "INDEX_NAME",
  description: "DESCRIPTION",
  metadata: {
     contentsDeltaUri: "INPUT_DIR",
     config: {
        dimensions: "DIMENSIONS",
        approximateNeighborsCount: 150,
        distanceMeasureType: "DOT_PRODUCT_DISTANCE",
        algorithmConfig: {treeAhConfig: {leafNodeEmbeddingCount: 10000, leafNodesToSearchPercent: 2}}
     },
  },
  indexUpdateMethod: "STREAM_UPDATE"
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://ENDPOINT-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes"

PowerShell (Windows)

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://ENDPOINT-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.ui.CreateIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2023-12-05T23:17:45.416117Z",
      "updateTime": "2023-12-05T23:17:45.416117Z",
      "state": "RUNNING",
      "worksOn": [
        "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID"
      ]
    }
  }
}

Terraform

The following sample uses the google_vertex_ai_index Terraform resource to create an index for streaming updates.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

# Cloud Storage bucket name must be unique
resource "random_id" "default" {
  byte_length = 8
}

# Create a Cloud Storage bucket
resource "google_storage_bucket" "bucket" {
  name                        = "vertex-ai-index-bucket-${random_id.default.hex}"
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

# Create index content
resource "google_storage_bucket_object" "data" {
  name    = "contents/data.json"
  bucket  = google_storage_bucket.bucket.name
  content = <<EOF
{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]},{"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]},{"namespace": "category", "allow": ["canine"]}]}
EOF
}

resource "google_vertex_ai_index" "streaming_index" {
  region       = "us-central1"
  display_name = "sample-index-streaming-update"
  description  = "A sample index for streaming update"
  labels = {
    foo = "bar"
  }

  metadata {
    contents_delta_uri = "gs://${google_storage_bucket.bucket.name}/contents"
    config {
      dimensions                  = 2
      approximate_neighbors_count = 150
      distance_measure_type       = "DOT_PRODUCT_DISTANCE"
      algorithm_config {
        tree_ah_config {
          leaf_node_embedding_count    = 500
          leaf_nodes_to_search_percent = 7
        }
      }
    }
  }
  index_update_method = "STREAM_UPDATE"

  timeouts {
    create = "2h"
    update = "1h"
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def vector_search_create_streaming_index(
    project: str, location: str, display_name: str, gcs_uri: Optional[str] = None
) -> aiplatform.MatchingEngineIndex:
    """Create a vector search index.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        display_name (str): Required. The index display name
        gcs_uri (str): Optional. The Google Cloud Storage uri for index content

    Returns:
        The created MatchingEngineIndex.
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create Index
    index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
        display_name=display_name,
        contents_delta_uri=gcs_uri,
        description="Matching Engine Index",
        dimensions=100,
        approximate_neighbors_count=150,
        leaf_node_embedding_count=500,
        leaf_nodes_to_search_percent=7,
        index_update_method="STREAM_UPDATE",  # Options: STREAM_UPDATE, BATCH_UPDATE
        distance_measure_type=aiplatform.matching_engine.matching_engine_index_config.DistanceMeasureType.DOT_PRODUCT_DISTANCE,
    )

    return index

Console

Use these instructions to create an index for streaming updates in the Google Cloud console.

To create an index available for Streaming Updates requires similar steps to setting up a Batch Update index, except you need to set indexUpdateMethod to STREAM_UPDATE.

In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search
Go to Vector Search
Click Create new index to open the Index pane. The Create a new index pane appears.
In the Display name field, provide a name to uniquely identify your index.
In the Description field, provide a description for what the index is for.
In the Region field, select a region from the drop-down.
In the Cloud Storage field, search and select the Cloud Storage folder where your vector data is stored.
In the Algorithm type drop-down, select the algorithm type that Vector Search will use to perform your search. If you select the treeAh algorithm, enter the approximate neighbors count.
In the Dimensions field, enter the number of dimensions of your input vectors.
In the Update method field, select Stream.
In the Shard size field, select from the drop-down the shard size you want.
Click Create. Your new index appears in your list of indexes once it's ready. Note: Build time can take up to an hour to complete.

Create an empty index for streaming updates

To create and deploy your index right away, you can create an empty index for streaming. With this option, no embeddings data is required at index creation time.

To create an empty index, the request is almost identical to creating an index for streaming. The difference is you remove the contentsDeltaUri field, since you aren't linking a data location. Here's an empty streaming index example:

Empty index request example

{
  "display_name": INDEX_NAME,
  "indexUpdateMethod": "STREAM_UPDATE",
  "metadata": {
    "config": {
      "dimensions": 100,
      "approximateNeighborsCount": 150,
      "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
      "algorithm_config": {
        "treeAhConfig": {
          "leafNodeEmbeddingCount": 500,
          "leafNodesToSearchPercent": 7
        }
      }
    }
  }
}

List indexes

gcloud

Before using any of the command data below, make the following replacements:

INDEX_NAME: Display name for the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud ai indexes list \
    --region=LOCATION \
    --project=PROJECT_ID

Windows (PowerShell)

gcloud ai indexes list `
    --region=LOCATION `
    --project=PROJECT_ID

Windows (cmd.exe)

gcloud ai indexes list ^
    --region=LOCATION ^
    --project=PROJECT_ID

You should receive a response similar to the following:

You can poll for the status of the operation for the response
to include "done": true. Use the following example to poll the status.

  $ gcloud ai operations describe 1234567890123456789 --project=my-test-project --region=us-central1

See gcloud ai operations to learn more about the describe command.

REST

Before using any of the request data, make the following replacements:

INDEX_NAME: Display name for the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/indexes" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
 "indexes": [
   {
     "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID",
     "displayName": "INDEX_NAME",
     "metadataSchemaUri": "gs://google-cloud-aiplatform/schema/matchingengine/metadata/nearest_neighbor_search_1.0.0.yaml",
     "metadata": {
       "config": {
         "dimensions": 100,
         "approximateNeighborsCount": 150,
         "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
         "featureNormType": "NONE",
         "algorithmConfig": {
           "treeAhConfig": {
             "maxLeavesToSearch": 50,
             "leafNodeCount": 10000
           }
         }
       }
     },
     "etag": "AMEw9yNU8YX5IvwuINeBkVv3yNa7VGKk11GBQ8GkfRoVvO7LgRUeOo0qobYWuU9DiEc=",
     "createTime": "2020-11-08T21:56:30.558449Z",
     "updateTime": "2020-11-08T22:39:25.048623Z"
   }
 ]
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def vector_search_list_index(
    project: str, location: str
) -> List[aiplatform.MatchingEngineIndex]:
    """List vector search indexes.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name

    Returns:
        List of aiplatform.MatchingEngineIndex
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # List Indexes
    return aiplatform.MatchingEngineIndex.list()

Console

Use these instructions to view a list of your indexes.

In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search
Go to Vector Search
A list of your active indexes is displayed.

Tuning the index

Tuning the index requires setting the configuration parameters that impact the performance of deployed indexes, especially recall and latency. These parameters are set when you first create the index. You can use brute-force indexes to measure recall.

Configuration parameters that impact performance

The following configuration parameters can be set at index creation time and can affect recall, latency, availability, and cost when using Vector Search. This guidance applies to most cases. However, always experiment with your configurations to make sure that they work for your use case.

For parameter definitions, see Index configuration parameters.

Parameter About Performance impact

Parameter	About	Performance impact
`shardSize`	Controls the amount of data on each machine. When choosing a shard size, estimate how large your dataset will be in the future. If the size of your dataset has an upper bound, pick the appropriate shard size to accommodate it. If there is no upper bound or if your use case is extremely sensitive to latency variability, choosing a large shard size is recommended.	If you configure for a larger number of smaller shards, a larger number of candidate results are processed during search. More shards can affect performance in the following ways: Recall: Increased Latency: Potentially increased, more variability Availability: Shard outages affect a smaller percentage of data Cost: Can increase if the same machine type is used with more shards If you configure for a smaller number of larger shards, fewer candidate results are processed during search. Fewer shards can affect performance in the following ways: Recall: Decreased Latency: Reduced, less variability Availability: Shard outages affect a larger percentage of data Cost: Can decrease if the same machine type is used with fewer shards
`distanceMeasureType`	Determines the algorithm used for distance calculation between data points and the query vector.	The following `distanceMeasureType` settings can help reduce query latency: `DOT_PRODUCT_DISTANCE` is most optimized for reducing latency `DOT_PRODUCT_DISTANCE` combined with setting `FeatureNormType` to `UNIT_L2_NORM` is recommended for cosine similarity
`leafNodeEmbeddingCount`	The number of embeddings for each leaf node. By default, this number is set to 1000. Generally, changing the value of `leafNodeEmbeddingCount` has less effect than changing the value of other parameters.	Increasing the number of embeddings for each leaf node can reduce latency but reduce recall quality. It can affect performance in the following ways: Recall: Decreased due to less targeted search Latency: Reduced, as long as the value is not >15k for most use cases Availability: No impact Cost: Can decrease because fewer replicas are needed for the same QPS Decreasing the number of embeddings for each leaf node can affect performance in the following ways: Recall: Can increase because more targeted leafs are collected Latency: Increased Availability: No impact Cost: Can increase because more replicas are needed for the same QPS

shardSize

Controls the amount of data on each machine.

When choosing a shard size, estimate how large your dataset will be in the future. If the size of your dataset has an upper bound, pick the appropriate shard size to accommodate it. If there is no upper bound or if your use case is extremely sensitive to latency variability, choosing a large shard size is recommended.

If you configure for a larger number of smaller shards, a larger number of candidate results are processed during search. More shards can affect performance in the following ways:

Recall: Increased
Latency: Potentially increased, more variability
Availability: Shard outages affect a smaller percentage of data
Cost: Can increase if the same machine type is used with more shards

If you configure for a smaller number of larger shards, fewer candidate results are processed during search. Fewer shards can affect performance in the following ways:

Recall: Decreased
Latency: Reduced, less variability
Availability: Shard outages affect a larger percentage of data
Cost: Can decrease if the same machine type is used with fewer shards

distanceMeasureType

Determines the algorithm used for distance calculation between data points and the query vector.

The following distanceMeasureType settings can help reduce query latency:

DOT_PRODUCT_DISTANCE is most optimized for reducing latency
DOT_PRODUCT_DISTANCE combined with setting FeatureNormType to UNIT_L2_NORM is recommended for cosine similarity

leafNodeEmbeddingCount

The number of embeddings for each leaf node. By default, this number is set to 1000.

Generally, changing the value of leafNodeEmbeddingCount has less effect than changing the value of other parameters.

Increasing the number of embeddings for each leaf node can reduce latency but reduce recall quality. It can affect performance in the following ways:

Recall: Decreased due to less targeted search
Latency: Reduced, as long as the value is not >15k for most use cases
Availability: No impact
Cost: Can decrease because fewer replicas are needed for the same QPS

Decreasing the number of embeddings for each leaf node can affect performance in the following ways:

Recall: Can increase because more targeted leafs are collected
Latency: Increased
Availability: No impact
Cost: Can increase because more replicas are needed for the same QPS

Using a brute-force index to measure recall

To get the exact nearest neighbors, use indexes with the brute-force algorithm. The brute-force algorithm provides 100% recall at the expense of higher latency. Using a brute-force index to measure recall is usually not a good choice for production serving, but you might find it useful for evaluating the recall of various indexing options offline.

To create an index with the brute-force algorithm, specify brute_force_config in the index metadata:

curl -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer `gcloud auth print-access-token`" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/indexes \
-d '{
    displayName: "'${DISPLAY_NAME}'",
    description: "'${DESCRIPTION}'",
    metadata: {
       contentsDeltaUri: "'${INPUT_DIR}'",
       config: {
          dimensions: 100,
          approximateNeighborsCount: 150,
          distanceMeasureType: "DOT_PRODUCT_DISTANCE",
          featureNormType: "UNIT_L2_NORM",
          algorithmConfig: {
             bruteForceConfig: {}
          }
       },
    },
}'

Delete an index

gcloud

Before using any of the command data below, make the following replacements:

INDEX_ID: The ID of the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud ai indexes delete INDEX_ID \
    --region=LOCATION \
    --project=PROJECT_ID

Windows (PowerShell)

gcloud ai indexes delete INDEX_ID `
    --region=LOCATION `
    --project=PROJECT_ID

Windows (cmd.exe)

gcloud ai indexes delete INDEX_ID ^
    --region=LOCATION ^
    --project=PROJECT_ID

REST

Before using any of the request data, make the following replacements:

INDEX_ID: The ID of the index.
LOCATION: The region where you are using Vertex AI.
PROJECT_ID: Your Google Cloud project ID.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

DELETE https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X DELETE \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method DELETE `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeleteOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-08T02:35:56.364956Z",
      "updateTime": "2022-01-08T02:35:56.364956Z"
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def vector_search_delete_index(
    project: str, location: str, index_name: str
) -> None:
    """Delete a vector search index.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        index_name (str): Required. The index to update. A fully-qualified index
          resource name or a index ID.  Example:
          "projects/123/locations/us-central1/indexes/my_index_id" or
          "my_index_id".
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create the index instance from an existing index
    index = aiplatform.MatchingEngineIndex(index_name=index_name)

    # Delete the index
    index.delete()

Console

Use these instructions to delete one or more indexes.

In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search
Go to Vector Search
A list of your active indexes is displayed.
To delete an index, go to the options menu that is in the same row as the index and select Delete.

What's next

Learn about Index configuration parameters
Learn how to Deploy and manage public index endpoints
Learn how to Deploy and manage index endpoints in a VPC network
Learn how to Update and rebuild your index
Learn how to Monitor an index