Deploy and manage index endpoints

Deploying an index includes the following three tasks:

  1. Create an IndexEndpoint if needed, or reuse an existing IndexEndpoint.
  2. Get the IndexEndpoint ID.
  3. Deploy the index to the IndexEndpoint.
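
For step 2, one option is to read the ID out of a resource name that the API already returned. The following is a minimal sketch, assuming resource names follow the projects/.../locations/.../indexEndpoints/... pattern shown in the sample responses on this page. The helper name index_endpoint_id is hypothetical and is not part of any Google library.

```python
import re

# Assumed name layout, taken from the sample responses on this page:
# projects/<project>/locations/<location>/indexEndpoints/<id>[/operations/<op>]
_ENDPOINT_RE = re.compile(
    r"projects/[^/]+/locations/[^/]+/indexEndpoints/([^/]+)"
)


def index_endpoint_id(resource_name: str) -> str:
    """Return the IndexEndpoint ID embedded in a resource or operation name."""
    match = _ENDPOINT_RE.search(resource_name)
    if match is None:
        raise ValueError(f"no indexEndpoints segment in {resource_name!r}")
    return match.group(1)
```

The same helper works on an IndexEndpoint name and on the name of an operation that belongs to that endpoint.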

Create an IndexEndpoint within your VPC network

If you are deploying an Index to an existing IndexEndpoint, you can skip this step.

Before an index can serve online vector-matching queries, you must deploy it to an IndexEndpoint in a VPC network that is connected through VPC Network Peering. The first step is to create the IndexEndpoint. You can deploy more than one index to the same IndexEndpoint, and all of them share its VPC network.

gcloud CLI

The following example uses the gcloud ai index-endpoints create command:

gcloud ai index-endpoints create \
  --display-name=INDEX_ENDPOINT_NAME \
  --network=VPC_NETWORK_NAME \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_NAME: Display name of the index endpoint.
  • VPC_NETWORK_NAME: The Google Compute Engine network name to which the index endpoint should be peered.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

The Google Cloud CLI tool might take a few minutes to create the IndexEndpoint.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_NAME: Display name of the index endpoint.
  • VPC_NETWORK_NAME: The Google Compute Engine network name to which the index endpoint should be peered.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints

Request JSON body:

{
  "display_name": "INDEX_ENDPOINT_NAME",
  "network": "VPC_NETWORK_NAME"
}
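
If you prefer to build this request in a script, the following sketch assembles the same URL and JSON body with the Python standard library. It only constructs the request and does not send it. The function name build_create_request and the access_token parameter are assumptions for illustration; one way to obtain a token is the gcloud auth print-access-token command.

```python
import json
import urllib.request


def build_create_request(project: str, location: str, display_name: str,
                         network: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the indexEndpoints create request shown above."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/indexEndpoints"
    )
    # Same two fields as the request JSON body above.
    body = json.dumps({"display_name": display_name, "network": network})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

To send the request, you could pass the returned object to urllib.request.urlopen and parse the response body as JSON.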

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:09:56.641107Z",
      "updateTime": "2022-01-13T04:09:56.641107Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
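
The polling step can be sketched as a small loop. This example assumes that you supply a get_operation callable that fetches the operation (for example, with an authenticated GET on the operation name) and returns the parsed JSON as a dictionary. The function name, the polling interval, and the attempt limit are illustrative choices, not values from the API.

```python
import time
from typing import Callable


def wait_for_operation(get_operation: Callable[[], dict],
                       interval_s: float = 5.0,
                       max_attempts: int = 60) -> dict:
    """Call get_operation() until the returned operation reports "done": true."""
    for _ in range(max_attempts):
        operation = get_operation()
        if operation.get("done"):
            # A finished long-running operation carries either "response" or "error".
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        time.sleep(interval_s)
    raise TimeoutError(f"operation not done after {max_attempts} polls")
```

Injecting the fetch function keeps the loop independent of how you authenticate, and makes it easy to exercise with canned responses.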

Deploy an index

gcloud CLI

This example uses the gcloud ai index-endpoints deploy-index command:

gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
  --deployed-index-id=DEPLOYED_INDEX_ID \
  --display-name=DEPLOYED_INDEX_NAME \
  --index=INDEX_ID \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • DEPLOYED_INDEX_NAME: Display name of the deployed index.
  • INDEX_ID: The ID of the index.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.
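
You can check a candidate DEPLOYED_INDEX_ID locally before you call the API. The following sketch encodes only the rule stated above (starts with a letter, then letters, numbers, or underscores). The DeployedIndex.id reference might impose further limits, such as a maximum length, that this hypothetical helper doesn't check.

```python
import re

# A letter first, then any number of letters, digits, or underscores.
_DEPLOYED_INDEX_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_deployed_index_id(value: str) -> bool:
    """Return True if value satisfies the naming rule described above."""
    return _DEPLOYED_INDEX_ID_RE.fullmatch(value) is not None
```

For example, my_index_1 passes, while 1index (starts with a digit) and my-index (contains a hyphen) are rejected.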

REST

Before using any of the request data, make the following replacements:

  • LOCATION: The region where you are using Vertex AI.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • DEPLOYED_INDEX_NAME: Display name of the deployed index.
  • INDEX_ID: The ID of the index.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:deployIndex

Request JSON body:

{
  "deployedIndex": {
    "id": "DEPLOYED_INDEX_ID",
    "index": "projects/PROJECT/locations/LOCATION/indexes/INDEX_ID",
    "displayName": "DEPLOYED_INDEX_NAME"
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    },
    "deployedIndexId": "DEPLOYED_INDEX_ID"
  }
}

You can poll for the status of the operation until the response includes "done": true.

Enable autoscaling

Matching Engine supports autoscaling, which automatically resizes the number of nodes based on the demands of your workloads. When demand is high, nodes are added to the node pool, up to the maximum size that you designate. When demand is low, the node pool scales back down to the minimum size that you designate. To see how many nodes are in use and how that number changes, monitor the current replicas.

To enable autoscaling, specify the maxReplicaCount and minReplicaCount when you deploy your index:

gcloud CLI

The following example uses the gcloud ai index-endpoints deploy-index command:

gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
  --deployed-index-id=DEPLOYED_INDEX_ID \
  --display-name=DEPLOYED_INDEX_NAME \
  --index=INDEX_ID \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • DEPLOYED_INDEX_NAME: Display name of the deployed index.
  • INDEX_ID: The ID of the index.
  • MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be 1 or greater.
  • MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: The region where you are using Vertex AI.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • DEPLOYED_INDEX_NAME: Display name of the deployed index.
  • INDEX_ID: The ID of the index.
  • MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be 1 or greater.
  • MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:deployIndex

Request JSON body:

{
  "deployedIndex": {
    "id": "DEPLOYED_INDEX_ID",
    "index": "projects/PROJECT/locations/LOCATION/indexes/INDEX_ID",
    "displayName": "DEPLOYED_INDEX_NAME",
    "automaticResources": {
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT
    }
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    },
    "deployedIndexId": "DEPLOYED_INDEX_ID"
  }
}

You can poll for the status of the operation until the response includes "done": true.

The following defaults apply when replica counts are omitted:

  • If neither minReplicaCount nor maxReplicaCount is set, both default to 2.
  • If only maxReplicaCount is set, minReplicaCount defaults to 2.
  • If only minReplicaCount is set, maxReplicaCount is set equal to minReplicaCount.
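
The three default rules can be expressed as one small function. This is a sketch of the documented behavior for planning purposes only; the service applies the defaults itself, and the function name resolve_replica_counts is hypothetical. The check that an explicit minimum is at least 1 comes from the MIN_REPLICA_COUNT description on this page.

```python
from typing import Optional, Tuple

DEFAULT_REPLICAS = 2  # default stated in the rules above


def resolve_replica_counts(min_count: Optional[int] = None,
                           max_count: Optional[int] = None) -> Tuple[int, int]:
    """Return the effective (minReplicaCount, maxReplicaCount) pair."""
    if min_count is not None and min_count < 1:
        raise ValueError("minReplicaCount must be 1 or greater")
    if min_count is None and max_count is None:
        return DEFAULT_REPLICAS, DEFAULT_REPLICAS  # neither value set
    if min_count is None:
        return DEFAULT_REPLICAS, max_count         # only the maximum set
    if max_count is None:
        return min_count, min_count                # only the minimum set
    return min_count, max_count                    # both set explicitly
```

Note that setting only minReplicaCount pins the deployment to a fixed size, because the maximum follows the minimum.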

Mutate a DeployedIndex

You can use the MutateDeployedIndex API to update the deployment resources (for example, minReplicaCount and maxReplicaCount) of an index that is already deployed.

  • You can't change the machineType after the index is deployed.
  • If maxReplicaCount is not specified in the request, the DeployedIndex will keep using the existing maxReplicaCount.

gcloud CLI

The following example uses the gcloud ai index-endpoints mutate-deployed-index command:

gcloud ai index-endpoints mutate-deployed-index INDEX_ENDPOINT_ID \
  --deployed-index-id=DEPLOYED_INDEX_ID \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be 1 or greater.
  • MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: The region where you are using Vertex AI.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be 1 or greater.
  • MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:mutateDeployedIndex

Request JSON body:

{
  "id": "DEPLOYED_INDEX_ID",
  "automaticResources": {
    "minReplicaCount": MIN_REPLICA_COUNT,
    "maxReplicaCount": MAX_REPLICA_COUNT
  }
}
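
Because an omitted maxReplicaCount keeps the existing value, a script that builds this body can leave the field out when you only want to change the minimum. The following sketch shows one way to do that; the function name mutate_deployed_index_body is an assumption for illustration.

```python
import json
from typing import Optional


def mutate_deployed_index_body(deployed_index_id: str,
                               min_replica_count: int,
                               max_replica_count: Optional[int] = None) -> str:
    """Build the mutateDeployedIndex JSON body shown above."""
    resources = {"minReplicaCount": min_replica_count}
    # Leave maxReplicaCount out so the DeployedIndex keeps its current maximum.
    if max_replica_count is not None:
        resources["maxReplicaCount"] = max_replica_count
    return json.dumps({"id": deployed_index_id, "automaticResources": resources})
```

The body doesn't include machineType, which can't be changed after deployment.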

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.MutateDeployedIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-24T20:36:37.902782Z",
      "updateTime": "2022-01-24T20:36:37.902782Z"
    },
    "deployedIndexId": "gcloud_deployed_index_2"
  }
}

You can poll for the status of the operation until the response includes "done": true.

List IndexEndpoints

To list your IndexEndpoint resources and view the information of any associated DeployedIndex instances, run the following code:

gcloud CLI

The following example uses the gcloud ai index-endpoints list command:

gcloud ai index-endpoints list \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • PROJECT: Your project ID.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints

You should receive a JSON response similar to the following:

{
  "indexEndpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID",
      "displayName": "INDEX_ENDPOINT_NAME",
      "deployedIndexes": [
        {
          "id": "DEPLOYED_INDEX_ID",
          "index": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID",
          "displayName": "DEPLOYED_INDEX_NAME",
          "createTime": "2021-06-04T02:23:40.178286Z",
          "privateEndpoints": {
            "matchGrpcAddress": "GRPC_ADDRESS"
          },
          "indexSyncTime": "2022-01-13T04:22:00.151916Z",
          "automaticResources": {
            "minReplicaCount": 2,
            "maxReplicaCount": 10
          }
        }
      ],
      "etag": "AMEw9yP367UitPkLo-khZ1OQvqIK8Q0vLAzZVF7QjdZ5O3l7Zow-mzBo2l6xmiuuMljV",
      "createTime": "2021-03-17T04:47:28.460373Z",
      "updateTime": "2021-06-04T02:23:40.930513Z",
      "network": "VPC_NETWORK_NAME"
    }
  ]
}

For more information, see the reference documentation for IndexEndpoint.
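
To pull the deployment details out of a list response, you can walk the nested structure shown above. The following sketch assumes the field names from the sample response (indexEndpoints, deployedIndexes, privateEndpoints.matchGrpcAddress, automaticResources) and treats every nested field as optional. The function name summarize_deployed_indexes is hypothetical.

```python
def summarize_deployed_indexes(list_response: dict) -> list:
    """Flatten a list response into one row per DeployedIndex."""
    rows = []
    for endpoint in list_response.get("indexEndpoints", []):
        for deployed in endpoint.get("deployedIndexes", []):
            resources = deployed.get("automaticResources", {})
            private = deployed.get("privateEndpoints", {})
            rows.append({
                "endpoint": endpoint["name"],
                "deployed_index_id": deployed["id"],
                "grpc_address": private.get("matchGrpcAddress"),
                "min_replicas": resources.get("minReplicaCount"),
                "max_replicas": resources.get("maxReplicaCount"),
            })
    return rows
```

An endpoint with no deployed indexes contributes no rows.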

Undeploy an index

To undeploy an index, run the following code:

gcloud CLI

The following example uses the gcloud ai index-endpoints undeploy-index command:

gcloud ai index-endpoints undeploy-index INDEX_ENDPOINT_ID \
  --deployed-index-id=DEPLOYED_INDEX_ID \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:undeployIndex

Request JSON body:

{
  "deployed_index_id": "DEPLOYED_INDEX_ID"
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.UndeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:09:56.641107Z",
      "updateTime": "2022-01-13T04:09:56.641107Z"
    }
  }
}

Delete an IndexEndpoint

Before you delete an IndexEndpoint, you must undeploy all the indexes associated with it.
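
The required order (undeploy every index, then delete the endpoint) can be sketched as a function that plans the gcloud commands without running them. It assumes that you pass in one IndexEndpoint object as returned by the list call, and it uses only the undeploy-index and delete commands and flags shown on this page. The function name teardown_commands is hypothetical.

```python
from typing import List


def teardown_commands(endpoint_id: str, index_endpoint: dict,
                      project: str, region: str) -> List[List[str]]:
    """Return gcloud argument lists: one undeploy per index, then the delete."""
    common = [f"--project={project}", f"--region={region}"]
    commands = [
        ["gcloud", "ai", "index-endpoints", "undeploy-index", endpoint_id,
         f"--deployed-index-id={deployed['id']}", *common]
        for deployed in index_endpoint.get("deployedIndexes", [])
    ]
    # The delete command goes last, after every index has been undeployed.
    commands.append(
        ["gcloud", "ai", "index-endpoints", "delete", endpoint_id, *common]
    )
    return commands
```

You could pass each argument list to subprocess.run, waiting for every undeploy operation to finish before you run the delete.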

gcloud CLI

The following example uses the gcloud ai index-endpoints delete command:

gcloud ai index-endpoints delete INDEX_ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION

Replace the following:

  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • PROJECT_ID: The ID of the project.
  • LOCATION: The region where you are using Vertex AI.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • PROJECT: Your project ID.
  • INDEX_ENDPOINT_ID: The ID of the index endpoint.
  • PROJECT_NUMBER: The project number for your project.

HTTP method and URL:

DELETE https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeleteOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:36:19.142203Z",
      "updateTime": "2022-01-13T04:36:19.142203Z"
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}