Deploying an index includes the following three tasks:
- Create an IndexEndpoint if needed, or reuse an existing IndexEndpoint.
- Get the IndexEndpoint ID.
- Deploy the index to the IndexEndpoint.
Create an IndexEndpoint within your VPC network
If you are deploying an Index to an existing IndexEndpoint, you can skip this step.
Before you can use an index to serve online vector matching queries, you must deploy the Index to an IndexEndpoint within your VPC network that uses VPC Network Peering. The first step is to create an IndexEndpoint. You can deploy more than one index to an IndexEndpoint that shares the same VPC network.
gcloud CLI
The following example uses the gcloud ai index-endpoints create command:
gcloud ai index-endpoints create \
--display-name=INDEX_ENDPOINT_NAME \
--network=VPC_NETWORK_NAME \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_NAME: Display name of the index endpoint.
- VPC_NETWORK_NAME: The Google Compute Engine network name to which the index endpoint should be peered.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
The Google Cloud CLI might take a few minutes to create the IndexEndpoint.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Your region.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_NAME: Display name of the index endpoint.
- VPC_NETWORK_NAME: The Google Compute Engine network name to which the index endpoint should be peered.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints
Request JSON body:
{
  "display_name": "INDEX_ENDPOINT_NAME",
  "network": "VPC_NETWORK_NAME"
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:09:56.641107Z",
      "updateTime": "2022-01-13T04:09:56.641107Z"
    }
  }
}
You can poll the status of the operation until the response includes "done": true.
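As a rough illustration of the REST call described above, the following Python sketch builds the create request with the standard library but does not send it. The function name, the placeholder values, and the use of an OAuth 2.0 bearer token in the Authorization header are assumptions made for this example and are not taken from this page.

```python
import json
import urllib.request


def build_create_index_endpoint_request(project, location, display_name,
                                        network, access_token):
    """Build (but do not send) the POST request that creates an IndexEndpoint.

    Hypothetical helper: the URL and body mirror the REST example above.
    Passing the token as a bearer token is an assumption of this sketch.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/indexEndpoints"
    )
    body = {"display_name": display_name, "network": network}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_create_index_endpoint_request(
    project="my-project",
    location="us-central1",
    display_name="my-index-endpoint",
    network="my-vpc-network",
    access_token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, you could pass the returned object to urllib.request.urlopen and parse the JSON response, which describes a long-running operation like the one shown above.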
Deploy an index
gcloud CLI
The following example uses the gcloud ai index-endpoints deploy-index command:
gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
--deployed-index-id=DEPLOYED_INDEX_ID \
--display-name=DEPLOYED_INDEX_NAME \
--index=INDEX_ID \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- DEPLOYED_INDEX_NAME: Display name of the deployed index.
- INDEX_ID: The ID of the index.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
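The DEPLOYED_INDEX_ID rule in the list above can be checked locally before you call the API. The following is a minimal sketch that assumes "letters" means ASCII letters; the helper name is hypothetical, and any further constraints documented under DeployedIndex.id (such as a length limit) are not checked.

```python
import re

# Pattern derived from the stated rule: a leading letter, followed only by
# letters, digits, or underscores. Assumes ASCII letters; additional limits
# documented under DeployedIndex.id are NOT enforced here.
_DEPLOYED_INDEX_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_deployed_index_id(candidate: str) -> bool:
    """Return True if the candidate satisfies the rule stated above."""
    return _DEPLOYED_INDEX_ID_PATTERN.fullmatch(candidate) is not None


for candidate in ("my_index_1", "1index", "my-index"):
    print(candidate, is_valid_deployed_index_id(candidate))
```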
REST
Before using any of the request data, make the following replacements:
- LOCATION: The region where you are using Vertex AI.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- DEPLOYED_INDEX_NAME: Display name of the deployed index.
- INDEX_ID: The ID of the index.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:deployIndex
Request JSON body:
{
  "deployedIndex": {
    "id": "DEPLOYED_INDEX_ID",
    "index": "projects/PROJECT/locations/LOCATION/indexes/INDEX_ID",
    "displayName": "DEPLOYED_INDEX_NAME"
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    },
    "deployedIndexId": "DEPLOYED_INDEX_ID"
  }
}
You can poll the status of the operation until the response includes "done": true.
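The deploy call returns a long-running operation, and "done": true marks its completion. The sketch below shows one way to inspect such a response in Python. Both helper names are hypothetical, and the status URL pattern (the service endpoint followed by /v1/ and the operation name) is an assumption of this example; it does not appear on this page.

```python
def is_operation_done(operation: dict) -> bool:
    """Return True only when the operation response carries "done": true."""
    return operation.get("done", False) is True


def operation_status_url(location: str, operation: dict) -> str:
    """Build a URL for checking the operation's status.

    Assumption: the operation can be fetched with a GET request on the
    regional endpoint using the operation's resource name.
    """
    return f"https://{location}-aiplatform.googleapis.com/v1/{operation['name']}"


# Trimmed-down response using the placeholders from the example above.
deploy_response = {
    "name": (
        "projects/PROJECT_NUMBER/locations/us-central1/indexEndpoints/"
        "INDEX_ENDPOINT_ID/operations/OPERATION_ID"
    ),
    "metadata": {"deployedIndexId": "DEPLOYED_INDEX_ID"},
}

print(is_operation_done(deploy_response))
print(operation_status_url("us-central1", deploy_response))
```

A response without a "done" field is treated as still in progress, which matches the sample response shown above.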
Enable autoscaling
Matching Engine supports autoscaling, which automatically resizes the number of nodes based on the demands of your workloads. When demand is high, nodes are added to the node pool, up to the maximum size that you designate. When demand is low, the node pool scales back down to the minimum size that you designate. To see how many nodes are in use and how that number changes, monitor the current replicas.
To enable autoscaling, specify the maxReplicaCount and minReplicaCount when you deploy your index:
gcloud CLI
The following example uses the gcloud ai index-endpoints deploy-index command:
gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
--deployed-index-id=DEPLOYED_INDEX_ID \
--display-name=DEPLOYED_INDEX_NAME \
--index=INDEX_ID \
--min-replica-count=MIN_REPLICA_COUNT \
--max-replica-count=MAX_REPLICA_COUNT \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- DEPLOYED_INDEX_NAME: Display name of the deployed index.
- INDEX_ID: The ID of the index.
- MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be equal to or larger than 1.
- MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
REST
Before using any of the request data, make the following replacements:
- LOCATION: The region where you are using Vertex AI.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- DEPLOYED_INDEX_NAME: Display name of the deployed index.
- INDEX_ID: The ID of the index.
- MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be equal to or larger than 1.
- MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:deployIndex
Request JSON body:
{
  "deployedIndex": {
    "id": "DEPLOYED_INDEX_ID",
    "index": "projects/PROJECT/locations/LOCATION/indexes/INDEX_ID",
    "displayName": "DEPLOYED_INDEX_NAME",
    "automaticResources": {
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT
    }
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    },
    "deployedIndexId": "DEPLOYED_INDEX_ID"
  }
}
You can poll the status of the operation until the response includes "done": true.
- If neither minReplicaCount nor maxReplicaCount is set, both default to 2.
- If only maxReplicaCount is set, minReplicaCount defaults to 2.
- If only minReplicaCount is set, maxReplicaCount is set equal to minReplicaCount.
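The default rules above can be expressed as a small function. The following Python sketch only models the documented behavior locally so that you can predict the effective replica counts; the function and constant names are hypothetical, and the service applies the defaults itself.

```python
from typing import Optional, Tuple

# Default replica count stated in the rules above.
DEFAULT_REPLICA_COUNT = 2


def resolve_replica_counts(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the effective (min, max) replica counts for a deployment."""
    if min_replica_count is not None and min_replica_count < 1:
        # The parameter description requires a value of at least 1.
        raise ValueError("minReplicaCount must be equal to or larger than 1")
    if min_replica_count is None and max_replica_count is None:
        return DEFAULT_REPLICA_COUNT, DEFAULT_REPLICA_COUNT
    if min_replica_count is None:
        return DEFAULT_REPLICA_COUNT, max_replica_count
    if max_replica_count is None:
        return min_replica_count, min_replica_count
    return min_replica_count, max_replica_count


print(resolve_replica_counts())                      # (2, 2)
print(resolve_replica_counts(max_replica_count=10))  # (2, 10)
print(resolve_replica_counts(min_replica_count=3))   # (3, 3)
```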
Mutate a DeployedIndex
You can use the MutateDeployedIndex API to update the deployment resources (for example, minReplicaCount and maxReplicaCount) of an already deployed index.
- You can't change the machineType after the index is deployed.
- If maxReplicaCount is not specified in the request, the DeployedIndex keeps using its existing maxReplicaCount.
gcloud CLI
The following example uses the gcloud ai index-endpoints mutate-deployed-index command:
gcloud ai index-endpoints mutate-deployed-index INDEX_ENDPOINT_ID \
--deployed-index-id=DEPLOYED_INDEX_ID \
--min-replica-count=MIN_REPLICA_COUNT \
--max-replica-count=MAX_REPLICA_COUNT \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be equal to or larger than 1.
- MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
REST
Before using any of the request data, make the following replacements:
- LOCATION: The region where you are using Vertex AI.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- MIN_REPLICA_COUNT: Minimum number of machine replicas that the deployed index is always deployed on. If specified, the value must be equal to or larger than 1.
- MAX_REPLICA_COUNT: Maximum number of machine replicas that the deployed index can be deployed on.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:mutateDeployedIndex
Request JSON body:
{
  "id": "DEPLOYED_INDEX_ID",
  "automaticResources": {
    "minReplicaCount": MIN_REPLICA_COUNT,
    "maxReplicaCount": MAX_REPLICA_COUNT
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.MutateDeployedIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-24T20:36:37.902782Z",
      "updateTime": "2022-01-24T20:36:37.902782Z"
    },
    "deployedIndexId": "gcloud_deployed_index_2"
  }
}
You can poll the status of the operation until the response includes "done": true.
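Because an omitted maxReplicaCount leaves the existing value in place, a client can leave that field out of the request body when it only wants to change the minimum. The following Python sketch builds the request body shown in the REST example under that assumption. The helper name is hypothetical, and the sketch always sends minReplicaCount because this page does not describe what happens when it is omitted.

```python
import json
from typing import Optional


def build_mutate_deployed_index_body(
    deployed_index_id: str,
    min_replica_count: int,
    max_replica_count: Optional[int] = None,
) -> dict:
    """Build the JSON body for a mutateDeployedIndex request.

    maxReplicaCount is included only when given, so that the deployed index
    keeps its existing maximum otherwise (as described above).
    """
    automatic_resources = {"minReplicaCount": min_replica_count}
    if max_replica_count is not None:
        automatic_resources["maxReplicaCount"] = max_replica_count
    return {"id": deployed_index_id, "automaticResources": automatic_resources}


# Placeholder ID for illustration only.
body_min_only = build_mutate_deployed_index_body("my_deployed_index", 2)
body_both = build_mutate_deployed_index_body("my_deployed_index", 2, 10)
print(json.dumps(body_min_only, indent=2))
print(json.dumps(body_both, indent=2))
```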
List IndexEndpoints
To list your IndexEndpoint resources and view information about any associated DeployedIndex instances, run the following code:
gcloud CLI
The following example uses the gcloud ai index-endpoints list command:
gcloud ai index-endpoints list \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Your region.
- PROJECT: Your project ID.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints
You should receive a JSON response similar to the following:
{
  "indexEndpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID",
      "displayName": "INDEX_ENDPOINT_NAME",
      "deployedIndexes": [
        {
          "id": "DEPLOYED_INDEX_ID",
          "index": "projects/PROJECT_NUMBER/locations/LOCATION/indexes/INDEX_ID",
          "displayName": "DEPLOYED_INDEX_ID",
          "createTime": "2021-06-04T02:23:40.178286Z",
          "privateEndpoints": {
            "matchGrpcAddress": "GRPC_ADDRESS"
          },
          "indexSyncTime": "2022-01-13T04:22:00.151916Z",
          "automaticResources": {
            "minReplicaCount": 2,
            "maxReplicaCount": 10
          }
        }
      ],
      "etag": "AMEw9yP367UitPkLo-khZ1OQvqIK8Q0vLAzZVF7QjdZ5O3l7Zow-mzBo2l6xmiuuMljV",
      "createTime": "2021-03-17T04:47:28.460373Z",
      "updateTime": "2021-06-04T02:23:40.930513Z",
      "network": "VPC_NETWORK_NAME"
    }
  ]
}
For more information, see the reference documentation for IndexEndpoint.
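The list response is also a convenient place to read the IndexEndpoint ID and the replica settings of each deployed index. The following Python sketch flattens a response with the shape shown above into one row per deployed index. The function name and the row keys are hypothetical, and the sample data is a trimmed copy of the example response.

```python
def summarize_deployed_indexes(list_response: dict) -> list:
    """Return one summary row per DeployedIndex in a list response."""
    rows = []
    for endpoint in list_response.get("indexEndpoints", []):
        # The endpoint ID is the last segment of the resource name.
        endpoint_id = endpoint["name"].rsplit("/", 1)[-1]
        for deployed in endpoint.get("deployedIndexes", []):
            resources = deployed.get("automaticResources", {})
            rows.append({
                "index_endpoint_id": endpoint_id,
                "deployed_index_id": deployed["id"],
                "min_replica_count": resources.get("minReplicaCount"),
                "max_replica_count": resources.get("maxReplicaCount"),
                "match_grpc_address": deployed.get("privateEndpoints", {}).get(
                    "matchGrpcAddress"
                ),
            })
    return rows


sample_response = {
    "indexEndpoints": [
        {
            "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID",
            "deployedIndexes": [
                {
                    "id": "DEPLOYED_INDEX_ID",
                    "privateEndpoints": {"matchGrpcAddress": "GRPC_ADDRESS"},
                    "automaticResources": {"minReplicaCount": 2, "maxReplicaCount": 10},
                }
            ],
        }
    ]
}

rows = summarize_deployed_indexes(sample_response)
for row in rows:
    print(row)
```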
Undeploy an index
To undeploy an index, run the following code:
gcloud CLI
The following example uses the gcloud ai index-endpoints undeploy-index command:
gcloud ai index-endpoints undeploy-index INDEX_ENDPOINT_ID \
--deployed-index-id=DEPLOYED_INDEX_ID \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Your region.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- DEPLOYED_INDEX_ID: A user-specified string that uniquely identifies the deployed index. It must start with a letter and contain only letters, numbers, or underscores. See DeployedIndex.id for format guidelines.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID:undeployIndex
Request JSON body:
{
  "deployed_index_id": "DEPLOYED_INDEX_ID"
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.UndeployIndexOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:09:56.641107Z",
      "updateTime": "2022-01-13T04:09:56.641107Z"
    }
  }
}
Delete an IndexEndpoint
Before you delete an IndexEndpoint, you must undeploy all the indexes associated with it.
gcloud CLI
The following example uses the gcloud ai index-endpoints delete command:
gcloud ai index-endpoints delete INDEX_ENDPOINT_ID \
--project=PROJECT_ID \
--region=LOCATION
Replace the following:
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- PROJECT_ID: The ID of the project.
- LOCATION: The region where you are using Vertex AI.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Your region.
- PROJECT: Your project ID.
- INDEX_ENDPOINT_ID: The ID of the index endpoint.
- PROJECT_NUMBER: Project number for your project.
HTTP method and URL:
DELETE https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/indexEndpoints/INDEX_ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeleteOperationMetadata",
    "genericMetadata": {
      "createTime": "2022-01-13T04:36:19.142203Z",
      "updateTime": "2022-01-13T04:36:19.142203Z"
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}
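Because every index must be undeployed before the IndexEndpoint can be deleted, the order of the REST calls matters. The following Python sketch lists the calls in that order without sending them, using the URLs and request body from the undeploy and delete examples. The function name and the tuple layout are hypothetical. In practice, you would also wait for each undeploy operation to finish before you send the delete request.

```python
def plan_index_endpoint_teardown(project, location, index_endpoint_id,
                                 deployed_index_ids):
    """Return the ordered (method, url, body) calls needed to delete an endpoint.

    All undeployIndex calls come first; the DELETE call is always last.
    """
    base = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"indexEndpoints/{index_endpoint_id}"
    )
    steps = [
        ("POST", f"{base}:undeployIndex", {"deployed_index_id": deployed_id})
        for deployed_id in deployed_index_ids
    ]
    steps.append(("DELETE", base, None))
    return steps


# Placeholder values for illustration only.
steps = plan_index_endpoint_teardown(
    project="my-project",
    location="us-central1",
    index_endpoint_id="1234567890",
    deployed_index_ids=["deployed_a", "deployed_b"],
)
for method, url, body in steps:
    print(method, url, body)
```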