The storage-optimized performance tier for Vector Search is designed for indexing and searching massive datasets. This tier stores index data on disk rather than in RAM, significantly reducing your operational costs. When your priority is cost efficiency at scale rather than the lowest possible query latency, the storage-optimized tier is your best choice.
When to use a storage-optimized index
Consider storage-optimized indexes if you have any of the following:
- A very large dataset: You must index very large numbers of vectors, and the cost of hosting a large number of performance-optimized shards is prohibitive.
- A low-QPS workload: In low-query-volume applications, the cost savings from using fewer shards can be significant.
- Flexible latency requirements: Your application can tolerate a minor increase in query latency, which is the time it takes to get a search result.
Performance trade-offs
Compared to the default performance-optimized index, a storage-optimized index has the following characteristics:
- Increased query latency: Queries have a slightly higher latency at a given recall level.
How to configure a storage-optimized index
To create an index that is storage optimized, set the shardSize parameter to SHARD_SIZE_SO_DYNAMIC in your index configuration.
Example: Creating a storage-optimized index
The following example shows the metadata for creating a new streaming index that is storage-optimized.
{
  "displayName": "my-storage-optimized-index",
  "description": "An index configured to prioritize storage over performance.",
  "metadata": {
    "contentsDeltaUri": "gs://your-bucket/source-data/",
    "config": {
      "dimensions": 100,
      "approximateNeighborsCount": 150,
      "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
      "shardSize": "SHARD_SIZE_SO_DYNAMIC"
    }
  },
  "indexUpdateMethod": "STREAM_UPDATE"
}
In the example, shardSize is set to SHARD_SIZE_SO_DYNAMIC, which instructs Vector Search to build a denser index. This allows each shard to hold significantly more data points, thereby reducing the total number of shards needed for your dataset. Other fields, such as dimensions and distanceMeasureType, are configured according to your needs.
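If you generate index definitions programmatically, the same metadata can be assembled in code. The following is a minimal sketch using only the Python standard library. The helper name build_index_body and its default arguments are hypothetical and are not part of any Vector Search client library. The field names and values mirror the JSON example above.

```python
import json

# Value taken from the JSON example above; it selects the storage-optimized tier.
STORAGE_OPTIMIZED_SHARD_SIZE = "SHARD_SIZE_SO_DYNAMIC"


def build_index_body(display_name, contents_delta_uri, dimensions,
                     approximate_neighbors_count=150,
                     distance_measure_type="DOT_PRODUCT_DISTANCE"):
    """Hypothetical helper: return storage-optimized index metadata as a dict."""
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "displayName": display_name,
        "metadata": {
            "contentsDeltaUri": contents_delta_uri,
            "config": {
                "dimensions": dimensions,
                "approximateNeighborsCount": approximate_neighbors_count,
                "distanceMeasureType": distance_measure_type,
                # The only setting that differs from a performance-optimized index.
                "shardSize": STORAGE_OPTIMIZED_SHARD_SIZE,
            },
        },
        "indexUpdateMethod": "STREAM_UPDATE",
    }


body = build_index_body(
    display_name="my-storage-optimized-index",
    contents_delta_uri="gs://your-bucket/source-data/",
    dimensions=100,
)
# Serialize to JSON, for example to save as a metadata file.
print(json.dumps(body, indent=2))
```

Keeping shardSize in a single constant makes it harder for a generated definition to fall back to a different shard size by accident.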