This guide shows you how to create a text embedding by using the Vertex AI Text embeddings API. This page covers the following topics:
- Before you begin: Set up your project and choose a task type for your embeddings.
- Supported models: Lists the available text embedding models.
- Get text embeddings for a snippet of text: Generate embeddings using the API or the Python SDK.
- Add an embedding to a vector database: Store your generated embeddings in a vector database for efficient retrieval.
The Vertex AI text embeddings API uses dense vector representations of text. These embeddings are created using deep-learning methods similar to those used by large language models. Unlike sparse vectors, which tend to map words directly to numbers, dense vectors are designed to represent the meaning of a piece of text. This lets you search for passages that align with the meaning of a query, even if the passages don't use the same words.
Key characteristics of these embeddings include:
- High dimensionality: Models produce high-dimensional vectors. For example, gemini-embedding-001 uses 3072-dimensional vectors. You can reduce the output dimensionality to save storage space and increase computational efficiency.
- Normalized vectors: The output vectors are normalized, so you can use cosine similarity, dot product, or Euclidean distance to get the same similarity rankings.
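Because the output vectors are unit-length, cosine similarity and dot product give identical scores, and Euclidean distance is a monotonic function of them, so all three produce the same ranking. A minimal sketch with made-up vectors standing in for real embeddings:

```python
import math

def normalize(v):
    """Scale a vector to unit length, as the embedding models do."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Two made-up vectors standing in for real embeddings.
a = normalize([0.2, -0.5, 0.1, 0.7])
b = normalize([0.3, -0.4, 0.0, 0.6])

# For unit-length vectors, the two measures agree.
print(abs(cosine_similarity(a, b) - dot(a, b)) < 1e-12)  # True
```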
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Enable the Vertex AI API.
- Choose a task type for your embeddings job.
Supported models
You can get text embeddings by using the following models:
Model name | Description | Output dimensions | Max sequence length | Supported text languages |
---|---|---|---|---|
gemini-embedding-001 | State-of-the-art performance across English, multilingual, and code tasks. It unifies the previously specialized models text-embedding-005 and text-multilingual-embedding-002 and achieves better performance in their respective domains. Read our Tech Report for more detail. | Up to 3072 | 2048 tokens | Supported text languages |
text-embedding-005 | Specialized in English and code tasks. | Up to 768 | 2048 tokens | English |
text-multilingual-embedding-002 | Specialized in multilingual tasks. | Up to 768 | 2048 tokens | Supported text languages |
For superior embedding quality, gemini-embedding-001 is our large model designed to provide the highest performance. Note that gemini-embedding-001 supports one instance per request.

Only use the model names exactly as they appear in the supported models table. Don't append an @version suffix or @latest to the model name, because these formats are not valid.
Get text embeddings for a snippet of text
You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python.
API limits
Each request is limited to 250 input texts and a total of 20,000 input tokens. If a request exceeds the token limit, it returns a 400 error. Each individual input text is limited to 2048 tokens, and any excess tokens are silently truncated. To disable silent truncation, set autoTruncate to false.
For more information, see Text embedding limits.
Choose an embedding dimension
By default, all models produce a full-length embedding vector. For gemini-embedding-001, this vector has 3072 dimensions, and for other models, it's 768 dimensions. To control the size of the output embedding vector, you can use the output_dimensionality parameter. A smaller output dimensionality can save storage space and increase computational efficiency for downstream applications, with a potential trade-off in quality.
The following examples use the gemini-embedding-001 model.
Python
Install the SDK:

```shell
pip install --upgrade google-genai
```
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
```shell
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
```
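With the environment configured, a request can look like the following. This is a sketch using the Gen AI SDK's embed_content method; the task_type value and output_dimensionality are example choices, and the exact response fields may vary by SDK version. Because gemini-embedding-001 supports one instance per request, a single string is passed as contents.

```python
from google import genai
from google.genai.types import EmbedContentConfig

# Reads GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and
# GOOGLE_GENAI_USE_VERTEXAI from the environment variables set above.
client = genai.Client()

response = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?",  # One instance per request.
    config=EmbedContentConfig(
        task_type="RETRIEVAL_QUERY",   # Example: matches the task type you chose earlier.
        output_dimensionality=768,     # Optional; omit to get the full 3072 dimensions.
    ),
)

# The embedding is a list of floats.
embedding = response.embeddings[0].values
print(len(embedding))
```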
Add an embedding to a vector database
After you generate your embedding, you can add it to a vector database, like Vector Search. This enables low-latency retrieval and is critical as the size of your data increases.
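The retrieval step itself can be illustrated with a brute-force nearest-neighbor search over stored vectors; a vector database such as Vector Search does the same job with approximate algorithms that stay fast as your data grows. A minimal sketch with made-up, normalized vectors in place of real embeddings:

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# A tiny in-memory stand-in for a vector database: id -> normalized embedding.
# Real embeddings would come from the embeddings API.
database = {
    "doc-1": normalize([0.9, 0.1, 0.0]),
    "doc-2": normalize([0.1, 0.9, 0.2]),
    "doc-3": normalize([0.0, 0.2, 0.9]),
}

def search(query_embedding, top_k=2):
    """Rank stored vectors by dot product (equal to cosine similarity
    for unit-length vectors) and return the top_k document IDs."""
    scores = {
        doc_id: sum(q * d for q, d in zip(query_embedding, vec))
        for doc_id, vec in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

query = normalize([0.8, 0.2, 0.1])
print(search(query))  # ['doc-1', 'doc-2']
```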
To learn more about Vector Search, see Overview of Vector Search.
What's next
- Learn about Generative AI on Vertex AI rate limits.
- Learn how to get batch text embeddings predictions.
- Learn how to get multimodal embeddings.
- Learn how to tune text embeddings.
- Learn about the research behind text-embedding-005 and text-multilingual-embedding-002 in the research paper Gecko: Versatile Text Embeddings Distilled from Large Language Models.