Text embeddings API

The Text embeddings API converts textual data into numerical vectors. These vector representations are designed to capture the semantic meaning and context of the words they represent.

Supported Models:

English models             Multilingual models
textembedding-gecko@001    textembedding-gecko-multilingual@001
textembedding-gecko@003    text-multilingual-embedding-002



curl

REGION=us-central1  # PROJECT_ID and MODEL_ID must also be set

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict" \
  -d '{
    "instances": [
      ...
    ],
    "parameters": {
      ...
    }
  }'


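Python

import vertexai
from vertexai.language_models import TextEmbeddingModel

PROJECT_ID = "PROJECT_ID"  # placeholder: your project ID
REGION = "us-central1"
MODEL_ID = "MODEL_ID"  # placeholder: one of the supported models

vertexai.init(project=PROJECT_ID, location=REGION)

model = TextEmbeddingModel.from_pretrained(MODEL_ID)
embeddings = model.get_embeddings(...)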
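
For readers who want to stay in Python without the SDK, the curl syntax above can be approximated with the standard library. This is a minimal sketch rather than an official client: build_predict_request is a hypothetical helper, the access token is assumed to come from gcloud auth print-access-token, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_predict_request(project_id, region, model_id, body, access_token):
    """Build (but do not send) a POST request for the :predict endpoint."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/publishers/google/models/{model_id}:predict"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own project, model, and token.
request = build_predict_request(
    project_id="PROJECT_ID",
    region="us-central1",
    model_id="MODEL_ID",
    body={"instances": [{"content": "TEXT"}]},
    access_token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request.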

Parameter list



instances

list of union[string, TextEmbeddingInput]

Each instance represents a single piece of text to be embedded.

content

string

The text that you want to generate embeddings for.


autoTruncate

Optional: bool

When set to true, input text that exceeds the maximum length supported by the model is truncated. When set to false, an error is returned if the input text is longer than that maximum. Defaults to true.

outputDimensionality

Optional: int

Specifies the output embedding size. If set, output embeddings are truncated to the specified size.
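
The truncate-or-fail behavior can be pictured with a small client-side sketch. It is an illustration only: apply_auto_truncate is a hypothetical helper, whitespace splitting stands in for the model's real tokenizer, and keeping the leading tokens is an assumption, since this page does not say which part of the input is kept.

```python
def apply_auto_truncate(tokens, max_tokens, auto_truncate=True):
    """Mimic the documented truncate-or-fail choice on a list of tokens."""
    if len(tokens) <= max_tokens:
        return tokens, False  # fits: nothing to truncate
    if auto_truncate:
        # Assumption: the leading tokens are the ones that are kept.
        return tokens[:max_tokens], True
    raise ValueError(f"input has {len(tokens)} tokens, limit is {max_tokens}")


# Whitespace splitting is only a stand-in for the model's real tokenizer.
tokens = "I would like embeddings for this text!".split()
kept, truncated = apply_auto_truncate(tokens, max_tokens=4)
print(kept, truncated)
```

With auto_truncate=False the same call raises ValueError, which mirrors the error that the API returns for over-length input.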

Request body

{
  "instances": [
    { "task_type": "RETRIEVAL_DOCUMENT",
      "title": "document title",
      "content": "I would like embeddings for this text!"
    }
  ]
}



content

string

The text that you want to generate embeddings for.

task_type

Optional: string

Conveys the intended downstream application to help the model produce better embeddings. If left blank, the default is RETRIEVAL_QUERY.

The task_type parameter is not supported for the textembedding-gecko@001 model.

For more information about task types, see Choose an embeddings task type.

title

Optional: string

Helps the model produce better embeddings. Only valid with task_type=RETRIEVAL_DOCUMENT.


The following table describes the task_type parameter values and their use cases:

task_type             Description
RETRIEVAL_QUERY       Specifies the given text is a query in a search or retrieval setting.
RETRIEVAL_DOCUMENT    Specifies the given text is a document in a search or retrieval setting.
SEMANTIC_SIMILARITY   Specifies the given text is used for Semantic Textual Similarity (STS).
CLASSIFICATION        Specifies that the embedding is used for classification.
CLUSTERING            Specifies that the embedding is used for clustering.
QUESTION_ANSWERING    Specifies that the query embedding is used for answering questions. Use RETRIEVAL_DOCUMENT for the document side.
FACT_VERIFICATION     Specifies that the query embedding is used for fact verification.
CODE_RETRIEVAL_QUERY  Specifies that the query embedding is used for code retrieval for Java and Python.
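
The constraints described so far (the listed task_type values, title being valid only with RETRIEVAL_DOCUMENT, and no task_type support on textembedding-gecko@001) can be checked before a request is sent. The following is a sketch under those stated rules; make_instance and its model_id argument are hypothetical and not part of the API.

```python
TASK_TYPES = {
    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
    "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
}


def make_instance(content, task_type=None, title=None, model_id=None):
    """Build one entry of "instances", enforcing the documented constraints."""
    instance = {"content": content}
    if task_type is not None:
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task_type: {task_type}")
        if model_id == "textembedding-gecko@001":
            raise ValueError("task_type is not supported for textembedding-gecko@001")
        instance["task_type"] = task_type
    if title is not None:
        if task_type != "RETRIEVAL_DOCUMENT":
            raise ValueError("title is only valid with task_type=RETRIEVAL_DOCUMENT")
        instance["title"] = title
    return instance


doc = make_instance(
    "I would like embeddings for this text!",
    task_type="RETRIEVAL_DOCUMENT",
    title="document title",
)
query = make_instance("banana muffins?", task_type="RETRIEVAL_QUERY")
print(doc)
print(query)
```

Failing early on the client keeps invalid combinations, such as a title on a query, out of the request entirely.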

Retrieval Tasks:

  • Query: Use task_type=RETRIEVAL_QUERY to indicate that the input text is a search query.
  • Corpus: Use task_type=RETRIEVAL_DOCUMENT to indicate that the input text is part of the document collection being searched.

Similarity Tasks:

Semantic similarity: Use task_type=SEMANTIC_SIMILARITY for both input texts to assess how similar they are in overall meaning.
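
Once query and document embeddings are available, a retrieval step typically compares them with a vector similarity measure. The sketch below uses cosine similarity, which is a common choice but is not prescribed by this page; the three-dimensional vectors are invented toy values, not model output, and rank_documents is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors invented for illustration (not model output).
query_vec = [1.0, 0.0, 0.0]  # stands in for a RETRIEVAL_QUERY embedding
doc_vecs = [                 # stand in for RETRIEVAL_DOCUMENT embeddings
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]
print(rank_documents(query_vec, doc_vecs))  # prints [1, 2, 0]
```

For the similarity task, the same cosine_similarity function would be applied to two vectors that were both produced with SEMANTIC_SIMILARITY.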

Response body

{
  "predictions": [
    { "embeddings": {
        "statistics": {
          "truncated": boolean,
          "token_count": integer
        },
        "values": [ number ]
      }
    }
  ]
}

Response element  Description
embeddings        The result generated from input text.
statistics        The statistics computed from the input text.
truncated         Indicates whether the input text exceeded the maximum allowed tokens and was truncated.
token_count       Number of tokens in the input text.
values            The embedding vector for the input text, as a list of numbers.
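
A response with the structure described above can be unpacked with a few dictionary lookups. This is a sketch only: parse_predictions is a hypothetical helper, and the two floats in the sample are made-up placeholders rather than real embedding values.

```python
def parse_predictions(response):
    """Extract (values, token_count, truncated) for each prediction."""
    results = []
    for prediction in response["predictions"]:
        embeddings = prediction["embeddings"]
        stats = embeddings["statistics"]
        results.append(
            (embeddings["values"], stats["token_count"], stats["truncated"])
        )
    return results


# Shaped like the response body; the two floats are made-up placeholders.
sample = {
    "predictions": [
        {
            "embeddings": {
                "values": [0.1, -0.2],
                "statistics": {"token_count": 4, "truncated": False},
            }
        }
    ]
}
for values, token_count, truncated in parse_predictions(sample):
    print(len(values), token_count, truncated)
```

Checking the truncated flag for each prediction is a cheap way to notice inputs that were silently shortened.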

Sample response

{
  "predictions": [
    { "embeddings": {
        "values": [ ... ],
        "statistics": {
          "token_count": 4,
          "truncated": false
        }
      }
    }
  ]
}


Embed a text string

Basic use case

The following example shows how to obtain the embedding of a text string.


After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 2,048 tokens per text for all models except textembedding-gecko@001, whose maximum input length is 3,072 tokens.
  • AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.
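
Because a request accepts at most five texts, a longer list has to be split across several requests. A minimal sketch of that batching follows; batch_texts is a hypothetical helper, and the default of five simply restates the limit given above.

```python
def batch_texts(texts, max_per_request=5):
    """Split texts into consecutive groups of at most max_per_request."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    return [
        texts[start:start + max_per_request]
        for start in range(0, len(texts), max_per_request)
    ]


texts = [f"text {i}" for i in range(12)]
batches = batch_texts(texts)
print([len(batch) for batch in batches])  # prints [5, 5, 2]
```

Each batch would then become the instances list of one request, with the results concatenated in order.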

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict

Request JSON body:

{
  "instances": [
    { "content": "TEXT"}
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE
  }
}

To send your request, choose one of these options:


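curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict"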


PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the one shown under Sample response, where values has been truncated to save space.

Note the following in the URL for this sample:
  • The request uses the predict method.
  • The model ID is located at the end of the URL before the method (for example, text-embedding-004). This sample may support other models as well.


To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from __future__ import annotations

from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

def embed_text() -> list[list[float]]:
    """Embeds texts with a pre-trained, foundational model.

    Returns:
        A list of lists containing the embedding vectors for each input text
    """

    # A list of texts to be embedded.
    texts = ["banana muffins? ", "banana bread? banana muffins?"]
    # The dimensionality of the output embeddings.
    dimensionality = 256
    # The task type for embedding. Check the available tasks in the model's documentation.
    task = "RETRIEVAL_DOCUMENT"

    model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = model.get_embeddings(inputs, **kwargs)

    # Example response:
    # [[0.006135190837085247, -0.01462465338408947, 0.004978656303137541, ...], [0.1234434666, ...]],
    return [embedding.values for embedding in embeddings]


Advanced use case

The following example demonstrates some advanced features:

  • Use task_type and title to improve embedding quality.
  • Use parameters to control the behavior of the API.


Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 2,048 tokens per text for all models except textembedding-gecko@001, whose maximum input length is 3,072 tokens.
  • TASK_TYPE: Used to convey the intended downstream application to help the model produce better embeddings.
  • TITLE: Used to help the model produce better embeddings.
  • AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.
  • OUTPUT_DIMENSIONALITY: Used to specify output embedding size. If set, output embeddings will be truncated to the size specified.
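
The replacement values listed above can also be assembled into a request body programmatically. The sketch below serializes them with the standard library; build_request_body is a hypothetical helper, the example values are chosen for illustration only, and parameters that are left unset are omitted so that the service defaults apply.

```python
import json


def build_request_body(instances, auto_truncate=None, output_dimensionality=None):
    """Serialize instances plus any explicitly set parameters to JSON."""
    body = {"instances": instances}
    parameters = {}
    if auto_truncate is not None:
        parameters["autoTruncate"] = auto_truncate
    if output_dimensionality is not None:
        parameters["outputDimensionality"] = output_dimensionality
    if parameters:
        body["parameters"] = parameters
    return json.dumps(body, indent=2)


# Example values chosen for illustration only.
request_json = build_request_body(
    [{"content": "TEXT", "task_type": "RETRIEVAL_DOCUMENT", "title": "TITLE"}],
    auto_truncate=False,
    output_dimensionality=256,
)
print(request_json)
```

Writing request_json to a file named request.json would make it usable with the curl and PowerShell commands in this section.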

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/textembedding-gecko@003:predict

Request JSON body:

{
  "instances": [
    { "content": "TEXT",
      "task_type": "TASK_TYPE",
      "title": "TITLE"
    }
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE,
    "outputDimensionality": OUTPUT_DIMENSIONALITY
  }
}

To send your request, choose one of these options:


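curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/textembedding-gecko@003:predict"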


PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/textembedding-gecko@003:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the one shown under Sample response, where values has been truncated to save space.


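To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample starts a tuning job for an embedding model; it does not set task_type or title.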

import re

from google.cloud.aiplatform import initializer as aiplatform_init
from vertexai.language_models import TextEmbeddingModel

def tune_embedding_model(
    api_endpoint: str,
    base_model_name: str = "text-embedding-004",
    corpus_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/corpus.jsonl",
    queries_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/queries.jsonl",
    train_label_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/train.tsv",
    test_label_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/test.tsv",
):  # noqa: ANN201
    """Tune an embedding model using the specified parameters.

    Args:
        api_endpoint (str): The API endpoint for the Vertex AI service.
        base_model_name (str): The name of the base model to use for tuning.
        corpus_path (str): GCS URI of the JSONL file containing the corpus data.
        queries_path (str): GCS URI of the JSONL file containing the queries data.
        train_label_path (str): GCS URI of the TSV file containing the training labels.
        test_label_path (str): GCS URI of the TSV file containing the test labels.
    """
    # Derive the location (for example, "us-central1") from the endpoint host name.
    match = re.search(r"^(\w+-\w+)", api_endpoint)
    location = match.group(1) if match else "us-central1"
    base_model = TextEmbeddingModel.from_pretrained(base_model_name)
    tuning_job = base_model.tune_model(
        corpus_data=corpus_path,
        queries_data=queries_path,
        training_data=train_label_path,
        test_data=test_label_path,
        batch_size=128,  # The batch size to use for training.
        train_steps=1000,  # The number of training steps.
        tuned_model_location=location,
        output_dimensionality=768,  # The dimensionality of the output embeddings.
        learning_rate_multiplier=1.0,  # The multiplier for the learning rate.
    )
    return tuning_job


Supported text languages

All text embedding models support and have been evaluated on English-language text. The textembedding-gecko-multilingual@001 and text-multilingual-embedding-002 models additionally support and have been evaluated on the following languages:

  • Evaluated languages: Arabic (ar), Bengali (bn), English (en), Spanish (es), German (de), Persian (fa), Finnish (fi), French (fr), Hindi (hi), Indonesian (id), Japanese (ja), Korean (ko), Russian (ru), Swahili (sw), Telugu (te), Thai (th), Yoruba (yo), Chinese (zh)
  • Supported languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Model versions

To use a stable model version, specify the model version number, for example text-embedding-004. Each stable version is available for six months after the release date of the subsequent stable version.

The following table contains the available stable model versions:

Model name                            Release date       Discontinuation date
text-embedding-004                    May 14, 2024       To be determined.
text-multilingual-embedding-002       May 14, 2024       To be determined.
textembedding-gecko@003               December 12, 2023  May 14, 2025
textembedding-gecko-multilingual@001  November 2, 2023   May 14, 2025
(regressed, but still supported)      November 2, 2023   April 9, 2025
(regressed, but still supported)      June 7, 2023       April 9, 2025
multimodalembedding@001               February 12, 2024  To be determined.

For more information, see Model versions and lifecycle.

