Get text embeddings

With the Vertex AI text-embeddings API, you can create a text embedding with Generative AI. A text embedding is a vector representation of text; these vectors are used in many ways to find similar items. You interact with them every time you complete a Google search, see recommendations when shopping online, or when your favorite music streaming service suggests a rock band you might like based on your listening history. Some common use cases for text embeddings include:

  • Semantic search: Search text ranked by semantic similarity.
  • Classification: Return the class of items whose text attributes are similar to the given text.
  • Clustering: Cluster items whose text attributes are similar to the given text.
  • Outlier Detection: Return items where text attributes are least related to the given text.
  • Conversational interface: Cluster groups of sentences that can lead to similar responses, as in a conversation-level embedding space.

When you create text embeddings, you get vector representations of natural text as arrays of floating point numbers. This means that all of your input text is assigned a numerical representation. By comparing the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the texts or the objects they represent.
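As a toy illustration of comparing numerical distance (this is not part of the API), cosine similarity between two embedding vectors can be computed as follows; the short vectors here are made up for the example, whereas real model vectors have hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means identical direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "embeddings" for illustration only.
cat = [0.9, 0.1, 0.05]
kitten = [0.85, 0.15, 0.1]
car = [0.05, 0.9, 0.2]

# Similar concepts end up with a higher similarity score.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```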

For example, let's say you wanted to develop a book recommendation chatbot. The first thing you would do is use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. We could feed, as input to the DNN, just the book title or just the text content. Or we could use both of these together, along with any other metadata describing the book, such as the genre.

The embeddings in this example could consist of thousands of book titles with summaries and their genres. It might have representations for books like Wuthering Heights by Emily Brontë and Persuasion by Jane Austen that are similar to each other (small distance between their numerical representations), while the numerical representation for The Great Gatsby by F. Scott Fitzgerald would be further away, because its time period, genre, and summary are less similar.

The inputs are the main influence on the orientation of the embedding space. For example, if we only had book title inputs, then two books with similar titles but very different summaries could be close together. However, if we include both the title and summary, then these same books are less similar (further away) in the embedding space.

Working with Generative AI, this book-suggestion chatbot could summarize, suggest, and show you books which you might like (or dislike), based on your query.

To learn more about embeddings, see Meet AI's multitool: Vector embeddings. To take a foundational ML crash course on embeddings, see Embeddings.

After converting each book to an embedding representation, it's time to index these embeddings in a vector database, like Vector Search. This enables low-latency retrieval, and is critical as the size of our corpus of books (vectors) increases.
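At small scale, the retrieval step can be sketched as a brute-force nearest-neighbor search; this illustrates what Vector Search does efficiently at large scale. The book vectors below are invented for the example:

```python
import math

def nearest(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k titles whose vectors are closest to the query (Euclidean distance)."""
    return sorted(index, key=lambda title: math.dist(query, index[title]))[:k]

# Hypothetical embeddings for a few books (real ones would come from the model).
books = {
    "Wuthering Heights": [0.8, 0.2, 0.1],
    "Persuasion": [0.75, 0.25, 0.15],
    "The Great Gatsby": [0.1, 0.9, 0.4],
}

print(nearest([0.78, 0.22, 0.12], books))  # ['Wuthering Heights', 'Persuasion']
```

A linear scan like this is fine for thousands of vectors, but becomes too slow for millions; that is where an approximate nearest neighbor (ANN) index such as Vector Search pays off.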

To learn more about Vector Search, see Overview of Vector Search.

Supported models

To learn which stable text embedding model versions are available, see Available stable model versions. To learn which latest text embedding model versions are available, see Latest models.

It is strongly recommended to specify a stable model version (for example, textembedding-gecko@003). The latest version of a model is in Preview and isn't General Availability (GA), so it isn't guaranteed to be production ready.

It is especially important to use a stable model version (for example, textembedding-gecko@003) for applications that require backward-compatible embeddings. If backward compatibility isn't a concern and you want to use the latest model version, you must specify @latest explicitly. If no version is specified, textembedding-gecko defaults to textembedding-gecko@003, and textembedding-gecko-multilingual defaults to textembedding-gecko-multilingual@001.


There are specific prerequisites for successfully creating an embedding. To get started, see quickstart: Try text embeddings.

Use this colab to call the newly released text embedding models (textembedding-gecko and textembedding-gecko-multilingual).

Jupyter notebook: You can run this tutorial as a Jupyter notebook.

Get text embeddings for a snippet of text

You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python. For each request, you're limited to five input texts. Each input text has a token limit of 3,072. Inputs longer than this length are silently truncated. You can also disable silent truncation by setting autoTruncate to false.
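The truncation behavior can be mimicked client-side to validate inputs before sending them. The sketch below is a local approximation only: it counts whitespace-separated words as a stand-in for the model's real tokenizer, so the counts are not exact:

```python
TOKEN_LIMIT = 3072  # per-input token limit described above

def check_input(text: str, auto_truncate: bool = True) -> str:
    """Approximate the API's autoTruncate behavior using whitespace tokens."""
    tokens = text.split()
    if len(tokens) <= TOKEN_LIMIT:
        return text
    if not auto_truncate:
        # Mirrors the request failing when autoTruncate is false.
        raise ValueError(f"input has {len(tokens)} tokens, limit is {TOKEN_LIMIT}")
    return " ".join(tokens[:TOKEN_LIMIT])

long_text = "word " * 5000
print(len(check_input(long_text).split()))  # 3072
```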

These examples use the textembedding-gecko@003 model.


To get text embeddings, send a POST request by specifying the model ID of the publisher model.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 3,072 tokens per text.
  • AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/textembedding-gecko@003:predict

Request JSON body:

{
  "instances": [
    { "content": "TEXT"}
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE
  }
}
To send your request, choose one of these options:


curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/textembedding-gecko@003:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "" | Select-Object -Expand Content

You should receive a JSON response similar to the following. Note that values has been truncated to save space.

{
  "predictions": [
    {
      "embeddings": {
        "statistics": {
          "truncated": false,
          "token_count": 6
        },
        "values": [ ... ]
      }
    }
  ]
}

Example curl command


MODEL_ID="textembedding-gecko@003"
PROJECT_ID=PROJECT_ID

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict" -d \
$'{
  "instances": [
    { "content": "What is life?"}
  ]
}'
Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from vertexai.language_models import TextEmbeddingModel

def text_embedding() -> list:
    """Text embedding with a Large Language Model."""
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
    embeddings = model.get_embeddings(["What is life?"])
    for embedding in embeddings:
        vector = embedding.values
        print(f"Length of Embedding Vector: {len(vector)}")
    return vector

if __name__ == "__main__":
    text_embedding()

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"

	aiplatform "cloud.google.com/go/aiplatform/apiv1"
	"cloud.google.com/go/aiplatform/apiv1/aiplatformpb"
	"google.golang.org/api/option"
	"google.golang.org/protobuf/types/known/structpb"
)

// generateEmbeddings creates embeddings from text provided.
func generateEmbeddings(w io.Writer, prompt, project, location, publisher, model string) error {
	ctx := context.Background()

	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)

	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
	if err != nil {
		fmt.Fprintf(w, "unable to create prediction client: %v", err)
		return err
	}
	defer client.Close()

	// PredictRequest requires an endpoint, instances, and parameters
	// Endpoint
	base := fmt.Sprintf("projects/%s/locations/%s/publishers/%s/models", project, location, publisher)
	url := fmt.Sprintf("%s/%s", base, model)

	// Instances: the prompt
	promptValue, err := structpb.NewValue(map[string]interface{}{
		"content": prompt,
	})
	if err != nil {
		fmt.Fprintf(w, "unable to convert prompt to Value: %v", err)
		return err
	}

	// PredictRequest: create the model prediction request
	req := &aiplatformpb.PredictRequest{
		Endpoint:  url,
		Instances: []*structpb.Value{promptValue},
	}

	// PredictResponse: receive the response from the model
	resp, err := client.Predict(ctx, req)
	if err != nil {
		fmt.Fprintf(w, "error in prediction: %v", err)
		return err
	}

	fmt.Fprintf(w, "embeddings generated: %v", resp.Predictions[0])
	return nil
}


Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictTextEmbeddingsSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Details about text embedding request structure and supported models are available in:
    String instance = "{ \"content\": \"What is life?\"}";
    String project = "YOUR_PROJECT_ID";
    String location = "us-central1";
    String publisher = "google";
    String model = "textembedding-gecko@001";

    predictTextEmbeddings(instance, project, location, publisher, model);
  }

  // Get text embeddings from a supported embedding model
  public static void predictTextEmbeddings(
      String instance, String project, String location, String publisher, String model)
      throws IOException {
    String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      // Use Value.Builder to convert instance to a dynamically typed value that can be
      // processed by the service.
      Value.Builder instanceValue = Value.newBuilder();
      JsonFormat.parser().merge(instance, instanceValue);
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue.build());

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, ValueConverter.EMPTY_VALUE);
      System.out.println("Predict Response");
      for (Value prediction : predictResponse.getPredictionsList()) {
        System.out.format("\tPrediction: %s\n", prediction);
      }
    }
  }
}


Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

const publisher = 'google';
const model = 'textembedding-gecko@001';

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function callPredict() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;

  const instance = {
    content: 'What is life?',
  };
  const instanceValue = helpers.toValue(instance);
  const instances = [instanceValue];

  const parameter = {
    temperature: 0,
    maxOutputTokens: 256,
    topP: 0,
    topK: 1,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Get text embeddings response');
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}

callPredict();

API changes to models released on or after August 2023

When using model versions released on or after August 2023, including textembedding-gecko@003 and textembedding-gecko-multilingual@001, there is a new task_type parameter and an optional title parameter (title is only valid with task_type=RETRIEVAL_DOCUMENT).

These new parameters apply to these public preview models and all stable models going forward.

  "instances": [
      "task_type": "RETRIEVAL_DOCUMENT",
      "title": "document title",
      "content": "I would like embeddings for this text!"

The task_type parameter specifies the intended downstream application, to help the model produce better quality embeddings. It is a string that can take one of the following values:

  • RETRIEVAL_QUERY: Specifies that the given text is a query in a search or retrieval setting.
  • RETRIEVAL_DOCUMENT: Specifies that the given text is a document in a search or retrieval setting.
  • SEMANTIC_SIMILARITY: Specifies that the given text will be used for Semantic Textual Similarity (STS).
  • CLASSIFICATION: Specifies that the embeddings will be used for classification.
  • CLUSTERING: Specifies that the embeddings will be used for clustering.

Language coverage for textembedding-gecko-multilingual models

The textembedding-gecko-multilingual@001 model has been evaluated on the following languages: Arabic (ar), Bengali (bn), English (en), Spanish (es), German (de), Persian (fa), Finnish (fi), French (fr), Hindi (hi), Indonesian (id), Japanese (ja), Korean (ko), Russian (ru), Swahili (sw), Telugu (te), Thai (th), Yoruba (yo), Chinese (zh).

The following is the full list of supported languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Use Vector Search

To use your text embeddings in production for large-scale search engines or recommendation systems, you can take advantage of Vector Search.

Feature: Creates text embeddings
  • Vertex AI text embedding API: Yes
  • Vector Search: No

Feature: Vector quality
  • Vertex AI text embedding API: High
  • Vector Search: Depends on where the embedding was created

Feature: Suited for
  • Vertex AI text embedding API: Creating high-quality text embeddings (vector representations) that can be used for text classification and question answering
  • Vector Search: Performing large approximate nearest neighbor (ANN) searches, which can power search engines and recommendation systems

What's next