Get multimodal embeddings

The multimodal embeddings model generates 1408-dimensional vectors based on the input you provide, which can include a combination of image, text, and video data. You can then use the embedding vectors for subsequent tasks such as image classification or video content moderation.

The image embedding vector and the text embedding vector are in the same semantic space and have the same dimensionality. Consequently, these vectors can be used interchangeably for use cases like searching images by text or searching videos by image.

For text-only embedding use cases, we recommend using the Vertex AI text-embeddings API instead. For example, the text-embeddings API might be better for text-based semantic search, clustering, long-form document analysis, and other text retrieval or question-answering use cases. For more information, see Get text embeddings.

Supported models

You can get multimodal embeddings by using the following model:

  • multimodalembedding

Best practices

When you use the multimodal embeddings model, consider the following aspects of your input:

  • Text in images: The model can recognize text in images, similar to optical character recognition (OCR). If you need to distinguish between a description of the image content and the text within an image, consider using prompt engineering to specify your target content. For example, instead of just "cat", specify "picture of a cat" or "the text 'cat'", depending on your use case.




    [Image: the text "cat"]
    An image displaying the word "cat"

    [Image: a cat]
    A picture of a cat
    Image credit: Manja Vitolic on Unsplash
  • Embedding similarity: The dot product of embeddings isn't a calibrated probability. The dot product is a similarity metric, and score distributions can differ across use cases. Consequently, avoid evaluating quality against a fixed-value threshold. Instead, use ranking-based approaches for retrieval, or use a sigmoid for classification.
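Because the dot product is a ranking signal rather than a probability, retrieval code should sort candidates by score instead of thresholding. A minimal, self-contained sketch in plain Python (toy 4-dimensional vectors standing in for real 1408-dimensional embeddings):

```python
def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar.

    Dot-product scores are similarity signals, not calibrated
    probabilities, so we rank rather than apply a fixed threshold.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = [dot(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])

# Toy 4-dimensional stand-ins for real embedding vectors.
query = [1.0, 0.0, 0.0, 0.0]
candidates = [
    [0.1, 0.9, 0.0, 0.0],  # dissimilar
    [0.9, 0.1, 0.0, 0.0],  # most similar
    [0.5, 0.5, 0.0, 0.0],  # in between
]
print(rank_by_similarity(query, candidates))  # → [1, 2, 0]
```

The same ranking applies unchanged to the `imageEmbedding` and `textEmbedding` vectors returned by the API.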

API usage

API limits

Generating text and image embeddings with the multimodalembedding model is subject to the following limits:

Limit                                         Value and description
Text and image data
Maximum API requests per minute per project   120 to 600, depending on the region
Maximum text length                           32 tokens (approximately 32 words).
                                              If the input exceeds 32 tokens, the model internally shortens the input to this length.
Language                                      English
Image formats                                 BMP, GIF, JPG, PNG
Image size                                    Base64-encoded image: 20 MB (when transcoded to PNG)
                                              Cloud Storage image: 20 MB (original file format)
                                              To avoid increased network latency, use smaller images. The model also resizes images to a 512 x 512 pixel resolution, so you don't need to provide higher-resolution images.
Video data
Audio support                                 Not applicable - the model doesn't consider audio content when generating video embeddings
Video formats                                 AVI, FLV, MKV, MOV, MP4, MPEG, MPG, WEBM, and WMV
Maximum video length (Cloud Storage)          No limit. However, only 2 minutes of content can be analyzed at a time.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API

  5. Set up authentication for your environment.

    Select the tab for how you plan to use the samples on this page:

    Java

    To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. After you initialize the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    4. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Node.js

    To use the Node.js samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. After you initialize the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    4. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Python

    To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. After you initialize the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    4. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. After you initialize the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

  6. To use the Python SDK, follow the instructions in Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
  7. Optional: Review pricing for this feature. Pricing for embeddings depends on the type of data you send (such as image or text), and on the mode you use for certain data types (such as Video Plus, Video Standard, or Video Essential).
    Locations

    A location is a region you can specify in a request to control where your data is stored at rest. For a list of available regions, see Generative AI on Vertex AI locations.

    Error messages

    Quota exceeded error

    google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for
    aiplatform.googleapis.com/online_prediction_requests_per_base_model with base
    model: multimodalembedding. Please submit a quota increase request.
    

    If this is the first time you've received this error, use the Google Cloud console to request a quota adjustment for your project. Use the following filters before submitting your request:

    • Service ID: aiplatform.googleapis.com
    • metric: aiplatform.googleapis.com/online_prediction_requests_per_base_model
    • base_model:multimodalembedding

    Go to Quotas

    If you've already submitted a quota adjustment request, wait before submitting another one. If you need a further quota increase, repeat the quota adjustment request with a justification for the sustained adjustment.

    Specify lower-dimension embeddings

    By default, an embedding request returns a 1408-dimensional float vector for each data type. For text and image data, you can also specify lower-dimension embeddings (128-, 256-, or 512-dimensional float vectors). Choose based on how you plan to use the embeddings: you can optimize for latency and storage, or for quality. Lower-dimension embeddings reduce storage needs and lower latency for downstream embedding tasks (such as search or recommendation), while higher-dimension embeddings improve accuracy on the same tasks.
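As a back-of-the-envelope illustration of the storage tradeoff (not an API feature): embeddings are float vectors, so at 4 bytes per float32 component, storage scales linearly with the dimension you choose:

```python
def storage_bytes(num_vectors: int, dimension: int, bytes_per_float: int = 4) -> int:
    """Approximate raw storage for float embedding vectors."""
    return num_vectors * dimension * bytes_per_float

# One million embeddings at each supported dimension:
for dim in (128, 256, 512, 1408):
    mb = storage_bytes(1_000_000, dim) / 1e6
    print(f"{dim:>4} dims: {mb:,.0f} MB")  # 128 → 512 MB, 1408 → 5,632 MB
```

Real storage costs also depend on the index and serialization format, so treat these as lower bounds.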

    REST

    To access lower-dimension embeddings, add the parameters.dimension field. This parameter accepts one of the following values: 128, 256, 512, or 1408. The response contains embeddings of that dimension.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example: gs://my-bucket/embeddings/supermarket-img.png.

      You can also provide the image as a base64-encoded byte string:

      [...]
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMAGE"
      }
      [...]
      
    • TEXT: The target text to get embeddings for. For example: a cat.
    • EMBEDDING_DIMENSION: The number of embedding dimensions. Lower values decrease latency for downstream tasks that use these embeddings, while higher values increase accuracy. Available values: 128, 256, 512, and 1408 (default).

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    Request JSON body:

    {
      "instances": [
        {
          "image": {
            "gcsUri": "IMAGE_URI"
          },
          "text": "TEXT"
        }
      ],
      "parameters": {
        "dimension": EMBEDDING_DIMENSION
      }
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then run the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then run the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a float vector of the dimension you specified. The following sample responses are shortened to save space.

    128 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            0.0279239565,
            [...128 dimension vector...]
            0.00403284049
          ],
          "textEmbedding": [
            0.202921599,
            [...128 dimension vector...]
            -0.0365431122
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    256 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            0.248620048,
            [...256 dimension vector...]
            -0.0646447465
          ],
          "textEmbedding": [
            0.0757875815,
            [...256 dimension vector...]
            -0.02749932
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    512 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            -0.0523675755,
            [...512 dimension vector...]
            -0.0444030389
          ],
          "textEmbedding": [
            -0.0592851527,
            [...512 dimension vector...]
            0.0350437127
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Python

    import vertexai
    
    from vertexai.vision_models import Image, MultiModalEmbeddingModel
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    # TODO(developer): Try different dimensions: 128, 256, 512, 1408
    embedding_dimension = 128
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        contextual_text="Colosseum",
        dimension=embedding_dimension,
    )
    
    print(f"Image Embedding: {embeddings.image_embedding}")
    print(f"Text Embedding: {embeddings.text_embedding}")
    
    # Example response:
    # Image Embedding: [0.0622573346, -0.0406507477, 0.0260440577, ...]
    # Text Embedding: [0.27469793, -0.146258667, 0.0222803634, ...]

    Go

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateWithLowerDimension shows how to generate lower-dimensional embeddings for text and image inputs.
    func generateWithLowerDimension(w io.Writer, project, location string) error {
    	// location = "us-central1"
    	ctx := context.Background()
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"image": map[string]any{
    			// Image input can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
    		},
    		"text": "Colosseum",
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	// TODO(developer): Try different dimensions: 128, 256, 512, 1408
    	outputDimensionality := 128
    	params, err := structpb.NewValue(map[string]any{
    		"dimension": outputDimensionality,
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request params: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances:  []*structpb.Value{instance},
    		Parameters: params,
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	// Example response:
    	// Text Embedding (length=128): [0.27469793 -0.14625867 0.022280363 ... ]
    	// Image Embedding (length=128): [0.06225733 -0.040650766 0.02604402 ... ]
    
    	return nil
    }
    

    Send an embedding request (image and text)

    Use the following code samples to send an embedding request with image and text data. The samples show how to send a request with both data types, but you can also send a request with a single data type.

    Get text and image embeddings

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • TEXT: The target text to get embeddings for. For example: a cat.
    • B64_ENCODED_IMG: The target image to get embeddings for. The image must be specified as a base64-encoded byte string.

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    Request JSON body:

    {
      "instances": [
        {
          "text": "TEXT",
          "image": {
            "bytesBase64Encoded": "B64_ENCODED_IMG"
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then run the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then run the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a 1408-dimensional float vector. The following sample response is shortened to save space.
    {
      "predictions": [
        {
          "textEmbedding": [
            0.010477379,
            -0.00399621,
            0.00576670747,
            [...]
            -0.00823613815,
            -0.0169572588,
            -0.00472954148
          ],
          "imageEmbedding": [
            0.00262696808,
            -0.00198890246,
            0.0152047109,
            -0.0103145819,
            [...]
            0.0324628279,
            0.0284924973,
            0.011650892,
            -0.00452344026
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    import vertexai
    from vertexai.vision_models import Image, MultiModalEmbeddingModel
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        contextual_text="Colosseum",
        dimension=1408,
    )
    print(f"Image Embedding: {embeddings.image_embedding}")
    print(f"Text Embedding: {embeddings.text_embedding}")
    # Example response:
    # Image Embedding: [-0.0123147098, 0.0727171078, ...]
    # Text Embedding: [0.00230263756, 0.0278981831, ...]
    

    Node.js

    Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    /**
     * TODO(developer): Uncomment these variables before running the sample.
     * (Not necessary if passing values as arguments)
     */
    // const project = 'YOUR_PROJECT_ID';
    // const location = 'YOUR_PROJECT_LOCATION';
    // const baseImagePath = 'YOUR_BASE_IMAGE_PATH';
    // const textPrompt = 'YOUR_TEXT_PROMPT';
    const aiplatform = require('@google-cloud/aiplatform');
    
    // Imports the Google Cloud Prediction service client
    const {PredictionServiceClient} = aiplatform.v1;
    
    // Import the helper module for converting arbitrary protobuf.Value objects.
    const {helpers} = aiplatform;
    
    // Specifies the location of the api endpoint
    const clientOptions = {
      apiEndpoint: 'us-central1-aiplatform.googleapis.com',
    };
    const publisher = 'google';
    const model = 'multimodalembedding@001';
    
    // Instantiates a client
    const predictionServiceClient = new PredictionServiceClient(clientOptions);
    
    async function predictImageFromImageAndText() {
      // Configure the parent resource
      const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;
    
      const fs = require('fs');
      const imageFile = fs.readFileSync(baseImagePath);
    
      // Convert the image data to a Buffer and base64 encode it.
      const encodedImage = Buffer.from(imageFile).toString('base64');
    
      const prompt = {
        text: textPrompt,
        image: {
          bytesBase64Encoded: encodedImage,
        },
      };
      const instanceValue = helpers.toValue(prompt);
      const instances = [instanceValue];
    
      const parameter = {
        sampleCount: 1,
      };
      const parameters = helpers.toValue(parameter);
    
      const request = {
        endpoint,
        instances,
        parameters,
      };
    
      // Predict request
      const [response] = await predictionServiceClient.predict(request);
      console.log('Get image embedding response');
      const predictions = response.predictions;
      console.log('\tPredictions :');
      for (const prediction of predictions) {
        console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
      }
    }
    
    await predictImageFromImageAndText();

    Java

    Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    
    import com.google.cloud.aiplatform.v1beta1.EndpointName;
    import com.google.cloud.aiplatform.v1beta1.PredictResponse;
    import com.google.cloud.aiplatform.v1beta1.PredictionServiceClient;
    import com.google.cloud.aiplatform.v1beta1.PredictionServiceSettings;
    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    import com.google.protobuf.InvalidProtocolBufferException;
    import com.google.protobuf.Value;
    import com.google.protobuf.util.JsonFormat;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Base64;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    public class PredictImageFromImageAndTextSample {
    
      public static void main(String[] args) throws IOException {
        // TODO(developer): Replace this variable before running the sample.
        String project = "YOUR_PROJECT_ID";
        String textPrompt = "YOUR_TEXT_PROMPT";
        String baseImagePath = "YOUR_BASE_IMAGE_PATH";
    
        // Optional parameters for the prediction request. For the request schema, see:
        // https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
        Map<String, Object> parameters = new HashMap<String, Object>();
        parameters.put("sampleCount", 1);
    
        String location = "us-central1";
        String publisher = "google";
        String model = "multimodalembedding@001";
    
        predictImageFromImageAndText(
            project, location, publisher, model, textPrompt, baseImagePath, parameters);
      }
    
      // Gets embeddings for the provided image and text inputs
      public static void predictImageFromImageAndText(
          String project,
          String location,
          String publisher,
          String model,
          String textPrompt,
          String baseImagePath,
          Map<String, Object> parameters)
          throws IOException {
        final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
        final PredictionServiceSettings predictionServiceSettings =
            PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();
    
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests.
        try (PredictionServiceClient predictionServiceClient =
            PredictionServiceClient.create(predictionServiceSettings)) {
          final EndpointName endpointName =
              EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);
    
          // Convert the image to Base64
          byte[] imageData = Base64.getEncoder().encode(Files.readAllBytes(Paths.get(baseImagePath)));
          String encodedImage = new String(imageData, StandardCharsets.UTF_8);
    
          JsonObject jsonInstance = new JsonObject();
          jsonInstance.addProperty("text", textPrompt);
          JsonObject jsonImage = new JsonObject();
          jsonImage.addProperty("bytesBase64Encoded", encodedImage);
          jsonInstance.add("image", jsonImage);
    
          Value instanceValue = stringToValue(jsonInstance.toString());
          List<Value> instances = new ArrayList<>();
          instances.add(instanceValue);
    
          Gson gson = new Gson();
          String gsonString = gson.toJson(parameters);
          Value parameterValue = stringToValue(gsonString);
    
          PredictResponse predictResponse =
              predictionServiceClient.predict(endpointName, instances, parameterValue);
          System.out.println("Predict Response");
          System.out.println(predictResponse);
          for (Value prediction : predictResponse.getPredictionsList()) {
            System.out.format("\tPrediction: %s\n", prediction);
          }
        }
      }
    
      // Convert a Json string to a protobuf.Value
      static Value stringToValue(String value) throws InvalidProtocolBufferException {
        Value.Builder builder = Value.newBuilder();
        JsonFormat.parser().merge(value, builder);
        return builder.build();
      }
    }

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForTextAndImage shows how to use the multimodal model to generate embeddings for
    // text and image inputs.
    func generateForTextAndImage(w io.Writer, project, location string) error {
    	// location = "us-central1"
    	ctx := context.Background()
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"image": map[string]any{
    			// Image input can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
    		},
    		"text": "Colosseum",
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instance},
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	// Example response:
    	// Text embedding (length=1408): [0.0023026613 0.027898183 -0.011858357 ... ]
    	// Image embedding (length=1408): [-0.012314269 0.07271844 0.00020170923 ... ]
    
    	return nil
    }
    

    Send an embedding request (video, image, or text)

    When you send an embedding request, you can specify an input video on its own, or a combination of video, image, and text data.

    Video embedding modes

    Video embeddings support three modes: Essential, Standard, and Plus. The mode corresponds to the density of the generated embeddings, which you specify with the interval_sec configuration in the request. One embedding is generated for each video interval of length interval_sec. The minimum video interval length is 4 seconds. Interval lengths greater than 120 seconds might negatively affect the quality of the generated embeddings.

    Pricing for video embeddings depends on the mode you use. For more information, see pricing.

    The following table summarizes the three video embedding modes:

    Mode        Maximum embeddings per minute of video   Video embedding interval (minimum)
    Essential   4                                        15 (corresponds to: intervalSec >= 15)
    Standard    8                                        8 (corresponds to: 8 <= intervalSec < 15)
    Plus        15                                       4 (corresponds to: 4 <= intervalSec < 8)
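The table above can be captured in a small sketch (hypothetical helper functions, not part of the SDK) that classifies an intervalSec value into its mode and estimates how many embeddings a segment yields, assuming one embedding per interval:

```python
import math

def video_embedding_mode(interval_sec: int) -> str:
    """Classify an intervalSec value into its embedding mode, per the table above."""
    if interval_sec < 4:
        raise ValueError("minimum video interval is 4 seconds")
    if interval_sec < 8:
        return "Plus"
    if interval_sec < 15:
        return "Standard"
    return "Essential"

def embedding_count(duration_sec: int, interval_sec: int) -> int:
    """Number of embeddings for a segment: one per interval_sec-long interval."""
    return math.ceil(duration_sec / interval_sec)

print(video_embedding_mode(16))  # → Essential
print(embedding_count(120, 16))  # → 8 (the default 0-120 s, intervalSec 16 config)
```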

    Video embedding best practices

    When you send video embedding requests, consider the following:

    • To generate a single embedding for the first two minutes of an input video of any length, use the following videoSegmentConfig setting:

      request.json

      // other request body content
      "videoSegmentConfig": {
        "intervalSec": 120
      }
      // other request body content
      
    • To generate embeddings for a video that's longer than two minutes, you can send multiple requests that specify the start and end times in videoSegmentConfig:

      request1.json

      // other request body content
      "videoSegmentConfig": {
        "startOffsetSec": 0,
        "endOffsetSec": 120
      }
      // other request body content
      

      request2.json

      // other request body content
      "videoSegmentConfig": {
        "startOffsetSec": 120,
        "endOffsetSec": 240
      }
      // other request body content
      
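The two-request pattern above generalizes to videos of any length. The following sketch (a hypothetical helper, not part of the SDK; the bucket path is a placeholder) builds one request body per 120-second window:

```python
import json

def segment_requests(duration_sec: int, window_sec: int = 120):
    """Yield one predict request body per window of a long video."""
    for start in range(0, duration_sec, window_sec):
        yield {
            "instances": [{
                "video": {
                    "gcsUri": "gs://my-bucket/embeddings/long-video.mp4",  # placeholder
                    "videoSegmentConfig": {
                        "startOffsetSec": start,
                        "endOffsetSec": min(start + window_sec, duration_sec),
                    },
                }
            }]
        }

# A 300-second video needs three requests: [0, 120), [120, 240), [240, 300).
bodies = list(segment_requests(300))
print(len(bodies))  # → 3
print(json.dumps(bodies[1]["instances"][0]["video"]["videoSegmentConfig"]))
# → {"startOffsetSec": 120, "endOffsetSec": 240}
```

Each body can then be written to its own request.json file and sent with the curl or PowerShell commands shown below.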

    Get video embeddings

    Use the following samples to get embeddings for video content.

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    The following sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a base64-encoded string representation of the video.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example: gs://my-bucket/embeddings/supermarket-video.mp4

      You can also provide the video as a base64-encoded byte string:

      [...]
      "video": {
        "bytesBase64Encoded": "B64_ENCODED_VIDEO"
      }
      [...]
      
    • videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) to generate embeddings for.

      For example:

      [...]
      "videoSegmentConfig": {
        "startOffsetSec": 10,
        "endOffsetSec": 60,
        "intervalSec": 10
      }
      [...]

      Using this setting specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and you're charged at the Standard mode pricing rate.

      If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and you're charged at the Essential mode pricing rate.

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    Request JSON body:

    {
      "instances": [
        {
          "video": {
            "gcsUri": "VIDEO_URI",
            "videoSegmentConfig": {
              "startOffsetSec": START_SECOND,
              "endOffsetSec": END_SECOND,
              "intervalSec": INTERVAL_SECONDS
            }
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then run the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then run the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a 1408-dimensional float vector. The following sample responses are shortened to save space.

    Response (7-second video, no videoSegmentConfig specified):

    {
      "predictions": [
        {
          "videoEmbeddings": [
            {
              "endOffsetSec": 7,
              "embedding": [
                -0.0045467657,
                0.0258095954,
                0.0146885719,
                0.00945400633,
                [...]
                -0.0023291884,
                -0.00493789,
                0.00975185353,
                0.0168156829
              ],
              "startOffsetSec": 0
            }
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    Response (59-second video, with the following video segment config: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 60, "intervalSec": 10 }):

    {
      "predictions": [
        {
          "videoEmbeddings": [
            {
              "endOffsetSec": 10,
              "startOffsetSec": 0,
              "embedding": [
                -0.00683252793,
                0.0390476175,
                [...]
                0.00657121744,
                0.013023301
              ]
            },
            {
              "startOffsetSec": 10,
              "endOffsetSec": 20,
              "embedding": [
                -0.0104404651,
                0.0357737206,
                [...]
                0.00509833824,
                0.0131902946
              ]
            },
            {
              "startOffsetSec": 20,
              "embedding": [
                -0.0113538112,
                0.0305239167,
                [...]
                -0.00195809244,
                0.00941874553
              ],
              "endOffsetSec": 30
            },
            {
              "embedding": [
                -0.00299320649,
                0.0322436653,
                [...]
                -0.00993082579,
                0.00968887936
              ],
              "startOffsetSec": 30,
              "endOffsetSec": 40
            },
            {
              "endOffsetSec": 50,
              "startOffsetSec": 40,
              "embedding": [
                -0.00591270532,
                0.0368893594,
                [...]
                -0.00219071587,
                0.0042470959
              ]
            },
            {
              "embedding": [
                -0.00458270218,
                0.0368121453,
                [...]
                -0.00317760976,
                0.00595594104
              ],
              "endOffsetSec": 59,
              "startOffsetSec": 50
            }
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
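
    The response above contains one videoEmbeddings entry per processed interval; note that the final segment ends at 59 seconds because the video is shorter than the configured endOffsetSec. A minimal sketch of flattening such a response into (start, end, vector) tuples with pure Python — the abbreviated response below is a stand-in for the full JSON above, whose real embedding vectors have 1,408 dimensions:

    ```python
    import json

    # Abbreviated stand-in for the API response shown above; real
    # embedding vectors have 1,408 dimensions.
    response_json = """
    {
      "predictions": [
        {
          "videoEmbeddings": [
            {"startOffsetSec": 0, "endOffsetSec": 10, "embedding": [-0.006, 0.039]},
            {"startOffsetSec": 10, "endOffsetSec": 20, "embedding": [-0.010, 0.035]}
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    """

    response = json.loads(response_json)

    # One prediction per request instance; each carries a list of
    # per-segment embeddings.
    segments = [
        (seg["startOffsetSec"], seg["endOffsetSec"], seg["embedding"])
        for seg in response["predictions"][0]["videoEmbeddings"]
    ]

    for start, end, vector in segments:
        print(f"[{start}, {end}) -> {len(vector)}-dim vector")
    ```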
    

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    import vertexai
    
    from vertexai.vision_models import MultiModalEmbeddingModel, Video
    from vertexai.vision_models import VideoSegmentConfig
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    
    embeddings = model.get_embeddings(
        video=Video.load_from_file(
            "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4"
        ),
        video_segment_config=VideoSegmentConfig(end_offset_sec=1),
    )
    
    # Video Embeddings are segmented based on the video_segment_config.
    print("Video Embeddings:")
    for video_embedding in embeddings.video_embeddings:
        print(
            f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
        )
        print(f"Embedding: {video_embedding.embedding}")
    
    # Example response:
    # Video Embeddings:
    # Video Segment: 0.0 - 1.0
    # Embedding: [-0.0206376351, 0.0123456789, ...]
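
    Once you have per-segment embeddings, a common follow-up is ranking the segments against a query embedding (for example, a text embedding of a search phrase). The dot product is a similarity signal rather than a calibrated probability, so rank results instead of applying a fixed threshold. A hypothetical sketch in pure Python, where the short vectors are placeholders for real 1,408-dimensional embeddings:

    ```python
    def dot(a, b):
        """Dot-product similarity between two embedding vectors."""
        return sum(x * y for x, y in zip(a, b))

    # Placeholder vectors; real embeddings have 1,408 dimensions.
    query_embedding = [0.1, 0.3, -0.2]  # e.g. a text embedding of a search phrase
    segment_embeddings = {
        "0-10s": [0.0, 0.1, 0.4],
        "10-20s": [0.2, 0.4, -0.1],
        "20-30s": [-0.3, 0.0, 0.2],
    }

    # Rank segments by similarity instead of thresholding the raw scores.
    ranked = sorted(
        segment_embeddings.items(),
        key=lambda item: dot(query_embedding, item[1]),
        reverse=True,
    )
    print(ranked[0][0])  # → 10-20s (segment most similar to the query)
    ```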
    

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    	"time"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForVideo shows how to use the multimodal model to generate embeddings for video input.
    func generateForVideo(w io.Writer, project, location string) error {
    	// location = "us-central1"
    
    	// The default context timeout may not be enough to process a video input.
    	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    	defer cancel()
    
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instances, err := structpb.NewValue(map[string]any{
    		"video": map[string]any{
    			// Video input can be provided either as a Google Cloud Storage URI or as base64-encoded
    			// bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4",
    			"videoSegmentConfig": map[string]any{
    				"startOffsetSec": 1,
    				"endOffsetSec":   5,
    			},
    		},
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instances},
    	}
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		VideoEmbeddings []struct {
    			Embedding      []float32 `json:"embedding"`
    			StartOffsetSec float64   `json:"startOffsetSec"`
    			EndOffsetSec   float64   `json:"endOffsetSec"`
    		} `json:"videoEmbeddings"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal json: %w", err)
    	}
    	// Get the embedding for our single video segment (`.videoEmbeddings` has one
    	// entry per processed segment).
    	videoEmbedding := instanceEmbeddings.VideoEmbeddings[0]
    
    	fmt.Fprintf(w, "Video embedding (seconds: %.f-%.f; length=%d): %v\n",
    		videoEmbedding.StartOffsetSec,
    		videoEmbedding.EndOffsetSec,
    		len(videoEmbedding.Embedding),
    		videoEmbedding.Embedding,
    	)
    	// Example response:
    	// Video embedding (seconds: 1-5; length=1408): [-0.016427778 0.032878537 -0.030755188 ... ]
    
    	return nil
    }
    

    Get image, text, and video embeddings

    Use the following samples to get embeddings for image, text, and video content.

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    The following sample uses image, text, and video data. You can use any combination of these data types in your request body.

    Additionally, this sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a base64-encoded string representation of the video.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • TEXT: The target text to get embeddings for. For example: a cat.
    • IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example: gs://my-bucket/embeddings/supermarket-img.png.

      You can also provide the image as a base64-encoded byte string:

      [...]
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMAGE"
      }
      [...]
      
    • VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example: gs://my-bucket/embeddings/supermarket-video.mp4.

      You can also provide the video as a base64-encoded byte string:

      [...]
      "video": {
        "bytesBase64Encoded": "B64_ENCODED_VIDEO"
      }
      [...]
      
    • videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS). Optional. The specific video segments (in seconds) that embeddings are generated for.

      For example:

      [...]
      "videoSegmentConfig": {
        "startOffsetSec": 10,
        "endOffsetSec": 60,
        "intervalSec": 10
      }
      [...]

      Using this configuration specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls within the Standard video embedding mode, and users are charged at the Standard mode pricing rate.

      If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls within the Essential video embedding mode, and users are charged at the Essential mode pricing rate.
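
    To illustrate how videoSegmentConfig partitions a video, here is a hypothetical helper that reproduces the interval arithmetic described above. This is illustrative only — the service performs segmentation server-side — and it assumes, consistent with the 59-second sample response, that the last segment is clipped when the video ends before endOffsetSec:

    ```python
    def video_segments(start_offset_sec, end_offset_sec, interval_sec, video_length_sec):
        """Compute the [start, end) intervals that embeddings are generated for.

        Illustrative reimplementation of the server-side segmentation
        described above; the final segment is clipped to the video's
        actual length.
        """
        end = min(end_offset_sec, video_length_sec)
        segments = []
        start = start_offset_sec
        while start < end:
            segments.append((start, min(start + interval_sec, end)))
            start += interval_sec
        return segments

    # The example configuration above, applied to a 59-second video:
    print(video_segments(10, 60, 10, 59))
    # → [(10, 20), (20, 30), (30, 40), (40, 50), (50, 59)]
    ```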

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    Request JSON body:

    {
      "instances": [
        {
          "text": "TEXT",
          "image": {
            "gcsUri": "IMAGE_URI"
          },
          "video": {
            "gcsUri": "VIDEO_URI",
            "videoSegmentConfig": {
              "startOffsetSec": START_SECOND,
              "endOffsetSec": END_SECOND,
              "intervalSec": INTERVAL_SECONDS
            }
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then execute the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding that the model returns is a vector of 1,408 floating point values. The following sample response is truncated to save space.
    {
      "predictions": [
        {
          "textEmbedding": [
            0.0105433334,
            -0.00302835181,
            0.00656806398,
            0.00603460241,
            [...]
            0.00445805816,
            0.0139605571,
            -0.00170318608,
            -0.00490092579
          ],
          "videoEmbeddings": [
            {
              "startOffsetSec": 0,
              "endOffsetSec": 7,
              "embedding": [
                -0.00673126569,
                0.0248149596,
                0.0128901172,
                0.0107588246,
                [...]
                -0.00180952181,
                -0.0054573305,
                0.0117037306,
                0.0169312079
              ]
            }
          ],
          "imageEmbedding": [
            -0.00728622358,
            0.031021487,
            -0.00206603738,
            0.0273937676,
            [...]
            -0.00204976718,
            0.00321615417,
            0.0121978866,
            0.0193375275
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    import vertexai
    
    from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video
    from vertexai.vision_models import VideoSegmentConfig
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    video = Video.load_from_file(
        "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        video=video,
        video_segment_config=VideoSegmentConfig(end_offset_sec=1),
        contextual_text="Cars on Highway",
    )
    
    print(f"Image Embedding: {embeddings.image_embedding}")
    
    # Video Embeddings are segmented based on the video_segment_config.
    print("Video Embeddings:")
    for video_embedding in embeddings.video_embeddings:
        print(
            f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
        )
        print(f"Embedding: {video_embedding.embedding}")
    
    print(f"Text Embedding: {embeddings.text_embedding}")
    # Example response:
    # Image Embedding: [-0.0123144267, 0.0727186054, 0.000201397663, ...]
    # Video Embeddings:
    # Video Segment: 0.0 - 1.0
    # Embedding: [-0.0206376351, 0.0345234685, ...]
    # Text Embedding: [-0.0207006838, -0.00251058186, ...]
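
    Because image, text, and video embeddings share the same semantic space, the vectors printed above can be compared directly across modalities. A hypothetical sketch of such a cross-modal comparison using cosine similarity, with short placeholder vectors standing in for the 1,408-dimensional embeddings:

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine similarity between two embedding vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Placeholder vectors; real embeddings from the response above have
    # 1,408 dimensions each.
    text_embedding = [0.2, 0.1, 0.4]
    image_embedding = [0.25, 0.05, 0.35]
    video_embedding = [-0.4, 0.3, -0.1]

    # Embeddings from different modalities live in the same space, so
    # you can check which visual input better matches the text.
    for name, vec in [("image", image_embedding), ("video", video_embedding)]:
        print(name, round(cosine_similarity(text_embedding, vec), 3))
    ```

    As with the dot product, these scores are relative similarity signals, so prefer ranking candidates over comparing scores against a fixed cutoff.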
    

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    	"time"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForImageTextAndVideo shows how to use the multimodal model to generate embeddings for
    // image, text and video data.
    func generateForImageTextAndVideo(w io.Writer, project, location string) error {
    	// location = "us-central1"
    
    	// The default context timeout may not be enough to process a video input.
    	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    	defer cancel()
    
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"text": "Domestic cats in natural conditions",
    		"image": map[string]any{
    			// Image and video inputs can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg",
    		},
    		"video": map[string]any{
    			"gcsUri": "gs://cloud-samples-data/video/cat.mp4",
    		},
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instance},
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    		VideoEmbeddings []struct {
    			Embedding      []float32 `json:"embedding"`
    			StartOffsetSec float64   `json:"startOffsetSec"`
    			EndOffsetSec   float64   `json:"endOffsetSec"`
    		} `json:"videoEmbeddings"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    	// Get the embedding for our single video segment (`.videoEmbeddings` has one
    	// entry per processed segment).
    	videoEmbedding := instanceEmbeddings.VideoEmbeddings[0].Embedding
    
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Video embedding (length=%d): %v\n", len(videoEmbedding), videoEmbedding)
    	// Example response:
    	// Image embedding (length=1408): [-0.01558477 0.0258355 0.016342038 ... ]
    	// Text embedding (length=1408): [-0.005894961 0.008349559 0.015355394 ... ]
    	// Video embedding (length=1408): [-0.018867437 0.013997682 0.0012682161 ... ]
    
    	return nil
    }
    

    What's next