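Starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have not previously used them, including new projects. For details, see "Model versions and lifecycle".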


Visual Question Answering (VQA)

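Imagen for Captioning & VQA (imagetext) is the name of the model that supports image question answering. Imagen for Captioning & VQA answers a question about a given image, even if the model has never seen that image before.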

To explore this model in the console, see the "Imagen for Captioning & VQA" model card in Model Garden.


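Use cases

Common use cases for image question answering include:

- Letting users engage with visual content through question and answer.
- Letting customers engage with product images shown in retail apps and websites.
- Providing accessibility options for visually impaired users.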

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagetext:predict

Request body

{
  "instances": [
    {
      "prompt": string,
      "image": {
        // Union field can be only one of the following:
        "bytesBase64Encoded": string,
        "gcsUri": string,
        // End of list of possible types for union field.
        "mimeType": string
      }
    }
  ],
  "parameters": {
    "sampleCount": integer,
    "seed": integer
  }
}

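Use the following parameters for the Visual Q&A model imagetext. For more information, see "Use Visual Question Answering (VQA)".

Parameter	Description	Acceptable values
`instances`	An array containing the object with the prompt and image details to get information about.	array (1 image object allowed)
`prompt`	The question you want answered about the image.	string (80 tokens max)
`bytesBase64Encoded`	The image to get information about.	Base64-encoded image string (PNG or JPEG, 20 MB max)
`gcsUri`	The Cloud Storage URI of the image to get information about.	string URI of the image file in Cloud Storage (PNG or JPEG, 20 MB max)
`mimeType`	Optional. The MIME type of the image you specify.	string (`image/jpeg` or `image/png`)
`sampleCount`	The number of text strings to generate.	integer: 1 to 3
`seed`	Optional. The seed for the random number generator (RNG). If the RNG seed is the same for requests with the same inputs, the prediction results are the same.	integer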
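
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal Python sketch under stated assumptions, not part of the API: the helper name `build_vqa_request` and the `max_image_bytes` argument are hypothetical, the limits are copied from the table, and "20 MB" is read as 20 × 1024 × 1024 bytes. The 80-token prompt limit is not checked because the tokenizer is not described here.

```python
import base64

# Values copied from the parameter table; the helper itself is hypothetical.
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png"}
DEFAULT_MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" means 20 MiB


def build_vqa_request(prompt, image_bytes=None, gcs_uri=None, mime_type=None,
                      sample_count=1, seed=None,
                      max_image_bytes=DEFAULT_MAX_IMAGE_BYTES):
    """Build a request body dict shaped like the schema shown above."""
    # The image is a union field: exactly one source must be given.
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of image_bytes or gcs_uri")
    if not 1 <= sample_count <= 3:
        raise ValueError("sampleCount must be an integer from 1 to 3")
    if mime_type is not None and mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError("mimeType must be image/jpeg or image/png")

    image = {}
    if image_bytes is not None:
        if len(image_bytes) > max_image_bytes:
            raise ValueError("image exceeds the size limit")
        image["bytesBase64Encoded"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    if mime_type is not None:
        image["mimeType"] = mime_type

    parameters = {"sampleCount": sample_count}
    if seed is not None:
        parameters["seed"] = seed  # optional, so only sent when set

    return {"instances": [{"prompt": prompt, "image": image}],
            "parameters": parameters}
```

Raising `ValueError` early keeps malformed requests from reaching the service; whether the service applies exactly the same checks is not confirmed by this page.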

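Sample request

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see "Generative AI on Vertex AI locations".
VQA_PROMPT: The question you want answered about the image. For example:
- What color are these shoes?
- What type of sleeves does this shirt have?
B64_IMAGE: The image to get information about. The image must be specified as a Base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of answers to generate. Accepted integer values: 1 to 3.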

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "VQA_PROMPT",
      "image": {
          "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT
  }
}
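
Producing the B64_IMAGE value by hand is error-prone, so it can help to generate request.json from an image file. The following is a minimal Python sketch; the function name `write_request_file` and its arguments are hypothetical and only mirror the placeholders used in the body above.

```python
import base64
import json
from pathlib import Path


def write_request_file(image_path, vqa_prompt, response_count,
                       out_path="request.json"):
    """Encode an image file and write the JSON request body to out_path."""
    # B64_IMAGE: the raw file bytes, Base64-encoded as an ASCII string.
    b64_image = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    body = {
        "instances": [
            {"prompt": vqa_prompt, "image": {"bytesBase64Encoded": b64_image}}
        ],
        "parameters": {"sampleCount": response_count},
    }
    Path(out_path).write_text(json.dumps(body, indent=2), encoding="utf-8")
    return body
```

For example, `write_request_file("shoes.png", "What color are these shoes?", 2)` would write a request.json in the current directory, assuming a file named shoes.png exists there.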

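To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: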

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell

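Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: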

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2 and "prompt": "What is this?". The response returns two prediction string answers.

{
  "predictions": [
    "cappuccino",
    "coffee"
  ]
}
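
The curl and PowerShell commands above can also be expressed with the Python standard library. The sketch below only constructs the request object, using the same URL pattern and headers; the function names `endpoint_url` and `build_http_request` are hypothetical, and the access token is assumed to come from running gcloud auth print-access-token separately.

```python
import json
import urllib.request


def endpoint_url(project_id, location):
    """Return the imagetext:predict URL for a project and region."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def build_http_request(project_id, location, access_token, body):
    """Build (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        endpoint_url(project_id, location),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; building and sending are kept separate here so the request can be inspected first.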

Response body


{
  "predictions": [
    string
  ]
}

Response element	Description
`predictions`	List of text strings representing VQA answers, sorted by confidence.

Sample response

The following sample response is for a request with "sampleCount": 2 and "prompt": "What is this?". The response returns two prediction string answers.

{
  "predictions": [
    "cappuccino",
    "coffee"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}
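
Because `predictions` is described as sorted by confidence, the first entry can be treated as the preferred answer. The following is a minimal Python sketch of reading a response like the one above; the function name `top_answer` is hypothetical, and returning `None` for an empty or missing list is an assumption about how a caller might want to handle that case.

```python
import json


def top_answer(response_text):
    """Return the first prediction from a JSON response, or None if absent."""
    payload = json.loads(response_text)
    predictions = payload.get("predictions", [])
    # The list is documented as sorted by confidence, so index 0 comes first.
    return predictions[0] if predictions else None


sample = '{"predictions": ["cappuccino", "coffee"], "modelVersionId": "1"}'
print(top_answer(sample))  # prints "cappuccino"
```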
