{"instances":[{"prompt":string,"image":{// Union field can be only one of the following:"bytesBase64Encoded":string,"gcsUri":string,// End of list of possible types for union field."mimeType":string}}],"parameters":{"sampleCount":integer,"seed":integer}}
[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Visual question and answering (VQA)\n\nImagen for Captioning \\& VQA (`imagetext`) is the name of the model that supports image question and\nanswering. Imagen for Captioning \\& VQA answers a question provided for a given image, even\nif it hasn't been seen before by the model.\n\nTo explore this model in the console, see the Imagen for Captioning \\& VQA model card in\nthe Model Garden.\n\n\n[View Imagen for Captioning \\& VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)\n\nUse cases\n---------\n\nSome common use cases for image question and answering include:\n\n- Empower users to engage with visual content with Q\\&A.\n- Enable customers to engage with product images shown on retail apps and websites.\n- Provide accessibility options for visually impaired users.\n\nHTTP request\n------------\n\n POST https://us-central1-aiplatform.googleapis.com/v1/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/us-central1/publishers/google/models/imagetext:predict\n\nRequest body\n------------\n\n {\n \"instances\": [\n {\n \"prompt\": string,\n \"image\": {\n // Union field can be only one of the following:\n \"bytesBase64Encoded\": string,\n \"gcsUri\": string,\n // End of list of possible types for union field.\n \"mimeType\": string\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": integer,\n \"seed\": integer\n }\n }\n\nUse the following parameters for the visual Q\\&A generation model `imagetext`.\nFor more information, see [Use Visual Question Answering (VQA)](/vertex-ai/generative-ai/docs/image/visual-question-answering).\n\nSample request\n--------------\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: Your Google Cloud [project ID](/resource-manager/docs/creating-managing-projects#identifiers).\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: Your project's region. For example, `us-central1`, `europe-west2`, or `asia-northeast3`. For a list of available regions, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations-genai).\n- \u003cvar translate=\"no\"\u003eVQA_PROMPT\u003c/var\u003e: The question you want to get answered about your image.\n - *What color is this shoe?*\n - *What type of sleeves are on the shirt?*\n- \u003cvar translate=\"no\"\u003eB64_IMAGE\u003c/var\u003e: The image to get captions for. The image must be specified as a [base64-encoded](/vertex-ai/generative-ai/docs/image/base64-encode) byte string. Size limit: 10 MB.\n- \u003cvar translate=\"no\"\u003eRESPONSE_COUNT\u003c/var\u003e: The number of answers you want to generate. Accepted integer values: 1-3.\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"instances\": [\n {\n \"prompt\": \"VQA_PROMPT\",\n \"image\": {\n \"bytesBase64Encoded\": \"B64_IMAGE\"\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": RESPONSE_COUNT\n }\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\" | Select-Object -Expand Content\n```\nThe following sample responses are for a request with `\"sampleCount\": 2` and `\"prompt\": \"What is this?\"`. The response returns two prediction string answers.\n\n```\n{\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ]\n}\n```\n\n\u003cbr /\u003e\n\nResponse body\n-------------\n\n\n {\n \"predictions\": [\n string\n ]\n }\n\nSample response\n---------------\n\nThe following sample responses is for a request with `\"sampleCount\": 2` and\n`\"prompt\": \"What is this?\"`. The response returns two prediction string answers. \n\n {\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ],\n \"deployedModelId\": \"DEPLOYED_MODEL_ID\",\n \"model\": \"projects/PROJECT_ID/locations/us-central1/models/MODEL_ID\",\n \"modelDisplayName\": \"MODEL_DISPLAYNAME\",\n \"modelVersionId\": \"1\"\n }"]]