{"instances":[{"prompt":string,"image":{// Union field can be only one of the following:"bytesBase64Encoded":string,"gcsUri":string,// End of list of possible types for union field."mimeType":string}}],"parameters":{"sampleCount":integer,"seed":integer}}
시각적 Q&A 생성 모델 imagetext에 다음 매개변수를 사용합니다.
자세한 내용은 시각적 질문 답변(VQA) 사용을 참조하세요.
| Parameter | Description | Available values |
| --- | --- | --- |
| `instances` | An array that contains one object with the prompt and details of the image to get information about. | array (one image object allowed) |
| `prompt` | The question to be answered about the image. | string (80 tokens maximum) |
| `bytesBase64Encoded` | The image to get information about. | Base64-encoded image string (PNG or JPEG, 20 MB maximum) |
| `gcsUri` | The Cloud Storage URI of the image to get information about. | string URI of the image file in Cloud Storage (PNG or JPEG, 20 MB maximum) |
| `mimeType` | Optional. The MIME type of the image you specify. | string (`image/jpeg` or `image/png`) |
| `sampleCount` | The number of generated text strings. | integer value: 1-3 |
| `seed` | Optional. The seed for the random number generator (RNG). If the RNG seed is the same for requests with the same inputs, the prediction results are the same. | integer |
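To show these parameters in context, here is a minimal Python sketch that sends a predict request over REST. It assumes the `google-auth` and `requests` packages and Application Default Credentials; the project ID and Cloud Storage URI are placeholders:

```
import google.auth
import google.auth.transport.requests
import requests

PROJECT_ID = "my-project"  # placeholder: your Google Cloud project ID
LOCATION = "us-central1"
ENDPOINT = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/imagetext:predict"
)

# Obtain an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

body = {
    "instances": [
        {
            "prompt": "What is this?",
            "image": {"gcsUri": "gs://my-bucket/coffee.png"},  # placeholder URI
        }
    ],
    # sampleCount: 1-3 answers; seed makes results repeatable for identical inputs.
    "parameters": {"sampleCount": 2, "seed": 42},
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=body,
)
response.raise_for_status()
print(response.json()["predictions"])  # e.g. ["cappuccino", "coffee"]
```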
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Visual question and answering (VQA)\n\nImagen for Captioning \\& VQA (`imagetext`) is the name of the model that supports image question and\nanswering. Imagen for Captioning \\& VQA answers a question provided for a given image, even\nif it hasn't been seen before by the model.\n\nTo explore this model in the console, see the Imagen for Captioning \\& VQA model card in\nthe Model Garden.\n\n\n[View Imagen for Captioning \\& VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)\n\nUse cases\n---------\n\nSome common use cases for image question and answering include:\n\n- Empower users to engage with visual content with Q\\&A.\n- Enable customers to engage with product images shown on retail apps and websites.\n- Provide accessibility options for visually impaired users.\n\nHTTP request\n------------\n\n POST https://us-central1-aiplatform.googleapis.com/v1/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/us-central1/publishers/google/models/imagetext:predict\n\nRequest body\n------------\n\n {\n \"instances\": [\n {\n \"prompt\": string,\n \"image\": {\n // Union field can be only one of the following:\n \"bytesBase64Encoded\": string,\n \"gcsUri\": string,\n // End of list of possible types for union field.\n \"mimeType\": string\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": integer,\n \"seed\": integer\n }\n }\n\nUse the following parameters for the visual Q\\&A generation model `imagetext`.\nFor more information, see [Use Visual Question Answering (VQA)](/vertex-ai/generative-ai/docs/image/visual-question-answering).\n\nSample request\n--------------\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: Your Google Cloud [project ID](/resource-manager/docs/creating-managing-projects#identifiers).\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: Your project's region. For example, `us-central1`, `europe-west2`, or `asia-northeast3`. For a list of available regions, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations-genai).\n- \u003cvar translate=\"no\"\u003eVQA_PROMPT\u003c/var\u003e: The question you want to get answered about your image.\n - *What color is this shoe?*\n - *What type of sleeves are on the shirt?*\n- \u003cvar translate=\"no\"\u003eB64_IMAGE\u003c/var\u003e: The image to get captions for. The image must be specified as a [base64-encoded](/vertex-ai/generative-ai/docs/image/base64-encode) byte string. Size limit: 10 MB.\n- \u003cvar translate=\"no\"\u003eRESPONSE_COUNT\u003c/var\u003e: The number of answers you want to generate. 
Accepted integer values: 1-3.\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"instances\": [\n {\n \"prompt\": \"VQA_PROMPT\",\n \"image\": {\n \"bytesBase64Encoded\": \"B64_IMAGE\"\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": RESPONSE_COUNT\n }\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\" | Select-Object -Expand Content\n```\nThe following sample responses are for a request with `\"sampleCount\": 2` and `\"prompt\": \"What is this?\"`. The response returns two prediction string answers.\n\n```\n{\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ]\n}\n```\n\n\u003cbr /\u003e\n\nResponse body\n-------------\n\n\n {\n \"predictions\": [\n string\n ]\n }\n\nSample response\n---------------\n\nThe following sample responses is for a request with `\"sampleCount\": 2` and\n`\"prompt\": \"What is this?\"`. The response returns two prediction string answers. \n\n {\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ],\n \"deployedModelId\": \"DEPLOYED_MODEL_ID\",\n \"model\": \"projects/PROJECT_ID/locations/us-central1/models/MODEL_ID\",\n \"modelDisplayName\": \"MODEL_DISPLAYNAME\",\n \"modelVersionId\": \"1\"\n }"]]
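As an alternative to raw REST calls, the Vertex AI SDK for Python wraps the same model. The following is a sketch, assuming the `google-cloud-aiplatform` package is installed and that the project ID and image path (placeholders here) are replaced with your own; verify the `ImageTextModel` import path against your SDK version:

```
import vertexai
from vertexai.preview.vision_models import Image, ImageTextModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project ID

model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("coffee.png")  # hypothetical local image file

# number_of_results corresponds to sampleCount in the REST request (1-3).
answers = model.ask_question(
    image=image,
    question="What is this?",
    number_of_results=2,
)
print(answers)  # e.g. ["cappuccino", "coffee"]
```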