# Image captions

| **Caution:** Starting on June 24, 2025, Imagen versions 1 and 2 are deprecated. Imagen models `imagegeneration@002`, `imagegeneration@005`, and `imagegeneration@006` will be removed on September 24, 2025. For more information about migrating to Imagen 3, see [Migrate to Imagen 3](/vertex-ai/generative-ai/docs/image/migrate-to-imagen-3).

`imagetext` is the name of the model that supports image captioning. `imagetext` generates a caption from an image you provide based on the language that you specify. The model supports the following languages: English (`en`), German (`de`), French (`fr`), Spanish (`es`), and Italian (`it`).
To explore this model in the console, see the `Image Captioning` model card in the Model Garden.

[View Imagen for Captioning & VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)

Use cases
---------

Some common use cases for image captioning include:

- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)
- Generate captions to describe products
- Integrate captioning with an app using the API to create new experiences
{"instances":[{"image":{// Union field can be only one of the following:"bytesBase64Encoded":string,"gcsUri":string,// End of list of possible types for union field."mimeType":string}}],"parameters":{"sampleCount":integer,"storageUri":string,"language":string,"seed":integer}}
Use the following parameters for the Imagen model `imagetext`. For more information, see [Get image descriptions using visual captioning](/vertex-ai/generative-ai/docs/image/image-captioning).

| Parameter | Description | Accepted values |
|---|---|---|
| `instances` | An array that contains the object with details of the image to get captions for. | array (1 image object allowed) |
| `bytesBase64Encoded` | The image to get captions for. | Base64-encoded image string (PNG or JPEG, 20 MB max) |
| `gcsUri` | The Cloud Storage URI of the image to get captions for. | String URI of the image file in Cloud Storage (PNG or JPEG, 20 MB max) |
| `mimeType` | Optional. The MIME type of the image that you specify. | string (`image/jpeg` or `image/png`) |
| `sampleCount` | The number of generated text strings. | Integer value: 1-3 |
| `seed` | Optional. The seed for the random number generator (RNG). If the RNG seed is the same for requests with the same inputs, the prediction results will be the same. | integer |
| `storageUri` | Optional. The Cloud Storage location to save the generated text responses to. | string |
| `language` | Optional. The language code of the generated captions. | string (one of the supported language codes: `en`, `de`, `fr`, `es`, `it`) |
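Taken together, the endpoint and the parameters above map to a short Python call. The following is a minimal sketch, not an official client sample: it assumes the `google-auth` package is installed, that Application Default Credentials are configured, and that the project ID and image path are replaced with your own values.

```
import base64

import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT_ID = "your-project-id"  # illustrative; replace with your project ID
LOCATION = "us-central1"

# Application Default Credentials; requires a prior
# `gcloud auth application-default login` (or equivalent).
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

with open("cake.png", "rb") as f:  # illustrative image path
    b64_image = base64.b64encode(f.read()).decode("utf-8")

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/imagetext:predict"
)

# Request two English captions for the image, mirroring the
# request body schema documented above.
body = {
    "instances": [{"image": {"bytesBase64Encoded": b64_image}}],
    "parameters": {"sampleCount": 2, "language": "en"},
}

response = session.post(url, json=body)
response.raise_for_status()
print(response.json()["predictions"])
```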
[[["Fácil de comprender","easyToUnderstand","thumb-up"],["Resolvió mi problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Información o código de muestra incorrectos","incorrectInformationOrSampleCode","thumb-down"],["Faltan la información o los ejemplos que necesito","missingTheInformationSamplesINeed","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Otro","otherDown","thumb-down"]],["Última actualización: 2025-09-04 (UTC)"],[],[],null,["# Image captions\n\n| **Caution:** Starting on June 24, 2025, Imagen versions 1 and 2 are deprecated. Imagen models `imagegeneration@002`, `imagegeneration@005`, and `imagegeneration@006` will be removed on September 24, 2025 . For more information about migrating to Imagen 3, see [Migrate to\n| Imagen 3](/vertex-ai/generative-ai/docs/image/migrate-to-imagen-3).\n\n\u003cbr /\u003e\n\n`imagetext` is the name of the model that supports image captioning. `imagetext`\ngenerates a caption from an image you provide based on the language that you\nspecify. The model supports the following languages: English (`en`), German\n(`de`), French (`fr`), Spanish (`es`) and Italian (`it`).\n\nTo explore this model in the console, see the `Image Captioning` model card in\nthe Model Garden.\n\n\n[View Imagen for Captioning \\& VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)\n\nUse cases\n---------\n\nSome common use cases for image captioning include:\n\n- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)\n- Generate captions to describe products\n- Integrate captioning with an app using the API to create new experiences\n\nHTTP request\n------------\n\n POST https://us-central1-aiplatform.googleapis.com/v1/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/us-central1/publishers/google/models/imagetext:predict\n\nRequest body\n------------\n\n {\n \"instances\": [\n {\n \"image\": {\n // Union field can be only one of the following:\n \"bytesBase64Encoded\": string,\n \"gcsUri\": string,\n // End of list of possible types for union field.\n \"mimeType\": string\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": integer,\n \"storageUri\": string,\n \"language\": string,\n \"seed\": integer\n }\n }\n\nUse the following parameters for the Imagen model `imagetext`.\nFor more information, see\n[Get image descriptions using visual captioning](/vertex-ai/generative-ai/docs/image/image-captioning).\n\nSample request\n--------------\n\n### REST\n\nTo test a text prompt by using the Vertex AI API, send a POST request to the\npublisher model endpoint.\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: Your Google Cloud [project ID](/resource-manager/docs/creating-managing-projects#identifiers).\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: Your project's region. For example, `us-central1`, `europe-west2`, or `asia-northeast3`. For a list of available regions, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations-genai).\n- \u003cvar translate=\"no\"\u003eB64_IMAGE\u003c/var\u003e: The image to get captions for. The image must be specified as a [base64-encoded](/vertex-ai/generative-ai/docs/image/base64-encode) byte string. 
Size limit: 10 MB.\n- \u003cvar translate=\"no\"\u003eRESPONSE_COUNT\u003c/var\u003e: The number of image captions you want to generate. Accepted integer values: 1-3.\n- \u003cvar translate=\"no\"\u003eLANGUAGE_CODE\u003c/var\u003e: One of the supported language codes. Languages supported:\n - English (`en`)\n - French (`fr`)\n - German (`de`)\n - Italian (`it`)\n - Spanish (`es`)\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"instances\": [\n {\n \"image\": {\n \"bytesBase64Encoded\": \"B64_IMAGE\"\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": RESPONSE_COUNT,\n \"language\": \"LANGUAGE_CODE\"\n }\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\" | Select-Object -Expand Content\n```\nThe following sample responses are for a request with `\"sampleCount\": 2`. The response returns two prediction strings.\n\n**English (`en`):** \n\n```\n{\n \"predictions\": [\n \"a yellow mug with a sheep on it sits next to a slice of cake\",\n \"a cup of coffee with a heart shaped latte art next to a slice of cake\"\n ],\n \"deployedModelId\": \"DEPLOYED_MODEL_ID\",\n \"model\": \"projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID\",\n \"modelDisplayName\": \"MODEL_DISPLAYNAME\",\n \"modelVersionId\": \"1\"\n}\n```\n\n**Spanish (`es`):**\n\n```\n{\n \"predictions\": [\n \"una taza de café junto a un plato de pastel de chocolate\",\n \"una taza de café con una forma de corazón en la espuma\"\n ]\n}\n```\n\n\u003cbr /\u003e\n\nResponse body\n-------------\n\n {\n \"predictions\": [ string ]\n }\n\nSample response\n---------------\n\n {\n \"predictions\": [\n \"text1\",\n \"text2\"\n ]\n }"]]
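If you prefer a client library over raw REST calls, the Vertex AI SDK for Python exposes captioning through its vision models module. The following is a hedged sketch based on the SDK's preview surface: the module path, the `imagetext@001` version string, and the local image path are assumptions to verify against the current SDK reference.

```
import vertexai
from vertexai.preview.vision_models import Image, ImageTextModel

# Illustrative values; replace with your own project and image.
vertexai.init(project="your-project-id", location="us-central1")

model = ImageTextModel.from_pretrained("imagetext@001")
source_image = Image.load_from_file(location="cake.png")

# Request two captions in English, mirroring the REST sample above.
captions = model.get_captions(
    image=source_image,
    number_of_results=2,
    language="en",
)
print(captions)
```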