Get image descriptions using visual captioning

Note: Imagen versions 1 and 2 are deprecated as of June 24, 2025. The Imagen models imagegeneration@002, imagegeneration@005, and imagegeneration@006 will be removed on September 24, 2025. For details about migrating to Imagen 3, see Migrate to Imagen 3.

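Visual captioning lets you generate a relevant description for an image. You can use this information for a variety of purposes:

Get detailed metadata about images for storage and search.
Generate automated captions to support accessibility use cases.
Get quick descriptions of products and visual assets.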

Image source: Santhosh Kumar on Unsplash (cropped)

Caption (short-form): a blue shirt with white polka dots hanging on a hanger

Supported languages

Visual captioning is available in the following languages:

English (en)
French (fr)
German (de)
Italian (it)
Spanish (es)

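Performance and limitations

The following limits apply when you use this model:

Limit	Value
Maximum number of API requests (short-form) per minute per project	500
Maximum number of tokens returned in a response (short-form)	64 tokens
Maximum number of tokens accepted in a request (short-form VQA only)	80 tokens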
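A client that sends many captioning requests may want to pace itself below the per-minute quota in the table above. The following is a minimal sketch of a sliding-window limiter, not part of any Vertex AI library. The class name `MinuteRateLimiter` and its parameters are hypothetical, and the default of 500 calls per 60 seconds is taken from the table.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical helper: allow at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=500, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._calls = deque()    # timestamps of calls inside the window

    def acquire(self):
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Discard timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Call `acquire()` immediately before each request. Because the quota is per project, several processes that share a project would each need a smaller `max_calls`.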

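The following service latency estimates apply when you use this model. These values are illustrative and are not a promise of service:

Latency	Value
API requests (short-form)	1.5 seconds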

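Locations

A location is a region that you can specify in a request to control where data is stored at rest. For a list of available regions, see Generative AI on Vertex AI locations.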

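Responsible AI safety filtering

The image captioning and Visual Question Answering (VQA) feature models don't support user-configurable safety filters. However, overall Imagen safety filtering is applied to the following data:

User input
Model output

As a result, your output may differ from the sample output if Imagen applies these safety filters. Consider the following examples.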

Filtered input

If the input is filtered, the response is similar to the following:

{
  "error": {
    "code": 400,
    "message": "Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.DebugInfo",
        "detail": "[ORIGINAL ERROR] generic::invalid_argument: Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394 [google.rpc.error_details_ext] { message: \"Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394\" }"
      }
    ]
  }
}
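A caller may want to tell a safety block apart from other 400 errors. The sketch below is a heuristic based only on the sample error above: it assumes the parsed JSON body is a `dict` and that a blocked request carries code 400 with the word "blocked" in its message. The function name `is_blocked_by_safety_filter` is hypothetical, and the message text is not a documented contract, so treat the match as a best guess.

```python
def is_blocked_by_safety_filter(body: dict) -> bool:
    """Heuristic: True if the error body looks like the safety-block sample."""
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    message = str(error.get("message", "")).lower()
    # Assumption: a safety block is a 400 whose message mentions "blocked".
    return error.get("code") == 400 and "blocked" in message
```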

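Filtered output

If the number of responses returned is less than the sample count you specify, the missing responses were filtered by Responsible AI. For example, the following is a response to a request with "sampleCount": 2, where one of the responses was filtered out: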

{
  "predictions": [
    "cappuccino"
  ]
}

If all of the output is filtered, the response is an empty object similar to the following:

{}
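The two cases above can be handled with one check: compare the number of returned predictions with the requested sample count. This is a minimal sketch, assuming the response has already been parsed into a `dict`. The function name `count_filtered` is hypothetical.

```python
def count_filtered(response: dict, sample_count: int) -> int:
    """Return how many requested captions are missing from the response."""
    # An empty object ({}) has no "predictions" key, so default to an empty list.
    predictions = response.get("predictions", [])
    return max(sample_count - len(predictions), 0)
```

For the partial response shown above, `count_filtered({"predictions": ["cappuccino"]}, 2)` returns 1. For the empty object it returns the full sample count.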

Get short-form image captions

Use the following samples to generate short-form captions for an image.

REST

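For more information about imagetext model requests, see the imagetext model API reference.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
B64_IMAGE: The image to get captions for. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of image captions you want to generate. Accepted integer values: 1-3.
LANGUAGE_CODE: One of the supported language codes. Supported languages:
- English (en)
- French (fr)
- German (de)
- Italian (it)
- Spanish (es)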

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "image": {
          "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT,
    "language": "LANGUAGE_CODE"
  }
}
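The request body above can also be assembled programmatically. The following sketch base64-encodes raw image bytes and validates the values against the limits listed earlier (sample count 1-3, the five supported language codes, 10 MB size limit). The function name `build_caption_request` is hypothetical. The sketch assumes the size limit applies to the raw bytes and that 10 MB means 10 * 1024 * 1024 bytes, neither of which this page states.

```python
import base64
import json

SUPPORTED_LANGUAGES = {"en", "fr", "de", "it", "es"}
# Assumption: the 10 MB limit is measured on raw bytes, in binary megabytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def build_caption_request(image_bytes: bytes, sample_count: int = 1,
                          language: str = "en") -> dict:
    """Build the JSON body for an imagetext:predict captioning request."""
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB size limit")
    if not 1 <= sample_count <= 3:
        raise ValueError("sample_count must be an integer from 1 to 3")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }


if __name__ == "__main__":
    # Placeholder bytes for illustration only; use real image bytes in practice.
    print(json.dumps(build_caption_request(b"abc", 2, "en"), indent=2))
```

Writing the returned dictionary to request.json with `json.dump` produces a file that the curl and PowerShell commands in this section can send.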

To send your request, choose one of these options:

curl

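Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: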

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell

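Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: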

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content

The following sample responses are for a request with "sampleCount": 2. Each response returns two prediction strings.

English (en):

{
  "predictions": [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}

Spanish (es):

{
  "predictions": [
    "una taza de café junto a un plato de pastel de chocolate",
    "una taza de café con una forma de corazón en la espuma"
  ]
}

Python

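Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

In this sample, you use the load_from_file method to reference a local file as the base Image to get captions for. After you specify the base image, you call the get_captions method on the ImageTextModel and print the output.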


import vertexai
from vertexai.preview.vision_models import Image, ImageTextModel

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# input_file = "input-image.png"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = ImageTextModel.from_pretrained("imagetext@001")
source_img = Image.load_from_file(location=input_file)

captions = model.get_captions(
    image=source_img,
    # Optional parameters
    language="en",
    number_of_results=2,
)

print(captions)
# Example response:
# ['a cat with green eyes looks up at the sky']

Node.js

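Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

In this sample, you call the predict method on a PredictionServiceClient. The service returns captions for the provided image.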

/**
 * TODO(developer): Update these variables before running the sample.
 */
const projectId = process.env.CAIP_PROJECT_ID;
const location = 'us-central1';
const inputFile = 'resources/cat.png';

const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function getShortFormImageCaptions() {
  const fs = require('fs');
  // Configure the parent resource
  const endpoint = `projects/${projectId}/locations/${location}/publishers/google/models/imagetext@001`;

  const imageFile = fs.readFileSync(inputFile);
  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const instance = {
    image: {
      bytesBase64Encoded: encodedImage,
    },
  };
  const instanceValue = helpers.toValue(instance);
  const instances = [instanceValue];

  const parameter = {
    // Optional parameters
    language: 'en',
    sampleCount: 2,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  const predictions = response.predictions;
  if (predictions.length === 0) {
    console.log(
      'No captions were generated. Check the request parameters and image.'
    );
  } else {
    predictions.forEach(prediction => {
      console.log(prediction.stringValue);
    });
  }
}
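// Top-level await is not available in a CommonJS script (this file uses require),
// so run the async function and report any failure instead.
getShortFormImageCaptions().catch(err => {
  console.error(err);
  process.exitCode = 1;
});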

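Use parameters for image captioning

When you get image captions, you can set several parameters depending on your use case.

Number of results

Use the number of results parameter to limit the number of captions returned for each request you send. For more information, see the imagetext (image captioning) model API reference.

Seed number

A seed is a number that you add to a request to make the generated captions deterministic. Adding a seed number to your request ensures that you get the same predictions (captions) each time. However, the captions aren't necessarily returned in the same order. For more information, see the imagetext (image captioning) model API reference.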
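Because a seeded request can return the same captions in a different order, comparing two runs should ignore order. The sketch below adds a seed to a REST `parameters` object and compares two prediction lists as unordered collections. Both function names are hypothetical. The key name `"seed"` is an assumption, since this page does not show it; confirm the exact name in the imagetext model API reference.

```python
def with_seed(parameters: dict, seed: int) -> dict:
    """Return a copy of `parameters` with a seed added.

    Assumption: the REST parameter key is "seed".
    """
    updated = dict(parameters)
    updated["seed"] = seed
    return updated


def same_captions(first: list, second: list) -> bool:
    """Compare two prediction lists while ignoring the order of captions."""
    return sorted(first) == sorted(second)
```

For example, `with_seed({"sampleCount": 2, "language": "en"}, 42)` yields a parameters object that can replace the one in the request body shown earlier.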

What's next

Read articles about Imagen and other Generative AI on Vertex AI products:
