VisionReasoningModelInstance

Vision reasoning input format for the large vision model. The model supports only one instance at a time.

Fields
prompt string

The text prompt that guides the response in question answering (QA).

mask object (Image)

If mask is provided, text responses are generated from the masked area.

content Union type
content can be only one of the following:
image object (Image)

The image bytes or Cloud Storage URI to make the prediction on.

video object (Video)

The video bytes or Cloud Storage URI to make the prediction on.

JSON representation
{
  "prompt": string,
  "mask": {
    object (Image)
  },

  // content
  "image": {
    object (Image)
  },
  "video": {
    object (Video)
  }
  // Union type
}
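Because content is a union, an instance sets either image or video, never both. The following minimal Python sketch assembles one instance as a plain dictionary. build_instance is a hypothetical helper, and the bucket path is a placeholder. Two points are assumptions rather than statements from this reference: that content is required (exactly one of image or video), and that the instance is wrapped in an "instances" list, following the usual Vertex AI predict request shape.

```python
import json
from typing import Any, Dict, Optional


def build_instance(
    prompt: str,
    image: Optional[Dict[str, Any]] = None,
    video: Optional[Dict[str, Any]] = None,
    mask: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble one VisionReasoningModelInstance (hypothetical helper)."""
    # content is a union type. Assumption: exactly one of image/video is required.
    if (image is None) == (video is None):
        raise ValueError("set exactly one of image or video")
    instance: Dict[str, Any] = {"prompt": prompt}
    if mask is not None:
        # mask is itself an Image object; answers come from the masked area.
        instance["mask"] = mask
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    return instance


instance = build_instance(
    prompt="What is on the table?",
    image={"gcsUri": "gs://my-bucket/kitchen.png", "mimeType": "image/png"},
)

# Assumption: the predict body wraps instances in a list. This model accepts
# only one instance at a time, so the list holds a single element.
request_body = {"instances": [instance]}
print(json.dumps(request_body, indent=2))
```

Passing both image and video, or neither, raises ValueError before any request is built. This surfaces union violations on the client side rather than relying on the service to reject them.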

Image

Fields
mimeType string

Optional. The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png
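Because only JPEG and PNG are accepted, a client may want to derive mimeType from a file name and reject other files before sending. The sketch below does this with a hypothetical extension table. Mapping by file extension is a client-side convenience chosen for illustration, not behavior documented here.

```python
import os
from typing import Optional

# Hypothetical lookup table covering the two supported MIME types.
_EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def image_mime_type_for(filename: str) -> Optional[str]:
    """Return image/jpeg or image/png for a supported file name, else None."""
    _root, ext = os.path.splitext(filename)
    # Compare case-insensitively so "photo.PNG" and "photo.png" match.
    return _EXTENSION_TO_MIME.get(ext.lower())


print(image_mime_type_for("kitchen.PNG"))  # image/png
print(image_mime_type_for("kitchen.gif"))  # None
```

A None result means the file is not one of the listed types. Omitting the optional mimeType field does not make such a file supported.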

data Union type
The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytesBase64Encoded string

Base64-encoded string of the image bytes.

gcsUri string

Cloud Storage URI of the image in the user's project.

JSON representation
{
  "mimeType": string,

  // data
  "bytesBase64Encoded": string,
  "gcsUri": string
  // Union type
}
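The data union means that an Image carries either inline bytes or a Cloud Storage reference, never both. The following minimal Python sketch builds each variant. The helper names are hypothetical, and the check that rejects URIs not starting with gs:// is an assumed client-side safeguard, not documented validation.

```python
import base64
from typing import Dict, Optional

SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def _with_mime_type(image: Dict[str, str], mime_type: Optional[str]) -> Dict[str, str]:
    # mimeType is optional; when present it must be one of the supported types.
    if mime_type is not None:
        if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            raise ValueError(f"unsupported image MIME type: {mime_type}")
        image["mimeType"] = mime_type
    return image


def image_from_bytes(raw: bytes, mime_type: Optional[str] = None) -> Dict[str, str]:
    # Inline variant: the raw bytes are sent as a Base64 text string.
    encoded = base64.b64encode(raw).decode("ascii")
    return _with_mime_type({"bytesBase64Encoded": encoded}, mime_type)


def image_from_gcs(gcs_uri: str, mime_type: Optional[str] = None) -> Dict[str, str]:
    # Reference variant: only the URI is sent. Assumed check for a gs:// prefix.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    return _with_mime_type({"gcsUri": gcs_uri}, mime_type)


print(image_from_bytes(b"\x89PNG", "image/png"))
# {'bytesBase64Encoded': 'iVBORw==', 'mimeType': 'image/png'}
```

Each helper sets exactly one member of the data union, so the resulting object cannot carry both bytesBase64Encoded and gcsUri.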

Video

Fields
data Union type
The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytesBase64Encoded string

Base64-encoded string of the video bytes.

gcsUri string

Cloud Storage URI of the video.

JSON representation
{

  // data
  "bytesBase64Encoded": string,
  "gcsUri": string
  // Union type
}
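Video uses the same data union as Image but has no mimeType field. The following minimal Python sketch uses hypothetical helper names. The bucket URI in the usage lines is a placeholder.

```python
import base64
from typing import Dict


def video_from_bytes(raw: bytes) -> Dict[str, str]:
    # Inline variant: Base64-encode the raw video bytes into a text string.
    return {"bytesBase64Encoded": base64.b64encode(raw).decode("ascii")}


def video_from_gcs(gcs_uri: str) -> Dict[str, str]:
    # Reference variant: only the Cloud Storage URI is sent.
    return {"gcsUri": gcs_uri}


def video_from_file(path: str) -> Dict[str, str]:
    # Convenience wrapper: read a local file and inline its bytes.
    with open(path, "rb") as f:
        return video_from_bytes(f.read())


print(video_from_bytes(b"abc"))  # {'bytesBase64Encoded': 'YWJj'}
print(video_from_gcs("gs://my-bucket/clip.mp4"))  # {'gcsUri': 'gs://my-bucket/clip.mp4'}
```

Base64 encoding increases the payload by about one third, so for large videos the gcsUri form keeps the request body small.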