VisionReasoningModelInstance

Input format for vision reasoning with a large vision model. The model supports only one instance at a time.
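Because the model accepts only one instance at a time, a client can check the count before sending a request. The following is a minimal Python sketch. It assumes the request uses the common Vertex AI predict body shape with an `instances` list. `build_predict_body` and the `gs://my-bucket/cat.png` URI are hypothetical and not part of any Google library.

```python
import json
from typing import Any


def build_predict_body(instances: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap instances in a predict request body (hypothetical helper).

    Assumption: the body has the shape {"instances": [...]}.
    The model supports only one instance at a time, so any other count is rejected.
    """
    if len(instances) != 1:
        raise ValueError(f"expected exactly 1 instance, got {len(instances)}")
    return {"instances": instances}


if __name__ == "__main__":
    # Hypothetical bucket and object name, for illustration only.
    body = build_predict_body(
        [{"prompt": "What is in this image?",
          "image": {"gcsUri": "gs://my-bucket/cat.png"}}]
    )
    print(json.dumps(body))
```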

Fields

prompt string

The text prompt that guides the response in question answering (QA).

mask object (Image)

If a mask is provided, text responses are generated from the masked area.

content Union type

content can be only one of the following:

image object (Image)

The image bytes or Cloud Storage URI to make the prediction on.

video object (Video)

The video bytes or Cloud Storage URI to make the prediction on.

JSON representation
{
  "prompt": string,
  "mask": {
    object (Image)
  },

  // Union field content can be only one of the following:
  "image": {
    object (Image)
  },
  "video": {
    object (Video)
  }
  // End of union field content.
}
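The `content` union means an instance carries either `image` or `video`, never both. Below is a minimal Python sketch of a client-side check. `make_instance` is a hypothetical helper, not part of any Google library. It always includes `prompt`, because the reference above does not state whether that field is optional.

```python
from typing import Any, Optional


def make_instance(
    prompt: str,
    image: Optional[dict[str, Any]] = None,
    video: Optional[dict[str, Any]] = None,
    mask: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a VisionReasoningModelInstance dict (hypothetical helper).

    Enforces the `content` union: exactly one of `image` or `video` is set.
    """
    if (image is None) == (video is None):
        raise ValueError("set exactly one of 'image' or 'video'")
    instance: dict[str, Any] = {"prompt": prompt}
    if mask is not None:
        # Optional Image object; responses are generated from the masked area.
        instance["mask"] = mask
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    return instance
```

Raising on both the "neither" and "both" cases mirrors the union semantics, so a malformed instance fails locally instead of at the API.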

Image

Fields

mimeType string

Optional. The MIME type of the image content. Only the following MIME types are supported:

- image/jpeg
- image/png
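Since only JPEG and PNG are supported, a client may want to infer `mimeType` from the file contents before sending it. The sketch below checks the standard PNG and JPEG signature bytes. `sniff_image_mime_type` is a hypothetical helper. A production client might prefer a dedicated image library.

```python
from typing import Optional

# Leading signature ("magic") bytes of the two supported formats.
_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
_JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_mime_type(data: bytes) -> Optional[str]:
    """Return "image/png" or "image/jpeg" from magic bytes, else None."""
    if data.startswith(_PNG_SIGNATURE):
        return "image/png"
    if data.startswith(_JPEG_SIGNATURE):
        return "image/jpeg"
    # Any other format is unsupported by this input type.
    return None
```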

data Union type

The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:

bytesBase64Encoded string

Base64-encoded string of the image bytes.

gcsUri string

Cloud Storage URI representing the image in the user's project.

JSON representation
{
  "mimeType": string,

  // Union field data can be only one of the following:
  "bytesBase64Encoded": string,
  "gcsUri": string
  // End of union field data.
}
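The Image object combines an optional `mimeType` with a `data` union. Here is a minimal Python sketch that builds one. It assumes standard (not URL-safe) Base64 for `bytesBase64Encoded`. `make_image` is a hypothetical helper.

```python
import base64
from typing import Any, Optional

SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(
    data: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> dict[str, Any]:
    """Build an Image object dict (hypothetical helper).

    Enforces the `data` union: exactly one of raw bytes or a Cloud Storage URI.
    """
    if (data is None) == (gcs_uri is None):
        raise ValueError("set exactly one of 'data' or 'gcs_uri'")
    image: dict[str, Any] = {}
    if mime_type is not None:
        if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            raise ValueError(f"unsupported MIME type: {mime_type}")
        image["mimeType"] = mime_type
    if data is not None:
        # JSON cannot carry raw bytes, so encode them as a standard Base64 string.
        image["bytesBase64Encoded"] = base64.b64encode(data).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    return image
```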

Video

Fields

data Union type

The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:

bytesBase64Encoded string

Base64-encoded string of the video bytes.

gcsUri string

Cloud Storage URI representing the video.

JSON representation
{
  // Union field data can be only one of the following:
  "bytesBase64Encoded": string,
  "gcsUri": string
  // End of union field data.
}
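The Video object has the same `data` union as Image but, as listed above, no `mimeType` field. The sketch below mirrors the image case under the same standard-Base64 assumption. `make_video` is a hypothetical helper.

```python
import base64
from typing import Any, Optional


def make_video(data: Optional[bytes] = None, gcs_uri: Optional[str] = None) -> dict[str, Any]:
    """Build a Video object dict (hypothetical helper).

    Enforces the `data` union: exactly one of raw bytes or a Cloud Storage URI.
    No mimeType is set, because the Video schema does not list one.
    """
    if (data is None) == (gcs_uri is None):
        raise ValueError("set exactly one of 'data' or 'gcs_uri'")
    if data is not None:
        # Encode raw video bytes as a standard Base64 string for JSON transport.
        return {"bytesBase64Encoded": base64.b64encode(data).decode("ascii")}
    return {"gcsUri": gcs_uri}
```

Inline Base64 grows the payload by roughly a third, so for large videos the `gcsUri` branch is usually the more practical choice.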