Input format for vision reasoning with a large vision model. The model supports only one instance at a time.
prompt string
The text prompt for guiding the response in QA.
Text responses are generated from the masked area if a mask is provided.
Image
mimeType string
Optional. The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png
data Union type
The image bytes or Cloud Storage URI to make the prediction on.
data can be only one of the following:
bytesBase64Encoded string
Base64 encoded bytes string representing the image.
gcsUri string
Cloud Storage URI representing the image in the user's project.
| JSON representation |
|---|
| { "mimeType": string, // data "bytesBase64Encoded": string, "gcsUri": string // Union type } |
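The union constraint on data (exactly one of bytesBase64Encoded or gcsUri) and the MIME type restriction can be checked on the client before sending anything. The following is a minimal sketch, not an official client: the helper name `make_image` and its parameter names are hypothetical, and only the JSON field names come from the schema above.

```python
import base64
from typing import Optional

# MIME types listed as supported in the Image schema.
SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(
    image_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> dict:
    """Build an Image dict with exactly one member of the `data` union set.

    `make_image` is a hypothetical helper; it is not part of any SDK.
    """
    # The union allows only one of the two fields, so reject both or neither.
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("Provide exactly one of image_bytes or gcs_uri.")
    # mimeType is optional, but if given it must be a supported type.
    if mime_type is not None and mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")

    image: dict = {}
    if mime_type is not None:
        image["mimeType"] = mime_type
    if image_bytes is not None:
        # Raw bytes are sent as a Base64 encoded string.
        image["bytesBase64Encoded"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    return image
```

Calling `make_image(gcs_uri="gs://my-bucket/cat.png", mime_type="image/png")` (the bucket path is a placeholder) returns a dict containing only `mimeType` and `gcsUri`, matching the JSON representation.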
Video
data Union type
The video bytes or Cloud Storage URI to make the prediction on.
data can be only one of the following:
bytesBase64Encoded string
Base64 encoded bytes string representing the video.
gcsUri string
Cloud Storage URI representing the video.
| JSON representation |
|---|
| { // data "bytesBase64Encoded": string, "gcsUri": string // Union type } |
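To illustrate how a Video value and a prompt might be combined into a request carrying a single instance, here is a hedged sketch. The field names `prompt`, `bytesBase64Encoded`, and `gcsUri` come from the schema above; the `video` key inside the instance, the top-level `instances` wrapper, and the helper names are assumptions made for illustration and should be checked against the actual endpoint documentation.

```python
import base64
import json
from typing import Optional


def make_video(
    video_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
) -> dict:
    """Build a Video dict with exactly one member of the `data` union set.

    Hypothetical helper; not part of any SDK.
    """
    if (video_bytes is None) == (gcs_uri is None):
        raise ValueError("Provide exactly one of video_bytes or gcs_uri.")
    if video_bytes is not None:
        encoded = base64.b64encode(video_bytes).decode("ascii")
        return {"bytesBase64Encoded": encoded}
    return {"gcsUri": gcs_uri}


def make_request_body(prompt: str, video: dict) -> str:
    """Serialize a request body holding exactly one instance.

    Assumption: the "instances" wrapper and the "video" key are
    illustrative names, not confirmed by the schema text above.
    """
    instance = {"prompt": prompt, "video": video}
    # The model supports only one instance at a time, so the list has one item.
    return json.dumps({"instances": [instance]})
```

For example, `make_request_body("What is happening in this clip?", make_video(gcs_uri="gs://my-bucket/clip.mp4"))` (placeholder path) yields a JSON string with a one-element instance list.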