The input format for the video generation model.
prompt
string
The text prompt for generating the videos.
image
object (Image)
An image to use as the first frame of the generated video. If an input image is provided, an input video is not supported.
video
object (Video)
An input video. If this field is provided, an input image is not supported. If a mask is provided along with the video, the video is edited using the mask. Otherwise, the video is extended by the given duration.
lastFrame
object (Image)
An image to use as the last frame of the generated video. An input image must also be provided.
cameraControl
string
Camera motion to use in the generated videos. An input image must also be provided. Valid values are:
- fixed
- pan_left
- pan_right
- tilt_up
- tilt_down
- truck_left
- truck_right
- pedestal_up
- pedestal_down
- push_in
- pull_out
mask
object (Mask)
A mask to use in the generated videos.
referenceImages[]
object (ReferenceImage)
The images to use as references to generate the videos. If this field is provided, the text prompt field must also be provided. The image, video, and lastFrame fields are not supported. Each image must be associated with a type. Veo 2 supports up to 3 asset images or 1 style image.
JSON representation
{
  "prompt": string,
  "image": { object (Image) },
  "video": { object (Video) },
  "lastFrame": { object (Image) },
  "cameraControl": string,
  "mask": { object (Mask) },
  "referenceImages": [ { object (ReferenceImage) } ]
}
Image
Image input format for the prediction.
mimeType
string
The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png
data
Union type
data
can be only one of the following:
bytesBase64Encoded
string
Base64-encoded byte string representing the image.
gcsUri
string
The Google Cloud Storage location of the image.
JSON representation
{
  "mimeType": string,

  // data: union type, only one of the following:
  "bytesBase64Encoded": string,
  "gcsUri": string
}
Video
Video input format for the prediction.
mimeType
string
The MIME type of the video content. Only the following MIME types are supported:
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
data
Union type
data
can be only one of the following:
gcsUri
string
The Google Cloud Storage location of the video on which to perform the prediction.
bytesBase64Encoded
string
Base64-encoded byte string representing the video.
JSON representation
{
  "mimeType": string,

  // data: union type, only one of the following:
  "gcsUri": string,
  "bytesBase64Encoded": string
}
Mask
Mask input format for the prediction.
mimeType
string
Valid values:
- image/png
- image/jpeg
- image/webp
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
maskMode
string
Describes how the mask will be used. Inpainting masks must match the aspect ratio of the input video. Outpainting masks can be either 9:16 or 16:9. Available options are:
- insert: The image mask contains a masked rectangular region that is applied to the first frame of the input video. The object described in the prompt is inserted into this region and appears in subsequent frames.
- remove: The image mask is used to identify an object in the first video frame to track. This object is removed from the video.
- remove_static: The image mask is used to identify a region in the video. Objects in this region are removed.
- outpaint: The image mask contains a masked rectangular region where the input video is placed. The remaining area is generated.
Video masks are not supported.
data
Union type
data
can be only one of the following:
bytesBase64Encoded
string
Base64-encoded byte string representing the mask.
gcsUri
string
The Google Cloud Storage location of the mask.
JSON representation
{
  "mimeType": string,
  "maskMode": string,

  // data: union type, only one of the following:
  "bytesBase64Encoded": string,
  "gcsUri": string
}
ReferenceImage
Reference image input format for the prediction. A ReferenceImage is an image that is used to provide additional context for the video generation.
image
object (Image)
The image data to be used as the reference image.
referenceType
string
The type of the reference image, which defines how the reference image is used to generate the video. Supported types are:
- asset: The reference image provides assets for the generated video, such as the scene, an object, or a character.
- style: The aesthetics of the reference image, including colors, lighting, and texture, are used as the style of the generated video (for example, 'anime', 'photography', or 'origami').
JSON representation
{
  "image": { object (Image) },
  "referenceType": string
}
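To round this out, a reference-driven request supplies a prompt plus a list of ReferenceImage objects (Veo 2 allows up to 3 asset images or 1 style image). The top-level list field name referenceImages is assumed to match the JSON representation above; the prompt and URIs are illustrative only.

```python
# Hypothetical instance built around reference images; each entry pairs an
# Image object with a referenceType of "asset" or "style".
instance = {
    "prompt": "The character walks through a neon-lit market at night",
    "referenceImages": [  # assumed top-level field name for the list
        {
            "image": {"mimeType": "image/png", "gcsUri": "gs://my-bucket/character.png"},
            "referenceType": "asset",
        },
        {
            "image": {"mimeType": "image/png", "gcsUri": "gs://my-bucket/market.png"},
            "referenceType": "asset",
        },
    ],
}
```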