Package google.cloud.aiplatform.v1beta1.schema.predict.params

Index

AlternateInitConfig

Fields
enabled

bool

Whether to use AlternateInitConfig

max_inpainting_mask_area

float

Maximum inpainting area below which to consider using AlternateInitConfig

BackgroundSwapProcessingConfig

BackgroundSwapProcessingConfig for imagen-3.0-capability-001

Fields
blending_mode

string

The blending mode for background swap. The values can be one of: * alpha-blending

blending_factor

float

The blending factor for background swap blending. Valid range: [0, 1]. Default value: 0

ControlNetConfig

Fields
enable_control_net

bool

true if ControlNet is enabled.

conditions[]

ControlNetConditionConfig

Configurations for each condition.

original_image_weight

float

The weight for the original image. Valid range: [0, 1]. When set to 1.0, the output essentially copies the input image. When set to 0.0, the output does not respect the input image at all.

ControlNetConditionConfig

Fields
condition_name

string

Currently supported conditions: * cannyEdges * depth

condition_map_bytes_base64_encoded

bytes

When the condition map is provided by the user, we will not compute the condition map on our side.

condition_weight

float

The guidance weight for the condition signal. Valid range: [0, 1]. The higher the weight, the more the model respects the ControlNet condition. The default value is 1.0 if unspecified.

condition_max_t

float

The strength of the ControlNet's effect on each diffusion step. Valid range: [0, 1].
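As an illustration, the ControlNet fields above could be assembled into request parameters like the following sketch. The field names come from this reference; the camelCase key casing (the usual JSON mapping of proto fields) and all concrete values are assumptions, not documented defaults.

```python
# Sketch of a ControlNet configuration as a plain dict.
control_net_config = {
    "enableControlNet": True,
    "originalImageWeight": 0.3,  # [0, 1]; 1.0 ~ copy the input, 0.0 ~ ignore it
    "conditions": [
        {
            "conditionName": "cannyEdges",  # currently: cannyEdges or depth
            "conditionWeight": 1.0,         # [0, 1]; defaults to 1.0
            "conditionMaxT": 0.8,           # per-diffusion-step strength, [0, 1]
        }
    ],
}

# Range checks mirroring the documented constraints.
assert 0.0 <= control_net_config["originalImageWeight"] <= 1.0
for cond in control_net_config["conditions"]:
    assert cond["conditionName"] in ("cannyEdges", "depth")
    assert 0.0 <= cond["conditionWeight"] <= 1.0
```

Omitting condition_map_bytes_base64_encoded, as here, means the condition map is computed server-side.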

EditConfig

Fields
buffer_zones[]

BufferZone

Buffer zone, if provided, must be length 2.

base_guidance_scale[]

int32

Guidance scale: this controls strength of text guidance. If provided, must be a list of 4 integers representing values during 4 stages of diffusion [fine-grained,...,...,coarse].

enable_clamping

bool

Whether to enable clamping mode, which: * Enables the rest of the configurations in EditConfig. * Better preserves the unmasked area. * Skips the model's internal dilation so the client can fully control it.

base_steps

int32

Number of sampling steps.

base_gamma

float

Gamma: influences how much noise is added during sampling.

sr1_steps

int32

Number of sampling steps for sr1 stage.

sr2_steps

int32

Number of sampling steps for sr2 stage.

semantic_filter_config

SemanticFilterConfig

NOTE: For experimental use, not production-ready. Semantic Filter Config. This config reduces object hallucination on inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in Editing config.

experiment_use_servo_backend

bool

Experimental flag to use the servo backend.

edit_mode

string

The editing mode that describes the use case for editing. The values can be one of: * inpainting-remove * inpainting-insert * outpainting

alternate_init_config

AlternateInitConfig

Parameters for AlternateInitConfig

experimental_sr_version

string

Experimental flag for sr version.

experimental_base_version

string

Experimental flag for base version.

embedding_scale

float

Parameter to control embedding scale, range: [0, 1], default: 0.6.

enable_border_replicate_padding
(deprecated)

bool

Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding.

enable_post_processing_blend
(deprecated)

bool

Parameter to enable post-processing blending for masked editing.

outpainting_config

OutpaintingProcessingConfig

Outpainting processing config.

bgswap_config

BackgroundSwapProcessingConfig

Background swap processing config.

BufferZone

Fields
pixels

int32

The number of pixels for the mask to dilate.

diffusion_t

float

When during diffusion this pixel dilation takes effect, 1=start, 0=end.
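Putting the EditConfig fields together, a clamped outpainting request might look like this sketch. Only the field names and stated constraints come from the reference above; the camelCase key casing and the concrete values are assumptions.

```python
edit_config = {
    "editMode": "outpainting",           # inpainting-remove / inpainting-insert / outpainting
    "enableClamping": True,              # enables the rest of EditConfig
    "baseGuidanceScale": [6, 6, 9, 12],  # must be 4 ints: fine-grained -> coarse stages
    "baseSteps": 35,
    "sr1Steps": 20,
    "sr2Steps": 20,
    "bufferZones": [                     # if provided, must be length 2
        {"pixels": 8, "diffusionT": 1.0},   # larger dilation from the start of diffusion
        {"pixels": 2, "diffusionT": 0.5},   # smaller dilation from mid-diffusion onward
    ],
}

# Constraints stated in the reference.
assert len(edit_config["baseGuidanceScale"]) == 4
assert len(edit_config["bufferZones"]) == 2
```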

EditConfigV6

EditConfig for imagegeneration@006

Fields
buffer_zones[]

BufferZone

Buffer zone, if provided, must be length 2.

edit_mode

string

The editing mode that describes the use case for editing. The values can be one of: * inpainting-remove * inpainting-insert * outpainting * product-image

mask_dilation

float

Parameter to control mask dilation, range: [0, 1], default: 0.03.

guidance_scale

int32

Guidance scale: this controls strength of text guidance.

product_position

string

Product position: this controls the product position in the returned product editing image. The values can be one of: * reposition - the default behavior in the GPS pipeline * fixed - keeps the product in the same position as in the input image. This assumes the input image is square.

mask_mode

MaskMode

Automatic mask generation configuration.

base_steps

int32

Number of sampling steps for base model.

backend

string

The backend to use for the model. The values can be one of: * experimental * prod

semantic_filter_config

SemanticFilterConfig

Semantic Filter Config. This config reduces object hallucination on inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in Editing config.

alternate_init_config

AlternateInitConfig

Parameters for AlternateInitConfig

outpainting_config

OutpaintingProcessingConfig

Outpainting config.

BufferZone

BufferZone config.

Fields
pixels

int32

The number of pixels for the mask to dilate.

diffusion_t

float

When during diffusion this pixel dilation takes effect, 1=start, 0=end.
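For imagegeneration@006, the EditConfigV6 fields could be combined as in this sketch of a product-image edit with automatic semantic masking. The camelCase keys and the example class IDs are assumptions; valid class IDs depend on the Semantic Segmenter model.

```python
edit_config_v6 = {
    "editMode": "product-image",    # also: inpainting-remove / inpainting-insert / outpainting
    "productPosition": "fixed",     # "fixed" assumes a square input image
    "maskDilation": 0.03,           # [0, 1], default 0.03
    "guidanceScale": 21,
    "maskMode": {
        "maskType": "semantic",     # background / foreground / semantic
        "classes": [8, 15],         # hypothetical numeric class IDs;
                                    # only used when maskType == "semantic"
    },
}

assert 0.0 <= edit_config_v6["maskDilation"] <= 1.0
```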

EditMode

EditMode for imagen3capability.

Enums
EDIT_MODE_DEFAULT Default editing mode.
EDIT_MODE_INPAINT_REMOVAL Inpainting removal mode. Remove objects based on the mask given
EDIT_MODE_INPAINT_INSERTION Inpainting insertion mode. Insert objects based on the mask given
EDIT_MODE_OUTPAINT Outpainting mode. Expand the image based on the mask given
EDIT_MODE_CONTROLLED_EDITING Controlled editing mode. Pass a sketch or face mesh image to control the editing.
EDIT_MODE_STYLE Style editing mode. Pass a style image to define a generation style for the prompt
EDIT_MODE_BGSWAP Background swap mode. Pass a background image to swap the background of the image.
EDIT_MODE_PRODUCT_IMAGE Product image mode.

ExpansionConfig

ExpansionConfig fixes the one-sided expansion issue by adding padding to the image and mask in the backend server and cropping them out in post-processing.

Fields
top

int32

Number of pixels to expand the image and mask from the top. Value is an integer that has a minimum of 0 and a maximum of 500.

bottom

int32

Number of pixels to expand the image and mask from the bottom. Value is an integer that has a minimum of 0 and a maximum of 500.

left

int32

Number of pixels to expand the image and mask from the left. Value is an integer that has a minimum of 0 and a maximum of 500.

right

int32

Number of pixels to expand the image and mask from the right. Value is an integer that has a minimum of 0 and a maximum of 500.
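The per-side bounds above can be checked client-side before sending a request; this hypothetical helper mirrors the documented [0, 500] integer constraint.

```python
def validate_expansion_config(cfg):
    """Check that each side's padding is an integer in [0, 500],
    per the documented ExpansionConfig constraints. Missing sides
    default to 0 (no expansion)."""
    for side in ("top", "bottom", "left", "right"):
        value = cfg.get(side, 0)
        if not isinstance(value, int) or not 0 <= value <= 500:
            raise ValueError(f"{side} must be an integer in [0, 500], got {value!r}")
    return cfg

# Expand only to the left, e.g. to work around a one-sided expansion.
expansion_config = validate_expansion_config({"left": 200})
```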

GenSelfieConfig

Fields
per_example_seeds[]

int32

Initialization seed per generation sample. len(seeds) should be equal to sample_count.

identity_control

float

Parameter for identity control. Valid range: [0, 1.0] Default value: 0.9

structure_control

float

Parameter for structure control. Valid range: [0, 1.0] Default value: 1.0

experimental_base_version

string

The version for the base model.

skip_face_cropping

bool

Whether to skip detecting and cropping the face in the input image. Default value: false.

sampling_steps

int32

Number of sampling steps.

enable_sharpening

bool

Whether to enable image sharpening post-processing.

detection_score_threshold

float

The threshold for the face detection model. Images with a face detection score below this threshold will be rejected.

face_selection_criteria

string

The criteria to select the face for Gen Selfie. Accepted values: * LARGEST * MOST_CONFIDENT

style

string

The style for the generated image. Accepted values: * watercolor * hand-drawing * illustration * 3d-character
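The main constraint in GenSelfieConfig is the pairing between per_example_seeds and the sample count, sketched below. Key casing and the concrete seed/control values are assumptions.

```python
# sampleCount would be a sibling request parameter; its pairing with
# perExampleSeeds is the constraint being illustrated.
sample_count = 3
gen_selfie_config = {
    "perExampleSeeds": [11, 22, 33],     # one initialization seed per sample
    "identityControl": 0.9,              # [0, 1.0], default 0.9
    "structureControl": 1.0,             # [0, 1.0], default 1.0
    "faceSelectionCriteria": "LARGEST",  # or MOST_CONFIDENT
    "style": "watercolor",               # watercolor / hand-drawing / illustration / 3d-character
}

# len(seeds) must equal sample_count, per the reference.
assert len(gen_selfie_config["perExampleSeeds"]) == sample_count
```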

ImageOutputOptions

Fields
mime_type

string

Currently supported: -- image/jpeg -- image/png. Defaults to image/png.

compression_quality

int32

Optional compression quality if encoding in image/jpeg. Valid range is any integer [0, 100]. Defaults to 75.
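Since compression_quality only applies to JPEG output, a small helper can encode that rule. This is an illustrative sketch, not an official client API; the camelCase keys are an assumption.

```python
def make_output_options(mime_type="image/png", quality=75):
    """Build an ImageOutputOptions dict. compressionQuality is only
    meaningful for image/jpeg, per the reference; [0, 100], default 75."""
    if mime_type not in ("image/jpeg", "image/png"):
        raise ValueError(f"unsupported mime type: {mime_type}")
    options = {"mimeType": mime_type}
    if mime_type == "image/jpeg":
        if not 0 <= quality <= 100:
            raise ValueError("compressionQuality must be in [0, 100]")
        options["compressionQuality"] = quality
    return options

jpeg_options = make_output_options("image/jpeg", quality=85)
png_options = make_output_options()  # image/png; quality not included
```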

MaskMode

Fields
mask_type

string

The type of mask to generate from the provided input image. The values can be one of: * background * foreground * semantic

classes[]

Value

The class IDs to generate masks of using the Semantic Segmenter model. Only numeric class IDs are supported.

Not used if the mask_type value is not semantic.

OutpaintingProcessingConfig

OutpaintingProcessingConfig for imagen-3.0-capability-001

Fields
blending_mode

string

The blending mode for outpainting. The values can be one of: * alpha-blending * pyramid-blending

blending_factor

float

The blending factor for outpainting blending. Valid range: [0, 1]. Default value: 0

enable_border_replicate_padding

bool

Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding.

expansion_config

ExpansionConfig

Fixes the one-sided expansion issue by adding padding to the image and mask in the backend server and cropping them out in post-processing.

OutputOptions

Configuration options for the output image.

Fields
mime_type

string

Currently supported: -- image/jpeg -- image/png. Defaults to image/png.

compression_quality

int32

Optional compression quality if encoding in image/jpeg. Valid range is any integer [0, 100]. Defaults to 75.

SemanticFilterConfig

Fields
filter_classes[]

string

Specify object class text names to filter. Any detected object in the masked region bearing any one of the class names will be checked.

filter_entities[]

string

Specify object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities.

filter_classes_outpainting[]

string

For the outpainting case. Specify object class text names to filter. Any detected object in the masked region bearing any one of the class names will be checked.

filter_entities_outpainting[]

string

For the outpainting case. Specify object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities.

filter_classes_special_init[]

string

For the special_init case. Specify object class text names to filter. Any detected object in the masked region bearing any one of the class names will be checked.

filter_entities_special_init[]

string

For the special_init case. Specify object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities.

enable_semantic_filter

bool

Whether to enable semantic filtering mode, which enables the following parameters to apply the semantic filter to image editing results.

intersect_ratio_threshold

float

A threshold value to decide which detected boxes should be included in semantic filter checking.

additional_sample_count

int32

Additional count of samples; expects a value between 0 and 4.

semantic_filter_mode

string

A string to specify semantic filter experimental mode. This allows semantic filter to change the default behavior to filter generated images.

detection_score_threshold

float

A detection confidence score threshold to decide which detection boxes are considered valid detections for semantic filter checking.

intersect_ratio_threshold_outpainting

float

For the outpainting case. A threshold value to decide which detected boxes should be included in semantic filter checking.

detection_score_threshold_outpainting

float

For the outpainting case. A detection confidence score threshold to decide which detection boxes are considered valid detections for semantic filter checking.

intersect_ratio_threshold_special_init

float

For the special_init case. A threshold value to decide which detected boxes should be included in semantic filter checking.

detection_score_threshold_special_init

float

For the special_init case. A detection confidence score threshold to decide which detection boxes are considered valid detections for semantic filter checking.

TextEmbeddingPredictionParams

Prediction model parameters for Text Embedding.

Fields
auto_truncate

bool

Whether to silently truncate inputs longer than the max sequence length. This behavior is enabled by default. If this option is set to false, oversized inputs will lead to an INVALID_ARGUMENT error, similar to other text APIs.

output_dimensionality

int32

An optional argument for the output embedding's dimensionality. This parameter is only supported by some models, and the supported value range is specific to the requested model. If this parameter is specified for a model that does not support it, or if the specified value is not supported by the model, the request will fail with an INVALID_ARGUMENT error.
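A minimal parameter sketch for text embedding requests, combining the two fields above. The camelCase keys and the dimensionality value are assumptions; the supported range is model-specific.

```python
embedding_params = {
    "autoTruncate": False,        # oversized inputs now fail with INVALID_ARGUMENT
                                  # instead of being silently truncated
    "outputDimensionality": 256,  # only some models support this; the valid
                                  # range depends on the requested model
}
```

With autoTruncate left at its default (true), the outputDimensionality field can simply be omitted to get the model's native dimensionality.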

UpscaleConfig

Fields
enhance_input_image

bool

Whether to add an image enhancement step before upscaling. It is expected to suppress noise and JPEG compression artifacts from the input image. Default value: false.

enable_faster_upscaling

bool

NOTE: For experimental use, not production-ready. Whether to speed up upscaling. This option can't be used with high QPS since it lowers the availability of the upscaling API.

upscale_factor

string

The factor to which the image will be upscaled. If not specified, the upscale factor will be determined from the longer side of the input image and sampleImageSize. enum: - x2 - x4

image_preservation_factor

float

With a higher image preservation factor, the original image pixels are respected more and the output image is more similar to the input image. With a lower image preservation factor, the output image will differ more from the input image, but may have finer details and less noise. Only works with: * imagegeneration@003 Valid range: [0, 1.0] Default value: 0.5
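The UpscaleConfig fields compose as in this sketch; key casing and values are assumptions.

```python
upscale_config = {
    "upscaleFactor": "x2",           # x2 or x4; inferred from the longer input
                                     # side and sampleImageSize if omitted
    "enhanceInputImage": True,       # denoise / de-artifact before upscaling; defaults to false
    "imagePreservationFactor": 0.7,  # [0, 1.0], default 0.5; higher stays
                                     # closer to the input pixels (imagegeneration@003 only)
}

assert upscale_config["upscaleFactor"] in ("x2", "x4")
```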

VideoGenerationModelParams

Next ID: 15

Fields
sample_count

int32

Number of output videos.

storage_uri

string

The Cloud Storage bucket where the generated videos are saved.

fps

int32

Frames per second for video generation.

duration_seconds

double

Duration of the clip for video generation in seconds.

seed

int32

The RNG seed. If RNG seed is exactly same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed will be used each time to produce a different result. If the sample count is greater than 1, random seeds will be used for each sample.

aspect_ratio

string

The aspect ratio for the generated video. 16:9 (landscape) and 9:16 (portrait) are supported.

resolution

string

The resolution for the generated video. Supported values are: 720p 1080p

person_generation

string

Whether to allow generating videos of people, and to restrict generation to specific ages. Supported values are: dont_allow allow_adult allow_all

pubsub_topic

string

The Pub/Sub topic where the video generation progress is published.

negative_prompt

string

Optional field in addition to the text content. Negative prompts can be explicitly stated here to help generate the video.

enable_prompt_rewriting
(deprecated)

bool

Whether to enable prompt rewriting.

enhance_prompt

bool

If true, the prompt will be improved before it is used to generate videos. The RNG seed, if provided, will not result in consistent results if prompts are enhanced.

generate_audio

bool

If true, audio will be generated along with the video.

compression_quality

string

Compression quality of the generated videos. Supported values are: optimized, lossless. If not specified, the default value is optimized.
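A typical video generation parameter payload built from the fields above might look like this sketch. The camelCase keys, the example bucket, and all concrete values are assumptions.

```python
video_params = {
    "sampleCount": 2,
    "durationSeconds": 8.0,
    "fps": 24,
    "aspectRatio": "16:9",              # or 9:16 (portrait)
    "resolution": "720p",               # or 1080p
    "personGeneration": "allow_adult",  # dont_allow / allow_adult / allow_all
    "generateAudio": True,
    "compressionQuality": "optimized",  # or lossless
    "storageUri": "gs://example-bucket/videos/",  # hypothetical bucket
}

assert video_params["aspectRatio"] in ("16:9", "9:16")
assert video_params["resolution"] in ("720p", "1080p")
```

Note that omitting seed, or setting enhancePrompt to true, means repeated requests are not guaranteed to produce consistent results.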

VirtualTryOnModelParams

Parameter format for the Virtual Try On model.

Fields
output_options

OutputOptions

Configuration options for the output image.

sample_count

int32

The number of images to generate.

storage_uri

string

The Cloud Storage location where generated images will be saved.

seed

int32

The RNG seed. If set, requests with equal inputs will produce deterministic results. The addWatermark parameter must be set to false if the seed is set.

base_steps

int32

Number of sampling steps for the base model.

safety_setting

string

Safety settings applying various restrictions in generating images. Case insensitive. Levels are: block_low_and_above block_medium_and_above block_only_high block_none

person_generation

string

Whether to restrict the generation of images with persons. Case insensitive. Supported values are: dont_allow, allow_adult, allow_all

add_watermark

bool

Whether to add a watermark to the generated images. Defaults to true.

enhance_prompt

bool

Whether to enhance the user-provided prompt internally for models that support it.
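The seed field above interacts with add_watermark: a seed only makes sense with the watermark disabled. This hypothetical helper encodes that documented rule.

```python
def check_vto_params(params):
    """Enforce the documented constraint: addWatermark must be set to
    false whenever a seed is set (addWatermark defaults to true)."""
    if "seed" in params and params.get("addWatermark", True):
        raise ValueError("addWatermark must be false when seed is set")
    return params

vto_params = check_vto_params({
    "sampleCount": 1,
    "seed": 42,
    "addWatermark": False,  # required because seed is set
    "safetySetting": "block_medium_and_above",
    "personGeneration": "allow_adult",
})
```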

VisionEmbeddingModelParams

This type has no fields.

Parameter format for the large vision model embedding API.

VisionGenerativeModelParams

Next ID: 34

Fields
sample_count

int32

Number of output images.

sample_image_size

string

The size of output images. If empty, the default size is 1024 for Imagen 2 and 3 models and 1K for Imagen 4 models. Supported sizes: 64, 256, 512, 1024, 2048, and 4096 for Imagen 2 and 3 models; 1K and 2K (case-insensitive) for Imagen 4 models.

storage_uri

string

The Cloud Storage bucket where the generated images are saved.

negative_prompt

string

Optional field in addition to the text content. Negative prompts can be explicitly stated here to help generate the images.

seed

int32

The RNG seed. If RNG seed is exactly same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed will be used each time to produce a different result.

mode

string

The parameter to specify editing mode. Currently support: -- interactive -- upscale

model
(deprecated)

string

Select underlying model to do the generation. Only listed models are supported: -- muse -- imagen

aspect_ratio

string

Optional generation mode parameter that controls aspect ratio. Supported ratios include: -- 1:1 (default, square) -- 5:4 (frame and print) -- 3:2 (print photography) -- 7:4 (TV screens and smartphone screens) -- 4:3 (TV) -- 16:9 (landscape) -- 9:16 (portrait)

guidance_scale

float

Optional editing mode parameter that controls the strength of the prompt. Suggested values are: -- 0-9 (low strength) -- 10-20 (medium strength) -- 21+ (high strength)

enable_person_face_filter
(deprecated)

bool

Whether to enable person/face RAI filtering. Defaults to false.

disable_person_face
(deprecated)

bool

safety_setting

string

Safety settings applying various restrictions in generating images. Case insensitive. Levels are: block_low_and_above block_medium_and_above block_only_high block_none

Deprecated values respectively are: block_most block_some block_few block_fewest

rai_level
(deprecated)

int32

enable_child_filter
(deprecated)

bool

Whether to enable child RAI filtering. Defaults to true. This requires that users are allowlisted; otherwise, this value will be ignored.

disable_child
(deprecated)

bool

person_generation

string

Whether to allow generating images of people, and to restrict generation to specific ages. Supported values are: dont_allow allow_adult allow_all

sample_image_style

string

Optional. The pre-defined style for generated images. No styles will be applied if this field is empty or unspecified. Possible values could be: - photograph - digital_art - landscape - sketch - watercolor - cyberpunk - pop_art

include_rai_reason

bool

Whether to include the reason why generated images are filtered.

is_product_image

bool

Whether to use self background editing for product images.

control_net_config

ControlNetConfig

Configurations for ControlNet conditions.

image_output_options
(deprecated)

ImageOutputOptions

Output configuration.

output_options

ImageOutputOptions

Output configuration.

upscale_config

UpscaleConfig

Configurations for upscaling API.

edit_config

EditConfig

Configurations for editing API (imagegeneration@{003, 004})

edit_config_v6

EditConfigV6

Configurations for editing API for imagegeneration@006

edit_mode

EditMode

Configurations for edit mode in imagen 3 capability.

language

string

The language of the prompt. The supported values are: - auto (Autodetect language) - en (English) - ko (Korean) - ja (Japanese) - hi (Hindi)

include_safety_attributes

bool

Whether to include the safety attributes scores for both input and output.

model_variant

string

The size variant of the model. Only supported in imagegeneration@004 for now. enum: - large - medium - v1_large - v1_1 - v1_1_turbo

add_watermark

bool

Whether to add SynthID watermark to generated images. Default value: false.

gen_selfie_config

GenSelfieConfig

Configurations for GenSelfie API.

show_rai_error_codes

bool

Show RAI error codes instead of messages.

enhance_prompt

bool

Whether to use the new prompt rewriting logic.
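Drawing the most common VisionGenerativeModelParams fields together, a text-to-image request payload might look like this sketch. The camelCase keys, the prompt text, and all concrete values are assumptions for illustration.

```python
image_params = {
    "sampleCount": 4,
    "sampleImageSize": "1024",
    "aspectRatio": "16:9",
    "negativePrompt": "blurry, low quality",  # hypothetical negative prompt
    "seed": 1234,                             # fixed seed for reproducible output
    "addWatermark": False,                    # SynthID watermark; default false
    "safetySetting": "block_medium_and_above",
    "personGeneration": "allow_adult",
    "language": "auto",
    "includeRaiReason": True,
    "outputOptions": {"mimeType": "image/png"},
}

assert image_params["language"] in ("auto", "en", "ko", "ja", "hi")
```

Editing and upscaling requests would additionally set mode plus the matching editConfig, editConfigV6, or upscaleConfig sub-message documented above.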

VisionReasoningModelParams

Parameter format for the large vision model.

Fields
sample_count

int32

Number of output text responses.

storage_uri

string

The Cloud Storage bucket where the generated text responses are saved.

seed

int32

The RNG seed. If RNG seed is exactly same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed will be used each time to produce a different result.

language

string

The language of the output text. Supported languages are: - en (default) - de - fr - it - es