Index

- AlternateInitConfig (message)
- BackgroundSwapProcessingConfig (message)
- ControlNetConfig (message)
- ControlNetConfig.ControlNetConditionConfig (message)
- EditConfig (message)
- EditConfig.BufferZone (message)
- EditConfigV6 (message)
- EditConfigV6.BufferZone (message)
- EditMode (enum)
- ExpansionConfig (message)
- GenSelfieConfig (message)
- ImageOutputOptions (message)
- MaskMode (message)
- OutpaintingProcessingConfig (message)
- OutputOptions (message)
- SemanticFilterConfig (message)
- TextEmbeddingPredictionParams (message)
- UpscaleConfig (message)
- VideoGenerationModelParams (message)
- VirtualTryOnModelParams (message)
- VisionEmbeddingModelParams (message)
- VisionGenerativeModelParams (message)
- VisionReasoningModelParams (message)
AlternateInitConfig

Field | Description
---|---
enabled | Whether to use AlternateInitConfig.
max_inpainting_mask_area | Maximum inpainting mask area below which to consider using AlternateInitConfig.
BackgroundSwapProcessingConfig

BackgroundSwapConfig for imagen-3.0-capability-001.

Field | Description
---|---
blending_mode | The blending mode for background swap. The value can be one of: alpha-blending.
blending_factor | The blending factor for background swap blending. Valid range: [0, 1]. Default value: 0.
ControlNetConfig

Field | Description
---|---
enable_control_net | Whether to enable ControlNet.
conditions[] | Configurations for each condition.
original_image_weight | The weight for the original image. Valid range: [0, 1]. When set to 1.0, the output essentially copies the input image. When set to 0.0, the output does not respect the input image at all.
ControlNetConditionConfig

Field | Description
---|---
condition_name | Currently supported conditions: cannyEdges, depth.
condition_map_bytes_base64_encoded | When the condition map is provided by the user, the condition map will not be computed server-side.
condition_weight | The guidance weight for the condition signal. Valid range: [0, 1]. The higher the weight, the more the model respects the ControlNet condition. The default value is 1.0 if unspecified.
condition_max_t | The strength of the ControlNet's effect on each diffusion step. Valid range: [0, 1].
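As an illustrative sketch (field names taken from the tables above; the values are hypothetical, not defaults), a ControlNet configuration could be assembled as a plain dictionary before being attached to a request:

```python
# Hypothetical ControlNetConfig sketch; all values are illustrative.
control_net_config = {
    "enable_control_net": True,
    "conditions": [
        {
            "condition_name": "cannyEdges",  # or "depth"
            "condition_weight": 0.8,         # guidance weight, range [0, 1]
            "condition_max_t": 1.0,          # per-step effect strength, range [0, 1]
            # "condition_map_bytes_base64_encoded": ...,  # optional; skips server-side map computation
        }
    ],
    # 1.0 essentially copies the input image; 0.0 ignores it entirely.
    "original_image_weight": 0.5,
}
```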
EditConfig

Field | Description
---|---
buffer_zones[] | Buffer zones; if provided, must have length 2.
base_guidance_scale[] | Guidance scale: controls the strength of text guidance. If provided, must be a list of 4 integers representing values during 4 stages of diffusion [fine-grained, ..., ..., coarse].
enable_clamping | Whether to enable clamping mode, which enables the rest of the configurations in EditConfig, better preserves the unmasked area, and skips the model's internal dilation so the client can fully control it.
base_steps | Number of sampling steps.
base_gamma | Gamma: influences how much noise is added during sampling.
sr1_steps | Number of sampling steps for the sr1 stage.
sr2_steps | Number of sampling steps for the sr2 stage.
semantic_filter_config | NOTE: for experimental use, not production-ready. Semantic filter config. This config reduces object hallucination in inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in the editing config.
experiment_use_servo_backend | Experimental flag to use the servo backend.
edit_mode | The editing mode that describes the use case for editing. The value can be one of: inpainting-remove, inpainting-insert, outpainting.
alternate_init_config | Parameters for AlternateInitConfig.
experimental_sr_version | Experimental flag for the sr version.
experimental_base_version | Experimental flag for the base version.
embedding_scale | Parameter to control the embedding scale. Valid range: [0, 1]. Default value: 0.6.
enable_border_replicate_padding | Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding.
enable_post_processing_blend | Parameter to enable post-processing blending for masked editing.
outpainting_config | Outpainting processing config.
bgswap_config | Background swap processing config.
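As a hedged sketch of how these fields fit together for an inpainting-remove request (field names come from the table above; the values are illustrative):

```python
# Hypothetical EditConfig sketch for inpainting-remove; values are illustrative.
edit_config = {
    "edit_mode": "inpainting-remove",
    # buffer_zones, if provided, must contain exactly two entries.
    "buffer_zones": [
        {"pixels": 8, "diffusion_t": 1.0},   # dilation active from the start of diffusion
        {"pixels": 2, "diffusion_t": 0.25},  # tighter dilation near the end
    ],
    "enable_clamping": True,  # client takes full control of mask dilation
    "base_steps": 25,
    "embedding_scale": 0.6,   # default per the table above
}
```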
BufferZone

Field | Description
---|---
pixels | The number of pixels by which to dilate the mask.
diffusion_t | When during diffusion this pixel dilation takes effect; 1 = start, 0 = end.
EditConfigV6

EditConfig for imagegeneration@006.

Field | Description
---|---
buffer_zones[] | Buffer zones; if provided, must have length 2.
edit_mode | The editing mode that describes the use case for editing. The value can be one of: inpainting-remove, inpainting-insert, outpainting, product-image.
mask_dilation | Parameter to control mask dilation. Valid range: [0, 1]. Default value: 0.03.
guidance_scale | Guidance scale: controls the strength of text guidance.
product_position | Product position: controls the product position in the returned product-editing image. The value can be one of: reposition (the default behavior in the GPS pipeline) or fixed (keeps the product in the same position as in the input image; this assumes the input image is square).
mask_mode | Automatic mask generation configuration.
base_steps | Number of sampling steps for the base model.
backend | The backend to use for the model. The value can be one of: experimental, prod.
semantic_filter_config | Semantic filter config. This config reduces object hallucination in inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in the editing config.
alternate_init_config | Parameters for AlternateInitConfig.
outpainting_config | Outpainting config.
BufferZone

BufferZone config.

Field | Description
---|---
pixels | The number of pixels by which to dilate the mask.
diffusion_t | When during diffusion this pixel dilation takes effect; 1 = start, 0 = end.
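A hedged sketch of an EditConfigV6 for a product-image edit on imagegeneration@006 (field names from the tables above; the values are illustrative):

```python
# Hypothetical EditConfigV6 sketch; values are illustrative, not recommendations.
edit_config_v6 = {
    "edit_mode": "product-image",
    "product_position": "fixed",    # assumes a square input image
    "mask_dilation": 0.03,          # default per the table above
    "mask_mode": {
        "mask_type": "foreground",  # auto-generate a mask of the foreground product
    },
    "base_steps": 35,
    "backend": "prod",
}
```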
EditMode

EditMode for imagen3capability.

Enum | Description
---|---
EDIT_MODE_DEFAULT | Default editing mode.
EDIT_MODE_INPAINT_REMOVAL | Inpainting removal mode. Removes objects based on the given mask.
EDIT_MODE_INPAINT_INSERTION | Inpainting insertion mode. Inserts objects based on the given mask.
EDIT_MODE_OUTPAINT | Outpainting mode. Expands the image based on the given mask.
EDIT_MODE_CONTROLLED_EDITING | Controlled editing mode. Pass a sketch or face mesh image to control the editing.
EDIT_MODE_STYLE | Style editing mode. Pass a style image to define a generation style for the prompt.
EDIT_MODE_BGSWAP | Background swap mode. Pass a background image to swap the background of the image.
EDIT_MODE_PRODUCT_IMAGE | Product image mode.
ExpansionConfig

ExpansionConfig fixes the one-sided expansion issue by adding padding to the image and mask in the backend server and cropping it out in post-processing.

Field | Description
---|---
top | Number of pixels by which to expand the image and mask from the top. Value is an integer with a minimum of 0 and a maximum of 500.
bottom | Number of pixels by which to expand the image and mask from the bottom. Value is an integer with a minimum of 0 and a maximum of 500.
left | Number of pixels by which to expand the image and mask from the left. Value is an integer with a minimum of 0 and a maximum of 500.
right | Number of pixels by which to expand the image and mask from the right. Value is an integer with a minimum of 0 and a maximum of 500.
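Since each side is independently bounded to [0, 500], a client can cheaply validate an ExpansionConfig before sending it. This is a hypothetical client-side helper, not part of the API; the bounds come from the table above:

```python
# Hypothetical ExpansionConfig sketch with a client-side range check.
expansion_config = {"top": 0, "bottom": 0, "left": 120, "right": 0}

def validate_expansion(config):
    """Reject out-of-range per-side padding before sending the request."""
    for side in ("top", "bottom", "left", "right"):
        value = config.get(side, 0)  # unset sides default to no expansion
        if not (0 <= value <= 500):
            raise ValueError(f"{side} must be in [0, 500], got {value}")
    return config

validate_expansion(expansion_config)  # one-sided expansion to the left
```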
GenSelfieConfig

Field | Description
---|---
per_example_seeds[] | Initialization seed per generation sample.
identity_control | Parameter for identity control. Valid range: [0, 1.0]. Default value: 0.9.
structure_control | Parameter for structure control. Valid range: [0, 1.0]. Default value: 1.0.
experimental_base_version | The version for the base model.
skip_face_cropping | Whether to skip detecting and cropping the face in the input image. Default value: false.
sampling_steps | Number of sampling steps.
enable_sharpening | Whether to enable image-sharpening post-processing.
detection_score_threshold | The threshold for the face detection model. Images with a face detection score below this threshold will be rejected.
face_selection_criteria | The criteria for selecting the face for Gen Selfie. Accepted values: LARGEST, MOST_CONFIDENT.
style | The style for the generated image. Accepted values: watercolor, hand-drawing, illustration, 3d-character.
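A hedged GenSelfieConfig sketch (field names from the table above; seeds and the style choice are illustrative, defaults are noted where the table states them):

```python
# Hypothetical GenSelfieConfig sketch; values are illustrative.
gen_selfie_config = {
    "per_example_seeds": [11, 42, 77],    # one seed per generated sample
    "identity_control": 0.9,              # default per the table above
    "structure_control": 1.0,             # default per the table above
    "skip_face_cropping": False,
    "face_selection_criteria": "LARGEST", # pick the largest detected face
    "style": "watercolor",
}
```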
ImageOutputOptions

Field | Description
---|---
mime_type | Currently supported: image/jpeg, image/png. Defaults to image/png.
compression_quality | Optional compression quality if encoding as image/jpeg. Valid range is any integer in [0, 100]. Defaults to 75.
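For example, to request JPEG output at a slightly higher quality than the default of 75 (a sketch; the quality value is illustrative):

```python
# Hypothetical ImageOutputOptions sketch.
image_output_options = {
    "mime_type": "image/jpeg",
    "compression_quality": 85,  # integer in [0, 100]; only used for image/jpeg
}
```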
MaskMode

Field | Description
---|---
mask_type | The type of mask to generate from the provided input image. The value can be one of: background, foreground, semantic.
classes[] | The class IDs for which to generate masks using the Semantic Segmenter model. Only numeric class IDs are supported. Not used if the mask_type value is not semantic.
OutpaintingProcessingConfig

OutpaintingProcessingConfig for imagen-3.0-capability-001.

Field | Description
---|---
blending_mode | The blending mode for outpainting. The value can be one of: alpha-blending, pyramid-blending.
blending_factor | The blending factor for outpainting blending. Valid range: [0, 1]. Default value: 0.
enable_border_replicate_padding | Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding.
expansion_config | Fixes the one-sided expansion issue by adding padding to the image and mask in the backend server and cropping it out in post-processing.
OutputOptions

Configuration options for the output image.

Field | Description
---|---
mime_type | Currently supported: image/jpeg, image/png. Defaults to image/png.
compression_quality | Optional compression quality if encoding as image/jpeg. Valid range is any integer in [0, 100]. Defaults to 75.
SemanticFilterConfig

Field | Description
---|---
filter_classes[] | Object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked.
filter_entities[] | Object entity IDs to filter, similar to filter_classes. The final filter list is a union of filter classes and filter entities.
filter_classes_outpainting[] | For the outpainting case. Object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked.
filter_entities_outpainting[] | For the outpainting case. Object entity IDs to filter, similar to filter_classes. The final filter list is a union of filter classes and filter entities.
filter_classes_special_init[] | For the special_init case. Object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked.
filter_entities_special_init[] | For the special_init case. Object entity IDs to filter, similar to filter_classes. The final filter list is a union of filter classes and filter entities.
enable_semantic_filter | Whether to enable semantic filtering mode, which enables the following parameters to apply a semantic filter to image editing results.
intersect_ratio_threshold | A threshold value that decides which detected boxes should be included in semantic filter checking.
additional_sample_count | Additional count of samples; expects a value between 0 and 4.
semantic_filter_mode | A string specifying the semantic filter experimental mode. This allows the semantic filter to change the default behavior for filtering generated images.
detection_score_threshold | A detection confidence score threshold that decides which detection boxes count as valid detections for semantic filter checking.
intersect_ratio_threshold_outpainting | For the outpainting case. A threshold value that decides which detected boxes should be included in semantic filter checking.
detection_score_threshold_outpainting | For the outpainting case. A detection confidence score threshold that decides which detection boxes count as valid detections for semantic filter checking.
intersect_ratio_threshold_special_init | For the special_init case. A threshold value that decides which detected boxes should be included in semantic filter checking.
detection_score_threshold_special_init | For the special_init case. A detection confidence score threshold that decides which detection boxes count as valid detections for semantic filter checking.
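A hedged sketch of a SemanticFilterConfig that rejects generated images hallucinating people or animals in the masked area (field names from the table above; the class names and thresholds are illustrative):

```python
# Hypothetical SemanticFilterConfig sketch; class names and thresholds are illustrative.
semantic_filter_config = {
    "enable_semantic_filter": True,
    "filter_classes": ["person", "dog", "cat"],  # illustrative class names
    "intersect_ratio_threshold": 0.5,  # a box must overlap the mask at least this much
    "detection_score_threshold": 0.3,  # ignore low-confidence detections
    "additional_sample_count": 2,      # extra samples to draw, in [0, 4]
}
```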
TextEmbeddingPredictionParams

Prediction model parameters for Text Embedding.

Field | Description
---|---
auto_truncate | Whether to silently truncate inputs longer than the max sequence length. This behavior is enabled by default. If this option is set to false, oversized inputs will lead to an INVALID_ARGUMENT error, similar to other text APIs.
output_dimensionality | An optional argument for the output embedding's dimensionality. This parameter is only supported by some models, and the supported value range is specific to the requested model. If this parameter is specified for a model that does not support it, or if the specified value is not supported by the model, the request will fail with an INVALID_ARGUMENT error.
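A sketch of these parameters configured to fail loudly on oversized input rather than silently truncating (whether a given dimensionality is accepted depends on the model, per the table above; 256 is an illustrative value):

```python
# Hypothetical TextEmbeddingPredictionParams sketch.
text_embedding_params = {
    "auto_truncate": False,        # oversized inputs now raise INVALID_ARGUMENT
    "output_dimensionality": 256,  # model-specific; illustrative value
}
```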
UpscaleConfig

Field | Description
---|---
enhance_input_image | Whether to add an image-enhancing step before upscaling. This is expected to suppress noise and JPEG compression artifacts in the input image. Default value: false.
enable_faster_upscaling | NOTE: for experimental use, not production-ready. Whether to speed up upscaling. This option can't be used with high QPS since it lowers the availability of the upscaling API.
upscale_factor | The factor to which the image will be upscaled. If not specified, the upscale factor will be determined from the longer side of the input image and
image_preservation_factor | With a higher image preservation factor, the original image pixels are respected more and the output image is more similar to the input image. With a lower image preservation factor, the output image will differ more from the input image, but may have finer details and less noise. Only works with: imagegeneration@003. Valid range: [0, 1.0]. Default value: 0.5.
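As an illustrative sketch: cleaning up a noisy JPEG before a 2x upscale while biasing the result toward the original pixels (field names from the table above; the factor values are hypothetical):

```python
# Hypothetical UpscaleConfig sketch; values are illustrative.
upscale_config = {
    "enhance_input_image": True,       # denoise / de-block before upscaling
    "upscale_factor": 2,
    "image_preservation_factor": 0.7,  # above the 0.5 default: stay closer to the input
}
```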
VideoGenerationModelParams

Next ID: 15

Field | Description
---|---
sample_count | Number of output videos.
storage_uri | The GCS bucket where the generated videos are saved.
fps | Frames per second for video generation.
duration_seconds | Duration of the clip for video generation, in seconds.
seed | The RNG seed. If the RNG seed is exactly the same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed is used each time, producing a different result. If the sample count is greater than 1, random seeds are used for each sample.
aspect_ratio | The aspect ratio for the generated video. 16:9 (landscape) and 9:16 (portrait) are supported.
resolution | The resolution for the generated video. Supported values are: 720p, 1080p.
person_generation | Whether to allow generating videos of people, and whether to restrict to specific ages. Supported values are: dont_allow, allow_adult, allow_all.
pubsub_topic | The Pub/Sub topic where the video generation progress is published.
negative_prompt | Optional field in addition to the text content. Negative prompts can be stated explicitly here to help generate the video.
enable_prompt_rewriting | Whether to enable prompt rewriting.
enhance_prompt | If true, the prompt will be improved before it is used to generate videos. The RNG seed, if provided, will not produce consistent results when prompts are enhanced.
generate_audio | If true, audio will be generated along with the video.
compression_quality | Compression quality of the generated videos. Supported values are: optimized, lossless. If not specified, the default value is optimized.
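A hedged sketch of these parameters for a reproducible landscape clip with audio (field names from the table above; the bucket path, seed, and duration are illustrative):

```python
# Hypothetical VideoGenerationModelParams sketch; values are illustrative.
video_params = {
    "sample_count": 1,
    "storage_uri": "gs://my-bucket/videos/",  # illustrative bucket
    "duration_seconds": 8,
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "person_generation": "allow_adult",
    "generate_audio": True,
    "seed": 1234,
    # enhance_prompt would break seed reproducibility per the table above.
    "enhance_prompt": False,
}
```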
VirtualTryOnModelParams

Parameter format for the Virtual Try-On model.

Field | Description
---|---
output_options | Configuration options for the output image.
sample_count | The number of images to generate.
storage_uri | The Cloud Storage location where generated images will be saved.
seed | The RNG seed. If set, requests with equal inputs will produce deterministic results. The addWatermark parameter must be set to false if the seed is set.
base_steps | Number of sampling steps for the base model.
safety_setting | Safety settings applying various restrictions to image generation. Case-insensitive. Levels are: block_low_and_above, block_medium_and_above, block_only_high, block_none.
person_generation | Whether to restrict the generation of images with persons. Case-insensitive. Supported values are: dont_allow, allow_adult, allow_all.
add_watermark | Whether to add a watermark to the generated images. Defaults to true.
enhance_prompt | Whether to enhance the user-provided prompt internally, for models that support it.
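A sketch of Virtual Try-On parameters for deterministic output, which requires disabling the watermark per the table above (field names from the table; the bucket path and seed are illustrative):

```python
# Hypothetical VirtualTryOnModelParams sketch; values are illustrative.
vto_params = {
    "sample_count": 2,
    "storage_uri": "gs://my-bucket/tryon/",  # illustrative bucket
    "seed": 7,
    "add_watermark": False,  # required when a seed is set
    "safety_setting": "block_medium_and_above",
    "person_generation": "allow_adult",
    "output_options": {"mime_type": "image/png"},
}
```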
VisionEmbeddingModelParams

Parameter format for the large vision model embedding API.

This type has no fields.
VisionGenerativeModelParams

Next ID: 34

Field | Description
---|---
sample_count | Number of output images.
sample_image_size | The size of the output images. If empty, the default size is used: 1024 for Imagen 2 and 3 models, 1K for Imagen 4 models. Supported sizes: 64, 256, 512, 1024, 2048, and 4096 for Imagen 2 and 3 models; 1K, 2K (case-insensitive) for Imagen 4 models.
storage_uri | The GCS bucket where the generated images are saved.
negative_prompt | Optional field in addition to the text content. Negative prompts can be stated explicitly here to help generate the images.
seed | The RNG seed. If the RNG seed is exactly the same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed is used each time, producing a different result.
mode | The parameter to specify the editing mode. Currently supported: interactive, upscale.
model | Selects the underlying model to do the generation. Only the listed models are supported: muse, imagen.
aspect_ratio | Optional generation mode parameter that controls the aspect ratio. Supported ratios include: 1:1 (default, square), 5:4 (frame and print), 3:2 (print photography), 7:4 (TV screens and smartphone screens), 4:3 (TV), 16:9 (landscape), 9:16 (portrait).
guidance_scale | Optional editing mode parameter that controls the strength of the prompt. Suggested values are: 0-9 (low strength), 10-20 (medium strength), 21+ (high strength).
enable_person_face_filter | Whether to enable person/face RAI filtering. Defaults to false.
disable_person_face | 
safety_setting | Safety settings applying various levels of restrictiveness to image generation. Case-insensitive. Levels are: block_low_and_above, block_medium_and_above, block_only_high, block_none. The deprecated values, respectively, are: block_most, block_some, block_few, block_fewest.
rai_level | 
enable_child_filter | Whether to enable child RAI filtering. Defaults to true. This requires users to be allowlisted; otherwise, this value is ignored.
disable_child | 
person_generation | Whether to allow generating images of people, and whether to restrict to specific ages. Supported values are: dont_allow, allow_adult, allow_all.
sample_image_style | Optional. The predefined style for generated images. No style is applied if this field is empty or unspecified. Possible values: photograph, digital_art, landscape, sketch, watercolor, cyberpunk, pop_art.
include_rai_reason | Whether to include the reason why generated images were filtered.
is_product_image | Whether to use self-background editing for product images.
control_net_config | Configurations for ControlNet conditions.
image_output_options | Output configuration.
output_options | 
upscale_config | Configurations for the upscaling API.
edit_config | Configurations for the editing API (imagegeneration@{003, 004}).
edit_config_v6 | Configurations for the editing API for imagegeneration@006.
edit_mode | Configurations for the edit mode in Imagen 3 capability.
language | The language of the prompt. Supported values are: auto (autodetect language), en (English), ko (Korean), ja (Japanese), hi (Hindi).
include_safety_attributes | Whether to include the safety attribute scores for both input and output.
model_variant | The size variant of the model. Only supported in imagegeneration@004 for now. Enum: large, medium, v1_large, v1_1, v1_1_turbo.
add_watermark | Whether to add a SynthID watermark to generated images. Default value: false.
gen_selfie_config | Configurations for the GenSelfie API.
show_rai_error_codes | Show RAI error codes instead of messages.
enhance_prompt | Whether to use the new prompt rewriting logic.
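A hedged sketch tying several of these fields together for a plain text-to-image request (field names from the table above; the bucket path, seed, and prompt-related values are illustrative):

```python
# Hypothetical VisionGenerativeModelParams sketch; values are illustrative.
generation_params = {
    "sample_count": 4,
    "sample_image_size": "1024",
    "aspect_ratio": "16:9",
    "negative_prompt": "blurry, low quality",  # illustrative negative prompt
    "seed": 42,                                # fixed seed for consistent results
    "storage_uri": "gs://my-bucket/outputs/",  # illustrative bucket
    "language": "auto",
    "person_generation": "dont_allow",
    "include_rai_reason": True,
    "image_output_options": {
        "mime_type": "image/jpeg",
        "compression_quality": 75,  # default per the reference
    },
}
```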
VisionReasoningModelParams

Parameter format for the large vision model.

Field | Description
---|---
sample_count | Number of output text responses.
storage_uri | The GCS bucket where the generated text responses are saved.
seed | The RNG seed. If the RNG seed is exactly the same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed is used each time, producing a different result.
language | Specifies the output text language. Supported languages are: en (default), de, fr, it, es.