To customize an image, you provide one or more reference images. Each reference image must have a referenceType that specifies how the model should use it. The following table describes the available reference types.
| Reference Type | Description | Use Case |
| --- | --- | --- |
| REFERENCE_TYPE_SUBJECT | Provides an image of a subject (like a person, animal, or product) to be incorporated into the generated image. You can provide multiple images for the same subject to improve quality. | Placing a specific person or object into a new scene or style. |
| REFERENCE_TYPE_STYLE | Provides an image that defines the artistic style (e.g., watercolor, sketch, pop art) for the generated image. | Applying a consistent artistic style to a generated image based on a source style image. |
| REFERENCE_TYPE_CONTROL | Uses a control image (like a canny edge, scribble, or face mesh) to guide the structure, pose, or composition of the generated image. | Controlling the exact pose of a character or the outline of an object. |
| REFERENCE_TYPE_RAW | Provides the base image for editing tasks. The output image has the same dimensions as this raw image. | Editing an existing image, such as inpainting or outpainting. |
| REFERENCE_TYPE_MASK | Provides a mask to specify which parts of a raw image should be edited (inpainting) or preserved. The mask can be user-provided or automatically generated. | Modifying a specific region of an image while leaving the rest unchanged. |
Parameter list
The following sections describe the request parameters and response fields. For implementation details, see the examples.
Request parameters
REST
Parameters
referenceType
Required enumeration:
REFERENCE_TYPE_RAW
Required for editing use cases.
At most one raw reference image is allowed per request.
The output image has the same dimensions as the raw reference image.
REFERENCE_TYPE_MASK
Required for masked editing.
Must have the same dimensions as the raw reference image, if provided.
You can provide your own mask or have one generated from the reference image.
If the mask image is empty and maskMode isn't MASK_MODE_USER_PROVIDED, the mask is computed from the raw reference image.
REFERENCE_TYPE_CONTROL
Must have the same dimensions as the raw reference image, if provided.
If the control image is empty and enableControlImageComputation is true, the control image is computed from the raw reference image.
REFERENCE_TYPE_SUBJECT
You can provide multiple reference images with the same referenceId to potentially improve output quality.
REFERENCE_TYPE_STYLE
referenceId
Required integer
The ID for the reference image. Use this ID in your prompt to refer to the corresponding image. For example, use [1] to refer to images with referenceId=1 and [2] for images with referenceId=2.
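As a rough illustration of the [N] convention, the following Python sketch extracts the markers from a prompt and reports any that have no matching reference image. The helper names `referenced_ids` and `missing_references` are hypothetical and not part of the API; the check runs entirely on the client.

```python
import re


def referenced_ids(prompt: str) -> set:
    """Return the integer IDs that appear as [N] markers in the prompt."""
    return {int(match) for match in re.findall(r"\[(\d+)\]", prompt)}


def missing_references(prompt: str, reference_images: list) -> set:
    """Return IDs used in the prompt that no reference image provides.

    Each entry in reference_images is expected to be a dict with an
    integer "referenceId" key, mirroring the request parameter above.
    """
    provided = {ref["referenceId"] for ref in reference_images}
    return referenced_ids(prompt) - provided
```

Running a check like this before sending a request can catch a prompt that mentions [2] when only images with referenceId=1 were attached.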
referenceImage.bytesBase64Encoded
Required string
A Base64-encoded string of the reference image.
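A minimal sketch of producing that string with Python's standard library follows. The function name `encode_image_bytes` is hypothetical; how you read the image bytes (from disk, from memory) is up to you.

```python
import base64


def encode_image_bytes(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes and return them as an ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")
```

For a file on disk, you might call `encode_image_bytes(open(path, "rb").read())` and place the result in referenceImage.bytesBase64Encoded.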
maskImageConfig.maskMode
Optional enumeration.
Use this parameter when referenceType is REFERENCE_TYPE_MASK.
MASK_MODE_USER_PROVIDED: If the reference image is a mask image.
MASK_MODE_BACKGROUND: To automatically generate a mask using background segmentation.
MASK_MODE_FOREGROUND: To automatically generate a mask using foreground segmentation.
MASK_MODE_SEMANTIC: To automatically generate a mask using semantic segmentation, and the given mask class.
maskImageConfig.dilation
Optional float. Range: [0, 1]
The percentage of image width to dilate this mask by.
Use this parameter when referenceType is REFERENCE_TYPE_MASK.
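The mask parameters above could be assembled as in the following Python sketch. The helper `mask_reference` is hypothetical, and two points are assumptions rather than documented behavior: that omitting referenceImage counts as an "empty" mask image, and that the range check should be done client-side. The mask class that MASK_MODE_SEMANTIC needs is not covered here because its field name does not appear in this section.

```python
MASK_MODES = {
    "MASK_MODE_USER_PROVIDED",
    "MASK_MODE_BACKGROUND",
    "MASK_MODE_FOREGROUND",
    "MASK_MODE_SEMANTIC",
}


def mask_reference(reference_id, mask_mode, dilation=None, mask_b64=None):
    """Build a REFERENCE_TYPE_MASK entry as a plain dict."""
    if mask_mode not in MASK_MODES:
        raise ValueError(f"unknown maskMode: {mask_mode}")
    if mask_mode == "MASK_MODE_USER_PROVIDED" and not mask_b64:
        raise ValueError("a user-provided mask needs mask image bytes")

    config = {"maskMode": mask_mode}
    if dilation is not None:
        # The documented range for dilation is [0, 1].
        if not 0.0 <= dilation <= 1.0:
            raise ValueError("dilation must be in the range [0, 1]")
        config["dilation"] = dilation

    reference = {
        "referenceType": "REFERENCE_TYPE_MASK",
        "referenceId": reference_id,
        "maskImageConfig": config,
    }
    if mask_b64:
        reference["referenceImage"] = {"bytesBase64Encoded": mask_b64}
    return reference
```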
If referenceType is REFERENCE_TYPE_CONTROL, set enableControlImageComputation to true to have Imagen compute the control image from the reference image. Otherwise, set it to false and provide your own control image.
language
Optional: string (imagen-3.0-capability-001, imagen-3.0-generate-001, and imagegeneration@006 only)
The language code that corresponds to your text prompt language.
The following values are supported:
auto: Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the language detected isn't supported, Imagen uses the input text verbatim, which might result in an unexpected output. No error code is returned.
en: English (if omitted, the default value)
es: Spanish
hi: Hindi
ja: Japanese
ko: Korean
pt: Portuguese
zh-TW: Chinese (traditional)
zh or zh-CN: Chinese (simplified)
subjectImageConfig.subjectDescription
Required string.
A short description of the subject in the image. For example, a woman with short brown hair.
Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT.
subjectImageConfig.subjectType
Required enumeration.
Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT.
SUBJECT_TYPE_PERSON: Person subject type.
SUBJECT_TYPE_ANIMAL: Animal subject type.
SUBJECT_TYPE_PRODUCT: Product subject type.
SUBJECT_TYPE_DEFAULT: Default subject type.
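To illustrate how several images of one subject can share a referenceId, here is a small Python sketch. The helper `subject_references` is hypothetical; the dict keys mirror the parameter names listed above, and the Base64 strings are assumed to be prepared beforehand.

```python
SUBJECT_TYPES = {
    "SUBJECT_TYPE_PERSON",
    "SUBJECT_TYPE_ANIMAL",
    "SUBJECT_TYPE_PRODUCT",
    "SUBJECT_TYPE_DEFAULT",
}


def subject_references(reference_id, images_b64, description,
                       subject_type="SUBJECT_TYPE_DEFAULT"):
    """Build one REFERENCE_TYPE_SUBJECT entry per image, all with the same ID."""
    if subject_type not in SUBJECT_TYPES:
        raise ValueError(f"unknown subjectType: {subject_type}")
    return [
        {
            "referenceType": "REFERENCE_TYPE_SUBJECT",
            "referenceId": reference_id,
            "referenceImage": {"bytesBase64Encoded": image_b64},
            "subjectImageConfig": {
                "subjectDescription": description,
                "subjectType": subject_type,
            },
        }
        for image_b64 in images_b64
    ]
```

Because every entry carries the same referenceId, a single [1] marker in the prompt refers to all of them.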
styleImageConfig.styleDescription
Optional string.
A short description for the style.
Use this parameter when referenceType is REFERENCE_TYPE_STYLE.
Response body
The following table describes the fields in the response body.
Parameter
predictions
An array of VisionGenerativeModelResult objects, one for each image requested through sampleCount. Images that are filtered by responsible AI are not included.
Vision generative model result object
The following table describes the fields in the VisionGenerativeModelResult object.
Parameter
bytesBase64Encoded
The Base64-encoded generated image. This field is not present if the output image did not pass responsible AI filters.
mimeType
The MIME type of the generated image. This field is not present if the output image did not pass responsible AI filters.
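Since filtered images can be missing from the response or lack bytesBase64Encoded, client code should not assume a fixed number of usable results. The following Python sketch, with the hypothetical helper `extract_images`, decodes whatever images are present in an already-parsed JSON response.

```python
import base64


def extract_images(response: dict) -> list:
    """Return (mimeType, image_bytes) pairs for predictions that carry an image."""
    images = []
    for prediction in response.get("predictions", []):
        encoded = prediction.get("bytesBase64Encoded")
        if encoded is None:
            # No image bytes: the output did not pass responsible AI filters.
            continue
        images.append((prediction.get("mimeType"), base64.b64decode(encoded)))
    return images
```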
Examples
The following example shows how to use the Imagen model to customize an image.
REST
Before using any of the request data, make the following replacements:
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
TEXT_PROMPT: The text prompt guides what images the model generates. To use Imagen 3 Customization, include the referenceId of the reference image or images you provide in the format [$referenceId]. For example:
The following text prompt is for a request that has two reference images with "referenceId": 1. Both images have an optional description of "subjectDescription": "man with short hair":
Create an image about a man with short hair to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes
"referenceId": The ID of the reference image, or the ID for a series of reference images that correspond to the same subject or style. In this example, the two reference images are of the same person, so they share the same referenceId (1).
BASE64_REFERENCE_IMAGE: A reference image to guide image generation. The image must be specified as a base64-encoded byte string.
SUBJECT_DESCRIPTION: Optional. A text description of the reference image that you can then use in the prompt field. For example:
    "prompt": "a full-body portrait of a man with short hair [1] with hatch-cross drawing",
    [...],
    "subjectDescription": "man with short hair"
IMAGE_COUNT: The number of generated images. Accepted integer values: 1-4. Default value: 4.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict
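The following Python sketch shows one way to fill in the URL above and send the request using only the standard library. It is an illustration under stated assumptions, not a definitive client: the helper names are hypothetical, the `instances`, `referenceImages`, and `parameters` wrapper keys are assumed because the request body is not shown in this section, and the access token is assumed to be obtained separately (for example, with `gcloud auth print-access-token`).

```python
import json
import urllib.request

MODEL = "imagen-3.0-capability-001"


def build_url(location: str, project_id: str) -> str:
    """Fill LOCATION and PROJECT_ID into the predict endpoint."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{MODEL}:predict"
    )


def build_body(prompt: str, reference_images: list, sample_count: int = 4) -> dict:
    """Assemble the request body. The wrapper keys are assumptions (see above)."""
    if not 1 <= sample_count <= 4:
        raise ValueError("sampleCount must be an integer from 1 to 4")
    return {
        "instances": [{"prompt": prompt, "referenceImages": reference_images}],
        "parameters": {"sampleCount": sample_count},
    }


def predict(location: str, project_id: str, access_token: str, body: dict) -> dict:
    """POST the body to the endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        build_url(location, project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Keeping URL and body construction separate from the network call makes both easy to inspect before any request is sent.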
The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.
Last updated 2025-08-21 UTC.