A developer’s guide to getting started with Imagen 3 on Vertex AI

August 29, 2024

Katie Nguyen

Developer Relations Engineer

Over the past few months, early users put Imagen 3 on Vertex AI through its paces and shared valuable insights with us. It’s clear that users want an AI model that generates stunning visuals and supports their practical creative applications. We’ve used their feedback to identify three common themes:

Demand for unparalleled quality across diverse artistic styles and formats
Desire for strong prompt adherence and fast image generation
Controls to protect and build trust with SynthID watermarking and advanced safety filters

In this post, we walk through each of these themes in depth. We also provide code examples and prompting best practices so you can get the most out of Imagen 3.

Uncompromising quality and versatility

Imagen 3 sets a new standard in quality and control over your generated images. This text-to-image model produces photorealistic visuals with exceptional composition, sharpness, color accuracy, and resolution. With Imagen 3, you can explore a wider spectrum of artistic styles and formats. From photorealistic masterpieces to whimsical claymation scenes, the model's expanded range of styles and formats provides the tools to express your unique artistic vision.

To demonstrate these photorealistic capabilities, let’s walk through an example of creating image mockups for a new cookbook cover. With the prompt shown below, the generated image has remarkable detail, composition, and photorealism.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1-Cookbook.max-2200x2200.png
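
A request like the cookbook mockup above could be issued from the Vertex AI SDK for Python along the following lines. This is a minimal sketch, not the exact code behind the image: the model ID `imagen-3.0-generate-001`, the helper names `load_imagen` and `generate_cover`, and the 3:4 aspect ratio are assumptions chosen for illustration. The SDK import is deferred so the helper can be exercised without credentials.

```python
from typing import Any, Optional

# Assumed model ID; check the Vertex AI documentation for the current name.
IMAGEN_3_MODEL_ID = "imagen-3.0-generate-001"


def load_imagen(model_id: str = IMAGEN_3_MODEL_ID) -> Any:
    """Load an Imagen model through the Vertex AI SDK.

    Requires credentials and a prior vertexai.init(project=..., location=...).
    """
    # Imported lazily so this module loads even where the SDK is absent.
    from vertexai.preview.vision_models import ImageGenerationModel

    return ImageGenerationModel.from_pretrained(model_id)


def generate_cover(prompt: str, model: Optional[Any] = None, count: int = 1) -> Any:
    """Request `count` portrait images for `prompt` and return the SDK response."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model is None:
        model = load_imagen()
    return model.generate_images(
        prompt=prompt,
        number_of_images=count,
        aspect_ratio="3:4",  # portrait orientation, assumed to suit a book cover
    )
```

With credentials configured, calling `vertexai.init(project=..., location=...)` and then `generate_cover(your_prompt)` should return a response whose first image can be written to disk with `response[0].save("cover.png")`.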

Text rendering

Imagen 3 also brings new possibilities when it comes to rendering text within images. A fun way to play around with this feature is to generate images of greeting cards, posters, and social media posts with captions in various fonts and colors. Using it is as easy as adding to the prompt a short description of the text you would like to see. Let’s say you would like to add a title and regenerate the cookbook cover.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2-Cookbook-with-title.max-2200x2200.png

Closer to your intent

Imagen 3's prompt comprehension translates your natural language descriptions, no matter how nuanced, into closely matched visuals. You can specify everything from camera angles to lens types to image composition in your description. Imagen 3 adheres closely to the prompt, which helps close the gap between your mental picture and the final image. You can provide the model with simple subject-action-setting prompts or intricate, multi-layered descriptions, and the model adapts to your creative process to enable a broad range of styles.

Since Imagen 3 does well with elaborate prompts, providing robust details usually yields higher quality and more precise results. Below are a few options to consider when crafting your prompts:

Arrangement: Direct the scene by specifying where you want subjects positioned.
Lighting: Create atmosphere with soft or harsh lighting, and control its direction and focus.
Angles & lenses: Add depth and perspective with camera angles and lens choices.
Styles: Go beyond photorealism and generate digital art or cinematic, vintage, and minimalist images, among others.
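
One way to keep these elements organized is to assemble the prompt from named parts. The sketch below is purely illustrative: `PromptSpec` and `build_prompt` are hypothetical helpers, not part of any Google SDK, and the comma-separated ordering is just one reasonable convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt elements listed above."""

    subject: str
    arrangement: Optional[str] = None  # where subjects are positioned
    lighting: Optional[str] = None     # quality and direction of light
    camera: Optional[str] = None       # camera angle and lens choice
    style: Optional[str] = None        # e.g. digital art, cinematic, vintage


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty elements into one comma-separated description."""
    if not spec.subject.strip():
        raise ValueError("subject must not be empty")
    parts = [spec.subject, spec.arrangement, spec.lighting, spec.camera, spec.style]
    return ", ".join(p.strip() for p in parts if p and p.strip())


example = PromptSpec(
    subject="A bowl of tomato soup on a wooden table",
    arrangement="centered in the frame",
    lighting="soft window light from the left",
    camera="overhead shot with a 50mm lens",
    style="vintage film photograph",
)
print(build_prompt(example))
```

Optional fields that are left unset are simply skipped, so the same helper works for a simple subject-only prompt and for a detailed, multi-layered one.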

Reduced latency

While Imagen 3 is our highest quality model to date, we are also offering Imagen 3 Fast, which is optimized for generation speed. Imagen 3 Fast is suitable for creating brighter, higher-contrast images. Compared to Imagen 2, it delivers a 40% decrease in latency. To compare the two models, you can generate an image from each with the same prompt. Let’s generate two options for a photo of a salad to add to the same cookbook from earlier.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3-Imagen3-fast-salad.max-2200x2200.png

Image generated by Imagen 3 Fast

https://storage.googleapis.com/gweb-cloudblog-publish/images/4-Imagen3-salad.max-2200x2200.png

Image generated by Imagen 3
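
A side-by-side comparison like the one above could be scripted roughly as follows. This is a sketch under stated assumptions: the model IDs `imagen-3.0-generate-001` and `imagen-3.0-fast-generate-001` and the helpers `load_model` and `compare_models` are illustrative names that do not appear in this post. The timing is a rough wall-clock measurement that includes network overhead, not a benchmark.

```python
import time
from typing import Any, Callable, Dict

# Assumed model IDs; confirm the current names in the Vertex AI documentation.
MODEL_IDS = {
    "Imagen 3": "imagen-3.0-generate-001",
    "Imagen 3 Fast": "imagen-3.0-fast-generate-001",
}


def load_model(model_id: str) -> Any:
    """Load a model via the Vertex AI SDK (needs vertexai.init and credentials)."""
    from vertexai.preview.vision_models import ImageGenerationModel

    return ImageGenerationModel.from_pretrained(model_id)


def compare_models(
    prompt: str, loader: Callable[[str], Any] = load_model
) -> Dict[str, Dict[str, Any]]:
    """Send the same prompt to each model and record response and elapsed time."""
    results: Dict[str, Dict[str, Any]] = {}
    for label, model_id in MODEL_IDS.items():
        model = loader(model_id)
        start = time.perf_counter()
        response = model.generate_images(prompt=prompt, number_of_images=1)
        results[label] = {
            "response": response,
            "seconds": time.perf_counter() - start,
        }
    return results
```

The `loader` argument exists only so the comparison loop can be tried with a stand-in object before credentials are set up; in normal use the default loader is enough.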

Protect your work and create responsibly

Imagen 3 has built-in safeguards that let you focus on your artistic vision without compromising control. In partnership with Google DeepMind, Imagen 3 uses SynthID, a technology that embeds an invisible digital watermark at the pixel level. By default, this watermark is added to all images generated by Imagen 3, and you can also set the behavior explicitly with the add_watermark parameter. You can also use the API to verify whether an image was generated using Imagen. This verification helps confirm the provenance of your AI-generated images, providing transparency and helping to safeguard your work from misuse.

With Imagen 3's advanced safety filters, you can also control the types of images generated to make sure they meet your brand values or principles. To configure safety filter thresholds for generated images, modify the safety_filter_level. The safety level can be changed to “block_most”, “block_some”, or “block_few”. To change the safety setting that controls the type of people generated, modify person_generation to “allow_all”, “allow_adult”, or “dont_allow”.
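
The settings described above can be collected and validated before they are passed to the generation call. In this sketch the parameter names and allowed values come from this post, while the `safety_kwargs` helper and its default choices are hypothetical and only meant to show the shape of the configuration.

```python
from typing import Any, Dict

# Values as listed in this post.
SAFETY_FILTER_LEVELS = ("block_most", "block_some", "block_few")
PERSON_GENERATION = ("allow_all", "allow_adult", "dont_allow")


def safety_kwargs(
    safety_filter_level: str = "block_some",  # default chosen for illustration
    person_generation: str = "allow_adult",   # default chosen for illustration
    add_watermark: bool = True,
) -> Dict[str, Any]:
    """Validate the safety settings and return them as keyword arguments."""
    if safety_filter_level not in SAFETY_FILTER_LEVELS:
        raise ValueError(f"safety_filter_level must be one of {SAFETY_FILTER_LEVELS}")
    if person_generation not in PERSON_GENERATION:
        raise ValueError(f"person_generation must be one of {PERSON_GENERATION}")
    return {
        "safety_filter_level": safety_filter_level,
        "person_generation": person_generation,
        "add_watermark": add_watermark,
    }


# Intended use, assuming `model` is an already loaded Imagen model:
#   model.generate_images(prompt=prompt, **safety_kwargs("block_most", "dont_allow"))
```

Validating the values locally surfaces a typo such as "block_all" before a request is sent.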

What’s next?

Imagen 3 is now generally available with an allowlist. The developers who've already experienced Imagen 3 are buzzing about its photorealistic capabilities and quality. As one early adopter remarked,

“The precision and realism in capturing the diverse locations and objects of destinations around the world is particularly impressive”, adding that “this level of detail is sure to be a strong competitive edge for Imagen 3.” – Sungmin Han

We’re currently prioritizing access to Imagen 3 on Vertex AI for developers at businesses with well-defined use cases. You can sign up for access through this form. We'll review your application and get back to you as soon as possible.

In the meantime, you can learn more about Imagen 3 and integrate its capabilities into your applications by checking out the resources below!