Building on the bananas momentum of generative media models on Google Cloud

October 2, 2025

https://storage.googleapis.com/gweb-cloudblog-publish/images/generative_media_momentum.max-2000x2000.jpg

Michael Gerstenhaber

VP of Product Management, Vertex AI

It’s been exciting to see the capabilities of Nano Banana, our latest image editing model available in Gemini 2.5 Flash Image, go viral. And with transformative workflows like these, it is easy to see why:

https://storage.googleapis.com/gweb-cloudblog-publish/images/GenMedia_Sept_Bundle_Marketing_Assets.max-1000x1000.png

Iterative refinement with Gemini 2.5 Flash Image

https://storage.googleapis.com/gweb-cloudblog-publish/images/GenMedia_Sept_Bundle_Marketing_Assets_1.max-1000x1000.png

Context-aware conversational editing with Gemini 2.5 Flash Image

https://storage.googleapis.com/gweb-cloudblog-publish/images/GenMedia_Sept_Bundle_Marketing_Assets_2.max-1000x1000.png

Geospatial reasoning and understanding with Gemini 2.5 Flash Image

The incredible response makes it clear: enterprises now have the ability to quickly create and refine high-quality media across more formats and channels than ever before, without compromising on appeal, consistency, or security.

That’s why we’re thrilled to announce major updates across our suite of generative media models on Vertex AI, including Gemini 2.5 Flash Image (now GA!), Veo, Imagen, and Gemini 2.5 Text-to-Speech. These updates help you create faster, with more control, and across all the formats that matter most: sight, sound, and motion. Let’s take a look.

Gemini 2.5 Flash Image is Generally Available (GA) on Vertex AI

We are excited to announce the General Availability of Gemini 2.5 Flash Image. Our state-of-the-art image generation and editing model is now production ready and backed by Google Cloud’s enterprise-grade infrastructure and security. In addition, the model now creates images across multiple aspect ratios and supports batch processing.

We’re already seeing incredible adoption of Gemini 2.5 Flash Image. Here’s how companies are pushing its creative boundaries:

“Gemini 2.5 Flash Image and such high-quality AI tools mean, quite literally, nothing is off limits anymore. As a result, our team has never been more creative. We’re collaborating on ideas, able to visualize them faster and launch campaigns in days, instead of weeks. Our core mission is always to give creators and businesses the most advanced AI tools, and putting Gemini 2.5 Flash Image in their hands fulfills that promise. This is, without a doubt, the most exciting time to be a creator.” - Shahar Aizenberg, CMO, Artlist.io

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Mercado_Libre_Gif.gif

“Gemini 2.5 Flash Image has redefined what’s possible for Mercado Libre’s Photo Studio. The model's creativity, aesthetic quality, and precise instruction following have elevated our product listings and unlocked new possibilities. Today, our only limit is imagination.” - Franco Seia, Software Development Manager, Mercado Libre

Veo: Dream up creations in new formats with greater control

Veo 3, our latest video generation model with native audio and dialogue, has been rapidly adopted by creators looking to bring their stories to life with unprecedented control. We've been listening to your feedback, and we're excited to announce new features that make Veo 3 even more powerful on Vertex AI:

Output vertical formats for social media: You told us you wanted to create vertical videos, and we listened! Veo 3 and Veo 3 Fast now support a 9:16 aspect ratio. Creators can produce larger, more immersive visuals that are optimized for the vertical orientation of social media platforms. Say goodbye to awkward cropping!
Control the flow and timing of your story: With 4-, 6-, or 8-second duration options, you can tune your video content for transitions and intercut scenes, letting you craft more flexible narratives.
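
To make the two new options concrete, here is a minimal Python sketch that plans a vertical storyboard from short clips. It is illustrative only: `VeoClipSettings` and `plan_storyboard` are hypothetical helpers, not part of any Google SDK, and the 16:9 value is an assumption about the existing landscape format. The 9:16 ratio and the 4, 6, and 8 second durations are the ones described above.

```python
from dataclasses import dataclass

# 9:16 and the 4/6/8 second durations come from the announcement;
# 16:9 is an assumed value for the existing landscape format.
SUPPORTED_ASPECT_RATIOS = {"16:9", "9:16"}
SUPPORTED_DURATIONS_SECONDS = {4, 6, 8}


@dataclass(frozen=True)
class VeoClipSettings:
    """Hypothetical per-clip settings holder; not an SDK type."""

    prompt: str
    aspect_ratio: str = "9:16"
    duration_seconds: int = 8

    def __post_init__(self):
        # Reject settings the announcement does not list as supported.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.duration_seconds not in SUPPORTED_DURATIONS_SECONDS:
            raise ValueError(f"unsupported duration: {self.duration_seconds}")


def plan_storyboard(shots):
    """Turn (prompt, seconds) pairs into validated vertical clips plus total runtime."""
    clips = [VeoClipSettings(prompt=p, duration_seconds=s) for p, s in shots]
    total_seconds = sum(clip.duration_seconds for clip in clips)
    return clips, total_seconds
```

Mixing short 4-second transitions with longer 8-second scenes is one way to use the duration options for intercut sequences; each validated clip would then be sent to Veo as its own generation request.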

Our customers are already leveraging Veo to transform their creative workflows and connect with their audiences in powerful new ways.

“For Palo Alto Networks’ "Be a Genius. Deploy Bravely" campaign, we proved you no longer have to choose between speed, creativity, and cost. With Gemini and Veo 3, you get all three. Like our customers, we are navigating the incredible promise of the AI revolution. The smartest move in this new era isn't just to adopt AI, but to do it securely.” - Kelly Waldher, CMO, Palo Alto Networks

“We’ve always believed that the future of creativity is a dynamic partnership between creatives and technology. That’s why we’re bringing together the best-in-class AI models across video, images and audio directly into our unlimited Envato subscription. Early signals indicate strong usage of models like Veo 3, accelerating our community's creativity and empowering them to thrive in the process.” - Hichame Assi, CEO, Envato

Imagen 4 is Generally Available (GA) on Vertex AI

Our leading text-to-image model, Imagen 4, is engineered for creativity and speed. It delivers photorealistic images with sharp clarity and strong text rendering and typography, bringing your imagination to life faster than ever before. It is Generally Available and production ready on Vertex AI.

Shutterstock, a family of brands delivering scalable creative and GenAI solutions, is using Google’s Imagen 4 models to power high-quality, commercially ready AI images.

"At Shutterstock, our mission is to empower businesses with the essential, universal ingredients to make their work more effective. By bringing Google’s Imagen 4 models into our AI Image Generator, we’re making it easier than ever for teams to go from an idea to impact with market-ready visuals in seconds. Imagen 4 allows us to deliver high-quality, commercially ready outputs that meet the standards our customers expect. This integration ensures Shutterstock customers are always equipped with the most advanced, future-ready tools in creative AI." - Keenan Kadam, Senior Product Manager, Shutterstock

Gemini 2.5 TTS (Text to Speech) is Generally Available (GA) on Vertex AI

Our powerful generative media capabilities also include audio. Leverage Gemini 2.5 Text-to-Speech for the creation of high-fidelity voice applications, all with the security and scale of Vertex AI.

We are thrilled to announce that Gemini 2.5 Text-to-Speech (TTS) is now Generally Available across both Pro and Flash models. This model prioritizes human-like expression and control, transforming how you build voice applications:

Studio-Quality Dialogue, Now GA: Stop relying on choppy, single-speaker systems. Gemini 2.5 Flash and Gemini 2.5 Pro TTS are now ready for production, enabling you to generate dynamic, multi-speaker dialogue in a single API call for podcasts, audiobooks, and rich conversational customer service.
Advanced Style and Tone Control: Leverage natural language prompts to dictate the performance, not just the text. You can now control the voice's tone, emotional expression, and accent, ensuring your brand’s voice is delivered with perfect fidelity.
Global Reach, Perfect Delivery: Gemini 2.5 Flash and Gemini 2.5 Pro TTS are now available across more than 70 languages, guaranteeing your global audience receives the same high-quality, expressive voice experience, regardless of region.
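
As a rough illustration of pairing a natural-language style direction with multi-speaker text in one request, the Python sketch below assembles a single prompt string. The helper name `build_dialogue_prompt` and the "Speaker: line" layout are assumptions made for this example; the announcement does not specify the prompt format the TTS models expect, so check the Vertex AI documentation before relying on it.

```python
def build_dialogue_prompt(style, turns):
    """Combine a style direction and speaker-labelled lines into one prompt string.

    `style` is a natural-language instruction about tone, emotion, or accent.
    `turns` is a list of (speaker, text) pairs in speaking order.
    The layout is an assumed convention, not a documented API format.
    """
    if not turns:
        raise ValueError("at least one dialogue turn is required")

    # Collect speaker names in first-appearance order, without duplicates.
    speakers = []
    for speaker, _ in turns:
        if speaker not in speakers:
            speakers.append(speaker)

    header = f"{style.strip()} Speakers: {', '.join(speakers)}."
    lines = [f"{speaker}: {text.strip()}" for speaker, text in turns]
    return "\n".join([header, *lines])
```

Keeping the whole conversation in one string mirrors the single-call workflow described above: the style direction shapes the performance, and the labelled lines tell the model who says what.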

Which gen media model to use, and when

We have a suite of options for enterprise-grade work because we know choice is important when it comes to the right model for your project. If you’re not sure where to start, here’s a quick cheat sheet:

Choose Veo 3 if your workflow demands dynamic, high-quality video creation with granular control over scenes, characters, and narrative flow. It's perfect for social media content, marketing campaigns, and any project where bringing stories to life through motion is key. Your input can be text, images, or a combination. For the latest Veo 3 pricing, see the Vertex AI pricing page.
Choose Gemini 2.5 Flash Image as a starting point for image creation or if your workflow is iterative and requires creating or editing an image with strong visual consistency. It's the right choice for conversational editing, sketch-to-image tasks, style transfers, and adapting existing visuals. Your input is often a combination of images and text prompts.
Choose Imagen 4 if your workflow is focused on generating net-new images from text with speed and higher resolution. It's built for high-volume text-to-image applications where speed and resolution are your primary concerns.
Choose Gemini 2.5 Flash or Gemini 2.5 Pro TTS (Text-to-Speech) if your workflow is centered on bringing text to life with high-quality, emotionally expressive audio. It's the right choice for creating lifelike voice agents, professional narration for content like podcasts and e-learning including multi-speaker synthesis, and dynamic character voices for gaming and entertainment. Your input is text.
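
If you want to encode this cheat sheet in your own tooling, it reduces to a small decision function. The Python sketch below is one possible reading of the guidance above, not official selection logic. `Output`, `suggest_model`, and the two flags are hypothetical names, and the function returns product names rather than API model IDs.

```python
from enum import Enum


class Output(Enum):
    """Kind of media the workflow needs to produce."""

    VIDEO = "video"
    IMAGE = "image"
    AUDIO = "audio"


def suggest_model(output, *, iterative_editing=False, speed_and_resolution_first=False):
    """Map a workflow description to a model family, following the cheat sheet."""
    if output is Output.VIDEO:
        return "Veo 3"
    if output is Output.AUDIO:
        return "Gemini 2.5 Flash or Gemini 2.5 Pro TTS"
    if output is Output.IMAGE:
        # Editing and iterative refinement point to Gemini 2.5 Flash Image.
        if iterative_editing:
            return "Gemini 2.5 Flash Image"
        # High-volume text-to-image where speed and resolution dominate.
        if speed_and_resolution_first:
            return "Imagen 4"
        # The cheat sheet names Gemini 2.5 Flash Image as the starting point.
        return "Gemini 2.5 Flash Image"
    raise ValueError(f"unknown output type: {output!r}")
```

For example, `suggest_model(Output.IMAGE, speed_and_resolution_first=True)` returns `"Imagen 4"`, while a plain `suggest_model(Output.IMAGE)` falls back to Gemini 2.5 Flash Image as the default starting point.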

Get started with enterprise-grade creativity on Vertex AI

Gemini 2.5 Flash Image, Veo 3, Imagen 4, and Gemini 2.5 TTS are available on Vertex AI today.

Dive into the Vertex AI Studio to get started with Gemini 2.5 Flash Image and Gemini 2.5 TTS today. For Veo 3 and Imagen 4, get started at Vertex AI Media Studio.
