Generative media models

Build the next generation of creative and multimodal experiences

Nano Banana

Models

Gemini Image

Nano Banana Pro

Nano Banana Pro is a sophisticated reasoning-driven engine for professional-grade image editing and generation, offering studio-quality precision and advanced creative control. It is best suited to complex graphic design, high-fidelity product mockups, and factual data visualizations that require accurate text rendering and real-world grounding via Google Search.

Nano Banana 2 provides high-quality image generation and conversational editing at a mainstream price point and low latency. It serves as the high-efficiency counterpart to Gemini 3 Pro Image, optimized for speed and high-volume use cases.

Veo

Veo 3.1

Veo is Google’s state-of-the-art video generation model, designed to produce high-fidelity videos with stunning realism and natively generated audio. Veo supports both landscape and portrait aspect ratios, multiple resolutions up to 4K, and durations of 4, 6, or 8 seconds.

To meet diverse workflow requirements, the current generation is available in three distinct tiers:

  • Veo 3.1: This model is designed for state-of-the-art video generation where visual fidelity is the top priority for final production cuts
  • Veo 3.1 Fast: This option delivers faster video generation while maintaining high quality, making it ideal for standard production workflows
  • Veo 3.1 Lite: This is our most cost-effective model, empowering businesses to build high-volume video applications and rapidly iterate and scale
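The constraints and tiers described above can be sketched as a small request check. This is only an illustration: `VideoRequest`, `build_request`, and the `priority` keys are hypothetical names invented here, not part of any Google Cloud SDK, and the sketch makes no call to a real service. Only the duration values, the two orientations, and the tier names come from the text.

```python
from dataclasses import dataclass

# Values stated in the text above.
ALLOWED_DURATIONS_S = (4, 6, 8)
ALLOWED_ORIENTATIONS = ("landscape", "portrait")

# Hypothetical mapping from a workflow priority to the tier described for it.
TIER_BY_PRIORITY = {
    "fidelity": "Veo 3.1",    # final production cuts
    "speed": "Veo 3.1 Fast",  # standard production workflows
    "cost": "Veo 3.1 Lite",   # high-volume applications
}


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for a validated video generation request."""
    prompt: str
    duration_s: int
    orientation: str
    tier: str


def build_request(prompt: str, duration_s: int, orientation: str, priority: str) -> VideoRequest:
    """Validate the inputs against the documented limits and pick a tier."""
    if duration_s not in ALLOWED_DURATIONS_S:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS_S}, got {duration_s}")
    if orientation not in ALLOWED_ORIENTATIONS:
        raise ValueError(f"orientation must be one of {ALLOWED_ORIENTATIONS}, got {orientation!r}")
    if priority not in TIER_BY_PRIORITY:
        raise ValueError(f"priority must be one of {sorted(TIER_BY_PRIORITY)}, got {priority!r}")
    return VideoRequest(prompt, duration_s, orientation, TIER_BY_PRIORITY[priority])
```

For example, `build_request("a kite over a beach", 8, "portrait", "cost")` would select the Veo 3.1 Lite tier, while a 5-second duration would be rejected before any request is sent.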

Gemini Audio

Gemini Audio is an advanced suite of models for talking with, creating, and controlling sound. Using simple natural language prompts, you can generate highly expressive speech with granular control over style, tone, and performance to craft custom narratives.

For audio understanding, the models allow you to extract deep insights directly from your audio files, making it easy to analyze and process unstructured recordings.

Finally, to support live interaction, the suite enables you to build reliable, next-generation voice agents. These models deliver natural conversational capabilities and improved tonal understanding for fluid voice interactions.

Lyria

Lyria 3, Google's family of music generation models, is available on Vertex AI in public preview. With Lyria 3 models, you can generate high-quality, high-fidelity stereo audio, with vocal support, from text prompts and from images.

Use cases

Wizard of Oz
How Google DeepMind and Google Cloud are helping to make a cinema classic larger than life in Las Vegas.

Embed generative media models directly into your application to help professionals explore ideas and create content.

Retailers use image and video models to build virtual try-ons and enrich product catalogs, giving shoppers a better understanding of items before they buy.

Use generative models to manage production pipelines across pre-production, visual effects, and post-production.

Integrate multimodal capabilities into mobile applications and games. Device manufacturers can build voice and image features directly into hardware, while game developers can generate dynamic in-game assets.

Build with confidence at enterprise scale

Google Cloud provides the infrastructure and governance required to deploy generative media in production while helping you maintain control over your data.

Google Cloud stack

Access the technical and commercial frameworks you need to deploy generative media models at scale.

By coupling SynthID watermarking with interoperable C2PA content credentials, we are ensuring your AI-generated media outputs are traceable, tamper-evident, and verifiable.

Under the Google Cloud Terms of Service, you retain full ownership and intellectual property rights over your data and applications.

See how our customers are innovating with generative media models

Ocado

Ocado uses Google AI for personalized marketing and task automation.

Kraft Heinz

Kraft Heinz uses Gemini, Veo, and Imagen for creative asset generation.

Imperial War Museums

Imperial War Museums uses Gemini for audio transcription and translation.

VEED

VEED uses Gemini and Veo for accessible video creation and editing.

Get started today with generative media models

