Generative media models

Build the next generation of creative and multimodal experiences

Models

Gemini Image (Nano Banana)

Nano Banana Pro is a reasoning-driven model for professional-grade image generation and editing, offering studio-quality precision and advanced creative control. It is best for complex graphic design, high-fidelity product mockups, and factual data visualizations that require accurate text rendering and real-world grounding via Google Search.

Nano Banana 2 provides high-quality image generation and conversational editing at a mainstream price point with low latency. It serves as the high-efficiency counterpart to Nano Banana Pro (Gemini 3 Pro Image), optimized for speed and high-volume use cases.

Technical specifications

Developer guide

Pricing

Veo

Veo is Google’s state-of-the-art video generation model, designed to produce high-fidelity videos with stunning realism and natively generated audio. Veo supports both landscape and portrait aspect ratios, multiple resolutions up to 4K, and durations of 4, 6, or 8 seconds.

To meet diverse workflow requirements, the current generation is available in three distinct tiers:

Veo 3.1: Designed for final production cuts, where visual fidelity is the top priority.
Veo 3.1 Fast: Delivers faster video generation while maintaining high quality, making it well suited to standard production workflows.
Veo 3.1 Lite: The most cost-effective model, suited to high-volume video applications and rapid iteration at scale.
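The limits and tiers described above can be checked on the client before a request is sent. The following is a minimal sketch, not the Veo API: `VeoRequest`, `validate_veo_request`, and `pick_tier` are hypothetical names introduced for illustration, and only the values stated above (durations of 4, 6, or 8 seconds, landscape or portrait orientation, and the three tier names) are taken from this page. Resolution is left out because the page only says "up to 4K" without listing the supported values.

```python
from dataclasses import dataclass

# Values taken from the description above. The constant and class names are
# hypothetical and are not identifiers from any Google SDK.
ALLOWED_DURATIONS_S = (4, 6, 8)
ALLOWED_ORIENTATIONS = ("landscape", "portrait")
TIERS = ("Veo 3.1", "Veo 3.1 Fast", "Veo 3.1 Lite")


@dataclass(frozen=True)
class VeoRequest:
    """Hypothetical description of a video request, for illustration only."""
    tier: str
    duration_s: int
    orientation: str


def validate_veo_request(req: VeoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if req.tier not in TIERS:
        problems.append(f"unknown tier: {req.tier!r}")
    if req.duration_s not in ALLOWED_DURATIONS_S:
        problems.append(f"duration must be one of {ALLOWED_DURATIONS_S} seconds")
    if req.orientation not in ALLOWED_ORIENTATIONS:
        problems.append(f"orientation must be one of {ALLOWED_ORIENTATIONS}")
    return problems


def pick_tier(priority: str) -> str:
    """Map a workflow priority to a tier, following the tier descriptions above."""
    mapping = {
        "fidelity": "Veo 3.1",    # final production cuts
        "speed": "Veo 3.1 Fast",  # standard production workflows
        "cost": "Veo 3.1 Lite",   # high-volume applications
    }
    return mapping[priority]
```

For example, `validate_veo_request(VeoRequest(pick_tier("speed"), 6, "portrait"))` returns an empty list, while a 5-second request would be reported as a problem. Refer to the developer guide for the actual request format and parameter names.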

Technical specifications

Developer guide

Pricing

Gemini Audio

Gemini Audio is a suite of models for generating speech, understanding audio, and holding live voice conversations. For speech generation, simple natural language prompts give you granular control over style, tone, and performance, so you can produce highly expressive speech and craft custom narratives.

For audio understanding, the models allow you to extract deep insights directly from your audio files, making it easy to analyze and process unstructured recordings.

Finally, to support live interaction, the suite enables you to build reliable, next-generation voice agents. These models deliver natural conversational capabilities and improved tonal understanding for fluid voice interactions.

Text to speech | Developer guide

Text to speech | Pricing

Speech to text | Developer guide

Speech to text | Pricing

Live (audio to audio) | Developer guide

Live (audio to audio) | Pricing

Lyria

Lyria 3, Google's family of music generation models, is available on Vertex AI in public preview. With Lyria 3 models, you can generate high-fidelity stereo audio, with vocal support, from text prompts and from images.

Technical specifications

Developer guide

Pricing

Use cases

How Google DeepMind and Google Cloud are helping to bring a cinema classic to larger-than-life scale in Las Vegas.

Professional marketing and creative workflows

Embed generative media models directly into your application to help professionals explore ideas and create content.

Retail and e-commerce

Retailers use image and video models to build virtual try-ons and enrich product catalogs, giving shoppers a better understanding of items before they buy.

Media and entertainment

Use generative models to manage production pipelines across pre-production, visual effects, and post-production.

Consumer applications and gaming

Integrate multimodal capabilities into mobile applications and games. Device manufacturers can build voice and image features directly into hardware, while game developers can generate dynamic in-game assets.

Build with confidence at enterprise scale

Google Cloud provides the infrastructure and governance required to deploy generative media in production while helping you maintain control over your data.

Deploy on scalable infrastructure

Access the technical and commercial frameworks you need to deploy generative media models at scale.

Build trust with SynthID watermarking and C2PA

By coupling SynthID watermarking with interoperable C2PA content credentials, we are ensuring your AI-generated media outputs are traceable, tamper-evident, and verifiable.