Models
Nano Banana Pro is a sophisticated reasoning-driven engine for professional-grade image editing and generation, offering studio-quality precision and advanced creative control. Nano Banana Pro is best for complex graphic design, high-fidelity product mockups, and factual data visualizations that require accurate text rendering and real-world grounding via Google Search.
Nano Banana 2 provides high-quality image generation and conversational editing at a mainstream price point and low latency. It serves as the high-efficiency counterpart to Gemini 3 Pro Image, optimized for speed and high-volume use cases.
Veo is Google’s state-of-the-art video generation model, designed to produce high-fidelity videos with stunning realism and natively generated audio. Veo supports both landscape and portrait aspect ratios, multiple resolutions up to 4K, and durations of 4, 6, or 8 seconds.
To meet diverse workflow requirements, the current generation is available in three distinct tiers:
Gemini Audio is an advanced suite of models that allows you to seamlessly talk, create, and control sound. Using simple, natural language prompts, you can generate highly expressive speech, giving you granular control over style, tone, and performance to craft custom narratives.
For audio understanding, the models allow you to extract deep insights directly from your audio files, making it easy to analyze and process unstructured recordings.
Finally, to support live interaction, the suite enables you to build reliable, next-generation voice agents. These models deliver natural conversational capabilities and improved tonal understanding for fluid voice interactions.
Lyria 3, Google's family of music generation models, is available on Vertex AI in public preview. With Lyria 3 models, you can generate high-quality, high-fidelity stereo audio from text prompts and from images, with support for vocals.
Use cases
Professional marketing and creative workflows
Embed generative media models directly into your application to help professionals explore ideas and create content.
Retail and e-commerce
Retailers use image and video models to build virtual try-ons and enrich product catalogs, giving shoppers a better understanding of items before they buy.
Media and entertainment
Use generative models to manage production pipelines across pre-production, visual effects, and post-production.
Consumer applications and gaming
Integrate multimodal capabilities into mobile applications and games. Device manufacturers can build voice and image features directly into hardware, while game developers can generate dynamic in-game assets.
Build with confidence at enterprise scale
Google Cloud provides the infrastructure and governance required to deploy generative media in production while helping you maintain control over your data.

Deploy on scalable infrastructure
Access the technical and commercial frameworks you need to deploy generative media models at scale.
Build trust with SynthID watermarking and C2PA
By coupling SynthID watermarking with interoperable C2PA content credentials, we help ensure that your AI-generated media outputs are traceable, tamper-evident, and verifiable.
Maintain data privacy and security
Under the Google Cloud Terms of Service, you retain full ownership and intellectual property rights over your data and applications.
See how our customers are innovating with generative media models