Gemini 2.5 Flash Live API native audio

Gemini 2.5 Flash with Live API native audio is a preview model that adds native audio capabilities to Live API. In addition to the standard Live API features, it includes:

  • Enhanced audio quality: Experience dramatically improved audio quality that feels like speaking with a person.
  • Enhanced voice quality and adaptability: Live API native audio provides richer, more natural voice interactions with 30 HD voices in 24 languages.
  • Introducing Proactive Audio: When Proactive Audio is enabled, the model responds only when a response is relevant. It generates text transcripts and audio responses only for queries directed to the device, and stays silent for queries that are not.
  • Introducing Affective Dialog: Models using Live API native audio can understand and respond appropriately to users' emotional expressions for more nuanced conversations.
  • Improved barge-in: Interrupt Gemini more naturally and reliably, even in loud and noisy environments.
  • Robust function calling: We've improved the triggering rate, allowing Gemini to successfully execute the functions you define to support your use cases.
  • Accurate transcription: The accuracy of audio-to-text transcription has been significantly enhanced.
  • Seamless multilingual support: Speak to Gemini in multiple languages, and it will effortlessly switch between them without any pre-configuration. Language is no longer a barrier.
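Proactive Audio and Affective Dialog are opt-in features, so they would normally be switched on in the session configuration. The following is a minimal sketch that assembles such a configuration as a plain dictionary. The key names (`response_modalities`, `proactivity`, `proactive_audio`, `enable_affective_dialog`) are assumptions modeled on the Google Gen AI SDK's Live connect configuration and should be verified against the current SDK reference; `build_live_config` is a hypothetical helper, not part of any SDK.

```python
from typing import Any

# Model ID as listed on this page.
MODEL_ID = "gemini-live-2.5-flash-preview-native-audio-09-2025"


def build_live_config(proactive_audio: bool = False,
                      affective_dialog: bool = False) -> dict[str, Any]:
    """Hypothetical helper: assemble a Live API session config as a dict.

    All key names are assumptions; verify them against the SDK you use.
    """
    config: dict[str, Any] = {"response_modalities": ["AUDIO"]}
    if proactive_audio:
        # Assumed key: model stays silent for non-device-directed queries.
        config["proactivity"] = {"proactive_audio": True}
    if affective_dialog:
        # Assumed key: model adapts to the user's emotional expression.
        config["enable_affective_dialog"] = True
    return config


config = build_live_config(proactive_audio=True, affective_dialog=True)
```

Keeping both features off by default in the helper means a caller has to ask for each one explicitly.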

For more information, see the Live API documentation.


Model ID
gemini-live-2.5-flash-preview-native-audio-09-2025
Supported inputs & outputs
  • Inputs:
    Text, Audio, Video
  • Outputs:
    Text, Audio
Token limits
  • Maximum input tokens: 128K
  • Maximum output tokens: 64K
  • Context window: 32K (default), upgradable to 128K
Capabilities
Usage types
Technical specifications
Video
  • Standard resolution: 768 x 768
  • Supported MIME types:
    video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp
Audio
  • Maximum conversation length: 10 minutes by default; can be extended.
  • Required audio input format: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Required audio output format: Raw 16-bit PCM audio at 24 kHz, little-endian
  • Supported MIME types:
    audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
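The raw PCM requirements above translate directly into byte layout. The sketch below shows one way to produce input-ready audio from floating-point samples and to compute the duration of a PCM buffer. It assumes mono audio, which this page does not state, and the helper names are hypothetical.

```python
import math
import struct

INPUT_RATE_HZ = 16_000   # required input sample rate
OUTPUT_RATE_HZ = 24_000  # required output sample rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def floats_to_pcm16le(samples: list[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to raw 16-bit little-endian PCM."""
    # Clamp first so out-of-range samples cannot overflow the 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_duration_seconds(data: bytes, sample_rate_hz: int) -> float:
    """Duration of mono 16-bit PCM data at the given sample rate."""
    return len(data) / (BYTES_PER_SAMPLE * sample_rate_hz)


# 100 ms of a 440 Hz tone at the input rate (mono assumed).
tone = [math.sin(2 * math.pi * 440 * n / INPUT_RATE_HZ) for n in range(1600)]
chunk = floats_to_pcm16le(tone)
```

The same duration helper applies to model output by passing `OUTPUT_RATE_HZ`, since only the sample rate differs between the two directions.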
Parameter defaults
  • Start of speech sensitivity: Low
  • End of speech sensitivity: High
  • Prefix padding: 0
  • Max context size: 128K
Supported regions

Model availability

  • United States
    • us-central1
See Data residency for more information.
Knowledge cutoff date
January 2025
Versions
  • gemini-live-2.5-flash-preview-native-audio-09-2025
    • Launch stage: Public preview
    • Release date: September 18, 2025
  • gemini-live-2.5-flash-preview-native-audio
    • Launch stage: Public preview
    • Release date: June 17, 2025
    • Discontinuation date: October 18, 2025
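Because one of the listed versions has a discontinuation date, client code may want to check a model ID against the date before opening a session. The following is a small sketch using the version data above. `VERSIONS` and `available_versions` are hypothetical names, and treating a version as unavailable on the discontinuation date itself is an assumption.

```python
from datetime import date
from typing import Optional

# Version data copied from the list above; None means no
# discontinuation date is listed for that version.
VERSIONS: dict[str, Optional[date]] = {
    "gemini-live-2.5-flash-preview-native-audio-09-2025": None,
    "gemini-live-2.5-flash-preview-native-audio": date(2025, 10, 18),
}


def available_versions(today: date) -> list[str]:
    """Return model IDs whose listed discontinuation date has not been reached."""
    return [
        model_id
        for model_id, end in VERSIONS.items()
        if end is None or today < end
    ]


current = available_versions(date.today())
```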
Security controls
See Security controls for more information.
Supported languages
See Supported languages.
Pricing
See Pricing.