The Live API enables low-latency, two-way voice and video interactions
with Gemini. This guide covers how to set up a two-way interactive
conversation, adjust audio settings, manage sessions, and more.

Supported models

You can use the Live API with the following models:
Model version | Availability level
gemini-live-2.5-flash | Private GA*
gemini-live-2.5-flash-preview-native-audio | Public preview

* Reach out to your Google account team representative to request access.

Start a conversation

Console

To end the session, click Stop session.

Python

To enable real-time, interactive conversations, run the following example on a local computer with microphone and speaker access (not a Colab notebook). This example sets up a conversation with the API, allowing you to send audio prompts and receive audio responses:
"""
# Installation
# on linux
sudo apt-get install portaudio19-dev
# on mac
brew install portaudio
python3 -m venv env
source env/bin/activate
pip install google-genai
"""
import asyncio
import pyaudio
from google import genai
from google.genai import types
CHUNK=4200
FORMAT=pyaudio.paInt16
CHANNELS=1
RECORD_SECONDS=5
MODEL = 'gemini-live-2.5-flash'
INPUT_RATE=16000
OUTPUT_RATE=24000
client = genai.Client(
vertexai=True,
project=GOOGLE_CLOUD_PROJECT,
location=GOOGLE_CLOUD_LOCATION,
)
config = {
"response_modalities": ["AUDIO"],
"input_audio_transcription": {}, # Configure input transcription
"output_audio_transcription": {}, # Configure output transcription
}
async def main():
print(MODEL)
p = pyaudio.PyAudio()
async with client.aio.live.connect(model=MODEL, config=config) as session:
#exit()
async def send():
stream = p.open(
format=FORMAT, channels=CHANNELS, rate=INPUT_RATE, input=True, frames_per_buffer=CHUNK)
while True:
frame = stream.read(CHUNK)
await session.send(input={"data": frame, "mime_type": "audio/pcm"})
await asyncio.sleep(10**-12)
async def receive():
output_stream = p.open(
format=FORMAT, channels=CHANNELS, rate=OUTPUT_RATE, output=True, frames_per_buffer=CHUNK)
async for message in session.receive():
if message.server_content.input_transcription:
print(message.server_content.model_dump(mode="json", exclude_none=True))
if message.server_content.output_transcription:
print(message.server_content.model_dump(mode="json", exclude_none=True))
if message.server_content.model_turn:
for part in message.server_content.model_turn.parts:
if part.inline_data.data:
audio_data=part.inline_data.data
output_stream.write(audio_data)
await asyncio.sleep(10**-12)
send_task = asyncio.create_task(send())
receive_task = asyncio.create_task(receive())
await asyncio.gather(send_task, receive_task)
asyncio.run(main())

Python

Set up a conversation with the API that lets you send text prompts and receive audio responses:
import base64
import json

import numpy as np
from IPython.display import Audio, Markdown, display
from websockets.asyncio.client import connect

# SERVICE_URL and bearer_token are assumed to be defined by earlier setup
# steps (not shown here).

# Set model generation_config
CONFIG = {"response_modalities": ["AUDIO"]}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}

async def main() -> None:
    # Connect to the server
    async with connect(SERVICE_URL, additional_headers=headers) as ws:

        # Setup the session
        async def setup() -> None:
            await ws.send(
                json.dumps(
                    {
                        "setup": {
                            "model": "gemini-live-2.5-flash",
                            "generation_config": CONFIG,
                        }
                    }
                )
            )

            # Receive setup response
            raw_response = await ws.recv(decode=False)
            setup_response = json.loads(raw_response.decode("ascii"))
            print(f"Connected: {setup_response}")
            return

        # Send text message
        async def send() -> bool:
            text_input = input("Input > ")
            if text_input.lower() in ("q", "quit", "exit"):
                return False

            msg = {
                "client_content": {
                    "turns": [{"role": "user", "parts": [{"text": text_input}]}],
                    "turn_complete": True,
                }
            }

            await ws.send(json.dumps(msg))
            return True

        # Receive server response
        async def receive() -> None:
            responses = []

            # Receive chunks of server response
            async for raw_response in ws:
                response = json.loads(raw_response.decode())
                server_content = response.pop("serverContent", None)
                if server_content is None:
                    break

                model_turn = server_content.pop("modelTurn", None)
                if model_turn is not None:
                    parts = model_turn.pop("parts", None)
                    if parts is not None:
                        for part in parts:
                            pcm_data = base64.b64decode(part["inlineData"]["data"])
                            responses.append(np.frombuffer(pcm_data, dtype=np.int16))

                # End of turn
                turn_complete = server_content.pop("turnComplete", None)
                if turn_complete:
                    break

            # Play the returned audio message
            display(Markdown("**Response >**"))
            display(Audio(np.concatenate(responses), rate=24000, autoplay=True))
            return

        await setup()

        while True:
            if not await send():
                break
            await receive()

await main()

Start the conversation, input your prompts, or type q, quit, or exit to exit.

Change language and voice settings

The Live API uses Chirp 3 to support synthesized speech responses in a variety of HD voices and languages. For a full list and demos of each voice, see Chirp 3: HD voices.

To set the response voice and language:

Console

Python
from google.genai.types import (
    LiveConnectConfig,
    PrebuiltVoiceConfig,
    SpeechConfig,
    VoiceConfig,
)

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(
                voice_name=voice_name,
            )
        ),
        language_code="en-US",
    ),
)
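
The config object is then passed to the connect call in the same way as in the conversation examples. The following is a minimal sketch, assuming the client setup from the first example, the config defined above, and that voice_name has been set beforehand (for example to "Aoede", one of the Chirp 3 HD voices):

Python

import asyncio

from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
MODEL_ID = "gemini-live-2.5-flash"

async def main():
    # The voice and language settings apply to all audio the session generates.
    async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hello!")])
        )
        async for message in session.receive():
            # Audio bytes arrive in model_turn parts, as in the first example.
            if message.server_content and message.server_content.turn_complete:
                break

asyncio.run(main())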

Change voice activity detection settings

Voice activity detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, as it allows a user to interrupt the model at any time.

The model automatically performs VAD on a continuous audio input stream. You can configure the VAD settings using the realtimeInputConfig.automaticActivityDetection field of the setup message. When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a message to report the interruption.

If the audio stream pauses for more than a second (for example, if the user turns off the microphone), send an audioStreamEnd event to flush any cached audio. The client can resume sending audio data at any time.

Alternatively, disable automatic VAD by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. With this configuration, the client detects user speech and sends activityStart and activityEnd messages at the appropriate times. An audioStreamEnd event isn't sent. Interruptions are marked by activityEnd. A manual-mode sketch follows the automatic example below.

Python
from pathlib import Path

from google.genai.types import (
    AutomaticActivityDetection,
    Blob,
    EndSensitivity,
    LiveConnectConfig,
    RealtimeInputConfig,
    StartSensitivity,
)

config = LiveConnectConfig(
    response_modalities=["TEXT"],
    realtime_input_config=RealtimeInputConfig(
        automatic_activity_detection=AutomaticActivityDetection(
            disabled=False,  # default
            start_of_speech_sensitivity=StartSensitivity.START_SENSITIVITY_LOW,  # Either START_SENSITIVITY_LOW or START_SENSITIVITY_HIGH
            end_of_speech_sensitivity=EndSensitivity.END_SENSITIVITY_LOW,  # Either END_SENSITIVITY_LOW or END_SENSITIVITY_HIGH
            prefix_padding_ms=20,
            silence_duration_ms=100,
        )
    ),
)

async with client.aio.live.connect(
    model=MODEL_ID,
    config=config,
) as session:
    audio_bytes = Path("sample.pcm").read_bytes()

    await session.send_realtime_input(
        media=Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )

    # if stream gets paused, send:
    # await session.send_realtime_input(audio_stream_end=True)

    response = []
    async for message in session.receive():
        if message.server_content.interrupted is True:
            # The model generation was interrupted
            response.append("The session was interrupted")

        if message.text:
            response.append(message.text)

    display(Markdown(f"**Response >** {''.join(response)}"))

Extend a session

The default maximum length of a conversation session is 10 minutes. A goAway notification (BidiGenerateContentServerMessage.goAway) is sent to the client 60 seconds before the session ends.
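
The notification arrives as a regular server message, so the client can watch for it in its receive loop and reconnect or resume before the deadline. The following is a minimal sketch, assuming an open session as in the earlier examples; the go_away field name follows the Gen AI SDK's server message type:

Python

async for message in session.receive():
    # The server warns the client shortly before the connection is closed.
    if message.go_away is not None:
        # time_left reports how long the connection will remain open.
        print(f"Session ending in {message.go_away.time_left}")
        # This is the point to reconnect, for example with a session
        # resumption handle (see Enable and disable session resumption).
        break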

You can extend the session length in 10-minute increments using the Gen AI SDK. There's no limit to the number of times you can extend a session. For an example, see Enable and disable session resumption.

Context window

The Live API context window is used to store real-time streamed data (25 tokens per second (TPS) for audio and 258 TPS for video) and other content, including text inputs and model outputs.

If the context window exceeds the maximum length (set using the Max content size slider in Vertex AI Studio, or trigger_tokens in the API), the oldest turns are truncated using context window compression to prevent abrupt session termination. Context window compression triggers once the context window hits its maximum length, and deletes the oldest parts of the conversation until the total token count is back down to the target size (set either in Vertex AI Studio using the Target context size slider, or using target_tokens in the API).

For example, suppose your maximum context length is set to 32,000 tokens, your target size is set to 16,000 tokens, and the conversation has grown to three turns whose combined size exceeds the 32,000 token maximum. Context window compression triggers: the system goes back to the beginning of the conversation and starts deleting old turns until the total token size is less than the 16,000 token target. The result could be that only Turn 3 remains in active memory, and the conversation continues from that point.
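
To make the deletion order concrete, here is a small standalone sketch (plain Python, not an API call) that simulates sliding-window compression; the per-turn token counts are hypothetical:

Python

def compress(turn_token_counts: list[int], trigger_tokens: int, target_tokens: int) -> list[int]:
    # Simulates sliding-window context compression: once the total exceeds
    # the trigger, delete the oldest turns until the total is at or below
    # the target.
    turns = list(turn_token_counts)
    if sum(turns) > trigger_tokens:
        while turns and sum(turns) > target_tokens:
            turns.pop(0)  # the oldest turn is deleted first
    return turns

# Three turns of 12,000, 10,000, and 12,000 tokens exceed a 32,000 trigger;
# compression deletes Turn 1 and Turn 2, and only Turn 3 remains.
print(compress([12_000, 10_000, 12_000], trigger_tokens=32_000, target_tokens=16_000))
# [12000]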

The minimum and maximum values for the context length and target size are:

Setting (API flag) | Minimum value | Maximum value
Maximum context length (trigger_tokens) | 5,000 | 128,000
Target context size (target_tokens) | 0 | 128,000

To set the context window:

Console

Python

Set the context_window_compression.trigger_tokens and context_window_compression.sliding_window.target_tokens fields in the setup message:
config = types.LiveConnectConfig(
    temperature=0.7,
    response_modalities=['TEXT'],
    system_instruction=types.Content(
        parts=[types.Part(text='test instruction')], role='user'
    ),
    context_window_compression=types.ContextWindowCompressionConfig(
        trigger_tokens=1000,
        sliding_window=types.SlidingWindow(target_tokens=10),
    ),
)

Concurrent sessions

You can have up to 5,000 concurrent sessions per project.

Update system instructions during a session

The Live API lets you update the system instructions during an active session. Use this to adapt the model's responses, such as changing the response language or modifying the tone.

To update the system instructions mid-session, send text content with the system role. The updated system instruction remains in effect for the rest of the session.

Python
await session.send_client_content(
    turns=types.Content(
        role="system", parts=[types.Part(text="new system instruction")]
    ),
    turn_complete=False,
)

Enable and disable session resumption

Session resumption lets you reconnect to a previous session within 24 hours. This is achieved by storing cached data, including text, video, and audio prompts, and model outputs. Project-level privacy is enforced for this cached data.

By default, session resumption is disabled. To enable it, set the sessionResumption field of the BidiGenerateContentSetup message. If enabled, the server periodically takes a snapshot of the current cached session context and stores it in internal storage.

When a snapshot is successfully taken, a resumptionUpdate is returned with the handle ID that you can record and use later to resume the session from the snapshot.

Here's an example of enabling session resumption and retrieving the handle ID:

Python
import asyncio

from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Assumed placeholder: set this to a handle ID from a previous resumptionUpdate
# message to resume that session, or leave as None to start a new session.
previous_session_handle = None

async def main():
    print(f"Connecting to the service with handle {previous_session_handle}...")
    async with client.aio.live.connect(
        model=model,
        config=types.LiveConnectConfig(
            response_modalities=["AUDIO"],
            session_resumption=types.SessionResumptionConfig(
                # The handle of the session to resume is passed here,
                # or else None to start a new session.
                handle=previous_session_handle
            ),
        ),
    ) as session:
        while True:
            await session.send_client_content(
                turns=types.Content(
                    role="user", parts=[types.Part(text="Hello world!")]
                )
            )
            async for message in session.receive():
                # Periodically, the server will send update messages that may
                # contain a handle for the current state of the session.
                if message.session_resumption_update:
                    update = message.session_resumption_update
                    if update.resumable and update.new_handle:
                        # The handle should be retained and linked to the session.
                        return update.new_handle

                # For the purposes of this example, placeholder input is
                # continually fed to the model. In non-sample code, the model
                # inputs would come from the user.
                if message.server_content and message.server_content.turn_complete:
                    break

if __name__ == "__main__":
    asyncio.run(main())

To achieve seamless session resumption, enable transparent mode:

Python
types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    session_resumption=types.SessionResumptionConfig(
        transparent=True,
    ),
)

After transparent mode is enabled, the index of the client message that corresponds with the context snapshot is explicitly returned. This helps identify which client message you need to send again when you resume the session from the resumption handle.
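
The following is a minimal sketch of reading that index, assuming an open session as in the earlier examples; the last_consumed_client_message_index field name follows the API's session resumption update message:

Python

async for message in session.receive():
    if message.session_resumption_update:
        update = message.session_resumption_update
        if update.new_handle:
            # Retain the handle for reconnecting later.
            previous_session_handle = update.new_handle
        if update.last_consumed_client_message_index is not None:
            # Client messages after this index are not covered by the
            # snapshot and must be re-sent when resuming.
            print(f"Snapshot covers client messages up to index "
                  f"{update.last_consumed_client_message_index}")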

More information

For more information on using the Live API, see: Interactive conversations with the Live API