이 페이지는 Cloud Translation API를 통해 번역되었습니다.

Multimodal Live API

Multimodal Live API를 사용하면 Gemini와의 양방향 음성 및 동영상 상호작용으로 지연 시간이 짧습니다. Multimodal Live API를 사용하면 최종 사용자에게 자연스럽고 인간과 같은 음성 대화 환경을 제공하고 음성 명령을 사용하여 모델의 응답을 중단할 수 있는 기능을 제공할 수 있습니다. 이 모델은 텍스트, 오디오, 동영상 입력을 처리하고 텍스트 및 오디오 출력을 제공할 수 있습니다.

Multimodal Live API는 Gemini API에서 BidiGenerateContent 메서드로 사용할 수 있으며 WebSockets를 기반으로 합니다.

자세한 내용은 다중 모드 실시간 API 참조 가이드를 참고하세요.

멀티모달 실시간 API를 시작하는 데 도움이 되는 텍스트 대 텍스트 예시는 다음을 참고하세요.

Gen AI SDK for Python

설치

pip install --upgrade google-genai

자세한 내용은 SDK 참고 문서를 참조하세요.

Vertex AI에서 Gen AI SDK를 사용하도록 환경 변수를 설정합니다.

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import LiveConnectConfig, HttpOptions, Modality

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))
model_id = "gemini-2.0-flash-exp"

async with client.aio.live.connect(
    model=model_id,
    config=LiveConnectConfig(response_modalities=[Modality.TEXT]),
) as session:
    text_input = "Hello? Gemini, are you there?"
    print("> ", text_input, "\n")
    await session.send(input=text_input, end_of_turn=True)

    response = []

    async for message in session.receive():
        if message.text:
            response.append(message.text)

    print("".join(response))
# Example output:
# >  Hello? Gemini, are you there?
# Yes, I'm here. What would you like to talk about?

기능:

오디오 출력과 함께 오디오 입력
오디오 출력과 함께 오디오 및 동영상 입력
선택 가능한 음성(Multimodal Live API 음성 참고)
세션 시간은 오디오의 경우 최대 15분, 오디오 및 동영상의 경우 최대 2분입니다.

멀티모달 Live API의 추가 기능에 관한 자세한 내용은 멀티모달 Live API 기능을 참고하세요.

언어:

영어로만 제공

제한사항:

멀티모달 Live API 제한사항을 참고하세요.

Multimodal Live API 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.

Gen AI SDK for Python

설치

Multimodal Live API