Generate dialogue with multiple speakers

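This page describes how to create dialogue with multiple speakers generated by Text-to-Speech.

You can generate audio that contains multiple speakers to create dialogue. This can be useful for interviews, interactive storytelling, video games, e-learning platforms, and accessibility solutions.

The following voice is supported for audio with multiple speakers: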

en-US-Studio-MultiSpeaker
- Speaker: R
- Speaker: S
- Speaker: T
- Speaker: U
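
Because each turn must name one of the speaker labels listed above, it can help to check a script before sending it. The following is a minimal sketch, not part of the Text-to-Speech library: `SUPPORTED_SPEAKERS` and `validate_turns` are hypothetical names, and the empty-text check is a convenience assumption rather than a documented API rule.

```python
from typing import Iterable, List, Tuple

# Speaker labels listed for the en-US-Studio-MultiSpeaker voice.
SUPPORTED_SPEAKERS = frozenset({"R", "S", "T", "U"})


def validate_turns(turns: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs as a list, or raise ValueError.

    Hypothetical helper: rejects unknown speaker labels and blank text.
    """
    checked = []
    for index, (speaker, text) in enumerate(turns):
        if speaker not in SUPPORTED_SPEAKERS:
            raise ValueError(f"turn {index}: unsupported speaker {speaker!r}")
        if not text.strip():
            raise ValueError(f"turn {index}: text is empty")
        checked.append((speaker, text))
    return checked
```

For example, `validate_turns([("R", "Hi"), ("S", "Hello")])` returns the same two pairs, while a turn labeled `"X"` raises `ValueError` before any request is made.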

Example. This sample is audio generated using multiple speakers.

Example of how to use multi-speaker markup

The following example shows how to use multi-speaker markup.

Python

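To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.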

"""Synthesizes speech for multiple speakers.
Make sure to be working in a virtual environment.
"""
from google.cloud import texttospeech_v1beta1 as texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

multi_speaker_markup = texttospeech.MultiSpeakerMarkup(
    turns=[
        texttospeech.MultiSpeakerMarkup.Turn(
            text="I've heard that the Google Cloud multi-speaker audio generation sounds amazing!",
            speaker="R",
        ),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Oh? What's so good about it?", speaker="S"
        ),
        texttospeech.MultiSpeakerMarkup.Turn(text="Well..", speaker="R"),
        texttospeech.MultiSpeakerMarkup.Turn(text="Well what?", speaker="S"),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Well, you should find it out by yourself!", speaker="R"
        ),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Alright alright, let's try it out!", speaker="S"
        ),
    ]
)

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    multi_speaker_markup=multi_speaker_markup
)

# Build the voice request, select the language code ('en-US') and the voice
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", name="en-US-Studio-MultiSpeaker"
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
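
For longer dialogues, writing each `Turn` by hand becomes tedious. As a sketch under stated assumptions, the turns could be built from a plain-text script with one `SPEAKER: text` line per turn. The script format and the helper names `parse_script` and `to_markup` are hypothetical; only `MultiSpeakerMarkup` and `MultiSpeakerMarkup.Turn` come from the sample above.

```python
from typing import List, Tuple


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Parse lines such as 'R: Hello' into (speaker, text) pairs.

    Blank lines are skipped. Only the first colon separates the speaker
    label from the text, so the text itself may contain colons.
    """
    turns = []
    for line_number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue
        speaker, separator, text = line.partition(":")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'SPEAKER: text'")
        turns.append((speaker.strip(), text.strip()))
    return turns


def to_markup(turns: List[Tuple[str, str]]):
    """Build a MultiSpeakerMarkup from (speaker, text) pairs."""
    # Imported here so parse_script can be used without the client library.
    from google.cloud import texttospeech_v1beta1 as texttospeech

    return texttospeech.MultiSpeakerMarkup(
        turns=[
            texttospeech.MultiSpeakerMarkup.Turn(text=text, speaker=speaker)
            for speaker, text in turns
        ]
    )
```

The result of `to_markup(parse_script(script))` can then be passed as `multi_speaker_markup` to `texttospeech.SynthesisInput`, in place of the hand-written markup in the sample.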