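Before you begin
To access the Custom Voice feature, you must provide your Google Cloud project ID to Google. Make sure that billing is enabled for this project, that the Text-to-Speech API and the AutoML API are enabled, and that the Google Cloud CLI is installed and initialized.
Grant the AutoML Predictor role to your Google account on the project. For details, see Granting a single role.
Using the command line
HTTP method and URL: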
POST https://texttospeech.googleapis.com/v1beta1/text:synthesize
Request JSON body:
{
  "input": {
    "text": "Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
  },
  "voice": {
    "custom_voice": {
      "reportedUsage": "REALTIME",
      "model": "projects/{project_id}/locations/us-central1/models/{model_id}"
    }
  },
  "audioConfig": {
    "audioEncoding": "LINEAR16"
  }
}
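If you would rather generate the request body than edit it by hand, the JSON above can be assembled with a few lines of Python. This is a minimal sketch: the helper names build_request and write_request are hypothetical, and the field names and the us-central1 location are copied from the request body shown above.

```python
import json


def build_request(project_id: str, model_id: str, text: str) -> dict:
    """Return a synthesize request body that targets a custom voice model.

    The structure mirrors the JSON request body shown above; project_id and
    model_id fill the placeholders in the model resource name.
    """
    return {
        "input": {"text": text},
        "voice": {
            "custom_voice": {
                "reportedUsage": "REALTIME",
                "model": (
                    f"projects/{project_id}/locations/us-central1"
                    f"/models/{model_id}"
                ),
            }
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def write_request(path: str, body: dict) -> None:
    """Serialize the request body to a JSON file (for example request.json)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)


# Example usage (placeholder IDs, replace with your own values):
# write_request("request.json", build_request("my-project", "my-model", "Hello"))
```

Serializing with json.dump avoids hand-editing mistakes such as a trailing comma, which would make the request body invalid JSON.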
Save the request body in a file named request.json and run the following command, replacing PROJECT_ID with your project ID.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  https://texttospeech.googleapis.com/v1beta1/text:synthesize
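The REST endpoint returns JSON rather than a playable file: the synthesized audio arrives as a base64-encoded string in the audioContent field. The sketch below decodes it. It assumes you redirected the curl output to a file, here called response.json (a hypothetical name), and the helper extract_audio is illustrative, not part of any Google library.

```python
import base64
import json


def extract_audio(response_text: str) -> bytes:
    """Decode the base64 audioContent field of a synthesize response.

    Raises ValueError when the field is missing, which usually means the
    API returned an error object instead of audio.
    """
    payload = json.loads(response_text)
    if "audioContent" not in payload:
        raise ValueError(f"no audioContent in response: {payload}")
    return base64.b64decode(payload["audioContent"])


# Example usage, assuming the curl output was saved to response.json:
# with open("response.json", encoding="utf-8") as f:
#     audio = extract_audio(f.read())
# with open("output.wav", "wb") as out:
#     out.write(audio)
```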
Using the Python client library
Download the client library and run the following commands.
pip install texttospeech-custom-voice-beta-v1beta1-py.tar.gz
pip install protobuf --upgrade pip
Sample Python code
"""Synthesize custom voice from the input string of text or ssml.
"""
from google.cloud import texttospeech_v1beta1

# Instantiate a client
client = texttospeech_v1beta1.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech_v1beta1.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and specify
# the custom voice model.
custom_voice = texttospeech_v1beta1.types.CustomVoiceParams(
    reported_usage=texttospeech_v1beta1.enums.CustomVoiceParams.ReportedUsage.REALTIME,
    model='projects/{project_id}/locations/us-central1/models/{model_id}')
voice = texttospeech_v1beta1.types.VoiceSelectionParams(
    language_code='en-US',
    custom_voice=custom_voice)

# Select the type of audio file you want returned
audio_config = texttospeech_v1beta1.types.AudioConfig(
    audio_encoding=texttospeech_v1beta1.enums.AudioEncoding.LINEAR16)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.wav', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')
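To confirm that the written file is usable audio, you can inspect it with Python's standard wave module. This is a sketch under the assumption that the LINEAR16 output carries a standard WAV header; describe_wav is a hypothetical helper, not part of the client library.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }


# Example usage after running the sample above:
# print(describe_wav("output.wav"))
```

A sample width of 2 bytes corresponds to the 16-bit samples that LINEAR16 denotes; a zero duration suggests the synthesis request did not return audio.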