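Speaking addresses with SSML

This tutorial shows how to use Speech Synthesis Markup Language (SSML) to speak a text file of addresses. By marking up a string of text with SSML tags, you can customize the synthetic audio that Text-to-Speech produces.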

Plain text	SSML rendering of plain text
123 Street Ln	<speak>123 Street Ln</speak>
1 Number St	<speak>1 Number St</speak>
1 Piazza del Fibonacci	<speak>1 Piazza del Fibonacci</speak>
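As a rough illustration of the table above, the simplest SSML rendering wraps the plain text in a speak element. The helper name to_ssml is hypothetical and is not part of the tutorial's sample code; escaping with html.escape is included on the assumption that the input might contain characters such as & or <.

```python
import html


def to_ssml(plain_text):
    # Escape characters that would otherwise be read as markup,
    # then wrap the result in the SSML root element.
    return "<speak>" + html.escape(plain_text) + "</speak>"


for address in ["123 Street Ln", "1 Number St", "1 Piazza del Fibonacci"]:
    print(address, "->", to_ssml(address))
```
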

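Objectives

Send a synthetic speech request to Text-to-Speech by using SSML and the Text-to-Speech client libraries.

Costs

For cost information, see the Text-to-Speech pricing page.

Before you begin

Make sure that you have a Text-to-Speech project in the Google Cloud console.
This tutorial supports Java, Node.js, and Python. To use Java, download and install Maven. To use Node.js, download npm.

Download the code samples

To download the code samples, clone the Google Cloud GitHub samples for the programming language that you want to use.

Java

This tutorial uses code in the texttospeech/cloud-client/src/main/java/com/example/texttospeech/ directory of the Google Cloud Platform Java samples repository.

To download the code for this tutorial and navigate to it, run the following commands from the terminal.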

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
cd java-docs-samples/texttospeech/cloud-client/src/main/java/com/example/texttospeech/

Node.js

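This tutorial uses code in the texttospeech directory of the Google Cloud Platform Node.js samples repository.

To download the code for this tutorial and navigate to it, run the following commands from the terminal.

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
cd nodejs-docs-samples/texttospeech/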

Python

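This tutorial uses code in the texttospeech/snippets directory of the Google Cloud Platform Python samples repository.

To download the code for this tutorial and navigate to it, run the following commands from the terminal.

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd python-docs-samples/texttospeech/snippets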

Install the client library

This tutorial uses the Text-to-Speech client library.

Java

This tutorial uses the following dependencies.

<!--  Using libraries-bom to manage versions.
See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.32.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-texttospeech</artifactId>
  </dependency>
</dependencies>

Node.js

Run the following command from the terminal.

npm install @google-cloud/text-to-speech

Python

Run the following command from the terminal.

pip install --upgrade google-cloud-texttospeech

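Set up your Google Cloud Platform credentials

Provide credentials to your application code by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. The variable applies only to the current shell session. To apply it to future shell sessions, set it in your shell startup file, for example ~/.bashrc or ~/.profile.

Linux or macOS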

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your credentials.

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Windows

PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your credentials.

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

Command Prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Replace KEY_PATH with the path of the JSON file that contains your credentials.
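Before you call the API, it can help to confirm that the variable is visible to your process. The following is a minimal sketch that uses only the Python standard library. The function name credentials_status and its return values are hypothetical, and the check looks only at the environment variable, so it says nothing about credentials that were configured in some other way.

```python
import os
from pathlib import Path


def credentials_status(env=None):
    # Use the real process environment unless a mapping is passed in,
    # which keeps the function easy to exercise in isolation.
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "unset"
    if not Path(key_path).is_file():
        return "missing-file"
    return "ok"


if __name__ == "__main__":
    print("GOOGLE_APPLICATION_CREDENTIALS:", credentials_status())
```

A result of "unset" in a new terminal usually means that the variable was exported in a different shell session.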

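Import libraries

This tutorial uses the following system and client libraries.

Java

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Java API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.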

// Imports the Google Cloud client library
import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.common.html.HtmlEscapers;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

Node.js

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Node.js API reference documentation.

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');

// Import other required libraries
const fs = require('fs');
//const escape = require('escape-html');
const util = require('util');

Python

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.

import html

from google.cloud import texttospeech

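Use the Text-to-Speech API

The following function takes a string of SSML-tagged text and the name of an MP3 file. It uses the SSML-tagged text to generate synthetic audio, and then saves the audio under the MP3 file name passed as a parameter.

The entire SSML input can be read by only a single voice. You can set the voice in the VoiceSelectionParams object.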

Java

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * <p>Given a string of SSML text and an output file name, this function calls the Text-to-Speech
 * API. The API returns a synthetic audio version of the text, formatted according to the SSML
 * commands. This function saves the synthetic audio to the designated output file.
 *
 * @param ssmlText String of tagged SSML text
 * @param outFile String name of file under which to save audio output
 * @throws Exception on errors while closing the client
 */
public static void ssmlToAudio(String ssmlText, String outFile) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the ssml text input to synthesize
    SynthesisInput input = SynthesisInput.newBuilder().setSsml(ssmlText).build();

    // Build the voice request, select the language code ("en-US") and
    // the ssml voice gender ("male")
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US")
            .setSsmlGender(SsmlVoiceGender.MALE)
            .build();

    // Select the audio file type
    AudioConfig audioConfig =
        AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();

    // Perform the text-to-speech request on the text input with the selected voice parameters and
    // audio file type
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file
    try (OutputStream out = new FileOutputStream(outFile)) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file " + outFile);
    }
  }
}

Node.js

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * Given a string of SSML text and an output file name, this function
 * calls the Text-to-Speech API. The API returns a synthetic audio
 * version of the text, formatted according to the SSML commands. This
 * function saves the synthetic audio to the designated output file.
 *
 * ARGS
 * ssmlText: String of tagged SSML text
 * outFile: String name of file under which to save audio output
 * RETURNS
 * nothing
 *
 */
async function ssmlToAudio(ssmlText, outFile) {
  // Creates a client
  const client = new textToSpeech.TextToSpeechClient();

  // Constructs the request
  const request = {
    // Select the text to synthesize
    input: {ssml: ssmlText},
    // Select the language and SSML Voice Gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'MALE'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the Text-to-Speech request
  const [response] = await client.synthesizeSpeech(request);
  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outFile, response.audioContent, 'binary');
  console.log('Audio content written to file ' + outFile);
}

Python

def ssml_to_audio(ssml_text, outfile):
    # Generates synthetic audio from a string of SSML text.
    #
    # Given a string of SSML text and an output file name, this function
    # calls the Text-to-Speech API. The API returns a synthetic audio
    # version of the text, formatted according to the SSML commands. This
    # function saves the synthetic audio to the designated output file.
    #
    # Args:
    # ssml_text: string of SSML text
    # outfile: string name of file under which to save audio output
    #
    # Returns:
    # nothing

    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)
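All three versions of the function send the same three pieces of information: the SSML input, one voice selection, and the audio encoding. The sketch below mirrors the request object from the Node.js sample as a plain Python dictionary so that the shape of the request is easy to inspect without calling the API. The helper name build_synthesize_request and its default arguments are assumptions made for illustration and are not part of the client library.

```python
import json


def build_synthesize_request(ssml_text, language_code="en-US",
                             ssml_gender="MALE", audio_encoding="MP3"):
    # Reject empty input early; the tutorial notes that the API returns
    # an InvalidArgument error when no text or SSML is supplied.
    if not ssml_text:
        raise ValueError("ssml_text must be a non-empty SSML string")
    return {
        # The whole SSML document is read by the single voice below.
        "input": {"ssml": ssml_text},
        "voice": {"languageCode": language_code, "ssmlGender": ssml_gender},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


if __name__ == "__main__":
    request = build_synthesize_request("<speak>123 Street Ln</speak>")
    print(json.dumps(request, indent=2))
```

Because the voice is a single field of the request, changing voices partway through the text would require a separate request for each voice.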

Customize the synthetic audio

The following function takes the name of a text file and converts the contents of the file into a string of SSML-tagged text.

Java

/**
 * Generates SSML text from plaintext.
 *
 * <p>Given an input filename, this function converts the contents of the input text file into a
 * String of tagged SSML text. This function formats the SSML String so that, when synthesized,
 * the synthetic audio will pause for two seconds between each line of the text file. This
 * function also handles special text characters which might interfere with SSML commands.
 *
 * @param inputFile String name of plaintext file
 * @return a String of SSML text based on plaintext input.
 * @throws IOException on files that don't exist
 */
public static String textToSsml(String inputFile) throws Exception {

  // Read lines of input file
  String rawLines = new String(Files.readAllBytes(Paths.get(inputFile)));

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  String escapedLines = HtmlEscapers.htmlEscaper().escape(rawLines);

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  String expandedNewline = escapedLines.replaceAll("\\n", "\n<break time='2s'/>");
  String ssml = "<speak>" + expandedNewline + "</speak>";

  // Return the concatenated String of SSML
  return ssml;
}

Node.js

/**
 * Generates SSML text from plaintext.
 *
 * Given an input filename, this function converts the contents of the input text file
 * into a String of tagged SSML text. This function formats the SSML String so that,
 * when synthesized, the synthetic audio will pause for two seconds between each line
 * of the text file. This function also handles special text characters which might
 * interfere with SSML commands.
 *
 * ARGS
 * inputFile: String name of plaintext file
 * RETURNS
 * a String of SSML text based on plaintext input
 *
 */
function textToSsml(inputFile) {
  let rawLines = '';
  // Read input file
  try {
    rawLines = fs.readFileSync(inputFile, 'utf8');
  } catch (e) {
    console.log('Error:', e.stack);
    return;
  }

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  let escapedLines = rawLines;
  escapedLines = escapedLines.replace(/&/g, '&amp;');
  escapedLines = escapedLines.replace(/"/g, '&quot;');
  escapedLines = escapedLines.replace(/</g, '&lt;');
  escapedLines = escapedLines.replace(/>/g, '&gt;');

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  const expandedNewline = escapedLines.replace(/\n/g, '\n<break time="2s"/>');
  const ssml = '<speak>' + expandedNewline + '</speak>';

  // Return the concatenated String of SSML
  return ssml;
}

Python

def text_to_ssml(inputfile):
    # Generates SSML text from plaintext.
    # Given an input filename, this function converts the contents of the text
    # file into a string of formatted SSML text. This function formats the SSML
    # string so that, when synthesized, the synthetic audio will pause for two
    # seconds between each line of the text file. This function also handles
    # special text characters which might interfere with SSML commands.
    #
    # Args:
    # inputfile: string name of plaintext file
    #
    # Returns:
    # A string of SSML text based on plaintext input

    # Parses lines of input file
    with open(inputfile) as f:
        raw_lines = f.read()

    # Replace special characters with HTML Ampersand Character Codes
    # These Codes prevent the API from confusing text with
    # SSML commands
    # For example, '<' --> '&lt;' and '&' --> '&amp;'

    escaped_lines = html.escape(raw_lines)

    # Convert plaintext to SSML
    # Wait two seconds between each address
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the concatenated string of ssml script
    return ssml
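The order of the two steps in these functions matters: the text is escaped first, and the break tags are added afterward. The following sketch contrasts the two orders on an in-memory string. Both function names are hypothetical, and the second function is deliberately wrong to show what happens when the tags are added before escaping.

```python
import html

BREAK_TAG = '<break time="2s"/>'


def escape_then_tag(text):
    # Same order as the tutorial: escape the plain text, then add SSML tags.
    escaped = html.escape(text)
    return "<speak>" + escaped.replace("\n", "\n" + BREAK_TAG) + "</speak>"


def tag_then_escape(text):
    # Wrong order, for comparison only: the break tags get escaped too,
    # so they would be treated as literal text rather than as markup.
    tagged = text.replace("\n", "\n" + BREAK_TAG)
    return "<speak>" + html.escape(tagged) + "</speak>"


sample = "1 Jenny St & Number St\n1 Piazza del Fibonacci"
print(escape_then_tag(sample))
print(tag_then_escape(sample))
```
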

Put it all together

This program uses the following input.

123 Street Ln, Small Town, IL 12345 USA
1 Jenny St & Number St, Tutone City, CA 86753
1 Piazza del Fibonacci, 12358 Pisa, Italy

Passing the text above to text_to_ssml() generates the following tagged text.

<speak>123 Street Ln, Small Town, IL 12345 USA
<break time="2s"/>1 Jenny St &amp; Number St, Tutone City, CA 86753
<break time="2s"/>1 Piazza del Fibonacci, 12358 Pisa, Italy
<break time="2s"/></speak>
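One way to sanity-check tagged text like this before sending it is to parse it as XML and count the break elements. This is a sketch under stated assumptions: addresses_to_ssml is a string-based stand-in for text_to_ssml(), count_breaks is a hypothetical helper, and passing an XML parser shows only that the markup is well-formed, not that the service accepts every tag in it.

```python
import html
import xml.etree.ElementTree as ET

# Same three addresses as the tutorial input, each ending with a newline.
ADDRESSES = (
    "123 Street Ln, Small Town, IL 12345 USA\n"
    "1 Jenny St & Number St, Tutone City, CA 86753\n"
    "1 Piazza del Fibonacci, 12358 Pisa, Italy\n"
)


def addresses_to_ssml(raw_text):
    # Escape first, then insert a two-second break after every newline.
    escaped = html.escape(raw_text)
    return "<speak>{}</speak>".format(
        escaped.replace("\n", '\n<break time="2s"/>')
    )


def count_breaks(ssml):
    # ET.fromstring raises ParseError if the markup is not well-formed,
    # for example when an ampersand was left unescaped.
    root = ET.fromstring(ssml)
    if root.tag != "speak":
        raise ValueError("root element must be <speak>")
    return len(root.findall("break"))


ssml = addresses_to_ssml(ADDRESSES)
print(ssml)
print("break tags:", count_breaks(ssml))
```

The count is three rather than two because the input ends with a newline, which is why the tagged text above has a final break just before the closing speak tag.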

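Run the code

To generate an audio file of synthetic speech, run the following code from the command line.

Java

Linux or macOS

From the java-docs-samples/texttospeech/cloud-client/ directory, run the following command from the command line.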

$ mvn clean package

Windows

From the java-docs-samples/texttospeech/cloud-client/ directory, run the following command from the command line.

$ mvn clean package

Node.js

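Linux or macOS

In the hybridGlossaries.js file, uncomment the variables that are commented out under TODO (developer).

In the following command, replace projectId with your Google Cloud project ID. From the nodejs-docs-samples/texttospeech directory, run the following command from the command line.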

$ node ssmlAddresses.js projectId

Windows

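In the hybridGlossaries.js file, uncomment the variables that are commented out under TODO (developer).

In the following command, replace projectId with your Google Cloud project ID. From the nodejs-docs-samples/texttospeech directory, run the following command from the command line.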

$env: C:/Node.js/node.exe C: ssmlAddresses.js projectId

Python

Linux or macOS

From the python-docs-samples/texttospeech/snippets directory, run the following command from the command line.

$ python ssml_addresses.py

Windows

From the python-docs-samples/texttospeech/snippets directory, run the following command from the command line.

$env: C:/Python3/python.exe C: ssml_addresses.py

Check the output

This program outputs example.mp3, an audio file of synthetic speech.

Java

Go to the java-docs-samples/texttospeech/cloud-client/resources/ directory.

Look for the example.mp3 file in the resources directory.

Node.js

Go to the nodejs-docs-samples/texttospeech/resources/ directory.

Look for the example.mp3 file in the resources directory.

Python

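Go to the python-docs-samples/texttospeech/snippets/resources directory.

Look for the example.mp3 file in the resources directory.

Listen to the following audio clip to check that your example.mp3 file sounds the same.

Troubleshooting

If you forget to set the GOOGLE_APPLICATION_CREDENTIALS environment variable from the command line, the following error message appears: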
```
The Application Default Credentials are not available.
```
If you pass the name of a file that doesn't exist to text_to_ssml(), the following error message appears:
```
IOError: [Errno 2] No such file or directory
```
If you pass ssml_to_audio() an ssml_text parameter that contains None, the following error message appears:
```
InvalidArgument: 400 Invalid input type. Type has to be text or SSML
```
Make sure that you're running the code from the correct directory.
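The file and None errors above can be caught locally with clearer messages before any request is sent. The following is a minimal sketch. The helper names load_addresses and require_ssml are hypothetical, and the code relies only on the Python standard library (in Python 3, a missing file raises FileNotFoundError, a subclass of OSError).

```python
from pathlib import Path


def load_addresses(input_path):
    # Report the resolved path and working directory, which makes a
    # "wrong directory" mistake easier to spot.
    path = Path(input_path)
    if not path.is_file():
        raise FileNotFoundError(
            "Input file not found: {} (current directory: {})".format(
                path.resolve(), Path.cwd()
            )
        )
    return path.read_text(encoding="utf-8")


def require_ssml(ssml_text):
    # Guard against None or blank input before building a request.
    if not isinstance(ssml_text, str) or not ssml_text.strip():
        raise ValueError("ssml_text must be a non-empty string of SSML")
    return ssml_text
```
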

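What's next

Explore other SSML tags
Learn how to use SSML with Translation and Vision

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial, use the Google Cloud console to delete your project if you don't need it.

Delete the project

In the Google Cloud console, go to the Projects page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.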