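Create long audio

This document walks you through synthesizing long audio. Long Audio Synthesis asynchronously synthesizes up to 1 million bytes of input. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech basics.

Before you begin

Before you can send a request to the Text-to-Speech API, you must complete the following actions. See the before you begin page for details.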

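1. Enable Text-to-Speech on a Google Cloud project.
2. Make sure billing is enabled for Text-to-Speech.
3. Make sure the following Identity and Access Management (IAM) roles are granted on the output Google Cloud bucket:
  - Storage Object Creator
  - Storage Object Viewer
4. Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command: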
```
gcloud init
```
If you use an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

Synthesize long audio from text using the command line

You can convert long text to audio by making an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint. Specify the following fields in the body of the POST request:

• voice: The type of voice to synthesize.

• input.text: The text to synthesize.

• audioConfig: The type of audio to create.

• output_gcs_uri: The Google Cloud output path, in the format 'gs://bucket_name/file_name.wav'.

• parent: The parent resource, in the format 'projects/{YOUR_PROJECT_NUMBER}/locations/{YOUR_PROJECT_LOCATION}'.

The input can contain up to 1 MB of characters; the exact limit can vary with the input.
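
A quick local size check can catch oversized input before a request is sent. The sketch below assumes the limit is counted in UTF-8 bytes and uses the 1 million byte figure quoted in this guide; the helper names are hypothetical, and the service may apply a slightly different bound.

```python
# Limit quoted in this guide. Assumption: the service counts UTF-8 bytes,
# and the exact bound it enforces may differ slightly.
MAX_INPUT_BYTES = 1_000_000


def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_long_audio_limit(text: str, limit: int = MAX_INPUT_BYTES) -> bool:
    """Return True when the UTF-8 encoding of `text` is at most `limit` bytes."""
    return utf8_size(text) <= limit


if __name__ == "__main__":
    sample = "hello"
    print(utf8_size(sample), fits_long_audio_limit(sample))
```

Note that non-ASCII text uses more than one byte per character, so a character count alone can underestimate the request size.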

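Create a Google Cloud Storage bucket under the project that you use to run the synthesis. Make sure the service account used to run the synthesis has read and write access to the output Google Cloud bucket.

To synthesize audio from text with Text-to-Speech, send a REST request from the command line. The commands in this section use gcloud auth print-access-token to retrieve an authorization token for the request.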

HTTP method and URL:

```
POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio
```

Request JSON body:

```
{
  "parent": "projects/12345/locations/global",
  "audio_config":{
      "audio_encoding":"LINEAR16"
  },
  "input":{
      "text":"hello"
  },
  "voice":{
      "language_code":"en-us",
      "name":"en-us-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}
```
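
If the request body is generated rather than written by hand, a small helper keeps the parent path and output URI consistent. This is a minimal sketch: the function name, its parameters, and its defaults are hypothetical and simply mirror the sample body shown in this guide.

```python
import json


def build_request_body(
    project_number: str,
    text: str,
    output_gcs_uri: str,
    location: str = "global",
    language_code: str = "en-us",
    voice_name: str = "en-us-Standard-A",
    audio_encoding: str = "LINEAR16",
) -> dict:
    """Build a synthesizeLongAudio request body shaped like the sample above."""
    # The guide specifies the 'gs://bucket_name/file_name.wav' format.
    if not output_gcs_uri.startswith("gs://"):
        raise ValueError("output_gcs_uri must start with 'gs://'")
    return {
        "parent": f"projects/{project_number}/locations/{location}",
        "audio_config": {"audio_encoding": audio_encoding},
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "output_gcs_uri": output_gcs_uri,
    }


if __name__ == "__main__":
    body = build_request_body("12345", "hello", "gs://bucket_name/file_name.wav")
    # Redirect this output to request.json to use it with the commands below.
    print(json.dumps(body, indent=2, ensure_ascii=False))
```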

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

```
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio"
```

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```
{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}
```

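In the JSON output of the REST command, the name field contains the name of the long-running operation. Send a REST request from the command line to query the state of that long-running operation.

Make sure the service account that runs the GET operation belongs to the same project as the one used for synthesis.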
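
The sample responses in this guide show the operation name both as a bare ID ("23456") and as a full resource name ("projects/12345/locations/global/operations/23456"). The following sketch builds the GET URL from either form; the helper name and the API_ROOT constant are hypothetical conveniences, not part of any client library.

```python
API_ROOT = "https://texttospeech.googleapis.com/v1beta1"


def operation_url(name: str, project_number: str, location: str = "global") -> str:
    """Return the GET URL for a long-running operation.

    `name` may be a bare operation ID such as "23456" or a full resource
    name such as "projects/12345/locations/global/operations/23456".
    """
    if name.startswith("projects/"):
        # A full resource name already contains the project and location.
        return f"{API_ROOT}/{name}"
    return (
        f"{API_ROOT}/projects/{project_number}"
        f"/locations/{location}/operations/{name}"
    )


if __name__ == "__main__":
    print(operation_url("23456", "12345"))
```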

HTTP method and URL:
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456
```
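To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: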
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456"
```
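PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: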
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
```
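
Because synthesis is asynchronous, the GET request usually has to be repeated until done is true. The sketch below shows one possible polling loop. The fetch callable is a hypothetical stand-in for whatever issues the GET request and returns the parsed JSON, and the interval and attempt count are arbitrary assumptions rather than recommended values.

```python
import time
from typing import Callable, Dict


def wait_until_done(
    fetch: Callable[[], Dict],
    interval_seconds: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `fetch` until the returned operation reports "done": true.

    Raises TimeoutError if the operation is still running after
    `max_attempts` calls.
    """
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            return operation
        # Do not sleep after the final attempt; fail right away instead.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation not done after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.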
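To list all operations running under a given project, execute the following REST request.

Make sure the service account that runs the LIST operation belongs to the same project as the one used for synthesis.

HTTP method and URL: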
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations
```
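To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: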
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations"
```
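PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: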
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
```
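
The nextPageToken field suggests that long listings are paginated. The following sketch walks every page, assuming the usual Google API convention of passing the previous nextPageToken back as a page token on the next request. The fetch_page callable is hypothetical and stands in for the code that performs the GET request and parses the JSON.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_operations(
    fetch_page: Callable[[Optional[str]], Dict],
) -> Iterator[Dict]:
    """Yield every operation across all pages of a list response.

    `fetch_page(token)` must return one parsed list response; `token` is
    None for the first page and the previous "nextPageToken" afterwards.
    """
    page_token: Optional[str] = None
    while True:
        page = fetch_page(page_token)
        yield from page.get("operations", [])
        page_token = page.get("nextPageToken")
        # An empty or missing token marks the last page.
        if not page_token:
            return
```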
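When the long-running operation completes, find the output audio file at the bucket URI given in the output_gcs_uri field. If the operation did not complete successfully, query it with the GET REST command to find the error, correct it, and issue the RPC again.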
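
The check described above can be sketched as a small function over the parsed GET response. It assumes that a failed operation carries an error object with code and message fields, as is conventional for Google long-running operations; this guide does not show a failed response, so treat the field names as an assumption. The function name is hypothetical.

```python
def operation_outcome(operation: dict) -> str:
    """Summarize a parsed operation response as a short status string."""
    if not operation.get("done"):
        progress = operation.get("metadata", {}).get("progressPercentage", 0)
        return f"running ({progress}%)"
    # Assumption: failures are reported in an "error" object.
    error = operation.get("error")
    if error:
        return f"failed (code {error.get('code')}): {error.get('message')}"
    return "succeeded"


if __name__ == "__main__":
    print(operation_outcome({"done": True, "metadata": {"progressPercentage": 100}}))
```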

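Synthesize long audio from text using client libraries

To synthesize long audio, follow these steps.

Install the client library

Python

Before installing the library, make sure that you've prepared your environment for Python development.

```
pip install --upgrade google-cloud-texttospeech
```

Create audio data

You can use Text-to-Speech to create long audio files of synthetic human speech. Use the following code to create a long audio file in a Google Cloud bucket.

Python

Before running the example, make sure that you've prepared your environment for Python development.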

```
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.cloud import texttospeech


def synthesize_long_audio(project_id: str, output_gcs_uri: str) -> None:
    """
    Synthesizes long input, writing the resulting audio to `output_gcs_uri`.

    Args:
        project_id: ID or number of the Google Cloud project you want to use.
        output_gcs_uri: Specifies a Cloud Storage URI for the synthesis results.
            Must be specified in the format:
            ``gs://bucket_name/object_name``, and the bucket must
            already exist.
    """

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    # Named `synthesis_input` to avoid shadowing the built-in `input`.
    synthesis_input = texttospeech.SynthesisInput(
        text="Test input. Replace this with any text you want to synthesize, up to 1 million bytes long!"
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Standard-A"
    )

    parent = f"projects/{project_id}/locations/us-central1"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=synthesis_input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    operation = client.synthesize_long_audio(request=request)
    # Set a deadline for your LRO to finish. 300 seconds is reasonable, but can be adjusted depending on the length of the input.
    # If the operation times out, that likely means there was an error. In that case, inspect the error, and try again.
    result = operation.result(timeout=300)
    print(
        "\nFinished processing, check your GCS bucket to find your audio file! Printing what should be an empty result: ",
        result,
    )
```
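
After the audio file has been copied from the bucket to a local disk, a quick sanity check can confirm that it is playable audio of the expected length. The sketch below uses only the Python standard library and assumes the LINEAR16 output is a WAV file with a header, as the .wav name in this guide suggests. The function name and the local path are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a local WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py path/to/downloaded_file.wav
    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

A duration of zero or a parsing error would be a hint to re-check the operation status before retrying the synthesis.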

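Clean up

To avoid incurring unnecessary Google Cloud charges, use the Google Cloud console to delete your project if you no longer need it.

What's next

Learn more about Cloud Text-to-Speech by reading the basics.
Review the list of voices available for synthetic speech.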