Chirp 3 스크립트 작성: 다국어 지원 정확성 향상

Chirp 3는 피드백과 경험을 기반으로 사용자 요구사항을 충족하도록 설계된 최신 세대의 Google 다국어 자동 음성 인식(ASR) 전용 생성 모델입니다. Chirp 3는 이전 Chirp 모델보다 정확성과 속도가 향상되었으며 분할 및 자동 언어 감지를 제공합니다.

모델 세부정보

Chirp 3: Transcription은 Speech-to-Text API V2에서만 사용할 수 있습니다.

모델 식별자

Google Cloud 콘솔에서 API 또는 모델 이름을 사용할 때 인식 요청에 적절한 모델 식별자를 지정하여 Chirp 3: Transcription을 다른 모델과 동일하게 사용할 수 있습니다. 인식에서 적절한 식별자를 지정합니다.

모델	모델 식별자
Chirp 3	`chirp_3`

API 메서드

모든 인식 방법이 동일한 언어 세트를 지원하지 않습니다. Chirp 3는 Speech-to-Text API V2에서 사용할 수 있으므로 다음 인식 방법을 지원합니다.

API 버전	API 메서드	지원
V2	Speech.StreamingRecognize (스트리밍 및 실시간 오디오에 적합)	지원됨
V2	Speech.Recognize(1분 미만의 오디오에 적합)	지원됨
V2	Speech.BatchRecognize(1분~1시간의 긴 오디오에 적합)	지원됨

사용 가능한 리전

Chirp 3는 다음 Google Cloud 리전에서 사용할 수 있으며 더 많은 리전이 계획되어 있습니다.

Google Cloud 영역	출시 준비
`us(multi-region)`	GA
`eu(multi-region)`	GA
`asia-southeast1`	GA
`asia-northeast1`	GA

여기에 설명된 대로 Location API를 사용하여 각 스크립트 작성 모델에 대해 지원되는 Google Cloud 리전, 언어, 기능의 최신 목록을 확인할 수 있습니다.

스크립트 작성 지원 언어

Chirp 3는 다음 언어로 StreamingRecognize, Recognize, BatchRecognize의 스크립트 작성을 지원합니다.

언어	`BCP-47 Code`	출시 준비
카탈로니아어(스페인)	`ca-ES`	GA
중국어(간체, 중국)	`cmn-Hans-CN`	GA
크로아티아어(크로아티아)	`hr-HR`	GA
덴마크어(덴마크)	`da-DK`	GA
네덜란드어(네덜란드)	`nl-NL`	GA
영어(호주)	`en-AU`	GA
영어(영국)	`en-GB`	GA
영어(인도)	`en-IN`	GA
영어(미국)	`en-US`	GA
핀란드어(핀란드)	`fi-FI`	GA
프랑스어(캐나다)	`fr-CA`	GA
프랑스어(프랑스)	`fr-FR`	GA
독일어(독일)	`de-DE`	GA
그리스어(그리스)	`el-GR`	GA
힌디어(인도)	`hi-IN`	GA
이탈리아어(이탈리아)	`it-IT`	GA
일본어(일본)	`ja-JP`	GA
한국어(대한민국)	`ko-KR`	GA
폴란드어(폴란드)	`pl-PL`	GA
포르투갈어(브라질)	`pt-BR`	GA
포르투갈어(포르투갈)	`pt-PT`	GA
루마니아어(루마니아)	`ro-RO`	GA
러시아어(러시아)	`ru-RU`	GA
스페인어(스페인)	`es-ES`	GA
스페인어(미국)	`es-US`	GA
스웨덴어(스웨덴)	`sv-SE`	GA
터키어(터키)	`tr-TR`	GA
우크라이나어(우크라이나)	`uk-UA`	GA
베트남어(베트남)	`vi-VN`	GA
아랍어	`ar-XA`	프리뷰
아랍어(알제리)	`ar-DZ`	프리뷰
아랍어(바레인)	`ar-BH`	프리뷰
아랍어(이집트)	`ar-EG`	프리뷰
아랍어(이스라엘)	`ar-IL`	프리뷰
아랍어(요르단)	`ar-JO`	프리뷰
아랍어(쿠웨이트)	`ar-KW`	프리뷰
아랍어(레바논)	`ar-LB`	프리뷰
아랍어(모리타니)	`ar-MR`	프리뷰
아랍어(모로코)	`ar-MA`	프리뷰
아랍어(오만)	`ar-OM`	프리뷰
아랍어(카타르)	`ar-QA`	프리뷰
아랍어(사우디아라비아)	`ar-SA`	프리뷰
아랍어(팔레스타인)	`ar-PS`	프리뷰
아랍어(시리아)	`ar-SY`	프리뷰
아랍어(튀니지)	`ar-TN`	프리뷰
아랍어(아랍에미리트)	`ar-AE`	프리뷰
아랍어(예멘)	`ar-YE`	프리뷰
아르메니아어(아르메니아)	`hy-AM`	프리뷰
벵골어(방글라데시)	`bn-BD`	프리뷰
벵골어(인도)	`bn-IN`	프리뷰
불가리아어(불가리아)	`bg-BG`	프리뷰
버마어(미얀마)	`my-MM`	프리뷰
소라니 쿠르드어(이라크)	`ar-IQ`	프리뷰
중국어, 광둥어(번체, 홍콩)	`yue-Hant-HK`	프리뷰
중국어, 북경어(번체, 타이완)	`cmn-Hant-TW`	프리뷰
체코어(체코)	`cs-CZ`	프리뷰
영어(필리핀)	`en-PH`	프리뷰
에스토니아어(에스토니아)	`et-EE`	프리뷰
필리핀어(필리핀)	`fil-PH`	프리뷰
구자라트어(인도)	`gu-IN`	프리뷰
히브리어(이스라엘)	`iw-IL`	프리뷰
헝가리어(헝가리)	`hu-HU`	프리뷰
인도네시아어(인도네시아)	`id-ID`	프리뷰
칸나다어(인도)	`kn-IN`	프리뷰
크메르어(캄보디아)	`km-KH`	프리뷰
라오스어(라오스)	`lo-LA`	프리뷰
라트비아어(라트비아)	`lv-LV`	프리뷰
리투아니아어(리투아니아)	`lt-LT`	프리뷰
말레이어(말레이시아)	`ms-MY`	프리뷰
말라얄람어(인도)	`ml-IN`	프리뷰
마라티어(인도)	`mr-IN`	프리뷰
네팔어(네팔)	`ne-NP`	프리뷰
노르웨이어(노르웨이)	`no-NO`	프리뷰
페르시아어(이란)	`fa-IR`	프리뷰
세르비아어(세르비아)	`sr-RS`	프리뷰
슬로바키아어(슬로바키아)	`sk-SK`	프리뷰
슬로베니아어(슬로베니아)	`sl-SI`	프리뷰
스페인어(멕시코)	`es-MX`	프리뷰
스와힐리어	`sw`	프리뷰
타밀어(인도)	`ta-IN`	프리뷰
텔루구어(인도)	`te-IN`	프리뷰
태국어(태국)	`th-TH`	프리뷰
우즈베크어(우즈베키스탄)	`uz-UZ`	프리뷰

화자 분할 지원 언어

Chirp 3는 다음 언어로 BatchRecognize 및 Recognize의 스크립트 작성 및 분할을 지원합니다.

언어	BCP-47 코드
중국어(간체, 중국)	cmn-Hans-CN
독일어(독일)	de-DE
영어(영국)	en-GB
영어(인도)	en-IN
영어(미국)	en-US
스페인어(스페인)	es-ES
스페인어(미국)	es-US
프랑스어(캐나다)	fr-CA
프랑스어(프랑스)	fr-FR
힌디어(인도)	hi-IN
이탈리아어(이탈리아)	it-IT
일본어(일본)	ja-JP
한국어(대한민국)	ko-KR
포르투갈어(브라질)	pt-BR

기능 지원 및 제한사항

Chirp 3는 다음 기능을 지원합니다.

기능	설명	출시 단계
자동 구두점	모델에 의해 자동으로 생성되며 원하는 경우 사용 중지할 수 있습니다.	GA
자동 대문자 사용	모델에 의해 자동으로 생성되며 원하는 경우 사용 중지할 수 있습니다.	GA
발화 수준 타임스탬프	모델이 자동으로 생성합니다.	GA
화자 분할	단일 채널 오디오 샘플에서 여러 화자를 자동으로 식별합니다. `BatchRecognize`에서만 제공됩니다.	GA
음성 적응(편향)	특정 용어 또는 고유명사의 인식 정확도를 높이기 위해 문구 또는 단어 형식으로 모델에 힌트를 제공합니다.	GA
언어 제약이 없는 오디오 스크립트 작성	자동으로 추론하여 가장 흔한 언어로 스크립트를 작성합니다.	GA

Chirp 3는 다음 기능을 지원하지 않습니다.

기능	설명
단어 수준 타임스탬프	모델이 자동으로 생성하고 원하는 경우 사용 설정할 수 있습니다. 이 경우 일부 스크립트 작성 품질 저하가 예상됩니다.
단어 수준의 신뢰도 점수	API는 값을 반환하지만 실제 신뢰도 점수는 아닙니다.

Chirp 3를 사용하여 스크립트 작성

스크립트 작성 작업에 Chirp 3를 사용하는 방법을 알아보세요.

스트리밍 음성 인식 수행

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_streaming_chirp3(
   audio_file: str
) -> cloud_speech.StreamingRecognizeResponse:
   """Transcribes audio from audio file stream using the Chirp 3 model of Google Cloud Speech-to-Text v2 API.

   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"

   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API V2 containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       content = f.read()

   # In practice, stream should be a generator yielding chunks of audio data
   chunk_length = len(content) // 5
   stream = [
       content[start : start + chunk_length]
       for start in range(0, len(content), chunk_length)
   ]
   audio_requests = (
       cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
   )

   recognition_config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )
   streaming_config = cloud_speech.StreamingRecognitionConfig(
       config=recognition_config
   )
   config_request = cloud_speech.StreamingRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       streaming_config=streaming_config,
   )

   def requests(config: cloud_speech.RecognitionConfig, audio: list) -> list:
       yield config
       yield from audio

   # Transcribes the audio into text
   responses_iterator = client.streaming_recognize(
       requests=requests(config_request, audio_requests)
   )
   responses = []
   for response in responses_iterator:
       responses.append(response)
       for result in response.results:
           print(f"Transcript: {result.alternatives[0].transcript}")

   return responses

동기 음성 인식 수행

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

일괄 음성 인식 수행

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text v2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input audio file.
           E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Transcribes the audio into text
   operation = client.batch_recognize(request=request)

   print("Waiting for operation to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response.results[audio_uri].transcript

Chirp 3 기능 사용

코드 예시를 통해 최신 기능을 사용하는 방법을 알아보세요.

언어 제약이 없는 스크립트 작성 수행

Chirp 3는 다국어 애플리케이션에 필수적인 오디오에서 사용되는 주요 언어를 자동으로 식별하고 스크립트를 작성할 수 있습니다. 이를 위해 코드 예시에 표시된 대로 language_codes=["auto"]를 설정합니다.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file and auto-detect spoken language using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["auto"],  # Set language code to auto to detect language.
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

언어 제한이 있는 스크립트 작성 수행

Chirp 3는 오디오 파일에서 주로 사용되는 언어를 자동으로 식별하고 스크립트를 작성할 수 있습니다. 또한 예상되는 특정 언어(예: ["en-US", "fr-FR"])를 기반으로 조건을 지정할 수도 있습니다. 이렇게 하면 코드 예시에 표시된 것처럼 모델의 리소스가 가장 가능성이 높은 언어에 집중되어 더 신뢰할 수 있는 결과를 얻을 수 있습니다.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file and auto-detect spoken language using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US", "fr-FR"],  # Set language codes of the expected spoken locales
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

스크립트 작성 및 화자 분할 수행

스크립트 작성 및 화자 분할 작업에 Chirp 3을 사용합니다.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input
         audio file. E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.RecognizeResponse: The response from the
         Speech-to-Text API containing the transcription results.
   """

   # Instantiates a client.
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],  # Use "auto" to detect language.
       model="chirp_3",
       features=cloud_speech.RecognitionFeatures(
           # Enable diarization by setting empty diarization configuration.
           diarization_config=cloud_speech.SpeakerDiarizationConfig(),
       ),
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Creates audio transcription job.
   operation = client.batch_recognize(request=request)

   print("Waiting for transcription job to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")
       print(f"Speakers per word: {result.alternatives[0].words}")

   return response.results[audio_uri].transcript

모델 적응으로 정확도 개선

Chirp 3는 모델 적응을 사용하여 특정 오디오의 스크립트 작성 정확도를 향상할 수 있습니다. 이렇게 하면 특정 단어와 구문의 목록이 제공되므로 모델이 이를 인식할 가능성이 높아집니다. 특히 도메인별 용어, 고유명사 또는 고유한 어휘에 유용합니다.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_model_adaptation(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       # Use model adaptation
       adaptation=cloud_speech.SpeechAdaptation(
         phrase_sets=[
             cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                 inline_phrase_set=cloud_speech.PhraseSet(phrases=[
                   {
                       "value": "alphabet",
                   },
                   {
                         "value": "cell phone service",
                   }
                 ])
             )
         ]
       )
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

잡음 제거 및 SNR 필터링 사용 설정

Chirp 3는 스크립트 작성 전에 배경 소음을 줄이고 원치 않는 소리를 필터링하여 오디오 품질을 향상할 수 있습니다. 잡음이 많은 환경에서 내장된 잡음 제거 및 신호 대 노이즈 비율(SNR) 필터링을 사용 설정하면 결과를 개선할 수 있습니다.

denoiser_audio=true를 설정하면 배경 음악이나 빗소리, 거리 교통 소음과 같은 소음을 효과적으로 줄이는 데 도움이 됩니다.

snr_threshold=X를 설정하여 스크립트 작성을 위해 필요한 음성의 최소 음량을 제어할 수 있습니다. 이렇게 하면 음성이 아닌 오디오나 배경 소음을 필터링하여 결과에 원치 않는 텍스트가 표시되지 않도록 할 수 있습니다. snr_threshold가 높을수록 모델이 발화를 스크립트로 작성하려면 사용자가 더 크게 말해야 합니다.

SNR 필터링은 실시간 스트리밍 사용 사례에서 불필요한 소리를 스크립트 작성 모델에 전송하지 않도록 하는 데 사용할 수 있습니다. 이 설정의 값이 높을수록 스크립트 작성 모델로 전송되는 음성 볼륨이 배경 소음에 비해 더 커야 합니다.

snr_threshold 구성은 denoise_audio가 true인지 false인지와 상호작용합니다. denoise_audio=true를 사용하면 배경 소음이 제거되고 음성이 비교적 선명해집니다. 오디오의 전체 SNR이 올라갑니다.

사용 사례에 다른 사람이 말하지 않고 사용자의 음성만 포함되는 경우 denoise_audio=true를 설정하여 SNR 필터링의 감도를 높여 음성이 아닌 소음을 필터링할 수 있습니다. 사용 사례에 배경에서 말하는 사람이 포함되어 있고 배경 음성을 스크립트 작성하지 않으려면 denoise_audio=false를 설정하고 SNR 기준점을 낮추는 것이 좋습니다.

다음은 권장되는 SNR 기준점 값입니다. 적절한 snr_threshold 값은 0~1000에서 설정할 수 있습니다. 0 값은 아무것도 필터링하지 않음을 의미하고 1000 값은 모든 것을 필터링함을 의미합니다. 권장 설정이 작동하지 않으면 값을 미세 조정하세요.

잡음 제거	SNR 기준점	음성 민감도
참	10.0	많음
참	20.0	중간
참	40.0	적음
참	100.0	매우 낮음
거짓	0.5	많음
거짓	1.0	중간
거짓	2.0	적음
거짓	5.0	매우 낮음

Python

 import os

 from google.cloud.speech_v2 import SpeechClient
 from google.cloud.speech_v2.types import cloud_speech
 from google.api_core.client_options import ClientOptions

 PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
 REGION = "us"

def transcribe_sync_chirp3_with_timestamps(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text v2 API, which provides word-level timestamps for each transcribed word.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       denoiser_config={
           denoise_audio: True,
           # Medium snr threshold
           snr_threshold: 20.0,
       }
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Google Cloud 콘솔에서 Chirp 3 사용

Google Cloud 계정에 가입하고 프로젝트를 만듭니다.
Google Cloud 콘솔에서 음성으로 이동합니다.
API가 사용 설정되지 않은 경우 API를 사용 설정합니다.
STT 콘솔 작업공간이 있는지 확인합니다. 작업공간이 없으면 작업공간을 만들어야 합니다.
1. 스크립트 작성 페이지로 이동하여 새 스크립트 작성을 클릭합니다.
2. 작업공간 드롭다운을 열고 새 작업공간을 클릭하여 스크립트 작성용 작업공간을 만듭니다.
3. 새 작업공간 만들기 탐색 사이드바에서 찾아보기를 클릭합니다.
4. 클릭하여 새 버킷을 만듭니다.
5. 버킷 이름을 입력하고 계속을 클릭합니다.
6. 만들기를 클릭하여 Cloud Storage 버킷을 만듭니다.
7. 버킷을 만든 후 선택을 클릭하여 사용할 버킷을 선택합니다.
8. 만들기를 클릭하여 Speech-to-Text API V2 콘솔의 작업공간 만들기를 완료합니다.
실제 오디오에서 스크립트 작성을 수행합니다.

새 스크립트 작성 페이지에서 업로드(로컬 업로드) 또는 기존 Cloud Storage 파일(Cloud Storage)을 지정하여 오디오 파일을 선택합니다.
계속을 클릭하여 스크립트 작성 옵션으로 이동합니다.
1. 이전에 만든 인식기에서 Chirp의 인식에 사용할 음성 언어를 선택합니다.
2. 모델 드롭다운에서 chirp_3를 선택합니다.
3. 인식기 드롭다운에서 새로 만든 인식기를 선택합니다.
4. 제출을 클릭하여 chirp_3을 사용한 첫 번째 인식 요청을 실행합니다.
Chirp 3 스크립트 작성 결과를 확인합니다.
1. 스크립트 작성 페이지에서 스크립트 작성 이름을 클릭하여 결과를 확인합니다.
2. 스크립트 작성 세부정보 페이지에서 스크립트 작성 결과를 확인하고 원하는 경우 브라우저에서 오디오를 재생합니다.

다음 단계

짧은 오디오 파일의 텍스트 변환 방법 알아보기

스트리밍 오디오의 텍스트 변환 방법 알아보기

긴 오디오 파일의 텍스트 변환 방법 알아보기

권장사항 문서에서 최상의 성능, 정확도, 기타 팁 참조하기