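Transcribe audio

The Speech-to-Text service of Vertex AI on Google Distributed Cloud (GDC) air-gapped recognizes speech in audio files. Speech-to-Text uses its pre-trained API to convert the detected audio into text transcriptions.

Speech-to-Text includes Chirp, an advanced speech model trained on millions of hours of audio data and billions of text sentences. This model contrasts with conventional speech recognition techniques, which focus on large amounts of language-specific supervised data. Chirp gives users improved recognition and transcription across more spoken languages and accents.

This page shows how to transcribe audio files into text by using the Speech-to-Text API on Distributed Cloud.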

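Before you begin

Before you can start using the Speech-to-Text API, you must have a project with the Speech-to-Text API enabled and the appropriate credentials. You can also install client libraries to help you make calls to the API. For more information, see "Set up a speech recognition project".

Transcribe audio with the default model

Speech-to-Text performs speech recognition on audio that you send directly as the content of the API request. The system returns the transcription in the API response.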
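As a minimal sketch of reading that response, the transcription can be pulled out of the `results` list, where each result holds ranked `alternatives`. The helper name `extract_transcripts` and the stand-in `fake_response` object are hypothetical and exist only for illustration; a real response comes from the `recognize` call shown later on this page.

```python
from types import SimpleNamespace


def extract_transcripts(response):
    """Return the first alternative's transcript for every result in a response."""
    transcripts = []
    for result in response.results:
        # A result can carry several alternatives; skip results that have none.
        if result.alternatives:
            transcripts.append(result.alternatives[0].transcript)
    return transcripts


# Stand-in object shaped like a recognize response (illustration only).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world")]),
        SimpleNamespace(alternatives=[]),
    ]
)
print(extract_transcripts(fake_response))
```

Guarding against an empty `alternatives` list is a defensive choice in this sketch, not documented API behavior.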

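When you make a speech recognition request, you must provide a RecognitionConfig configuration object. This object tells the API how to process your audio data and what kind of output you expect. If this configuration object does not explicitly specify a model, Speech-to-Text selects a default model.

For more information, see the Speech API documentation.

The following example transcribes speech from an audio file by using the default Speech-to-Text model: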

Python

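Follow these steps to use the Speech-to-Text service from a Python script to transcribe speech from an audio file:

Install the latest version of the Speech-to-Text client library.
Set the required environment variables in a Python script.
Authenticate your API request.

Add the following code to the Python script you created: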

import base64

from google.cloud import speech_v1p1beta1
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience="https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def get_client(creds):
  # Point the client at the Distributed Cloud Speech-to-Text endpoint.
  opts = ClientOptions(api_endpoint=api_endpoint)
  return speech_v1p1beta1.SpeechClient(credentials=creds, client_options=opts)

def main():
  # Load the default credentials, bind them to the GDC audience, and fetch a token.
  creds = None
  try:
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception: " + str(e))
    raise
  return creds

def speech_func(creds):
  tc = get_client(creds)

  content="BASE64_ENCODED_AUDIO"

  # Decode the Base64 string back into the audio bytes before sending them.
  audio = speech_v1p1beta1.RecognitionAudio()
  audio.content = base64.standard_b64decode(content)

  # Describe the audio so that the API knows how to process it.
  # No model is set here, so Speech-to-Text selects the default model.
  config = speech_v1p1beta1.RecognitionConfig()
  config.encoding= speech_v1p1beta1.RecognitionConfig.AudioEncoding.ENCODING
  config.sample_rate_hertz=RATE_HERTZ
  config.language_code="LANGUAGE_CODE"
  config.audio_channel_count=CHANNEL_COUNT

  # Send the recognition request and print the raw response.
  metadata = [("x-goog-user-project", "projects/PROJECT_ID")]
  resp = tc.recognize(config=config, audio=audio, metadata=metadata)
  print(resp)

if __name__=="__main__":
  creds = main()
  speech_func(creds)

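Replace the following:

ENDPOINT: the Speech-to-Text endpoint that you use for your organization. For more information, see the service status and endpoints.
PROJECT_ID: your project ID.
BASE64_ENCODED_AUDIO: the audio data bytes encoded in a Base64 representation. This string begins with characters that look similar to ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv. For more information, see RecognitionAudio.
ENCODING: the encoding of the audio data sent in the request, such as LINEAR16. For more information, see AudioEncoding.
RATE_HERTZ: the sample rate, in hertz, of the audio data sent in the request, such as 16000. For more information, see RecognitionConfig.
LANGUAGE_CODE: the language of the supplied audio as a BCP-47 language tag. See the list of supported languages and their respective language codes.
CHANNEL_COUNT: the number of channels in the input audio data, such as 1. For more information, see RecognitionConfig.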

Save the Python script.
Run the Python script to transcribe the audio:
```
python SCRIPT_NAME
```
Replace SCRIPT_NAME with the name that you gave to your Python script, such as speech.py.
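One way to produce the BASE64_ENCODED_AUDIO value is to encode a local audio file with the Python standard library. The sketch below assumes a hypothetical helper named `encode_audio_file` and uses a throwaway file that contains only the FLAC marker bytes `fLaC` plus padding, not real audio. Those four bytes are what produce the `ZkxhQw` prefix in the sample string.

```python
import base64
import tempfile
from pathlib import Path


def encode_audio_file(path):
    """Read an audio file and return its bytes as a Base64 string."""
    raw = Path(path).read_bytes()
    return base64.standard_b64encode(raw).decode("ascii")


# Demo with a throwaway file; a real call would point at your own audio file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.flac"
    sample.write_bytes(b"fLaC" + bytes(8))  # FLAC marker plus padding, not real audio
    content = encode_audio_file(sample)

print(content[:6])  # ZkxhQw
```

The resulting string can replace BASE64_ENCODED_AUDIO in the script, which decodes it again with `base64.standard_b64decode` before sending the request.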

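Transcribe audio with Chirp

As with the Speech-to-Text default model, you must provide a RecognitionConfig configuration object when you make a speech recognition request. To use Chirp, you must explicitly specify this model in the configuration object by setting the model field to chirp.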
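The difference between the two modes comes down to whether the `model` field is set. As a rough sketch, a hypothetical helper named `build_config_kwargs` can collect the configuration fields and add `model` only when one is requested. The `"LINEAR16"` string and the `en-US` language tag are illustrative stand-ins, not values confirmed for your deployment.

```python
def build_config_kwargs(encoding, sample_rate_hertz, language_code,
                        audio_channel_count, model=None):
    """Collect RecognitionConfig fields, adding `model` only when one is requested."""
    kwargs = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "audio_channel_count": audio_channel_count,
    }
    if model is not None:
        # Leaving this key out lets Speech-to-Text select the default model.
        kwargs["model"] = model
    return kwargs


# Illustrative values only; "LINEAR16" stands in for the AudioEncoding enum member.
default_kwargs = build_config_kwargs("LINEAR16", 16000, "en-US", 1)
chirp_kwargs = build_config_kwargs("LINEAR16", 16000, "en-US", 1, model="chirp")
print("model" in default_kwargs, chirp_kwargs["model"])
```

In a real script, the dictionary would be unpacked into `speech_v1p1beta1.RecognitionConfig(**kwargs)`, with `encoding` set to a member of `speech_v1p1beta1.RecognitionConfig.AudioEncoding`.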

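The following example transcribes speech from an audio file by using the Chirp model:

Python

Follow these steps to use Chirp from a Python script to transcribe speech from an audio file:

Install the latest version of the Speech-to-Text client library.
Set the required environment variables in a Python script.
Authenticate your API request.

Add the following code to the Python script you created: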

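import base64

# Import the client library and the authentication helpers.
import google.auth
from google.auth.transport import requests
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1.services.speech import client
from google.api_core.client_options import ClientOptions

audience="https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def get_credentials():
  # Load the default credentials and bind them to the GDC audience,
  # as in the default-model example.
  creds, _ = google.auth.default()
  creds = creds.with_gdch_audience(audience)
  creds.refresh(requests.Request())
  return creds

def get_client(creds):
  opts = ClientOptions(api_endpoint=api_endpoint)
  return client.SpeechClient(credentials=creds, client_options=opts)

# Authenticate and create the client.
creds = get_credentials()
tc = get_client(creds)

# Specify the audio to transcribe.
content = "BASE64_ENCODED_AUDIO"

audio = speech_v1p1beta1.RecognitionAudio()
audio.content = base64.standard_b64decode(content)

# Select the Chirp model explicitly in the configuration.
config = speech_v1p1beta1.RecognitionConfig(
    encoding=speech_v1p1beta1.RecognitionConfig.AudioEncoding.ENCODING,
    sample_rate_hertz=RATE_HERTZ,
    audio_channel_count=CHANNEL_COUNT,
    language_code="LANGUAGE_CODE",
    model="chirp"
)

# Detect speech in the audio file.
metadata = (("x-goog-user-project", "projects/PROJECT_ID"),)
response = tc.recognize(config=config, audio=audio, metadata=metadata)

# Print the first alternative of each result.
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))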

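Replace the following:

ENDPOINT: the Speech-to-Text endpoint that you use for your organization. For more information, see the service status and endpoints.
BASE64_ENCODED_AUDIO: the audio data bytes encoded in a Base64 representation. This string begins with characters that look similar to ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv. For more information, see RecognitionAudio.
ENCODING: the encoding of the audio data sent in the request, such as LINEAR16. For more information, see AudioEncoding.
RATE_HERTZ: the sample rate, in hertz, of the audio data sent in the request, such as 16000. For more information, see RecognitionConfig.
CHANNEL_COUNT: the number of channels in the input audio data, such as 1. For more information, see RecognitionConfig.
LANGUAGE_CODE: the language of the supplied audio as a BCP-47 language tag. See the list of supported languages and their respective language codes.
PROJECT_ID: your project ID.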

Save the Python script.
Run the Python script to transcribe the audio:
```
python SCRIPT_NAME
```
Replace SCRIPT_NAME with the name that you gave to your Python script, such as speech.py.
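If your audio is a WAV file, the RATE_HERTZ and CHANNEL_COUNT values can be read from the file header instead of being guessed. The following is a minimal sketch that uses the standard-library `wave` module. The helper name `wav_parameters` is hypothetical, and the demo writes one second of silence purely so that the example is self-contained.

```python
import tempfile
import wave
from pathlib import Path


def wav_parameters(path):
    """Return (sample_rate_hertz, channel_count, sample_width_bytes) of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


# Demo: write one second of 16-bit mono silence at 16 kHz, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "silence.wav"
    with wave.open(str(sample), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(16000)
        wav.writeframes(bytes(2 * 16000))
    rate_hertz, channel_count, sample_width = wav_parameters(sample)

print(rate_hertz, channel_count, sample_width)  # 16000 1 2
```

A sample width of 2 bytes indicates 16-bit PCM samples, which is the audio that the LINEAR16 encoding describes. Other container formats need a different way to inspect these values.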
