Migrate from Speech-to-Text v1 to v2

The Speech-to-Text API v2 follows the latest Google Cloud API design and can be used out of the box to meet customers' enterprise security and regulatory requirements.

These requirements are addressed through the following capabilities:

  • Data residency: Speech-to-Text v2 offers a rich set of existing transcription models in Google Cloud regions such as Belgium or Singapore. You can call our transcription models through a fully regionalized service.

  • Recognizers as resources: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features. With this resource-based implementation, you no longer need a dedicated service account for authentication and authorization.

  • Logging: Resource creation and transcription generate logs that can be viewed in the Google Cloud console, giving you better telemetry and debugging.

  • Encryption: Speech-to-Text v2 supports customer-managed encryption keys for all resources as well as batch transcription (see the sketch after this list).

  • Automatic audio detection: Speech-to-Text v2 can automatically detect the sample rate, channel count, and format of audio files, so you don't need to provide that information in the request configuration.
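To illustrate the encryption capability, the following is a minimal sketch (not part of the original guide) that assumes an existing Cloud KMS key, whose full resource name is passed as kms_key_name, and updates the project-level v2 Config resource so that resources and batch transcription outputs created afterwards are encrypted with that key.

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.protobuf import field_mask_pb2

def enable_cmek(project_id: str, kms_key_name: str) -> cloud_speech.Config:
    """Points the project-level Speech-to-Text v2 config at a customer-managed key."""
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.UpdateConfigRequest(
        config=cloud_speech.Config(
            # The per-project, per-location Config resource.
            name=f"projects/{project_id}/locations/global/config",
            # Full resource name of an existing Cloud KMS key (placeholder value).
            kms_key_name=kms_key_name,
        ),
        # Only update the kms_key_name field.
        update_mask=field_mask_pb2.FieldMask(paths=["kms_key_name"]),
    )

    response = client.update_config(request=request)

    print("Updated KMS key:", response.kms_key_name)
    return response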

Migrate from v1 to v2

Migration from the v1 API to the v2 API is not automatic. Only minimal implementation changes are needed to take advantage of this feature set.

Migrate in the API

As with Speech-to-Text v1, to transcribe audio you create a RecognitionConfig by selecting the audio language and the recognition model.

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def quickstart_v2(
    project_id: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file."""
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

If needed, select a region in which to use the Speech-to-Text API, and check language and model availability in that region.

Python

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def change_speech_v2_location(
    project_id: str,
    location: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region."""
    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/{location}/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

(Optional) If you need to reuse a specific recognition configuration across multiple transcription requests, create a Recognizer resource.

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def create_recognizer(project_id: str, recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a default recognition configuration."""
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{project_id}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )

    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer
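Once created, the recognizer can be referenced in later requests in place of the "_" default recognizer. The following is a minimal sketch (not part of the original guide) that reuses the recognizer created above; the request-level config only supplies the decoding settings, and the recognizer's default language and model are expected to apply.

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def recognize_with_recognizer(
    project_id: str,
    recognizer_id: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file with a previously created recognizer."""
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    request = cloud_speech.RecognizeRequest(
        # Reference the recognizer by ID instead of the "_" default recognizer.
        recognizer=f"projects/{project_id}/locations/global/recognizers/{recognizer_id}",
        # Only decoding settings are given here; language and model are expected
        # to come from the recognizer's default_recognition_config.
        config=cloud_speech.RecognitionConfig(
            auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        ),
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response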

The new v2 API has additional differences in requests and responses. For details, see the reference documentation.

Migrate in the UI

To migrate through the Google Cloud console, follow these steps:

  1. Go to the Speech page in the Google Cloud console.

  2. Open the Transcriptions page.

  3. Click New Transcription, then select your audio in the Audio configuration tab.

  4. In the Transcription options tab, select V2.