Send a transcription request to Cloud Speech-to-Text On-Prem

Prerequisites

  1. Complete all required steps in the "Before you begin" quickstart.
  2. Deploy the API.
  3. Query the API to confirm that it is running correctly.
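
Before sending audio, it can help to confirm that the endpoint accepts TCP connections at all. The following is a minimal sketch using only the Python standard library. The function name `endpoint_is_reachable` and its `timeout_s` parameter are hypothetical, and the check covers network reachability only. It is not a gRPC-level health check and does not replace the API query described in the querying documentation.

```python
import socket


def endpoint_is_reachable(api_endpoint: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened."""
    # Split on the last colon so the port is always the final component.
    host, _, port = api_endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {api_endpoint!r}")
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `endpoint_is_reachable("0.0.0.0:10000")` should return `True` once a port-forward to the pod is active, and `False` if nothing is listening on that port.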

Install dependencies

  1. Clone python-speech and change to the samples directory.

    $ git clone https://github.com/googleapis/python-speech.git
    $ cd python-speech/samples/snippets
    
  2. Install pip and virtualenv if you have not already done so. For details, see the Google Cloud Platform Python development environment setup guide.

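  3. Create a virtualenv. The repository samples are stated to be compatible with Python 2.7 and 3.4 or later; the Python sample shown on this page uses f-strings, which require Python 3.6 or later.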

    $ virtualenv env
    $ source env/bin/activate
    
  4. Install the dependencies needed to run the samples.

    $ pip install -r requirements.txt
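
After installation, a quick check can confirm that the modules the Python sample on this page relies on (`grpc` and `google.cloud.speech_v1p1beta1`) are visible inside the active virtualenv. This is a minimal sketch; the helper name `missing_modules` is hypothetical and not part of the python-speech repository.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules used by the transcription sample in this guide.
    required = ["grpc", "google.cloud.speech_v1p1beta1"]
    print("Missing modules:", missing_modules(required) or "none")
```

If the list is not empty, re-run the `pip install` step with the virtualenv activated.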
    

Code samples

The following code samples use the google-cloud-speech library. You can browse the source code and report issues on GitHub.

Transcribe an audio file

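Use the following code samples to transcribe an audio file through either a public IP or a cluster-level IP. For more information about IP types, see the documentation on querying the API.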

Public IP

    # Using a Public IP
    $ python transcribe_onprem.py --file_path="../resources/two_channel_16k.wav" --api_endpoint=${PUBLIC_IP}:443

Cluster-level IP

    # Using a cluster-level IP
    # kubectl port-forward runs in the foreground, so run it in a separate terminal.
    $ kubectl port-forward -n $NAMESPACE $POD 10000:10000
    $ python transcribe_onprem.py --file_path="../resources/two_channel_16k.wav" --api_endpoint="0.0.0.0:10000"
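
Both invocations pass the same two flags, `--file_path` and `--api_endpoint`. As an illustration of how a script could accept them, here is a minimal `argparse` sketch. The function name `build_parser`, the help text, and the choice to make both flags required are assumptions; this is not the actual contents of `transcribe_onprem.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Transcribe a local audio file against an on-prem endpoint."
    )
    parser.add_argument(
        "--file_path",
        required=True,
        help="Path to the local audio file, e.g. ../resources/two_channel_16k.wav",
    )
    parser.add_argument(
        "--api_endpoint",
        required=True,
        help="host:port of the deployed API, e.g. 0.0.0.0:10000",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would transcribe {args.file_path} via {args.api_endpoint}")
```

The parsed values map directly onto the `local_file_path` and `api_endpoint` arguments of the Python function shown in the next section.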

Python

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import io

import grpc
from google.cloud import speech_v1p1beta1


def transcribe_onprem(
    local_file_path: str,
    api_endpoint: str,
) -> speech_v1p1beta1.RecognizeResponse:
    """
    Transcribe a short audio file using synchronous speech recognition on-prem

    Args:
      local_file_path: The path to local audio file, e.g. /path/audio.wav
      api_endpoint: Endpoint to call for speech recognition, e.g. 0.0.0.0:10000

    Returns:
      The speech recognition response
    """
    # Example values:
    # api_endpoint = '0.0.0.0:10000'
    # local_file_path = '../resources/two_channel_16k.wav'

    # Create an insecure (unencrypted, no TLS) gRPC channel to your server
    channel = grpc.insecure_channel(target=api_endpoint)
    transport = speech_v1p1beta1.services.speech.transports.SpeechGrpcTransport(
        channel=channel
    )

    client = speech_v1p1beta1.SpeechClient(transport=transport)

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000

    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16
    config = {
        "encoding": encoding,
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(request={"config": config, "audio": audio})
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(f"Transcript: {alternative.transcript}")

    return response
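
The sample hard-codes `sample_rate_hertz = 16000` and `LINEAR16` encoding, so a file with a different header would be sent with a mismatched config. The following minimal sketch, using only the standard-library `wave` module, reads the header of a WAV file so the values can be compared before calling `transcribe_onprem`. The helper name `describe_wav` and the keys of the returned dictionary are hypothetical and not part of the google-cloud-speech library.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a WAV header and return the values relevant to the request config."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "channel_count": wav.getnchannels(),
            # LINEAR16 means 16-bit samples, i.e. a sample width of 2 bytes.
            "is_16_bit": wav.getsampwidth() == 2,
        }
```

For instance, checking that `describe_wav(local_file_path)["sample_rate_hertz"]` equals the configured sample rate catches a mismatch locally, before any request is sent.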