Migrating from Speech-to-Text v1 to v2

Speech-to-Text API v2 adopts the latest Google Cloud API design, enabling customers to meet enterprise security and regulatory requirements out of the box.

These requirements are realized through the following:

  • Data Residency: Speech-to-Text v2 offers a broad range of the existing transcription models in specific Google Cloud regions, such as Belgium or Singapore. This lets you invoke the transcription models through a fully regionalized service.

  • Recognizer Resourcefulness: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features.

  • Logging: Resource creation and transcriptions generate logs available in the Google Cloud console, allowing for better telemetry and debugging.

  • Encryption: Speech-to-Text v2 supports customer-managed encryption keys (CMEK) for all resources, as well as for batch transcription.

  • Audio Auto-Detect: Speech-to-Text v2 can automatically detect the sample rate, channel count, and format of your audio files, without needing to provide that information in the request configuration.

Migrating from v1 to v2

Migration from the v1 API to the v2 API doesn't happen automatically, but only minimal implementation changes are required to take advantage of the new feature set.

Migrating in the API

As in Speech-to-Text v1, to transcribe audio you create a RecognitionConfig, selecting the language of your audio and the recognition model of your choice:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request, containing
        the transcription results
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

If needed, select a region in which you want to use the Speech-to-Text API, and check the language and model availability in that region:

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. Specifying the location
        can reduce latency and helps meet data residency requirements.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    return response
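
Both examples above address the built-in default recognizer by using `_` as the recognizer ID. The resource name always follows the pattern `projects/{project}/locations/{location}/recognizers/{recognizer}`. As a small illustration of that pattern (the `recognizer_path` helper below is hypothetical, not part of the client library), it can be built like this:

```python
def recognizer_path(
    project_id: str, location: str = "global", recognizer_id: str = "_"
) -> str:
    """Builds a v2 recognizer resource name.

    Passing "_" as the recognizer ID targets the ad-hoc default
    recognizer, so no recognizer resource has to be created first.
    """
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


print(recognizer_path("my-project"))
# projects/my-project/locations/global/recognizers/_
print(recognizer_path("my-project", location="europe-west3"))
# projects/my-project/locations/europe-west3/recognizers/_
```

Note that the location in the recognizer path must match the regionalized endpoint the client is connected to.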

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and a default recognition configuration.
    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.
    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )
    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer

There are other differences in the requests and responses in the new v2 API. For more details, see the reference documentation.
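
To sketch the most common request-side changes: v1's singular `language_code` field becomes the list-valued `language_codes` in v2, and v1's `encoding`, `sample_rate_hertz`, and `audio_channel_count` fields are typically replaced by `auto_decoding_config`, since v2 can detect those audio properties automatically. The dict-based helper below is purely illustrative and not part of either client library; real code would build `cloud_speech.RecognitionConfig` objects rather than plain dicts:

```python
def v1_config_to_v2(v1_config: dict) -> dict:
    """Illustrates how common v1 RecognitionConfig fields map to v2."""
    v2_config: dict = {
        # v2 auto-detects the sample rate, channel count, and format,
        # replacing v1's encoding / sample_rate_hertz / audio_channel_count.
        "auto_decoding_config": {},
    }
    if "language_code" in v1_config:
        # v1 takes a single language; v2 takes a list of language codes.
        v2_config["language_codes"] = [v1_config["language_code"]]
    if "model" in v1_config:
        # Note: the available model names also differ between versions,
        # so check the v2 model list for your region.
        v2_config["model"] = v1_config["model"]
    return v2_config


print(v1_config_to_v2({"language_code": "en-US", "encoding": "LINEAR16"}))
```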

Migrating in the UI

To migrate through the Google Cloud console, follow these steps:

  1. Go to the Speech page in the Google Cloud console.

  2. Navigate to the Transcriptions page.

  3. Click New Transcription and select your audio in the Audio configuration tab.

  4. In the Transcription options tab, select V2.

What's next