Migrating from Speech-to-Text v1 to v2

Speech-to-Text API v2 adopts the latest Google Cloud API design, enabling customers to meet enterprise security and regulatory requirements out of the box.

These requirements are realized through the following:

  • Data Residency: Speech-to-Text v2 offers a broad range of existing transcription models in Google Cloud regions such as Belgium or Singapore, letting you invoke those models through a fully regionalized service.

  • Recognizers as resources: Recognizers are reusable recognition configurations that can combine a model, language, and features. Modeling recognition configuration as a resource eliminates the need for dedicated service accounts for authentication and authorization.

  • Logging: Resource creation and transcriptions generate logs available in the Google Cloud console, allowing for better telemetry and debugging.

  • Encryption: Speech-to-Text v2 supports customer-managed encryption keys (CMEK) for all resources as well as batch transcription.

  • Audio Auto-Detect: Speech-to-Text v2 can automatically detect the sample rate, channel count, and format of your audio files, so you do not need to provide that information in the request configuration.

Migrating from v1 to v2

Migration from the v1 API to the v2 API does not happen automatically. Minimal implementation changes are required to take advantage of the feature set.

Migrating in the API

As in Speech-to-Text v1, you transcribe audio by creating a RecognitionConfig that specifies the language of your audio and the recognition model of your choice:

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def quickstart_v2(
    project_id: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file."""
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

If needed, select a region in which you want to use the Speech-to-Text API, and check the language and model availability in that region:

Python

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def change_speech_v2_location(
    project_id: str,
    location: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region."""
    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/{location}/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests:

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def create_recognizer(project_id: str, recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a default recognition configuration."""
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{project_id}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )

    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer

There are other differences in the requests and responses in the new v2 API. For more details, see the reference documentation.

Migrating in the UI

To migrate through the Google Cloud console, follow these steps:

  1. Go to the Speech page in the Google Cloud console.

  2. Navigate to the Transcriptions page.

  3. Click New Transcription and select your audio in the Audio configuration tab.

  4. In the Transcription options tab, select V2.