Chirp is the next generation of Speech-to-Text models on Google Distributed Cloud (GDC) air-gapped. Representing a version of a Universal Speech Model, Chirp has over 2B parameters and can transcribe many languages in a single model.
By enabling the Chirp component, you can transcribe audio in additional languages that Speech-to-Text doesn't originally support.
Chirp achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages, offering multi-language support on Distributed Cloud. It uses a universal encoder with a different architecture than current speech models, trained on data in many different languages. The model is then fine-tuned to offer transcription for specific languages. Although a single model unifies data from multiple languages, users still specify the language in which the model should recognize speech.
Chirp processes speech in much larger chunks than other models do. Results are only available after an entire utterance has finished, so it might not be suitable for true real-time use.
Chirp is available in the Speech-to-Text pre-trained API. The model identifier for Chirp is chirp. Therefore, in the Distributed Cloud implementation of Speech-to-Text, you can set the value chirp on the model field of the RecognitionConfig message in your request.
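For example, the following sketch builds a RecognitionConfig that selects Chirp. The encoding, sample rate, and language code shown here are illustrative placeholders, not requirements:
from google.cloud import speech_v1p1beta1

# Select Chirp through the model field of RecognitionConfig.
# Encoding, sample rate, and language code are illustrative placeholders.
config = speech_v1p1beta1.RecognitionConfig(
    encoding=speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="chirp",
)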
Available API methods
Chirp supports both Speech.Recognize and Speech.StreamingRecognize API methods.
The difference between both methods is that StreamingRecognize only returns results after each utterance. For this reason, StreamingRecognize has a latency on the order of seconds rather than milliseconds after speech starts, compared to the Recognize method. However, StreamingRecognize has a very low latency after an utterance finishes, for example, in a sentence followed by a pause.
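The following is a minimal sketch of a StreamingRecognize call with Chirp, assuming a local LINEAR16 audio file. The file path, chunk size, and LANGUAGE_CODE value are illustrative placeholders:
import io
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1.services.speech import client
from google.api_core.client_options import ClientOptions

def streaming_transcribe(local_file_path, api_endpoint):
    opts = ClientOptions(api_endpoint=api_endpoint)
    tc = client.SpeechClient(client_options=opts)
    config = speech_v1p1beta1.RecognitionConfig(
        encoding=speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="LANGUAGE_CODE",
        model="chirp",
    )
    streaming_config = speech_v1p1beta1.StreamingRecognitionConfig(config=config)

    def request_stream():
        # The first request carries only the streaming configuration.
        yield speech_v1p1beta1.StreamingRecognizeRequest(streaming_config=streaming_config)
        # Subsequent requests carry chunks of audio content.
        with io.open(local_file_path, "rb") as f:
            chunk = f.read(32768)
            while chunk:
                yield speech_v1p1beta1.StreamingRecognizeRequest(audio_content=chunk)
                chunk = f.read(32768)

    # With Chirp, results arrive only after each utterance finishes.
    for response in tc.streaming_recognize(requests=request_stream()):
        for result in response.results:
            print(result.alternatives[0].transcript)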
Before you begin
Before using Chirp on Distributed Cloud, follow these steps:
- Ask your Project IAM Admin to grant you the AI Speech Developer (ai-speech-developer) role in your project namespace.
- Enable the pre-trained APIs before using the client library.
Authenticate the request
You must get a token to authenticate the requests to the Speech-to-Text pre-trained service. Follow these steps:
gdcloud CLI
Export the identity token for the specified account to an environment variable:
export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"
Replace ENDPOINT with the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.
Python
- Install the google-auth client library:
pip install google-auth
- Save the following code to a Python script, and update ENDPOINT to the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.
import google.auth
from google.auth.transport import requests

audience = "https://ENDPOINT:443"

creds, project_id = google.auth.default()
creds = creds.with_gdch_audience(audience)

def test_get_token():
    req = requests.Request()
    creds.refresh(req)
    print(creds.token)

if __name__ == "__main__":
    test_get_token()
- Run the script to fetch the token.
How to use Chirp
Work through the following steps to use Chirp as a model with the Speech-to-Text client library. These steps use Python:
Python
- Open a notebook as your coding environment. If you don't have an existing notebook, create a notebook.
- Write your code using Python to install the Speech-to-Text library from a tar file and get a transcription.
- Import the Speech-to-Text client library and transcribe an audio file to generate a Speech-to-Text transcription:
import base64

# Import the client library.
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1.services.speech import client
from google.api_core.client_options import ClientOptions

api_endpoint = "ENDPOINT:443"

def get_client(creds):
    opts = ClientOptions(api_endpoint=api_endpoint)
    return client.SpeechClient(credentials=creds, client_options=opts)

# Specify the audio to transcribe.
# creds is the credentials object created in the authentication section.
tc = get_client(creds)
content = "BASE64_ENCODED_AUDIO"
audio = speech_v1p1beta1.RecognitionAudio()
audio.content = base64.standard_b64decode(content)

config = speech_v1p1beta1.RecognitionConfig(
    encoding=speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="LANGUAGE_CODE",
    model="chirp",
)

# Detect speech in the audio file.
metadata = (("x-goog-user-project", "projects/PROJECT_ID"),)
response = tc.recognize(config=config, audio=audio, metadata=metadata)

for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
Replace the following:
- ENDPOINT: the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.
- BASE64_ENCODED_AUDIO: the audio data bytes encoded in a Base64 representation. This string begins with characters that look similar to ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv. To produce this value from a local file, see the sketch after this list.
- LANGUAGE_CODE: a supported language code.
- PROJECT_ID: your project ID.
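The following is a minimal sketch of producing the BASE64_ENCODED_AUDIO value from a local audio file. The file name audio.wav is an illustrative placeholder:
import base64

# "audio.wav" is an illustrative placeholder for your local audio file.
with open("audio.wav", "rb") as f:
    content = base64.standard_b64encode(f.read()).decode("utf-8")

# The first characters of the encoded string resemble the example shown above.
print(content[:45])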
Sample of the Speech-to-Text client library
To transcribe an audio file using the Chirp model on the Speech-to-Text API, first view the statuses and endpoints of the pre-trained models to identify your endpoint. Then, follow the sample code:
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1.services.speech import client
from google.api_core.client_options import ClientOptions
import io

def transcribe(local_file_path, api_endpoint):
    # Create a client that targets the Distributed Cloud endpoint.
    opts = ClientOptions(api_endpoint=api_endpoint)
    tc = client.SpeechClient(client_options=opts)

    config = {
        "encoding": speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
        "language_code": "LANGUAGE_CODE",
        "sample_rate_hertz": 16000,
        "audio_channel_count": 1,
        "model": "chirp",
    }
    metadata = (("x-goog-user-project", "projects/PROJECT_ID"),)

    # Read the local audio file and send it for recognition.
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = tc.recognize(config=config, audio=audio, metadata=metadata)
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
Replace LANGUAGE_CODE with a supported language code and PROJECT_ID with your project ID.
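As a usage sketch, you can then call the function with your audio file and endpoint; both arguments below are illustrative placeholders:
# Both arguments are illustrative placeholders.
transcribe("audio.wav", "ENDPOINT:443")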
Supported languages
The following languages are supported by Chirp:
| Language | Language code |
| --- | --- |
| English (United States) | en-US |
| Indonesian (Indonesia) | id-ID |
| Malay (Malaysia) | ms-MY |