The ML.TRANSCRIBE function
This document describes the ML.TRANSCRIBE
function, which lets you
transcribe audio files from an
object table.
Syntax
ML.TRANSCRIBE(
  MODEL `project_id.dataset.model_name`,
  TABLE `project_id.dataset.object_table`,
  [RECOGNITION_CONFIG => ( JSON 'recognition_config')]
)
Arguments
ML.TRANSCRIBE takes the following arguments:
- project_id: Your project ID.
- dataset: The BigQuery dataset that contains the model.
- model_name: The name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_SPEECH_TO_TEXT_V2.
- object_table: The name of the object table that contains URIs of the audio files. The audio files in the object table must be of a supported type. An error is returned for any row that contains an audio file of an unsupported type.
- recognition_config: A STRING value that contains a RecognitionConfig resource in JSON format. If a recognizer has been specified for the remote model by using the SPEECH_RECOGNIZER option, you can optionally specify a recognition_config value to override the default configuration of the specified recognizer. This argument is required if no recognizer has been specified for the remote model by using the SPEECH_RECOGNIZER option.
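As a rough illustration of the case where recognition_config is required, the following sketch creates a remote model without the speech_recognizer option and then passes the configuration on every call. The model name transcribe_model_no_recognizer is hypothetical, the other names are placeholders reused from the example later in this document, and the configuration values are only one possible choice.

```sql
-- Hypothetical model created WITHOUT a speech_recognizer option.
CREATE OR REPLACE MODEL `myproject.mydataset.transcribe_model_no_recognizer`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (remote_service_type = 'CLOUD_AI_SPEECH_TO_TEXT_V2');

-- Because the model has no recognizer, recognition_config must be supplied.
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model_no_recognizer`,
  TABLE `myproject.mydataset.audio`,
  recognition_config => (
    JSON '{"language_codes": ["en-US"], "model": "telephony", "auto_decoding_config": {}}')
);
```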
Output
ML.TRANSCRIBE returns the following columns:
- transcripts: A STRING value that contains the transcripts from processing the audio files.
- ml_transcribe_result: A JSON value that contains the result from the Speech-to-Text API.
- ml_transcribe_status: A STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- The object table columns.
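Because ml_transcribe_status is empty for successful rows, one way to find the files that failed is to filter on that column. The following is a minimal sketch that assumes the placeholder model and object table names from the example later in this document, and that the object table exposes a uri column as shown in the sample output. COALESCE is used so that the filter behaves the same whether an empty status is returned as an empty string or as NULL.

```sql
-- List only the rows for which transcription did not succeed.
SELECT
  uri,
  ml_transcribe_status
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
)
WHERE COALESCE(ml_transcribe_status, '') != '';
```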
Quotas
See Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted
error message is returned.
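Since the job itself still succeeds, the affected rows have to be detected from the output. One possible approach, sketched here with a hypothetical results table named transcription_results and the placeholder names from the example later in this document, is to store the output once and then count how many rows carry the retryable error prefix quoted above.

```sql
-- Store the output so the statuses can be inspected without calling the API again.
-- The table name transcription_results is hypothetical.
CREATE OR REPLACE TABLE `myproject.mydataset.transcription_results` AS
SELECT
  uri,
  transcripts,
  ml_transcribe_result,
  ml_transcribe_status
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
);

-- Count the rows that ended with the retryable error and the rows that did not.
SELECT
  STARTS_WITH(COALESCE(ml_transcribe_status, ''), 'A retryable error occurred')
    AS hit_retryable_error,
  COUNT(*) AS row_count
FROM `myproject.mydataset.transcription_results`
GROUP BY hit_retryable_error;
```

Rows counted under hit_retryable_error = true are candidates for another attempt once fewer queries are competing for the same quota.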
Locations
ML.TRANSCRIBE
must run in the same region as the remote model that the
function references. You can only create models based on
Speech-to-Text in the following locations:
asia-northeast1
asia-south1
asia-southeast1
australia-southeast1
eu
europe-west1
europe-west2
europe-west3
europe-west4
northamerica-northeast1
us
us-central1
us-east1
us-east4
us-west1
Limitations
The function can't process audio files that are longer than 1 minute. Any row that contains such a file returns an error.
Example
The following example transcribes the audio files represented by the
audio
table:
Create the model:
# Create model
CREATE OR REPLACE MODEL `myproject.mydataset.transcribe_model`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (
    remote_service_type = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    speech_recognizer = 'projects/project_number/locations/recognizer_location/recognizers/recognizer_id');
Transcribe the audio files without overriding the recognizer's default configuration:
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
);
Transcribe the audio files and override the recognizer's default configuration:
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`,
  recognition_config => (
    JSON '{"language_codes": ["en-US"], "model": "telephony", "auto_decoding_config": {}}')
);
The result is similar to the following:
| transcripts | ml_transcribe_result | ml_transcribe_status | uri | ... |
|---|---|---|---|---|
| OK Google stream stranger things from Netflix to my TV. Okay, stranger things from Netflix playing on t v smart home and it's just... | {"metadata":{"total_billed_duration":{"seconds":56}},"results":[{"alternatives":[{"confidence":0.738729,"transcript"... | | gs://mybucket/audio_files | ... |
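Individual fields can be pulled out of the ml_transcribe_result JSON column with the standard JSON functions. The sketch below assumes that the result has the shape shown in the sample row above (a metadata.total_billed_duration.seconds field and a results array with alternatives). The output column names billed_seconds and first_confidence are illustrative only, and SAFE_CAST returns NULL if a field is missing or has a different form.

```sql
SELECT
  uri,
  transcripts,
  -- Billed audio duration in seconds, as reported in the API response.
  SAFE_CAST(
    JSON_VALUE(ml_transcribe_result, '$.metadata.total_billed_duration.seconds')
    AS INT64) AS billed_seconds,
  -- Confidence of the first alternative of the first result.
  SAFE_CAST(
    JSON_VALUE(ml_transcribe_result, '$.results[0].alternatives[0].confidence')
    AS FLOAT64) AS first_confidence
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
);
```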
What's next
- Get step-by-step instructions on how to transcribe audio files from an object table using the ML.TRANSCRIBE function.
- To learn more about model inference, including other functions you can use to analyze BigQuery data, see Model inference overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.