The ML.TRANSLATE function
This document describes the ML.TRANSLATE function, which lets you translate
text stored in BigQuery tables by using the Cloud Translation API.
Syntax
ML.TRANSLATE(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT('MODE_NAME' AS translate_mode [, 'TARGET_LANGUAGE_CODE' AS target_language_code ])
)
Arguments
ML.TRANSLATE takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_TRANSLATE_V3.
- TABLE: the name of the BigQuery table that contains text data. The function processes the column named text_content in this table. If your table does not have a text_content column, use a SELECT statement for this argument to provide an alias for an existing table column, as shown in the following example:

  SELECT * FROM ML.TRANSLATE(
    MODEL `mydataset.mymodel`,
    (SELECT comment AS text_content FROM mydataset.mytable),
    STRUCT('translate_text' AS translate_mode, 'en' AS target_language_code)
  );

  An error occurs if no text_content column is available.
- QUERY_STATEMENT: a query whose result contains the text data. The function processes the column named text_content in the query result. You can alias an existing table column as text_content if necessary. For information about the supported SQL syntax of the QUERY_STATEMENT clause, see GoogleSQL query syntax.
- MODE_NAME: a STRING value that specifies a supported translation mode, for example TRANSLATE_TEXT.
- TARGET_LANGUAGE_CODE: a STRING value that specifies a supported language code for translation. This argument is only required when you use the TRANSLATE_TEXT translation mode.
Output
ML.TRANSLATE returns the input table plus the following columns:
- ml_translate_result: a JSON value that contains the translation result from the Cloud Translation API.
- ml_translate_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
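As a minimal sketch of how the ml_translate_result column can be unpacked, the following query applies JSON_VALUE to a hard-coded JSON literal that mimics the result shape shown in the examples in this document. The literal, the sample_result name, and the column aliases are illustrative assumptions, not output from a real model call, and the actual JSON might contain other fields.

```sql
# Hypothetical sample value shaped like the ml_translate_result shown in this document.
WITH sample_result AS (
  SELECT JSON '{"translations":[{"detected_language_code":"en","translated_text":"苹果"}]}' AS ml_translate_result
)
SELECT
  # JSON_VALUE extracts a scalar from the JSON value and returns it as a STRING.
  JSON_VALUE(ml_translate_result, '$.translations[0].detected_language_code') AS detected_language,
  JSON_VALUE(ml_translate_result, '$.translations[0].translated_text') AS translated_text
FROM sample_result;
```

JSON_VALUE returns NULL when the path does not match, so a row whose result lacks these fields yields NULL values instead of failing the query.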
Quotas
See Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted error message is returned.
To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
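Before re-running anything, it can help to see how many rows were affected. The following sketch assumes that the output of an ML.TRANSLATE call was saved to a table; the table name mydataset.translate_raw_output is hypothetical. It relies only on the behavior described in this document, namely that ml_translate_status is empty for successful rows.

```sql
# Count rows per non-empty status in a saved ML.TRANSLATE output table.
# `mydataset.translate_raw_output` is a hypothetical table name.
SELECT
  ml_translate_status,
  COUNT(*) AS failed_rows
FROM `mydataset.translate_raw_output`
WHERE ml_translate_status != ''
GROUP BY ml_translate_status
ORDER BY failed_rows DESC;
```

If the query returns no rows, every row in the saved output was processed successfully.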
Locations
ML.TRANSLATE must run in the same region as the remote model that the
function references. For more information about supported locations for models
based on the Cloud Translation API, see Locations for remote models.
Examples
Example 1
The following example translates the text in the text_content column of the
table mybqtable in mydataset to Chinese.
# Create model
CREATE OR REPLACE MODEL `myproject.mydataset.mytranslatemodel`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (remote_service_type = 'cloud_ai_translate_v3');

# Translate text
SELECT * FROM ML.TRANSLATE(
  MODEL `mydataset.mytranslatemodel`,
  TABLE `mydataset.mybqtable`,
  STRUCT('translate_text' AS translate_mode, 'zh-CN' AS target_language_code));
The output is similar to the following:
ml_translate_result | ml_translate_status | text_content
------- | -------- | --------
{"glossary_translations":[],"translation_memory_translations":[],"translations":[{"detected_language_code":"en","translated_text":"苹果"}]} | | apple
Example 2
The following example translates the text in the column text_content in the
table mybqtable to Chinese, and parses the JSON results into separate columns.
# Translate text and parse the JSON
CREATE TABLE `mydataset.translate_result` AS (
  SELECT
    STRING(ml_translate_result.translations[0].detected_language_code) AS `Original Language`,
    text_content AS `Original Text`,
    "zh-CN" AS `Destination Language`,
    STRING(ml_translate_result.translations[0].translated_text) AS Translation,
    ml_translate_status AS `Status`
  FROM ML.TRANSLATE(
    MODEL `mydataset.mytranslatemodel`,
    TABLE `mydataset.mybqtable`,
    STRUCT('translate_text' AS translate_mode, 'zh-CN' AS target_language_code)));

SELECT * FROM `mydataset.translate_result`;
The output is similar to the following:
Original Language | Original Text | Destination Language | Translation | Status
------- | -------- | -------- | -------- | --------
en | apple | zh-CN | 苹果 |
If you get an error like "query limit exceeded", you might have exceeded the
quota for this function, which can leave some rows unprocessed. Use the
following query to process the remaining rows:
CREATE TABLE `mydataset.translate_result_next` AS (
  SELECT
    STRING(ml_translate_result.translations[0].detected_language_code) AS `Original Language`,
    text_content AS `Original Text`,
    'zh-CN' AS `Destination Language`,
    STRING(ml_translate_result.translations[0].translated_text) AS Translation,
    ml_translate_status AS `Status`
  FROM ML.TRANSLATE(
    MODEL `mydataset.mytranslatemodel`,
    (SELECT `Original Text` AS text_content
     FROM `mydataset.translate_result`
     WHERE Status != ''),
    STRUCT('translate_text' AS translate_mode, 'zh-CN' AS target_language_code)));

SELECT * FROM `mydataset.translate_result_next`;
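Once the retry finishes, you might want a single table that holds one row per input. The following sketch combines the successful rows from the first run with the rows from the retry. The table name mydataset.translate_result_final is hypothetical, and the query assumes that both source tables have the same columns in the same order, as they do when they are created by the two queries in this example.

```sql
# Combine successful rows from the first run with the retried rows.
# `mydataset.translate_result_final` is a hypothetical table name.
CREATE TABLE `mydataset.translate_result_final` AS (
  SELECT * FROM `mydataset.translate_result` WHERE Status = ''
  UNION ALL
  SELECT * FROM `mydataset.translate_result_next`
);
```

Rows from the retry can still carry a non-empty Status if the quota was exceeded again, so check that column before treating the combined table as complete.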
What's next
- Get step-by-step instructions on how to translate text in a BigQuery table using the ML.TRANSLATE function.
- Learn more about other functions you can use to analyze BigQuery data.
- For information about model inference, see Model inference overview.
- For more information about supported SQL statements and functions for generative AI models, see End-to-end user journeys for generative AI models.