The ML.UNDERSTAND_TEXT function
This document describes the ML.UNDERSTAND_TEXT
function, which lets you
analyze text that's stored in BigQuery tables by using the Cloud Natural Language API.
Syntax
ML.UNDERSTAND_TEXT( MODEL `project_id.dataset.model_name`, { TABLE `project_id.dataset.bq_table` | (query_statement) }, STRUCT('option_name' AS nlu_option) )
Arguments
ML.UNDERSTAND_TEXT
takes the following arguments:
project_id
: Your project ID.dataset
: The BigQuery dataset that contains the model.model
: The name of a remote model with aREMOTE_SERVICE_TYPE
ofCLOUD_AI_NATURAL_LANGUAGE_V1
.bq_table
: The name of the BigQuery table that contains text data. The text analysis is applied on the column with nametext_content
in this table. If your table does not havetext_content
column, use aSELECT
statement for this argument to provide an alias for an existing table column, as shown in the following example:SELECT * from ML.UNDERSTAND_TEXT( mydataset.mymodel, (SELECT comment AS text_content from mydataset.mytable), STRUCT('ANALYZE_SYNTAX' AS nlu_option) );
An error occurs if no
text_content
column is available.query_statement
: A query whose result contains the text data. The text analysis is applied on the column in the query namedtext_content
. You can alias an existing table column astext_content
if necessary. For information about the supported SQL syntax of thequery_statement
clause, see GoogleSQL query syntax.option_name
: aSTRING
value that specifies the feature name of a supported Natural Language API feature. The supported features are as follows:
Output
ML.UNDERSTAND_TEXT
returns the input table plus the following columns:
ml_understand_text_result
: aJSON
value that contains the text analysis result from Natural Language API.ml_understand_text_status
: aSTRING
value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Quotas
See Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted
error message is returned.
To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
Locations
ML.UNDERSTAND_TEXT
must run in the same region as the remote model that the
function references. For more information about supported locations for models
based on the Natural Language API, see Locations for remote models.
Examples
Example 1
The following example applies classify_text on the bq table mybqtable
in
mydataset
.
# Create Model CREATE OR REPLACE MODEL `myproject.mydataset.mynlpmodel` REMOTE WITH CONNECTION `myproject.myregion.myconnection` OPTIONS (remote_service_type ='cloud_ai_natural_language_v1');
# Understand Text SELECT * FROM ML.UNDERSTAND_TEXT( MODEL `mydataset.mynlpmodel`, TABLE `mydataset.mybqtable`, STRUCT('classify_text' AS nlu_option) );
The output is similar to the following:
ml_understand_text_result | ml_understand_text_status | text_content |
---|---|---|
{"categories":[{"confidence":0.51999998,"name":"/Arts & Entertainment/TV & Video/TV Shows & Programs"}]} | That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows. |
Example 2
The following example classify the text in the column text_content
in the
table mybqtable
, selects the rows where confidence is higher than 0.5
, and
then returns the results in separate columns.
CREATE TABLE `mydataset.classfied_result` AS ( SELECT text_content AS `Original Input`, STRING(ml_understand_text_result.categories[0].name) AS `Classified Name`, FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`, ml_understand_text_status AS `Status` FROM ML.UNDERSTAND_TEXT( MODEL `mydataset.mynlpmodel`, TABLE `mydataset.mybqtable`, STRUCT('classify_text' AS nlu_option)) ); SELECT * FROM `mydataset.classfied_result` WHERE confidence > 0.5;
The output is similar to the following:
Original Input | Classified Name | Confidence | Status |
---|---|---|---|
That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows. | /Arts & Entertainment/TV & Video/TV Shows & Programs | 0.51999998 |
If you get an error like query limit exceeded
, you might have exceeded the
quota for this function, which
can leave you with unprocessed rows. Use the following query to complete
processing the unprocessed rows:
CREATE TABLE `mydataset.classfied_result_next` AS ( SELECT text_content AS `Original Input`, STRING(ml_understand_text_result.categories[0].name) AS `Classified Name`, FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`, ml_understand_text_status AS `Status` FROM ML.UNDERSTAND_TEXT( MODEL `mydataset.mynlpmodel`, (SELECT `Original Input` as text_content FROM `mydataset.classfied_result` WHERE Status != ''), STRUCT('classify_text' AS nlu_option)) ); SELECT * FROM `mydataset.classfied_result_next`;
What's next
- Get step-by-step instructions on how to
analyze text in a BigQuery table
using the
ML.UNDERSTAND_TEXT
function. - Learn more about other functions you can use to analyze BigQuery data.
- For information about model inference, see Model inference overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.