The ML.ROC_CURVE function
This document describes the ML.ROC_CURVE function, which you can use to evaluate metrics that are specific to binary classification models.
Syntax
ML.ROC_CURVE(
  MODEL `project_id.dataset.model`
  [, { TABLE `project_id.dataset.table` | (query_statement) }]
  [, GENERATE_ARRAY(thresholds)]
  [, STRUCT(trial_id AS trial_id)])
Arguments
ML.ROC_CURVE takes the following arguments:
project_id
: Your project ID.

dataset
: The BigQuery dataset that contains the model.

model
: The name of the model.

table
: The name of the input table that contains the evaluation data. If table is specified, the input column names in the table must match the column names in the model, and their types must be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name that's provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column that's named label in the training data is used.
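The label column name that ML.ROC_CURVE looks for is fixed at training time. As a minimal sketch, assuming a hypothetical training table mydataset.training_data with a label column named churned, the input_label_cols option in CREATE MODEL is what sets that name:

CREATE OR REPLACE MODEL `mydataset.mymodel`
  OPTIONS (
    model_type = 'logistic_reg',
    input_label_cols = ['churned']  -- evaluation input must then contain a churned column
  ) AS
SELECT
  *
FROM
  `mydataset.training_data`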
query_statement
: A GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the query_statement clause in GoogleSQL, see Query syntax. If query_statement is specified, the input column names from the query must match the column names in the model, and their types must be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name that's provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named label in the training data is used. Any extra columns are ignored.

If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause must appear in query_statement.

If you don't specify either table or query_statement, ML.ROC_CURVE computes the curve results as follows:

- If the data was split during training, the split evaluation data is used to compute the curve results.
- If the data wasn't split during training, the entire training input is used to compute the curve results.
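As an illustration, the following call passes a query instead of a table. The split column in the filter is hypothetical and stands in for however you partition your own evaluation data:

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    (SELECT * FROM `mydataset.mytable` WHERE split = 'eval'))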
thresholds
: an ARRAY<FLOAT64> value that specifies the threshold values to evaluate, supplied by the GENERATE_ARRAY function. If you don't specify this argument, the threshold values are chosen based on the percentile values of the prediction output.

trial_id
: an INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.
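For example, to evaluate trial 2 of a model that was created with hyperparameter tuning, rather than the optimal trial (the trial number here is illustrative):

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.4, 0.6, 0.01),
    STRUCT(2 AS trial_id))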
Output
ML.ROC_CURVE returns multiple rows with metrics for different threshold values for the model. The metrics include the following:

threshold
: a FLOAT64 value that contains the custom threshold for the binary classification model.

recall
: a FLOAT64 value that indicates the proportion of actual positive cases that were correctly predicted by the model.

false_positive_rate
: a FLOAT64 value that indicates the proportion of actual negative cases that were incorrectly predicted as positive by the model.

true_positives
: an INT64 value that contains the number of cases that the model correctly predicted as positive.

false_positives
: an INT64 value that contains the number of cases that the model incorrectly predicted as positive.

true_negatives
: an INT64 value that contains the number of cases that the model correctly predicted as negative.

false_negatives
: an INT64 value that contains the number of cases that the model incorrectly predicted as negative.
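These columns are related by the usual confusion-matrix identities. As a sketch, reusing the model and table from the examples that follow, the false positive rate at each threshold can be recomputed from the raw counts:

SELECT
  threshold,
  false_positive_rate,
  false_positives / (false_positives + true_negatives) AS fpr_from_counts
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)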
Examples
The following examples assume your model and input table are in your default project.
Evaluate the ROC curve of a binary class logistic regression model
The following query returns all of the output columns for ML.ROC_CURVE. You can graph the recall and false_positive_rate values for an ROC curve. The threshold values returned are chosen based on the percentile values of the prediction output.
SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
Evaluate an ROC curve with custom thresholds
The following query returns all of the output columns for ML.ROC_CURVE. The threshold values returned are chosen based on the output of the GENERATE_ARRAY function.
SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.4, 0.6, 0.01))
Evaluate the precision-recall curve
Instead of getting an ROC curve (recall versus false positive rate), the following query computes a precision-recall curve by deriving precision from the true positive and false positive counts:
SELECT
  recall,
  true_positives / (true_positives + false_positives) AS precision
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
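In the same way, you can derive an F1 score at each threshold from the counts. This is an illustrative extension, not part of the function's output; SAFE_DIVIDE guards against rows with no positive predictions or labels:

SELECT
  threshold,
  SAFE_DIVIDE(2 * true_positives,
    2 * true_positives + false_positives + false_negatives) AS f1_score
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)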
What's next
- For information about model evaluation, see BigQuery ML model evaluation overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.