The ML.ROC_CURVE function

Use the ML.ROC_CURVE function to evaluate binary classification-specific metrics.
For information about model evaluation in BigQuery ML, see Model evaluation overview.
For information about the supported model types of each SQL statement and function, and all supported SQL statements and functions for each model type, see End-to-end user journey for each model.
ML.ROC_CURVE syntax
ML.ROC_CURVE(MODEL model_name [, {TABLE table_name | (query_statement)}] [, GENERATE_ARRAY(thresholds)])
model_name

model_name is the name of the model you're evaluating. If you do not have a default project configured, prepend the project ID to the model name in the following format: `[PROJECT_ID].[DATASET].[MODEL]` (including the backticks); for example, `myproject.mydataset.mymodel`.
table_name
(Optional) table_name
is the name of the input table that contains the
evaluation data. If you do not have a default project configured, prepend the
project ID to the table name in following format: `[PROJECT_ID].[DATASET].[TABLE]`
(including the backticks); for example, `myproject.mydataset.mytable`.
If table_name
is specified, the input column names in the model must match the
column names in the table, and their types should be compatible according to
BigQuery implicit coercion rules.
The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used.
If neither table_name
nor query_statement
is specified, ML.ROC_CURVE
computes the roc curve results as follows:
- If the data is split during training, the split evaluation data is used to compute the roc curve results.
- If the data is not split during training, the entire training input is used to compute the roc curve results.
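For example, the following query evaluates the model against its reserved evaluation split (or the full training input, if no split was made) without supplying a table or query; the model name is a placeholder:

```sql
SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`)
```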
query_statement

(Optional) The query_statement clause specifies the GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the query_statement clause, see the GoogleSQL query syntax page.
If query_statement is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used. Extra columns are ignored.
If the TRANSFORM clause was present in the CREATE MODEL statement that created model_name, then only the input columns present in the TRANSFORM clause must appear in query_statement.
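As an illustration, the following sketch passes a query_statement instead of a table; the column names `label` and `feature1` and the `holdout` filter are hypothetical and must be adapted to match the model's inputs:

```sql
SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    (
      SELECT
        label,
        feature1
      FROM
        `mydataset.mytable`
      WHERE
        holdout = TRUE
    ))
```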
thresholds

(Optional) thresholds are the percentile values of the prediction output, supplied using the GENERATE_ARRAY function. The values supplied should be of type DOUBLE.

If you do not specify the thresholds to use, they are chosen automatically based on the values of the prediction output, using 100 approximate quantiles: APPROX_QUANTILES(predicted_label, 100).
ML.ROC_CURVE output
The output of the ML.ROC_CURVE function includes multiple rows with metrics for different threshold values for the model. The metrics include:
- threshold
- recall
- false_positive_rate
- true_positives
- false_positives
- true_negatives
- false_negatives
ML.ROC_CURVE examples
The following examples assume your model and input table are in your default project.
Evaluate the ROC curve of a binary logistic regression model
The following query returns these columns: threshold, recall, false_positive_rate, true_positives, false_positives, true_negatives, and false_negatives. The recall and false positive rate can be graphed as an ROC curve. The threshold values returned are chosen based on the percentile values of the prediction output.
SELECT * FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
Evaluate an ROC curve with custom thresholds
The following query returns the same columns as the previous example. The
threshold values returned are chosen based on the output of the GENERATE_ARRAY
function.
SELECT * FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`, GENERATE_ARRAY(0.4,0.6,0.01))
Evaluate the precision-recall curve
Instead of computing an ROC curve (recall versus false positive rate), the following query computes a precision-recall curve by deriving precision from the true and false positive counts.
SELECT recall, true_positives / (true_positives + false_positives) AS precision FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
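In the same way, the raw counts can be combined into an F1 score per threshold; this derived metric is not part of the ML.ROC_CURVE output itself, and the sketch below uses SAFE_DIVIDE to avoid division-by-zero errors at extreme thresholds:

```sql
SELECT
  threshold,
  -- F1 = 2*TP / (2*TP + FP + FN), derived from the raw counts
  SAFE_DIVIDE(
    2 * true_positives,
    2 * true_positives + false_positives + false_negatives) AS f1_score
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
```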