The ML.ROC_CURVE function

Use the ML.ROC_CURVE function to evaluate logistic regression-specific metrics. ML.ROC_CURVE only evaluates logistic regression models.

The output of the ML.ROC_CURVE function includes multiple rows, each containing metrics for a different threshold value for the model. The metrics include:

  • threshold
  • recall
  • false_positive_rate
  • true_positives
  • false_positives
  • true_negatives
  • false_negatives

ML.ROC_CURVE limitations

The ML.ROC_CURVE function is subject to the following limitations:

  • Multiclass logistic regression models are not supported.

ML.ROC_CURVE syntax

ML.ROC_CURVE(MODEL model_name
            [, {TABLE table_name | (query_statement)}]
            [, GENERATE_ARRAY(thresholds)])

model_name

model_name is the name of the model you're evaluating. If you do not have a default project configured, prepend the project ID to the model name in the following format: `[PROJECT_ID].[DATASET].[MODEL]` (including the backticks); for example, `myproject.mydataset.mymodel`.

table_name

(Optional) table_name is the name of the input table that contains the evaluation data. If you do not have a default project configured, prepend the project ID to the table name in the following format: `[PROJECT_ID].[DATASET].[TABLE]` (including the backticks); for example, `myproject.mydataset.mytable`.

If table_name is specified, the input column names in the model must match the column names in the table, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used.

If neither table_name nor query_statement is specified, ML.ROC_CURVE computes the ROC curve results as follows:

  • If the data is split during training, the split evaluation data is used to compute the ROC curve results.
  • If the data is not split during training, the entire training input is used to compute the ROC curve results.

query_statement

(Optional) The query_statement clause specifies the standard SQL query that is used to generate the evaluation data. See the Standard SQL Query Syntax page for the supported SQL syntax of the query_statement clause.

If query_statement is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used.
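
For example, if the evaluation data stores the label under a different column name than the one used during training, the query can alias it to match. This is a sketch: the column names custom_label, feature1, and feature2 are hypothetical, and it assumes input_label_cols was unspecified, so the model expects a column named "label":

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    (
    SELECT
      custom_label AS label,
      feature1,
      feature2
    FROM
      `mydataset.mytable`))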

If neither table_name nor query_statement is specified, ML.ROC_CURVE computes the ROC curve results as follows:

  • If the data is split during training, the split evaluation data is used to compute the ROC curve results.
  • If the data is not split during training, the entire training input is used to compute the ROC curve results.
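
For example, to evaluate against the data reserved during training, omit both table_name and query_statement and pass only the model:

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`)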

thresholds

(Optional) thresholds are the percentile values of the prediction output supplied via the GENERATE_ARRAY function. The values supplied should be of type DOUBLE.

If you do not specify the thresholds to use, they are chosen automatically based on the values of the prediction output using 100 approximate quantiles: APPROX_QUANTILES(predicted_label, 100).

ML.ROC_CURVE examples

The following examples assume your model and input table are in your default project.

Evaluating the ROC curve of a logistic regression model

The following query returns these columns: threshold, recall, false_positive_rate, true_positives, false_positives, true_negatives, and false_negatives. The recall and false positive rate can be graphed to form an ROC curve. The threshold values returned are chosen based on the percentile values of the prediction output.

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)

Evaluating an ROC curve with custom thresholds

The following query returns the same columns as the previous example. The threshold values returned are chosen based on the output of the GENERATE_ARRAY function.

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.4,0.6,0.01))

Evaluating the precision-recall curve

Instead of producing an ROC curve (the recall versus the false positive rate), the following query calculates a precision-recall curve by computing the precision from the true and false positive counts.

SELECT
  recall,
  true_positives / (true_positives + false_positives) AS precision
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
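
Approximating the AUC from the ROC curve

The per-threshold rows can also be aggregated. The following query is a sketch that approximates the area under the ROC curve with the trapezoidal rule; this derived value is not part of the ML.ROC_CURVE output itself (the ML.EVALUATE function reports roc_auc directly):

SELECT
  SUM((false_positive_rate - prev_fpr) * (recall + prev_recall) / 2)
    AS approximate_auc
FROM (
  SELECT
    recall,
    false_positive_rate,
    -- pair each ROC point with the previous point along the curve
    LAG(recall)
      OVER (ORDER BY false_positive_rate, recall) AS prev_recall,
    LAG(false_positive_rate)
      OVER (ORDER BY false_positive_rate, recall) AS prev_fpr
  FROM
    ML.ROC_CURVE(MODEL `mydataset.mymodel`,
      TABLE `mydataset.mytable`))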