The ML.ROC_CURVE Function

Use the ML.ROC_CURVE function to evaluate metrics specific to logistic regression models. ML.ROC_CURVE cannot evaluate any other model type.

The output of the ML.ROC_CURVE function includes multiple rows, one per threshold value, with the following metrics:

  • threshold
  • recall
  • false_positive_rate
  • true_positives
  • false_positives
  • true_negatives
  • false_negatives
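
Given the per-threshold confusion counts above, recall and false_positive_rate follow directly from the count columns. The following Python sketch is illustrative only (it is not part of the function's output, and the counts are made-up example numbers):

```python
def roc_point(true_positives, false_positives, true_negatives, false_negatives):
    """Compute recall (true positive rate) and false positive rate
    from the confusion counts at one threshold."""
    recall = true_positives / (true_positives + false_negatives)
    false_positive_rate = false_positives / (false_positives + true_negatives)
    return recall, false_positive_rate

# Example: at some threshold, 80 TP, 10 FP, 90 TN, 20 FN
recall, fpr = roc_point(80, 10, 90, 20)
print(recall, fpr)  # 0.8 0.1
```

Plotting recall against false_positive_rate across all returned thresholds traces out the ROC curve.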

ML.ROC_CURVE syntax

ML.ROC_CURVE(MODEL model_name,
            {TABLE table_name | (query_statement)},
            [GENERATE_ARRAY(thresholds)])

model_name

model_name is the name of the model you're evaluating. If you do not have a default project configured, prepend the project ID to the model name in the following format: `[PROJECT_ID].[DATASET].[MODEL]` (including the backticks); for example, `myproject.mydataset.mymodel`.

The input column names and data types in the model must match the column names and data types in the table.

table_name

table_name is the name of the input table that contains the evaluation data. If you do not have a default project configured, prepend the project ID to the table name in the following format: `[PROJECT_ID].[DATASET].[TABLE]` (including the backticks); for example, `myproject.mydataset.mytable`.

The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used.

query_statement

The query_statement clause specifies the standard SQL query that is used to generate the evaluation data. See the Standard SQL Query Syntax page for the supported SQL syntax of the query_statement clause.

All columns referenced by the query_statement are used as inputs to the model.

thresholds

(Optional) thresholds are the percentile values of the prediction output, supplied via the GENERATE_ARRAY function. The values supplied must be of type FLOAT64.

If you do not specify the thresholds to use, they are chosen automatically based on the values of the prediction output using 100 approximate quantiles: APPROX_QUANTILES(predicted_label, 100).
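
The automatic selection picks thresholds at evenly spaced quantiles of the predicted scores, so thresholds are dense where predictions cluster. The following Python sketch is a simplified, exact analogue of that quantile-based selection (APPROX_QUANTILES is approximate and distributed; the function name here is hypothetical):

```python
def quantile_thresholds(predictions, num_quantiles=100):
    """Pick thresholds at evenly spaced quantiles of the predicted scores.

    Simplified, exact analogue of APPROX_QUANTILES(predicted_label, 100):
    returns num_quantiles + 1 boundary values from the minimum score
    to the maximum score.
    """
    ordered = sorted(predictions)
    n = len(ordered) - 1
    return [ordered[round(i * n / num_quantiles)] for i in range(num_quantiles + 1)]

# 1001 evenly spaced scores in [0, 1]; quartile boundaries as thresholds
scores = [i / 1000 for i in range(1001)]
print(quantile_thresholds(scores, num_quantiles=4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With skewed prediction distributions, quantile-based thresholds concentrate where the scores do, which is why the automatic choice usually produces a smoother curve than a fixed grid.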

ML.ROC_CURVE examples

The following examples assume your model and input table are in your default project.

Evaluating the ROC curve of a logistic regression model

The following query returns these columns: threshold, recall, false_positive_rate, true_positives, false_positives, true_negatives, and false_negatives. The recall and false positive rate can be graphed as an ROC curve. The threshold values returned are chosen based on the percentile values of the prediction output.

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)

Evaluating an ROC curve with custom thresholds

The following query returns the same columns as the previous example. The threshold values returned are chosen based on the output of the GENERATE_ARRAY function.

SELECT
  *
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.4,0.6,0.01))
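
Here GENERATE_ARRAY(0.4, 0.6, 0.01) supplies 21 evenly spaced thresholds from 0.4 to 0.6 inclusive. SQL handles this natively; the Python sketch below only illustrates which values the call produces (the per-step rounding is an assumption to keep floating-point drift from dropping the endpoint):

```python
def generate_array(start, end, step):
    """Illustrative analogue of SQL GENERATE_ARRAY(start, end, step).

    Rounds after each addition so accumulated floating-point error
    does not push the final value past the endpoint.
    """
    values = []
    current = start
    while current <= end:
        values.append(current)
        current = round(current + step, 10)
    return values

thresholds = generate_array(0.4, 0.6, 0.01)
print(len(thresholds))  # 21
```

Restricting the thresholds to a narrow band like this is useful when you only care about the curve's shape near your operating point.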

Evaluating the precision-recall curve

Instead of producing an ROC curve (recall versus false positive rate), the following query computes a precision-recall curve by deriving precision from the true positive and false positive counts.

SELECT
  recall,
  true_positives / (true_positives + false_positives) AS precision
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)