The ML.TRAINING_INFO function

This document describes the ML.TRAINING_INFO function, which lets you see information about the training iterations of a model.

You can run ML.TRAINING_INFO while the CREATE MODEL statement for the target model is running, or you can wait until after the CREATE MODEL statement completes. If you run ML.TRAINING_INFO before the first training iteration of the CREATE MODEL statement completes, the query returns a Not found error.
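To check progress while CREATE MODEL is still running, you can return only the most recent completed iteration. The following is a minimal sketch. The model name `mydataset.mymodel` is a placeholder. The query assumes a model type that returns a loss column, so it doesn't apply to time series models.

```sql
-- Placeholder model name; substitute your own dataset and model.
-- Assumes the model type returns a loss column (time series models don't).
SELECT
  training_run,
  iteration,
  loss,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
-- Latest training run first, then the latest iteration within that run.
ORDER BY
  training_run DESC,
  iteration DESC
LIMIT 1
```

Rerunning this query during training should show the iteration value increasing. Before the first iteration finishes, the query still returns the Not found error.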

Syntax

ML.TRAINING_INFO(MODEL `project_id.dataset.model`)

Arguments

ML.TRAINING_INFO takes the following arguments:

  • project_id: Your project ID. If you omit it, your default project is used.
  • dataset: The BigQuery dataset that contains the model.
  • model: The name of the model.
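As an illustration, the following call uses a fully qualified model path. The names `myproject`, `mydataset`, and `mymodel` are hypothetical.

```sql
-- myproject, mydataset, and mymodel are hypothetical names.
-- Backticks enclose the full project.dataset.model path.
SELECT
  *
FROM
  ML.TRAINING_INFO(MODEL `myproject.mydataset.mymodel`)
```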

Output

ML.TRAINING_INFO returns the following columns:

  • training_run: an INT64 value that contains the training run identifier for the model. The value in this column is 0 for a newly created model. If you retrain the model using the warm_start argument of the CREATE MODEL statement, this value is incremented.
  • iteration: an INT64 value that contains the iteration number within the training run. The value for the first iteration of each training run is 0, and it's incremented for each additional iteration.
  • loss: a FLOAT64 value that contains the loss metric calculated after an iteration on the training data:

    • For logistic regression models, this is log loss.
    • For linear regression models, this is mean squared error.
    • For multiclass logistic regression models, this is cross-entropy log loss.
    • For explicit matrix factorization models, this is mean squared error calculated over the seen input ratings.
    • For implicit matrix factorization models, the loss is calculated using the following formula:
    $$ \text{Loss} = \sum_{u,i} c_{ui}\left(p_{ui} - x_u^{T} y_i\right)^2 + \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right) $$

    For more information about these variables, see Feedback types.

  • eval_loss: a FLOAT64 value that contains the loss metric calculated on the holdout data. For k-means models, ML.TRAINING_INFO doesn't return an eval_loss column. If the DATA_SPLIT_METHOD argument is NO_SPLIT, then all entries in the eval_loss column are NULL.

  • learning_rate: a FLOAT64 value that contains the learning rate used in this iteration.

  • duration_ms: an INT64 value that contains the time the iteration took, in milliseconds.

  • cluster_info: an ARRAY<STRUCT> value whose elements contain the fields centroid_id, cluster_radius, and cluster_size. ML.TRAINING_INFO computes cluster_radius and cluster_size with standardized features. This column is returned only for k-means models.
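You can combine the columns described above to inspect how a model converged. The following sketch lists each iteration of the most recent training run. The most recent run has the highest training_run value, which reflects the latest warm_start retraining if there was one. The query also computes the difference between eval_loss and loss. The alias generalization_gap is a hypothetical name chosen for this example and isn't an output of the function. The query assumes a placeholder model name and a model type that returns eval_loss and learning_rate, such as linear or logistic regression. It doesn't apply to k-means or time series models. The difference is NULL when DATA_SPLIT_METHOD is NO_SPLIT.

```sql
-- Placeholder model name; assumes a model type that returns eval_loss and
-- learning_rate (for example, linear or logistic regression, not k-means).
WITH info AS (
  SELECT
    *
  FROM
    ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
)
SELECT
  iteration,
  loss,
  eval_loss,
  -- Hypothetical derived column: a gap that keeps growing across iterations
  -- can suggest overfitting. It is NULL whenever eval_loss is NULL (NO_SPLIT).
  eval_loss - loss AS generalization_gap,
  learning_rate,
  duration_ms
FROM
  info
-- Keep only the latest training run (warm_start retraining increments it).
WHERE
  training_run = (SELECT MAX(training_run) FROM info)
ORDER BY
  iteration
```

The CTE lets the query reference the function output twice: once for the rows and once to find the latest training_run. It does this without spelling out the function call in both places.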

Permissions

You must have the bigquery.models.create and bigquery.models.getData Identity and Access Management (IAM) permissions to run ML.TRAINING_INFO.

Limitations

ML.TRAINING_INFO is subject to the following limitations:

  • ML.TRAINING_INFO doesn't support imported TensorFlow models.
  • For time series models, ML.TRAINING_INFO returns only three columns: training_run, iteration, and duration_ms. It doesn't expose training information per iteration, or per time series when multiple time series are forecast at once. The duration_ms value is the total time taken by the entire training process.
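The following sketch totals training time per training run. Because it reads only training_run, iteration, and duration_ms, it also works for time series models, which return just those three columns. The model name is a placeholder. The output aliases iterations and total_seconds are hypothetical names chosen for this example.

```sql
-- Placeholder model name. Uses only training_run, iteration, and duration_ms,
-- so it also applies to time series models.
SELECT
  training_run,
  COUNT(iteration) AS iterations,
  -- In GoogleSQL, dividing INT64 values with / returns FLOAT64,
  -- so this yields fractional seconds.
  SUM(duration_ms) / 1000 AS total_seconds
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
GROUP BY
  training_run
ORDER BY
  training_run
```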

Example

The following example retrieves training information from the model mydataset.mymodel in your default project:

SELECT
  *
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
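For k-means models, cluster_info is an array. To get one row per centroid per iteration, you can flatten it with UNNEST. The following sketch assumes a hypothetical k-means model named mydataset.my_kmeans_model. The alias ci is arbitrary.

```sql
-- my_kmeans_model is a hypothetical k-means model; cluster_info is returned
-- only for k-means models.
SELECT
  training_run,
  iteration,
  ci.centroid_id,
  -- Both values below are computed with standardized features.
  ci.cluster_radius,
  ci.cluster_size
FROM
  ML.TRAINING_INFO(MODEL `mydataset.my_kmeans_model`),
  -- Correlated cross join: one output row per element of cluster_info.
  UNNEST(cluster_info) AS ci
ORDER BY
  training_run,
  iteration,
  ci.centroid_id
```

The comma join drops any row whose cluster_info array is empty. To keep such rows, use LEFT JOIN UNNEST(cluster_info) AS ci instead.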

What's next