The ML.TRAINING_INFO function

This document describes the ML.TRAINING_INFO function, which lets you see information about the training iterations of a model.

You can run ML.TRAINING_INFO while the CREATE MODEL statement for the target model is running, or you can wait until after the CREATE MODEL statement completes. If you run ML.TRAINING_INFO before the first training iteration of the CREATE MODEL statement completes, the query returns a Not found error.
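To check progress while training is still running, one option is to fetch only the most recently completed iteration. The following is a minimal sketch; it reuses the model name `mydataset.mymodel` from the example later in this document and assumes that at least one iteration has finished, because otherwise the query returns the Not found error described above.

```sql
-- Sketch: return the latest completed iteration of the latest training run.
-- Assumes the model mydataset.mymodel exists in your default project.
SELECT
  training_run,
  iteration,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
ORDER BY
  training_run DESC,
  iteration DESC
LIMIT 1;
```

Only the training_run, iteration, and duration_ms columns are selected here because they are the columns this document lists for every supported case, including time series models.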

Syntax

ML.TRAINING_INFO(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`
)

Arguments

ML.TRAINING_INFO takes the following arguments:

PROJECT_ID: your project ID.
DATASET: the BigQuery dataset that contains the model.
MODEL_NAME: the name of the model.

Output

ML.TRAINING_INFO returns the following columns:

training_run: an INT64 value that contains the training run identifier for the model. The value in this column is 0 for a newly created model. If you retrain the model using the warm_start argument of the CREATE MODEL statement, this value is incremented.
iteration: an INT64 value that contains the iteration number of the training run. The value for the first iteration is 0, and it is incremented for each subsequent iteration.
loss: a FLOAT64 value that contains the loss metric calculated after an iteration on the training data:
- For logistic regression models, this is log loss.
- For linear regression models, this is mean squared error.
- For multiclass logistic regression models, this is cross-entropy log loss.
- For explicit matrix factorization models, this is mean squared error calculated over the seen input ratings.
- For implicit matrix factorization models, the loss is calculated using the following formula:
$$ Loss = \sum_{u, i} c_{ui}(p_{ui} - x^T_uy_i)^2 + \lambda(\sum_u||x_u||^2 + \sum_i||y_i||^2) $$

For more information about what the variables mean, see Feedback types.
eval_loss: a FLOAT64 value that contains the loss metric calculated on the holdout data. For k-means models, ML.TRAINING_INFO doesn't return an eval_loss column. If the DATA_SPLIT_METHOD argument is NO_SPLIT, then all entries in the eval_loss column are NULL.
learning_rate: a FLOAT64 value that contains the learning rate in this iteration.
duration_ms: an INT64 value that contains how long the iteration took, in milliseconds.
cluster_info: an ARRAY<STRUCT> value that contains the fields centroid_id, cluster_radius, and cluster_size. ML.TRAINING_INFO computes cluster_radius and cluster_size with standardized features. This column is returned only for k-means models.
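
The columns above can be compared side by side to see how training progressed. The following sketch assumes a model type that returns all of the selected columns, such as a linear or logistic regression model; it is not valid for k-means models, which don't return eval_loss, or for time series models, which return only three columns. The model name `mydataset.mymodel` is a placeholder.

```sql
-- Sketch: list training and holdout loss for each iteration, in order.
-- Assumes mydataset.mymodel is a model type that returns these columns
-- (for example, a linear or logistic regression model).
SELECT
  training_run,
  iteration,
  loss,
  eval_loss,      -- NULL in every row if DATA_SPLIT_METHOD is NO_SPLIT
  learning_rate,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
ORDER BY
  training_run,
  iteration;
```

Ordering by training_run and then iteration keeps the rows from a warm-started retraining grouped after the rows from the original run.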

Permissions

You must have the bigquery.models.create and bigquery.models.getData Identity and Access Management (IAM) permissions to run ML.TRAINING_INFO.

Limitations

ML.TRAINING_INFO is subject to the following limitations:

ML.TRAINING_INFO doesn't support imported TensorFlow models.
For time series models, ML.TRAINING_INFO returns only three columns: training_run, iteration, and duration_ms. It doesn't expose training information per iteration, or per time series when multiple time series are forecasted at once. The duration_ms value is the total time taken by the entire training process.

Example

The following example retrieves training information from the model mydataset.mymodel in your default project:

SELECT
  *
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
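
For k-means models, the cluster_info column is an array, so it is often easier to read when flattened to one row per cluster. The following is a sketch under stated assumptions: `mydataset.my_kmeans_model` is a hypothetical k-means model name, and the aliases `info` and `cluster_entry` are chosen only for illustration.

```sql
-- Sketch: flatten cluster_info so that each row describes one cluster
-- at one iteration. mydataset.my_kmeans_model is a hypothetical model.
SELECT
  info.training_run,
  info.iteration,
  cluster_entry.centroid_id,
  cluster_entry.cluster_radius,  -- computed with standardized features
  cluster_entry.cluster_size     -- computed with standardized features
FROM
  ML.TRAINING_INFO(MODEL `mydataset.my_kmeans_model`) AS info,
  UNNEST(info.cluster_info) AS cluster_entry
ORDER BY
  info.training_run,
  info.iteration,
  cluster_entry.centroid_id;
```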

What's next

For more information about model evaluation, see BigQuery ML model evaluation overview.
For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.