This page describes how to prepare the datasets needed to generate prediction outputs.
Before you begin
Before you begin, you need the following:
- A model
- Registered parties: every party that appears in the dataset you use for prediction must be registered
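Before you run predictions, it can help to confirm that no party in the prediction dataset is missing from your registrations. The following is a minimal sketch of that check as a set difference. It assumes you have already collected both lists of party IDs yourself; the function name `unregistered_parties` and the sample IDs are hypothetical and are not part of the AML AI API.

```python
from typing import Iterable, Set


def unregistered_parties(dataset_party_ids: Iterable[str],
                         registered_party_ids: Iterable[str]) -> Set[str]:
    """Return party IDs that appear in the prediction dataset but are not registered.

    Both inputs are plain collections of party ID strings. How you obtain them
    (for example, by querying your party table and your own registration
    records) is up to you; this sketch only compares them.
    """
    return set(dataset_party_ids) - set(registered_party_ids)


# Hypothetical example data.
in_dataset = ["p-001", "p-002", "p-003"]
registered = ["p-001", "p-003"]
missing = unregistered_parties(in_dataset, registered)
print(sorted(missing))  # ['p-002']
```

If the returned set is non-empty, register those parties before generating predictions.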
Create a dataset for prediction
You can create predictions using an existing dataset (for example, the one you used for backtesting). However, in a production environment, we recommend that you create a new dataset for each prediction run:
1. You're responsible for tracking lineage from dataset to model. To ensure that the data remains unchanged, we recommend that you create a BigQuery table snapshot of each of your BigQuery tables after it passes data validation, and reference the snapshots in the AML AI dataset. If you instead reference tables that are regularly updated, AML AI operations read the current contents of those tables each time an operation uses the AML AI dataset, so changes to the underlying BigQuery tables could affect tuning, training, backtesting, and predictions.
2. Follow the guidance in Prepare data for AML AI to prepare your BigQuery tables, and then create a separate AML AI dataset for prediction that uses the table snapshots you created in step 1. To create the BigQuery datasets and tables, you can use the commands in Prepare BigQuery datasets and tables.
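The snapshot step can be scripted. The sketch below only builds BigQuery `CREATE SNAPSHOT TABLE ... CLONE ...` DDL statements, one per validated source table, and names each snapshot with the run date so every prediction run references its own frozen copy. The project, dataset, and table names, the `_snapshot_YYYYMMDD` naming scheme, and the helper `snapshot_ddl` are all illustrative assumptions; check the AML AI data model for the tables your dataset actually uses.

```python
from datetime import date
from typing import List


def snapshot_ddl(project: str, dataset: str, tables: List[str],
                 snapshot_dataset: str, run_date: date) -> List[str]:
    """Build one CREATE SNAPSHOT TABLE statement per source table.

    Snapshots are named <table>_snapshot_<YYYYMMDD> (a hypothetical naming
    convention) and placed in snapshot_dataset, which must already exist.
    """
    suffix = run_date.strftime("%Y%m%d")
    statements = []
    for table in tables:
        source = f"`{project}.{dataset}.{table}`"
        target = f"`{project}.{snapshot_dataset}.{table}_snapshot_{suffix}`"
        statements.append(f"CREATE SNAPSHOT TABLE {target} CLONE {source}")
    return statements


# Illustrative names only.
ddl = snapshot_ddl("my-project", "aml_input", ["party", "transaction"],
                   "aml_snapshots", date(2024, 1, 31))
for stmt in ddl:
    print(stmt)
```

To execute the statements, you could pass each one to the BigQuery Python client, for example `google.cloud.bigquery.Client().query(stmt).result()`, and then reference the resulting snapshot tables when you create the AML AI dataset for prediction.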
Prepare the output destinations
AML AI generates prediction outputs (risk scores and explainability) in BigQuery when you create a prediction results resource.
Before creating prediction results, you must create a BigQuery dataset for these outputs. You can use any BigQuery dataset for prediction outputs, provided that the correct permissions are granted and that the dataset is in the same project where the API is enabled and in the same location as the AML AI instance.
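As a rough illustration of the project and location requirements, the following sketch checks them before building a BigQuery `CREATE SCHEMA` statement for the output dataset. It does not check permissions. The helper name `create_output_dataset_ddl` and all example values are hypothetical.

```python
def create_output_dataset_ddl(dataset_project: str, dataset_id: str,
                              dataset_location: str, api_project: str,
                              instance_location: str) -> str:
    """Return a CREATE SCHEMA statement for the prediction output dataset.

    Raises ValueError if the dataset would be in a different project from the
    one where the API is enabled, or in a different location from the AML AI
    instance. Permissions are NOT checked here.
    """
    if dataset_project != api_project:
        raise ValueError(
            f"output dataset project {dataset_project!r} must match the "
            f"project where the API is enabled ({api_project!r})")
    # Location names are compared case-insensitively in this sketch.
    if dataset_location.lower() != instance_location.lower():
        raise ValueError(
            f"output dataset location {dataset_location!r} must match the "
            f"AML AI instance location ({instance_location!r})")
    return (f"CREATE SCHEMA IF NOT EXISTS `{dataset_project}.{dataset_id}` "
            f"OPTIONS(location = '{dataset_location}')")


# Hypothetical values.
sql = create_output_dataset_ddl("my-project", "aml_output", "us-central1",
                                "my-project", "us-central1")
print(sql)
```

Failing early on a mismatched project or location is cheaper than discovering the problem when the prediction results operation runs.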
Generate risk scores and explainability
Now that you have a dataset for prediction, a trained model resource, and a BigQuery dataset for outputs, you can create prediction results. For instructions, see Create and manage prediction results.
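Before you create prediction results, you might run a quick sanity check on the resource names you plan to pass. The sketch below assumes AML AI resource names follow the layout `projects/PROJECT/locations/LOCATION/instances/INSTANCE/datasets/ID` (or `.../models/ID`), and it assumes the prediction dataset and model should belong to the same instance. Verify both assumptions against the AML AI API reference. The helper names are hypothetical.

```python
import re
from typing import Dict

# Assumed AML AI resource-name layout; verify against the API reference.
_RESOURCE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/instances/(?P<instance>[^/]+)/(?P<kind>datasets|models)/(?P<id>[^/]+)$"
)


def parse_resource(name: str) -> Dict[str, str]:
    """Split a resource name into its parts, or raise ValueError."""
    match = _RESOURCE.match(name)
    if match is None:
        raise ValueError(f"unrecognized resource name: {name}")
    return match.groupdict()


def check_prediction_inputs(dataset_name: str, model_name: str) -> None:
    """Raise ValueError unless the names are a dataset and a model in one instance."""
    dataset = parse_resource(dataset_name)
    model = parse_resource(model_name)
    if dataset["kind"] != "datasets" or model["kind"] != "models":
        raise ValueError("expected a dataset resource and a model resource")
    for key in ("project", "location", "instance"):
        if dataset[key] != model[key]:
            raise ValueError(
                f"dataset and model differ in {key}: "
                f"{dataset[key]!r} vs {model[key]!r}")


# Hypothetical resource names.
prefix = "projects/my-project/locations/us-central1/instances/my-instance"
check_prediction_inputs(f"{prefix}/datasets/prediction-2024-01",
                        f"{prefix}/models/my-model")
```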