This page describes how you can provide multiple rows of data to AutoML Tables at once, and receive a prediction for each row.
Introduction
After you have created (trained) a model, you can make an asynchronous request for a batch of predictions using the batchPredict method. You supply input data to the batchPredict method in table format. Each row provides values for the features you trained the model to use. The batchPredict method sends that data to the model and returns predictions for each row of data.
Models must be retrained every six months so that they can continue to serve predictions.
Requesting a batch prediction
For batch predictions, you specify a data source and a results destination in either a BigQuery table or a CSV file in Cloud Storage. You do not need to use the same technology for the source and destination. For example, you could use BigQuery for the data source and a CSV file in Cloud Storage for the results destination. Use the appropriate steps from the two tasks below depending on your requirements.
Your data source must contain tabular data that includes all of the columns used to train the model. You can include columns that were not in the training data, or that were in the training data but excluded from use for training. These extra columns are included in the prediction output, but they are not used for generating the prediction.
Using BigQuery tables
The names of the columns and data types of your input data must match the data you used in your training data. The columns can be in a different order than the training data.
BigQuery table requirements
- BigQuery data source tables must be no larger than 100 GB.
- You must use a multi-regional BigQuery dataset in the US or EU locations.
- If the table is in a different project, you must provide the BigQuery Data Editor role to the AutoML Tables service account in that project. Learn more.
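The size and location limits above can be validated before you submit a job. A minimal sketch; the helper name is illustrative, and the byte count and dataset location would in practice come from the BigQuery API (for example, Table.num_bytes and Dataset.location):

```python
# Check a BigQuery source table against the documented batch-prediction
# limits: at most 100 GB, in a multi-regional US or EU dataset.

MAX_TABLE_BYTES = 100 * 1024**3  # 100 GB

def table_is_valid_source(num_bytes: int, dataset_location: str) -> bool:
    # Both conditions must hold for the table to be usable as a data source.
    return num_bytes <= MAX_TABLE_BYTES and dataset_location.upper() in ("US", "EU")
```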
Requesting the batch prediction
Console
Go to the AutoML Tables page in the Google Cloud console.
Select Models and open the model that you want to use.
Select the Test & Use tab.
Click Batch prediction.
For Input dataset, select Table from BigQuery and provide the project, dataset, and table IDs for your data source.
For Result, select BigQuery project and provide the project ID for your results destination.
If you want to see how each feature impacted the prediction, select Generate feature importance.
Generating feature importance increases the time and compute resources required for your prediction. Local feature importance is not available with a results destination of Cloud Storage.
Click Send batch prediction to request the batch prediction.
REST
You request batch predictions by using the
models.batchPredict
method.
Before using any of the request data, make the following replacements:
- endpoint: automl.googleapis.com for the global location, and eu-automl.googleapis.com for the EU region.
- project-id: your Google Cloud project ID.
- location: the location for the resource: us-central1 for Global or eu for the European Union.
- model-id: the ID of the model. For example, TBL543.
- dataset-id: the ID of the BigQuery dataset where the prediction data is located.
- table-id: the ID of the BigQuery table where the prediction data is located.
AutoML Tables writes the prediction results to a new dataset named prediction_<model_name>_<timestamp> in the project you specify as the results destination.
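The endpoint and URL substitutions above can be captured in a small helper. A sketch under the mapping described in this section; the function names are illustrative, not part of the API:

```python
# Map the resource location to the regional API endpoint, then assemble
# the batchPredict URL from the replacement values listed above.

def automl_endpoint(location: str) -> str:
    # eu resources use the EU regional endpoint; everything else is global.
    return "eu-automl.googleapis.com" if location == "eu" else "automl.googleapis.com"

def batch_predict_url(project_id: str, location: str, model_id: str) -> str:
    return (
        f"https://{automl_endpoint(location)}/v1beta1/projects/{project_id}"
        f"/locations/{location}/models/{model_id}:batchPredict"
    )
```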
HTTP method and URL:
POST https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict
Request JSON body:
{
  "inputConfig": {
    "bigquerySource": {
      "inputUri": "bq://project-id.dataset-id.table-id"
    }
  },
  "outputConfig": {
    "bigqueryDestination": {
      "outputUri": "bq://project-id"
    }
  }
}
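The same body can be assembled programmatically. A sketch assuming only the replacement values above; the helper name is illustrative:

```python
import json

def bigquery_batch_predict_body(project_id: str, dataset_id: str, table_id: str) -> str:
    # inputConfig/outputConfig mirror the REST request body shown above.
    body = {
        "inputConfig": {
            "bigquerySource": {
                "inputUri": f"bq://{project_id}.{dataset_id}.{table_id}"
            }
        },
        "outputConfig": {
            "bigqueryDestination": {"outputUri": f"bq://{project_id}"}
        },
    }
    return json.dumps(body, indent=2)
```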
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict" | Select-Object -Expand Content
You can get local feature importance by adding the feature_importance
parameter to the request data. For more information, see
Local feature importance.
Java
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Node.js
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Python
The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
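As a rough sketch of what the Python path can look like, assuming the v1beta1 TablesClient described in the Client reference; the function name and the uri arguments are placeholders, and the import is deferred so the helper can be defined without the library installed:

```python
def batch_predict_bigquery(project_id, region, model_display_name,
                           bq_input_uri, bq_output_uri):
    """Start a batch prediction job against a BigQuery table and wait for it."""
    # Deferred import: requires the google-cloud-automl package.
    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=region)
    response = client.batch_predict(
        bigquery_input_uri=bq_input_uri,
        bigquery_output_uri=bq_output_uri,
        model_display_name=model_display_name,
    )
    # batch_predict returns a long-running operation; .result() blocks
    # until the prediction job completes.
    response.result()
```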
Using CSV files in Cloud Storage
The names of the columns and data types of your input data must match the data you used in your training data. The columns can be in a different order than the training data.
CSV file requirements
- The first line of the data source must contain the names of the columns.
- Each data source file must not be larger than 10 GB.
- You can include multiple files, up to a maximum total of 100 GB.
- The Cloud Storage bucket must conform to the bucket requirements.
- If the Cloud Storage bucket is in a different project than where you use AutoML Tables, you must provide the Storage Object Creator role to the AutoML Tables service account in that project. Learn more.
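The header-row requirement can be checked locally before uploading. A sketch using only the standard library; the helper name is illustrative, and per the data source note earlier, the file must contain at least the training columns while extra columns are allowed:

```python
import csv
import io

def header_has_columns(csv_text: str, required_columns) -> bool:
    """Return True if the CSV's first line names at least the required
    columns (order may differ; extra columns are permitted)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return set(required_columns).issubset(header)
```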
Console
Go to the AutoML Tables page in the Google Cloud console.
Select Models and open the model that you want to use.
Select the Test & Use tab.
Click Batch prediction.
For Input dataset, select CSVs from Cloud Storage and provide the bucket URI for your data source.
For Result, select Cloud Storage bucket and provide the bucket URI for your destination bucket.
If you want to see how each feature impacted the prediction, select Generate feature importance.
Generating feature importance increases the time and compute resources required for your prediction. Local feature importance is not available with a results destination of Cloud Storage.
Click Send batch prediction to request the batch prediction.
REST
You request batch predictions by using the
models.batchPredict
method.
Before using any of the request data, make the following replacements:
- endpoint: automl.googleapis.com for the global location, and eu-automl.googleapis.com for the EU region.
- project-id: your Google Cloud project ID.
- location: the location for the resource: us-central1 for Global or eu for the European Union.
- model-id: the ID of the model. For example, TBL543.
- input-bucket-name: the name of the Cloud Storage bucket where the prediction data is located.
- input-directory-name: the name of the Cloud Storage directory where the prediction data is located.
- object-name: the name of the Cloud Storage object where the prediction data is located.
- output-bucket-name: the name of the Cloud Storage bucket for the prediction results.
- output-directory-name: the name of the Cloud Storage directory for the prediction results.
AutoML Tables creates a subfolder for the prediction results named prediction-<model_name>-<timestamp> in gs://output-bucket-name/output-directory-name. You must have write permissions to this path.
HTTP method and URL:
POST https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict
Request JSON body:
{
  "inputConfig": {
    "gcsSource": {
      "inputUris": [
        "gs://input-bucket-name/input-directory-name/object-name.csv"
      ]
    }
  },
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "gs://output-bucket-name/output-directory-name"
    }
  }
}
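This body, too, can be built from the replacement values. A sketch; the helper name is illustrative, and inputUris is modeled as a list because multiple CSV files are allowed:

```python
import json

def gcs_batch_predict_body(input_uris, output_prefix: str) -> str:
    # Mirrors the gcsSource/gcsDestination request body shown above.
    body = {
        "inputConfig": {"gcsSource": {"inputUris": list(input_uris)}},
        "outputConfig": {"gcsDestination": {"outputUriPrefix": output_prefix}},
    }
    return json.dumps(body, indent=2)
```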
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict" | Select-Object -Expand Content
You can get local feature importance by adding the feature_importance
parameter to the request data. For more information, see
Local feature importance.
Java
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Node.js
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Python
The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Retrieving your results
Retrieving prediction results in BigQuery
If you specified BigQuery as your output destination, the results of your batch prediction request are returned as a new dataset in the BigQuery project you specified. The BigQuery dataset is the name of your model prepended with "prediction_" and appended with the timestamp of when the prediction job started. You can find the BigQuery dataset name in Recent predictions on the Batch prediction page of the Test & Use tab for your model.
The BigQuery dataset contains two tables: predictions and errors. The errors table has a row for every row in your prediction request for which AutoML Tables could not return a prediction (for example, if a non-nullable feature was null). The predictions table contains a row for every prediction returned.

In the predictions table, AutoML Tables returns your prediction data, and creates a new column for the prediction results by prepending "predicted_" onto your target column name. The prediction results column contains a nested BigQuery structure holding the prediction output.
To retrieve the prediction results, you can use a query in the BigQuery console. The format of the query depends on your model type.
Binary classification:
SELECT
  predicted_<target-column-name>[OFFSET(0)].tables AS value_1,
  predicted_<target-column-name>[OFFSET(1)].tables AS value_2
FROM <bq-dataset-name>.predictions
"value_1" and "value_2" are placeholders; you can replace them with the target values or equivalent names.
Multi-class classification:
SELECT
  predicted_<target-column-name>[OFFSET(0)].tables AS value_1,
  predicted_<target-column-name>[OFFSET(1)].tables AS value_2,
  predicted_<target-column-name>[OFFSET(2)].tables AS value_3,
  ...
  predicted_<target-column-name>[OFFSET(4)].tables AS value_5
FROM <bq-dataset-name>.predictions
"value_1", "value_2", and so on are placeholders; you can replace them with the target values or equivalent names.
Regression:
SELECT
  predicted_<target-column-name>[OFFSET(0)].tables.value,
  predicted_<target-column-name>[OFFSET(0)].tables.prediction_interval.start,
  predicted_<target-column-name>[OFFSET(0)].tables.prediction_interval.end
FROM <bq-dataset-name>.predictions
Retrieving results in Cloud Storage
If you specified Cloud Storage as your output destination, the results of your batch prediction request are returned as CSV files in a new folder in the bucket you specified. The name of the folder is the name of your model, prepended with "prediction-" and appended with the timestamp of when the prediction job started. You can find the Cloud Storage folder name in Recent predictions at the bottom of the Batch prediction page of the Test & Use tab for your model.
The Cloud Storage folder contains two types of files: error files and prediction files. If the results are large, additional files are created.
The error files are named errors_1.csv
, errors_2.csv
, and so on. They
contain a header row, and a row for every row in your prediction request for
which AutoML Tables could not return a prediction.
The prediction files are named tables_1.csv
, tables_2.csv
, and
so on. They contain a header row with the column names, and a row
for every prediction returned.
In the prediction files, AutoML Tables returns your prediction data, and creates one or more new columns for the prediction results, depending on your model type:
Classification:
For each potential value of your target column, a column named
<target-column-name>_<value>_score
is added to the results. This column
contains the score, or confidence estimate, for that value.
Regression:
The predicted value for that row is returned in a column named
predicted_<target-column-name>
. The prediction interval is not returned for
CSV output.
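For classification results in CSV files, the per-value score columns described above can be scanned to recover the highest-scoring class. A sketch; the column-name pattern follows the description above, and the helper name is illustrative:

```python
def top_class(row: dict, target_column: str):
    """Given one CSV row as a dict of column name -> string value, return
    (value, score) for the highest-scoring <target>_<value>_score column."""
    prefix = f"{target_column}_"
    suffix = "_score"
    best_value, best_score = None, float("-inf")
    for name, raw in row.items():
        if name.startswith(prefix) and name.endswith(suffix):
            value = name[len(prefix):-len(suffix)]
            score = float(raw)
            if score > best_score:
                best_value, best_score = value, score
    return best_value, best_score
```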
Local feature importance is not available for results in Cloud Storage.
Interpreting your results
How you interpret your results depends on the business problem you are solving and how your data is distributed.
Interpreting your results for classification models
Prediction results for classification models (binary and multi-class) return a probability score for each potential value of the target column. You must determine how you want to use the scores. For example, to get a binary classification from the provided scores, you would identify a threshold value. If there are two classes, "A" and "B", you should classify the example as "A" if the score for "A" is greater than the chosen threshold, and "B" otherwise. For imbalanced datasets, the threshold might approach 100% or 0%.
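The binary thresholding described above can be expressed directly. A sketch; the class names "A" and "B" and the default threshold are placeholders to be replaced with your own values:

```python
def classify_binary(score_a: float, threshold: float = 0.5) -> str:
    # Classify as "A" when the score for "A" clears the chosen threshold;
    # otherwise "B". For imbalanced datasets the best threshold may sit
    # near 0 or 1 rather than 0.5.
    return "A" if score_a > threshold else "B"
```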
You can use the precision recall curve chart, receiver operator curve chart, and other relevant per-label statistics on the Evaluate page for your model in the Google Cloud console to see how changing the threshold changes your evaluation metrics. This can help you determine the best way to use the score values to interpret your prediction results.
Interpreting your results for regression models
For regression models, an expected value is returned, and for many problems, you can use that value directly. You can also use the prediction interval, if it is returned, and if a range makes sense for your business problem.
Interpreting your local feature importance results
For information about interpreting your local feature importance results, see Local feature importance.
What's next
- Learn more about local feature importance.
- Learn more about long-running operations.