Exporting models

Overview

This page shows you how to export BigQuery ML models. You can export BigQuery ML models to Cloud Storage, and use them for online prediction, or edit them in Python. You can export a BigQuery ML model by:

  • Using the Cloud Console.
  • Using the bq extract command in the bq command-line tool.
  • Submitting an extract job through the API or client libraries.

You can export the following model types:

  • AUTOML_CLASSIFIER
  • AUTOML_REGRESSOR
  • BOOSTED_TREE_CLASSIFIER
  • BOOSTED_TREE_REGRESSOR
  • DNN_CLASSIFIER
  • DNN_REGRESSOR
  • KMEANS
  • LINEAR_REG
  • LOGISTIC_REG
  • MATRIX_FACTORIZATION
  • TENSORFLOW (imported TensorFlow models)
  • XGBOOST (imported XGBoost models)

Export model formats and samples

The following table shows the export destination formats for each BigQuery ML model type and provides a sample of files that get written in the Cloud Storage bucket.

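+-------------------------+-------------------------------------------+------------------------------------+
| Model type              | Export model format                       | Exported files sample              |
+-------------------------+-------------------------------------------+------------------------------------+
| AUTOML_CLASSIFIER       | TensorFlow SavedModel (TF 2.1.0)          | gcs_bucket/                        |
| AUTOML_REGRESSOR        |                                           |   assets/                          |
+-------------------------+-------------------------------------------+     f1.txt                         |
| DNN_CLASSIFIER          | TensorFlow SavedModel (TF 1.15 or higher) |     f2.txt                         |
| DNN_REGRESSOR           |                                           |   saved_model.pb                   |
| KMEANS                  |                                           |   variables/                       |
| LINEAR_REG              |                                           |     variables.data-00-of-01        |
| LOGISTIC_REG            |                                           |     variables.index                |
| MATRIX_FACTORIZATION    |                                           |                                    |
+-------------------------+-------------------------------------------+------------------------------------+
| BOOSTED_TREE_CLASSIFIER | Booster (XGBoost 0.82)                    | gcs_bucket/                        |
| BOOSTED_TREE_REGRESSOR  |                                           |   assets/                          |
|                         |                                           |     0.txt                          |
|                         |                                           |     1.txt                          |
|                         |                                           |     model_metadata.json            |
|                         |                                           |   main.py                          |
|                         |                                           |   model.bst                        |
|                         |                                           |   xgboost_predictor-0.1.tar.gz     |
|                         |                                           |     ....                           |
|                         |                                           |     predictor.py                   |
|                         |                                           |     ....                           |
|                         |                                           |                                    |
|                         |                                           | main.py is for local run. See      |
|                         |                                           | Model deployment for more details. |
+-------------------------+-------------------------------------------+------------------------------------+
| TENSORFLOW (imported)   | TensorFlow SavedModel                     | Exactly the same files that were   |
|                         |                                           | present when importing the model   |
+-------------------------+-------------------------------------------+------------------------------------+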

Limitations

  • Model export is not supported if any of the following features were used during training:

    • ARRAY, TIMESTAMP, or GEOGRAPHY feature types were present in the input data.
    • BigQuery ML Transform clause was used for feature engineering.
  • Exported models for model types AUTOML_REGRESSOR and AUTOML_CLASSIFIER do not support AI Platform deployment for online prediction.

  • The model size limit is 1 GB for matrix factorization model export. The model size is roughly proportional to num_factors, so you can reduce num_factors during training to shrink the model size if you reach the limit.
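
Because the exported size is described as roughly proportional to num_factors, a rough estimate of a value that fits under the limit can be sketched as follows. This is only an illustration of that proportionality: the function name and the example sizes are hypothetical, and treating "1 GB" as 1024**3 bytes is an assumption.

```python
import math

# Assumption: "1 GB" is read as 2**30 bytes; the limit's exact definition
# (decimal or binary) is not stated in this page.
EXPORT_LIMIT_BYTES = 1024 ** 3


def max_num_factors(current_num_factors: int, current_size_bytes: int,
                    limit_bytes: int = EXPORT_LIMIT_BYTES) -> int:
    """Estimate the largest num_factors expected to fit under the limit.

    Uses only the rough proportionality size ~ num_factors, so the result
    is a starting point for retraining, not a guarantee.
    """
    if current_num_factors <= 0 or current_size_bytes <= 0:
        raise ValueError("num_factors and size must be positive")
    if current_size_bytes <= limit_bytes:
        return current_num_factors  # already within the limit
    bytes_per_factor = current_size_bytes / current_num_factors
    return max(1, math.floor(limit_bytes / bytes_per_factor))


# Hypothetical example: a model trained with 64 factors exports at 2 GiB.
print(max_num_factors(64, 2 * 1024 ** 3))
```

Since the relationship is only approximate, it is safer to pick a value somewhat below the estimate.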

Exporting BigQuery ML models

To export a model:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the navigation panel, in the Resources section, expand your project and click your dataset to expand it. Find and click the model that you're exporting.

  3. On the right side of the window, click Export Model.

  4. In the Export model to Cloud Storage dialog:

    • For Select Cloud Storage location, browse for the bucket or folder location where you want to export the model.
    • Click Export to export the model.

To check on the progress of the job, look near the top of the navigation for Job history for an Export job.

bq

Use the bq extract command with the --model flag.

  • (Optional) Supply the --destination_format flag to choose the format of the exported model.
  • (Optional) Supply the --location flag and set the value to your location.

bq --location=location extract \
--destination_format format \
--model project_id:dataset.model \
gs://bucket/model_folder

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • destination_format is the format for the exported model: ML_TF_SAVED_MODEL (default), or ML_XGBOOST_BOOSTER.
  • project_id is your project ID.
  • dataset is the name of the source dataset.
  • model is the model you're exporting.
  • bucket is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
  • model_folder is the name of the folder where the exported model files will be written.

Examples:

The following command exports mydataset.mymodel in TensorFlow SavedModel format to the mymodel_folder folder in a Cloud Storage bucket named example-bucket.

bq extract --model \
'mydataset.mymodel' \
gs://example-bucket/mymodel_folder

The default value of destination_format is ML_TF_SAVED_MODEL.

The following command exports mydataset.mymodel in XGBoost Booster format to the mymodel_folder folder in a Cloud Storage bucket named example-bucket.

bq extract --model \
--destination_format ML_XGBOOST_BOOSTER \
'mydataset.mymodel' \
gs://example-bucket/mymodel_folder
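
If you script exports, the command line above can be assembled programmatically. The following is a minimal sketch that only builds the argument list for bq extract using the flags described in this section; the helper name build_bq_extract_command is hypothetical, and actually running the command requires the bq tool and valid credentials.

```python
import shlex
from typing import List, Optional


def build_bq_extract_command(model: str, destination_uri: str,
                             destination_format: Optional[str] = None,
                             location: Optional[str] = None) -> List[str]:
    """Return the argv list for a `bq extract --model` invocation."""
    if not destination_uri.startswith("gs://"):
        raise ValueError("destination must be a gs:// URI")
    cmd = ["bq"]
    if location:
        # --location is a global flag, so it goes before the subcommand.
        cmd.append(f"--location={location}")
    cmd += ["extract", "--model"]
    if destination_format:
        cmd += ["--destination_format", destination_format]
    cmd += [model, destination_uri]
    return cmd


cmd = build_bq_extract_command(
    "mydataset.mymodel",
    "gs://example-bucket/mymodel_folder",
    destination_format="ML_XGBOOST_BOOSTER",
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when dataset or folder names contain unusual characters.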

API

To export a model, create an extract job and populate the job configuration.

(Optional) Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery ML model and the Cloud Storage destination.

  2. Specify the source model by using the sourceModel configuration object that contains the project ID, dataset ID, and model ID.

  3. Specify the destination by using the destinationUris property. Each URI must be fully qualified, in the format gs://bucket/model_folder.

  4. Specify the destination format by setting the configuration.extract.destinationFormat property. For example, to export a Boosted Tree model, set this property to the value ML_XGBOOST_BOOSTER.

  5. To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.

    • If status.state = DONE, the job has finished. Check status.errorResult to determine whether it succeeded.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's status.errors property.
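
The status checks in step 5 can be sketched as a small helper that classifies a jobs.get response. This is an illustrative sketch, not part of any client library: it assumes the response has already been parsed into a Python dict, and the function name and the three result labels are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def summarize_job_status(job: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Classify a parsed jobs.get response.

    Returns one of 'in_progress', 'failed', or 'succeeded', together with
    the fatal error message or the list of non-fatal error messages.
    """
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return "in_progress", []
    error_result = status.get("errorResult")
    if error_result:
        # errorResult present: the job finished but the request failed.
        return "failed", [error_result.get("message", "")]
    # No errorResult: success, possibly with non-fatal errors.
    warnings = [e.get("message", "") for e in status.get("errors", [])]
    return "succeeded", warnings


print(summarize_job_status({"status": {"state": "RUNNING"}}))
```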

API notes:

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
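
As an illustration of the steps and notes above, the request body for jobs.insert might be assembled as follows. This is a sketch that only builds the JSON payload; sending it requires an authenticated HTTP client, which is omitted. The helper name and the "export_model_" job ID prefix are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, Optional


def build_extract_job_body(project_id: str, dataset_id: str, model_id: str,
                           destination_uri: str,
                           destination_format: Optional[str] = None,
                           location: Optional[str] = None,
                           job_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a jobs.insert request body for exporting a model."""
    job_reference = {
        "projectId": project_id,
        # A client-generated unique ID makes retries and polling safe.
        "jobId": job_id or f"export_model_{uuid.uuid4().hex}",
    }
    if location:
        job_reference["location"] = location
    extract: Dict[str, Any] = {
        "sourceModel": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "modelId": model_id,
        },
        "destinationUris": [destination_uri],
    }
    if destination_format:
        extract["destinationFormat"] = destination_format
    return {"jobReference": job_reference,
            "configuration": {"extract": extract}}


body = build_extract_job_body(
    "my-project", "mydataset", "mymodel",
    "gs://example-bucket/mymodel_folder",
    destination_format="ML_XGBOOST_BOOSTER",
)
print(json.dumps(body, indent=2))
```

Keep the generated jobId so that a retry after a network failure reuses the same ID instead of creating a second job.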

Model deployment

You can deploy the exported model to Google Cloud AI Platform as well as locally.

AI Platform deployment

The deployment method depends on the export model format:

  • TensorFlow SavedModel (non-AutoML models): Deploy a TensorFlow SavedModel (1.15 runtime version or higher).
  • TensorFlow SavedModel (AutoML models): Not supported.
  • XGBoost Booster: Custom prediction routine (1.15 runtime version).

Note: Since there is preprocessing and postprocessing information saved in the exported files, you must use a Custom prediction routine to deploy the model with the extra exported files.

Local deployment

The local deployment method depends on the export model format:

  • TensorFlow SavedModel (non-AutoML models): SavedModel is a standard format, so you can deploy the model in a TensorFlow Serving Docker container. You can also use the local run of AI Platform online prediction.
  • TensorFlow SavedModel (AutoML models): Run the AutoML container.
  • XGBoost Booster: To run XGBoost Booster models locally, use the exported main.py file:
    1. Download all of the files from Cloud Storage to a local directory.
    2. Extract the predictor.py file from xgboost_predictor-0.1.tar.gz to the local directory.
    3. Run main.py (see the instructions in main.py).
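
Step 2 of the XGBoost Booster procedure can be done with Python's standard tarfile module. The sketch below is one possible approach: because the internal layout of the archive is not spelled out here, it searches for any member named predictor.py rather than assuming a path. The function name is hypothetical.

```python
import os
import tarfile


def extract_predictor(archive_path: str, dest_dir: str = ".") -> str:
    """Copy predictor.py out of the archive into dest_dir; return its path."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Match on the file name only, wherever it sits in the archive.
            if member.isfile() and os.path.basename(member.name) == "predictor.py":
                source = archive.extractfile(member)
                target = os.path.join(dest_dir, "predictor.py")
                with source, open(target, "wb") as out:
                    out.write(source.read())
                return target
    raise FileNotFoundError("predictor.py not found in " + archive_path)


# Usage, after downloading the exported files to the current directory:
# extract_predictor("xgboost_predictor-0.1.tar.gz")
```

Writing the single file directly, instead of extracting the whole archive, keeps predictor.py next to main.py regardless of the directory structure inside the tarball.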

Prediction output format

This section provides the prediction output format of the exported models for each model type. All exported models support batch prediction; they can handle multiple input rows at a time. For example, there are two input rows in each of the following output format examples.

AUTOML_CLASSIFIER

Prediction output format (first table) and output sample (second table):
+------------------------------------------+
| predictions                              |
+------------------------------------------+
| [{"scores":[FLOAT], "classes":[STRING]}] |
+------------------------------------------+
        
+---------------------------------------------+
| predictions                                 |
+---------------------------------------------+
| [{"scores":[1, 2], "classes":['a', 'b']},   |
|  {"scores":[3, 0.2], "classes":['a', 'b']}] |
+---------------------------------------------+
        

AUTOML_REGRESSOR

Prediction output format (first table) and output sample (second table):
+-----------------+
| predictions     |
+-----------------+
| [FLOAT]         |
+-----------------+
        
+-----------------+
| predictions     |
+-----------------+
| [1.8, 2.46]     |
+-----------------+
        

BOOSTED_TREE_CLASSIFIER

Prediction output format (first table) and output sample (second table):
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [FLOAT]     | [STRING]     | STRING          |
+-------------+--------------+-----------------+
        
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [0.1, 0.9]  | ['a', 'b']   | ['b']           |
+-------------+--------------+-----------------+
| [0.8, 0.2]  | ['a', 'b']   | ['a']           |
+-------------+--------------+-----------------+
        

BOOSTED_TREE_REGRESSOR

Prediction output format (first table) and output sample (second table):
+-----------------+
| predicted_label |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| predicted_label |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
        

DNN_CLASSIFIER

Prediction output format (first table) and output sample (second table):
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [INT64]       | [STRING]    | INT64     | STRING  | FLOAT                  | [FLOAT]| [FLOAT]       |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.36]                 | [-0.53]| [0.64, 0.36]  |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.2]                  | [-1.38]| [0.8, 0.2]    |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        

DNN_REGRESSOR

Prediction output format (first table) and output sample (second table):
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
        

KMEANS

Prediction output format (first table) and output sample (second table):
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [FLOAT]            | [INT64]      | INT64               |
+--------------------+--------------+---------------------+
        
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [1.2, 1.3]         | [1, 2]       | [1]                 |
+--------------------+--------------+---------------------+
| [0.4, 0.1]         | [1, 2]       | [2]                 |
+--------------------+--------------+---------------------+
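
In the sample rows above, NEAREST_CENTROID_ID is the entry of CENTROID_IDS whose CENTROID_DISTANCES value is smallest. The following sketch recomputes it from the other two columns, which can serve as a sanity check on a response. It assumes the two arrays are aligned by position, which the sample suggests but this page does not state explicitly; the function name is hypothetical.

```python
from typing import Sequence


def nearest_centroid_id(distances: Sequence[float], ids: Sequence[int]) -> int:
    """Return the centroid ID paired with the smallest distance."""
    if not distances or len(distances) != len(ids):
        raise ValueError("distances and ids must be non-empty and aligned")
    best = min(range(len(distances)), key=distances.__getitem__)
    return ids[best]


# The two rows from the output sample above.
rows = [([1.2, 1.3], [1, 2]), ([0.4, 0.1], [1, 2])]
print([nearest_centroid_id(d, i) for d, i in rows])  # [1, 2]
```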
        

LINEAR_REG

Prediction output format (first table) and output sample (second table):
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
       

LOGISTIC_REG

Prediction output format (first table) and output sample (second table):
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [FLOAT]     | [STRING]     | STRING          |
+-------------+--------------+-----------------+
        
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [0.1, 0.9]  | ['a', 'b']   | ['b']           |
+-------------+--------------+-----------------+
| [0.8, 0.2]  | ['a', 'b']   | ['a']           |
+-------------+--------------+-----------------+
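
When consuming this output, it is often convenient to pair each label with its probability. The sketch below does that for the sample rows and picks the most probable label, which matches PREDICTED_LABEL in the sample. It assumes LABEL_PROBS and LABEL_VALUES are aligned by position; the function names are hypothetical.

```python
from typing import Dict, Sequence


def label_probabilities(label_probs: Sequence[float],
                        label_values: Sequence[str]) -> Dict[str, float]:
    """Map each label value to its probability."""
    if len(label_probs) != len(label_values):
        raise ValueError("label_probs and label_values must be aligned")
    return dict(zip(label_values, label_probs))


def most_likely_label(label_probs: Sequence[float],
                      label_values: Sequence[str]) -> str:
    """Return the label with the highest probability."""
    by_label = label_probabilities(label_probs, label_values)
    return max(by_label, key=by_label.get)


# The two rows from the output sample above.
rows = [([0.1, 0.9], ["a", "b"]), ([0.8, 0.2], ["a", "b"])]
for probs, values in rows:
    print(label_probabilities(probs, values), most_likely_label(probs, values))
```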
        

MATRIX_FACTORIZATION

Note: Currently, the exported model only supports taking a user as input and returning the top 50 (predicted_rating, predicted_item) pairs, sorted by predicted_rating in descending order.

Prediction output format (first table) and output sample (second table):
+------------------+----------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [FLOAT]          | [STRING]       |
+------------------+----------------+
        
+------------------+----------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [5.5, 1.7]       | ['A', 'B']     |
+------------------+----------------+
| [7.2, 2.7]       | ['B', 'A']     |
+------------------+----------------+
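
To work with individual recommendations, the two parallel arrays can be zipped into (rating, item) pairs. This is a small sketch under the assumption that PREDICTED_RATING and PREDICTED_ITEM are aligned by position; the function name is hypothetical, and the sort is only a defensive step because the output is already described as sorted.

```python
from typing import List, Sequence, Tuple


def top_recommendations(predicted_rating: Sequence[float],
                        predicted_item: Sequence[str],
                        k: int = 50) -> List[Tuple[float, str]]:
    """Return up to k (rating, item) pairs, highest rating first."""
    if len(predicted_rating) != len(predicted_item):
        raise ValueError("ratings and items must be aligned")
    pairs = sorted(zip(predicted_rating, predicted_item),
                   key=lambda pair: pair[0], reverse=True)
    return pairs[:k]


# First row of the output sample above, keeping only the best item.
print(top_recommendations([5.5, 1.7], ["A", "B"], k=1))  # [(5.5, 'A')]
```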
        

TENSORFLOW (imported)

Prediction output format
Same as the imported model

XGBoost model visualization

You can visualize the boosted trees using the plot_tree Python API after model export. For example, you can leverage Colab without installing the dependencies:

  1. Export the boosted tree model to a Cloud Storage bucket.
  2. Download the model.bst file from the Cloud Storage bucket.
  3. In a Colab notebook, upload the model.bst file to Files.
  4. Run the following code in the notebook:

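    import xgboost as xgb
    import matplotlib.pyplot as plt

    # Load the exported booster file.
    model = xgb.Booster(model_file="model.bst")
    # Replace <iteration_number> with the number of trees to plot.
    num_iterations = <iteration_number>
    for tree_num in range(num_iterations):
      # Draw one tree per figure.
      xgb.plot_tree(model, num_trees=tree_num)
    plt.show()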
    

This example plots multiple trees (one tree per iteration).

Currently, we don't save feature names in the model, so you will see names such as "f0", "f1", and so on. You can find the corresponding feature names in the assets/model_metadata.json exported file using these names (such as "f0") as indexes.
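
The index lookup described above can be sketched as follows. The layout of model_metadata.json is not documented on this page, so this sketch deliberately starts from a plain list of feature names and leaves loading that list from the JSON file to you; the function names and the sample feature names are hypothetical.

```python
import re
from typing import Dict, Sequence


def feature_index(placeholder: str) -> int:
    """Convert a placeholder such as 'f3' to the integer index 3."""
    match = re.fullmatch(r"f(\d+)", placeholder)
    if not match:
        raise ValueError(f"not a feature placeholder: {placeholder!r}")
    return int(match.group(1))


def resolve_feature_names(placeholders: Sequence[str],
                          feature_names: Sequence[str]) -> Dict[str, str]:
    """Map each 'fN' placeholder to the Nth entry of feature_names."""
    return {p: feature_names[feature_index(p)] for p in placeholders}


# Hypothetical names; in practice, read them from assets/model_metadata.json.
names = ["age", "income", "country"]
print(resolve_feature_names(["f0", "f2"], names))
```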

Required permissions

To export a BigQuery ML model to Cloud Storage, you need permissions to access the BigQuery ML model, permissions to run an export job, and permissions to write the data to the Cloud Storage bucket.

BigQuery permissions

  • At a minimum, to export a model, you must be granted bigquery.models.export permissions. The following predefined IAM roles are granted bigquery.models.export permissions:

    • bigquery.dataViewer
    • bigquery.dataOwner
    • bigquery.dataEditor
    • bigquery.admin
  • At a minimum, to run an export job, you must be granted bigquery.jobs.create permissions. The following predefined IAM roles are granted bigquery.jobs.create permissions:

    • bigquery.user
    • bigquery.jobUser
    • bigquery.admin

Cloud Storage permissions

  • To write the data to an existing Cloud Storage bucket, you must be granted storage.objects.create permissions. The following predefined IAM roles are granted storage.objects.create permissions:

    • storage.objectCreator
    • storage.objectAdmin
    • storage.admin

For more information on IAM roles and permissions in BigQuery ML, see Access control. For more information on dataset-level roles, see Primitive roles for datasets in the BigQuery documentation.

Location considerations

When you choose a location for your data, consider the following:

  • Colocate your Cloud Storage buckets for exporting data.
    • When you export data, the regional or multi-regional Cloud Storage bucket must be in the same location as the BigQuery ML dataset. For example, if your BigQuery ML dataset is in the EU multi-regional location, the Cloud Storage bucket containing the data you're exporting must be in a regional or multi-regional location in the EU.
    • If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • Exception: If your dataset is in the US multi-regional location, you can export data into a Cloud Storage bucket in any regional or multi-regional location.
  • Develop a data management plan.

For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.

Moving BigQuery data between locations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset.

Quota policy

For information on export job quotas, see Export jobs on the Quotas and limits page.

Pricing

There is no charge for exporting BigQuery ML models, but exports are subject to BigQuery's Quotas and limits. For more information on BigQuery pricing, see the Pricing page.

After the data is exported, you are charged for storing the data in Cloud Storage. For more information on Cloud Storage pricing, see the Cloud Storage Pricing page.
