Exporting models

Overview

This page shows you how to export BigQuery ML models. You can export BigQuery ML models to Cloud Storage and use them for online prediction, or edit them in Python. You can export a BigQuery ML model by using the Cloud Console, the bq command-line tool, or the API.

You can export the following model types:

  • KMEANS
  • LINEAR_REG
  • LOGISTIC_REG
  • MATRIX_FACTORIZATION
  • TENSORFLOW (imported TensorFlow models)

Export model formats and samples

The following table shows the export destination formats for each BigQuery ML model type and provides a sample of files that get written in the Cloud Storage bucket.

Model type: KMEANS, LINEAR_REG, LOGISTIC_REG, MATRIX_FACTORIZATION
Export model format: TensorFlow SavedModel (TF 1.15 or higher)
Exported files sample:

gcs_bucket/
  assets/
    f1.txt
    f2.txt
  saved_model.pb
  variables/
    variables.data-00000-of-00001
    variables.index

Model type: TENSORFLOW (imported)
Export model format: TensorFlow SavedModel
Exported files sample: Exactly the same files that were present when the model was imported.

Limitations

  • Model export is not supported if any of the following features were used during training:
    • ARRAY, TIMESTAMP, or GEOGRAPHY feature types were present in the input data.
    • The BigQuery ML TRANSFORM clause was used for feature engineering.

Exporting BigQuery ML models

To export a model:

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the Cloud Console

  2. In the navigation panel, in the Resources section, expand your project and click your dataset to expand it. Find and click the model that you're exporting.

  3. On the right side of the window, click Export Model.

  4. In the Export model to Cloud Storage dialog:

    • For Select Cloud Storage location, browse for the bucket or folder location where you want to export the model.
    • Click Export to export the model.

To check on the progress of the job, look near the top of the navigation for Job history, and find the Export job.

CLI

Use the bq extract command with the --model flag.

(Optional) Supply the --location flag and set the value to your location.

bq --location=location extract \
--model project_id:dataset.model \
gs://bucket/model_folder

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • project_id is your project ID.
  • dataset is the name of the source dataset.
  • model is the model you're exporting.
  • bucket is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
  • model_folder is the name of the folder where the exported model files will be written.

Examples:

For example, the following command exports mydataset.mymodel in TensorFlow SavedModel format to a Cloud Storage bucket named mymodel_folder.

bq extract --model \
'mydataset.mymodel' \
gs://example-bucket/mymodel_folder

API

To export a model, create an extract job and populate the job configuration.

(Optional) Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery ML model and the Cloud Storage destination.

  2. Specify the source model by using the sourceModel configuration object that contains the project ID, dataset ID, and model ID.

  3. The destination URI(s) property must be fully-qualified, in the format gs://bucket/model_folder.

  4. To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's status.errors property.
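
The status checks in step 4 can be sketched as a small helper. This is illustrative only: the function name `check_extract_job` is made up, and the `job` dict mirrors the shape of a `jobs.get` response rather than coming from a real API call.

```python
def check_extract_job(job):
    """Interpret the status of a jobs.get response dict.

    Returns a (done, errors) tuple: `done` is True once status.state
    is DONE; `errors` holds the fatal errorResult if the job failed,
    or any non-fatal status.errors otherwise.
    """
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return False, []  # still PENDING or RUNNING; poll again
    if "errorResult" in status:
        # Fatal: the request failed.
        return True, [status["errorResult"]]
    # Success, possibly with non-fatal errors listed in status.errors.
    return True, status.get("errors", [])
```

A caller would poll `jobs.get(job_id)` and pass each response through this helper until `done` is true.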

API notes:

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
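
Putting these notes together, a `jobs.insert` request body for a model extract might be built as follows. This is a sketch under assumptions: the helper name is hypothetical, and the project, dataset, model, and bucket values are placeholders.

```python
import uuid


def build_model_extract_job(project_id, dataset_id, model_id, destination_uri):
    """Build a jobs.insert request body for a BigQuery ML model extract.

    A client-generated jobId makes retries idempotent: resending the
    same body after a network failure cannot start a second job.
    """
    return {
        "jobReference": {
            "projectId": project_id,
            "jobId": f"model_export_{uuid.uuid4().hex}",  # unique, client-chosen
        },
        "configuration": {
            "extract": {
                "sourceModel": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "modelId": model_id,
                },
                "destinationUris": [destination_uri],  # fully qualified gs:// URI
            }
        },
    }
```

The same body can then be resent verbatim on a retry, because the `jobId` is fixed at build time rather than assigned by the server.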

Model deployment

You can deploy the exported model to Google Cloud AI Platform as well as locally.

AI Platform deployment

Export model format Deployment
TensorFlow SavedModel Deploy a TensorFlow SavedModel (runtime version 1.15 or higher)

Local deployment

Export model format Deployment
TensorFlow SavedModel SavedModel is a standard format, and you can deploy it in a TensorFlow Serving Docker container.

You can also leverage the local run of AI Platform online prediction.

Prediction output format

This section provides the prediction output format of the exported models for each model type. All exported models support batch prediction; they can handle multiple input rows at a time. For example, there are two input rows in each of the following output format examples.
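
As a sketch of what such a batch request can look like when the exported SavedModel is served over a JSON API (for example, TensorFlow Serving's REST interface), two input rows are sent as two entries in an `instances` list. The feature names `f1` and `f2` are placeholders, not the model's actual inputs.

```python
import json

# Two input rows in one batch request; the served model returns one
# prediction row per instance, as in the output samples below.
request_body = {
    "instances": [
        {"f1": 1.0, "f2": "a"},  # row 1
        {"f1": 2.5, "f2": "b"},  # row 2
    ]
}

payload = json.dumps(request_body)
```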

KMEANS

Prediction output format Output sample
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [FLOAT]            | [INT64]      | INT64               |
+--------------------+--------------+---------------------+
        
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [1.2, 1.3]         | [1, 2]       | [1]                 |
+--------------------+--------------+---------------------+
| [0.4, 0.1]         | [1, 2]       | [2]                 |
+--------------------+--------------+---------------------+
        

LINEAR_REG

Prediction output format Output sample
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
       

LOGISTIC_REG

Prediction output format Output sample
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [FLOAT]     | [STRING]     | STRING          |
+-------------+--------------+-----------------+
        
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [0.1, 0.9]  | ['a', 'b']   | ['b']           |
+-------------+--------------+-----------------+
| [0.8, 0.2]  | ['a', 'b']   | ['a']           |
+-------------+--------------+-----------------+
        

MATRIX_FACTORIZATION

Note: Currently, the exported model only accepts a single input user and outputs the top 50 (predicted_rating, predicted_item) pairs, sorted by predicted_rating in descending order.

Prediction output format Output sample
+------------------+----------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [FLOAT]          | [STRING]       |
+------------------+----------------+
        
+------------------+----------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [5.5, 1.7]       | ['A', 'B']     |
+------------------+----------------+
| [7.2, 2.7]       | ['B', 'A']     |
+------------------+----------------+
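
The note above amounts to a top-k ranking over candidate items. A minimal sketch of that post-processing, with a made-up `ratings_by_item` mapping:

```python
def top_rated(ratings_by_item, k=50):
    """Return (predicted_rating, predicted_item) pairs sorted by
    rating in descending order, truncated to the top k."""
    pairs = sorted(ratings_by_item.items(), key=lambda kv: kv[1], reverse=True)
    return [(rating, item) for item, rating in pairs[:k]]
```

For example, `top_rated({"A": 5.5, "B": 1.7})` yields `[(5.5, "A"), (1.7, "B")]`, matching the shape of the first sample row.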
        

TENSORFLOW (imported)

Prediction output format
Same as the imported model

Required permissions

To export a BigQuery ML model to Cloud Storage, you need permissions to access the BigQuery ML model, permissions to run an export job, and permissions to write the data to the Cloud Storage bucket.

BigQuery permissions

  • At a minimum, to export a model, you must be granted bigquery.models.export permissions. The following predefined Cloud IAM roles are granted bigquery.models.export permissions:

    • bigquery.dataViewer
    • bigquery.dataOwner
    • bigquery.dataEditor
    • bigquery.admin
  • At a minimum, to run an export job, you must be granted bigquery.jobs.create permissions. The following predefined Cloud IAM roles are granted bigquery.jobs.create permissions:

    • bigquery.user
    • bigquery.jobUser
    • bigquery.admin

Cloud Storage permissions

  • To write the data to an existing Cloud Storage bucket, you must be granted storage.objects.create permissions. The following predefined Cloud IAM roles are granted storage.objects.create permissions:

    • storage.objectCreator
    • storage.objectAdmin
    • storage.admin

For more information on IAM roles and permissions in BigQuery ML, see Access control. For more information on dataset-level roles, see Primitive roles for datasets in the BigQuery documentation.

Location considerations

When you choose a location for your data, consider the following:

  • Colocate your Cloud Storage buckets for exporting data.
    • When you export data, the regional or multi-regional Cloud Storage bucket must be in the same location as the BigQuery ML dataset. For example, if your BigQuery ML dataset is in the EU multi-regional location, the Cloud Storage bucket containing the data you're exporting must be in a regional or multi-regional location in the EU.
    • If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • Exception: If your dataset is in the US multi-regional location, you can export data into a Cloud Storage bucket in any regional or multi-regional location.
  • Develop a data management plan.
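
The colocation rules above can be captured in a small check. This is a sketch only: it treats locations as plain strings, names the multi-regions "US" and "EU", and assumes EU regional locations all start with "europe-" (an illustrative heuristic, not an official API).

```python
MULTI_REGIONS = {"US", "EU"}


def export_location_ok(dataset_location, bucket_location):
    """Check whether a model export from a dataset to a bucket
    satisfies the colocation rules described above."""
    # Exception: a US multi-regional dataset can export to any
    # regional or multi-regional bucket.
    if dataset_location == "US":
        return True
    if dataset_location == "EU":
        # EU multi-regional dataset: the bucket must be in the EU,
        # either the EU multi-region or an EU region like europe-west1.
        return bucket_location == "EU" or bucket_location.startswith("europe-")
    # Regional dataset: the bucket must be a regional bucket in the
    # same location (e.g. Tokyo dataset -> asia-northeast1 bucket).
    return bucket_location == dataset_location
```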

For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.

Moving BigQuery data between locations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset.

Quota policy

For information on export job quotas, see Export jobs on the Quotas and limits page.

Pricing

There is no charge for exporting BigQuery ML models, but exports are subject to BigQuery's Quotas and limits. For more information on BigQuery pricing, see the Pricing page.

After the data is exported, you are charged for storing the data in Cloud Storage. For more information on Cloud Storage pricing, see the Cloud Storage Pricing page.

What's next