Export models

Overview

This page shows you how to export BigQuery ML models. You can export BigQuery ML models to Cloud Storage, and use them for online prediction, or edit them in Python. You can export a BigQuery ML model by:

You can export the following model types:

  • AUTOENCODER
  • AUTOML_CLASSIFIER
  • AUTOML_REGRESSOR
  • BOOSTED_TREE_CLASSIFIER
  • BOOSTED_TREE_REGRESSOR
  • DNN_CLASSIFIER
  • DNN_REGRESSOR
  • DNN_LINEAR_COMBINED_CLASSIFIER
  • DNN_LINEAR_COMBINED_REGRESSOR
  • KMEANS
  • LINEAR_REG
  • LOGISTIC_REG
  • MATRIX_FACTORIZATION
  • RANDOM_FOREST_CLASSIFIER
  • RANDOM_FOREST_REGRESSOR
  • TENSORFLOW (imported TensorFlow models)
  • PCA
  • TRANSFORM_ONLY

Export model formats and samples

The following table shows the export destination formats for each BigQuery ML model type and provides a sample of files that get written in the Cloud Storage bucket.

Model type Export model format Exported files sample
AUTOML_CLASSIFIER TensorFlow SavedModel (TF 2.1.0) gcs_bucket/
  assets/
    f1.txt
    f2.txt
  saved_model.pb
  variables/
    variables.data-00-of-01
    variables.index
AUTOML_REGRESSOR
AUTOENCODER TensorFlow SavedModel (TF 1.15 or higher)
DNN_CLASSIFIER
DNN_REGRESSOR
DNN_LINEAR_COMBINED_CLASSIFIER
DNN_LINEAR_COMBINED_REGRESSOR
KMEANS
LINEAR_REGRESSOR
LOGISTIC_REG
MATRIX_FACTORIZATION
PCA
TRANSFORM_ONLY
BOOSTED_TREE_CLASSIFIER Booster (XGBoost 0.82) gcs_bucket/
  assets/
    0.txt
    1.txt
    model_metadata.json
  main.py
  model.bst
  xgboost_predictor-0.1.tar.gz
    ....
     predictor.py
    ....


main.py is for local run. See Model deployment for more details.
BOOSTED_TREE_REGRESSOR
RANDOM_FOREST_REGRESSOR
RANDOM_FOREST_REGRESSOR
TENSORFLOW (imported) TensorFlow SavedModel Exactly the same files that were present when importing the model

Export model trained with TRANSFORM

If the model is trained with the TRANSFORM clause, then an additional preprocessing model performs the same logic in the TRANSFORM clause and is saved in the TensorFlow SavedModel format under the subdirectory transform. You can deploy a model trained with the TRANSFORM clause to AI Platform as well as locally. For more information, see model deployment.

Export model format Exported files sample
Prediction model: TensorFlow SavedModel or Booster (XGBoost 0.82).
Preprocessing model for TRANSFORM clause: TensorFlow SavedModel (TF 2.5 or higher)
gcs_bucket/
  ....(model files)
  transform/
    assets/
        f1.txt/
        f2.txt/
    saved_model.pb
    variables/
        variables.data-00-of-01
        variables.index

The model doesn't contain the information about the feature engineering performed outside the TRANSFORM clause during training. For example, anything in the SELECT statement . So you would need to manually convert the input data before feeding into the preprocessing model.

Supported data types

When exporting models trained with the TRANSFORM clause, the following data types are supported for feeding into the TRANSFORM clause.

TRANSFORM input type TRANSFORM input samples Exported preprocessing model input samples
INT64 10,
11
tf.constant(
  [10, 11],
  dtype=tf.int64)
NUMERIC NUMERIC 10,
NUMERIC 11
tf.constant(
  [10, 11],
  dtype=tf.float64)
BIGNUMERIC BIGNUMERIC 10,
BIGNUMERIC 11
tf.constant(
  [10, 11],
  dtype=tf.float64)
FLOAT64 10.0,
11.0
tf.constant(
  [10, 11],
  dtype=tf.float64)
BOOL TRUE,
FALSE
tf.constant(
  [True, False],
  dtype=tf.bool)
STRING 'abc',
'def'
tf.constant(
  ['abc', 'def'],
  dtype=tf.string)
BYTES b'abc',
b'def'
tf.constant(
  ['abc', 'def'],
  dtype=tf.string)
DATE DATE '2020-09-27',
DATE '2020-09-28'
tf.constant(
  [
    '2020-09-27',
    '2020-09-28'
  ],
  dtype=tf.string)

"%F" format
DATETIME DATETIME '2023-02-02 02:02:01.152903',
DATETIME '2023-02-03 02:02:01.152903'
tf.constant(
  [
    '2023-02-02 02:02:01.152903',
    '2023-02-03 02:02:01.152903'
  ],
  dtype=tf.string)

"%F %H:%M:%E6S" format
TIME TIME '16:32:36.152903',
TIME '17:32:36.152903'
tf.constant(
  [
    '16:32:36.152903',
    '17:32:36.152903'
  ],
  dtype=tf.string)

"%H:%M:%E6S" format
TIMESTAMP TIMESTAMP '2017-02-28 12:30:30.45-08',
TIMESTAMP '2018-02-28 12:30:30.45-08'
tf.constant(
  [
    '2017-02-28 20:30:30.4 +0000',
    '2018-02-28 20:30:30.4 +0000'
  ],
  dtype=tf.string)

"%F %H:%M:%E1S %z" format
ARRAY ['a', 'b'],
['c', 'd']
tf.constant(
  [['a', 'b'], ['c', 'd']],
  dtype=tf.string)
ARRAY< STRUCT< INT64, FLOAT64>> [(1, 1.0), (2, 1.0)],
[(2, 1.0), (3, 1.0)]
tf.sparse.from_dense(
  tf.constant(
    [
      [0, 1.0, 1.0, 0],
      [0, 0, 1.0, 1.0]
    ],
    dtype=tf.float64))
NULL NULL,
NULL
tf.constant(
  [123456789.0e10, 123456789.0e10],
  dtype=tf.float64)

tf.constant(
  [1234567890000000000, 1234567890000000000],
  dtype=tf.int64)

tf.constant(
  [' __MISSING__ ', ' __MISSING__ '],
  dtype=tf.string)

Supported SQL functions

When exporting models trained with the TRANSFORM clause, you can use the following SQL functions inside the TRANSFORM clause .

  • Operators
    • +, -, *, /, =, <, >, <=, >=, !=, <>, [NOT] BETWEEN, [NOT] IN, IS [NOT] NULL, IS [NOT] TRUE, IS [NOT] FALSE, NOT, AND, OR.
  • Conditional expressions
    • CASE expr, CASE, COALESCE, IF, IFNULL, NULLIF.
  • Mathematical functions
    • ABS, ACOS, ACOSH, ASINH, ATAN, ATAN2, ATANH, CBRT, CEIL, CEILING, COS, COSH, COT, COTH, CSC, CSCH, EXP, FLOOR, IS_INF, IS_NAN, LN, LOG, LOG10, MOD, POW, POWER, SEC, SECH, SIGN, SIN, SINH, SQRT, TAN, TANH.
  • Conversion functions
    • CAST AS INT64, CAST AS FLOAT64, CAST AS NUMERIC, CAST AS BIGNUMERIC, CAST AS STRING, SAFE_CAST AS INT64, SAFE_CAST AS FLOAT64
  • String functions
    • CONCAT, LEFT, LENGTH, LOWER, REGEXP_REPLACE, RIGHT, SPLIT, SUBSTR, SUBSTRING, TRIM, UPPER.
  • Date functions
    • Date, DATE_ADD, DATE_SUB, DATE_DIFF, DATE_TRUNC, EXTRACT, FORMAT_DATE, PARSE_DATE, SAFE.PARSE_DATE.
  • Datetime functions
    • DATETIME, DATETIME_ADD, DATETIME_SUB, DATETIME_DIFF, DATETIME_TRUNC, EXTRACT, PARSE_DATETIME, SAFE.PARSE_DATETIME.
  • Time functions
    • TIME, TIME_ADD, TIME_SUB, TIME_DIFF, TIME_TRUNC, EXTRACT, FORMAT_TIME, PARSE_TIME, SAFE.PARSE_TIME.
  • Timestamp functions
    • TIMESTAMP, TIMESTAMP_ADD, TIMESTAMP_SUB, TIMESTAMP_DIFF, TIMESTAMP_TRUNC, FORMAT_TIMESTAMP, PARSE_TIMESTAMP, SAFE.PARSE_TIMESTAMP, TIMESTAMP_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_SECONDS, EXTRACT, STRING, UNIX_MICROS, UNIX_MILLIS, UNIX_SECONDS.
  • Manual preprocessing functions
    • ML.IMPUTER, ML.HASH_BUCKETIZE, ML.LABEL_ENCODER, ML.MULTI_HOT_ENCODER, ML.NGRAMS, ML.ONE_HOT_ENCODER, ML.BUCKETIZE, ML.MAX_ABS_SCALER, ML.MIN_MAX_SCALER, ML.NORMALIZER, ML.QUANTILE_BUCKETIZE, ML.ROBUST_SCALER, ML.STANDARD_SCALER.

Limitations

The following limitations apply when exporting models:

  • Model export is not supported if any of the following features were used during training:

    • ARRAY, TIMESTAMP, or GEOGRAPHY feature types were present in the input data.
  • Exported models for model types AUTOML_REGRESSOR and AUTOML_CLASSIFIER do not support AI Platform deployment for online prediction.

  • The model size limit is 1 GB for matrix factorization model export. The model size is roughly proportional to num_factors, so you can reduce num_factors during training to shrink the model size if you reach the limit.

  • For models trained with the BigQuery ML TRANSFORM clause for manual feature preprocessing, see the data types and functions supported for exporting.

  • Models trained with the BigQuery ML TRANSFORM clause before 18 September 2023 must be re-trained before they can be deployed through Model Registry for online prediction.

  • During model export, ARRAY<STRUCT<INT64, FLOAT64>>, ARRAY and TIMESTAMP are supported as pre-transformed data, but are not supported as post-transformed data.

Export BigQuery ML models

To export a model:

Console

  1. Open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the navigation panel, in the Resources section, expand your project and click your dataset to expand it. Find and click the model that you're exporting.

  3. On the right side of the window, click Export Model.

    Export model

  4. In the Export model to Cloud Storage dialog:

    • For Select Cloud Storage location, browse for the bucket or folder location where you want to to export the model.
    • Click Export to export the model.

To check on the progress of the job, look near the top of the navigation for Job history for an Export job.

SQL

The EXPORT MODEL statement lets you export BigQuery ML models to Cloud Storage using GoogleSQL query syntax.

To export a BigQuery ML model in the Google Cloud console by using the EXPORT MODEL statement, follow these steps:

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. Click Compose new query.

  3. In the Query editor field, type your EXPORT MODEL statement.

    The following query exports a model named myproject.mydataset.mymodel to a Cloud Storage bucket with URI gs://bucket/path/to/saved_model/.

     EXPORT MODEL `myproject.mydataset.mymodel`
     OPTIONS(URI = 'gs://bucket/path/to/saved_model/')
     

  4. Click Run. When the query is complete, the following appears in the Query results pane: Successfully exported model.

bq

Use the bq extract command with the --model flag.

(Optional) Supply the --destination_format flag and pick the format of the model exported. (Optional) Supply the --location flag and set the value to your location.

bq --location=location extract \
--destination_format format \
--model project_id:dataset.model \
gs://bucket/model_folder

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • destination_format is the format for the exported model: ML_TF_SAVED_MODEL (default), or ML_XGBOOST_BOOSTER.
  • project_id is your project ID.
  • dataset is the name of the source dataset.
  • model is the model you're exporting.
  • bucket is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
  • model_folder is the name of the folder where the exported model files will be written.

Examples:

For example, the following command exports mydataset.mymodel in TensorFlow SavedModel format to a Cloud Storage bucket named mymodel_folder.

bq extract --model \
'mydataset.mymodel' \
gs://example-bucket/mymodel_folder

The default value of destination_format is ML_TF_SAVED_MODEL.

The following command exports mydataset.mymodel in XGBoost Booster format to a Cloud Storage bucket named mymodel_folder.

bq extract --model \
--destination_format ML_XGBOOST_BOOSTER \
'mydataset.mytable' \
gs://example-bucket/mymodel_folder

API

To export model, create an extract job and populate the job configuration.

(Optional) Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery ML model and the Cloud Storage destination.

  2. Specify the source model by using the sourceModel configuration object that contains the project ID, dataset ID, and model ID.

  3. The destination URI(s) property must be fully-qualified, in the format gs://bucket/model_folder.

  4. Specify the destination format by setting the configuration.extract.destinationFormat property. For example, to export a boosted tree model, set this property to the value ML_XGBOOST_BOOSTER.

  5. To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's status.errors property.

API notes:

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExtractJobConfiguration;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.ModelId;

// Sample to extract model to GCS bucket
public class ExtractModel {

  public static void main(String[] args) throws InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectName = "bigquery-public-data";
    String datasetName = "samples";
    String modelName = "model";
    String bucketName = "MY-BUCKET-NAME";
    String destinationUri = "gs://" + bucketName + "/path/to/file";
    extractModel(projectName, datasetName, modelName, destinationUri);
  }

  public static void extractModel(
      String projectName, String datasetName, String modelName, String destinationUri)
      throws InterruptedException {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      ModelId modelId = ModelId.of(projectName, datasetName, modelName);

      ExtractJobConfiguration extractConfig =
          ExtractJobConfiguration.newBuilder(modelId, destinationUri).build();

      Job job = bigquery.create(JobInfo.of(extractConfig));

      // Blocks until this job completes its execution, either failing or succeeding.
      Job completedJob = job.waitFor();
      if (completedJob == null) {
        System.out.println("Job not executed since it no longer exists.");
        return;
      } else if (completedJob.getStatus().getError() != null) {
        System.out.println(
            "BigQuery was unable to extract due to an error: \n" + job.getStatus().getError());
        return;
      }
      System.out.println("Model extract successful");
    } catch (BigQueryException ex) {
      System.out.println("Model extraction job was interrupted. \n" + ex.toString());
    }
  }
}

Model deployment

You can deploy the exported model to AI Platform as well as locally. If the model's TRANSFORM clause contains Date functions, Datetime functions, Time functions or Timestamp functions, you must use bigquery-ml-utils library in the container. The exception is if you are deploying through Model Registry, which does not need exported models or serving containers.

AI Platform deployment

Export model format Deployment
TensorFlow SavedModel (non-AutoML models) Deploy a TensorFlow SavedModel (1.15 runtime version or higher)
TensorFlow SavedModel (AutoML models) Not supported
XGBoost Booster Custom prediction routine (1.15 runtime version)

Note: Since there is preprocessing and postprocessing information saved in the exported files, you must use a Custom prediction routine to deploy the model with the extra exported files.

Local deployment

Export model format Deployment
TensorFlow SavedModel (non-AutoML models) SavedModel is a standard format, and you can deploy them in TensorFlow Serving docker container.

You can also leverage the local run of AI Platform online prediction.
TensorFlow SavedModel (AutoML models) Run the AutoML container.
XGBoost Booster To run XGBoost Booster models locally, you can use the exported main.py file:
  1. Download all of the files from Cloud Storage to the local directory.
  2. Unzip the predictor.py file from xgboost_predictor-0.1.tar.gz to the local directory.
  3. Run main.py (see instructions in main.py).

Prediction output format

This section provides the prediction output format of the exported models for each model type. All exported models support batch prediction; they can handle multiple input rows at a time. For example, there are two input rows in each of the following output format examples.

AUTOENCODER

Prediction output format Output sample
+------------------------+------------------------+------------------------+
|      LATENT_COL_1      |      LATENT_COL_2      |           ...          |
+------------------------+------------------------+------------------------+
|       [FLOAT]          |         [FLOAT]        |           ...          |
+------------------------+------------------------+------------------------+
        
+------------------+------------------+------------------+------------------+
|   LATENT_COL_1   |   LATENT_COL_2   |   LATENT_COL_3   |   LATENT_COL_4   |
+------------------------+------------+------------------+------------------+
|    0.21384512    |    0.93457112    |    0.64978097    |    0.00480489    |
+------------------+------------------+------------------+------------------+
        

AUTOML_CLASSIFIER

Prediction output format Output sample
+------------------------------------------+
| predictions                              |
+------------------------------------------+
| [{"scores":[FLOAT], "classes":[STRING]}] |
+------------------------------------------+
        
+---------------------------------------------+
| predictions                                 |
+---------------------------------------------+
| [{"scores":[1, 2], "classes":['a', 'b']},   |
|  {"scores":[3, 0.2], "classes":['a', 'b']}] |
+---------------------------------------------+
        

AUTOML_REGRESSOR

Prediction output format Output sample
+-----------------+
| predictions     |
+-----------------+
| [FLOAT]         |
+-----------------+
        
+-----------------+
| predictions     |
+-----------------+
| [1.8, 2.46]     |
+-----------------+
        

BOOSTED_TREE_CLASSIFIER and RANDOM_FOREST_CLASSIFIER

Prediction output format Output sample
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [FLOAT]     | [STRING]     | STRING          |
+-------------+--------------+-----------------+
        
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [0.1, 0.9]  | ['a', 'b']   | ['b']           |
+-------------+--------------+-----------------+
| [0.8, 0.2]  | ['a', 'b']   | ['a']           |
+-------------+--------------+-----------------+
        

BOOSTED_TREE_REGRESSOR AND RANDOM_FOREST_REGRESSOR

Prediction output format Output sample
+-----------------+
| predicted_label |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| predicted_label |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
        

DNN_CLASSIFIER

Prediction output format Output sample
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [INT64]       | [STRING]    | INT64     | STRING  | FLOAT                  | [FLOAT]| [FLOAT]       |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.36]                 | [-0.53]| [0.64, 0.36]  |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.2]                  | [-1.38]| [0.8, 0.2]    |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        

DNN_REGRESSOR

Prediction output format Output sample
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
        

DNN_LINEAR_COMBINED_CLASSIFIER

Prediction output format Output sample
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [INT64]       | [STRING]    | INT64     | STRING  | FLOAT                  | [FLOAT]| [FLOAT]       |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| ALL_CLASS_IDS | ALL_CLASSES | CLASS_IDS | CLASSES | LOGISTIC (binary only) | LOGITS | PROBABILITIES |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.36]                 | [-0.53]| [0.64, 0.36]  |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
| [0, 1]        | ['a', 'b']  | [0]       | ['a']   | [0.2]                  | [-1.38]| [0.8, 0.2]    |
+---------------+-------------+-----------+---------+------------------------+--------+---------------+
        

DNN_LINEAR_COMBINED_REGRESSOR

Prediction output format Output sample
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
        

KMEANS

Prediction output format Output sample
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [FLOAT]            | [INT64]      | INT64               |
+--------------------+--------------+---------------------+
        
+--------------------+--------------+---------------------+
| CENTROID_DISTANCES | CENTROID_IDS | NEAREST_CENTROID_ID |
+--------------------+--------------+---------------------+
| [1.2, 1.3]         | [1, 2]       | [1]                 |
+--------------------+--------------+---------------------+
| [0.4, 0.1]         | [1, 2]       | [2]                 |
+--------------------+--------------+---------------------+
        

LINEAR_REG

Prediction output format Output sample
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| FLOAT           |
+-----------------+
        
+-----------------+
| PREDICTED_LABEL |
+-----------------+
| [1.8]           |
+-----------------+
| [2.46]          |
+-----------------+
       

LOGISTIC_REG

Prediction output format Output sample
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [FLOAT]     | [STRING]     | STRING          |
+-------------+--------------+-----------------+
        
+-------------+--------------+-----------------+
| LABEL_PROBS | LABEL_VALUES | PREDICTED_LABEL |
+-------------+--------------+-----------------+
| [0.1, 0.9]  | ['a', 'b']   | ['b']           |
+-------------+--------------+-----------------+
| [0.8, 0.2]  | ['a', 'b']   | ['a']           |
+-------------+--------------+-----------------+
        

MATRIX_FACTORIZATION

Note: We currently only support taking an input user and output top 50 (predicted_rating, predicted_item) pairs sorted by predicted_rating in descending order.

Prediction output format Output sample
+--------------------+--------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [FLOAT]          | [STRING]       |
+------------------+----------------+
        
+--------------------+--------------+
| PREDICTED_RATING | PREDICTED_ITEM |
+------------------+----------------+
| [5.5, 1.7]       | ['A', 'B']     |
+------------------+----------------+
| [7.2, 2.7]       | ['B', 'A']     |
+------------------+----------------+
        

TENSORFLOW (imported)

Prediction output format
Same as the imported model

PCA

Prediction output format Output sample
+-------------------------+---------------------------------+
| PRINCIPAL_COMPONENT_IDS | PRINCIPAL_COMPONENT_PROJECTIONS |
+-------------------------+---------------------------------+
|       [INT64]           |             [FLOAT]             |
+-------------------------+---------------------------------+
        
+-------------------------+---------------------------------+
| PRINCIPAL_COMPONENT_IDS | PRINCIPAL_COMPONENT_PROJECTIONS |
+-------------------------+---------------------------------+
|       [1, 2]            |             [1.2, 5.0]          |
+-------------------------+---------------------------------+
        

TRANSFORM_ONLY

Prediction output format
Same as the columns specified in the model's TRANSFORM clause

XGBoost model visualization

You can visualize the boosted trees using the plot_tree Python API after model export. For example, you can leverage Colab without installing the dependencies:

  1. Export the boosted tree model to a Cloud Storage bucket.
  2. Download the model.bst file from the Cloud Storage bucket.
  3. In a Colab noteboook, upload the model.bst file to Files.
  4. Run the following code in the notebook:

    import xgboost as xgb
    import matplotlib.pyplot as plt
    
    model = xgb.Booster(model_file="model.bst")
    num_iterations = <iteration_number>
    for tree_num in range(num_iterations):
      xgb.plot_tree(model, num_trees=tree_num)
    plt.show
    

This example plots multiple trees (one tree per iteration):

Export model

Currently, we don't save feature names in the model, so you will see names such as "f0", "f1", and so on. You can find the corresponding feature names in the assets/model_metadata.json exported file using these names (such as "f0") as indexes.

Required permissions

To export a BigQuery ML model to Cloud Storage, you need permissions to access the BigQuery ML model, permissions to run an export job, and permissions to write the data to the Cloud Storage bucket.

BigQuery permissions

  • At a minimum, to export model, you must be granted bigquery.models.export permissions. The following predefined Identity and Access Management (IAM) roles are granted bigquery.models.export permissions:

    • bigquery.dataViewer
    • bigquery.dataOwner
    • bigquery.dataEditor
    • bigquery.admin
  • At a minimum, to run an export job, you must be granted bigquery.jobs.create permissions. The following predefined IAM roles are granted bigquery.jobs.create permissions:

    • bigquery.user
    • bigquery.jobUser
    • bigquery.admin

Cloud Storage permissions

  • To write the data to an existing Cloud Storage bucket, you must be granted storage.objects.create permissions. The following predefined IAM roles are granted storage.objects.create permissions:

    • storage.objectCreator
    • storage.objectAdmin
    • storage.admin

For more information on IAM roles and permissions in BigQuery ML, see Access control.

Location considerations

When you choose a location for your data, consider the following:

    Colocate your Cloud Storage buckets for exporting data:
    • If your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket containing the data that you export must be in the same multi-region or in a location that is contained within the multi-region. For example, if your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket can be located in the europe-west1 Belgium region, which is within the EU.

      If your dataset is in the US multi-region, you can export data into a Cloud Storage bucket in any location.

    • If your dataset is in a region, your Cloud Storage bucket must be in the same region. For example, if your dataset is in the asia-northeast1 Tokyo region, your Cloud Storage bucket cannot be in the ASIA multi-region.
    Develop a data management plan:

For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.

Move BigQuery data between locations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset.

Quota policy

For information on export job quotas, see Export jobs on the Quotas and limits page.

Pricing

There is no charge for exporting BigQuery ML models, but exports are subject to BigQuery's Quotas and limits. For more information on BigQuery pricing, see the Pricing page.

After the data is exported, you are charged for storing the data in Cloud Storage. For more information on Cloud Storage pricing, see the Cloud Storage Pricing page.

What's next