Exporting a BigQuery ML model for online prediction

This tutorial shows how to export a BigQuery ML model and then deploy the model either on AI Platform or on a local machine. You will use the iris table from the BigQuery public datasets and work through the following three end-to-end scenarios:

  • Train and deploy a logistic regression model. This scenario also applies to DNN classifier, DNN regressor, k-means, linear regression, and matrix factorization models.
  • Train and deploy a Boosted Tree classifier model. This scenario also applies to Boosted Tree regressor models.
  • Train and deploy an AutoML classifier model. This scenario also applies to AutoML regressor models.

Costs

This tutorial uses billable components of Google Cloud, including:

  • BigQuery ML
  • Cloud Storage
  • AI Platform (optional, used for online prediction)

For more information about BigQuery ML costs, see the BigQuery ML pricing page.

For more information about Cloud Storage costs, see the Cloud Storage pricing page.

For more information about AI Platform costs, see the prediction nodes and resource allocation page.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, go to Enable the BigQuery API.

    Enable the API

  5. Enable the AI Platform Training and Prediction API and the Compute Engine API.

    Enable the APIs

  6. Install the Google Cloud SDK and the gcloud command-line tool.

Create your dataset

The first step is to create a BigQuery dataset to store your ML model. To create your dataset:

  1. In the Google Cloud Console, go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. In the navigation panel, under the Resources section, click your project name.

  3. On the right side, in the details panel, click Create dataset.

  4. On the Create dataset page:

    • For Dataset ID, enter bqml_tutorial.
    • For Data location, choose United States (US). Currently, the public datasets are stored in the US multiregional location. For simplicity, place your dataset in the same location.

  5. Leave all of the other default settings in place and click Create dataset.

Train and deploy a logistic regression model

Train the model

Train a logistic regression model that predicts iris type using the BigQuery ML CREATE MODEL statement. This training job should take approximately 1 minute to complete.

bq query --use_legacy_sql=false \
  'CREATE MODEL `bqml_tutorial.iris_model`
  OPTIONS (model_type="logistic_reg",
      max_iterations=10, input_label_cols=["species"])
  AS SELECT
    *
  FROM
    `bigquery-public-data.ml_datasets.iris`;'

Export the model

Export the model to a Cloud Storage bucket using the bq command-line tool. For additional ways to export models, see Exporting BigQuery ML models. This extract job should take less than 1 minute to complete.

bq extract -m bqml_tutorial.iris_model gs://some/gcs/path/iris_model

Local deployment and serving

You can deploy exported TensorFlow models using the TensorFlow Serving Docker container. The following steps require you to install Docker.

Download the exported model files to a temporary directory

mkdir tmp_dir
gsutil cp -r gs://some/gcs/path/iris_model tmp_dir

Create a version subdirectory

This step sets a version number (1 in this case) for the model.

mkdir -p serving_dir/iris_model/1
cp -r tmp_dir/iris_model/* serving_dir/iris_model/1
rm -r tmp_dir
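
The same layout step can be sketched in Python. This is a minimal sketch, not part of the exported model or of TensorFlow Serving: the function name stage_model_version and its parameters are hypothetical, and it assumes Python 3.8 or later. It copies the exported files into a numbered version subdirectory under the model directory, which is the layout the commands above produce.

```python
import shutil
from pathlib import Path


def stage_model_version(export_dir, serving_root, model_name, version=1):
    """Copy exported model files into <serving_root>/<model_name>/<version>/.

    Hypothetical helper: mirrors the mkdir -p / cp -r commands in this step.
    """
    source = Path(export_dir)
    target = Path(serving_root) / model_name / str(version)
    # Create serving_root/model_name first, like `mkdir -p`.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy the whole exported tree into the version directory.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example call, assuming the files were downloaded to tmp_dir/iris_model:
# stage_model_version("tmp_dir/iris_model", "serving_dir", "iris_model", 1)
```

Unlike the shell commands, this sketch leaves the download directory in place; remove it separately if you no longer need it.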

Pull the Docker image

docker pull tensorflow/serving

Run the Docker container

docker run -p 8501:8501 --mount type=bind,source="$(pwd)/serving_dir/iris_model",target=/models/iris_model -e MODEL_NAME=iris_model -t tensorflow/serving &

Run the prediction

curl -d '{"instances": [{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}]}' -X POST http://localhost:8501/v1/models/iris_model:predict
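
The curl call above can also be issued from Python using only the standard library. The following is a minimal sketch: the endpoint is copied from the curl command, while the names PREDICT_URL, build_predict_request, and predict are hypothetical. Building the request is separated from sending it, so the payload can be inspected without a running container.

```python
import json
import urllib.request

# Assumed endpoint: same host, port, and model name as the curl command above.
PREDICT_URL = "http://localhost:8501/v1/models/iris_model:predict"


def build_predict_request(instances, url=PREDICT_URL):
    """Build a POST request whose JSON body is {"instances": [...]}."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, url=PREDICT_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(instances, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, assuming the container from the previous step is running:
# predict([{"sepal_length": 5.0, "sepal_width": 2.0,
#           "petal_length": 3.5, "petal_width": 1.0}])
```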

Online deployment and serving

This section uses the gcloud command-line tool to deploy and run predictions against the exported model.

For more details about deploying a model to AI Platform for online/batch predictions, see Deploying models.

Create a model resource

MODEL_NAME="IRIS_MODEL"
gcloud ai-platform models create $MODEL_NAME

Create a model version

1) Set the environment variables

MODEL_DIR="gs://some/gcs/path/iris_model"
# Select a suitable version for this model
VERSION_NAME="v1"
FRAMEWORK="TENSORFLOW"

2) Create the version

gcloud ai-platform versions create $VERSION_NAME --model=$MODEL_NAME --origin=$MODEL_DIR --runtime-version=1.15 --framework=$FRAMEWORK

This step might take a few minutes to complete. You should see the message Creating version (this might take a few minutes).......

3) (optional) Get information about your new version

gcloud ai-platform versions describe $VERSION_NAME --model $MODEL_NAME

You should see output similar to this:

createTime: '2020-02-28T16:30:45Z'
deploymentUri: gs://your_bucket_name
framework: TENSORFLOW
machineType: mls1-c1-m2
name: projects/[YOUR-PROJECT-ID]/models/IRIS_MODEL/versions/v1
pythonVersion: '2.7'
runtimeVersion: '1.15'
state: READY
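
If you want to check the version state from a script, the describe output can be read programmatically. The sketch below is only an illustration under narrow assumptions: it handles flat "key: value" lines like the sample output above and skips list items and nested values, and the function names are hypothetical. A real YAML parser would be more robust.

```python
def parse_describe_output(text):
    """Parse flat 'key: value' lines; list items and nested values are skipped."""
    fields = {}
    for line in text.splitlines():
        # Skip indented lines, list items, and keys that have no inline value.
        if line.startswith((" ", "-")) or ": " not in line:
            continue
        key, _, value = line.partition(": ")
        # Drop the surrounding quotes that appear around some values.
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


def is_ready(text):
    """Return True when the parsed output reports state READY."""
    return parse_describe_output(text).get("state") == "READY"
```

The gcloud tool also accepts a --format flag (for example, --format=json), which is likely a sturdier input for scripts than parsing the default output.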

Online prediction

For more details about running online predictions against a deployed model, see https://cloud.google.com/ai-platform/prediction/docs/online-predict#requesting_predictions

1) Create a newline-delimited JSON file for inputs. For example, create an instances.json file with the following content:

{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}

2) Set up environment variables for predict

INPUT_DATA_FILE="instances.json"

3) Run predict

gcloud ai-platform predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_DATA_FILE
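
The input file from step 1 holds one JSON object per line, with no enclosing array and no commas between lines. As a minimal sketch of producing that format from Python (the helper names write_instances and read_instances are hypothetical):

```python
import json


def write_instances(path, instances):
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")


def read_instances(path):
    """Read the file back, parsing each non-empty line as its own JSON object."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example call that recreates the instances.json file from step 1:
# write_instances("instances.json", [
#     {"sepal_length": 5.0, "sepal_width": 2.0, "petal_length": 3.5, "petal_width": 1.0},
#     {"sepal_length": 5.3, "sepal_width": 3.7, "petal_length": 1.5, "petal_width": 0.2},
# ])
```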

Train and deploy a Boosted Tree classifier model

Train the model

Train a Boosted Tree classifier model that predicts iris type using the BigQuery ML CREATE MODEL statement. This training job should take approximately 7 minutes to complete.

bq query --use_legacy_sql=false \
  'CREATE MODEL `bqml_tutorial.boosted_tree_iris_model`
  OPTIONS (model_type="boosted_tree_classifier",
      max_iterations=10, input_label_cols=["species"])
  AS SELECT
    *
  FROM
    `bigquery-public-data.ml_datasets.iris`;'

Export the model

Export the model to a Cloud Storage bucket using the bq command-line tool. For additional ways to export models, see Exporting BigQuery ML models.

bq extract --destination_format ML_XGBOOST_BOOSTER -m bqml_tutorial.boosted_tree_iris_model gs://some/gcs/path/boosted_tree_iris_model

Local deployment and serving

The exported files include a main.py file that you can use to run the model locally.

Download the exported model files to a local directory

mkdir serving_dir
gsutil cp -r gs://some/gcs/path/boosted_tree_iris_model serving_dir

Extract predictor.py

tar -xvf serving_dir/boosted_tree_iris_model/xgboost_predictor-0.1.tar.gz -C serving_dir/boosted_tree_iris_model/

Install XGBoost library

Install the XGBoost library, version 0.82 or later.

Run the prediction

cd serving_dir/boosted_tree_iris_model/
python main.py '[{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}]'
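
When the instances come from a program rather than being typed by hand, the JSON argument can be built with json.dumps and passed as a single argv entry, which sidesteps shell quoting. This is a minimal sketch: build_main_py_command is a hypothetical helper, and it assumes only what the command above shows, namely that main.py takes a JSON list as its one argument.

```python
import json
import sys


def build_main_py_command(instances, script="main.py"):
    """Return an argv list equivalent to: python main.py '<json list>'."""
    return [sys.executable, script, json.dumps(instances)]


# Example call, assuming the files were extracted as in the previous steps:
# import subprocess
# subprocess.run(
#     build_main_py_command([{"sepal_length": 5.0, "sepal_width": 2.0,
#                             "petal_length": 3.5, "petal_width": 1.0}]),
#     cwd="serving_dir/boosted_tree_iris_model",
#     check=True,
# )
```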

Online deployment and serving

This section uses the gcloud command-line tool to deploy and run predictions against the exported model.

For more details about deploying a model to AI Platform for online/batch predictions using custom routines, see Deploying models.

Create a model resource

MODEL_NAME="BOOSTED_TREE_IRIS_MODEL"
gcloud ai-platform models create $MODEL_NAME

Create a model version

1) Set the environment variables

MODEL_DIR="gs://some/gcs/path/boosted_tree_iris_model"
VERSION_NAME="v1"

2) Create the version

gcloud beta ai-platform versions create $VERSION_NAME --model=$MODEL_NAME --origin=$MODEL_DIR --package-uris=${MODEL_DIR}/xgboost_predictor-0.1.tar.gz --prediction-class=predictor.Predictor --runtime-version=1.15

This step might take a few minutes to complete. You should see the message Creating version (this might take a few minutes).......

3) (optional) Get information about your new version

gcloud ai-platform versions describe $VERSION_NAME --model $MODEL_NAME

You should see output similar to this:

createTime: '2020-02-07T00:35:42Z'
deploymentUri: gs://some/gcs/path/boosted_tree_iris_model
etag: rp090ebEnQk=
machineType: mls1-c1-m2
name: projects/[YOUR-PROJECT-ID]/models/BOOSTED_TREE_IRIS_MODEL/versions/v1
packageUris:
- gs://some/gcs/path/boosted_tree_iris_model/xgboost_predictor-0.1.tar.gz
predictionClass: predictor.Predictor
pythonVersion: '2.7'
runtimeVersion: '1.15'
state: READY

Online prediction

For more details about running online predictions against a deployed model, see Requesting predictions.

1) Create a newline-delimited JSON file for inputs. For example, create an instances.json file with the following content:

{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}

2) Set up environment variables for predict

INPUT_DATA_FILE="instances.json"

3) Run predict

gcloud ai-platform predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_DATA_FILE

Train and deploy an AutoML classifier model

Train the model

Train an AutoML classifier model that predicts iris type using the BigQuery ML CREATE MODEL statement. AutoML models need at least 1000 rows of input data. Because ml_datasets.iris only has 150 rows, we duplicate the data 10 times. This training job should take around 2 hours to complete.

bq query --use_legacy_sql=false \
  'CREATE MODEL `bqml_tutorial.automl_iris_model`
  OPTIONS (model_type="automl_classifier",
      budget_hours=1, input_label_cols=["species"])
  AS SELECT
    * EXCEPT(multiplier)
  FROM
    `bigquery-public-data.ml_datasets.iris`, unnest(GENERATE_ARRAY(1, 10)) as multiplier;'
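
The row arithmetic behind the cross join can be checked quickly. This is a small sketch that uses only the figures stated above (a 1000-row minimum and a 150-row table); the constant and function names are hypothetical.

```python
import math

MIN_ROWS = 1000  # minimum row count stated in this tutorial
IRIS_ROWS = 150  # row count of ml_datasets.iris stated in this tutorial


def min_multiplier(rows, min_rows=MIN_ROWS):
    """Smallest whole number of copies that reaches min_rows."""
    return math.ceil(min_rows / rows)


def rows_after_cross_join(rows, multiplier):
    """Cross joining with GENERATE_ARRAY(1, multiplier) yields multiplier copies of each row."""
    return rows * multiplier


print(min_multiplier(IRIS_ROWS))             # 7
print(rows_after_cross_join(IRIS_ROWS, 10))  # 1500
```

Seven copies would already satisfy the minimum; the query above uses ten, which gives 1500 rows.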

Export the model

Export the model to a Cloud Storage bucket using the bq command-line tool. For additional ways to export models, see Exporting BigQuery ML models.

bq extract -m bqml_tutorial.automl_iris_model gs://some/gcs/path/automl_iris_model

Local deployment and serving

For details about building AutoML containers, see Exporting models. The following steps require you to install Docker.

Copy exported model files to a local directory

mkdir automl_serving_dir
gsutil cp -r gs://some/gcs/path/automl_iris_model/* automl_serving_dir/

Pull AutoML Docker image

docker pull gcr.io/cloud-automl-tables-public/model_server

Start Docker container

docker run -v `pwd`/automl_serving_dir:/models/default/0000001 -p 8080:8080 -it gcr.io/cloud-automl-tables-public/model_server

Run the prediction

1) Create a JSON file for inputs. For example, create an input.json file with the following contents:

{"instances": [{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0},
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}]}

2) Make the predict call

curl -X POST --data @input.json http://localhost:8080/predict
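
Note that this request body is a single JSON object with an "instances" list, whereas the gcloud sections earlier use one JSON object per line. If you already have instances in the per-line form, a conversion could look like the following minimal sketch (wrap_instances is a hypothetical helper):

```python
import json


def wrap_instances(ndjson_text):
    """Turn newline-delimited JSON instances into one {"instances": [...]} object."""
    instances = [
        json.loads(line) for line in ndjson_text.splitlines() if line.strip()
    ]
    return {"instances": instances}


# Example call that writes input.json from an existing instances.json file:
# with open("instances.json", encoding="utf-8") as src:
#     payload = wrap_instances(src.read())
# with open("input.json", "w", encoding="utf-8") as dst:
#     json.dump(payload, dst)
```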

Online deployment and serving

Online prediction for AutoML regressor and AutoML classifier models is not supported in AI Platform.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  • You can delete the project you created.
  • Or you can keep the project and delete the dataset and Cloud Storage bucket.

Stopping Docker container

1) List all running Docker containers.

docker ps

2) Stop the container with the applicable container ID from the container list.

docker stop container_id

Deleting AI Platform resources

1) Delete the model version.

gcloud ai-platform versions delete $VERSION_NAME --model=$MODEL_NAME

2) Delete the model.

gcloud ai-platform models delete $MODEL_NAME

Deleting your dataset

Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:

  1. If necessary, open the BigQuery web UI.

    Go to the BigQuery web UI

  2. In the navigation panel, click the bqml_tutorial dataset you created.

  3. Click Delete dataset on the right side of the window. This action deletes the dataset and everything in it, including the models you trained.

  4. In the Delete dataset dialog box, confirm the delete command by typing the name of your dataset (bqml_tutorial) and then click Delete.

Deleting your Cloud Storage bucket

Deleting your project removes all Cloud Storage buckets in the project. If you prefer to reuse the project, you can delete the bucket you created in this tutorial:

  1. Open the Cloud Storage browser in the Google Cloud Console.

    Open the Cloud Storage browser

  2. Select the checkbox of the bucket you want to delete.

  3. Click Delete.

  4. In the overlay window that appears, confirm you want to delete the bucket and its contents by clicking Delete.

Deleting your project

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete.

  3. In the dialog, type the project ID and then click Shut down to delete the project.

What's next