In this tutorial, you import TensorFlow models into a BigQuery ML dataset. Then, you use a SQL query to make predictions from the imported models.
Objectives
- Use the CREATE MODEL statement to import TensorFlow models into BigQuery ML.
- Use the ML.PREDICT function to make predictions with the imported TensorFlow models.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Ensure that the BigQuery API is enabled.
- Ensure that you have the necessary permissions to perform the tasks in this document.
Required roles
If you create a new project, you are the project owner, and you are granted all of the required Identity and Access Management (IAM) permissions that you need to complete this tutorial.
If you are using an existing project, the BigQuery Studio Admin (roles/bigquery.studioAdmin) role grants all of the permissions that are needed to complete this tutorial. Make sure that you have this role on the project.
Check for the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select the BigQuery Studio Admin role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
For more information about IAM permissions in BigQuery, see BigQuery permissions.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, click your project name.
- Click View actions > Create dataset.
- On the Create dataset page, do the following:
  - For Dataset ID, enter bqml_tutorial.
  - For Location type, select Multi-region, and then select US (multiple regions in United States).
    The public datasets are stored in the US multi-region. For simplicity, store your dataset in the same location.
  - Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.
- Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:
    bq --location=US mk -d \
      --description "BigQuery ML tutorial dataset." \
      bqml_tutorial
  Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.
- Confirm that the dataset was created:
    bq ls
API
Call the datasets.insert method with a defined dataset resource.
    {
      "datasetReference": {
        "datasetId": "bqml_tutorial"
      }
    }
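The dataset resource shown for the API does not set a location. As a rough sketch, the following Python builds a fuller request body and the REST URL for datasets.insert, assuming the US multi-region and the description used in the bq example. The helper name build_dataset_insert_request and the project ID my-project are illustrative placeholders, not part of the tutorial, and sending the authenticated request is left out.

```python
import json

# Base URL of the BigQuery REST API (v2).
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_dataset_insert_request(project_id, dataset_id="bqml_tutorial", location="US"):
    """Return (url, body) for a datasets.insert call. Hypothetical helper."""
    url = f"{BIGQUERY_API}/projects/{project_id}/datasets"
    body = {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        # Assumption: same location and description as the bq example.
        "location": location,
        "description": "BigQuery ML tutorial dataset.",
    }
    return url, body


url, body = build_dataset_insert_request("my-project")  # placeholder project ID
print(url)
print(json.dumps(body, indent=2))
```

The body would be sent as JSON in an authenticated POST to the printed URL.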
Import a TensorFlow model
The following steps show you how to import a model from Cloud Storage.
The path to the model is gs://cloud-training-demos/txtclass/export/exporter/1549825580/*. The imported model name is imported_tf_model.
Note that the Cloud Storage URI ends in a wildcard character (*). This character indicates that BigQuery ML should import any assets associated with the model.
The imported model is a TensorFlow text classifier model that predicts which website published a given article title.
To import the TensorFlow model into your dataset, follow these steps.
Console
- In the Google Cloud console, go to the BigQuery page.
- For Create new, click SQL query.
- In the query editor, enter this CREATE MODEL statement, and then click Run.
    CREATE OR REPLACE MODEL `bqml_tutorial.imported_tf_model`
      OPTIONS (
        MODEL_TYPE='TENSORFLOW',
        MODEL_PATH='gs://cloud-training-demos/txtclass/export/exporter/1549825580/*')
  When the operation is complete, you should see a message like Successfully created model named imported_tf_model.
- Your new model appears in the Resources panel. Models are indicated by the model icon.
- If you select the new model in the Resources panel, information about the model appears below the Query editor.
bq
- Import the TensorFlow model from Cloud Storage by entering the following CREATE MODEL statement. The backticks are escaped so that the shell does not treat them as command substitution inside the double-quoted string.
    bq query --use_legacy_sql=false \
      "CREATE OR REPLACE MODEL \`bqml_tutorial.imported_tf_model\`
       OPTIONS (
         MODEL_TYPE='TENSORFLOW',
         MODEL_PATH='gs://cloud-training-demos/txtclass/export/exporter/1549825580/*')"
- After you import the model, verify that the model appears in the dataset.
    bq ls bqml_tutorial
  The output is similar to the following:
    tableId             Type
    ------------------- -------
    imported_tf_model   MODEL
API
Insert a new job and populate the jobs#configuration.query property in the request body.
    {
      "query": "CREATE MODEL `PROJECT_ID.bqml_tutorial.imported_tf_model` OPTIONS(MODEL_TYPE='TENSORFLOW', MODEL_PATH='gs://cloud-training-demos/txtclass/export/exporter/1549825580/*')"
    }
Replace PROJECT_ID with your project ID.
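To make the shape of the full request clearer, here is a minimal Python sketch that wraps the CREATE MODEL statement in a jobs.insert body. It assumes the same dataset and model names as the rest of this tutorial, and it uses CREATE OR REPLACE MODEL as in the console and bq examples. The helper name build_import_job and the project ID my-project are placeholders introduced for illustration. Because CREATE MODEL is a GoogleSQL statement, the sketch sets useLegacySql to false explicitly rather than relying on the API default.

```python
import json

# Cloud Storage path of the exported TensorFlow model used in this tutorial.
MODEL_PATH = "gs://cloud-training-demos/txtclass/export/exporter/1549825580/*"


def build_import_job(project_id, dataset_id="bqml_tutorial", model_name="imported_tf_model"):
    """Return a jobs.insert request body that imports the model. Hypothetical helper."""
    sql = (
        f"CREATE OR REPLACE MODEL `{project_id}.{dataset_id}.{model_name}` "
        f"OPTIONS (MODEL_TYPE='TENSORFLOW', MODEL_PATH='{MODEL_PATH}')"
    )
    return {
        "configuration": {
            "query": {
                "query": sql,
                # CREATE MODEL is GoogleSQL, so turn legacy SQL off explicitly.
                "useLegacySql": False,
            }
        }
    }


job = build_import_job("my-project")  # placeholder project ID
print(json.dumps(job, indent=2))
```

The printed JSON is what would be posted to the jobs collection of the project; authentication and polling for job completion are not shown.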
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Import the model by using the TensorFlowModel object.
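The sample for this step is not reproduced here, so the following is only a sketch of what the import could look like, assuming the bigframes package is installed and Application Default Credentials are configured. The wrapper function import_tensorflow_model and the project ID my-project are placeholders; the bigframes imports sit inside the function so that defining it does not require the package or network access.

```python
# Cloud Storage path of the exported TensorFlow model used in this tutorial.
MODEL_PATH = "gs://cloud-training-demos/txtclass/export/exporter/1549825580/*"


def import_tensorflow_model(project_id, model_path=MODEL_PATH, location="US"):
    """Import a TensorFlow SavedModel with BigQuery DataFrames. Hypothetical wrapper.

    Calling this function needs bigframes, credentials, and network access.
    """
    import bigframes.pandas as bpd
    from bigframes.ml.imported import TensorFlowModel

    # Session options must be set before the first BigQuery DataFrames call.
    bpd.options.bigquery.project = project_id
    bpd.options.bigquery.location = location

    return TensorFlowModel(model_path=model_path)


# Usage (requires credentials; "my-project" is a placeholder):
# imported_tensorflow_model = import_tensorflow_model("my-project")
```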
For more information about importing TensorFlow models into BigQuery ML, including format and storage requirements, see the CREATE MODEL statement for importing TensorFlow models.
Make predictions with the imported TensorFlow model
After importing the TensorFlow model, you use the ML.PREDICT function to make predictions with the model.
The following query uses imported_tf_model to make predictions using input data from the full table in the public dataset hacker_news. In the query, the TensorFlow model's serving_input_fn function specifies that the model expects a single input string named input. The subquery assigns the alias input to the title column in the subquery's SELECT statement.
To make predictions with the imported TensorFlow model, follow these steps.
Console
- In the Google Cloud console, go to the BigQuery page.
- Under Create new, click SQL query.
- In the query editor, enter this query that uses the ML.PREDICT function.
    SELECT *
    FROM ML.PREDICT(
      MODEL `bqml_tutorial.imported_tf_model`,
      (
        SELECT title AS input
        FROM `bigquery-public-data.hacker_news.full`
      )
    )
The query results should look like this:
bq
Enter this command to run the query that uses ML.PREDICT.
    bq query \
      --use_legacy_sql=false \
      'SELECT *
       FROM ML.PREDICT(
         MODEL `bqml_tutorial.imported_tf_model`,
         (SELECT title AS input FROM `bigquery-public-data.hacker_news.full`))'
The results should look like this:
    +------------------------------------------------------------------------+----------------------------------------------------------------------------------+
    | dense_1                                                                | input                                                                            |
    +------------------------------------------------------------------------+----------------------------------------------------------------------------------+
    | ["0.6251608729362488","0.2989124357700348","0.07592673599720001"]      | How Red Hat Decides Which Open Source Companies t...                             |
    | ["0.014276246540248394","0.972910463809967","0.01281337533146143"]     | Ask HN: Toronto/GTA mastermind around side income for big corp. dev?             |
    | ["0.9821603298187256","1.8601855117594823E-5","0.01782100833952427"]   | Ask HN: What are good resources on strategy and decision making for your career? |
    | ["0.8611106276512146","0.06648492068052292","0.07240450382232666"]     | Forget about promises, use harvests                                              |
    +------------------------------------------------------------------------+----------------------------------------------------------------------------------+
API
Insert a new job and populate the jobs#configuration.query property in the request body. Replace project_id with your project ID.
    {
      "query": "SELECT * FROM ML.PREDICT(MODEL `project_id.bqml_tutorial.imported_tf_model`, (SELECT * FROM input_data))"
    }
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Use the predict function to run the imported model:
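The sample for this step is not reproduced here. As a hedged sketch, prediction with BigQuery DataFrames could look like the following, assuming model is a TensorFlowModel object created from the tutorial's Cloud Storage path. The function name predict_article_sources is a placeholder, and the bigframes import sits inside the function so that defining it does not require the package or credentials.

```python
def predict_article_sources(model, limit=5):
    """Run an imported TensorFlow model on Hacker News titles. Hypothetical wrapper.

    `model` is assumed to be a bigframes.ml.imported.TensorFlowModel instance.
    Calling this function needs bigframes, credentials, and network access.
    """
    import bigframes.pandas as bpd

    df = bpd.read_gbq("bigquery-public-data.hacker_news.full")
    # The model expects a single string input named "input", so keep only
    # the title column and rename it, mirroring SELECT title AS input.
    df_pred = df[["title"]].rename(columns={"title": "input"})
    predictions = model.predict(df_pred)
    return predictions.head(limit)


# Usage (requires credentials):
# from bigframes.ml.imported import TensorFlowModel
# model = TensorFlowModel(
#     model_path="gs://cloud-training-demos/txtclass/export/exporter/1549825580/*")
# print(predict_article_sources(model))
```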
The results should look like this:
In the query results, the dense_1 column contains an array of probability values, and the input column contains the corresponding string values from the input table. Each array element value represents the probability that the corresponding input string is an article title from a particular publication.
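To turn a dense_1 array into a single prediction, you can pick the position with the highest probability. The sketch below does this for two rows copied from the sample bq output. The tutorial does not state which publication each array position corresponds to, so the hypothetical helper most_likely_class reports only the index and its probability.

```python
def most_likely_class(dense_1):
    """Return (index, probability) of the largest value in a dense_1 array."""
    # bq prints the values as strings, so convert them to floats first.
    probs = [float(p) for p in dense_1]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Two rows copied from the sample bq output in this tutorial.
rows = [
    ["0.6251608729362488", "0.2989124357700348", "0.07592673599720001"],
    ["0.014276246540248394", "0.972910463809967", "0.01281337533146143"],
]
for row in rows:
    index, prob = most_likely_class(row)
    print(f"class index {index} with probability {prob:.3f}")
```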
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
Console
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
gcloud
Delete a Google Cloud project:
gcloud projects delete PROJECT_ID
Delete individual resources
Alternatively, remove the individual resources used in this tutorial:
Optional: Delete the dataset.
What's next
- For an overview of BigQuery ML, see Introduction to BigQuery ML.
- To get started using BigQuery ML, see Create machine learning models in BigQuery ML.
- For more information about importing TensorFlow models, see The CREATE MODEL statement for importing TensorFlow models.
- For more information about working with models, see these resources:
- For more information on using the BigQuery DataFrames API in a BigQuery notebook, see: