We will treat each positive movie rating (rating >= 4) as a product page view event, and train an "Others You May Like" recommendation model that makes movie recommendations based on any user or seed movie in our dataset.
- Initial steps to start training the model: ~1.5 hours.
- Waiting for the model to train: ~2 days.
- Evaluating the model predictions and cleaning up: ~30 minutes.
Objectives
- Learn how to import product and user event data from BigQuery into the Retail API.
- Train and evaluate recommendation models.
Costs
This tutorial uses billable components of Google Cloud, including:
- BigQuery
- Cloud Storage
- Recommendations AI
For more information about Cloud Storage costs, see the Cloud Storage pricing page.
For more information about BigQuery costs, see the BigQuery pricing page.
For more information about Recommendations AI costs, see the Recommendations AI pricing page.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
Preparing the dataset
Open Cloud Console, select your Google Cloud project, and click the Activate Cloud Shell button at the top of the Console.
A Cloud Shell session opens inside a new frame at the bottom of the Console and displays a command-line prompt.
Import the dataset
Using the Cloud Shell, download and unpack the source dataset:
wget http://files.grouplens.org/datasets/movielens/ml-latest.zip
unzip ml-latest.zip
Create a Cloud Storage bucket and upload the data into it:
gsutil mb gs://PROJECT_ID-movielens-data
gsutil cp ml-latest/movies.csv ml-latest/ratings.csv \
    gs://PROJECT_ID-movielens-data
Create a BigQuery dataset:
bq mk movielens
Load movies.csv into a new movies BigQuery table:
bq load --skip_leading_rows=1 movielens.movies \
    gs://PROJECT_ID-movielens-data/movies.csv \
    movieId:integer,title,genres
Load ratings.csv into a new ratings BigQuery table:
bq load --skip_leading_rows=1 movielens.ratings \
    gs://PROJECT_ID-movielens-data/ratings.csv \
    userId:integer,movieId:integer,rating:float,time:timestamp
Create BigQuery views
Create a view that converts the movies table into the retail product catalog schema:
bq mk --project_id=PROJECT_ID \
    --use_legacy_sql=false \
    --view '
    SELECT
      CAST(movieId AS string) AS id,
      SUBSTR(title, 0, 128) AS title,
      SPLIT(genres, "|") AS categories
    FROM `PROJECT_ID.movielens.movies`' \
  movielens.products
Now the new view has the schema that the Retail API expects.
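The per-row transformation this view performs can be sketched in Python. This is an illustration only (the actual conversion happens in BigQuery, and `movie_to_product` is a hypothetical helper, not part of the tutorial):

```python
def movie_to_product(movie_id: int, title: str, genres: str) -> dict:
    """Map one MovieLens movies.csv row to the retail product catalog schema."""
    return {
        "id": str(movie_id),              # CAST(movieId AS string)
        "title": title[:128],             # SUBSTR(title, 0, 128)
        "categories": genres.split("|"),  # SPLIT(genres, "|")
    }

print(movie_to_product(1, "Toy Story (1995)", "Adventure|Animation|Children"))
```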
Now let's convert movie ratings into user events. We will:
- Ignore low ratings (below 4)
- Treat every positive rating (4 or higher) as a product page view event
- Rescale the MovieLens timeline into the last 90 days, for two reasons:
- Retail API requires that user events are no older than 2015. Movielens ratings go back to 1995.
- Retail API uses the last 90 days of user events when serving prediction requests for a user. Every user will appear to have recent events when we make predictions for any user later on.
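The rescaling is a simple linear map from the original rating time range onto a 90-day window ending now. As an illustrative sketch of the same arithmetic the SQL view performs (the endpoint values here are made up for the example):

```python
import time

def rescale(ts, old_start, old_end, new_start, new_end):
    """Linearly map a Unix timestamp from [old_start, old_end]
    onto [new_start, new_end]."""
    return int(new_start + (ts - old_start) * (new_end - new_start)
               / (old_end - old_start))

# Map the oldest and newest ratings (illustrative Unix seconds,
# roughly 1995 and 2019) onto the last 90 days.
now = int(time.time())
new_start, new_end = now - 90 * 24 * 3600, now
old_start, old_end = 820000000, 1570000000

# The endpoints of the old range land exactly on the new range.
assert rescale(old_start, old_start, old_end, new_start, new_end) == new_start
assert rescale(old_end, old_start, old_end, new_start, new_end) == new_end
```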
Create a BigQuery view. The following command uses a SQL query that meets the Retail API conversion requirements listed above.
bq mk --project_id=PROJECT_ID \
    --use_legacy_sql=false \
    --view '
    WITH t AS (
      SELECT
        MIN(UNIX_SECONDS(time)) AS old_start,
        MAX(UNIX_SECONDS(time)) AS old_end,
        UNIX_SECONDS(TIMESTAMP_SUB(
          CURRENT_TIMESTAMP(), INTERVAL 90 DAY)) AS new_start,
        UNIX_SECONDS(CURRENT_TIMESTAMP()) AS new_end
      FROM `PROJECT_ID.movielens.ratings`)
    SELECT
      CAST(userId AS STRING) AS visitorId,
      "detail-page-view" AS eventType,
      FORMAT_TIMESTAMP(
        "%Y-%m-%dT%X%Ez",
        TIMESTAMP_SECONDS(CAST(
          (t.new_start + (UNIX_SECONDS(time) - t.old_start) *
            (t.new_end - t.new_start) / (t.old_end - t.old_start))
          AS int64))) AS eventTime,
      [STRUCT(STRUCT(movieId AS id) AS product)] AS productDetails
    FROM `PROJECT_ID.movielens.ratings`, t
    WHERE rating >= 4' \
  movielens.user_events
Importing the product catalog and user events into the Retail API
We are now ready to import the product catalog and the user event data into the Retail API.
Enable the Retail API for your Google Cloud project.
Click Get started.
Go to the Recommendations AI Data page in the Google Cloud Console.
Import product catalog
Fill in the form to import products from the BigQuery view you created above:
- Select import type: Product Catalog.
- Select source of data: BigQuery.
- Select schema of data: Retail Product Schema.
- Enter the name of the products BigQuery view you created above.
Wait until all products have been imported, which should take 5–10 minutes.
You can check the import activity for the import operation status. When the import is complete, the import operation status changes to Succeeded.
Import user events
Import the user_events BigQuery view:
- Select import type: User Events.
- Select source of data: BigQuery.
- Select schema of data: Retail User Events Schema.
- Enter the name of the user_events BigQuery view you created above.
Wait until at least a million events have been imported before proceeding to the next step, in order to meet the data requirements for training a new model.
You can check the import activity for the operation status. The process takes about an hour to complete.
Training and evaluating recommendation models
Create a recommendation model
Go to the Recommendations AI Models page in the Google Cloud Console.
Click Create model:
Give the model a name.
Select Others you may like as the model type.
Choose Click-through rate (CTR) as the business objective.
Your new model starts training.
Create a placement
Go to the Recommendations AI Placements page in the Google Cloud Console.
Click Create placement.
Select the model you created, give the placement a name, and click Create.
Wait for the model to be "Ready to query"
It takes about two days for the model to train and become ready to query.
To view the status, open the created placement:
The Ready to query field indicates when the process is complete.
Once the model is ready to query:
- Open the placement detail page.
- Click Add item.
- Enter a seed movie ID, such as 4993 for "The Lord of the Rings: The Fellowship of the Ring (2001)".
- Click Prediction preview to see the list of recommended items on the right of the page.
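Outside the Console preview, predictions can also be requested through the Retail API's REST predict method. The following sketch only builds the request body around a seed movie, without calling the API; the visitor ID and movie ID here are placeholder values, and the field names follow the Retail API v2 user event message:

```python
import json

def build_predict_request(visitor_id: str, seed_movie_id: str) -> dict:
    """Build a minimal Retail API predict request body for a seed product."""
    return {
        "userEvent": {
            "eventType": "detail-page-view",  # same event type we imported
            "visitorId": visitor_id,
            "productDetails": [{"product": {"id": seed_movie_id}}],
        },
    }

body = build_predict_request("visitor-0", "4993")
print(json.dumps(body, indent=2))
```

The body would be POSTed to the placement's predict endpoint with an OAuth access token; the Console preview performs the equivalent call for you.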
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete individual resources
Open the Placements page and delete the placement you created.
Open the Models page and delete the model.
Delete the BigQuery dataset in Cloud Shell:
bq rm --recursive --dataset movielens
Delete the Cloud Storage bucket:
gsutil rm gs://PROJECT_ID-movielens-data/movies.csv
gsutil rm gs://PROJECT_ID-movielens-data/ratings.csv
gsutil rb gs://PROJECT_ID-movielens-data/