Personalized movie recommendations


In this tutorial, we will use the MovieLens dataset to demonstrate how to upload your media content catalog and user events into the Discovery Engine API and train a personalized movie recommendation model. The MovieLens dataset contains a catalog of movies (documents) and user movie ratings (user events).

We will treat each positive movie rating (rating >= 4) as both a view-item and a view-home-page event to meet the minimal data requirements for creating a recommendations model. We will train a recommendation model of type Others You May Like that makes movie recommendations based on any user or seed movie in our dataset, and configure it to optimize for click-through rate (CTR).

Estimated time:

  • Initial steps to start training the model: ~1.5 hours.
  • Waiting for the model to train: ~2 days.
  • Evaluating the model predictions and cleaning up: ~30 minutes.

Objectives

  • Learn how to import product and user event data from BigQuery into the Discovery Engine API.
  • Train and evaluate recommendation models.

Costs

This tutorial uses billable components of Google Cloud, including:

  • Cloud Storage
  • BigQuery
  • Discovery Engine

For more information about Cloud Storage costs, see the Cloud Storage pricing page.

For more information about BigQuery costs, see the BigQuery pricing page.

For more information about Discovery Engine costs, see the Discovery Engine pricing page.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

Prepare the dataset

  1. Open Google Cloud console.
  2. Select your Google Cloud project.
  3. Take note of the project ID in the Project info card on the dashboard page. You will need the project ID for the following procedures.
  4. Click the Activate Cloud Shell button at the top of the Google Cloud console. A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt.

Import the dataset

  1. Run the following command, substituting your project ID, to set the default project for the command line:

    gcloud config set project PROJECT_ID
    
  2. In Cloud Shell, download and unpack the source dataset:

    wget https://files.grouplens.org/datasets/movielens/ml-latest.zip
    unzip ml-latest.zip
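
    You can optionally peek at the first rows of each file to confirm the column layout that the load commands below assume (movieId,title,genres and userId,movieId,rating,timestamp):

    head -3 ml-latest/movies.csv ml-latest/ratings.csv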
    
  3. Create a Cloud Storage bucket and upload the data into it:

    gsutil mb gs://PROJECT_ID-movielens-data
    gsutil cp ml-latest/movies.csv ml-latest/ratings.csv \
      gs://PROJECT_ID-movielens-data
    
  4. Create a BigQuery dataset:

    bq mk movielens
    
  5. Load movies.csv into a new movies BigQuery table:

    bq load --skip_leading_rows=1 movielens.movies \
      gs://PROJECT_ID-movielens-data/movies.csv \
      movieId:integer,title,genres
    
  6. Load ratings.csv into a new ratings BigQuery table:

    bq load --skip_leading_rows=1 movielens.ratings \
      gs://PROJECT_ID-movielens-data/ratings.csv \
      userId:integer,movieId:integer,rating:float,time:timestamp
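
    To confirm that both loads succeeded, you can check the table row counts (a quick sanity check; substitute your project ID):

    bq query --use_legacy_sql=false \
      'SELECT
         (SELECT COUNT(*) FROM `PROJECT_ID.movielens.movies`) AS movies,
         (SELECT COUNT(*) FROM `PROJECT_ID.movielens.ratings`) AS ratings'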
    

Create BigQuery views

  1. Create a view that converts the movies table into the Discovery Engine document schema:

    bq mk --project_id=PROJECT_ID \
     --use_legacy_sql=false \
     --view '
      WITH t AS (
        SELECT
          CAST(movieId AS string) AS id,
          SUBSTR(title, 0, 128) AS title,
          SPLIT(genres, "|") AS categories
          FROM `PROJECT_ID.movielens.movies`)
        SELECT
          id, "default_schema" as schemaId, null as parentDocumentId, 
          TO_JSON_STRING(STRUCT(title as title, categories as categories, 
          CONCAT("http://mytestdomain.movie/content/", id) as uri,
          "2023-01-01T00:00:00Z" as available_time, 
          "2033-01-01T00:00:00Z" as expire_time,
          "movie" as media_type)) as jsonData
        FROM t;' \
    movielens.movies_view
    

    Now the new view has the schema that the Discovery Engine API expects.
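
    To spot-check the conversion, you can query a few rows of the view from Cloud Shell; the id, schemaId, and jsonData columns should match the view definition above:

    bq query --use_legacy_sql=false \
      'SELECT id, schemaId, jsonData
       FROM `PROJECT_ID.movielens.movies_view`
       LIMIT 3'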

  2. Go to the BigQuery page in Google Cloud console.

    Go to BigQuery

  3. In the Explorer pane, expand your project name, expand the movielens dataset and click movies_view to open the query page for this view.

    Products view

  4. Now let's convert movie ratings into user events. We will:

    • Ignore low movie ratings (rating < 4).
    • Treat every positive rating as both a view-item and a view-home-page event. Both event types are needed to meet the minimal data requirements for an Others You May Like recommendations model.
    • Rescale the MovieLens timeline onto the last 90 days. We do this for two reasons:
      • The Discovery Engine API requires that user events be no older than 2015, while MovieLens ratings go back to 1995.
      • The Discovery Engine API uses the last 90 days of user events when serving prediction requests for a user. With the rescaled timeline, every user appears to have recent events when we make predictions later on.

    Create a BigQuery view. The following Cloud Shell command uses a SQL query that meets the Discovery Engine API conversion requirements listed above.

    bq mk --project_id=PROJECT_ID \
     --use_legacy_sql=false \
     --view '
     WITH t AS (
      SELECT
        MIN(UNIX_SECONDS(time)) AS old_start,
        MAX(UNIX_SECONDS(time)) AS old_end,
        UNIX_SECONDS(TIMESTAMP_SUB(
        CURRENT_TIMESTAMP(), INTERVAL 90 DAY)) AS new_start,
        UNIX_SECONDS(CURRENT_TIMESTAMP()) AS new_end
      FROM `PROJECT_ID.movielens.ratings`)
      SELECT
        CAST(userId AS STRING) AS userPseudoId,
        "view-item" AS eventType,
        FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez",
        TIMESTAMP_SECONDS(CAST(
          (t.new_start + (UNIX_SECONDS(time) - t.old_start) *
          (t.new_end - t.new_start) / (t.old_end - t.old_start))
        AS int64))) AS eventTime,
        [STRUCT(CAST(movieId AS STRING) AS id, null AS name)] AS documents,
      FROM `PROJECT_ID.movielens.ratings`, t
      WHERE rating >= 4
    UNION ALL
      SELECT
        CAST(userId AS STRING) AS userPseudoId,
        "view-home-page" AS eventType,
        FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez",
        TIMESTAMP_SECONDS(CAST(
          (t.new_start + (UNIX_SECONDS(time) - t.old_start) *
          (t.new_end - t.new_start) / (t.old_end - t.old_start))
        AS int64))) AS eventTime, null AS documents
      FROM `PROJECT_ID.movielens.ratings`, t
      WHERE rating >= 4;' \
      movielens.user_events
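
    To verify the rescaling, you can check that the earliest and latest event times fall within the last 90 days (eventTime is a formatted string, so a lexicographic MIN/MAX is chronologically correct here):

    bq query --use_legacy_sql=false \
      'SELECT MIN(eventTime) AS earliest, MAX(eventTime) AS latest
       FROM `PROJECT_ID.movielens.user_events`'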
    

Import the content catalog and user events into the Discovery Engine API

We are now ready to import the content catalog and the user event data into the Discovery Engine API.

  1. Enable the Discovery Engine API for your Google Cloud project. (A command-line alternative is shown after this list.)

    Enable the Discovery Engine API

  2. Click Turn On API.

  3. Click Continue.

  4. Read the Data Use Terms and click Accept.

  5. Click Continue, then click Create to create the data store.

  6. Click Get Started.

  7. Click Import.
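
    As an alternative to the console flow in step 1, you can enable the API from Cloud Shell (the remaining steps above, including creating the data store, still happen in the console):

    gcloud services enable discoveryengine.googleapis.com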

Import content catalog

  1. Fill in the form to import content from the BigQuery view you created above:

    • Select data type: Media Catalog.
    • Select source of data: BigQuery.
    • Enter the name of the movies BigQuery view you created above (PROJECT_ID.movielens.movies_view).
    • Select the default branch name.

  2. Click Import.

  3. Wait until all products have been imported, which should take 5–10 minutes.

    You can check the import activity for the import operation status. When the import is complete, the import operation status changes to Succeeded. You can also spot-check a single imported document through the API, as shown after this step.

    Products import activity
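
    A minimal sketch for retrieving one imported document with curl; this assumes the v1beta documents.get method, a data store in the global location, and DATA_STORE_ID as a placeholder for the data store you created (verify the resource path against the API reference):

    curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/default_branch/documents/4993"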

Import user events

  1. Import the user_events BigQuery view:

    • Select import type: User Events.
    • Select source of data: BigQuery.
    • Enter the name of the user_events BigQuery view you created above (PROJECT_ID.movielens.user_events).
  2. Click Import.

  3. Wait until at least a million events have been imported before proceeding to the next step, to meet the data requirements for training a new model.

    You can check the import activity for the operation status. Because we are importing millions of rows, the process takes about an hour to complete. You can also count the events directly in BigQuery, as shown after this step.

    Events import activity
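
    A quick sanity check against the source view (the console's import activity remains the authoritative count of successfully imported events):

    bq query --use_legacy_sql=false \
      'SELECT eventType, COUNT(*) AS events
       FROM `PROJECT_ID.movielens.user_events`
       GROUP BY eventType'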

Train and evaluate recommendation models

Create a recommendation model

  1. Go to the Discovery Engine Models page in the Google Cloud console.

    Go to the Models page

  2. Click Create model:

    • Give the model a name.
    • Select Others you may like as the model type.
    • Choose Click-through rate (CTR) as the business objective.
  3. Click Create.

    Create model

    Your new model starts training.

    Model created

Create a serving config

  1. Go to the Discovery Engine Serving Configs page in the Google Cloud console.

    Go to the Serving Configs page

  2. Click Create:

    • Select Recommendation.
    • Give the serving config a name and click Continue.
    • Select the model you created and click Continue.
    • Leave the preferences at their default values.
  3. Click Create.

Wait for the model to be "Ready to query"

It takes about two days for the model to train and become ready to query.

To view the status, click the created serving config on the Serving configs page.

The Model ready to query field indicates Yes when the process is complete.

Preview recommendations

Once the model is ready to query:

  1. Go to the Discovery Engine Evaluate page in the Google Cloud console.

    Go to the Evaluate page
  2. Select the serving config name from the dropdown menu.
  3. Enter a seed document (movie) ID, such as 4993 for "The Lord of the Rings: The Fellowship of the Ring (2001)".

    Enter ID

  4. Click Prediction preview to see the list of recommended items on the right of the page.
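
    You can also request recommendations programmatically. The following curl sketch assumes the v1beta servingConfigs.recommend method, with DATA_STORE_ID and SERVING_CONFIG_ID as placeholders for the resources you created; verify the path and request fields against the API reference:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/SERVING_CONFIG_ID:recommend" \
      -d '{
            "userEvent": {
              "eventType": "view-item",
              "userPseudoId": "sample-user",
              "documents": [{"id": "4993"}]
            },
            "pageSize": 5
          }'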

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

  1. Go to the Serving configs page and delete the serving config you created.

  2. Go to the Models page and delete the model.

  3. Delete the BigQuery dataset in Cloud Shell:

    bq rm --recursive --dataset movielens
    
  4. Delete the Cloud Storage bucket:

    gsutil rm gs://PROJECT_ID-movielens-data/movies.csv
    gsutil rm gs://PROJECT_ID-movielens-data/ratings.csv
    gsutil rb gs://PROJECT_ID-movielens-data/
    

What's next