This page describes how to import your catalog information to Recommendations AI, and keep it up to date.
Before you begin
Before you can import your catalog information, you must have completed the instructions in Before you begin, specifically setting up your project, creating a service account, and adding the service account to your local environment.
You must choose your catalog levels before importing your catalog, and you must have the Recommendations AI Admin IAM role to be able to perform the import.
Catalog import best practices
Recommendations AI requires high-quality data to make high-quality predictions. If your data is missing fields or has placeholder values instead of actual values, the quality of your predictions suffers.
When you import catalog data, ensure that you implement the following best practices:
Make sure you review the information about catalog levels before uploading any data.
Changing catalog levels after you have imported any data requires significant effort. See Changing catalog levels.
Observe the catalog item import limits.
For bulk import from Cloud Storage, the size of each file must be 2 GB or smaller. You can include up to 100 files at a time in a single bulk import request.
For inline import, import no more than 5,000 catalog items at a time.
Make sure that all required catalog information is included and correct.
Do not use dummy or placeholder values.
Include as much optional catalog information as possible.
Keep your catalog up to date.
Ideally, you should update your catalog daily. Scheduling periodic catalog imports prevents model quality from degrading over time. You can use Google Cloud Scheduler to automate this task.
Do not record user events for catalog items that have not been imported yet.
After importing catalog information, review the error reporting and logging information for your project.
A few errors are expected, but if you have a large number of errors, you should review them and fix any process issues that led to the errors.
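The inline import limit above (5,000 items per request) means a large catalog must be split across multiple requests. A minimal Python sketch of that batching, assuming your items are already dicts in the catalog item JSON format (the helper names are illustrative, not part of the API):

```python
def chunk_catalog(items, batch_size=5000):
    """Split catalog items into batches that respect the inline import limit."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def build_inline_requests(items, batch_size=5000):
    """Build one catalogInlineSource request body per batch of items."""
    return [
        {"inputConfig": {"catalogInlineSource": {"catalogItems": batch}}}
        for batch in chunk_catalog(items, batch_size)
    ]

# Example: 12,000 items produce three requests (5,000 + 5,000 + 2,000).
items = [{"id": str(i), "title": f"item {i}"} for i in range(12000)]
requests = build_inline_requests(items)
```

Each request body can then be sent to the catalogItems:import method as shown later in this page.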
Importing catalog data
To import your catalog item data, you can import it from Merchant Center, Cloud Storage, or BigQuery, or specify the data inline in the request. Each of these procedures is a one-time import. We recommend that you import your catalog daily to ensure that it stays current. See Keeping your catalog up to date.
You can also import individual catalog items. For more information, see Uploading a catalog item.
Importing catalog data from Merchant Center
You can import catalog data from Merchant Center using either the Google Cloud console or the Recommendations AI API.
To import your catalog from Merchant Center, complete the following steps:
Using the instructions in Merchant Center transfers, set up a transfer from Merchant Center into BigQuery.
You'll use the Google Merchant Center products table schema. Configure your transfer to repeat daily, and set your dataset expiration time to 2 days.
If your BigQuery dataset is in another project, configure the required permissions so that Recommendations AI can access the BigQuery dataset. Learn more.
Import your catalog data from BigQuery into Recommendations AI.
Console
- Go to the Recommendations AI Data page in the Google Cloud console.
- Click Import to open the Import catalog panel.
- Enter the IDs of the BigQuery dataset and table where your data is located.
- Enter the location of a Cloud Storage bucket in your project. This bucket is used as a temporary location for your data.
- If this is the first time you are importing your catalog, or you are re-importing the catalog after purging it, select the catalog levels for upload (user event recording) and prediction. Changing catalog levels after you have imported any data requires significant effort. Learn more about catalog levels.
- Click Import.
curl
If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method. This operation requires the Recommendations AI Admin role. Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "catalogItemLevelConfig": {
      "eventItemLevel": "event-data-level",
      "predictItemLevel": "prediction-level"
    }
  }' \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
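If you script this call, validating the two level values before sending the PATCH avoids a wasted round trip. An illustrative Python sketch (the helper name is not part of the API; the supported values are MASTER and VARIANT, as noted above):

```python
VALID_LEVELS = {"MASTER", "VARIANT"}

def catalog_level_patch_body(event_item_level, predict_item_level):
    """Build the Catalog.patch request body, validating the allowed level values."""
    for level in (event_item_level, predict_item_level):
        if level not in VALID_LEVELS:
            raise ValueError(f"unsupported catalog level: {level!r}")
    return {
        "catalogItemLevelConfig": {
            "eventItemLevel": event_item_level,
            "predictItemLevel": predict_item_level,
        }
    }
```

The returned dict can be serialized with json.dumps and used as the --data payload of the curl command above.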
Import your catalog, using the catalogItems.import method.
- dataset-id: The ID of the BigQuery dataset.
- table-id: The ID of the BigQuery table holding your data.
- staging-directory: Optional. A Cloud Storage directory that is used as an interim location for your data before it is imported into Recommendations AI. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
- error-directory: Optional. A Cloud Storage directory for error information about the import. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
- dataSchema: For the dataSchema property, use the value catalog_merchant_center. See the Merchant Center products table schema.
We recommend you don't specify staging or error directories so that Recommendations AI can automatically create a Cloud Storage bucket with new staging and error directories. These are created in the same region as the BigQuery dataset, and are unique to each import (which prevents multiple import jobs from staging data to the same directory, and potentially re-importing the same data). After three days, the bucket and directories are automatically deleted to reduce storage costs.
An automatically created bucket name includes the project ID, bucket region, and data schema name, separated by underscores (for example, 4321_us_catalog_recommendations_ai). The automatically created directories are called staging or errors, appended by a number (for example, staging2345 or errors5678).

If you specify directories, the Cloud Storage bucket must be in the same region as the BigQuery dataset, or the import fails. Provide the staging and error directories in the format gs://<bucket>/<folder>/; they should be different directories.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "inputConfig":{
      "bigQuerySource": {
        "datasetId":"dataset-id",
        "tableId":"table-id",
        "dataSchema":"catalog_merchant_center"
      }
    }
  }' \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
Importing catalog data from BigQuery
To import catalog data in the Recommendations AI format from BigQuery, you use the Recommendations AI schema to create a BigQuery table with the correct format and load the empty table with your catalog data. Then, you upload your data to Recommendations AI.
For more help with BigQuery tables, see Introduction to tables. For help with BigQuery queries, see Overview of querying BigQuery data.
curl
If your BigQuery dataset is in another project, configure the required permissions so that Recommendations AI can access the BigQuery dataset. Learn more.
If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method. This operation requires the Recommendations AI Admin role. Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "catalogItemLevelConfig": {
      "eventItemLevel": "event-data-level",
      "predictItemLevel": "prediction-level"
    }
  }' \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
Create a data file for the input parameters for the import. Your input parameter values depend on whether you are importing from Cloud Storage or BigQuery.
You use the BigQuerySource object to point to your BigQuery dataset.
- dataset-id: The ID of the BigQuery dataset.
- table-id: The ID of the BigQuery table holding your data.
- staging-directory: Optional. A Cloud Storage directory that is used as an interim location for your data before it is imported into Recommendations AI. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
- error-directory: Optional. A Cloud Storage directory for error information about the import. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
- dataSchema: For the dataSchema property, use the value catalog_recommendations_ai (the default). You'll use the Recommendations AI schema.
We recommend you don't specify staging or error directories so that Recommendations AI can automatically create a Cloud Storage bucket with new staging and error directories. These are created in the same region as the BigQuery dataset, and are unique to each import (which prevents multiple import jobs from staging data to the same directory, and potentially re-importing the same data). After three days, the bucket and directories are automatically deleted to reduce storage costs.
An automatically created bucket name includes the project ID, bucket region, and data schema name, separated by underscores (for example, 4321_us_catalog_recommendations_ai). The automatically created directories are called staging or errors, appended by a number (for example, staging2345 or errors5678).

If you specify directories, the Cloud Storage bucket must be in the same region as the BigQuery dataset, or the import fails. Provide the staging and error directories in the format gs://<bucket>/<folder>/; they should be different directories.

{
  "inputConfig":{
    "bigQuerySource": {
      "datasetId":"dataset-id",
      "tableId":"table-id",
      "dataSchema":"catalog_recommendations_ai"
    }
  }
}
Import your catalog information to Recommendations AI by making a POST request to the catalogItems:import REST method, providing the name of the data file (here, shown as input.json).

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./input.json \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
The easiest way to check the status of your import operation is to use the Google Cloud console. For more information, see Seeing status for a specific integration operation.
You can also check the status programmatically using the API. You should receive a response object that looks something like this:
{ "name": "import-catalog-cat123-5821", "done": false }
The name field is the ID of the operation object. Request the status of this object, replacing the name field with the value returned by the import method, until the done field returns true:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/[OPERATION_NAME]"
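The polling loop above can also be scripted. A hedged Python sketch: fetch_operation stands in for whatever authenticated GET you make against the operations endpoint (for example, via the requests library with a bearer token), and is passed in as a parameter so the loop itself stays testable:

```python
import time

def wait_for_operation(fetch_operation, operation_name,
                       poll_seconds=10, max_polls=360):
    """Poll an operation until its 'done' field is true, then return it.

    fetch_operation(name) must return the parsed JSON of the operation
    resource. Raises TimeoutError if the operation never completes.
    """
    for _ in range(max_polls):
        op = fetch_operation(operation_name)
        if op.get("done"):
            return op
        time.sleep(poll_seconds)
    raise TimeoutError(f"operation {operation_name} did not finish in time")
```

Once the returned object is available, check it for error_samples before assuming the import succeeded.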
When the operation completes, the returned object has a done value of true, and includes a Status object similar to the following example:

{
  "name": "import-catalog-cat123-5821",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.recommendationengine.v1beta1.ImportCatalogItemsResponse"
  },
  "error_samples": [{"code": 3, "message": "bad catalog"}, {"code": 3, "message": "invalid id"}],
  "errors_config": { "gcs_prefix": "gs://error-bucket/error-directory" }
}
You can inspect the files in the error directory in Cloud Storage to see what kind of errors occurred during the import.
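Before opening the error files, it can help to tally the error_samples in the finished operation by status code. A small sketch, assuming an operation object shaped like the example above (the helper name is illustrative):

```python
from collections import Counter

def summarize_errors(operation):
    """Count error_samples by status code and return the error directory, if any."""
    samples = operation.get("error_samples", [])
    counts = Counter(s.get("code") for s in samples)
    error_dir = operation.get("errors_config", {}).get("gcs_prefix")
    return counts, error_dir

op = {
    "done": True,
    "error_samples": [{"code": 3, "message": "bad catalog"},
                      {"code": 3, "message": "invalid id"}],
    "errors_config": {"gcs_prefix": "gs://error-bucket/error-directory"},
}
counts, error_dir = summarize_errors(op)
# counts[3] == 2; error_dir == "gs://error-bucket/error-directory"
```

Note that error_samples is only a sample; the full detail lives in the files under the error directory.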
Setting up access to your BigQuery dataset
To set up access when your BigQuery dataset is in a different project than your Recommendations AI service, complete the following steps.
Open the IAM page in the Google Cloud console.
Select your Recommendations AI project.
Find the service account with the name AutoML Recommendations Service Account.
If you have not previously initiated an import operation with Recommendations AI, this service account might not be listed. If you do not see this service account, return to the import task and initiate the import. When it fails due to permission errors, return here and complete this task.
Copy the identifier for the service account, which looks like an email address (for example, service-525@gcp-sa-recommendationengine.iam.gserviceaccount.com).

Switch to your BigQuery project (on the same IAM & Admin page) and click Add.
Enter the identifier for the Recommendations AI service account and select the BigQuery > BigQuery User role.
Click Add another role and select BigQuery > BigQuery Data Editor.
If you do not want to provide the Data Editor role to the entire project, you can add this role directly to the dataset. Learn more.
Click Save.
Importing catalog data from Cloud Storage
To import catalog data in JSON format, you create one or more JSON files that contain the catalog data you want to import, and upload them to Cloud Storage. From there, you can import the data into Recommendations AI.
For an example of the JSON catalog item format, see Catalog item JSON data format.
For help with uploading files to Cloud Storage, see Uploading objects.
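Each catalog item in the import file must sit on a single line (see Catalog item JSON data format later in this page). A minimal Python sketch of writing such a file, using one json.dumps call per item so no item contains line breaks (the file name and items are illustrative):

```python
import json

def write_catalog_file(items, path):
    """Write catalog items as JSON, one complete item per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")

items = [
    {"id": "1234",
     "category_hierarchies": [{"categories": ["athletic wear", "shoes"]}],
     "title": "ABC sneakers"},
    {"id": "5839",
     "category_hierarchies": [{"categories": ["casual attire", "t-shirts"]}],
     "title": "Crew t-shirt"},
]
write_catalog_file(items, "catalog.json")
```

The resulting file can then be uploaded to your Cloud Storage bucket with gsutil or the console.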
curl
Make sure the Recommendations AI service account has permission to read and write to the bucket.
The Recommendations AI service account is listed on the IAM page in the Google Cloud console with the name AutoML Recommendations Service Account. Use the principal name, which looks like an email address (for example, service-525@gcp-sa-recommendationengine.iam.gserviceaccount.com), when adding the account to your bucket permissions.

If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method. Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "catalogItemLevelConfig": {
      "eventItemLevel": "event-data-level",
      "predictItemLevel": "prediction-level"
    }
  }' \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
Create a data file for the input parameters for the import. You use the GcsSource object to point to your Cloud Storage bucket. You can provide multiple files, or just one; this example uses two files.
- input-file: A file or files in Cloud Storage containing your catalog data.
- error-directory: A Cloud Storage directory for error information about the import.
The input file fields must be in the format gs://<bucket>/<path-to-file>. The error directory must be in the format gs://<bucket>/<folder>/. If the error directory does not exist, Recommendations AI creates it. The bucket must already exist.

{
  "inputConfig":{
    "gcsSource": {
      "inputUris": ["input-file1", "input-file2"]
    }
  },
  "errorsConfig":{"gcsPrefix":"error-directory"}
}
Import your catalog information to Recommendations AI by making a POST request to the catalogItems:import REST method, providing the name of the data file (here, shown as input.json).

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./input.json \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
The easiest way to check the status of your import operation is to use the Google Cloud console. For more information, see Seeing status for a specific integration operation.
You can also check the status programmatically using the API. You should receive a response object that looks something like this:
{ "name": "import-catalog-cat123-5821", "done": false }
The name field is the ID of the operation object. Request the status of this object, replacing the name field with the value returned by the import method, until the done field returns true:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/[OPERATION_NAME]"
When the operation completes, the returned object has a done value of true, and includes a Status object similar to the following example:

{
  "name": "import-catalog-cat123-5821",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.recommendationengine.v1beta1.ImportCatalogItemsResponse"
  },
  "error_samples": [{"code": 3, "message": "bad catalog"}, {"code": 3, "message": "invalid id"}],
  "errors_config": { "gcs_prefix": "gs://error-bucket/error-directory" }
}
You can inspect the files in the error directory in Cloud Storage to see what kind of errors occurred during the import.
Importing catalog data inline
curl
You import your catalog information to Recommendations AI inline by making a POST request to the catalogItems:import REST method, using the catalogInlineSource object to specify your catalog data.
Provide an entire catalog item on a single line. Each catalog item should be on its own line. Do not include line breaks within the catalog item data.
For an example of the JSON catalog item format, see Catalog item JSON data format.
Create the JSON file for your catalog item:
{
  "inputConfig": {
    "catalogInlineSource": {
      "catalogItems": [
        { CATALOG_ITEM_1 },
        { CATALOG_ITEM_2 }
      ]
    }
  }
}
Call the POST method:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data @./data.json \
  "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
Catalog item JSON data format
The catalogItem entries in your JSON file should look like the following examples.
Provide an entire catalog item on a single line. Each catalog item should be on its own line.
Minimum required fields:
{
"id": "1234",
"category_hierarchies": [ { "categories": [ "athletic wear", "shoes" ] } ],
"title": "ABC sneakers"
}
{
"id": "5839",
"category_hierarchies": [ { "categories": [ "casual attire", "t-shirts" ] } ],
"title": "Crew t-shirt"
}
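Because placeholder or missing values hurt prediction quality (see the best practices above), a pre-import check of the minimum required fields can catch problems early. This is an illustrative sketch, not part of the Recommendations AI API:

```python
def check_required_fields(item):
    """Return a list of problems with the minimum required catalog item fields."""
    problems = []
    if not item.get("id"):
        problems.append("missing id")
    if not item.get("title"):
        problems.append("missing title")
    hierarchies = item.get("category_hierarchies") or []
    if not any(h.get("categories") for h in hierarchies):
        problems.append("missing category_hierarchies.categories")
    return problems

item = {
    "id": "1234",
    "category_hierarchies": [{"categories": ["athletic wear", "shoes"]}],
    "title": "ABC sneakers",
}
assert check_required_fields(item) == []
```

Running a check like this over every item before upload is cheap compared to diagnosing import errors afterward.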
Complete object:
{
"id": "1234",
"category_hierarchies": [ { "categories": [ "athletic wear", "shoes" ] } ],
"title": "ABC sneakers",
"description": "Sneakers for the rest of us",
"language_code": "en",
"tags": [ ],
"itemGroupId": "abcshoe123",
"product_metadata": {
"exact_price": {
"display_price": 99.98,
"original_price": 111.99 },
"costs": {
"manufacturing": 35.99,
"other": 20
},
"currency_code": "USD",
"canonical_product_uri": "https://www.example.com/products/1234",
"images": [ ]
}
}
{
"id": "5839",
"category_hierarchies": [ { "categories": [ "casual attire", "t-shirts" ] } ],
"title": "Crew t-shirt",
"description": "Crew t-shirt with design",
"language_code": "en",
"tags": [ ],
"itemGroupId": "xyzshirt456",
"product_metadata": {
"exact_price": {
"display_price": 14.98,
"original_price": 18.99 },
"costs": {
"manufacturing": 5.17,
"other": 15
},
"currency_code": "USD",
"canonical_product_uri": "https://www.example.com/products/5839",
"images": [ ]
}
}
Keeping your catalog up to date
Recommendations AI relies on having current product information to provide you with the best recommendations. We recommend that you import your catalog on a daily basis to ensure that your catalog is current. You can use Google Cloud Scheduler to schedule imports.
You can update only new or changed catalog items, or you can import the entire catalog. If you import catalog items that are already in your catalog, they are not added again. Any item that has changed is updated.
To update a single item, see Updating catalog information.
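Because re-imported unchanged items are simply not added again, you can shrink daily imports by sending only items that are new or differ from the previous snapshot. A minimal sketch, assuming you keep both snapshots as dicts keyed by item ID (note it does not handle items deleted from the catalog, which need separate handling):

```python
def changed_items(previous, current):
    """Return the catalog items in `current` that are new or differ from `previous`.

    Both arguments map item id -> catalog item dict.
    """
    return [item for item_id, item in current.items()
            if previous.get(item_id) != item]

previous = {"1234": {"id": "1234", "title": "ABC sneakers"}}
current = {
    "1234": {"id": "1234", "title": "ABC sneakers"},  # unchanged
    "5839": {"id": "5839", "title": "Crew t-shirt"},  # new
}
# Only item 5839 needs to be re-imported.
```

The resulting list can be imported with any of the methods described in Importing catalog data.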
Batch updating
You can use the import method to batch update your catalog. You do this the same way you do the initial import; follow the steps in Importing catalog data.
Monitoring import health
Keeping your catalog up to date is important for getting high-quality recommendations. You should monitor the import error rates and take action if needed. For more information, see Setting up alerts for data upload issues.
What's next
- Start recording user events.
- View aggregated information about your catalog.
- Set up data upload alerts.