Importing catalog information

This page describes how to import your catalog information to Recommendations AI and how to keep it up to date.

Before you begin

Before you can import your catalog information, you must have completed the instructions in Before you begin, specifically setting up your project, creating a service account, and adding the service account to your local environment.

You must choose your catalog levels before importing your catalog, and you must have the Recommendations AI Admin IAM role to be able to perform the import.

Catalog import best practices

Recommendations AI requires high-quality data to make high-quality predictions. If your data is missing fields or has placeholder values instead of actual values, the quality of your predictions suffers.

When you import catalog data, ensure that you implement the following best practices:

  • Make sure you review the information about catalog levels before uploading any data.

    Changing catalog levels after you have imported any data requires significant effort. For details, see Changing catalog levels.

  • Observe the catalog item import limits.

    For bulk import from Cloud Storage, the size of each file must be 2 GB or smaller. You can include up to 100 files at a time in a single bulk import request.

    For inline import, import no more than 5,000 catalog items at a time.

  • Make sure that all required catalog information is included and correct.

    Do not use dummy or placeholder values. For one way to check for missing required fields before you import, see the sketch after this list.

  • Include as much optional catalog information as possible.

  • Keep your catalog up to date.

    Ideally, you should update your catalog daily. Scheduling periodic catalog imports prevents model quality from degrading over time. You can use Cloud Scheduler to automate this task.

  • Do not record user events for catalog items that have not been imported yet.

  • After importing catalog information, review the error reporting and logging information for your project.

    A few errors are expected, but if you have a large number of errors, you should review them and fix any process issues that led to the errors.
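
For example, if you keep your catalog as a newline-delimited JSON file, a quick jq check can flag items that are missing required fields before you import. This is a minimal sketch, not part of the import API; catalog.json is a hypothetical file name, and you can extend the filter to match any placeholder values you know appear in your data.

  # Print any catalog items that are missing a required field
  # (id, title, or category_hierarchies) so they can be fixed
  # before the import runs.
  jq -c 'select((.id // "") == ""
                or (.title // "") == ""
                or ((.category_hierarchies // []) | length) == 0)' catalog.json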

Importing catalog data

You can import your catalog item data from Merchant Center, Cloud Storage, or BigQuery, or you can specify the data inline in the request. Each of these procedures is a one-time import. We recommend that you import your catalog daily to ensure that your catalog is current. See Keeping your catalog up to date.

You can also import individual catalog items. For more information, see Uploading a catalog item.

Importing catalog data from Merchant Center

You can import catalog data from Merchant Center using either the Google Cloud console or the Recommendations AI API.

To import your catalog from Merchant Center, complete the following steps:

  1. Using the instructions in Merchant Center transfers, set up a transfer from Merchant Center into BigQuery.

    You'll use the Google Merchant Center products table schema. Configure your transfer to repeat daily, but set your dataset expiration time to 2 days. (For a command-line check of the transfer, see the sketch after this procedure.)

  2. If your BigQuery dataset is in another project, configure the required permissions so that Recommendations AI can access the BigQuery dataset. Learn more.

  3. Import your catalog data from BigQuery into Recommendations AI.

    Console

    1. Go to the Recommendations AI Data page in the Google Cloud console.
      Go to the Recommendations AI Data page
    2. Click Import to open the Import catalog panel.
    3. Enter the IDs of the BigQuery dataset and table where your data is located.
    4. Enter the location of a Cloud Storage bucket in your project.

      This bucket is used as a temporary location for your data.

    5. If this is the first time you are importing your catalog or you are re-importing the catalog after purging it, select the catalog levels for upload (user event recording) and prediction.

      Changing catalog levels after you have imported any data requires a significant effort. Learn more about catalog levels.

    6. Click Import.

    cURL

    1. If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method. This operation requires the Recommendations AI Admin role.

      Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      -H "Content-Type: application/json; charset=utf-8" \
      --data '{
        "catalogItemLevelConfig": {
          "eventItemLevel": "event-data-level",
          "predictItemLevel": "prediction-level"
        }
      }' \
      "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
      
    2. Import your catalog by using the catalogItems.import method.

      • dataset-id: The ID of the BigQuery dataset.
      • table-id: The ID of the BigQuery table holding your data.
      • staging-directory: Optional. A Cloud Storage directory that is used as an interim location for your data before it is imported into Recommendations AI. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
      • error-directory: Optional. A Cloud Storage directory for error information about the import. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
      • dataSchema: For the dataSchema property, use the value catalog_merchant_center. See the Merchant Center products table schema.

      We recommend you don't specify staging or error directories so that Recommendations AI can automatically create a Cloud Storage bucket with new staging and error directories. These are created in the same region as the BigQuery dataset, and are unique to each import (which prevents multiple import jobs from staging data to the same directory, and potentially re-importing the same data). After three days, the bucket and directories are automatically deleted to reduce storage costs.

      An automatically created bucket name includes the project ID, bucket region, and data schema name, separated by underscores (for example, 4321_us_catalog_recommendations_ai). The automatically created directories are named staging or errors, with a number appended (for example, staging2345 or errors5678).

      If you specify directories, the Cloud Storage bucket must be in the same region as the BigQuery dataset, or the import fails. Provide the staging and error directories in the format gs://<bucket>/<folder>/; the two directories must be different.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      -H "Content-Type: application/json; charset=utf-8" \
      --data '{
        "inputConfig": {
          "bigQuerySource": {
            "datasetId": "dataset-id",
            "tableId": "table-id",
            "dataSchema": "catalog_merchant_center"
          }
        }
      }' \
      "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
      
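
Before you run the import in step 3, you can confirm from the command line that the transfer exists and that the products table contains rows. This is a sketch using the bq tool; the dataset and table names are assumptions based on typical Merchant Center transfer defaults, so substitute the names from your own transfer configuration.

  # List BigQuery Data Transfer Service configurations to confirm the
  # Merchant Center transfer is set up and repeats daily.
  bq ls --transfer_config --transfer_location=us

  # Spot-check that the transferred products table contains rows.
  bq query --use_legacy_sql=false \
    'SELECT COUNT(*) FROM `PROJECT_ID.merchant_center_dataset.Products_MERCHANT_ID`'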

Importing catalog data from BigQuery

To import catalog data in the Recommendations AI format from BigQuery, you use the Recommendations AI schema to create a BigQuery table in the correct format and load the table with your catalog data. Then, you import your data into Recommendations AI.

For more help with BigQuery tables, see Introduction to tables. For help with BigQuery queries, see Overview of querying BigQuery data.

cURL

  1. If your BigQuery dataset is in another project, configure the required permissions so that Recommendations AI can access the BigQuery dataset. Learn more.

  2. If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method. This operation requires the Recommendations AI Admin role.

    Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
     --data '{
       "catalogItemLevelConfig": {
         "eventItemLevel": "event-data-level",
         "predictItemLevel": "prediction-level"
       }
     }' \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
    
  3. Create a data file for the input parameters for the import. Your input parameter values depend on whether you are importing from Cloud Storage or BigQuery.

    You use the BigQuerySource object to point to your BigQuery dataset.

    • dataset-id: The ID of the BigQuery dataset.
    • table-id: The ID of the BigQuery table holding your data.
    • staging-directory: Optional. A Cloud Storage directory that is used as an interim location for your data before it is imported into Recommendations AI. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
    • error-directory: Optional. A Cloud Storage directory for error information about the import. Leave this field empty to let Recommendations AI automatically create a temporary directory (recommended).
    • dataSchema: For the dataSchema property, use the value catalog_recommendations_ai (the default). You'll use the Recommendations AI schema.

    We recommend you don't specify staging or error directories so that Recommendations AI can automatically create a Cloud Storage bucket with new staging and error directories. These are created in the same region as the BigQuery dataset, and are unique to each import (which prevents multiple import jobs from staging data to the same directory, and potentially re-importing the same data). After three days, the bucket and directories are automatically deleted to reduce storage costs.

    An automatically created bucket name includes the project ID, bucket region, and data schema name, separated by underscores (for example, 4321_us_catalog_recommendations_ai). The automatically created directories are named staging or errors, with a number appended (for example, staging2345 or errors5678).

    If you specify directories, the Cloud Storage bucket must be in the same region as the BigQuery dataset, or the import fails. Provide the staging and error directories in the format gs://<bucket>/<folder>/; the two directories must be different.

    {
      "inputConfig": {
        "bigQuerySource": {
          "datasetId": "dataset-id",
          "tableId": "table-id",
          "dataSchema": "catalog_recommendations_ai"
        }
      }
    }
    
  4. Import your catalog information to Recommendations AI by making a POST request to the catalogItems:import REST method, providing the name of the data file (here, shown as input.json).

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" -d @./input.json \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
    

    The easiest way to check the status of your import operation is to use the Google Cloud console. For more information, see Seeing status for a specific integration operation.

    You can also check the status programmatically using the API. You should receive a response object that looks something like this:

    {
      "name": "import-catalog-cat123-5821",
      "done": false
    }
    

    The name field is the ID of the operation object. To check the status, replace [OPERATION_NAME] in the following request with the value of that name field, and repeat the request until the done field returns true (a scripted polling loop is sketched at the end of this step):

    curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/[OPERATION_NAME]"
    

    When the operation completes, the returned object has a done value of true, and includes a Status object similar to the following example:

    {
      "name": "import-catalog-cat123-5821",
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.recommendationengine.v1beta1.ImportCatalogItemsResponse"
      },
      "error_samples": [
        { "code": 3, "message": "bad catalog" },
        { "code": 3, "message": "invalid id" }
      ],
      "errors_config": { "gcs_prefix": "gs://error-bucket/error-directory" }
    }
    

    You can inspect the files in the error directory in Cloud Storage to see what kind of errors occurred during the import.
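
    As an alternative to issuing the status request repeatedly by hand, you can script the polling. This is a minimal sketch; it assumes jq is installed and that you substitute the operation name returned by the import method.

    # Poll the long-running operation every 30 seconds until done is true.
    OPERATION="projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/OPERATION_NAME"
    until curl -s \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      "https://recommendationengine.googleapis.com/v1beta1/${OPERATION}" \
      | jq -e '.done == true' > /dev/null; do
      sleep 30
    done
    echo "Import operation finished."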

Setting up access to your BigQuery dataset

To set up access when your BigQuery dataset is in a different project than your Recommendations AI service, complete the following steps.

  1. Open the IAM page in the Google Cloud console.

    Open the IAM page

  2. Select your Recommendations AI project.

  3. Find the service account with the name AutoML Recommendations Service Account.

    If you have not previously initiated an import operation with Recommendations AI, this service account might not be listed. If you do not see this service account, return to the import task and initiate the import. When it fails due to permission errors, return here and complete this task.

  4. Copy the identifier for the service account, which looks like an email address (for example, service-525@gcp-sa-recommendationengine.iam.gserviceaccount.com).

  5. Switch to your BigQuery project (on the same IAM & Admin page) and click Add.

  6. Enter the identifier for the Recommendations AI service account and select the BigQuery > BigQuery User role.

  7. Click Add another role and select BigQuery > BigQuery Data Editor.

    If you do not want to grant the Data Editor role to the entire project, you can add the role directly to the dataset (see the sketch after these steps). Learn more.

  8. Click Save.
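
If you grant access at the dataset level instead, you can do it with the bq tool rather than the console. This is a sketch under the assumption that your dataset is named my_dataset; dataset-level WRITER access is roughly equivalent to the Data Editor role.

  # Export the dataset's current access policy to a local file.
  bq show --format=prettyjson PROJECT_ID:my_dataset > dataset.json

  # Edit dataset.json and add an entry like the following to the
  # "access" array, then save the file:
  #   { "role": "WRITER",
  #     "userByEmail": "service-525@gcp-sa-recommendationengine.iam.gserviceaccount.com" }

  # Apply the updated access policy to the dataset.
  bq update --source dataset.json PROJECT_ID:my_dataset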

Importing catalog data from Cloud Storage

To import catalog data in JSON format, you create one or more JSON files that contain the catalog data you want to import and upload them to Cloud Storage. From there, you can import them into Recommendations AI.

For an example of the JSON catalog item format, see Catalog item JSON data format.

For help with uploading files to Cloud Storage, see Uploading objects.
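
For example, you can copy the files with gsutil; the bucket and file names here are placeholders.

  # Upload the catalog JSON files to a Cloud Storage bucket and confirm
  # that they arrived.
  gsutil cp catalog-part1.json catalog-part2.json gs://my-catalog-bucket/catalog/
  gsutil ls gs://my-catalog-bucket/catalog/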

cURL

  1. Make sure the Recommendations AI service account has permission to read and write to the bucket.

    The Recommendations AI service account is listed on the IAM page in the Google Cloud console with the name AutoML Recommendations Service Account. Use the principal name, which looks like an email address (for example, service-525@gcp-sa-recommendationengine.iam.gserviceaccount.com), when adding the account to your bucket permissions.

  2. If this is the first time you are uploading your catalog or you are re-importing the catalog after purging it, set your catalog levels by using the Catalog.patch method.

    Supported values for eventItemLevel and predictItemLevel are MASTER and VARIANT.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
     --data '{
       "catalogItemLevelConfig": {
         "eventItemLevel": "event-data-level",
         "predictItemLevel": "prediction-level"
       }
     }' \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog"
    
  3. Create a data file for the input parameters for the import. You use the GcsSource object to point to your Cloud Storage bucket.

    You can provide multiple files, or just one; this example uses two files.

    • input-file: A file or files in Cloud Storage containing your catalog data.
    • error-directory: A Cloud Storage directory for error information about the import.

    The input file fields must be in the format gs://<bucket>/<path-to-file>. The error directory must be in the format gs://<bucket>/<folder>/. If the error directory does not exist, Recommendations AI creates it. The bucket must already exist.

    {
      "inputConfig": {
        "gcsSource": {
          "inputUris": ["input-file1", "input-file2"]
        }
      },
      "errorsConfig": { "gcsPrefix": "error-directory" }
    }
    
  4. Import your catalog information to Recommendations AI by making a POST request to the catalogItems:import REST method, providing the name of the data file (here, shown as input.json).

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" -d @./input.json \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
    

    The easiest way to check the status of your import operation is to use the Google Cloud console. For more information, see Seeing status for a specific integration operation.

    You can also check the status programmatically using the API. You should receive a response object that looks something like this:

    {
      "name": "import-catalog-cat123-5821",
      "done": false
    }
    

    The name field is the ID of the operation object. To check the status, replace [OPERATION_NAME] in the following request with the value of that name field, and repeat the request until the done field returns true:

    curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/[OPERATION_NAME]"
    

    When the operation completes, the returned object has a done value of true, and includes a Status object similar to the following example:

    {
      "name": "import-catalog-cat123-5821",
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.recommendationengine.v1beta1.ImportCatalogItemsResponse"
      },
      "error_samples": [
        { "code": 3, "message": "bad catalog" },
        { "code": 3, "message": "invalid id" }
      ],
      "errors_config": { "gcs_prefix": "gs://error-bucket/error-directory" }
    }
    

    You can inspect the files in the error directory in Cloud Storage to see what kind of errors occurred during the import.
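
    For example, using the error directory from the response above, you can list and read the error files with gsutil:

    # List the error files the import operation wrote, then print their
    # contents to see the individual errors.
    gsutil ls gs://error-bucket/error-directory
    gsutil cat gs://error-bucket/error-directory/*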

Importing catalog data inline

curl

You import your catalog information to Recommendations AI inline by making a POST request to the catalogItems:import REST method, using the catalogInlineSource object to specify your catalog data.

Provide each catalog item in its entirety on a single line, with each item on its own line. Do not include line breaks within the catalog item data.

For an example of the JSON catalog item format, see Catalog item JSON data format.
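
If your catalog items are stored pretty-printed, one way to put each item on a single line is jq's compact output. This is a sketch with hypothetical file names; use the second form if your source file holds a single JSON array rather than a stream of objects.

  # Rewrite a stream of pretty-printed JSON objects as one object per line.
  jq -c '.' items.json > items-one-per-line.json

  # If the source file is a single JSON array, unpack it first.
  jq -c '.[]' items-array.json > items-one-per-line.json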

  1. Create the JSON file for your catalog items:

    {
      "inputConfig": {
        "catalogInlineSource": {
          "catalogItems": [
            { CATALOG_ITEM_1 },
            { CATALOG_ITEM_2 }
          ]
        }
      }
    }
    
  2. Call the POST method:

    curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data @./data.json \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import"
    

Catalog item JSON data format

The catalogItem entries in your JSON file should look like the following examples.

Provide each catalog item in its entirety on a single line, with each item on its own line.

Minimum required fields:

  {
    "id": "1234",
    "category_hierarchies": [ { "categories": [ "athletic wear", "shoes" ]  } ],
    "title": "ABC sneakers"
  }
  {
    "id": "5839",
    "category_hierarchies": [ { "categories": [ "casual attire", "t-shirts" ]  } ],
    "title": "Crew t-shirt"
  }

Complete object:

  {
    "id": "1234",
    "category_hierarchies": [ { "categories": [ "athletic wear", "shoes" ] } ],
    "title": "ABC sneakers",
    "description": "Sneakers for the rest of us",
    "language_code": "en",
    "tags": [ ],
    "item_group_id": "abcshoe123",
    "product_metadata": {
      "exact_price": {
        "display_price": 99.98,
        "original_price": 111.99
      },
      "costs": {
        "manufacturing": 35.99,
        "other": 20
      },
      "currency_code": "USD",
      "canonical_product_uri": "https://www.example.com/products/1234",
      "images": [ ]
    }
  }
  {
    "id": "5839",
    "category_hierarchies": [ { "categories": [ "casual attire", "t-shirts" ] } ],
    "title": "Crew t-shirt",
    "description": "Crew t-shirt with design",
    "language_code": "en",
    "tags": [ ],
    "item_group_id": "xyzshirt456",
    "product_metadata": {
      "exact_price": {
        "display_price": 14.98,
        "original_price": 18.99
      },
      "costs": {
        "manufacturing": 5.17,
        "other": 15
      },
      "currency_code": "USD",
      "canonical_product_uri": "https://www.example.com/products/5839",
      "images": [ ]
    }
  }

Keeping your catalog up to date

Recommendations AI relies on having current product information to provide you with the best recommendations. We recommend that you import your catalog daily to ensure that it is current. You can use Cloud Scheduler to schedule imports, as in the following sketch.
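
This is a minimal sketch rather than a complete setup: the job name, schedule, and service account are assumptions, the service account must have permission to call the import method, and input.json is the request body described in Importing catalog data. Depending on your project configuration, you might also need to pass a location for the job.

  # Create a Cloud Scheduler job that calls the catalogItems:import
  # method every day at 02:00, authenticating as a service account.
  gcloud scheduler jobs create http daily-catalog-import \
    --schedule="0 2 * * *" \
    --http-method=POST \
    --uri="https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/catalogItems:import" \
    --oauth-service-account-email="SERVICE_ACCOUNT_EMAIL" \
    --headers="Content-Type=application/json" \
    --message-body-from-file=./input.json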

You can import only new or changed catalog items, or you can import the entire catalog. Catalog items that are already in your catalog and are unchanged are not added again; any item that has changed is updated.

To update a single item, see Updating catalog information.

Batch updating

You can use the import method to batch update your catalog. You do this the same way you do the initial import; follow the steps in Importing catalog data.

Monitoring import health

Keeping your catalog up to date is important for getting high-quality recommendations. You should monitor the import error rates and take action if needed. For more information, see Setting up alerts for data upload issues.
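
For a quick manual check, you can also list the recent operations on the catalog and look at their error samples. The list request below follows the standard long-running-operations pattern implied by the status requests earlier on this page; verify the exact path against the API reference. The error_samples field name follows the example responses above, and jq is assumed to be installed.

  # List recent operations on the default catalog and summarize the
  # name, completion state, and any error samples for each one.
  curl -s \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://recommendationengine.googleapis.com/v1beta1/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations" \
    | jq '.operations[]? | {name, done, error_samples}'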

What's next