Import autocomplete data for search

Autocomplete is a feature for predicting the rest of a word a user is typing, which can improve the user search experience. It can provide typeahead suggestions based on a dataset that you provide or on user events that you provide.

Consider importing autocomplete data only if you want to have additional controls (Do Not Remove List, Deny List) or if you need to use your own autocomplete data. Turning on auto-learning is sufficient for most cases where autocomplete is needed. Auto-learning provides a machine learning-powered suggestion dataset based on user search events. See Autocomplete for how to turn on auto-learning.

These instructions are for uploading your own autocomplete data only. Keep it up to date if you plan to use your autocomplete dataset all the time. For getting autocomplete results at query time, refer to CompletionService.CompleteQuery. Autocomplete data is used only for search. This data is not used by recommendations.

Before you begin

Before you can import your autocomplete information, you must have completed the instructions in Before you begin, specifically setting up your project, creating a service account, and adding the service account to your local environment.

You must have the Retail Editor IAM role to perform the import.

Autocomplete import best practices

When you import autocomplete data, ensure that you implement the following best practices:

  • Read the BigQuery schema listed in the following sections and API documentation.

  • Do not use placeholder values.

  • Include as many fields as possible.

  • Keep your autocomplete dataset up to date if you plan to use your own uploaded dataset.

  • Importing data from another project is disallowed.

Import autocomplete data

Import autocomplete data from BigQuery

Vertex AI Search for retail supports BigQuery data import for Deny List, Do Not Remove List, and Suggestion Terms List. See more details in Autocomplete.

To import autocomplete data in the correct format from BigQuery, use the Vertex AI Search for retail autocomplete schema to create a BigQuery table with the correct format and load the table with your autocomplete data. Then, upload your data to Vertex AI Search for retail.

For more help with BigQuery tables, see Introduction to tables. For help with BigQuery queries, see Overview of querying BigQuery data.

BigQuery dataset location

When you first create your BigQuery dataset for your autocomplete BigQuery tables, make sure the dataset location is set to the multi-region location "US". Not setting it correctly will cause your import request to fail later. To learn more about BigQuery dataset locations, see Dataset locations in the BigQuery documentation.

Populate data to BigQuery

Use the Vertex AI Search for retail autocomplete schema to upload your autocomplete data to BigQuery.

BigQuery can use the schema to validate whether JSON-formatted data has correct field names and types (such as STRING, INTEGER, and RECORD), but cannot perform validations such as determining:

  • If a string field maps to a recognizable enum value.
  • If a string field uses the correct format.
  • If an integer or float field has a value in a valid range.
  • If a missing field is a required field.

To ensure the quality of your data and the end user search experience, make sure you refer to the schema and reference documentation for details about values and format.
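
Because BigQuery does not perform these checks, it can help to validate rows yourself before loading them. The following Python sketch is one way to do that, not an official validator. The function name `validate_suggestion` is hypothetical, and the length and score limits are assumptions for illustration; confirm the real limits in the schema and reference documentation. The rule that a row needs either globalScore or frequency follows the minimum required fields shown in Autocomplete data format.

```python
from __future__ import annotations

from typing import Any

# Assumed limits, for illustration only. Confirm them against the schema reference.
MAX_SUGGESTION_LENGTH = 128   # assumption
SCORE_RANGE = (0.0, 1.0)      # assumption


def validate_suggestion(row: dict[str, Any]) -> list[str]:
    """Return the problems found in one suggestion row (empty list if none)."""
    problems: list[str] = []

    # Required field: a non-empty suggestion string.
    suggestion = row.get("suggestion")
    if not isinstance(suggestion, str) or not suggestion.strip():
        problems.append("missing or empty 'suggestion'")
    elif len(suggestion) > MAX_SUGGESTION_LENGTH:
        problems.append("'suggestion' is too long")

    # The minimal examples carry either a globalScore or a frequency.
    if "globalScore" not in row and "frequency" not in row:
        problems.append("need 'globalScore' or 'frequency'")

    # Range check for the score (values may arrive as strings).
    if "globalScore" in row:
        try:
            score = float(row["globalScore"])
        except (TypeError, ValueError):
            problems.append("'globalScore' is not a number")
        else:
            if not SCORE_RANGE[0] <= score <= SCORE_RANGE[1]:
                problems.append("'globalScore' is out of range")

    # Format check for the frequency.
    if "frequency" in row:
        try:
            frequency = int(row["frequency"])
        except (TypeError, ValueError):
            problems.append("'frequency' is not an integer")
        else:
            if frequency < 0:
                problems.append("'frequency' is negative")

    return problems
```

Rows for which the function returns a non-empty list can be logged and fixed before the table is loaded.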

Set up access to your BigQuery dataset

To set up access, make sure that your BigQuery dataset is in the same project as your Vertex AI Search for retail service and complete the following steps.

  1. Open the IAM page in the Google Cloud console.

    Open the IAM page

  2. Select your Vertex AI Search for retail project.

  3. On the IAM & Admin page, click Grant Access.

  4. For New principals, enter cloud-retail-customer-data-access@system.gserviceaccount.com and select the BigQuery > BigQuery Data Viewer role.

    If you do not want to provide the Data Viewer role to the entire project, you can add this role directly to the dataset. Learn more.

  5. Click Save.

Trigger data import to Vertex AI Search for retail

Console

  1. Go to the Controls page.

  2. Go to the Autocomplete Controls tab.

  3. In the Term Lists section, find the type of list you plan to import (Deny list, Do Not Remove list, or Suggested terms list) and click Import or Replace.

    The Import pane opens.

  4. Enter the BigQuery path of your data location, or select Browse to select the location.

    The BigQuery table must be in the same project, and its schema must be correct. To check this, click Browse and click the table name to view its contents in the BigQuery console.

  5. In the Import pane, click Import.

    The import begins. You can leave the page without disrupting the import.

cURL

  1. Create a data file for the input parameters for the import. Your input parameter values depend on whether you are importing from Cloud Storage or BigQuery.

    Use the BigQuerySource object to point to your BigQuery dataset.

    • dataset-id: The ID of the BigQuery dataset.
    • table-id: The ID of the BigQuery table holding your data.
    • data-schema: The value of the dataSchema property. Use one of suggestions (default), allowlist, or denylist. The table must follow the Vertex AI Search for retail autocomplete schema.
    {
      "inputConfig":{
        "bigQuerySource": {
          "datasetId":"dataset-id",
          "tableId":"table-id",
          "dataSchema":"data-schema"
        }
      }
    }
    
  2. Import your autocomplete information to Vertex AI Search for retail by making a POST request to the CompletionData:import REST method, providing the name of the data file (shown as input.json in the example below).

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @./input.json \
    "https://retail.googleapis.com/v2alpha/projects/PROJECT_NUMBER/locations/global/catalogs/default_catalog/completionData:import"
    

    You can check the status programmatically using the API. You should receive a response object that looks something like this:

    {
      "name": "projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/123456",
      "done": false
    }
    

    The name field is the ID of the operation object. To request the status of this object, replace the operation name in the following request with the name value returned by the import method. When the import is complete, the done field is returned as true:

    curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://retail.googleapis.com/v2alpha/projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/123456"
    

    When the operation completes, the returned object has a done value of true, and includes a Status object similar to the following example:

    {
      "name": "projects/PROJECT_ID/locations/global/catalogs/default_catalog/operations/123456",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.retail.v2alpha.ImportMetadata",
        "createTime": "2020-01-01T03:33:33.000001Z",
        "updateTime": "2020-01-01T03:34:33.000001Z",
        "successCount": "2",
        "failureCount": "1"
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.retail.v2alpha.ImportCompletionDataResponse"
      }
    }
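
    The request pieces in the cURL steps can also be assembled in code. The following Python sketch builds the input parameters and the two URLs used in the steps. It only builds values and sends nothing. The helper names (`build_import_body`, `import_url`, `operation_url`) are hypothetical and are not part of any client library.

```python
import json

# dataSchema values named in the steps; "suggestions" is the default.
VALID_SCHEMAS = ("suggestions", "allowlist", "denylist")
API_ROOT = "https://retail.googleapis.com/v2alpha"


def build_import_body(dataset_id: str, table_id: str,
                      data_schema: str = "suggestions") -> dict:
    """Build the JSON body that points the import at a BigQuery table."""
    if data_schema not in VALID_SCHEMAS:
        raise ValueError(f"dataSchema must be one of {VALID_SCHEMAS}")
    return {
        "inputConfig": {
            "bigQuerySource": {
                "datasetId": dataset_id,
                "tableId": table_id,
                "dataSchema": data_schema,
            }
        }
    }


def import_url(project: str) -> str:
    """URL of the completionData:import method for the default catalog."""
    return (f"{API_ROOT}/projects/{project}/locations/global"
            "/catalogs/default_catalog/completionData:import")


def operation_url(operation_name: str) -> str:
    """URL for polling an operation, given the name returned by the import."""
    return f"{API_ROOT}/{operation_name}"


if __name__ == "__main__":
    # Placeholder IDs; print the body that would be saved as input.json.
    print(json.dumps(build_import_body("dataset-id", "table-id", "denylist"), indent=2))
```

    Rejecting an unknown dataSchema value locally avoids sending a request that would fail only after the import is submitted.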
    

Autocomplete data format

Your JSON file should look like the following examples. The line breaks are for readability; you should provide an entire suggestion on a single line. Each suggestion should be on its own line.

Suggestion minimum required fields:

{
  "suggestion": "ABC",
  "globalScore": "0.5"
}

Or:

{
  "suggestion": "ABC",
  "frequency": "100"
}
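
The one-suggestion-per-line layout can be produced with a few lines of code. The following Python sketch serializes each suggestion to a single line of JSON. The function name `to_ndjson` is hypothetical.

```python
import json
from typing import Iterable


def to_ndjson(suggestions: Iterable[dict]) -> str:
    """Serialize suggestions as newline-delimited JSON, one object per line."""
    # json.dumps without indent never emits raw line breaks, so each
    # suggestion stays on its own line even if a value contains "\n".
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in suggestions)


if __name__ == "__main__":
    rows = [
        {"suggestion": "ABC", "globalScore": "0.5"},
        {"suggestion": "ABC", "frequency": "100"},
    ]
    print(to_ndjson(rows), end="")
```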

Autocomplete data import duration

An import from BigQuery usually takes from a few minutes to one hour to complete.

When the dataset import is finished, the done field in the operation object is marked as true. After that, it might take an additional 1 to 2 days for the data to be indexed and used in production serving.

Keep your autocomplete dataset up to date

If you plan to use your own uploaded dataset, it's a best practice to update it on a regular basis.

Batch update

You can use the import method to batch update your autocomplete data. You do this the same way you do the initial import; follow the steps in Import autocomplete data. Each import replaces the whole imported dataset.

Monitor import health

Keeping your own dataset up to date is important for getting high-quality suggestion results when you use it. You should monitor the import error rates and take action if needed.
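
One simple way to track import errors is to compute a failure rate from the successCount and failureCount fields of a completed operation. The following Python sketch does this. The function names and the 5% threshold are assumptions for illustration, not product recommendations.

```python
from typing import Optional


def import_failure_rate(operation: dict) -> Optional[float]:
    """Return failed / total rows for a finished operation, or None if not done."""
    if not operation.get("done"):
        return None
    metadata = operation.get("metadata", {})
    # The counts appear as strings in the operation object, so convert them.
    success = int(metadata.get("successCount", 0))
    failure = int(metadata.get("failureCount", 0))
    total = success + failure
    return failure / total if total else 0.0


def needs_attention(operation: dict, max_failure_rate: float = 0.05) -> bool:
    """Flag a finished import whose failure rate exceeds an assumed threshold."""
    rate = import_failure_rate(operation)
    return rate is not None and rate > max_failure_rate
```

For an operation that reports a successCount of "2" and a failureCount of "1", the failure rate is one third, which exceeds the assumed threshold.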

Vertex AI Search for retail autocomplete schema

When importing an autocomplete dataset from BigQuery, use the Vertex AI Search for retail schemas below to create BigQuery tables with the correct format and load them with your autocomplete data.

Schema for suggestions

This dataset is used to provide your own autocomplete suggestion phrases with your own scores.

Schema for denylist

This dataset is used as a denylist to block phrases from being suggested.

Schema for allowlist

This dataset is used for skipping post processes (such as spell correction and zero-result filtering) for all the phrases in this allowlist.