Import historical user events

This page describes how to import user event data from past events in bulk. Discovery for Media models require user event data for training.

After you've set up real-time event recording, it can take a considerable amount of time to record sufficient user event data to train your models. You can accelerate initial model training by importing user event data from past events in bulk. Before doing so, review the best practices for recording user events and the Before you begin section on this page.

You can:

  • Import user events from Cloud Storage
  • Import user events from BigQuery
  • Import user events inline

Before you begin

To avoid import errors and ensure Discovery for Media has sufficient data to generate good results, review the following information before importing your user events.

Event import considerations

This section describes the methods you can use to batch import your historical user events, when you might use each method, and some of their limitations.

Cloud Storage

Description: Import data in JSON format from files loaded in a Cloud Storage bucket. Each file must be 2 GB or smaller, and up to 100 files at a time can be imported. The import can be done using the Google Cloud console or cURL. Uses the user event JSON data format, which allows custom attributes.
When to use: If you need higher volumes of data to be loaded in a single step.
Limitations: If your data is not in Cloud Storage, this method requires the extra step of first importing it to Cloud Storage.

BigQuery

Description: Import data from a previously loaded BigQuery table that uses the BigQuery schema for Discovery for Media. Can be performed using the Google Cloud console or cURL.
When to use: If you are preprocessing event data before importing it.
Limitations: Requires the extra step of creating a BigQuery table that maps to the BigQuery schema for Discovery for Media. If you have a high volume of user events, also consider that BigQuery is a higher-cost resource than Cloud Storage.

Inline import

Description: Import by including the event data inline in a call to the userEvents.import method.
When to use: If you want the increased privacy of having all authentication occur on the backend and are capable of performing a backend import.
Limitations: Usually more complicated than a web import.

Import user events from Cloud Storage

Import user events from Cloud Storage using the Google Cloud console or the userEvents.import method.

Console

  1. Go to the Discovery Engine Data page in the Google Cloud console.

    Go to the Data page
  2. Click Import to open the Import Data panel.
  3. For Data Type, select User events.
  4. Select Google Cloud Storage as the data source.
  5. Enter the Cloud Storage location of your data.
  6. Click Import.
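
If your events file isn't in Cloud Storage yet, you can upload it first with the gsutil CLI. A minimal sketch, assuming a newline-delimited JSON events file and placeholder bucket and object names:

# Upload a newline-delimited JSON user events file to Cloud Storage.
# The bucket and object names are placeholders, not values from this page.
gsutil cp ./user_events.json gs://my-events-bucket/events/user_events.json

# Confirm the object exists before entering its location in the Import Data panel.
gsutil ls -l gs://my-events-bucket/events/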

cURL

Use the userEvents.import method to import your user events.

  1. Create a data file with the input parameters for the import. Use the GcsSource object to point to your Cloud Storage bucket.

    You can provide multiple files, or just one.

    • INPUT_FILE: A file or files in Cloud Storage containing your user event data. See About user events for examples of each user event type format. Make sure each user event is on its own single line, with no line breaks.
    • ERROR_DIRECTORY: A Cloud Storage directory for error information about the import.

    The input file fields must be in the format gs://<bucket>/<path-to-file>. The error directory must be in the format gs://<bucket>/<folder>/. If the error directory does not exist, Discovery for Media creates it. The bucket must already exist.

    {
      "gcsSource": {
        "inputUris": ["INPUT_FILE_1", "INPUT_FILE_2"],
        "dataSchema": "user_event"
      },
      "errorConfig": {
        "gcsPrefix": "ERROR_DIRECTORY"
      }
    }
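
    For illustration only, a two-line input file might look like the following sketch, with each event on its own line. The field values are placeholders; the exact fields for each event type are defined in About user events:

    {"eventType": "media-play", "userPseudoId": "user-123", "eventTime": "2023-04-23T18:25:43.511Z", "documents": [{"id": "doc-1"}]}
    {"eventType": "view-item", "userPseudoId": "user-123", "eventTime": "2023-04-23T18:30:01.000Z", "documents": [{"id": "doc-2"}]}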
  2. Import your user events to Discovery for Media by making a POST request to the userEvents:import REST method, providing the name of the data file.

    export GOOGLE_APPLICATION_CREDENTIALS=/tmp/my-key.json
    
    curl -X POST \
         -v \
         -H "Content-Type: application/json; charset=utf-8" \
         -H "Authorization: Bearer $(gcloud auth print-access-token)" \
         --data @./DATA_FILE.json \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import"
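
The import call returns a long-running operation. As a sketch, you can poll it until it completes by replacing OPERATION_NAME with the full name field from the import response (the exact resource path comes from that response, not from this page):

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1beta/OPERATION_NAME"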

Import user events from BigQuery

Import user events from BigQuery using the Google Cloud console or the userEvents.import method.

Set up BigQuery access

Follow the instructions in Setting up access to your BigQuery dataset to give your Discovery Engine service account the BigQuery Data Owner role for your BigQuery dataset.
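
If you prefer the command line, one way to add a dataset-level role is with the bq CLI. This is a minimal sketch with placeholder identifiers; the exact service account email comes from the linked setup instructions, not from this page:

# Export the dataset's current access policy to a local file.
bq show --format=prettyjson PROJECT_ID:DATASET_ID > dataset.json

# Edit dataset.json and append an entry like the following to the "access" array
# (SERVICE_ACCOUNT_EMAIL is a placeholder for your Discovery Engine service account):
#   {"role": "OWNER", "userByEmail": "SERVICE_ACCOUNT_EMAIL"}

# Apply the updated policy back to the dataset.
bq update --source dataset.json PROJECT_ID:DATASET_ID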

Import your user events from BigQuery

Import events from BigQuery using the console

  1. Go to the Discovery Engine Data page in the Google Cloud console.

    Go to the Data page
  2. Click Import to open the Import Data panel.
  3. For Data Type, select User events.
  4. Select BigQuery as the data source.
  5. Enter the BigQuery table where your data is located.
  6. Click Import.
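
Before running the import, you can sanity-check the source table from the command line. A quick sketch with placeholder identifiers:

# Print the table's schema to confirm it maps to the BigQuery schema for Discovery for Media.
bq show --schema --format=prettyjson PROJECT_ID:DATASET_ID.TABLE_ID

# Preview the first few rows of event data.
bq head -n 5 PROJECT_ID:DATASET_ID.TABLE_ID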

Import events from BigQuery using the API

Import your user events by referencing the BigQuery table in your call to the userEvents.import method. See the userEvents.import API reference.

When importing your events, use the value user_event for dataSchema.

export GOOGLE_APPLICATION_CREDENTIALS=/tmp/my-key.json

curl \
  -v \
  -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "Authorization: Bearer "$(gcloud auth print-access-token)"" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import" \
  --data '{
      "bigquerySource": {
          "projectId":"PROJECT_ID",
          "datasetId": "DATASET_ID",
          "tableId": "TABLE_ID",
          "dataSchema": "user_event"
      }
    }'
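
If your events are still in newline-delimited JSON files, one way to stage them into a BigQuery table is a bq load job. This is a sketch with placeholder names; it assumes a local schema file that maps to the BigQuery schema for Discovery for Media:

# Load newline-delimited JSON events from Cloud Storage into a BigQuery table.
# user_events_schema.json is an assumed local copy of the BigQuery schema
# for Discovery for Media; the bucket path is a placeholder.
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  DATASET_ID.TABLE_ID \
  gs://my-events-bucket/events/user_events.json \
  ./user_events_schema.json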

Import user events inline

You can import user events inline by including the data for the events in your call to the userEvents.import method.

The easiest way to do this is to put your user event data into a JSON file and provide the file to cURL.

For the formats of the user event types, see About user events.

  1. Create the JSON file:

    {
      "userEventInlineSource": {
        "userEvents": [
          {
            <userEvent1>
          },
          {
            <userEvent2>
          },
          ...
        ]
      }
    }
    
  2. Call the POST method:

    curl -X POST \
         -H "Authorization: Bearer $(gcloud auth print-access-token)" \
         -H "Content-Type: application/json; charset=utf-8" \
         --data @./data.json \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import"
    

What's next