This page describes how to import user event data from past events in bulk. Discovery for Media models require user event data for training.
After you've set up real-time event recording, it can take a considerable amount of time to record sufficient user event data to train your models. You can accelerate initial model training by importing user event data from past events in bulk. Before doing so, review the best practices for recording user events and the Before you begin section on this page.
You can:
- Import events from Cloud Storage.
- Import events from BigQuery.
- Import events inline with the userEvents.import method.
Before you begin
To avoid import errors and ensure Discovery for Media has sufficient data to generate good results, review the following information before importing your user events.
Review the best practices for recording user events.
User event formatting differs depending on the user event type. See User event types and example schemas for the format to specify when creating tables for each event type.
See User event requirements and best practices for general requirements, and User event data requirements for requirements that depend on the recommendation model type and optimization objective you plan to use.
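For reference, each record you import is a single JSON user event. As a hedged illustration only (the field names follow the v1beta UserEvent resource, and the required fields vary by event type, so verify against the schema references above), a minimal view-item event might look like this:

```json
{
  "eventType": "view-item",
  "userPseudoId": "user-pseudo-id-1234",
  "eventTime": "2024-04-23T18:25:43.511Z",
  "documents": [
    { "id": "document-id-1" }
  ]
}
```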
Event import considerations
This section describes the methods you can use to batch import your historical user events, when you might use each method, and some of their limitations.
| Import method | Description | When to use | Limitations |
|---|---|---|---|
| Cloud Storage | Import data in JSON format from files loaded in a Cloud Storage bucket. Each file must be 2 GB or smaller, and up to 100 files at a time can be imported. The import can be done using the Google Cloud console or cURL. Uses the user event JSON data format, which allows custom attributes. | If you need higher volumes of data to be loaded in a single step. | If your data is not already in Cloud Storage, requires the extra step of first importing it to Cloud Storage. |
| BigQuery | Import data from a previously loaded BigQuery table that uses the BigQuery schema for Discovery for Media. Can be performed using the Google Cloud console or cURL. | If you are preprocessing event data before importing it. | Requires the extra step of creating a BigQuery table that maps to the BigQuery schema for Discovery for Media. If you have a high volume of user events, also consider that BigQuery is a higher-cost resource than Cloud Storage. |
| Inline import | Import using a call to the userEvents.import method. | If you want the increased privacy of having all authentication occur on the backend and are capable of performing a backend import. | Usually more complicated than a web import. |
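If your data is not yet in Cloud Storage (the limitation noted above), you can stage it with the gsutil CLI. A minimal sketch, where the bucket and file names are placeholders:

```shell
# Copy a local newline-delimited JSON file of user events into an existing bucket.
gsutil cp ./user_events.json gs://YOUR_BUCKET/events/user_events.json
```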
Import user events from Cloud Storage
Import user events from Cloud Storage using the Google Cloud console or the userEvents.import method.
Console
- Go to the Discovery Engine Data page in the Google Cloud console.
- Click Import to open the Import Data panel.
- For Data Type, select User events.
- Select Google Cloud Storage as the data source.
- Enter the Cloud Storage location of your data.
- Click Import.
cURL
Use the userEvents.import method to import your user events.
Create a data file for the input parameters for the import. Use the GcsSource object to point to your Cloud Storage bucket. You can provide multiple files, or just one.

- INPUT_FILE: A file or files in Cloud Storage containing your user event data. See About user events for examples of each user event type format. Make sure each user event is on its own single line, with no line breaks (a jq sketch for producing this format follows the request body below).
- ERROR_DIRECTORY: A Cloud Storage directory for error information about the import.

The input file fields must be in the format `gs://<bucket>/<path-to-file>/`. The error directory must be in the format `gs://<bucket>/<folder>/`. If the error directory does not exist, Discovery for Media creates it. The bucket must already exist.

```json
{
  "gcsSource": {
    "inputUris": ["INPUT_FILE_1", "INPUT_FILE_2"],
    "dataSchema": "user_event"
  },
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}
```
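If your events are currently stored as a single JSON array rather than one event per line, a jq one-liner can produce the required newline-delimited format. A sketch, assuming jq is installed and events.json holds a top-level array of user events:

```shell
# Emit each array element as one compact line (newline-delimited JSON).
jq -c '.[]' events.json > user_events.json
```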
Import your user events to Discovery for Media by making a POST request to the userEvents:import REST method, providing the name of the data file.

```shell
export GOOGLE_APPLICATION_CREDENTIALS=/tmp/my-key.json

curl -X POST \
  -v \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data @./DATA_FILE.json \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import"
```
Import user events from BigQuery
Import user events from BigQuery using the Google Cloud console or the userEvents.import method.
Set up BigQuery access
Follow the instructions in Setting up access to your BigQuery dataset to give your Discovery Engine service account a BigQuery Data Owner role for your BigQuery dataset.
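If you prefer the command line for this step, dataset-level access can also be granted by editing the dataset's access list with the bq tool. A sketch, with the project, dataset, and service account as placeholders:

```shell
# Export the dataset's current access policy.
bq show --format=prettyjson PROJECT_ID:DATASET_ID > dataset_access.json

# Manually add an entry such as the following to the "access" array in
# dataset_access.json, then write the policy back:
#   { "role": "OWNER", "userByEmail": "SERVICE_ACCOUNT_EMAIL" }
bq update --source dataset_access.json PROJECT_ID:DATASET_ID
```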
Import your user events from BigQuery
You can import events from BigQuery using the Google Cloud console or the userEvents.import method.
Import events from BigQuery using the console
- Go to the Discovery Engine Data page in the Google Cloud console.
- Click Import to open the Import Data panel.
- For Data Type, select User events.
- Select BigQuery as the data source.
- Enter the BigQuery table where your data is located.
- Click Import.
Import events from BigQuery using the API
Import your user events by providing the BigQuery table details in your call to the userEvents.import method. See the userEvents.import API reference.

When importing your events, use the value `user_event` for `dataSchema`.
```shell
export GOOGLE_APPLICATION_CREDENTIALS=/tmp/my-key.json

curl \
  -v \
  -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import" \
  --data '{
    "bigquerySource": {
      "projectId": "PROJECT_ID",
      "datasetId": "DATASET_ID",
      "tableId": "TABLE_ID",
      "dataSchema": "user_event"
    }
  }'
```
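If you still need to create and populate the BigQuery table (the extra step noted in the considerations table), one way is bq load with a schema file. A sketch, assuming user_events_schema.json contains the BigQuery schema for Discovery for Media user events and the source file is newline-delimited JSON in Cloud Storage:

```shell
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  DATASET_ID.TABLE_ID \
  gs://YOUR_BUCKET/events/user_events.json \
  ./user_events_schema.json
```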
Import user events inline
You can import user events inline by including the data for the events in your call to the userEvents.import method.
The easiest way to do this is to put your user event data into a JSON file and provide the file to cURL.
For the formats of the user event types, see About user events.
Create the JSON file:
{ "userEventInlineSource": { "userEvents": [ { <userEvent1>> }, { <userEvent2> }, .... ] } }
Call the POST method:
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data @./data.json \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/dataStores/default_data_store/userEvents:import"
```