Method: projects.locations.collections.dataStores.trainCustomModel

Trains a custom model.

HTTP request

POST https://discoveryengine.googleapis.com/v1beta/{dataStore=projects/*/locations/*/collections/*/dataStores/*}:trainCustomModel

The URL uses gRPC Transcoding syntax.

Path parameters

Parameters
dataStore

string

Required. The resource name of the Data Store, such as projects/*/locations/global/collections/default_collection/dataStores/default_data_store. This field identifies the data store in which the models are trained.
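As an illustration of how the dataStore path parameter fits into the request URL, here is a minimal sketch. The helper name train_custom_model_url and the example project ID my-project are hypothetical; only the URL template and the resource name pattern come from this page.

```python
from urllib.parse import quote

# Base URL taken from the HTTP request template above.
API_ROOT = "https://discoveryengine.googleapis.com/v1beta"


def train_custom_model_url(project, location, collection, data_store):
    """Build the POST URL for trainCustomModel (hypothetical helper)."""
    # Each segment is percent-encoded so that an unexpected "/" cannot
    # change the shape of the resource name.
    data_store_name = (
        f"projects/{quote(project, safe='')}"
        f"/locations/{quote(location, safe='')}"
        f"/collections/{quote(collection, safe='')}"
        f"/dataStores/{quote(data_store, safe='')}"
    )
    return f"{API_ROOT}/{data_store_name}:trainCustomModel"


# "my-project" is a placeholder project ID.
url = train_custom_model_url(
    "my-project", "global", "default_collection", "default_data_store"
)
print(url)
```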

Request body

The request body contains data with the following structure:

JSON representation
{
  "modelType": string,
  "errorConfig": {
    object (ImportErrorConfig)
  },
  "modelId": string,

  // Union field training_input can be only one of the following:
  "gcsTrainingInput": {
    object (GcsTrainingInput)
  }
  // End of list of possible types for union field training_input.
}
Fields
modelType

string

Model to be trained. Supported values are:

  • search-tuning: Fine-tunes the search system based on the data provided.
errorConfig

object (ImportErrorConfig)

The desired location of errors incurred during data ingestion and training.

modelId

string

The ID of the model to train. If not provided, a UUID will be generated.

Union field training_input. Model training input. training_input can be only one of the following:
gcsTrainingInput

object (GcsTrainingInput)

Cloud Storage training input.
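The request body described above could be assembled along these lines. This is a sketch only: build_train_request is a hypothetical helper, the bucket and file names are placeholders, and errorConfig is passed through as an opaque dictionary because the ImportErrorConfig structure is not described on this page.

```python
import json


def build_train_request(gcs_training_input, model_id=None, error_config=None):
    """Assemble a trainCustomModel request body (hypothetical helper)."""
    body = {
        # search-tuning is the only supported value listed on this page.
        "modelType": "search-tuning",
        # gcsTrainingInput is the only member of the training_input union.
        "gcsTrainingInput": gcs_training_input,
    }
    # Optional fields are omitted entirely when not supplied, so the
    # service can apply its defaults (for example, generating a UUID
    # when modelId is absent).
    if model_id is not None:
        body["modelId"] = model_id
    if error_config is not None:
        body["errorConfig"] = error_config
    return body


# Placeholder Cloud Storage paths; replace with real objects.
request_body = build_train_request(
    {
        "corpusDataPath": "gs://example-bucket/corpus.jsonl",
        "queryDataPath": "gs://example-bucket/queries.jsonl",
        "trainDataPath": "gs://example-bucket/train.tsv",
    }
)
print(json.dumps(request_body, indent=2))
```

Leaving testDataPath out, as in this example, relies on the random train/test split described under GcsTrainingInput.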

Response body

If successful, the response body contains an instance of Operation.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the dataStore resource:

  • discoveryengine.dataStores.trainCustomModel

For more information, see the IAM documentation.

GcsTrainingInput

Cloud Storage training data input.

JSON representation
{
  "corpusDataPath": string,
  "queryDataPath": string,
  "trainDataPath": string,
  "testDataPath": string
}
Fields
corpusDataPath

string

The Cloud Storage corpus data that can be referenced by the training data. The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file must be newline-delimited JSON (jsonl/ndjson).

For the search-tuning model, each line should have the Id, title, and text. Example: {"Id": "doc1", "title": "relevant doc", "text": "relevant text"}

queryDataPath

string

The Cloud Storage query data that can be referenced by the training data. The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file must be newline-delimited JSON (jsonl/ndjson).

For the search-tuning model, each line should have the Id and text. Example: {"Id": "query1", "text": "example query"}
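The corpus and query files share the same newline-delimited JSON layout, so one small serializer can produce both. The sketch below uses the field names from the examples above; the to_jsonl helper is hypothetical, and uploading the result to Cloud Storage is left out.

```python
import json


def to_jsonl(records, required_keys):
    """Serialize records as newline-delimited JSON (hypothetical helper)."""
    lines = []
    for record in records:
        # Fail early if a record lacks a field the file format calls for.
        missing = [key for key in required_keys if key not in record]
        if missing:
            raise ValueError(f"record {record!r} is missing {missing}")
        lines.append(json.dumps(record, ensure_ascii=False))
    # One JSON object per line, each line newline-terminated.
    return "".join(line + "\n" for line in lines)


corpus_jsonl = to_jsonl(
    [{"Id": "doc1", "title": "relevant doc", "text": "relevant text"}],
    required_keys=("Id", "title", "text"),
)
query_jsonl = to_jsonl(
    [{"Id": "query1", "text": "example query"}],
    required_keys=("Id", "text"),
)
print(corpus_jsonl, end="")
print(query_jsonl, end="")
```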

trainDataPath

string

Cloud Storage training data path, in the format gs://<bucket_to_data>/<tsv_file_name>. The file should be in TSV format. Each line should have the docId, the queryId, and a numeric score.

For the search-tuning model, the file should have query-id, corpus-id, and score as its TSV header. The score should be a number in [0, +inf). The larger the number, the more relevant the pair. Example:

  • query-id\tcorpus-id\tscore
  • query1\tdoc1\t1
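A training file in this layout could be generated as follows. This is a minimal sketch: to_training_tsv is a hypothetical helper, and the score check simply mirrors the stated range by rejecting negative, infinite, and NaN values.

```python
import csv
import io
import math

# Header columns in the order shown in the example above.
HEADER = ("query-id", "corpus-id", "score")


def to_training_tsv(rows):
    """Render (query_id, corpus_id, score) rows as TSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for query_id, corpus_id, score in rows:
        # Scores must be finite and non-negative.
        if not (math.isfinite(score) and score >= 0):
            raise ValueError(f"score out of range for {query_id}/{corpus_id}: {score}")
        writer.writerow((query_id, corpus_id, score))
    return buffer.getvalue()


train_tsv = to_training_tsv([("query1", "doc1", 1)])
print(train_tsv, end="")
```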
testDataPath

string

Cloud Storage test data path. The file uses the same format as trainDataPath. If not provided, a random 80/20 train/test split will be performed on trainDataPath.
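If a reproducible split is preferred over the automatic one, the rows can be divided locally and written to separate files for trainDataPath and testDataPath. The sketch below is only an illustration of an 80/20 split; it is not a description of how the service performs its own split, and split_train_test is a hypothetical helper.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists (hypothetical helper)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Ten placeholder (query_id, corpus_id, score) rows.
all_rows = [(f"query{i}", f"doc{i}", 1) for i in range(10)]
train_rows, test_rows = split_train_test(all_rows)
print(len(train_rows), len(test_rows))
```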