Method: projects.locations.collections.dataStores.trainCustomModel


Trains a custom model.

HTTP request

POST https://discoveryengine.googleapis.com/v1beta/{dataStore=projects/*/locations/*/collections/*/dataStores/*}:trainCustomModel

The URL uses gRPC Transcoding syntax.
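As a rough illustration of how the path template expands, the sketch below assembles the request URL from its components. The function name and the example project and data store IDs are hypothetical; only the host, version, and path pattern come from the request line above.

```python
def train_custom_model_url(project, location, collection, data_store,
                           api_version="v1beta"):
    """Expand the {dataStore=...} path template into a full request URL."""
    # Resource name of the data store, matching
    # projects/*/locations/*/collections/*/dataStores/*
    data_store_name = (
        f"projects/{project}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store}"
    )
    return (
        f"https://discoveryengine.googleapis.com/{api_version}/"
        f"{data_store_name}:trainCustomModel"
    )


# Hypothetical identifiers, for illustration only.
url = train_custom_model_url(
    "my-project", "global", "default_collection", "default_data_store"
)
print(url)
```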

Path parameters

Parameters

`dataStore`	`string` Required. The resource name of the Data Store, such as `projects/*/locations/global/collections/default_collection/dataStores/default_data_store`. This field identifies the data store in which to train the models.

Request body

The request body contains data with the following structure:

JSON representation

{
  "modelType": string,
  "errorConfig": {
    object (ImportErrorConfig)
  },
  "modelId": string,

  // Union field training_input can be only one of the following:
  "gcsTrainingInput": {
    object (GcsTrainingInput)
  }
  // End of list of possible types for union field training_input.
}

Fields
`modelType`	`string` The type of model to train. Supported values: `search-tuning`, which fine-tunes the search system based on the data provided.
`errorConfig`	`object (ImportErrorConfig)` The desired location of errors incurred during data ingestion and training.
`modelId`	`string` The ID to use for the model. If not provided, a UUID is generated.
Union field `training_input`. Model training input. `training_input` can be only one of the following:
`gcsTrainingInput`	`object (GcsTrainingInput)` Cloud Storage training input.
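The fields above can be assembled into a request body as in the following minimal sketch. The helper name, the bucket, and the file names are hypothetical. The `errorConfig` value is passed through untouched because the structure of `ImportErrorConfig` is not described on this page.

```python
import json


def build_train_request(gcs_training_input, model_type="search-tuning",
                        model_id=None, error_config=None):
    """Build the JSON body for trainCustomModel as a plain dict."""
    body = {
        "modelType": model_type,
        # Only member of the training_input union listed on this page.
        "gcsTrainingInput": gcs_training_input,
    }
    # Optional fields are omitted when unset, so the service can apply
    # its defaults (for example, generating a UUID for modelId).
    if model_id is not None:
        body["modelId"] = model_id
    if error_config is not None:
        body["errorConfig"] = error_config
    return body


# Hypothetical Cloud Storage paths, for illustration only.
body = build_train_request({
    "corpusDataPath": "gs://example-bucket/corpus.jsonl",
    "queryDataPath": "gs://example-bucket/queries.jsonl",
    "trainDataPath": "gs://example-bucket/train.tsv",
})
print(json.dumps(body, indent=2))
```

Sending the body requires an access token with the OAuth scope listed under Authorization scopes; that step is left out of the sketch.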

Response body

If successful, the response body contains an instance of Operation.
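A small sketch of how a client might interpret the returned Operation once it has been parsed from JSON. It assumes the standard long-running operation fields `done`, `error`, and `response`; the helper name and the sample operation names are hypothetical.

```python
def operation_status(operation):
    """Classify a parsed Operation dict as running, failed, or succeeded."""
    # 'done' is absent or false while training is still in progress.
    if not operation.get("done", False):
        return "running"
    # A finished operation carries either 'error' or 'response'.
    if "error" in operation:
        return "failed"
    return "succeeded"


# Hypothetical operation payloads, for illustration only.
print(operation_status({"name": "operations/example"}))
print(operation_status({"name": "operations/example", "done": True,
                        "response": {}}))
```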

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the dataStore resource:

discoveryengine.dataStores.trainCustomModel

For more information, see the IAM documentation.

GcsTrainingInput

Cloud Storage training data input.

JSON representation
{
  "corpusDataPath": string,
  "queryDataPath": string,
  "trainDataPath": string,
  "testDataPath": string
}

Fields
`corpusDataPath`	`string` The Cloud Storage corpus data that the training data can reference. The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file must be newline-delimited JSON (jsonl/ndjson). For the search-tuning model, each line should have the Id, title, and text. Example: `{"Id": "doc1", "title": "relevant doc", "text": "relevant text"}`
`queryDataPath`	`string` The Cloud Storage query data that the training data can reference. The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file must be newline-delimited JSON (jsonl/ndjson). For the search-tuning model, each line should have the Id and text. Example: `{"Id": "query1", "text": "example query"}`
`trainDataPath`	`string` Cloud Storage training data path, in the format `gs://<bucket_to_data>/<tsv_file_name>`. The file must be in TSV format. Each line should contain a query ID, a document ID, and a numeric score. For the search-tuning model, the file should have `query-id corpus-id score` as its TSV header row. The score should be a number in `[0, +inf)`. The larger the number, the more relevant the pair. Example: `query-id\tcorpus-id\tscore` `query1\tdoc1\t1`
`testDataPath`	`string` Cloud Storage test data. Same format as `trainDataPath`. If not provided, a random 80/20 train/test split is performed on `trainDataPath`.
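To make the file formats above concrete, the following sketch renders sample corpus, query, and training data as strings that could be saved and uploaded to Cloud Storage. The helper names are hypothetical, and the JSON key names are copied from the examples in the field descriptions rather than verified against the service.

```python
import csv
import io
import json


def to_jsonl(records):
    """Serialize dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def to_train_tsv(rows):
    """Render (query_id, corpus_id, score) rows as TSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["query-id", "corpus-id", "score"])
    for query_id, corpus_id, score in rows:
        # Scores must be non-negative; larger means more relevant.
        if score < 0:
            raise ValueError(f"score must be >= 0, got {score}")
        writer.writerow([query_id, corpus_id, score])
    return buffer.getvalue()


corpus = to_jsonl([{"Id": "doc1", "title": "relevant doc",
                    "text": "relevant text"}])
queries = to_jsonl([{"Id": "query1", "text": "example query"}])
train = to_train_tsv([("query1", "doc1", 1)])
print(corpus, queries, train, sep="")
```

Each string can be written to a local file and copied to the bucket paths passed in `corpusDataPath`, `queryDataPath`, and `trainDataPath`.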