Class GcsTrainingInput (0.11.10)

GcsTrainingInput(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Cloud Storage training data input.

Attributes
Name	Description
`corpus_data_path`	`str` Cloud Storage path to the corpus data that the training data can reference. The data path format is `gs://`. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the `_id`, `title`, and `text` fields. Example: `{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}`
`query_data_path`	`str` Cloud Storage path to the query data that the training data can reference. The data path format is `gs://`. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the `_id` and `text` fields. Example: `{"_id": "query1", "text": "example query"}`
`train_data_path`	`str` Cloud Storage path to the training data, in `gs://` format. The file should be in TSV format. Each line should have a query ID, a corpus (document) ID, and a numeric score. For a search-tuning model, the first line should be the tab-separated header `query-id`, `corpus-id`, `score`. The score should be a number in `[0, +inf)`; the larger the score, the more relevant the pair. Example: the header line `query-id\tcorpus-id\tscore` followed by the data line `query1\tdoc1\t1`.
`test_data_path`	`str` Cloud Storage path to the test data. Same format as `train_data_path`. If not provided, a random 80/20 train/test split is performed on `train_data_path`.
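
The three file formats described above can be sketched with the Python standard library alone. This is a minimal illustration, not part of the client library: the helper names `to_jsonl` and `to_train_tsv` are hypothetical, and the check that every ID in the TSV appears in the query and corpus files is an assumed sanity check rather than a documented requirement.

```python
import csv
import io
import json
import math


def to_jsonl(records):
    """Serialize dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def to_train_tsv(rows, known_query_ids, known_corpus_ids):
    """Build the TSV text: a header line, then query-id, corpus-id, score rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["query-id", "corpus-id", "score"])
    for query_id, corpus_id, score in rows:
        value = float(score)
        # Scores must lie in [0, +inf): reject negatives, NaN, and infinity.
        if not (0 <= value < math.inf):
            raise ValueError(f"score out of range: {score!r}")
        # Assumption: IDs in the TSV should exist in the query and corpus files.
        if query_id not in known_query_ids or corpus_id not in known_corpus_ids:
            raise ValueError(f"unknown id in pair: {query_id!r}, {corpus_id!r}")
        writer.writerow([query_id, corpus_id, score])
    return buf.getvalue()


# Sample records taken from the examples in the attribute table.
corpus = [{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}]
queries = [{"_id": "query1", "text": "example query"}]

corpus_jsonl = to_jsonl(corpus)
query_jsonl = to_jsonl(queries)
train_tsv = to_train_tsv(
    [("query1", "doc1", 1)],
    known_query_ids={q["_id"] for q in queries},
    known_corpus_ids={d["_id"] for d in corpus},
)
```

Once files with this content are uploaded to Cloud Storage, their `gs://` paths would be supplied as the `corpus_data_path`, `query_data_path`, and `train_data_path` keyword arguments of `GcsTrainingInput`.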
