Class GcsTrainingInput (0.12.0)

GcsTrainingInput(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Cloud Storage training data input.

Attributes

Name Description
corpus_data_path str
The Cloud Storage corpus data that can be associated with the training data. The data path format is gs://. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the _id, title and text. Example: {"_id": "doc1", "title": "relevant doc", "text": "relevant text"}
query_data_path str
The Cloud Storage query data that can be associated with the training data. The data path format is gs://. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the _id and text. Example: {"_id": "query1", "text": "example query"}
train_data_path str
Cloud Storage training data path, whose format should be gs://. The file should be in TSV format. Each line should have the doc_id, query_id and score (a number). For a search-tuning model, the file should have query-id, corpus-id and score as its TSV header. The score should be a number in [0, inf). The larger the number, the more relevant the pair. Example:
- query-id\tcorpus-id\tscore
- query1\tdoc1\t1
test_data_path str
Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path.
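
The file formats described above can be sketched with the Python standard library alone. This is a minimal illustration, not part of the client library: the helper names (write_jsonl, write_scores_tsv) and the local file names are hypothetical, and uploading the files to Cloud Storage is not shown. The resulting gs:// paths would then be passed as corpus_data_path, query_data_path and train_data_path.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def write_scores_tsv(path, rows):
    """Write (query_id, corpus_id, score) rows under the documented header."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, corpus_id, score in rows:
            # The attribute description requires a non-negative score.
            if score < 0:
                raise ValueError(f"score must be non-negative, got {score}")
            writer.writerow([query_id, corpus_id, score])


# Hypothetical local file names; any names work once uploaded.
out_dir = Path(tempfile.mkdtemp())
corpus_file = out_dir / "corpus.jsonl"
query_file = out_dir / "queries.jsonl"
train_file = out_dir / "train.tsv"

# Records reuse the examples from the attribute descriptions.
write_jsonl(corpus_file, [{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}])
write_jsonl(query_file, [{"_id": "query1", "text": "example query"}])
write_scores_tsv(train_file, [("query1", "doc1", 1)])
```

Using json.dumps and csv.writer rather than manual string formatting keeps quoting and escaping correct when titles or text contain quotes, tabs or newlines.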