Class GcsTrainingInput (0.12.2)

GcsTrainingInput(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Cloud Storage training data input.

Attributes
Name	Description
`corpus_data_path`	`str` The Cloud Storage corpus data that can be associated with the training data. The data path format is `gs://`. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the `_id`, `title`, and `text` fields. Example: `{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}`
`query_data_path`	`str` The Cloud Storage query data that can be associated with the training data. The data path format is `gs://`. The file is newline-delimited JSON (jsonl/ndjson). For a search-tuning model, each line should have the `_id` and `text` fields. Example: `{"_id": "query1", "text": "example query"}`
`train_data_path`	`str` Cloud Storage training data path, whose format should be `gs://`. The file should be in TSV format. Each line should have a query ID, a document ID, and a numeric score. For a search-tuning model, the TSV header should be `query-id`, `corpus-id`, `score`. The score should be a number in `[0, +inf)`; the larger the score, the more relevant the pair. Example: header `query-id\tcorpus-id\tscore`, row `query1\tdoc1\t1`
`test_data_path`	`str` Cloud Storage test data path. Same format as `train_data_path`. If not provided, a random 80/20 train/test split is performed on `train_data_path`.
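
The file formats described above can be sketched with the Python standard library alone. This is a minimal illustration, not part of the client library: the helper names `corpus_jsonl`, `query_jsonl`, and `train_tsv` are hypothetical, and uploading the resulting files to Cloud Storage is left out.

```python
import csv
import io
import json


def corpus_jsonl(docs):
    """Serialize documents as newline-delimited JSON with _id, title, text."""
    return "".join(
        json.dumps({"_id": d["_id"], "title": d["title"], "text": d["text"]}) + "\n"
        for d in docs
    )


def query_jsonl(queries):
    """Serialize queries as newline-delimited JSON with _id and text."""
    return "".join(
        json.dumps({"_id": q["_id"], "text": q["text"]}) + "\n" for q in queries
    )


def train_tsv(pairs):
    """Serialize (query_id, corpus_id, score) tuples as TSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["query-id", "corpus-id", "score"])
    for query_id, corpus_id, score in pairs:
        # Scores are documented as numbers in [0, +inf).
        if score < 0:
            raise ValueError(f"score must be non-negative, got {score}")
        writer.writerow([query_id, corpus_id, score])
    return buf.getvalue()


if __name__ == "__main__":
    print(corpus_jsonl([{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}]), end="")
    print(query_jsonl([{"_id": "query1", "text": "example query"}]), end="")
    print(train_tsv([("query1", "doc1", 1)]), end="")
```

Once files like these are uploaded, their `gs://` URIs would presumably be passed as the `corpus_data_path`, `query_data_path`, and `train_data_path` keyword arguments of `GcsTrainingInput`, with `test_data_path` left unset to get the 80/20 split.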

Last updated 2025-02-28 UTC.