Google Cloud Discovery Engine V1 Client - Class GcsTrainingInput (1.2.0)

Reference documentation and code samples for the Google Cloud Discovery Engine V1 Client class GcsTrainingInput.

Cloud Storage training data input.

Generated from protobuf message google.cloud.discoveryengine.v1.TrainCustomModelRequest.GcsTrainingInput

Namespace

Google \ Cloud \ DiscoveryEngine \ V1 \ TrainCustomModelRequest

Methods

__construct

Constructor.

Parameters
Name Description
data array

Optional. Data for populating the Message object.

↳ corpus_data_path string

The Cloud Storage corpus data that can be associated with the training data. The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id, title and text fields. Example: {"_id": "doc1", "title": "relevant doc", "text": "relevant text"}

↳ query_data_path string

The Cloud Storage query data that can be associated with the training data. The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id and text fields. Example: {"_id": "query1", "text": "example query"}

↳ train_data_path string

Cloud Storage training data path, in the format gs://<bucket_to_data>/<tsv_file_name>. The file should be in TSV format. Each line should have a query ID, a document ID and a score (number). For a search-tuning model, the TSV file header should be query-id, corpus-id and score, separated by tabs. The score should be a number in [0, +inf); the larger the number, the more relevant the pair. Example header row: query-id\tcorpus-id\tscore. Example data row: query1\tdoc1\t1.

↳ test_data_path string

Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path.
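
As a rough illustration of how these keys fit together, the sketch below builds the constructor array and passes it to the class. It assumes the client library is installed through Composer and autoloaded; the helper trainingInputData, the bucket name and the file names are hypothetical placeholders, not part of the library.

```php
<?php
// Hypothetical helper (not part of the client library): builds the array
// accepted by the GcsTrainingInput constructor from a bucket name.
function trainingInputData(string $bucket): array
{
    $base = 'gs://' . $bucket;
    return [
        'corpus_data_path' => $base . '/corpus.jsonl',
        'query_data_path'  => $base . '/queries.jsonl',
        'train_data_path'  => $base . '/train.tsv',
        // 'test_data_path' is left out here, so a random 80/20 train/test
        // split is performed on train_data_path.
    ];
}

$data = trainingInputData('my-tuning-bucket'); // placeholder bucket name

// Construct the message only when the client library is autoloaded, so the
// sketch still runs without it.
$class = 'Google\\Cloud\\DiscoveryEngine\\V1\\TrainCustomModelRequest\\GcsTrainingInput';
if (class_exists($class)) {
    $input = new $class($data);
    echo $input->getTrainDataPath(), PHP_EOL;
}
```

The same values can also be set one at a time with the setters documented below, which return $this and can therefore be chained.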

getCorpusDataPath

The Cloud Storage corpus data that can be associated with the training data.

The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id, title and text fields. Example: {"_id": "doc1", "title": "relevant doc", "text": "relevant text"}

Returns
Type Description
string
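
The corpus file layout described above can be produced with plain PHP. This is a minimal sketch using only the standard library; the helper corpusToJsonl and the sample documents are hypothetical, and uploading the result to Cloud Storage is left out.

```php
<?php
// Hypothetical helper: encodes corpus documents as newline-delimited JSON,
// one object per line with the _id, title and text keys.
function corpusToJsonl(array $docs): string
{
    $lines = [];
    foreach ($docs as $doc) {
        $lines[] = json_encode(
            ['_id' => $doc['_id'], 'title' => $doc['title'], 'text' => $doc['text']],
            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
        );
    }
    // Terminate every record, including the last one, with a newline.
    return implode("\n", $lines) . "\n";
}

$jsonl = corpusToJsonl([
    ['_id' => 'doc1', 'title' => 'relevant doc', 'text' => 'relevant text'],
    ['_id' => 'doc2', 'title' => 'another doc', 'text' => 'more text'],
]);
echo $jsonl;
```

The query data file follows the same pattern with only the _id and text keys on each line.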

setCorpusDataPath

The Cloud Storage corpus data that can be associated with the training data.

The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id, title and text fields. Example: {"_id": "doc1", "title": "relevant doc", "text": "relevant text"}

Parameter
Name Description
var string
Returns
Type Description
$this

getQueryDataPath

The Cloud Storage query data that can be associated with the training data.

The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id and text fields. Example: {"_id": "query1", "text": "example query"}

Returns
Type Description
string

setQueryDataPath

The Cloud Storage query data that can be associated with the training data.

The data path format is gs://<bucket_to_data>/<jsonl_file_name>. The file should be newline-delimited JSON (JSONL/NDJSON). For a search-tuning model, each line should have the _id and text fields. Example: {"_id": "query1", "text": "example query"}

Parameter
Name Description
var string
Returns
Type Description
$this

getTrainDataPath

Cloud Storage training data path, in the format gs://<bucket_to_data>/<tsv_file_name>. The file should be in TSV format. Each line should have a query ID, a document ID and a score (number).

For a search-tuning model, the TSV file header should be query-id, corpus-id and score, separated by tabs. The score should be a number in [0, +inf); the larger the number, the more relevant the pair. Example:

  • query-id\tcorpus-id\tscore
  • query1\tdoc1\t1
Returns
Type Description
string
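
The TSV layout described above can be generated as follows. This is a sketch using only the PHP standard library; the helper trainingPairsToTsv and the sample rows are hypothetical, and the score check mirrors the stated [0, +inf) range rather than any validation the service itself is known to perform.

```php
<?php
// Hypothetical helper: renders (query ID, corpus ID, score) triples as TSV,
// preceded by the query-id / corpus-id / score header row.
function trainingPairsToTsv(array $pairs): string
{
    $rows = ["query-id\tcorpus-id\tscore"];
    foreach ($pairs as [$queryId, $corpusId, $score]) {
        // Scores are expected to be numbers in [0, +inf).
        if (!is_numeric($score) || $score < 0) {
            throw new InvalidArgumentException('Score must be a number >= 0.');
        }
        $rows[] = implode("\t", [$queryId, $corpusId, $score]);
    }
    return implode("\n", $rows) . "\n";
}

echo trainingPairsToTsv([
    ['query1', 'doc1', 1],
    ['query1', 'doc2', 0],
]);
```

The IDs in each row are meant to match the _id values used in the query and corpus JSONL files.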

setTrainDataPath

Cloud Storage training data path, in the format gs://<bucket_to_data>/<tsv_file_name>. The file should be in TSV format. Each line should have a query ID, a document ID and a score (number).

For a search-tuning model, the TSV file header should be query-id, corpus-id and score, separated by tabs. The score should be a number in [0, +inf); the larger the number, the more relevant the pair. Example:

  • query-id\tcorpus-id\tscore
  • query1\tdoc1\t1
Parameter
Name Description
var string
Returns
Type Description
$this

getTestDataPath

Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path.

Returns
Type Description
string

setTestDataPath

Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path.

Parameter
Name Description
var string
Returns
Type Description
$this