Resource: TrainingPipeline
The TrainingPipeline orchestrates tasks associated with training a Model. It always executes the training task, and may optionally also export data from a Vertex AI Dataset to serve as the training input, upload the Model to Vertex AI, and evaluate the Model.
JSON representation

```json
{
  "name": string,
  "displayName": string,
  "inputDataConfig": {
    object (InputDataConfig)
  },
  "trainingTaskDefinition": string,
  "trainingTaskInputs": value,
  "trainingTaskMetadata": value,
  "modelToUpload": {
    object (Model)
  },
  "state": enum (PipelineState),
  "error": {
    object (Status)
  },
  "createTime": string,
  "startTime": string,
  "endTime": string,
  "updateTime": string,
  "labels": {
    string: string,
    ...
  },
  "encryptionSpec": {
    object (EncryptionSpec)
  }
}
```
| Fields | |
|---|---|
| `name` | Output only. Resource name of the TrainingPipeline. |
| `displayName` | Required. The user-defined name of this TrainingPipeline. |
| `inputDataConfig` | Specifies Vertex AI owned input data that may be used for training the Model. The TrainingPipeline's `trainingTaskDefinition` should make clear whether this config is used and if there are any special requirements on how it should be structured. |
| `trainingTaskDefinition` | Required. A Google Cloud Storage path to the YAML file that defines the training task which is responsible for producing the model artifact, and may also include additional auxiliary work. The definition files that can be used here are found in gs://google-cloud-aiplatform/schema/trainingjob/definition/. Note: the URI given on output will be immutable and probably different, including the URI scheme, from the one given on input. The output URI will point to a location where the user only has read access. |
| `trainingTaskInputs` | Required. The training task's parameter(s), as specified in the `trainingTaskDefinition`'s `inputs`. |
| `trainingTaskMetadata` | Output only. The metadata information as specified in the `trainingTaskDefinition`'s `metadata`. This metadata is auxiliary runtime and final information about the training task. Only present if the pipeline's `trainingTaskDefinition` contains a `metadata` object. |
| `modelToUpload` | Describes the Model that may be uploaded (via `models.upload`) by this TrainingPipeline. The TrainingPipeline's `trainingTaskDefinition` should make clear whether this Model description should be populated, and if there are any special requirements regarding how it should be filled. |
| `state` | Output only. The detailed state of the pipeline. |
| `error` | Output only. Only populated when the pipeline's state is `PIPELINE_STATE_FAILED` or `PIPELINE_STATE_CANCELLED`. |
| `createTime` | Output only. Time when the TrainingPipeline was created. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `startTime` | Output only. Time when the TrainingPipeline for the first time entered the `PIPELINE_STATE_RUNNING` state. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `endTime` | Output only. Time when the TrainingPipeline entered any of the following states: `PIPELINE_STATE_SUCCEEDED`, `PIPELINE_STATE_FAILED`, `PIPELINE_STATE_CANCELLED`. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `updateTime` | Output only. Time when the TrainingPipeline was most recently updated. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `labels` | The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), and can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. |
| `encryptionSpec` | Customer-managed encryption key spec for a TrainingPipeline. If set, this TrainingPipeline will be secured by this key. Note: a Model trained by this TrainingPipeline is also secured by this key if `modelToUpload` is not set separately. |
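As an illustration, here is a minimal sketch of a TrainingPipeline body as it might be passed to the create method for a custom training task. The display name, machine type, and container image are placeholder values, and the exact shape of `trainingTaskInputs` is governed by the chosen definition file under gs://google-cloud-aiplatform/schema/trainingjob/definition/:

```json
{
  "displayName": "example-pipeline",
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml",
  "trainingTaskInputs": {
    "workerPoolSpecs": [
      {
        "machineSpec": { "machineType": "n1-standard-4" },
        "replicaCount": "1",
        "containerSpec": { "imageUri": "us-docker.pkg.dev/example-project/example-repo/trainer:latest" }
      }
    ]
  }
}
```

Output-only fields such as `name`, `state`, and the timestamps are omitted here because they are populated by the service, not supplied by the caller.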
InputDataConfig
Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.
JSON representation

```json
{
  "datasetId": string,
  "annotationsFilter": string,
  "annotationSchemaUri": string,

  // Union field split can be only one of the following:
  "fractionSplit": {
    object (FractionSplit)
  },
  "filterSplit": {
    object (FilterSplit)
  },
  "predefinedSplit": {
    object (PredefinedSplit)
  },
  "timestampSplit": {
    object (TimestampSplit)
  }
  // End of list of possible types for union field split.

  // Union field destination can be only one of the following:
  "gcsDestination": {
    object (GcsDestination)
  },
  "bigqueryDestination": {
    object (BigQueryDestination)
  }
  // End of list of possible types for union field destination.
}
```
| Fields | |
|---|---|
| `datasetId` | Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained, and what is compatible should be described in the used TrainingPipeline's `trainingTaskDefinition`. |
| `annotationsFilter` | Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used in respectively training, validation or test role, depending on the role of the DataItem they are on (for the auto-assigned that role is decided by Vertex AI). A filter with the same syntax as the one used in `ListAnnotations` may be used, but note here it filters across all Annotations of the Dataset, and not just within a single DataItem. |
| `annotationSchemaUri` | Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the Dataset's metadata. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in respectively training, validation or test role, depending on the role of the DataItem they are on. When used in conjunction with `annotationsFilter`, the Annotations used for training are filtered by both `annotationsFilter` and `annotationSchemaUri`. |
| Union field `split`. The instructions for how the input data should be split between the training, validation and test sets. If no split type is provided, `fractionSplit` is used by default. `split` can be only one of the following: | |
| `fractionSplit` | Split based on fractions defining the size of each set. |
| `filterSplit` | Split based on the provided filters for each set. |
| `predefinedSplit` | Supported only for tabular Datasets. Split based on a predefined key. |
| `timestampSplit` | Supported only for tabular Datasets. Split based on the timestamp of the input data pieces. |
| Union field `destination`. The destination the training data is to be written to. Supported destination file formats: for non-tabular data, "jsonl"; for tabular data, "csv" and "bigquery". The following Vertex AI environment variables are passed to containers or Python modules of the training task when this field is set: `AIP_DATA_FORMAT` (exported data format), `AIP_TRAINING_DATA_URI` (sharded exported training data URIs), `AIP_VALIDATION_DATA_URI` (sharded exported validation data URIs), `AIP_TEST_DATA_URI` (sharded exported test data URIs). `destination` can be only one of the following: | |
| `gcsDestination` | The Cloud Storage location where the training data is to be written to. In the given directory a new directory is created with the name `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs are in the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl". |
| `bigqueryDestination` | Only applicable to custom training with a tabular Dataset with a BigQuery source. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with the name `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. |
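For example, a hypothetical InputDataConfig that exports the Dataset to Cloud Storage and splits it 80/10/10 might look as follows; the dataset ID and bucket path are placeholders:

```json
{
  "datasetId": "1234567890123456789",
  "fractionSplit": {
    "trainingFraction": 0.8,
    "validationFraction": 0.1,
    "testFraction": 0.1
  },
  "gcsDestination": {
    "outputUriPrefix": "gs://example-bucket/training-data/"
  }
}
```

With a configuration like this, the training container would receive the exported splits through the `AIP_*_DATA_URI` environment variables described above, with the Cloud Storage URIs in wildcard form.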
FractionSplit
Assigns the input data to training, validation, and test sets as per the given fractions. Any of `trainingFraction`, `validationFraction` and `testFraction` may optionally be provided; together they must sum to at most 1. If the provided fractions sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of the data is used for training, 10% for validation, and 10% for test.
JSON representation

```json
{
  "trainingFraction": number,
  "validationFraction": number,
  "testFraction": number
}
```
| Fields | |
|---|---|
| `trainingFraction` | The fraction of the input data that is to be used to train the Model. |
| `validationFraction` | The fraction of the input data that is to be used to validate the Model. |
| `testFraction` | The fraction of the input data that is to be used to evaluate the Model. |
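A sketch of an explicit 80/10/10 split; since the three fractions sum to exactly 1, no remainder is left for Vertex AI to assign:

```json
{
  "trainingFraction": 0.8,
  "validationFraction": 0.1,
  "testFraction": 0.1
}
```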
FilterSplit
Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, they can be set as '-' (the minus sign).
Supported only for unstructured Datasets.
JSON representation

```json
{
  "trainingFilter": string,
  "validationFilter": string,
  "testFilter": string
}
```
| Fields | |
|---|---|
| `trainingFilter` | Required. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with the same syntax as the one used in `ListDataItems` may be used. If a single DataItem is matched by more than one of the FilterSplit filters, it is assigned to the first set that applies to it in the training, validation, test order. |
| `validationFilter` | Required. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with the same syntax as the one used in `ListDataItems` may be used. If a single DataItem is matched by more than one of the FilterSplit filters, it is assigned to the first set that applies to it in the training, validation, test order. |
| `testFilter` | Required. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with the same syntax as the one used in `ListDataItems` may be used. If a single DataItem is matched by more than one of the FilterSplit filters, it is assigned to the first set that applies to it in the training, validation, test order. |
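As a sketch, a FilterSplit that routes DataItems by a hypothetical label and sends nothing to the test set. The label key and filter expressions are illustrative only; the accepted syntax is that of the DataItems list method:

```json
{
  "trainingFilter": "labels.split=train",
  "validationFilter": "labels.split=validation",
  "testFilter": "-"
}
```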
PredefinedSplit
Assigns input data to training, validation, and test sets based on the value of a provided key.
Supported only for tabular Datasets.
JSON representation

```json
{
  "key": string
}
```
| Fields | |
|---|---|
| `key` | Required. The key is a name of one of the Dataset's data columns. The value of the key (either the label's value or the value in the column) must be one of {`training`, `validation`, `test`}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. |
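For instance, assuming a tabular Dataset with a column named `split` (the column name is a placeholder) whose rows each contain one of the three allowed values:

```json
{
  "key": "split"
}
```

A row whose `split` column holds, say, "validation" would be assigned to the validation set; rows with a missing or invalid value are ignored.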
TimestampSplit
Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to the training set, the next to the validation set, and the oldest to the test set.
Supported only for tabular Datasets.
JSON representation

```json
{
  "trainingFraction": number,
  "validationFraction": number,
  "testFraction": number,
  "key": string
}
```
| Fields | |
|---|---|
| `trainingFraction` | The fraction of the input data that is to be used to train the Model. |
| `validationFraction` | The fraction of the input data that is to be used to validate the Model. |
| `testFraction` | The fraction of the input data that is to be used to evaluate the Model. |
| `key` | Required. The key is a name of one of the Dataset's data columns. The values of the key (the values in the column) must be in RFC 3339 `date-time` format, where `time-offset` = `"Z"` (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. |
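A sketch assuming a hypothetical `event_time` column holding RFC 3339 timestamps; with these fractions, the youngest 80% of data pieces go to training, the next 10% to validation, and the oldest 10% to test:

```json
{
  "trainingFraction": 0.8,
  "validationFraction": 0.1,
  "testFraction": 0.1,
  "key": "event_time"
}
```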
| Methods | |
|---|---|
| `cancel` | Cancels a TrainingPipeline. |
| `create` | Creates a TrainingPipeline. |
| `delete` | Deletes a TrainingPipeline. |
| `get` | Gets a TrainingPipeline. |
| `list` | Lists TrainingPipelines in a Location. |
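For reference, a sketch of a TrainingPipeline as the get method might return it once training has finished, with the output-only fields populated; the resource name, timestamps, and state are illustrative values:

```json
{
  "name": "projects/123456789/locations/us-central1/trainingPipelines/987654321",
  "displayName": "example-pipeline",
  "state": "PIPELINE_STATE_SUCCEEDED",
  "createTime": "2024-05-01T12:00:00.000000000Z",
  "startTime": "2024-05-01T12:01:30.000000000Z",
  "endTime": "2024-05-01T13:45:00.000000000Z",
  "updateTime": "2024-05-01T13:45:00.000000000Z"
}
```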