Class InputDataConfig (0.4.0)

InputDataConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Specifies AI Platform-owned input data to be used for training, and possibly evaluating, the Model.

Attributes

fraction_split `.training_pipeline.FractionSplit`
Split based on fractions defining the size of each set.
filter_split `.training_pipeline.FilterSplit`
Split based on the provided filters for each set.
predefined_split `.training_pipeline.PredefinedSplit`
Supported only for tabular Datasets. Split based on a predefined key.
timestamp_split `.training_pipeline.TimestampSplit`
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
gcs_destination `.io.GcsDestination`
The Google Cloud Storage location where the training data is to be written. In the given directory, a new directory will be created with the name: ``dataset-
bigquery_destination `.io.BigQueryDestination`
The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name ``dataset_
dataset_id str
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all of their data is exported to training, to pick and choose from.
annotations_filter str
Only applicable to Datasets that have DataItems and Annotations. A filter on the Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, respectively, in the training, validation, or test role, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by AI Platform). A filter with the same syntax as the one used in ``ListAnnotations`` may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
annotation_schema_uri str
Only applicable to custom training. A Google Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 [Schema Object](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.2.md#schema-object). The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the ``metadata`` of the Dataset specified by ``dataset_id``. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, respectively, in the training, validation, or test role, depending on the role of the DataItem they are on. When used in conjunction with ``annotations_filter``, the Annotations used for training are filtered by both ``annotations_filter`` and ``annotation_schema_uri``.
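To illustrate the shape of this message, the sketch below builds a plain-dict analogue of an ``InputDataConfig`` carrying a ``fraction_split``. This is not the library API (the real class is a proto-plus message constructed via its ``mapping``/keyword arguments); the helper function is hypothetical, and the assumption that the split strategies are mutually exclusive alternatives is hedged in the comments.

```python
# Hypothetical sketch: a plain-dict stand-in for InputDataConfig with a
# fraction_split, mirroring the field names listed in the attributes above.
# The real class is a proto-plus message; this only models its shape.

def make_input_data_config(dataset_id, training=0.8, validation=0.1, test=0.1):
    """Build a dict mirroring InputDataConfig with a fraction_split.

    Assumption: the split fields (fraction_split, filter_split,
    predefined_split, timestamp_split) are alternative strategies, so only
    one is modeled here.
    """
    if abs(training + validation + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1.0")
    return {
        "dataset_id": dataset_id,
        "fraction_split": {
            "training_fraction": training,
            "validation_fraction": validation,
            "test_fraction": test,
        },
    }

# Example: an 80/10/10 split over a Dataset with a (hypothetical) numeric ID.
cfg = make_input_data_config("1234567890")
```

With the real library, the same field layout would be passed to the ``InputDataConfig`` constructor as keyword arguments or a mapping.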

Inheritance

builtins.object > proto.message.Message > InputDataConfig