Cloud AI Platform v1 API - Class InputDataConfig (3.0.0)

public sealed class InputDataConfig : IMessage<InputDataConfig>, IEquatable<InputDataConfig>, IDeepCloneable<InputDataConfig>, IBufferMessage, IMessage

Reference documentation and code samples for the Cloud AI Platform v1 API class InputDataConfig.

Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.

Inheritance

object > InputDataConfig

Namespace

Google.Cloud.AIPlatform.V1

Assembly

Google.Cloud.AIPlatform.V1.dll

Constructors

InputDataConfig()

public InputDataConfig()

InputDataConfig(InputDataConfig)

public InputDataConfig(InputDataConfig other)
Parameter
Name Description
other InputDataConfig
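
The copy constructor can be illustrated with a minimal sketch. It assumes the Google.Cloud.AIPlatform.V1 NuGet package is referenced and that FractionSplit exposes TrainingFraction, ValidationFraction and TestFraction (those members are not documented on this page). The dataset ID and the CopyExample helper are hypothetical. Generated protobuf copy constructors clone nested messages, so changing the copy should leave the original untouched.

```csharp
using Google.Cloud.AIPlatform.V1;

public static class CopyExample
{
    // Builds a config, copies it, then mutates only the copy.
    public static (InputDataConfig Original, InputDataConfig Copy) MakeCopy()
    {
        var original = new InputDataConfig
        {
            DatasetId = "1234567890", // hypothetical dataset ID
            FractionSplit = new FractionSplit
            {
                TrainingFraction = 0.8,
                ValidationFraction = 0.1,
                TestFraction = 0.1,
            },
        };

        // The copy constructor clones the nested FractionSplit message.
        var copy = new InputDataConfig(original);
        copy.FractionSplit.TrainingFraction = 0.7;
        copy.FractionSplit.ValidationFraction = 0.2;

        return (original, copy);
    }
}
```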

Properties

AnnotationSchemaUri

public string AnnotationSchemaUri { get; set; }

Applicable only to custom training with Datasets that have DataItems and Annotations.

Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the [metadata][google.cloud.aiplatform.v1.Dataset.metadata_schema_uri] of the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id].

Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used. Each Annotation is used in the training, validation, or test role, depending on the role of the DataItem it is on.

When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri].

Property Value
Type Description
string

AnnotationsFilter

public string AnnotationsFilter { get; set; }

Applicable only to Datasets that have DataItems and Annotations.

A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used. Each Annotation is used in the training, validation, or test role, depending on the role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in [ListAnnotations][google.cloud.aiplatform.v1.DatasetService.ListAnnotations] may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

Property Value
Type Description
string

BigqueryDestination

public BigQueryDestination BigqueryDestination { get; set; }

Only applicable to custom training with a tabular Dataset that has a BigQuery source.

The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset, three tables are created: training, validation, and test.

  • AIP_DATA_FORMAT = "bigquery".
  • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
  • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
  • AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"

Property Value
Type Description
BigQueryDestination

DatasetId

public string DatasetId { get; set; }

Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained. What counts as compatible should be described in the [training_task_definition][google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition] of the TrainingPipeline in use. For tabular Datasets, all their data is exported to training, to pick and choose from.

Property Value
Type Description
string

DestinationCase

public InputDataConfig.DestinationOneofCase DestinationCase { get; }
Property Value
Type Description
InputDataConfig.DestinationOneofCase
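
GcsDestination and BigqueryDestination belong to the same oneof, so DestinationCase reports whichever was set last. The sketch below assumes the Google.Cloud.AIPlatform.V1 NuGet package and the property names GcsDestination.OutputUriPrefix and BigQueryDestination.OutputUri, which are not documented on this page. The bucket and project names are hypothetical.

```csharp
using Google.Cloud.AIPlatform.V1;

public static class DestinationExample
{
    // Sets a Cloud Storage destination, then replaces it with a BigQuery one.
    public static InputDataConfig SwitchDestination()
    {
        var config = new InputDataConfig
        {
            // Hypothetical bucket; DestinationCase becomes GcsDestination.
            GcsDestination = new GcsDestination { OutputUriPrefix = "gs://my-bucket/training-data/" },
        };

        // Assigning the other oneof member clears the first:
        // DestinationCase becomes BigqueryDestination and GcsDestination reads as null.
        config.BigqueryDestination = new BigQueryDestination { OutputUri = "bq://my-project" };

        return config;
    }
}
```

Checking DestinationCase before reading either property avoids dereferencing the member that is not set.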

FilterSplit

public FilterSplit FilterSplit { get; set; }

Split based on the provided filters for each set.

Property Value
Type Description
FilterSplit

FractionSplit

public FractionSplit FractionSplit { get; set; }

Split based on fractions defining the size of each set.

Property Value
Type Description
FractionSplit

GcsDestination

public GcsDestination GcsDestination { get; set; }

The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory.

The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, for example "gs://.../training-*.jsonl".

  • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
  • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
  • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
  • AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"

Property Value
Type Description
GcsDestination
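
Because the data URIs are wildcard patterns rather than single object names, training code typically has to match shard objects against them. The following is a minimal sketch of such a match, assuming that `*` matches any run of characters within one path segment. The ShardPattern helper and the shard file name in the usage note are hypothetical and not part of the library.

```csharp
using System.Text.RegularExpressions;

public static class ShardPattern
{
    // Returns true when objectUri matches a wildcard URI such as
    // "gs://.../training-*.jsonl". Only the '*' wildcard is handled here.
    public static bool Matches(string wildcardUri, string objectUri)
    {
        // Escape regex metacharacters, then turn the escaped '*' into
        // "any characters except '/'".
        string pattern = "^" + Regex.Escape(wildcardUri).Replace("\\*", "[^/]*") + "$";
        return Regex.IsMatch(objectUri, pattern);
    }
}
```

For instance, ShardPattern.Matches("gs://my-bucket/out/training-*.jsonl", "gs://my-bucket/out/training-00001.jsonl") evaluates to true, while the same pattern does not match a validation shard.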

PersistMlUseAssignment

public bool PersistMlUseAssignment { get; set; }

Whether to persist the ML use assignment to data item system labels.

Property Value
Type Description
bool

PredefinedSplit

public PredefinedSplit PredefinedSplit { get; set; }

Supported only for tabular Datasets.

Split based on a predefined key.

Property Value
Type Description
PredefinedSplit

SavedQueryId

public string SavedQueryId { get; set; }

Only applicable to Datasets that have SavedQueries.

The ID of a SavedQuery (annotation set) under the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id] used for filtering Annotations for training.

Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter].

Only one of [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri] should be specified, as both of them represent the same thing: the problem type.

Property Value
Type Description
string
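
The rule that SavedQueryId and AnnotationSchemaUri should not both be specified can be checked on the client before a request is sent. This is a minimal sketch assuming the Google.Cloud.AIPlatform.V1 NuGet package. The InputDataConfigChecks helper is hypothetical and not part of the library, and the service may apply its own validation.

```csharp
using System;
using Google.Cloud.AIPlatform.V1;

public static class InputDataConfigChecks
{
    // Throws when both problem-type selectors are set on the same config.
    public static void EnsureSingleProblemTypeSource(InputDataConfig config)
    {
        // Unset protobuf string properties read as the empty string, never null.
        bool hasSavedQuery = !string.IsNullOrEmpty(config.SavedQueryId);
        bool hasSchemaUri = !string.IsNullOrEmpty(config.AnnotationSchemaUri);

        if (hasSavedQuery && hasSchemaUri)
        {
            throw new ArgumentException(
                "Specify only one of SavedQueryId and AnnotationSchemaUri.", nameof(config));
        }
    }
}
```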

SplitCase

public InputDataConfig.SplitOneofCase SplitCase { get; }
Property Value
Type Description
InputDataConfig.SplitOneofCase
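
SplitCase indicates which of the split properties (FilterSplit, FractionSplit, PredefinedSplit, StratifiedSplit, TimestampSplit) is currently set. The sketch below branches on it, assuming the Google.Cloud.AIPlatform.V1 NuGet package and that the enum members are named after those properties, plus None when no split is set. The SplitDescriber helper is hypothetical.

```csharp
using Google.Cloud.AIPlatform.V1;

public static class SplitDescriber
{
    // Maps the active split strategy to a short label.
    public static string Describe(InputDataConfig config) => config.SplitCase switch
    {
        InputDataConfig.SplitOneofCase.FractionSplit => "fraction",
        InputDataConfig.SplitOneofCase.FilterSplit => "filter",
        InputDataConfig.SplitOneofCase.PredefinedSplit => "predefined",
        InputDataConfig.SplitOneofCase.TimestampSplit => "timestamp",
        InputDataConfig.SplitOneofCase.StratifiedSplit => "stratified",
        InputDataConfig.SplitOneofCase.None => "none",
        _ => "unknown",
    };
}
```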

StratifiedSplit

public StratifiedSplit StratifiedSplit { get; set; }

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

Property Value
Type Description
StratifiedSplit

TimestampSplit

public TimestampSplit TimestampSplit { get; set; }

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

Property Value
Type Description
TimestampSplit