Interface InputDataConfigOrBuilder (3.52.0)

public interface InputDataConfigOrBuilder extends MessageOrBuilder

Implements

MessageOrBuilder

Methods

getAnnotationSchemaUri()

public abstract String getAnnotationSchemaUri()

Applicable only to custom training with Datasets that have DataItems and Annotations.

Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.

Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used. Each one is used in the training, validation, or test role, matching the role of the DataItem it is on.

When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

string annotation_schema_uri = 9;

Returns
Type Description
String

The annotationSchemaUri.
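
As a rough illustration of the constraint described above, the sketch below checks whether a URI names a YAML file directly under the published schema directory. The class and method names are hypothetical, the `.yaml` suffix test is an assumption based on the "YAML file" wording, and the file name in `main` is made up; the check does not confirm that the object exists or that it is consistent with the Dataset metadata.

```java
final class AnnotationSchemaUriCheck {
    // Directory quoted in the annotation_schema_uri field description.
    static final String SCHEMA_DIR =
        "gs://google-cloud-aiplatform/schema/dataset/annotation/";

    // True when the URI names a .yaml object directly under SCHEMA_DIR.
    // Assumption: published schema files carry a ".yaml" extension.
    static boolean looksLikePublishedSchema(String uri) {
        if (uri == null || !uri.startsWith(SCHEMA_DIR)) {
            return false;
        }
        String fileName = uri.substring(SCHEMA_DIR.length());
        return !fileName.isEmpty()
            && fileName.indexOf('/') < 0
            && fileName.endsWith(".yaml");
    }

    public static void main(String[] args) {
        // Hypothetical file name, used only to exercise the check.
        System.out.println(looksLikePublishedSchema(SCHEMA_DIR + "example_schema.yaml"));
        System.out.println(looksLikePublishedSchema("gs://my-bucket/schema.yaml"));
    }
}
```

In real code the string passed in would come from getAnnotationSchemaUri().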

getAnnotationSchemaUriBytes()

public abstract ByteString getAnnotationSchemaUriBytes()

Applicable only to custom training with Datasets that have DataItems and Annotations.

Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.

Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used. Each one is used in the training, validation, or test role, matching the role of the DataItem it is on.

When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

string annotation_schema_uri = 9;

Returns
Type Description
ByteString

The bytes for annotationSchemaUri.

getAnnotationsFilter()

public abstract String getAnnotationsFilter()

Applicable only to Datasets that have DataItems and Annotations.

A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used. Each one is used in the training, validation, or test role, matching the role of the DataItem it is on (for auto-assigned DataItems, Vertex AI decides that role). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

string annotations_filter = 6;

Returns
Type Description
String

The annotationsFilter.
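
The selection rule described above can be sketched as a small decision function. This is only a model of the documented behavior, not service code: the `Role` enum and the method name are hypothetical, and `IGNORED` stands both for a DataItem dropped by the split method and for an Annotation that ends up unused.

```java
final class AnnotationRoles {
    // Hypothetical roles; IGNORED means "not used for training, validation, or test".
    enum Role { TRAINING, VALIDATION, TEST, IGNORED }

    // An Annotation is used only if it matches the filter and its DataItem was
    // not ignored by the split; it then takes the role of that DataItem.
    static Role annotationRole(boolean matchesFilter, Role dataItemRole) {
        return matchesFilter ? dataItemRole : Role.IGNORED;
    }

    public static void main(String[] args) {
        System.out.println(annotationRole(true, Role.VALIDATION));
        System.out.println(annotationRole(false, Role.TRAINING));
    }
}
```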

getAnnotationsFilterBytes()

public abstract ByteString getAnnotationsFilterBytes()

Applicable only to Datasets that have DataItems and Annotations.

A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used. Each one is used in the training, validation, or test role, matching the role of the DataItem it is on (for auto-assigned DataItems, Vertex AI decides that role). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

string annotations_filter = 6;

Returns
Type Description
ByteString

The bytes for annotationsFilter.

getBigqueryDestination()

public abstract BigQueryDestination getBigqueryDestination()

Only applicable to custom training with tabular Dataset with BigQuery source.

The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

  • AIP_DATA_FORMAT = "bigquery"
  • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
  • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
  • AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"

.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;

Returns
Type Description
BigQueryDestination

The bigqueryDestination.
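
The URI patterns listed above join three parts with dots: the destination, the generated dataset name, and the table name. The sketch below composes such a URI and recovers the table name from one. The class and method names are hypothetical, and the `bq://my-project` destination and `dataset_example` dataset name in `main` are placeholders, not values produced by the service.

```java
final class BigQueryTrainingUris {
    // Joins destination, dataset name, and table name with dots, following
    // the AIP_*_DATA_URI patterns in the field description.
    static String tableUri(String bigqueryDestination, String datasetName, String table) {
        return bigqueryDestination + "." + datasetName + "." + table;
    }

    // Returns the table name, i.e. the text after the last dot.
    static String tableOf(String dataUri) {
        int lastDot = dataUri.lastIndexOf('.');
        if (lastDot < 0 || lastDot == dataUri.length() - 1) {
            throw new IllegalArgumentException("No table segment in: " + dataUri);
        }
        return dataUri.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        // Placeholder destination and dataset name, for illustration only.
        String uri = tableUri("bq://my-project", "dataset_example", "training");
        System.out.println(uri);
        System.out.println(tableOf(uri));
    }
}
```

Inside a training container the URI would normally be read from the AIP_TRAINING_DATA_URI environment variable rather than composed by hand.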

getBigqueryDestinationOrBuilder()

public abstract BigQueryDestinationOrBuilder getBigqueryDestinationOrBuilder()

Only applicable to custom training with tabular Dataset with BigQuery source.

The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

  • AIP_DATA_FORMAT = "bigquery"
  • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
  • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
  • AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"

.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;

Returns
Type Description
BigQueryDestinationOrBuilder

getDatasetId()

public abstract String getDatasetId()

Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition) of the TrainingPipeline in use. For tabular Datasets, all their data is exported to training, to pick and choose from.

string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];

Returns
Type Description
String

The datasetId.

getDatasetIdBytes()

public abstract ByteString getDatasetIdBytes()

Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition) of the TrainingPipeline in use. For tabular Datasets, all their data is exported to training, to pick and choose from.

string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];

Returns
Type Description
ByteString

The bytes for datasetId.

getDestinationCase()

public abstract InputDataConfig.DestinationCase getDestinationCase()

Returns
Type Description
InputDataConfig.DestinationCase
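
A caller would typically branch on the returned case to learn which destination field is populated. The sketch below uses a local stand-in enum so that it compiles without the client library; its constant names follow the convention protobuf's Java generator uses for a oneof, and the real type is InputDataConfig.DestinationCase. The mapping to AIP_DATA_FORMAT values is taken from the getGcsDestination() and getBigqueryDestination() descriptions, while the helper itself is hypothetical.

```java
final class DestinationFormats {
    // Local stand-in for InputDataConfig.DestinationCase (an assumption made
    // so this sketch compiles without the google-cloud-aiplatform dependency).
    enum DestinationCase { GCS_DESTINATION, BIGQUERY_DESTINATION, DESTINATION_NOT_SET }

    // Returns the AIP_DATA_FORMAT value documented for each destination,
    // or an empty string when no destination is set.
    static String expectedDataFormat(DestinationCase destination, boolean tabular) {
        switch (destination) {
            case GCS_DESTINATION:
                return tabular ? "csv" : "jsonl";
            case BIGQUERY_DESTINATION:
                // Documented as applicable to tabular Datasets only.
                return "bigquery";
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(expectedDataFormat(DestinationCase.GCS_DESTINATION, false));
        System.out.println(expectedDataFormat(DestinationCase.BIGQUERY_DESTINATION, true));
    }
}
```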

getFilterSplit()

public abstract FilterSplit getFilterSplit()

Split based on the provided filters for each set.

.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;

Returns
Type Description
FilterSplit

The filterSplit.

getFilterSplitOrBuilder()

public abstract FilterSplitOrBuilder getFilterSplitOrBuilder()

Split based on the provided filters for each set.

.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;

Returns
Type Description
FilterSplitOrBuilder

getFractionSplit()

public abstract FractionSplit getFractionSplit()

Split based on fractions defining the size of each set.

.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;

Returns
Type Description
FractionSplit

The fractionSplit.

getFractionSplitOrBuilder()

public abstract FractionSplitOrBuilder getFractionSplitOrBuilder()

Split based on fractions defining the size of each set.

.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;

Returns
Type Description
FractionSplitOrBuilder

getGcsDestination()

public abstract GcsDestination getGcsDestination()

The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory.

The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, for example "gs://.../training-*.jsonl".

  • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
  • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
  • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
  • AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"

.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;

Returns
Type Description
GcsDestination

The gcsDestination.
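
Because the data URIs are wildcard patterns, training code has to expand them before reading shards. The following sketch substitutes the ${AIP_DATA_FORMAT} placeholder and tests whether an object URI matches the resulting pattern. It assumes that `*` matches any run of characters within a single path segment; the class name, bucket path, and shard file names are hypothetical, and listing the bucket is left to whichever storage client is in use.

```java
final class GcsShardPattern {
    // Substitutes the ${AIP_DATA_FORMAT} placeholder used in the URI patterns.
    static String resolve(String uriPattern, String dataFormat) {
        return uriPattern.replace("${AIP_DATA_FORMAT}", dataFormat);
    }

    // True when objectUri matches wildcardUri.
    // Assumption: '*' matches any characters except '/'.
    static boolean matches(String wildcardUri, String objectUri) {
        StringBuilder regex = new StringBuilder();
        for (String literal : wildcardUri.split("\\*", -1)) {
            if (regex.length() > 0) {
                regex.append("[^/]*");
            }
            // Quote each literal piece so '.', '$', and similar stay literal.
            regex.append(java.util.regex.Pattern.quote(literal));
        }
        return objectUri.matches(regex.toString());
    }

    public static void main(String[] args) {
        // Hypothetical bucket path and shard name, for illustration only.
        String pattern = resolve("gs://bucket/out/training-*.${AIP_DATA_FORMAT}", "jsonl");
        System.out.println(pattern);
        System.out.println(matches(pattern, "gs://bucket/out/training-00001-of-00004.jsonl"));
    }
}
```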

getGcsDestinationOrBuilder()

public abstract GcsDestinationOrBuilder getGcsDestinationOrBuilder()

The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory.

The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, for example "gs://.../training-*.jsonl".

  • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
  • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
  • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
  • AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"

.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;

Returns
Type Description
GcsDestinationOrBuilder

getPersistMlUseAssignment()

public abstract boolean getPersistMlUseAssignment()

Whether to persist the ML use assignment to data item system labels.

bool persist_ml_use_assignment = 11;

Returns
Type Description
boolean

The persistMlUseAssignment.

getPredefinedSplit()

public abstract PredefinedSplit getPredefinedSplit()

Supported only for tabular Datasets.

Split based on a predefined key.

.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;

Returns
Type Description
PredefinedSplit

The predefinedSplit.

getPredefinedSplitOrBuilder()

public abstract PredefinedSplitOrBuilder getPredefinedSplitOrBuilder()

Supported only for tabular Datasets.

Split based on a predefined key.

.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;

Returns
Type Description
PredefinedSplitOrBuilder

getSavedQueryId()

public abstract String getSavedQueryId()

Only applicable to Datasets that have SavedQueries.

The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id used for filtering Annotations for training.

Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter.

Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type.

string saved_query_id = 7;

Returns
Type Description
String

The savedQueryId.
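
Since saved_query_id and annotation_schema_uri are plain string fields rather than members of a oneof, nothing in the generated interface stops both from being set. A client-side guard along the following lines is one way to catch that early. The class and method names are hypothetical; the only library behavior relied on is that proto3 string getters return an empty string for an unset field.

```java
final class ProblemTypeSelector {
    // proto3 string getters return "" (never null) when the field is unset;
    // the null check only protects callers that pass arbitrary strings.
    static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    // True when both fields are set, which the field description advises against.
    static boolean specifiesBoth(String savedQueryId, String annotationSchemaUri) {
        return isSet(savedQueryId) && isSet(annotationSchemaUri);
    }

    public static void main(String[] args) {
        // Placeholder values; real ones would come from getSavedQueryId()
        // and getAnnotationSchemaUri().
        System.out.println(specifiesBoth("1234567890", ""));
        System.out.println(specifiesBoth("1234567890", "gs://example/schema.yaml"));
    }
}
```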

getSavedQueryIdBytes()

public abstract ByteString getSavedQueryIdBytes()

Only applicable to Datasets that have SavedQueries.

The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id used for filtering Annotations for training.

Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter.

Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type.

string saved_query_id = 7;

Returns
Type Description
ByteString

The bytes for savedQueryId.

getSplitCase()

public abstract InputDataConfig.SplitCase getSplitCase()

Returns
Type Description
InputDataConfig.SplitCase
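
The split case tells a caller which of the split getters holds a value. The sketch below uses a local stand-in enum whose constants follow protobuf's Java naming convention for the split fields documented on this page (the real type is InputDataConfig.SplitCase), and a hypothetical helper that reports which splits this page describes as "Supported only for tabular Datasets".

```java
final class SplitKinds {
    // Local stand-in for InputDataConfig.SplitCase (an assumption made so
    // this sketch compiles without the google-cloud-aiplatform dependency).
    enum SplitCase {
        FRACTION_SPLIT, FILTER_SPLIT, PREDEFINED_SPLIT,
        TIMESTAMP_SPLIT, STRATIFIED_SPLIT, SPLIT_NOT_SET
    }

    // True for the splits documented as supported only for tabular Datasets.
    static boolean tabularOnly(SplitCase split) {
        switch (split) {
            case PREDEFINED_SPLIT:
            case TIMESTAMP_SPLIT:
            case STRATIFIED_SPLIT:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (SplitCase split : SplitCase.values()) {
            System.out.println(split + " tabular-only: " + tabularOnly(split));
        }
    }
}
```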

getStratifiedSplit()

public abstract StratifiedSplit getStratifiedSplit()

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;

Returns
Type Description
StratifiedSplit

The stratifiedSplit.

getStratifiedSplitOrBuilder()

public abstract StratifiedSplitOrBuilder getStratifiedSplitOrBuilder()

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;

Returns
Type Description
StratifiedSplitOrBuilder

getTimestampSplit()

public abstract TimestampSplit getTimestampSplit()

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;

Returns
Type Description
TimestampSplit

The timestampSplit.

getTimestampSplitOrBuilder()

public abstract TimestampSplitOrBuilder getTimestampSplitOrBuilder()

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;

Returns
Type Description
TimestampSplitOrBuilder

hasBigqueryDestination()

public abstract boolean hasBigqueryDestination()

Only applicable to custom training with tabular Dataset with BigQuery source.

The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

  • AIP_DATA_FORMAT = "bigquery"
  • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
  • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
  • AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"

.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;

Returns
Type Description
boolean

Whether the bigqueryDestination field is set.

hasFilterSplit()

public abstract boolean hasFilterSplit()

Split based on the provided filters for each set.

.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;

Returns
Type Description
boolean

Whether the filterSplit field is set.

hasFractionSplit()

public abstract boolean hasFractionSplit()

Split based on fractions defining the size of each set.

.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;

Returns
Type Description
boolean

Whether the fractionSplit field is set.

hasGcsDestination()

public abstract boolean hasGcsDestination()

The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory.

The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, for example "gs://.../training-*.jsonl".

  • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
  • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
  • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
  • AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"

.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;

Returns
Type Description
boolean

Whether the gcsDestination field is set.

hasPredefinedSplit()

public abstract boolean hasPredefinedSplit()

Supported only for tabular Datasets.

Split based on a predefined key.

.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;

Returns
Type Description
boolean

Whether the predefinedSplit field is set.

hasStratifiedSplit()

public abstract boolean hasStratifiedSplit()

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;

Returns
Type Description
boolean

Whether the stratifiedSplit field is set.

hasTimestampSplit()

public abstract boolean hasTimestampSplit()

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;

Returns
Type Description
boolean

Whether the timestampSplit field is set.