Google Cloud Aiplatform SDK
class google.cloud.aiplatform.AutoMLForecastingTrainingJob(display_name: str, optimization_objective: Optional[str] = None, column_specs: Optional[Dict[str, str]] = None, column_transformations: Optional[List[Dict[str, Dict[str, str]]]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._TrainingJob
Constructs an AutoML Forecasting Training Job.
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
optimization_objective (str) – Optional. Objective function the model is to be optimized towards. The training process creates a Model that optimizes the value of the objective function over the validation set. The supported optimization objectives:
“minimize-rmse” (default) - Minimize root-mean-squared error (RMSE).
“minimize-mae” - Minimize mean-absolute error (MAE).
“minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
“minimize-rmspe” - Minimize root-mean-squared percentage error (RMSPE).
“minimize-wape-mae” - Minimize the combination of weighted absolute percentage error (WAPE) and mean-absolute-error (MAE).
“minimize-quantile-loss” - Minimize the quantile loss at the defined quantiles. (Set this objective to build quantile forecasts.)
column_specs (Dict[str, str]) – Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed.
column_transformations (List[Dict[str, Dict[str, str]]]) – Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually.
project (str) – Optional. Project to run training in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use when calling the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
Raises
ValueError – If both column_transformations and column_specs were provided.
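The constructor rules above (the supported objectives, and the rule that only one of column_specs or column_transformations may be passed) can be sketched as a small pre-flight check. This is a minimal sketch, not the SDK's own validation: `forecasting_job_kwargs` and `SUPPORTED_OBJECTIVES` are hypothetical names, the display name and column name are illustrative, and the returned dict is only meant to be forwarded as keyword arguments to AutoMLForecastingTrainingJob.

```python
from typing import Any, Dict, List, Optional

# Objective names copied from the optimization_objective description above.
SUPPORTED_OBJECTIVES = frozenset({
    "minimize-rmse",
    "minimize-mae",
    "minimize-rmsle",
    "minimize-rmspe",
    "minimize-wape-mae",
    "minimize-quantile-loss",
})


def forecasting_job_kwargs(
    display_name: str,
    optimization_objective: Optional[str] = None,
    column_specs: Optional[Dict[str, str]] = None,
    column_transformations: Optional[List[Dict[str, Dict[str, str]]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: check constructor arguments before building the job."""
    if column_specs is not None and column_transformations is not None:
        raise ValueError("Pass only one of column_specs or column_transformations.")
    if (
        optimization_objective is not None
        and optimization_objective not in SUPPORTED_OBJECTIVES
    ):
        raise ValueError(f"Unsupported objective: {optimization_objective!r}")

    kwargs: Dict[str, Any] = {"display_name": display_name}
    # Only forward what the caller set, so the documented defaults still apply.
    if optimization_objective is not None:
        kwargs["optimization_objective"] = optimization_objective
    if column_specs is not None:
        kwargs["column_specs"] = column_specs
    if column_transformations is not None:
        kwargs["column_transformations"] = column_transformations
    return kwargs


kwargs = forecasting_job_kwargs(
    "sales-forecast",  # illustrative display name
    optimization_objective="minimize-mae",
    column_specs={"store_id": "auto"},  # illustrative column name
)
# With the SDK installed and initialized, the job would then be created as:
# job = aiplatform.AutoMLForecastingTrainingJob(**kwargs)
```

Leaving unset arguments out of the dict keeps the documented defaults (for example “minimize-rmse”) under the SDK's control rather than duplicating them in the helper.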
property evaluated_data_items_bigquery_uri: Optional[str]
BigQuery location of exported evaluated examples from the Training Job.
Returns
BigQuery URI for the exported evaluated examples if the export feature is enabled for training. None if the export feature was not enabled for training.
Return type
Optional[str]
run(dataset: google.cloud.aiplatform.datasets.time_series_dataset.TimeSeriesDataset, target_column: str, time_column: str, time_series_identifier_column: str, unavailable_at_forecast_columns: List[str], available_at_forecast_columns: List[str], forecast_horizon: int, data_granularity_unit: str, data_granularity_count: int, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, predefined_split_column_name: Optional[str] = None, weight_column: Optional[str] = None, time_series_attribute_columns: Optional[List[str]] = None, context_window: Optional[int] = None, export_evaluated_data_items: bool = False, export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None, export_evaluated_data_items_override_destination: bool = False, quantiles: Optional[List[float]] = None, validation_options: Optional[str] = None, budget_milli_node_hours: int = 1000, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, additional_experiments: Optional[List[str]] = None, sync: bool = True)
Runs the training job and returns a model.
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; together they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Predefined splits:
Assigns input data to training, validation, and test sets based on the value of a provided key.
If using predefined splits, `predefined_split_column_name` must be provided.
Supported only for tabular Datasets.
Timestamp splits:
Assigns input data to training, validation, and test sets
based on the provided timestamps. The youngest data pieces are
assigned to training set, next to validation set, and the oldest
to the test set.
Supported only for tabular Datasets.
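The fraction-split rule described above (any subset of the three fractions may be given, and the ones given must sum to at most 1) can be checked before calling run. The sketch below is an illustration only: `remaining_fraction` is a hypothetical helper, not part of the SDK, and it only reports the remainder that Vertex AI would distribute on its own.

```python
import math
from typing import Optional


def remaining_fraction(
    training_fraction_split: Optional[float] = None,
    validation_fraction_split: Optional[float] = None,
    test_fraction_split: Optional[float] = None,
) -> float:
    """Hypothetical helper: validate the fractions and return what is left over."""
    provided = [
        fraction
        for fraction in (
            training_fraction_split,
            validation_fraction_split,
            test_fraction_split,
        )
        if fraction is not None
    ]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"Fraction out of range: {fraction}")
    total = math.fsum(provided)
    # Small tolerance so that values such as 0.8 + 0.1 + 0.1 are accepted.
    if total > 1.0 + 1e-9:
        raise ValueError(f"Fractions sum to {total}, which is more than 1.")
    return max(0.0, 1.0 - total)


# All three fractions given: nothing is left for Vertex AI to assign.
print(remaining_fraction(0.8, 0.1, 0.1))
# Only the training fraction given: the rest is assigned by Vertex AI.
print(remaining_fraction(training_fraction_split=0.7))
```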
Parameters
dataset (datasets.TimeSeriesDataset) – Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For time series Datasets, all their data is exported to training, to pick and choose from.
target_column (str) – Required. Name of the column that the Model is to predict values for.
time_column (str) – Required. Name of the column that identifies time order in the time series.
time_series_identifier_column (str) – Required. Name of the column that identifies the time series.
unavailable_at_forecast_columns (List[str]) – Required. Column names of columns that are unavailable at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is unknown before the forecast (e.g. population of a city in a given year, or weather on a given day).
available_at_forecast_columns (List[str]) – Required. Column names of columns that are available at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is known at forecast.
forecast_horizon (int) – Required. The amount of time into the future for which forecasted values for the target are returned. Expressed in number of units defined by the [data_granularity_unit] and [data_granularity_count] fields. Inclusive.
data_granularity_unit (str) – Required. The data granularity unit. Accepted values are minute, hour, day, week, month, year.
data_granularity_count (int) – Required. The number of data granularity units between data points in the training data. If [data_granularity_unit] is minute, can be 1, 5, 10, 15, or 30. For all other values of [data_granularity_unit], must be 1.
predefined_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {TRAIN, VALIDATE, TEST}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.
weight_column (str) – Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusively, and 0 value means that the row is ignored. If the weight column field is not set, then all rows are assumed to have equal weight of 1.
time_series_attribute_columns (List[str]) – Optional. Column names that should be used as attribute columns. Each column is constant within a time series.
context_window (int) – Optional. The amount of time into the past training and prediction data is used for model training and prediction respectively. Expressed in number of units defined by the [data_granularity_unit] and [data_granularity_count] fields. When not provided uses the default value of 0 which means the model sets each series context window to be 0 (also known as “cold start”). Inclusive.
export_evaluated_data_items (bool) – Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.
export_evaluated_data_items_bigquery_destination_uri (string) – Optional. URI of desired destination BigQuery table for exported test set predictions.
Expected format:
bq://<project_id>:<dataset_id>:<table>
If not specified, then results are exported to the following auto-created BigQuery table:
<project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples
Applies only if [export_evaluated_data_items] is True.
export_evaluated_data_items_override_destination (bool) – Whether to override the contents of [export_evaluated_data_items_bigquery_destination_uri], if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail.
Applies only if [export_evaluated_data_items] is True and [export_evaluated_data_items_bigquery_destination_uri] is specified.
quantiles (List[float]) – Quantiles to use for the minimize-quantile-loss [AutoMLForecastingTrainingJob.optimization_objective]. This argument is required in this case.
Accepts up to 5 quantiles in the form of a double from 0 to 1, exclusive. Each quantile must be unique.
validation_options (str) – Validation options for the data validation component. The available options are:
“fail-pipeline” (default) - Validate the data and fail the pipeline if validation fails.
“ignore-validation” - Ignore the results of the validation and continue the pipeline.
budget_milli_node_hours (int) – Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend’s discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error. The minimum value is 1000 and the maximum is 72000.
model_display_name (str) – Optional. The display name of the managed Vertex AI Model produced by training. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
additional_experiments (List[str]) – Optional. Additional experiment flags for the time series forecasting training.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
Raises
RuntimeError – If Training job has already been run or is waiting to run.
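Several of the run parameters above carry numeric constraints (granularity unit and count, quantiles, training budget). As a rough illustration, the sketch below collects those documented rules in one place. `check_forecasting_run_args` is a hypothetical helper and returns human-readable problems rather than mirroring any error the service itself produces.

```python
from typing import List, Optional

GRANULARITY_UNITS = ("minute", "hour", "day", "week", "month", "year")
MINUTE_COUNTS = (1, 5, 10, 15, 30)


def check_forecasting_run_args(
    data_granularity_unit: str,
    data_granularity_count: int,
    optimization_objective: Optional[str] = None,
    quantiles: Optional[List[float]] = None,
    budget_milli_node_hours: int = 1000,
) -> List[str]:
    """Hypothetical helper: list violations of the documented constraints."""
    problems: List[str] = []

    if data_granularity_unit not in GRANULARITY_UNITS:
        problems.append(f"unknown data_granularity_unit {data_granularity_unit!r}")
    elif data_granularity_unit == "minute":
        if data_granularity_count not in MINUTE_COUNTS:
            problems.append("for minute data the count must be 1, 5, 10, 15, or 30")
    elif data_granularity_count != 1:
        problems.append("data_granularity_count must be 1 unless the unit is minute")

    # Quantiles are only required for the quantile-loss objective.
    if optimization_objective == "minimize-quantile-loss":
        if not quantiles:
            problems.append("quantiles is required for minimize-quantile-loss")
        else:
            if len(quantiles) > 5:
                problems.append("at most 5 quantiles are accepted")
            if any(not 0.0 < q < 1.0 for q in quantiles):
                problems.append("each quantile must be strictly between 0 and 1")
            if len(set(quantiles)) != len(quantiles):
                problems.append("each quantile must be unique")

    if not 1000 <= budget_milli_node_hours <= 72000:
        problems.append("budget_milli_node_hours must be between 1000 and 72000")

    return problems


# Daily data with three quantiles and a 2 node hour budget passes every check.
print(check_forecasting_run_args(
    "day", 1, "minimize-quantile-loss", [0.1, 0.5, 0.9], 2000
))
```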
class google.cloud.aiplatform.AutoMLImageTrainingJob(display_name: str, prediction_type: str = 'classification', multi_label: bool = False, model_type: str = 'CLOUD', base_model: Optional[google.cloud.aiplatform.models.Model] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._TrainingJob
Constructs an AutoML Image Training Job.
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
prediction_type (str) – The type of prediction the Model is to produce, one of:
“classification” - Predict one out of multiple target values for each image.
“object_detection” - Detect objects in each image and predict a bounding box and label for each one.
multi_label (bool) – Required. Default is False. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each image at most one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each image multiple annotations may be applicable).
This is only applicable for the “classification” prediction_type and will be ignored otherwise.
model_type (str) – Required. Default is “CLOUD”. One of the following:
“CLOUD” - Default for Image Classification. A Model best tailored to be used within Google Cloud, and which cannot be exported.
“CLOUD_HIGH_ACCURACY_1” - Default for Image Object Detection. A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a higher latency, but should also have a higher prediction quality than other cloud models.
“CLOUD_LOW_LATENCY_1” - A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a low latency, but may have lower prediction quality than other cloud models.
“MOBILE_TF_LOW_LATENCY_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have low latency, but may have lower prediction quality than other mobile models.
“MOBILE_TF_VERSATILE_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards.
“MOBILE_TF_HIGH_ACCURACY_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have a higher latency, but should also have a higher prediction quality than other mobile models.
base_model (Optional[models.Model]) – Optional. Only permitted for Image Classification models. If it is specified, the new model will be trained based on the base model. Otherwise, the new model will be trained from scratch. The base model must be in the same Project and Location as the new Model to train, and have the same model_type.
project (str) – Optional. Project to run training in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use when calling the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
Raises
ValueError – When an invalid prediction_type or model_type is provided.
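The constructor is documented to raise ValueError for an unknown prediction_type or model_type, and the parameter notes above add two softer rules (multi_label is ignored outside classification, base_model is only permitted for classification). A minimal sketch of those checks follows. `check_image_job_args` and its `has_base_model` flag are hypothetical, and the helper does not claim to reproduce the SDK's internal logic.

```python
from typing import List

PREDICTION_TYPES = ("classification", "object_detection")
# Model type names copied from the model_type description above.
MODEL_TYPES = (
    "CLOUD",
    "CLOUD_HIGH_ACCURACY_1",
    "CLOUD_LOW_LATENCY_1",
    "MOBILE_TF_LOW_LATENCY_1",
    "MOBILE_TF_VERSATILE_1",
    "MOBILE_TF_HIGH_ACCURACY_1",
)


def check_image_job_args(
    prediction_type: str = "classification",
    model_type: str = "CLOUD",
    multi_label: bool = False,
    has_base_model: bool = False,
) -> List[str]:
    """Hypothetical helper: raise on unknown names, return notes on soft rules."""
    if prediction_type not in PREDICTION_TYPES:
        raise ValueError(f"Invalid prediction_type: {prediction_type!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"Invalid model_type: {model_type!r}")

    notes: List[str] = []
    if prediction_type != "classification":
        if multi_label:
            notes.append("multi_label is ignored unless prediction_type is classification")
        if has_base_model:
            notes.append("base_model is only permitted for classification models")
    return notes


# An object detection job that also sets multi_label gets one note back.
print(check_image_job_args(
    "object_detection", "CLOUD_HIGH_ACCURACY_1", multi_label=True
))
```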
run(dataset: google.cloud.aiplatform.datasets.image_dataset.ImageDataset, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, budget_milli_node_hours: Optional[int] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, disable_early_stopping: bool = False, sync: bool = True)
Runs the AutoML Image training job and returns a model.
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; together they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters, data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, all of `training_filter_split`, `validation_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
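Because filter splits require all three filters, a set that should receive no data still needs an explicit ‘-’. The sketch below shows one way to fill that in. `filter_split_kwargs` is a hypothetical helper, and the filter strings in the usage lines are illustrative only; their syntax is assumed to follow DatasetService.ListDataItems and is not checked here.

```python
from typing import Dict, Optional

# The minus sign marks a filter that should match nothing.
MATCH_NOTHING = "-"


def filter_split_kwargs(
    training_filter_split: str,
    validation_filter_split: Optional[str] = None,
    test_filter_split: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return all three filters, using '-' for unused sets."""
    return {
        "training_filter_split": training_filter_split,
        "validation_filter_split": validation_filter_split or MATCH_NOTHING,
        "test_filter_split": test_filter_split or MATCH_NOTHING,
    }


# Illustrative filter strings; replace them with filters valid for your Dataset.
splits = filter_split_kwargs(
    training_filter_split="labels.aiplatform.googleapis.com/ml_use=training",
    test_filter_split="labels.aiplatform.googleapis.com/ml_use=test",
)
# The dict could then be forwarded to run, e.g. job.run(dataset=..., **splits)
print(splits["validation_filter_split"])
```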
Parameters
dataset (datasets.ImageDataset) – Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
validation_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
budget_milli_node_hours (int) – Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour.
Defaults by prediction_type:
classification - For Cloud models the budget must be: 8,000 - 800,000 milli node hours (inclusive). The default value is 192,000 which represents one day in wall time, assuming 8 nodes are used.
object_detection - For Cloud models the budget must be: 20,000 - 900,000 milli node hours (inclusive). The default value is 216,000 which represents one day in wall time, assuming 9 nodes are used.
The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend’s discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error.
model_display_name (str) – Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
disable_early_stopping (bool) – Required. Default is False. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.
sync (bool) – Whether to execute this method synchronously. Default is True. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
Raises
RuntimeError – If Training job has already been run or is waiting to run.
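The budget_milli_node_hours defaults and ranges for image training, and the "one day in wall time" arithmetic behind them, can be worked through as follows. This is a sketch under the figures quoted in the parameter description; `BUDGET_RULES`, `resolve_image_budget` and `wall_clock_hours` are hypothetical names, and the ranges are the ones documented for Cloud models.

```python
from typing import Dict, Optional, Tuple

# prediction_type -> (minimum, maximum, default), all in milli node hours.
BUDGET_RULES: Dict[str, Tuple[int, int, int]] = {
    "classification": (8_000, 800_000, 192_000),
    "object_detection": (20_000, 900_000, 216_000),
}


def resolve_image_budget(
    prediction_type: str, budget_milli_node_hours: Optional[int] = None
) -> int:
    """Hypothetical helper: apply the documented default and range."""
    minimum, maximum, default = BUDGET_RULES[prediction_type]
    if budget_milli_node_hours is None:
        return default
    if not minimum <= budget_milli_node_hours <= maximum:
        raise ValueError(
            f"{prediction_type} budget must be within {minimum} - {maximum}"
        )
    return budget_milli_node_hours


def wall_clock_hours(budget_milli_node_hours: int, nodes: int) -> float:
    """Convert a budget to wall time: 1,000 milli node hours is 1 node hour."""
    return budget_milli_node_hours / 1000 / nodes


# 192,000 milli node hours = 192 node hours; spread over 8 nodes that is 24 hours.
print(wall_clock_hours(resolve_image_budget("classification"), nodes=8))
# 216,000 milli node hours = 216 node hours; spread over 9 nodes that is 24 hours.
print(wall_clock_hours(resolve_image_budget("object_detection"), nodes=9))
```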
class google.cloud.aiplatform.AutoMLTabularTrainingJob(display_name: str, optimization_prediction_type: str, optimization_objective: Optional[str] = None, column_specs: Optional[Dict[str, str]] = None, column_transformations: Optional[List[Dict[str, Dict[str, str]]]] = None, optimization_objective_recall_value: Optional[float] = None, optimization_objective_precision_value: Optional[float] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._TrainingJob
Constructs an AutoML Tabular Training Job.
Example usage:
    from google.cloud.aiplatform import training_jobs
    # Typographic quotes are not valid Python; string literals use plain quotes.
    job = training_jobs.AutoMLTabularTrainingJob(
        display_name="my_display_name",
        optimization_prediction_type="classification",
        optimization_objective="minimize-log-loss",
        column_specs={"column_1": "auto", "column_2": "numeric"},
        labels={"key": "value"},
    )
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
optimization_prediction_type (str) – The type of prediction the Model is to produce. “classification” - Predict one out of multiple target values for each row. “regression” - Predict a value based on its relation to other values. This type is available only to columns that contain semantically numeric values, i.e. integers or floating point numbers, even if stored as e.g. strings.
optimization_objective (str) – Optional. Objective function the Model is to be optimized towards. The training task creates a Model that maximizes/minimizes the value of the objective function over the validation set.
The supported optimization objectives depend on the prediction type, and in the case of classification also the number of distinct values in the target column (two distinct values -> binary, 3 or more distinct values -> multi class). If the field is not set, the default objective function is used.
Classification (binary):
“maximize-au-roc” (default) - Maximize the area under the receiver operating characteristic (ROC) curve.
“minimize-log-loss” - Minimize log loss.
“maximize-au-prc” - Maximize the area under the precision-recall curve.
“maximize-precision-at-recall” - Maximize precision for a specified recall value.
“maximize-recall-at-precision” - Maximize recall for a specified precision value.
Classification (multi class):
“minimize-log-loss” (default) - Minimize log loss.
Regression:
“minimize-rmse” (default) - Minimize root-mean-squared error (RMSE).
“minimize-mae” - Minimize mean-absolute error (MAE).
“minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
column_specs (Dict[str, str]) – Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed.
column_transformations (List[Dict[str, Dict[str, str]]]) – Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually.
optimization_objective_recall_value (float) – Optional. Required when maximize-precision-at-recall optimizationObjective was picked, represents the recall value at which the optimization is done.
The minimum value is 0 and the maximum is 1.0.
optimization_objective_precision_value (float) – Optional. Required when maximize-recall-at-precision optimizationObjective was picked, represents the precision value at which the optimization is done.
The minimum value is 0 and the maximum is 1.0.
project (str) – Optional. Project to run training in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use when calling the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
Raises
ValueError – If both column_transformations and column_specs were provided.
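The mutual-exclusion rule behind this ValueError can be sketched in plain Python. The helper name `resolve_column_transformations` is hypothetical and not part of the SDK, and the `{transformation: {"column_name": name}}` shape is an assumption about how a `column_specs` mapping corresponds to the `column_transformations` list.

```python
from typing import Dict, List, Optional

ColumnTransformations = List[Dict[str, Dict[str, str]]]


def resolve_column_transformations(
    column_specs: Optional[Dict[str, str]] = None,
    column_transformations: Optional[ColumnTransformations] = None,
) -> Optional[ColumnTransformations]:
    """Hypothetical helper: accept at most one of the two column arguments."""
    if column_specs is not None and column_transformations is not None:
        raise ValueError(
            "Only one of column_transformations or column_specs should be passed."
        )
    if column_specs is not None:
        # Assumed mapping: one single-key dict per column.
        return [
            {transformation: {"column_name": name}}
            for name, transformation in column_specs.items()
        ]
    return column_transformations


print(resolve_column_transformations(column_specs={"age": "numeric", "city": "auto"}))
```

Passing both arguments to the helper raises ValueError, mirroring the documented constructor behaviour.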
static get_auto_column_specs(dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, target_column: str)
Returns a dict with all non-target columns as keys and ‘auto’ as values.
Example usage:
column_specs = training_jobs.AutoMLTabularTrainingJob.get_auto_column_specs(
    dataset=my_dataset,
    target_column="my_target_column",
)
Parameters
dataset (datasets.TabularDataset) – Required. Intended dataset.
target_column (str) – Required. Intended target column.
Returns
Dict[str, str]
Column names as keys and ‘auto’ as values
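For intuition, the returned mapping has the shape shown below. This is a plain-Python sketch over a list of column names, not the SDK implementation; `auto_column_specs` and the sample column names are hypothetical.

```python
from typing import Dict, Iterable


def auto_column_specs(column_names: Iterable[str], target_column: str) -> Dict[str, str]:
    """Map every column except the target to the 'auto' transformation."""
    return {name: "auto" for name in column_names if name != target_column}


specs = auto_column_specs(["date", "store", "sales"], target_column="sales")
print(specs)
```

A mapping like this could then be edited per column (for example, replacing 'auto' with a specific transformation) before being passed as column_specs.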
run(dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, target_column: str, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, weight_column: Optional[str] = None, budget_milli_node_hours: int = 1000, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, disable_early_stopping: bool = False, export_evaluated_data_items: bool = False, export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None, export_evaluated_data_items_override_destination: bool = False, additional_experiments: Optional[List[str]] = None, sync: bool = True)
Runs the training job and returns a model.
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Predefined splits:
Assigns input data to training, validation, and test sets based on the value of a provided key.
If using predefined splits, `predefined_split_column_name` must be provided.
Supported only for tabular Datasets.
Timestamp splits:
Assigns input data to training, validation, and test sets
based on the provided timestamps. The youngest data pieces are
assigned to training set, next to validation set, and the oldest
to the test set.
Supported only for tabular Datasets.
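The data fraction rule above (any subset of the three fractions, summing to at most 1) can be checked locally before calling run. This is a minimal sketch; `fraction_split_kwargs` is a hypothetical helper, and the per-value range check of 0 to 1 is an assumption rather than something the reference states.

```python
import math
from typing import Dict, Optional


def fraction_split_kwargs(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical helper: collect the provided fractions and check their sum."""
    provided = {
        "training_fraction_split": training,
        "validation_fraction_split": validation,
        "test_fraction_split": test,
    }
    kwargs = {name: value for name, value in provided.items() if value is not None}
    for name, value in kwargs.items():
        # Assumed sanity check: each fraction lies in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    total = math.fsum(kwargs.values())
    # Allow a small tolerance for floating-point rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"Fractions sum to {total}, which exceeds 1")
    return kwargs


print(fraction_split_kwargs(training=0.7, validation=0.2))
```

The returned dictionary uses the run parameter names, so it could be unpacked into the call; fractions left out are simply omitted, leaving the remainder to be assigned by Vertex AI as described above.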
Parameters
dataset (datasets.TabularDataset) – Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
target_column (str) – Required. The name of the column whose values the Model is to predict.
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
predefined_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.
timestamp_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets. This parameter must be used with training_fraction_split, validation_fraction_split and test_fraction_split.
weight_column (str) – Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusive, and a value of 0 means that the row is ignored. If the weight column field is not set, then all rows are assumed to have an equal weight of 1.
budget_milli_node_hours (int) – Optional. The training budget for creating this Model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be as close to the budget as possible, though it may end up noticeably smaller, at the backend’s discretion. This may happen especially when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error. The minimum value is 1000 and the maximum is 72000.
model_display_name (str) – Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
disable_early_stopping (bool) – Required. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.
export_evaluated_data_items (bool) – Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.
export_evaluated_data_items_bigquery_destination_uri (str) – Optional. URI of desired destination BigQuery table for exported test set predictions.
Expected format:
bq://<project_id>:<dataset_id>:<table>
If not specified, then results are exported to the following auto-created BigQuery table:
<project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples
Applies only if [export_evaluated_data_items] is True.
export_evaluated_data_items_override_destination (bool) – Whether to override the contents of [export_evaluated_data_items_bigquery_destination_uri], if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail.
Applies only if [export_evaluated_data_items] is True and [export_evaluated_data_items_bigquery_destination_uri] is specified.
additional_experiments (List[str]) – Optional. Additional experiment flags for the automl tables training.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
Raises
RuntimeError – If Training job has already been run or is waiting to run.
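Because budget_milli_node_hours is expressed in thousandths of a node hour, a small conversion helper can make call sites easier to read. The sketch below is illustrative only; `node_hours_to_milli` is a hypothetical name, and the bounds are the minimum and maximum quoted in the parameter description above.

```python
MIN_BUDGET_MILLI_NODE_HOURS = 1000
MAX_BUDGET_MILLI_NODE_HOURS = 72000


def node_hours_to_milli(node_hours: float) -> int:
    """Convert node hours to milli node hours and check the documented range."""
    milli = round(node_hours * 1000)
    if not MIN_BUDGET_MILLI_NODE_HOURS <= milli <= MAX_BUDGET_MILLI_NODE_HOURS:
        raise ValueError(
            "budget_milli_node_hours must be between "
            f"{MIN_BUDGET_MILLI_NODE_HOURS} and {MAX_BUDGET_MILLI_NODE_HOURS}, "
            f"got {milli}"
        )
    return milli


print(node_hours_to_milli(1))    # 1000
print(node_hours_to_milli(2.5))  # 2500
```

The result could be passed as budget_milli_node_hours; values outside the documented range are rejected locally instead of by the service.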
class google.cloud.aiplatform.AutoMLTextTrainingJob(display_name: str, prediction_type: str, multi_label: bool = False, sentiment_max: int = 10, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._TrainingJob
Constructs an AutoML Text Training Job.
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
prediction_type (str) – The type of prediction the Model is to produce, one of:
“classification” - A classification model analyzes text data and returns a list of categories that apply to the text found in the data. Vertex AI offers both single-label and multi-label text classification models.
“extraction” - An entity extraction model inspects text data for known entities referenced in the data and labels those entities in the text.
“sentiment” - A sentiment analysis model inspects text data and identifies the prevailing emotional opinion within it, especially to determine a writer’s attitude as positive, negative, or neutral.
multi_label (bool) – Required and only applicable for text classification task. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each text snippet at most one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each text snippet multiple annotations may be applicable).
sentiment_max (int) – Required and only applicable for sentiment task. A sentiment is expressed as an integer ordinal, where higher value means a more positive sentiment. The range of sentiments that will be used is between 0 and sentimentMax (inclusive on both ends), and all the values in the range must be represented in the dataset before a model can be created. Only the Annotations with this sentimentMax will be used for training. sentimentMax value must be between 1 and 10 (inclusive).
project (str) – Optional. Project to run training in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
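The task-specific constraints on prediction_type, multi_label and sentiment_max can be checked before constructing the job. The sketch below is a hypothetical pre-flight helper (`text_job_kwargs` is not an SDK function); it only encodes the rules stated in the parameter descriptions above.

```python
from typing import Any, Dict

TEXT_PREDICTION_TYPES = ("classification", "extraction", "sentiment")


def text_job_kwargs(
    display_name: str,
    prediction_type: str,
    multi_label: bool = False,
    sentiment_max: int = 10,
) -> Dict[str, Any]:
    """Hypothetical pre-flight check for AutoMLTextTrainingJob arguments."""
    if prediction_type not in TEXT_PREDICTION_TYPES:
        raise ValueError(f"prediction_type must be one of {TEXT_PREDICTION_TYPES}")
    kwargs: Dict[str, Any] = {
        "display_name": display_name,
        "prediction_type": prediction_type,
    }
    if prediction_type == "classification":
        # multi_label only applies to classification.
        kwargs["multi_label"] = multi_label
    elif prediction_type == "sentiment":
        # sentiment_max only applies to sentiment and must be in [1, 10].
        if not 1 <= sentiment_max <= 10:
            raise ValueError("sentiment_max must be between 1 and 10 (inclusive)")
        kwargs["sentiment_max"] = sentiment_max
    return kwargs


print(text_job_kwargs("reviews", "sentiment", sentiment_max=4))
```

Only the arguments relevant to the chosen task are kept, so the resulting dictionary could be unpacked into the constructor.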
run(dataset: google.cloud.aiplatform.datasets.text_dataset.TextDataset, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, sync: bool = True)
Runs the training job and returns a model.
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters, data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, all of `training_filter_split`, `validation_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
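The filter-split rule (all three filters provided, with ‘-’ standing for a filter that matches nothing) can be sketched as follows. `filter_split_kwargs` is a hypothetical helper, and the filter strings in the usage lines are illustrative assumptions; consult the DatasetService.ListDataItems filter syntax for what your dataset actually supports.

```python
from typing import Dict, Optional

MATCH_NOTHING = "-"  # documented placeholder for a filter that matches nothing


def filter_split_kwargs(
    training_filter: Optional[str],
    validation_filter: Optional[str],
    test_filter: Optional[str],
) -> Dict[str, str]:
    """Hypothetical helper: require all three filters, as filter splits demand."""
    filters = {
        "training_filter_split": training_filter,
        "validation_filter_split": validation_filter,
        "test_filter_split": test_filter,
    }
    missing = [name for name, value in filters.items() if value is None]
    if missing:
        raise ValueError(f"Filter splits require all filters; missing: {missing}")
    return {name: str(value) for name, value in filters.items()}


# Illustrative filter strings (assumed, not taken from this reference).
kwargs = filter_split_kwargs(
    "labels.aiplatform.googleapis.com/ml_use=training",
    "labels.aiplatform.googleapis.com/ml_use=validation",
    MATCH_NOTHING,
)
print(kwargs["test_filter_split"])
```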
Parameters
dataset (datasets.TextDataset) – Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition].
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
validation_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
model_display_name (str) – Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
Returns
The trained Vertex AI Model resource.
Return type
model
Raises
RuntimeError – If Training job has already been run or is waiting to run.
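The constraints stated for labels and model_labels (at most 64 code points; lowercase letters, numeric characters, underscores and dashes; international characters allowed) lend themselves to a local check. The following is a rough sketch under those stated rules only; the helper names are hypothetical, and the service may enforce additional rules not listed in this reference.

```python
from typing import Dict

MAX_LABEL_LENGTH = 64  # counted in Unicode code points


def _is_valid_label_text(text: str) -> bool:
    """Check one key or value against the rules quoted in this reference."""
    if len(text) > MAX_LABEL_LENGTH:
        return False
    for ch in text:
        if ch in "_-" or ch.isdigit():
            continue
        # Letters are accepted unless they are uppercase; this also admits
        # international letters that have no case.
        if ch.isalpha() and not ch.isupper():
            continue
        return False
    return True


def invalid_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Return the subset of labels whose key or value breaks the stated rules."""
    return {
        key: value
        for key, value in labels.items()
        if not (_is_valid_label_text(key) and _is_valid_label_text(value))
    }


print(invalid_labels({"team": "forecasting", "Env": "prod", "stage": "pre prod"}))
```

An empty result means every key and value passed the sketch's checks.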
class google.cloud.aiplatform.AutoMLVideoTrainingJob(display_name: str, prediction_type: str = 'classification', model_type: str = 'CLOUD', project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._TrainingJob
Constructs an AutoML Video Training Job.
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
prediction_type (str) – The type of prediction the Model is to produce, one of:
“classification” - A video classification model classifies shots and segments in your videos according to your own defined labels.
“object_tracking” - A video object tracking model detects and tracks multiple objects in shots and segments. You can use these models to track objects in your videos according to your own pre-defined, custom labels.
“action_recognition” - A video action recognition model pinpoints the location of actions with short temporal durations (~1 second).
model_type (str) – Required. One of the following (default: “CLOUD”):
“CLOUD” - available for “classification”, “object_tracking” and “action_recognition”. A Model best tailored to be used within Google Cloud, and which cannot be exported.
“MOBILE_VERSATILE_1” - available for “classification”, “object_tracking” and “action_recognition”. A model that, in addition to being available within Google Cloud, can also be exported (see ModelService.ExportModel) as a TensorFlow or TensorFlow Lite model and used on a mobile or edge device afterwards.
“MOBILE_CORAL_VERSATILE_1” - available only for “object_tracking”. A versatile model that is meant to be exported (see ModelService.ExportModel) and used on a Google Coral device.
“MOBILE_CORAL_LOW_LATENCY_1” - available only for “object_tracking”. A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on a Google Coral device.
“MOBILE_JETSON_VERSATILE_1” - available only for “object_tracking”. A versatile model that is meant to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.
“MOBILE_JETSON_LOW_LATENCY_1” - available only for “object_tracking”. A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.
project (str) – Optional. Project to run training in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
Raises
ValueError – When an invalid prediction_type and/or model_type is provided.
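The availability notes in the model_type list can be read as a compatibility table. The sketch below encodes that table and raises ValueError for unsupported combinations; `check_video_job_args` is a hypothetical helper that mirrors the documented behaviour rather than the SDK's own validation code.

```python
# model_type values available for each prediction_type, per the list above.
VIDEO_MODEL_TYPES = {
    "classification": {"CLOUD", "MOBILE_VERSATILE_1"},
    "action_recognition": {"CLOUD", "MOBILE_VERSATILE_1"},
    "object_tracking": {
        "CLOUD",
        "MOBILE_VERSATILE_1",
        "MOBILE_CORAL_VERSATILE_1",
        "MOBILE_CORAL_LOW_LATENCY_1",
        "MOBILE_JETSON_VERSATILE_1",
        "MOBILE_JETSON_LOW_LATENCY_1",
    },
}


def check_video_job_args(
    prediction_type: str = "classification", model_type: str = "CLOUD"
) -> None:
    """Hypothetical check: raise ValueError for an unsupported combination."""
    if prediction_type not in VIDEO_MODEL_TYPES:
        raise ValueError(f"Unsupported prediction_type: {prediction_type!r}")
    if model_type not in VIDEO_MODEL_TYPES[prediction_type]:
        raise ValueError(
            f"model_type {model_type!r} is not available for {prediction_type!r}"
        )


check_video_job_args("object_tracking", "MOBILE_CORAL_VERSATILE_1")  # passes silently
```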
run(dataset: google.cloud.aiplatform.datasets.video_dataset.VideoDataset, training_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, sync: bool = True)
Runs the AutoML Video training job and returns a model.
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
`training_fraction_split`, and `test_fraction_split` may optionally
be provided; they must sum to at most 1. If none of the fractions are set,
by default roughly 80% of data will be used for training, and 20% for test.
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters, data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, both `training_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
Parameters
dataset (datasets.VideoDataset) – Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
model_display_name (str) – Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
Raises
RuntimeError – If Training job has already been run or is waiting to run.
class google.cloud.aiplatform.BatchPredictionJob(batch_prediction_job_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.jobs._Job
Retrieves a BatchPredictionJob resource and instantiates its representation.
Parameters
batch_prediction_job_name (str) – Required. A fully-qualified BatchPredictionJob resource name or ID. Example: “projects/…/locations/…/batchPredictionJobs/456” or “456” when project and location are initialized or passed.
project (Optional[str]) – Optional. Project to retrieve BatchPredictionJob from. If not set, project set in aiplatform.init will be used.
location (Optional[str]) – Optional. Location to retrieve BatchPredictionJob from. If not set, location set in aiplatform.init will be used.
credentials (Optional[auth_credentials.Credentials]) – Optional. Custom credentials to use. If not set, credentials set in aiplatform.init will be used.
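The two accepted forms of batch_prediction_job_name (a fully-qualified resource name, or a bare ID combined with project and location) can be normalized as sketched below. `batch_prediction_job_path` is a hypothetical helper, and the project and location values in the usage line are placeholders.

```python
import re
from typing import Optional

_FULL_NAME = re.compile(
    r"^projects/[^/]+/locations/[^/]+/batchPredictionJobs/[^/]+$"
)


def batch_prediction_job_path(
    name_or_id: str, project: Optional[str] = None, location: Optional[str] = None
) -> str:
    """Hypothetical helper: expand a bare job ID into a full resource name."""
    if _FULL_NAME.match(name_or_id):
        return name_or_id
    if "/" in name_or_id:
        raise ValueError(f"Not a job ID or full resource name: {name_or_id!r}")
    if not project or not location:
        raise ValueError("project and location are required when only an ID is given")
    return f"projects/{project}/locations/{location}/batchPredictionJobs/{name_or_id}"


print(batch_prediction_job_path("456", project="my-project", location="us-central1"))
```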
property completion_stats: Optional[google.cloud.aiplatform_v1.types.completion_stats.CompletionStats]
Statistics on completed and failed prediction instances.
classmethod create(job_display_name: str, model_name: Union[str, google.cloud.aiplatform.models.Model], instances_format: str = 'jsonl', predictions_format: str = 'jsonl', gcs_source: Optional[Union[str, Sequence[str]]] = None, bigquery_source: Optional[str] = None, gcs_destination_prefix: Optional[str] = None, bigquery_destination_prefix: Optional[str] = None, model_parameters: Optional[Dict] = None, machine_type: Optional[str] = None, accelerator_type: Optional[str] = None, accelerator_count: Optional[int] = None, starting_replica_count: Optional[int] = None, max_replica_count: Optional[int] = None, generate_explanation: Optional[bool] = False, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Create a batch prediction job.
Parameters
job_display_name (str) – Required. The user-defined name of the BatchPredictionJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
model_name (Union[str, aiplatform.Model]) – Required. A fully-qualified model resource name or model ID. Example: “projects/123/locations/us-central1/models/456” or “456” when project and location are initialized or passed.
Or an instance of aiplatform.Model.
instances_format (str) – Required. The format in which instances are provided. Must be one of the formats listed in Model.supported_input_storage_formats. Default is “jsonl” when using gcs_source. If a bigquery_source is provided, this is overridden to “bigquery”.
predictions_format (str) – Required. The format in which Vertex AI outputs the predictions, must be one of the formats specified in Model.supported_output_storage_formats. Default is “jsonl” when using gcs_destination_prefix. If a bigquery_destination_prefix is provided, this is overridden to “bigquery”.
gcs_source (Optional[Sequence[str]]) – Google Cloud Storage URI(-s) to your instances to run batch prediction on. They must match instances_format. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.
bigquery_source (Optional[str]) – BigQuery URI to a table, up to 2000 characters long. For example: bq://projectId.bqDatasetId.bqTableId
gcs_destination_prefix (Optional[str]) – The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field which has as its value a google.rpc.Status containing only code and message fields.
bigquery_destination_prefix (Optional[str]) – The BigQuery project location where the output is to be written to. In the given project a new dataset is created with name prediction_<model-display-name>_<job-create-time>, where the model display name is made BigQuery-dataset-name compatible (for example, most special characters become underscores), and timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables will be created, predictions and errors. If the Model has both instance and prediction schemata defined then the tables have columns as follows: The predictions table contains instances for which the prediction succeeded; it has columns as per a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction has failed; it has instance columns, as per the instance schema, followed by a single “errors” column, whose values are google.rpc.Status represented as a STRUCT, and containing only code and message.
model_parameters (Optional[Dict]) – The parameters that govern the predictions. The schema of the parameters may be specified via the Model’s parameters_schema_uri.
machine_type (Optional[str]) – The type of machine for running batch prediction on dedicated resources. Not specifying machine type will result in batch prediction job being run with automatic resources.
accelerator_type (Optional[str]) – The type of accelerator(s) that may be attached to the machine as per accelerator_count. Only used if machine_type is set.
accelerator_count (Optional[int]) – The number of accelerators to attach to the machine_type. Only used if machine_type is set.
starting_replica_count (Optional[int]) – The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides starting number, not greater than max_replica_count. Only used if machine_type is set.
max_replica_count (Optional[int]) – The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set. Default is 10.
generate_explanation (bool) – Optional. Generate explanation along with the batch prediction results. This will cause the batch prediction output to include explanations based on the prediction_format:
* bigquery: output includes a column named explanation. The value is a struct that conforms to the [aiplatform.gapic.Explanation] object.
* jsonl: The JSON objects on each line include an additional entry keyed explanation. The value of the entry is a JSON object that conforms to the [aiplatform.gapic.Explanation] object.
* csv: Generating explanations for CSV format is not supported.
explanation_metadata (aiplatform.explain.ExplanationMetadata) – Optional. Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited. For more details, see Ref docs <http://tinyurl.com/1igh60kt>
explanation_parameters (aiplatform.explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited. For more details, see Ref docs <http://tinyurl.com/1an4zake>
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your BatchPredictionJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
credentials (Optional[auth_credentials.Credentials]) – Custom credentials to use to create this batch prediction job. Overrides credentials set in aiplatform.init.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the job. Has the form: `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`. The key needs to be in the same region as where the compute resource is created. If this is set, then all resources created by the BatchPredictionJob will be encrypted with the provided encryption key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the created batch prediction job.
Return type
(jobs.BatchPredictionJob)
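The naming rule given for the BigQuery output dataset (prediction_<model-display-name>_<job-create-time>) can be illustrated with a small sketch. The helper name `prediction_dataset_name` is hypothetical, and the exact sanitization is an assumption: the text only says that "most special characters become underscores", so the regular expression below is an approximation rather than the service's actual rule.

```python
import re
from datetime import datetime, timezone


def prediction_dataset_name(model_display_name: str, created: datetime) -> str:
    """Approximate the documented prediction_<model-display-name>_<job-create-time> name."""
    # Assumption: every character outside [A-Za-z0-9_] becomes an underscore.
    safe_name = re.sub(r"[^A-Za-z0-9_]", "_", model_display_name)
    # The documented timestamp layout is YYYY_MM_DDThh_mm_ss_sssZ (UTC, milliseconds).
    created_utc = created.astimezone(timezone.utc)
    millis = created_utc.microsecond // 1000
    stamp = created_utc.strftime("%Y_%m_%dT%H_%M_%S") + f"_{millis:03d}Z"
    return f"prediction_{safe_name}_{stamp}"
```

Pass a timezone-aware datetime; a naive one would be interpreted in the local timezone by `astimezone`.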
iter_outputs(bq_max_results: Optional[int] = 100)
Returns an Iterable object to traverse the output files, either a list of GCS Blobs or a BigQuery RowIterator depending on the output config set when the BatchPredictionJob was created.
Parameters
bq_max_results (Optional[int]) – Limit on rows to retrieve from prediction table in BigQuery dataset. Only used when retrieving predictions from a bigquery_destination_prefix. Default is 100.
Returns
Either a list of GCS Blob objects within the prediction output directory or an iterable BigQuery RowIterator with predictions.
Return type
Union[Iterable[storage.Blob], Iterable[bigquery.table.RowIterator]]
Raises
RuntimeError – If BatchPredictionJob is in a JobState other than SUCCEEDED, since outputs cannot be retrieved until the Job has finished.
NotImplementedError – If BatchPredictionJob succeeded and output_info does not have a GCS or BQ output provided.
property output_info: Optional[google.cloud.aiplatform_v1.types.batch_prediction_job.BatchPredictionJob.OutputInfo]
Information describing the output of this job, including output location into which prediction output is written.
This is only available for batch prediction jobs that have run successfully.
property partial_failures: Optional[Sequence[google.rpc.status_pb2.Status]]
Partial failures encountered. For example, single files that can’t be read. This field never exceeds 20 entries. Status details fields contain standard GCP error details.
wait_for_resource_creation()
Waits until resource has been created.
class google.cloud.aiplatform.CustomContainerTrainingJob(display_name: str, container_uri: str, command: Optional[Sequence[str]] = None, model_serving_container_image_uri: Optional[str] = None, model_serving_container_predict_route: Optional[str] = None, model_serving_container_health_route: Optional[str] = None, model_serving_container_command: Optional[Sequence[str]] = None, model_serving_container_args: Optional[Sequence[str]] = None, model_serving_container_environment_variables: Optional[Dict[str, str]] = None, model_serving_container_ports: Optional[Sequence[int]] = None, model_description: Optional[str] = None, model_instance_schema_uri: Optional[str] = None, model_parameters_schema_uri: Optional[str] = None, model_prediction_schema_uri: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._CustomTrainingJob
Class to launch a Custom Training Job in Vertex AI using a Container.
Constructs a Custom Container Training Job.
from google.cloud import aiplatform

job = aiplatform.CustomContainerTrainingJob(
    display_name='test-train',
    container_uri='gcr.io/cloud-aiplatform/training/tf-cpu.2-2:latest',
    command=['python3', 'run_script.py'],
    model_serving_container_image_uri='gcr.io/my-trainer/serving:1',
    model_serving_container_predict_route='predict',
    model_serving_container_health_route='metadata',
    labels={'key': 'value'},
)
Usage with Dataset:
ds = aiplatform.TabularDataset(
    'projects/my-project/locations/us-central1/datasets/12345')
job.run(
    ds,
    replica_count=1,
    model_display_name='my-trained-model',
    model_labels={'key': 'value'},
)
Usage without Dataset:
job.run(replica_count=1, model_display_name='my-trained-model')
To ensure your model gets saved in Vertex AI, write your saved model to os.environ["AIP_MODEL_DIR"] in your provided training script.
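Inside the training container, the script might resolve where to save artifacts along these lines. This is a sketch only: `model_artifact_uri` is a hypothetical helper, and it only builds the destination URI. Since AIP_MODEL_DIR is a Cloud Storage URI, actually writing the file requires a GCS-capable writer, which is not shown here.

```python
import os
from typing import Mapping, Optional


def model_artifact_uri(filename: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Join AIP_MODEL_DIR with a file name, failing loudly if the variable is unset."""
    env = os.environ if environ is None else environ
    model_dir = env.get("AIP_MODEL_DIR")
    if not model_dir:
        raise RuntimeError("AIP_MODEL_DIR is not set; is this running as a training job?")
    # Avoid a doubled slash when the directory already ends with one.
    return model_dir.rstrip("/") + "/" + filename
```

Passing `environ` explicitly makes the helper easy to exercise outside of a training job.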
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
container_uri (str) – Required: Uri of the training container image in the GCR.
command (Sequence[str]) – The command to be invoked when the container is started. It overrides the entrypoint instruction in Dockerfile when provided.
model_serving_container_image_uri (str) – If the training produces a managed Vertex AI Model, the URI of the Model serving container suitable for serving the model produced by the training script.
model_serving_container_predict_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send prediction requests to the container, and which must be supported by it. If not specified a default HTTP path will be used by Vertex AI.
model_serving_container_health_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by Vertex AI.
model_serving_container_command (Sequence[str]) – The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_args (Sequence[str]) – The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_environment_variables (Dict[str, str]) – The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
model_serving_container_ports (Sequence[int]) – Declaration of ports that are exposed by the container. This field is primarily informational; it gives Vertex AI information about the network connections the container uses. Whether or not a port is listed here has no impact on whether the port is actually exposed; any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
model_description (str) – The description of the Model.
model_instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which are used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by Vertex AI. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by Vertex AI; if no parameters are supported it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which are returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by Vertex AI. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
project (str) – Project to run training in. Overrides project set in aiplatform.init.
location (str) – Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately.
Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Bucket used to stage source and training artifacts. Overrides staging_bucket set in aiplatform.init.
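The $(VAR_NAME) expansion rule described for model_serving_container_command and model_serving_container_args can be sketched in a few lines. This is an illustration, not the service's implementation: `expand_references` is a hypothetical helper, and the choice to render an escaped $$(VAR_NAME) as the literal text $(VAR_NAME) is an assumption borrowed from the Kubernetes convention, since the text above only says that escaped references are never expanded.

```python
import re
from typing import Mapping

# Matches either an escaped reference $$(NAME) or a plain reference $(NAME).
_REFERENCE = re.compile(r"\$\$\(([^()]*)\)|\$\(([^()]*)\)")


def expand_references(text: str, env: Mapping[str, str]) -> str:
    """Expand $(VAR_NAME) references using env, leaving unresolved ones unchanged."""

    def replace(match: "re.Match[str]") -> str:
        escaped_name, name = match.group(1), match.group(2)
        if escaped_name is not None:
            # Assumption: $$(NAME) yields the literal $(NAME) and is never expanded.
            return "$(" + escaped_name + ")"
        # Unresolvable references are left exactly as written.
        return env.get(name, match.group(0))

    return _REFERENCE.sub(replace, text)
```

For instance, with `env = {"PORT": "8080"}`, the argument `--port=$(PORT)` becomes `--port=8080`, while `$(MISSING)` is left untouched.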
run(dataset: Optional[Union[google.cloud.aiplatform.datasets.image_dataset.ImageDataset, google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, google.cloud.aiplatform.datasets.text_dataset.TextDataset, google.cloud.aiplatform.datasets.video_dataset.VideoDataset]] = None, annotation_schema_uri: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, base_output_dir: Optional[str] = None, service_account: Optional[str] = None, network: Optional[str] = None, bigquery_destination: Optional[str] = None, args: Optional[List[Union[float, int, str]]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, reduction_server_machine_type: Optional[str] = None, reduction_server_container_uri: Optional[str] = None, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, timeout: Optional[int] = None, restart_job_on_worker_restart: bool = False, enable_web_access: bool = False, tensorboard: Optional[str] = None, sync=True)
Runs the custom training job.
Distributed Training Support: If replica_count = 1, one chief replica will be provisioned. If replica_count > 1, the remainder will be provisioned as a worker replica pool; for example, replica_count = 10 results in 1 chief and 9 workers. All replicas have the same machine_type, accelerator_type, and accelerator_count.
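The chief/worker arithmetic described above can be written down as a small sketch. `ReplicaPlan` and `plan_replicas` are hypothetical names used only for illustration; they do not exist in the SDK.

```python
from typing import NamedTuple


class ReplicaPlan(NamedTuple):
    chief: int
    workers: int


def plan_replicas(replica_count: int) -> ReplicaPlan:
    """Split replica_count into one chief plus a pool of workers."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # One replica is always the chief; any remainder forms the worker pool.
    return ReplicaPlan(chief=1, workers=replica_count - 1)
```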
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; together they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
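As a rough illustration of the fraction rules above, the following sketch validates the provided fractions and reports how much is left for Vertex AI to assign. `check_fraction_splits` is a hypothetical helper; the 0.8/0.1/0.1 defaults mirror the "roughly 80% / 10% / 10%" wording and are not exact service behaviour.

```python
from typing import Dict, Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> Dict[str, float]:
    """Validate fraction splits and compute the share left unassigned."""
    candidates = {"training": training, "validation": validation, "test": test}
    provided = {name: value for name, value in candidates.items() if value is not None}
    if not provided:
        # Documented default when nothing is set (approximate values).
        return {"training": 0.8, "validation": 0.1, "test": 0.1, "unassigned": 0.0}
    for name, value in provided.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction must be between 0 and 1, got {value}")
    total = sum(provided.values())
    if total > 1.0 + 1e-9:  # small tolerance for floating-point rounding
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    result = dict(provided)
    # Whatever remains is distributed by Vertex AI, per the description above.
    result["unassigned"] = max(0.0, 1.0 - total)
    return result
```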
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters; data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, all of `training_filter_split`, `validation_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
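The assignment behaviour of filter splits can be sketched as follows. This is an illustration under stated assumptions: plain Python predicates stand in for the real DataItem filter syntax, which is not reproduced here, and `assign_by_filters` is a hypothetical helper. The first-match order (training, validation, test) follows the description of the filter parameters, and '-' matches nothing.

```python
from typing import Callable, Dict, Iterable, List, Union

# Either the literal "-" (match nothing) or a predicate over a data item.
Filter = Union[str, Callable[[dict], bool]]


def assign_by_filters(
    items: Iterable[dict], training: Filter, validation: Filter, test: Filter
) -> Dict[str, List[dict]]:
    """Assign each item to the first matching set; unmatched items are ignored."""
    buckets: Dict[str, List[dict]] = {"training": [], "validation": [], "test": []}
    ordered = [("training", training), ("validation", validation), ("test", test)]
    for item in items:
        for name, item_filter in ordered:
            if item_filter == "-":
                continue  # '-' is the documented "match nothing" filter
            if item_filter(item):
                buckets[name].append(item)
                break  # first match wins, in training/validation/test order
    return buckets
```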
Predefined splits:
Assigns input data to training, validation, and test sets based on the value of a provided key.
If using predefined splits, `predefined_split_column_name` must be provided.
Supported only for tabular Datasets.
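A predefined split amounts to grouping rows by the value of one column. The sketch below assumes rows are plain dictionaries and uses a hypothetical helper name, `assign_by_column`; rows whose key is missing or holds an invalid value are dropped, as described for predefined_split_column_name.

```python
from typing import Dict, Iterable, List


def assign_by_column(rows: Iterable[dict], column: str) -> Dict[str, List[dict]]:
    """Group rows into training/validation/test by the value stored in column."""
    buckets: Dict[str, List[dict]] = {"training": [], "validation": [], "test": []}
    for row in rows:
        value = row.get(column)
        # Rows with a missing or invalid value are ignored.
        if isinstance(value, str) and value in buckets:
            buckets[value].append(row)
    return buckets
```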
Timestamp splits:
Assigns input data to training, validation, and test sets
based on the provided timestamps. The youngest data pieces are
assigned to training set, next to validation set, and the oldest
to the test set.
Supported only for tabular Datasets.
Parameters
dataset (Union[datasets.ImageDataset,datasets.TabularDataset,datasets.TextDataset,datasets.VideoDataset]) – Vertex AI Dataset to fit this training against. The custom training script should retrieve the datasets through the URIs passed in these environment variables:
os.environ["AIP_TRAINING_DATA_URI"]
os.environ["AIP_VALIDATION_DATA_URI"]
os.environ["AIP_TEST_DATA_URI"]
Additionally the dataset format is passed in as:
os.environ["AIP_DATA_FORMAT"]
annotation_schema_uri (str) – Google Cloud Storage URI points to a YAML file describing annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with metadata of the Dataset specified by dataset_id.
Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in respectively training, validation or test role, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
model_display_name (str) – If the script produces a managed Vertex AI Model, the display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
base_output_dir (str) – GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
Vertex AI sets the following environment variables when it runs your training code:
AIP_MODEL_DIR: a Cloud Storage URI of a directory intended for saving model artifacts, i.e. <base_output_dir>/model/
AIP_CHECKPOINT_DIR: a Cloud Storage URI of a directory intended for saving checkpoints, i.e. <base_output_dir>/checkpoints/
AIP_TENSORBOARD_LOG_DIR: a Cloud Storage URI of a directory intended for saving TensorBoard logs, i.e. <base_output_dir>/logs/
service_account (str) – Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
bigquery_destination (str) – Provide this field if dataset is a BigQuery dataset. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data will be written into that dataset. In the dataset three tables will be created, training, validation and test.
AIP_DATA_FORMAT = "bigquery".
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_*.training"
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_*.validation"
AIP_TEST_DATA_URI = "bigquery_destination.dataset_*.test"
args (List[Union[str, int, float]]) – Command line arguments to be passed to the Python script.
environment_variables (Dict[str, str]) – Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. The name of each environment variable must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
replica_count (int) – The number of worker replicas. If replica count = 1 then one chief replica will be provisioned. If replica_count > 1 the remainder will be provisioned as a worker replica pool.
machine_type (str) – The type of machine to use for training.
accelerator_type (str) – Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – The number of accelerators to attach to a worker replica.
boot_disk_type (str) – Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
boot_disk_size_gb (int) – Size in GB of the boot disk, default is 100GB. boot disk size must be within the range of [100, 64000].
reduction_server_replica_count (int) – The number of reduction server replicas, default is 0.
reduction_server_machine_type (str) – Optional. The type of machine to use for reduction server.
reduction_server_container_uri (str) – Optional. The Uri of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
validation_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
predefined_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
timestamp_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
timeout (int) – The maximum job running time in seconds. The default is 7 days.
restart_job_on_worker_restart (bool) – Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
enable_web_access (bool) – Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
tensorboard (str) – Optional. The name of a Vertex AI [Tensorboard][google.cloud.aiplatform.v1beta1.Tensorboard] resource to which this CustomJob will upload Tensorboard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write Tensorboard logs to the directory given by the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required with provided tensorboard. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
Raises
RuntimeError – If Training job has already been run, staging_bucket has not been set, or model_display_name was provided but required arguments were not provided in constructor.
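Some of the limits stated for run() arguments can be checked locally before submitting a job. The sketch below covers only limits spelled out in the parameter descriptions (boot disk type, boot disk size range, and the cap of 10 environment variables); `run_argument_problems` is a hypothetical helper and the service may enforce further rules.

```python
from typing import Dict, List, Optional

VALID_BOOT_DISK_TYPES = ("pd-ssd", "pd-standard")


def run_argument_problems(
    boot_disk_type: str = "pd-ssd",
    boot_disk_size_gb: int = 100,
    environment_variables: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return human-readable problems found in the given run() arguments."""
    problems: List[str] = []
    if boot_disk_type not in VALID_BOOT_DISK_TYPES:
        problems.append(f"boot_disk_type must be one of {VALID_BOOT_DISK_TYPES}")
    if not 100 <= boot_disk_size_gb <= 64000:
        problems.append("boot_disk_size_gb must be within [100, 64000]")
    if environment_variables is not None and len(environment_variables) > 10:
        problems.append("at most 10 environment variables can be specified")
    return problems
```

An empty list means none of these particular checks failed.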
class google.cloud.aiplatform.CustomJob(display_name: str, worker_pool_specs: Union[List[Dict], List[google.cloud.aiplatform_v1.types.custom_job.WorkerPoolSpec]], base_output_dir: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None)
Bases: google.cloud.aiplatform.jobs._RunnableJob
Vertex AI Custom Job.
Construct a Custom Job with Worker Pool Specs.
Example usage:
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_image_uri,
            "command": [],
            "args": [],
        },
    }
]
my_job = aiplatform.CustomJob(
    display_name='my_job',
    worker_pool_specs=worker_pool_specs,
    labels={'my_key': 'my_value'},
)
my_job.run()
For more information on configuring worker pool specs please visit: https://cloud.google.com/ai-platform-unified/docs/training/create-custom-job
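When several pools share the same shape, a small builder can reduce repetition. This is a sketch: `make_worker_pool_spec` is a hypothetical helper, and the dictionary keys are the ones shown in the example usage above. Omitting the accelerator fields when no accelerator is requested is a design choice made here, not a documented requirement.

```python
from typing import Any, Dict, Optional, Sequence


def make_worker_pool_spec(
    image_uri: str,
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build one worker pool spec dictionary in the shape shown above."""
    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return {
        "machine_spec": machine_spec,
        "replica_count": replica_count,
        "container_spec": {
            "image_uri": image_uri,
            "command": list(command or []),
            "args": list(args or []),
        },
    }
```

The resulting dictionaries can be collected into a list and passed as worker_pool_specs.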
Parameters
display_name (str) – Required. The user-defined name of the CustomJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
worker_pool_specs (Union[List[Dict], List[aiplatform.gapic.WorkerPoolSpec]]) – Required. The spec of the worker pools including machine type and Docker image. Can be provided as a list of dictionaries or a list of WorkerPoolSpec proto messages.
base_output_dir (str) – Optional. GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
project (str) – Optional. Project to run the custom job in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run the custom job in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to call the custom job service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize CustomJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (str) – Optional. Customer-managed encryption key name for a CustomJob. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key.
staging_bucket (str) – Optional. Bucket for produced custom job artifacts. Overrides staging_bucket set in aiplatform.init.
Raises
RuntimeError – If staging bucket was not set using aiplatform.init and a staging bucket was not passed in.
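The label constraints stated for the labels parameter can be approximated with a local check. This is a sketch only: `label_problems` is a hypothetical helper, it covers just the rules written above (length and allowed characters), and the service may apply additional rules that are not listed here.

```python
from typing import Dict, List


def _is_allowed(ch: str) -> bool:
    """Lowercase letters (including international ones), digits, '_' and '-'."""
    if ch in "_-" or ch.isdigit():
        return True
    # A letter is accepted when lowercasing leaves it unchanged.
    return ch.isalpha() and ch == ch.lower()


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return a description of every label key or value that breaks the stated rules."""
    problems: List[str] = []
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text) > 64:  # len() counts Unicode code points
                problems.append(f"{kind} {text!r} is longer than 64 characters")
            if not all(_is_allowed(ch) for ch in text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
    return problems
```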
classmethod from_local_script(display_name: str, script_path: str, container_uri: str, args: Optional[Sequence[str]] = None, requirements: Optional[Sequence[str]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, reduction_server_machine_type: Optional[str] = None, reduction_server_container_uri: Optional[str] = None, base_output_dir: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None)
Configures a custom job from a local script.
Example usage:
job = aiplatform.CustomJob.from_local_script(
    display_name="my-custom-job",
    script_path="training_script.py",
    container_uri="gcr.io/cloud-aiplatform/training/tf-cpu.2-2:latest",
    requirements=["gcsfs==0.7.1"],
    replica_count=1,
    args=['--dataset', 'gs://my-bucket/my-dataset', '--model_output_uri', 'gs://my-bucket/model'],
    labels={'my_key': 'my_value'},
)
job.run()
Parameters
display_name (str) – Required. The user-defined name of this CustomJob.
script_path (str) – Required. Local path to training script.
container_uri (str) – Required: Uri of the training container image to use for custom job.
args (Optional[Sequence[str]]) – Optional. Command line arguments to be passed to the Python task.
requirements (Sequence[str]) – Optional. List of python packages dependencies of script.
environment_variables (Dict[str, str]) – Optional. Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. The name of each environment variable must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
replica_count (int) – Optional. The number of worker replicas. If replica_count = 1, one chief replica is provisioned. If replica_count > 1, the remainder is provisioned as a worker replica pool.
machine_type (str) – Optional. The type of machine to use for training.
accelerator_type (str) – Optional. Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – Optional. The number of accelerators to attach to a worker replica.
boot_disk_type (str) – Optional. Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
boot_disk_size_gb (int) – Optional. Size in GB of the boot disk, default is 100GB. The boot disk size must be within the range of [100, 64000].
reduction_server_replica_count (int) – The number of reduction server replicas, default is 0.
reduction_server_machine_type (str) – Optional. The type of machine to use for reduction server.
reduction_server_container_uri (str) – Optional. The Uri of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
base_output_dir (str) – Optional. GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
project (str) – Optional. Project to run the custom job in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run the custom job in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to call the custom job service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize CustomJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (str) – Optional. Customer-managed encryption key name for a CustomJob. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key.
staging_bucket (str) – Optional. Bucket for produced custom job artifacts. Overrides staging_bucket set in aiplatform.init.
Raises
RuntimeError – If staging bucket was not set using aiplatform.init and a staging bucket was not passed in.
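The limits stated above for labels and environment_variables can be checked locally before a job is submitted. The following is a minimal sketch and not part of the SDK: the helper names `validate_labels` and `validate_environment_variables` are hypothetical, and the character test only approximates the documented rule (lowercase letters, numeric characters, underscores and dashes, with international characters allowed).

```python
from typing import Dict

MAX_LABEL_LENGTH = 64  # documented limit for label keys and values
MAX_ENV_VARS = 10      # documented limit for environment variables


def _is_allowed_label_char(ch: str) -> bool:
    # Approximation of the documented rule: accept "_", "-", digits,
    # and any letter that is not uppercase (this keeps caseless
    # international letters and rejects "A"-"Z" style characters).
    if ch in "_-":
        return True
    if ch.isdigit():
        return True
    return ch.isalpha() and ch == ch.lower()


def validate_labels(labels: Dict[str, str]) -> None:
    """Raise ValueError if a label key or value breaks the documented limits."""
    for key, value in labels.items():
        for text in (key, value):
            if len(text) > MAX_LABEL_LENGTH:
                raise ValueError(
                    f"label text exceeds {MAX_LABEL_LENGTH} characters: {text!r}"
                )
            if not all(_is_allowed_label_char(ch) for ch in text):
                raise ValueError(f"label text has a disallowed character: {text!r}")


def validate_environment_variables(env: Dict[str, str]) -> None:
    """Raise ValueError if more than the documented number of variables is given."""
    if len(env) > MAX_ENV_VARS:
        raise ValueError(
            f"at most {MAX_ENV_VARS} environment variables can be specified, "
            f"got {len(env)}"
        )
```

Calling both helpers immediately before `from_local_script` surfaces a bad label or an oversized environment locally rather than at job submission.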
property network: Optional[str]
The full name of the Google Compute Engine network to which this CustomJob should be peered.
Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.
Private services access must already be configured for the network. If left unspecified, the CustomJob is not peered with any network.
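As an illustration of the network name format described above, the sketch below builds and checks such a string. The helper names `network_resource_name` and `is_valid_network_name` are hypothetical and not part of the SDK; the check only covers the shape of the string, not whether private services access is configured.

```python
import re

# Shape of the documented format: projects/{project}/global/networks/{network},
# where {project} is a project number.
_NETWORK_PATTERN = re.compile(r"^projects/\d+/global/networks/[^/]+$")


def network_resource_name(project_number: int, network: str) -> str:
    """Build the full network name from a project number and a network name."""
    return f"projects/{project_number}/global/networks/{network}"


def is_valid_network_name(name: str) -> bool:
    """Return True if the string has the documented shape."""
    return _NETWORK_PATTERN.match(name) is not None
```

For example, `network_resource_name(12345, "myVPC")` yields the same string used in the run() documentation, `projects/12345/global/networks/myVPC`.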
run(service_account: Optional[str] = None, network: Optional[str] = None, timeout: Optional[int] = None, restart_job_on_worker_restart: bool = False, enable_web_access: bool = False, tensorboard: Optional[str] = None, sync: bool = True)
Run this configured CustomJob.
Parameters
service_account (str) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
timeout (int) – The maximum job running time in seconds. The default is 7 days.
restart_job_on_worker_restart (bool) – Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
enable_web_access (bool) – Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
tensorboard (str) – Optional. The name of a Vertex AI Tensorboard resource (google.cloud.aiplatform.v1beta1.Tensorboard) to which this CustomJob will upload TensorBoard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write TensorBoard logs to the directory given by the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required when tensorboard is provided. For more information on configuring your service account, see: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
sync (bool) – Whether to execute this method synchronously. If False, this method returns without blocking and is executed in a concurrent Future.
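The tensorboard parameter above has two stated requirements: a fixed resource name format and an accompanying service_account. A minimal local check might look like the sketch below; `check_tensorboard_args` is a hypothetical helper, not an SDK function, and it validates only the shape of the inputs.

```python
import re
from typing import Optional

# Documented format:
# projects/{project}/locations/{location}/tensorboards/{tensorboard}
_TENSORBOARD_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/tensorboards/[^/]+$"
)


def check_tensorboard_args(
    tensorboard: Optional[str], service_account: Optional[str]
) -> None:
    """Raise ValueError if the tensorboard-related arguments are inconsistent."""
    if tensorboard is None:
        return  # nothing to check when no Tensorboard is requested
    if not _TENSORBOARD_PATTERN.match(tensorboard):
        raise ValueError(f"unexpected tensorboard resource name: {tensorboard!r}")
    if not service_account:
        raise ValueError("service_account is required when tensorboard is provided")
```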
class google.cloud.aiplatform.CustomPythonPackageTrainingJob(display_name: str, python_package_gcs_uri: str, python_module_name: str, container_uri: str, model_serving_container_image_uri: Optional[str] = None, model_serving_container_predict_route: Optional[str] = None, model_serving_container_health_route: Optional[str] = None, model_serving_container_command: Optional[Sequence[str]] = None, model_serving_container_args: Optional[Sequence[str]] = None, model_serving_container_environment_variables: Optional[Dict[str, str]] = None, model_serving_container_ports: Optional[Sequence[int]] = None, model_description: Optional[str] = None, model_instance_schema_uri: Optional[str] = None, model_parameters_schema_uri: Optional[str] = None, model_prediction_schema_uri: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._CustomTrainingJob
Class to launch a Custom Training Job in Vertex AI using a Python Package.
Takes a training implementation as a python package and executes that package in Cloud Vertex AI Training.
Constructs a Custom Training Job from a Python Package.
from google.cloud import aiplatform

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name='test-train',
    python_package_gcs_uri='gs://my-bucket/my-python-package.tar.gz',
    python_module_name='my-training-python-package.task',
    container_uri='gcr.io/cloud-aiplatform/training/tf-cpu.2-2:latest',
    model_serving_container_image_uri='gcr.io/my-trainer/serving:1',
    model_serving_container_predict_route='predict',
    model_serving_container_health_route='metadata',
    labels={'key': 'value'},
)
Usage with Dataset:
ds = aiplatform.TabularDataset(
    'projects/my-project/locations/us-central1/datasets/12345'
)
job.run(
    ds,
    replica_count=1,
    model_display_name='my-trained-model',
    model_labels={'key': 'value'},
)
Usage without Dataset:
job.run(
    replica_count=1,
    model_display_name='my-trained-model',
    model_labels={'key': 'value'},
)
To ensure your model gets saved in Vertex AI, write your saved model to os.environ["AIP_MODEL_DIR"] in your provided training script.
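On the training-script side, reading AIP_MODEL_DIR could look like the sketch below. The helper names are hypothetical. The `/gcs/` translation assumes the bucket is reachable through a Cloud Storage FUSE mount inside the training container, which this reference does not state; a Cloud Storage client library could be used instead.

```python
import os


def resolve_model_dir(default: str = "model") -> str:
    """Return the location the training script should save its model to.

    Falls back to a local directory so the same script also runs outside
    Vertex AI, where AIP_MODEL_DIR is not set.
    """
    return os.environ.get("AIP_MODEL_DIR", default)


def gcs_uri_to_fuse_path(uri: str) -> str:
    """Map gs://bucket/path to /gcs/bucket/path (assumed FUSE mount point)."""
    prefix = "gs://"
    if uri.startswith(prefix):
        return "/gcs/" + uri[len(prefix):]
    return uri  # already a local path
```

A training script would then save its model under `gcs_uri_to_fuse_path(resolve_model_dir())`, or pass `resolve_model_dir()` directly to a framework that can write to `gs://` URIs.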
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
python_package_gcs_uri (str) – Required. GCS location of the training python package.
python_module_name (str) – Required. The module name of the training python package.
container_uri (str) – Required. URI of the training container image in GCR.
model_serving_container_image_uri (str) – If the training produces a managed Vertex AI Model, the URI of the Model serving container suitable for serving the model produced by the training script.
model_serving_container_predict_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send prediction requests to the container, and which must be supported by it. If not specified, a default HTTP path will be used by Vertex AI.
model_serving_container_health_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by AI Platform.
model_serving_container_command (Sequence[str]) – The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_args (Sequence[str]) – The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_environment_variables (Dict[str, str]) – The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
model_serving_container_ports (Sequence[int]) – Declaration of ports that are exposed by the container. This field is primarily informational; it gives Vertex AI information about the network connections the container uses. Listing or not listing a port here has no impact on whether the port is actually exposed: any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
model_description (str) – The description of the Model.
model_instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
project (str) – Project to run training in. Overrides project set in aiplatform.init.
location (str) – Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately.
Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Bucket used to stage source and training artifacts. Overrides staging_bucket set in aiplatform.init.
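The two encryption key parameters above share one resource name format and one region constraint. The sketch below parses such a name and compares its location with the job location; `parse_kms_key_name` and `key_matches_location` are hypothetical helpers, not SDK functions, and they check only the string, not the key itself.

```python
import re
from typing import Dict

# Documented form:
# projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_kms_key_name(name: str) -> Dict[str, str]:
    """Split a Cloud KMS key name into its components, or raise ValueError."""
    match = _KMS_KEY_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected Cloud KMS key name: {name!r}")
    return match.groupdict()


def key_matches_location(name: str, location: str) -> bool:
    """Return True if the key's region equals the region used for training."""
    return parse_kms_key_name(name)["location"] == location
```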
run(dataset: Optional[Union[google.cloud.aiplatform.datasets.image_dataset.ImageDataset, google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, google.cloud.aiplatform.datasets.text_dataset.TextDataset, google.cloud.aiplatform.datasets.video_dataset.VideoDataset]] = None, annotation_schema_uri: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, base_output_dir: Optional[str] = None, service_account: Optional[str] = None, network: Optional[str] = None, bigquery_destination: Optional[str] = None, args: Optional[List[Union[float, int, str]]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, reduction_server_machine_type: Optional[str] = None, reduction_server_container_uri: Optional[str] = None, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, timeout: Optional[int] = None, restart_job_on_worker_restart: bool = False, enable_web_access: bool = False, tensorboard: Optional[str] = None, sync=True)
Runs the custom training job.
Distributed Training Support: If replica_count = 1, one chief replica is provisioned. If replica_count > 1, the remainder is provisioned as a worker replica pool. For example, replica_count = 10 results in 1 chief and 9 workers. All replicas have the same machine_type, accelerator_type, and accelerator_count.
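The chief and worker arithmetic described above can be written out as a small sketch. `split_replicas` is a hypothetical helper for illustration only; the SDK performs this provisioning itself.

```python
from typing import Tuple


def split_replicas(replica_count: int) -> Tuple[int, int]:
    """Return (chief_count, worker_count) for a given replica_count."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # One replica is always the chief; the remainder forms the worker pool.
    return 1, replica_count - 1
```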
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided. The provided fractions must
sum to at most 1. If they sum to less than 1, the remainder is assigned to sets
as decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters. Data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, all of `training_filter_split`, `validation_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
Predefined splits:
Assigns input data to training, validation, and test sets based on the value of a provided key.
If using predefined splits, `predefined_split_column_name` must be provided.
Supported only for tabular Datasets.
Timestamp splits:
Assigns input data to training, validation, and test sets
based on the provided timestamps. The youngest data pieces are
assigned to the training set, the next to the validation set, and the oldest
to the test set.
Supported only for tabular Datasets.
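For the data fraction splits described above, the rule that the provided fractions sum to at most 1 can be checked before calling run(). The following is a sketch with a hypothetical helper name, `remaining_fraction`; it reports the share left for Vertex AI to assign and does not predict how that share is distributed.

```python
from typing import Optional


def remaining_fraction(
    training_fraction_split: Optional[float] = None,
    validation_fraction_split: Optional[float] = None,
    test_fraction_split: Optional[float] = None,
) -> float:
    """Validate the provided fractions and return the unassigned remainder."""
    provided = [
        fraction
        for fraction in (
            training_fraction_split,
            validation_fraction_split,
            test_fraction_split,
        )
        if fraction is not None
    ]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range [0, 1]: {fraction}")
    total = sum(provided)
    # Small tolerance so that values such as 0.7 + 0.2 + 0.1 are not
    # rejected because of floating-point rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"fractions sum to {total}, which is more than 1")
    return max(0.0, 1.0 - total)
```

When no fraction is passed the function returns 1.0, which corresponds to the case where the default split of roughly 80/10/10 applies.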
Parameters
dataset (Union[datasets.ImageDataset, datasets.TabularDataset, datasets.TextDataset, datasets.VideoDataset]) – Vertex AI Dataset to fit this training against. The custom training script should retrieve datasets through the URIs passed in these environment variables:
os.environ["AIP_TRAINING_DATA_URI"] os.environ["AIP_VALIDATION_DATA_URI"] os.environ["AIP_TEST_DATA_URI"]
Additionally the dataset format is passed in as:
os.environ["AIP_DATA_FORMAT"]
annotation_schema_uri (str) – Google Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.
Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation or test role, respectively, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
model_display_name (str) – If the script produces a managed Vertex AI Model, the display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
base_output_dir (str) – GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
Vertex AI sets the following environment variables when it runs your training code:
AIP_MODEL_DIR: a Cloud Storage URI of a directory intended for saving model artifacts, i.e. <base_output_dir>/model/
AIP_CHECKPOINT_DIR: a Cloud Storage URI of a directory intended for saving checkpoints, i.e. <base_output_dir>/checkpoints/
AIP_TENSORBOARD_LOG_DIR: a Cloud Storage URI of a directory intended for saving TensorBoard logs, i.e. <base_output_dir>/logs/
service_account (str) – Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
bigquery_destination (str) – Provide this field if dataset is a BigQuery dataset. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data will be written into that dataset. In the dataset three tables will be created: training, validation and test.
AIP_DATA_FORMAT = "bigquery"
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_*.training"
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_*.validation"
AIP_TEST_DATA_URI = "bigquery_destination.dataset_*.test"
args (List[Union[str, int, float]]) – Command line arguments to be passed to the Python script.
environment_variables (Dict[str, str]) – Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. The name of each environment variable must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
replica_count (int) – The number of worker replicas. If replica_count = 1, one chief replica is provisioned. If replica_count > 1, the remainder is provisioned as a worker replica pool.
machine_type (str) – The type of machine to use for training.
accelerator_type (str) – Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – The number of accelerators to attach to a worker replica.
boot_disk_type (str) – Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
boot_disk_size_gb (int) – Size in GB of the boot disk, default is 100GB. The boot disk size must be within the range of [100, 64000].
reduction_server_replica_count (int) – The number of reduction server replicas, default is 0.
reduction_server_machine_type (str) – Optional. The type of machine to use for reduction server.
reduction_server_container_uri (str) – Optional. The Uri of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
validation_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
predefined_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
timestamp_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
timeout (int) – The maximum job running time in seconds. The default is 7 days.
restart_job_on_worker_restart (bool) – Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
enable_web_access (bool) – Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
tensorboard (str) – Optional. The name of a Vertex AI [Tensorboard][google.cloud.aiplatform.v1beta1.Tensorboard] resource to which this CustomJob will upload Tensorboard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write Tensorboard to following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required with provided tensorboard. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.
Return type
model
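Inside the training code, the dataset environment variables named in the dataset parameter of run() can be collected in one place. This is a sketch only: `TrainingDataConfig` and `read_training_data_config` are hypothetical names, and the variables are left as None when the job runs without a Dataset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TrainingDataConfig:
    """Values Vertex AI passes to the training code for a managed Dataset."""

    data_format: Optional[str]
    training_uri: Optional[str]
    validation_uri: Optional[str]
    test_uri: Optional[str]


def read_training_data_config(
    env: Mapping[str, str] = os.environ,
) -> TrainingDataConfig:
    """Read the documented AIP_* dataset variables from the environment."""
    return TrainingDataConfig(
        data_format=env.get("AIP_DATA_FORMAT"),
        training_uri=env.get("AIP_TRAINING_DATA_URI"),
        validation_uri=env.get("AIP_VALIDATION_DATA_URI"),
        test_uri=env.get("AIP_TEST_DATA_URI"),
    )
```

Accepting the environment as a mapping keeps the function easy to exercise in a unit test with a plain dictionary.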
class google.cloud.aiplatform.CustomTrainingJob(display_name: str, script_path: str, container_uri: str, requirements: Optional[Sequence[str]] = None, model_serving_container_image_uri: Optional[str] = None, model_serving_container_predict_route: Optional[str] = None, model_serving_container_health_route: Optional[str] = None, model_serving_container_command: Optional[Sequence[str]] = None, model_serving_container_args: Optional[Sequence[str]] = None, model_serving_container_environment_variables: Optional[Dict[str, str]] = None, model_serving_container_ports: Optional[Sequence[int]] = None, model_description: Optional[str] = None, model_instance_schema_uri: Optional[str] = None, model_parameters_schema_uri: Optional[str] = None, model_prediction_schema_uri: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None)
Bases: google.cloud.aiplatform.training_jobs._CustomTrainingJob
Class to launch a Custom Training Job in Vertex AI using a script.
Takes a training implementation as a python script and executes that script in Cloud Vertex AI Training.
Constructs a Custom Training Job from a Python script.
from google.cloud import aiplatform

job = aiplatform.CustomTrainingJob(
    display_name='test-train',
    script_path='test_script.py',
    requirements=['pandas', 'numpy'],
    container_uri='gcr.io/cloud-aiplatform/training/tf-cpu.2-2:latest',
    model_serving_container_image_uri='gcr.io/my-trainer/serving:1',
    model_serving_container_predict_route='predict',
    model_serving_container_health_route='metadata',
    labels={'key': 'value'},
)
Usage with Dataset:
ds = aiplatform.TabularDataset(
    'projects/my-project/locations/us-central1/datasets/12345'
)
job.run(
    ds,
    replica_count=1,
    model_display_name='my-trained-model',
    model_labels={'key': 'value'},
)
Usage without Dataset:
job.run(replica_count=1, model_display_name='my-trained-model')
To ensure your model gets saved in Vertex AI, write your saved model to os.environ["AIP_MODEL_DIR"] in your provided training script.
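AIP_MODEL_DIR is one of three output locations that the run() documentation on this page derives from base_output_dir. The sketch below spells out that mapping. `expected_output_env` is a hypothetical helper for illustration; Vertex AI sets the real values, and when base_output_dir is omitted a timestamped directory in the staging directory is used instead.

```python
from typing import Dict


def expected_output_env(base_output_dir: str) -> Dict[str, str]:
    """Return the documented AIP_* output locations for a base_output_dir."""
    base = base_output_dir.rstrip("/")
    return {
        # <base_output_dir>/model/ holds saved model artifacts.
        "AIP_MODEL_DIR": f"{base}/model/",
        # <base_output_dir>/checkpoints/ holds training checkpoints.
        "AIP_CHECKPOINT_DIR": f"{base}/checkpoints/",
        # <base_output_dir>/logs/ holds TensorBoard logs.
        "AIP_TENSORBOARD_LOG_DIR": f"{base}/logs/",
    }
```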
Parameters
display_name (str) – Required. The user-defined name of this TrainingPipeline.
script_path (str) – Required. Local path to training script.
container_uri (str) – Required. URI of the training container image in GCR.
requirements (Sequence[str]) – List of python packages dependencies of script.
model_serving_container_image_uri (str) – If the training produces a managed Vertex AI Model, the URI of the Model serving container suitable for serving the model produced by the training script.
model_serving_container_predict_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send prediction requests to the container, and which must be supported by it. If not specified, a default HTTP path will be used by Vertex AI.
model_serving_container_health_route (str) – If the training produces a managed Vertex AI Model, an HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by AI Platform.
model_serving_container_command (Sequence[str]) – The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_args (Sequence[str]) – The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
model_serving_container_environment_variables (Dict[str, *[str](https://python.readthedocs.io/en/latest/library/stdtypes.html#str)]*) – The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
model_serving_container_ports (Sequence[int]) – Declaration of ports that are exposed by the container. This field is primarily informational; it gives Vertex AI information about the network connections the container uses. Whether or not a port is listed here has no impact on whether the port is actually exposed: any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
model_description (str) – The description of the Model.
model_instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
model_prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
project (str) – Project to run training in. Overrides project set in aiplatform.init.
location (str) – Location to run training in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
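The label constraints above can be checked locally before a job is submitted. The following is a minimal sketch that covers only the rules stated in this entry (length and allowed characters); the helper names are hypothetical, and the service may enforce further rules not listed here.

```python
def is_valid_label_part(text: str) -> bool:
    """Check one label key or value against the documented constraints."""
    # len() counts Unicode code points in Python 3.
    if len(text) > 64:
        return False
    for ch in text:
        if ch in "_-" or ch.isdigit():
            continue
        # Letters are accepted unless they are uppercase, so international
        # lowercase or caseless letters pass.
        if ch.isalpha() and not ch.isupper():
            continue
        return False
    return True


def invalid_labels(labels: dict) -> list:
    """Return the keys whose key or value breaks a documented constraint."""
    return [
        key
        for key, value in labels.items()
        if not (is_valid_label_part(key) and is_valid_label_part(value))
    ]
```

An empty list from `invalid_labels` means every key and value passed the checks in this sketch.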
training_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
model_encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
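Because the key name has a fixed shape and must live in the same region as the compute resource, a quick local check can catch mistakes early. This is a sketch only: the helper names are hypothetical, and the regular expression just mirrors the form shown above without validating that the key exists.

```python
import re

# Mirrors: projects/<project>/locations/<region>/keyRings/<ring>/cryptoKeys/<key>
_KMS_KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_kms_key_name(name: str) -> dict:
    """Split a Cloud KMS key resource name into its components."""
    match = _KMS_KEY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource name: {name!r}")
    return match.groupdict()


def key_is_in_region(name: str, region: str) -> bool:
    """Return True when the key's location equals the training region."""
    return parse_kms_key_name(name)["location"] == region
```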
staging_bucket (str) – Bucket used to stage source and training artifacts. Overrides staging_bucket set in aiplatform.init.
run(dataset: Optional[Union[google.cloud.aiplatform.datasets.image_dataset.ImageDataset, google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, google.cloud.aiplatform.datasets.text_dataset.TextDataset, google.cloud.aiplatform.datasets.video_dataset.VideoDataset]] = None, annotation_schema_uri: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, base_output_dir: Optional[str] = None, service_account: Optional[str] = None, network: Optional[str] = None, bigquery_destination: Optional[str] = None, args: Optional[List[Union[float, int, str]]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, reduction_server_machine_type: Optional[str] = None, reduction_server_container_uri: Optional[str] = None, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, timeout: Optional[int] = None, restart_job_on_worker_restart: bool = False, enable_web_access: bool = False, tensorboard: Optional[str] = None, sync=True)
Runs the custom training job.
Distributed Training Support: If replica_count = 1, one chief replica will be provisioned. If replica_count > 1, the remainder will be provisioned as a worker replica pool, i.e. replica_count = 10 will result in 1 chief and 9 workers. All replicas have the same machine_type, accelerator_type, and accelerator_count.
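The chief/worker arithmetic above is simple enough to state as code. This sketch only restates the rule in the paragraph; the function name is hypothetical and it does not call the SDK.

```python
def split_replicas(replica_count: int) -> dict:
    """Return how many chief and worker replicas a replica_count implies."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # One replica is always the chief; the remainder forms the worker pool.
    return {"chief": 1, "workers": replica_count - 1}
```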
If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits:
Any of `training_fraction_split`, `validation_fraction_split` and
`test_fraction_split` may optionally be provided; together they must sum to at most 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Data filter splits:
Assigns input data to training, validation, and test sets
based on the given filters, data pieces not matched by any
filter are ignored. Currently only supported for Datasets
containing DataItems.
If any of the filters in this message are to match nothing, then
they can be set as ‘-’ (the minus sign).
If using filter splits, all of `training_filter_split`, `validation_filter_split` and
`test_filter_split` must be provided.
Supported only for unstructured Datasets.
Predefined splits:
Assigns input data to training, validation, and test sets based on the value of a provided key.
If using predefined splits, `predefined_split_column_name` must be provided.
Supported only for tabular Datasets.
Timestamp splits:
Assigns input data to training, validation, and test sets
based on the provided timestamps. The youngest data pieces are
assigned to the training set, the next to the validation set, and the oldest
to the test set.
Supported only for tabular Datasets.
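As an illustration of the data fraction split rules listed above, the sketch below validates a set of fractions and reports how much is left for Vertex AI to assign. The function name is hypothetical, and the sketch deliberately does not guess how the service distributes the remainder.

```python
import math
from typing import Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> Optional[float]:
    """Validate fraction splits and return the unassigned remainder.

    Returns None when no fraction is given, in which case the documented
    default of roughly 80/10/10 applies.
    """
    provided = [f for f in (training, validation, test) if f is not None]
    if not provided:
        return None
    if any(f < 0 for f in provided):
        raise ValueError("fractions must be non-negative")
    total = sum(provided)
    # Tolerate floating-point noise when the fractions add up to exactly 1.
    if total > 1 and not math.isclose(total, 1.0):
        raise ValueError(f"fractions sum to {total}, which is more than 1")
    return max(0.0, 1.0 - total)
```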
Parameters
dataset (Union[datasets.ImageDataset, datasets.TabularDataset, datasets.TextDataset, datasets.VideoDataset]) – Vertex AI Dataset to fit this training against. The custom training script should retrieve datasets through the URIs passed in as environment variables:
os.environ["AIP_TRAINING_DATA_URI"] os.environ["AIP_VALIDATION_DATA_URI"] os.environ["AIP_TEST_DATA_URI"]
Additionally the dataset format is passed in as:
os.environ["AIP_DATA_FORMAT"]
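Inside a training script, these variables might be collected in one place. The sketch below assumes only the four variable names listed above; the function name and the returned dictionary keys are hypothetical, and `environ` is a parameter so the function can be exercised without real environment variables.

```python
import os
from typing import Mapping, Optional


def read_dataset_locations(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the dataset URIs and format that Vertex AI passes to the script."""
    env = os.environ if environ is None else environ
    return {
        # .get() returns None when a variable is unset, e.g. when no dataset
        # was passed to run().
        "training": env.get("AIP_TRAINING_DATA_URI"),
        "validation": env.get("AIP_VALIDATION_DATA_URI"),
        "test": env.get("AIP_TEST_DATA_URI"),
        "format": env.get("AIP_DATA_FORMAT"),
    }
```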
annotation_schema_uri (str) – Google Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.
Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation or test role respectively, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
model_display_name (str) – If the script produces a managed Vertex AI Model, the display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
model_labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
base_output_dir (str) – GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
Vertex AI sets the following environment variables when it runs your training code:
AIP_MODEL_DIR: a Cloud Storage URI of a directory intended for saving model artifacts, i.e. <base_output_dir>/model/
AIP_CHECKPOINT_DIR: a Cloud Storage URI of a directory intended for saving checkpoints, i.e. <base_output_dir>/checkpoints/
AIP_TENSORBOARD_LOG_DIR: a Cloud Storage URI of a directory intended for saving TensorBoard logs, i.e. <base_output_dir>/logs/
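The relationship between base_output_dir and the three directories can be written down directly. This is a sketch of the documented layout only, with a hypothetical function name; in a real job the values are set by Vertex AI and should be read from the environment rather than recomputed.

```python
def output_directories(base_output_dir: str) -> dict:
    """Derive the documented artifact directories from a base output directory."""
    # Strip trailing slashes so the joined paths contain no doubled "/".
    base = base_output_dir.rstrip("/")
    return {
        "AIP_MODEL_DIR": f"{base}/model/",
        "AIP_CHECKPOINT_DIR": f"{base}/checkpoints/",
        "AIP_TENSORBOARD_LOG_DIR": f"{base}/logs/",
    }
```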
service_account (str) – Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
bigquery_destination (str) – Provide this field if dataset is a BigQuery dataset. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call> where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data will be written into that dataset. In the dataset three tables will be created: training, validation and test.
AIP_DATA_FORMAT = "bigquery"
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_*.training"
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_*.validation"
AIP_TEST_DATA_URI = "bigquery_destination.dataset_*.test"
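To make the dataset naming pattern concrete, the sketch below formats a name from its parts. It assumes that `sss` in the timestamp pattern means milliseconds and that the timestamp is expressed in UTC; the function name and the sample argument values are hypothetical, and the service generates the real name itself.

```python
from datetime import datetime, timezone


def training_dataset_name(
    dataset_id: str, annotation_type: str, called_at: datetime
) -> str:
    """Format dataset_<dataset-id>_<annotation-type>_<timestamp>.

    called_at must be timezone-aware; it is converted to UTC first.
    """
    utc = called_at.astimezone(timezone.utc)
    # YYYY_MM_DDThh_mm_ss_sssZ, with sss taken to be milliseconds (assumption).
    stamp = utc.strftime("%Y_%m_%dT%H_%M_%S") + f"_{utc.microsecond // 1000:03d}Z"
    return f"dataset_{dataset_id}_{annotation_type}_{stamp}"
```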
args (List[Union[str, int, float]]) – Command line arguments to be passed to the Python script.
environment_variables (Dict[str, str]) – Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. The name of each environment variable must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
replica_count (int) – The number of worker replicas. If replica count = 1 then one chief replica will be provisioned. If replica_count > 1 the remainder will be provisioned as a worker replica pool.
machine_type (str) – The type of machine to use for training.
accelerator_type (str) – Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – The number of accelerators to attach to a worker replica.
boot_disk_type (str) – Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
boot_disk_size_gb (int) – Size in GB of the boot disk, default is 100GB. The boot disk size must be within the range of [100, 64000].
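The boot disk constraints from the two entries above can be checked before calling run(). This is a sketch with a hypothetical function name; it encodes only the two valid disk types and the size range stated here.

```python
VALID_BOOT_DISK_TYPES = ("pd-ssd", "pd-standard")


def check_boot_disk(boot_disk_type: str = "pd-ssd", boot_disk_size_gb: int = 100) -> None:
    """Raise ValueError if the boot disk settings fall outside the documented limits."""
    if boot_disk_type not in VALID_BOOT_DISK_TYPES:
        raise ValueError(
            f"boot_disk_type must be one of {VALID_BOOT_DISK_TYPES}, got {boot_disk_type!r}"
        )
    # The documented range [100, 64000] is inclusive at both ends.
    if not 100 <= boot_disk_size_gb <= 64000:
        raise ValueError("boot_disk_size_gb must be within [100, 64000]")
```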
reduction_server_replica_count (int) – The number of reduction server replicas, default is 0.
reduction_server_machine_type (str) – Optional. The type of machine to use for reduction server.
reduction_server_container_uri (str) – Optional. The Uri of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
training_fraction_split (float) – Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
validation_fraction_split (float) – Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
test_fraction_split (float) – Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
training_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
validation_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
test_filter_split (str) – Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
predefined_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
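The assignment rule for a predefined split column can be pictured with plain dictionaries standing in for rows. This is a local illustration with hypothetical names, not how the pipeline is implemented; it shows that rows with a missing or unrecognized value are dropped.

```python
def assign_predefined_split(rows: list, column: str) -> dict:
    """Group rows into training/validation/test by the value in `column`."""
    sets = {"training": [], "validation": [], "test": []}
    for row in rows:
        target = row.get(column)
        # Rows with a missing key or an invalid value are ignored.
        if target in sets:
            sets[target].append(row)
    return sets
```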
timestamp_split_column_name (str) – Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
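A rough pre-check of timestamp column values can be done with the standard library. This sketch approximates the RFC 3339 requirement with two `strptime` patterns, so it is slightly more lenient than the specification (for example, it accepts non-padded fields) and accepts at most six fractional digits; the function name is hypothetical.

```python
from datetime import datetime

# RFC 3339 date-time with a literal "Z" offset, with or without fractional seconds.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def is_valid_split_timestamp(value: str) -> bool:
    """Return True if value looks like an RFC 3339 timestamp ending in Z."""
    for fmt in _FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```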
timeout (int) – The maximum job running time in seconds. The default is 7 days.
restart_job_on_worker_restart (bool) – Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
enable_web_access (bool) – Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
tensorboard (str) – Optional. The name of a Vertex AI [Tensorboard][google.cloud.aiplatform.v1beta1.Tensorboard] resource to which this CustomJob will upload Tensorboard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write Tensorboard logs to the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required with provided tensorboard. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
Return type
model
class google.cloud.aiplatform.Endpoint(endpoint_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Retrieves an endpoint resource.
Parameters
endpoint_name (str) – Required. A fully-qualified endpoint resource name or endpoint ID. Example: “projects/123/locations/us-central1/endpoints/456” or “456” when project and location are initialized or passed.
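The two accepted forms of endpoint_name can be normalized to a fully-qualified name as sketched below. The function name is hypothetical, and the SDK performs its own resolution internally; this only illustrates why project and location are needed when a bare ID is passed.

```python
from typing import Optional


def full_endpoint_name(
    endpoint_name: str, project: Optional[str] = None, location: Optional[str] = None
) -> str:
    """Return a fully-qualified endpoint resource name."""
    if endpoint_name.startswith("projects/"):
        # Already fully qualified; use it as given.
        return endpoint_name
    if not project or not location:
        raise ValueError("project and location are required when passing an endpoint ID")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_name}"
```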
project (str) – Optional. Project to retrieve endpoint from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve endpoint from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
classmethod create(display_name: str, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, metadata: Optional[Sequence[Tuple[str, str]]] = (), project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, encryption_spec_key_name: Optional[str] = None, sync=True)
Creates a new endpoint.
Parameters
display_name (str) – Required. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
project (str) – Optional. Project to create this Endpoint in. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to create this Endpoint in. If not set, location set in aiplatform.init will be used.
description (str) – Optional. The description of the Endpoint.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Created endpoint.
Return type
endpoint (endpoint.Endpoint)
delete(force: bool = False, sync: bool = True)
Deletes this Vertex AI Endpoint resource. If force is set to True, all models on this Endpoint will be undeployed prior to deletion.
Parameters
force (bool) – Optional. If force is set to True, all deployed models on this Endpoint will be undeployed first. Default is False.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Raises
FailedPrecondition – If models are deployed on this Endpoint and force = False.
deploy(model: google.cloud.aiplatform.models.Model, deployed_model_display_name: Optional[str] = None, traffic_percentage: int = 0, traffic_split: Optional[Dict[str, int]] = None, machine_type: Optional[str] = None, min_replica_count: int = 1, max_replica_count: int = 1, accelerator_type: Optional[str] = None, accelerator_count: Optional[int] = None, service_account: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, metadata: Optional[Sequence[Tuple[str, str]]] = (), sync=True)
Deploys a Model to the Endpoint.
Parameters
model (aiplatform.Model) – Required. Model to be deployed.
deployed_model_display_name (str) – Optional. The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used.
traffic_percentage (int) – Optional. Desired traffic to newly deployed model. Defaults to 0 if there are pre-existing deployed models. Defaults to 100 if there are no pre-existing deployed models. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate new deployed model’s traffic. Should not be provided if traffic_split is provided.
traffic_split (Dict[str, int]) – Optional. A map from a DeployedModel’s ID to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel. If a DeployedModel’s ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or map must be empty if the Endpoint is to not accept any traffic at the moment. Key for model being deployed is “0”. Should not be provided if traffic_percentage is provided.
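The traffic_split constraint is easy to check before calling deploy. The sketch below uses a hypothetical function name and encodes only the rule stated above: the map is either empty or its percentages add up to 100.

```python
def check_traffic_split(traffic_split: dict) -> None:
    """Raise ValueError unless the map is empty or its values sum to 100."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic for now.
        return
    if any(percentage < 0 for percentage in traffic_split.values()):
        raise ValueError("traffic percentages must not be negative")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")
```

For instance, `{"0": 20, "1234": 80}` sends 20% of traffic to the model being deployed (key "0") and 80% to an existing DeployedModel whose ID is assumed here to be "1234".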
machine_type (str) – Optional. The type of machine. Not specifying machine type will result in model to be deployed with automatic resources.
min_replica_count (int) – Optional. The minimum number of machine replicas this deployed model will be always deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
max_replica_count (int) – Optional. The maximum number of replicas this deployed model may be deployed on when the traffic against it increases. If requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the larger value of min_replica_count or 1 will be used. If value provided is smaller than min_replica_count, it will automatically be increased to be min_replica_count.
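The defaulting rule for max_replica_count reads more clearly as code. This is a sketch of the rule as described in the entry above, with a hypothetical function name; it is not taken from the SDK source.

```python
from typing import Optional


def resolve_max_replica_count(
    min_replica_count: int = 1, max_replica_count: Optional[int] = None
) -> int:
    """Apply the documented defaulting rule for max_replica_count."""
    if max_replica_count is None:
        # Not provided: use the larger of min_replica_count and 1.
        return max(min_replica_count, 1)
    # Provided but too small: raise it to min_replica_count.
    return max(max_replica_count, min_replica_count)
```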
accelerator_type (str) – Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – Optional. The number of accelerators to attach to a worker replica.
service_account (str) – The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
explain(instances: List[Dict], parameters: Optional[Dict] = None, deployed_model_id: Optional[str] = None)
Make a prediction with explanations against this Endpoint.
Example usage:
response = my_endpoint.explain(instances=[…])
my_explanations = response.explanations
Parameters
instances (List) – Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint’s DeployedModels’ [Model’s][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata’s][google.cloud.aiplatform.v1beta1.Model.predict_schemata] instance_schema_uri.
parameters (Dict) – The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint’s DeployedModels’ [Model’s][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata’s][google.cloud.aiplatform.v1beta1.Model.predict_schemata] parameters_schema_uri.
deployed_model_id (str) – Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint’s traffic split.
Returns
Prediction with returned predictions, explanations and Model Id.
Return type
prediction
classmethod list(filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
List all Endpoint resource instances.
Example Usage:
aiplatform.Endpoint.list(
    filter='labels.my_label="my_label_value" OR display_name!="old_endpoint"',
)
Parameters
filter (str) – Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields: display_name, create_time, update_time
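An order_by string can be assembled from field names instead of being typed by hand. The helper below is a hypothetical sketch: it restricts fields to the three listed above and joins them with commas, appending "desc" where requested.

```python
SUPPORTED_ORDER_FIELDS = ("display_name", "create_time", "update_time")


def build_order_by(fields: list) -> str:
    """Build an order_by string from (field_name, descending) pairs."""
    parts = []
    for name, descending in fields:
        if name not in SUPPORTED_ORDER_FIELDS:
            raise ValueError(f"unsupported order_by field: {name!r}")
        # Ascending is the default; "desc" after the name reverses the order.
        parts.append(f"{name} desc" if descending else name)
    return ",".join(parts)
```

The result could then be passed as the order_by argument, for example to list endpoints newest first and break ties by display name.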
project (str) – Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
Returns
List[models.Endpoint] - A list of Endpoint resource objects
list_models()
Returns a list of the models deployed to this Endpoint.
Returns
A list of the models deployed in this Endpoint.
Return type
deployed_models (Sequence[aiplatform.gapic.DeployedModel])
property network: Optional[str]
The full name of the Google Compute Engine network to which this Endpoint should be peered.
Takes the format projects/{project}/global/networks/{network}. Where {project} is a project number, as in 12345, and {network} is a network name.
Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
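The network name format can be checked with a short pattern. This sketch has a hypothetical function name and follows the description above in requiring a numeric project; it does not verify that the network exists or that private services access is configured.

```python
import re

# projects/{project}/global/networks/{network}, where {project} is a project number.
_NETWORK_NAME = re.compile(
    r"^projects/(?P<project>\d+)/global/networks/(?P<network>[^/]+)$"
)


def parse_network_name(name: str) -> dict:
    """Split a full Compute Engine network name into project number and network."""
    match = _NETWORK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a full network name: {name!r}")
    return match.groupdict()
```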
predict(instances: List, parameters: Optional[Dict] = None)
Make a prediction against this Endpoint.
Parameters
instances (List) – Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint’s DeployedModels’ [Model’s][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata’s][google.cloud.aiplatform.v1beta1.Model.predict_schemata] instance_schema_uri.
parameters (Dict) – The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint’s DeployedModels’ [Model’s][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata’s][google.cloud.aiplatform.v1beta1.Model.predict_schemata] parameters_schema_uri.
Returns
Prediction with returned predictions and Model Id.
Return type
prediction
property traffic_split: Dict[str, int]
A map from a DeployedModel’s ID to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel.
If a DeployedModel’s ID is not listed in this map, then it receives no traffic.
The traffic percentage values must add up to 100, or map must be empty if the Endpoint is to not accept any traffic at the moment.
undeploy(deployed_model_id: str, traffic_split: Optional[Dict[str, int]] = None, metadata: Optional[Sequence[Tuple[str, str]]] = (), sync=True)
Undeploys a deployed model.
The model to be undeployed should have no traffic or user must provide a new traffic_split with the remaining deployed models. Refer to Endpoint.traffic_split for the current traffic split mapping.
Parameters
deployed_model_id (str) – Required. The ID of the DeployedModel to be undeployed from the Endpoint.
traffic_split (Dict[str, int]) – Optional. A map of DeployedModel IDs to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel. Required if undeploying a model with non-zero traffic from an Endpoint with multiple deployed models. The traffic percentage values must add up to 100, or map must be empty if the Endpoint is to not accept any traffic at the moment. If a DeployedModel’s ID is not listed in this map, then it receives no traffic.
metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
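Whether undeploy needs a new traffic_split follows from the current split and the number of deployed models. The sketch below restates that condition with hypothetical names; the current split would come from Endpoint.traffic_split and the IDs from the models returned by list_models().

```python
def needs_new_traffic_split(
    current_split: dict, deployed_model_ids: list, deployed_model_id: str
) -> bool:
    """Return True when undeploying requires an explicit new traffic_split."""
    # A model missing from the map receives no traffic.
    has_traffic = current_split.get(deployed_model_id, 0) > 0
    has_other_models = len(deployed_model_ids) > 1
    return has_traffic and has_other_models
```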
undeploy_all(sync: bool = True)
Undeploys every model deployed to this Endpoint.
Parameters
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
class google.cloud.aiplatform.EntityType(entity_type_name: str, featurestore_id: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Managed entityType resource for Vertex AI.
Retrieves an existing managed entityType given an entityType resource name or an entity_type ID.
Example Usage:
my_entity_type = aiplatform.EntityType(
    entity_type_name='projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id'
)
or
my_entity_type = aiplatform.EntityType(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
Parameters
entity_type_name (str) – Required. A fully-qualified entityType resource name or an entity_type ID. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id” or “my_entity_type_id” when project and location are initialized or passed, with featurestore_id passed.
featurestore_id (str) – Optional. Featurestore ID of an existing featurestore to retrieve entityType from, when entity_type_name is passed as entity_type ID.
project (str) – Optional. Project to retrieve entityType from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve entityType from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this EntityType. Overrides credentials set in aiplatform.init.
batch_create_features(feature_configs: Dict[str, Dict[str, Union[bool, int, Dict[str, str], str]]], request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Batch creates Feature resources in this EntityType.
Example Usage:
my_entity_type = aiplatform.EntityType(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
my_entity_type.batch_create_features(
    feature_configs={
        "my_feature_id1": {
            "value_type": "INT64",
        },
        "my_feature_id2": {
            "value_type": "BOOL",
        },
        "my_feature_id3": {
            "value_type": "STRING",
        },
    }
)
Parameters
feature_configs (Dict[str, Dict[str, Union[bool, int, Dict[str, str], str]]]) – Required. A user-defined Dict containing configurations for feature creation.
The feature_configs Dict[str, Dict], i.e. {feature_id: feature_config}, contains the configuration for each Feature to create. Example:
feature_configs = {
    "my_feature_id_1": feature_config_1,
    "my_feature_id_2": feature_config_2,
    "my_feature_id_3": feature_config_3,
}
Each feature_config requires “value_type” and accepts optional “description” and “labels”. Example:
feature_config_1 = {
    "value_type": "INT64",
}
feature_config_2 = {
    "value_type": "BOOL",
    "description": "my feature id 2 description"
}
feature_config_3 = {
    "value_type": "STRING",
    "labels": {
        "my key": "my value",
    }
}
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
EntityType - entity_type resource object
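When many Features are created at once, the feature_configs dictionary can be assembled programmatically. The sketch below is illustrative only: `make_feature_config` is a hypothetical helper, and the set of value types is copied from the value_type description of create_feature in this reference.

```python
from typing import Any, Dict, Optional

# Value types listed for the value_type parameter in this reference.
VALUE_TYPES = {
    "BOOL", "BOOL_ARRAY", "DOUBLE", "DOUBLE_ARRAY",
    "INT64", "INT64_ARRAY", "STRING", "STRING_ARRAY", "BYTES",
}


def make_feature_config(
    value_type: str,
    description: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build one feature_config entry (hypothetical helper, not an SDK call)."""
    if value_type not in VALUE_TYPES:
        raise ValueError(f"unsupported value_type: {value_type!r}")
    config: Dict[str, Any] = {"value_type": value_type}
    # "description" and "labels" are optional, so include them only when given.
    if description is not None:
        config["description"] = description
    if labels:
        config["labels"] = dict(labels)
    return config


feature_configs = {
    "my_feature_id_1": make_feature_config("INT64"),
    "my_feature_id_2": make_feature_config("BOOL", description="my feature id 2 description"),
    "my_feature_id_3": make_feature_config("STRING", labels={"my_key": "my_value"}),
}
```

The resulting dictionary has the shape expected by the feature_configs parameter of batch_create_features.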
classmethod create(entity_type_id: str, featurestore_name: str, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Creates an EntityType resource in a Featurestore.
Example Usage:
my_entity_type = aiplatform.EntityType.create(
    entity_type_id='my_entity_type_id',
    featurestore_name='projects/123/locations/us-central1/featurestores/my_featurestore_id'
)
or
my_entity_type = aiplatform.EntityType.create(
    entity_type_id='my_entity_type_id',
    featurestore_name='my_featurestore_id',
)
Parameters
entity_type_id (str) – Required. The ID to use for the EntityType, which will become the final component of the EntityType’s resource name.
This value may be up to 60 characters, and valid characters are [a-z0-9_]. The first character cannot be a number. The value must be unique within a featurestore.
featurestore_name (str) – Required. A fully-qualified featurestore resource name or a featurestore ID of an existing featurestore to create EntityType in. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id” or “my_featurestore_id” when project and location are initialized or passed.
description (str) – Optional. Description of the EntityType.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your EntityTypes. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one EntityType (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create EntityType in if featurestore_name is passed as a featurestore ID. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to create EntityType in if featurestore_name is passed as a featurestore ID. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create EntityTypes. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
EntityType - entity_type resource object
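The ID rules stated for entity_type_id (at most 60 characters, characters from [a-z0-9_], first character not a number) can be expressed as a regular expression. This is a local pre-check sketch; `is_valid_resource_id` is a hypothetical name, and the service remains the authority on what it accepts.

```python
import re

# First character: a lowercase letter or underscore (not a digit).
# Remaining characters: up to 59 more from [a-z0-9_], for 60 in total.
_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if resource_id satisfies the documented ID rules."""
    return _ID_PATTERN.fullmatch(resource_id) is not None


for candidate in ["my_entity_type_id", "1st_entity", "MyEntity", "a" * 61]:
    print(candidate[:20], is_valid_resource_id(candidate))
```

The same rules are documented for feature_id, so the check applies there as well. Uniqueness within the featurestore or EntityType cannot be verified locally.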
create_feature(feature_id: str, value_type: str, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Creates a Feature resource in this EntityType.
Example Usage:
my_entity_type = aiplatform.EntityType(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
my_feature = my_entity_type.create_feature(
    feature_id='my_feature_id',
    value_type='INT64',
)
Parameters
feature_id (str) – Required. The ID to use for the Feature, which will become the final component of the Feature’s resource name, which is immutable.
This value may be up to 60 characters, and valid characters are [a-z0-9_]. The first character cannot be a number. The value must be unique within an EntityType.
value_type (str) – Required. Immutable. Type of Feature value. One of BOOL, BOOL_ARRAY, DOUBLE, DOUBLE_ARRAY, INT64, INT64_ARRAY, STRING, STRING_ARRAY, BYTES.
description (str) – Optional. Description of the Feature.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Features. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one Feature (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
featurestore.Feature - feature resource object
delete(sync: bool = True, force: bool = False)
Deletes this EntityType resource. If force is set to True, all features in this EntityType will be deleted prior to entityType deletion.
WARNING: This deletion is permanent.
Parameters
force (bool) – If set to true, any Features for this EntityType will also be deleted. (Otherwise, the request will only work if the EntityType has no Features.)
sync (bool) – Whether to execute this deletion synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Raises
FailedPrecondition – If this EntityType still contains Features and force is False.
delete_features(feature_ids: List[str], sync: bool = True)
Deletes feature resources in this EntityType given their feature IDs. WARNING: This deletion is permanent.
Parameters
feature_ids (List[str]) – Required. The list of feature IDs to be deleted.
sync (bool) – Optional. Whether to execute this deletion synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
property featurestore_name: str
Fully qualified resource name of the managed featurestore that contains this EntityType.
get_feature(feature_id: str)
Retrieves an existing managed feature in this EntityType.
Parameters
feature_id (str) – Required. The managed feature resource ID in this EntityType.
Returns
featurestore.Feature - The managed feature resource object.
get_featurestore()
Retrieves the managed featurestore that contains this EntityType.
Returns
featurestore.Featurestore - The managed featurestore that contains this EntityType.
ingest_from_bq(feature_ids: List[str], feature_time: Union[str, datetime.datetime], bq_source_uri: str, feature_source_fields: Optional[Dict[str, str]] = None, entity_id_field: Optional[str] = None, disable_online_serving: Optional[bool] = None, worker_count: Optional[int] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Ingest feature values from BigQuery.
Parameters
feature_ids (List[str]) – Required. IDs of the Features to import values for. The Features must exist in the target EntityType, or the request will fail.
feature_time (Union[str, datetime.datetime]) – Required. The feature_time can be one of:
- The source column that holds the Feature timestamp for all Feature values in each entity.
- A single Feature timestamp for all entities being imported. The timestamp must not have higher than millisecond precision.
bq_source_uri (str) – Required. BigQuery URI to the input table. Example: 'bq://project.dataset.table_name'
feature_source_fields (Dict[str, str]) – Optional. User-defined dictionary that maps the ID of each Feature to import to the source column that holds its values.
Specify only the Features whose ID and source column differ. If not provided, each source column must have the same name as the Feature ID.
Example:
feature_ids = ['my_feature_id_1', 'my_feature_id_2', 'my_feature_id_3']
feature_source_fields = {
    'my_feature_id_1': 'my_feature_id_1_source_field',
}
Note: The source column of 'my_feature_id_1' is 'my_feature_id_1_source_field'. The source column of 'my_feature_id_2' is the ID of the Feature; the same applies to 'my_feature_id_3'.
entity_id_field (str) – Optional. Source column that holds entity IDs. If not provided, entity IDs are extracted from the column named entity_id.
disable_online_serving (bool) – Optional. If set, data will not be imported for online serving. This is typically used for backfilling, where Feature generation timestamps are not in the timestamp range needed for online serving.
worker_count (int) – Optional. Specifies the number of workers that are used to write data to the Featurestore. Consider the online serving capacity that you require to achieve the desired import throughput without interfering with online serving. The value must be positive, and less than or equal to 100. If not set, defaults to using 1 worker. The low count ensures minimal impact on online serving performance.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this import synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
EntityType - The entityType resource object with feature values imported.
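The feature_source_fields fallback described above (a Feature without an entry reads from the column named after its ID) can be sketched as a small lookup. `resolve_source_columns` is a hypothetical helper that only illustrates the documented mapping; it does not call the SDK.

```python
from typing import Dict, List, Optional


def resolve_source_columns(
    feature_ids: List[str],
    feature_source_fields: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each feature ID to the source column its values are read from."""
    overrides = feature_source_fields or {}
    # A feature without an explicit override uses its own ID as the column name.
    return {feature_id: overrides.get(feature_id, feature_id) for feature_id in feature_ids}


columns = resolve_source_columns(
    ["my_feature_id_1", "my_feature_id_2", "my_feature_id_3"],
    {"my_feature_id_1": "my_feature_id_1_source_field"},
)
```

Running such a check against the column names of the source table before ingestion can catch a missing column early.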
ingest_from_df(feature_ids: List[str], feature_time: Union[str, datetime.datetime], df_source: pd.DataFrame, feature_source_fields: Optional[Dict[str, str]] = None, entity_id_field: Optional[str] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Ingest feature values from DataFrame.
NOTE: Calling this method automatically creates and later deletes a temporary BigQuery dataset in the same GCP project. The dataset serves as intermediary storage for ingesting feature values from the DataFrame into the featurestore.
The call returns when ingestion completes, at which point the feature values have been ingested into the entity_type.
Parameters
feature_ids (List[str]) – Required. IDs of the Features to import values for. The Features must exist in the target EntityType, or the request will fail.
feature_time (Union[str, datetime.datetime]) – Required. The feature_time can be one of:
- The source column that holds the Feature timestamp for all Feature values in each entity. Note: The dtype of the source column should be datetime64.
- A single Feature timestamp for all entities being imported. The timestamp must not have higher than millisecond precision. Example:
feature_time = datetime.datetime(year=2022, month=1, day=1, hour=11, minute=59, second=59)
or
feature_time_str = datetime.datetime.now().isoformat(sep=" ", timespec="milliseconds")
feature_time = datetime.datetime.strptime(feature_time_str, "%Y-%m-%d %H:%M:%S.%f")
df_source (pd.DataFrame) – Required. Pandas DataFrame containing the source data for ingestion.
feature_source_fields (Dict[str, str]) – Optional. User-defined dictionary that maps the ID of each Feature to import to the source column that holds its values.
Specify only the Features whose ID and source column differ. If not provided, each source column must have the same name as the Feature ID.
Example:
feature_ids = ['my_feature_id_1', 'my_feature_id_2', 'my_feature_id_3']
feature_source_fields = {
    'my_feature_id_1': 'my_feature_id_1_source_field',
}
Note: The source column of 'my_feature_id_1' is 'my_feature_id_1_source_field'. The source column of 'my_feature_id_2' is the ID of the Feature; the same applies to 'my_feature_id_3'.
entity_id_field (str) – Optional. Source column that holds entity IDs. If not provided, entity IDs are extracted from the column named entity_id.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
EntityType - The entityType resource object with feature values imported.
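Because a single feature_time timestamp must not have higher than millisecond precision, a datetime with microseconds needs to be truncated before it is passed in. The following sketch uses only the standard library; `truncate_to_milliseconds` is a hypothetical helper name.

```python
import datetime


def truncate_to_milliseconds(ts: datetime.datetime) -> datetime.datetime:
    """Drop sub-millisecond digits so the timestamp has millisecond precision."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


raw = datetime.datetime(2022, 1, 1, 11, 59, 59, 123456)
feature_time = truncate_to_milliseconds(raw)
print(feature_time.isoformat(sep=" ", timespec="milliseconds"))
```

This has the same effect as the isoformat/strptime round trip in the example above, without going through a string.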
ingest_from_gcs(feature_ids: List[str], feature_time: Union[str, datetime.datetime], gcs_source_uris: Union[str, List[str]], gcs_source_type: str, feature_source_fields: Optional[Dict[str, str]] = None, entity_id_field: Optional[str] = None, disable_online_serving: Optional[bool] = None, worker_count: Optional[int] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Ingest feature values from GCS.
Parameters
feature_ids (List[str]) – Required. IDs of the Features to import values for. The Features must exist in the target EntityType, or the request will fail.
feature_time (Union[str, datetime.datetime]) – Required. The feature_time can be one of:
- The source column that holds the Feature timestamp for all Feature values in each entity.
- A single Feature timestamp for all entities being imported. The timestamp must not have higher than millisecond precision.
gcs_source_uris (Union[str, List[str]]) – Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Example:
["gs://my_bucket/my_file_1.csv", "gs://my_bucket/my_file_2.csv"] or "gs://my_bucket/my_file.avro"
gcs_source_type (str) – Required. The type of the input file(s) provided by gcs_source_uris. The value must be either csv or avro.
feature_source_fields (Dict[str, str]) – Optional. User-defined dictionary that maps the ID of each Feature to import to the source column that holds its values.
Specify only the Features whose ID and source column differ. If not provided, each source column must have the same name as the Feature ID.
Example:
feature_ids = ['my_feature_id_1', 'my_feature_id_2', 'my_feature_id_3']
feature_source_fields = {
    'my_feature_id_1': 'my_feature_id_1_source_field',
}
Note: The source column of 'my_feature_id_1' is 'my_feature_id_1_source_field'. The source column of 'my_feature_id_2' is the ID of the Feature; the same applies to 'my_feature_id_3'.
entity_id_field (str) – Optional. Source column that holds entity IDs. If not provided, entity IDs are extracted from the column named entity_id.
disable_online_serving (bool) – Optional. If set, data will not be imported for online serving. This is typically used for backfilling, where Feature generation timestamps are not in the timestamp range needed for online serving.
worker_count (int) – Optional. Specifies the number of workers that are used to write data to the Featurestore. Consider the online serving capacity that you require to achieve the desired import throughput without interfering with online serving. The value must be positive, and less than or equal to 100. If not set, defaults to using 1 worker. The low count ensures minimal impact on online serving performance.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this import synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
EntityType - The entityType resource object with feature values imported.
Raises
ValueError – If gcs_source_type is not supported.
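Since gcs_source_uris accepts either a single string or a list, and gcs_source_type accepts only csv or avro, callers may want to normalize and check both before starting an import. This is a sketch with a hypothetical helper, `normalize_gcs_source`; the "gs://" prefix check is an assumption based on the example URIs above, not a documented rule.

```python
from typing import List, Union

SUPPORTED_SOURCE_TYPES = ("csv", "avro")


def normalize_gcs_source(
    gcs_source_uris: Union[str, List[str]],
    gcs_source_type: str,
) -> List[str]:
    """Return the URIs as a list after checking the source type."""
    if gcs_source_type not in SUPPORTED_SOURCE_TYPES:
        raise ValueError(f"gcs_source_type must be one of {SUPPORTED_SOURCE_TYPES}")
    # A single URI string is wrapped in a list for uniform handling.
    uris = [gcs_source_uris] if isinstance(gcs_source_uris, str) else list(gcs_source_uris)
    for uri in uris:
        # Assumption: every URI uses the gs:// scheme, as in the examples above.
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return uris


uris = normalize_gcs_source("gs://my_bucket/my_file.avro", "avro")
```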
classmethod list(featurestore_name: str, filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Lists existing managed entityType resources in a featurestore, given a featurestore resource name or a featurestore ID.
Example Usage:
my_entityTypes = aiplatform.EntityType.list(
    featurestore_name='projects/123/locations/us-central1/featurestores/my_featurestore_id'
)
or
my_entityTypes = aiplatform.EntityType.list(
    featurestore_name='my_featurestore_id'
)
Parameters
featurestore_name (str) – Required. A fully-qualified featurestore resource name or a featurestore ID of an existing featurestore to list entityTypes in. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id” or “my_featurestore_id” when project and location are initialized or passed.
filter (str) – Optional. Lists the EntityTypes that match the filter expression. The following filters are supported:
- create_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
- update_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
- labels: Supports key-value equality as well as key presence.
Examples:
- create_time > "2020-01-31T15:30:00.000000Z" OR update_time > "2020-01-31T15:30:00.000000Z" --> EntityTypes created or updated after 2020-01-31T15:30:00.000000Z.
- labels.active = yes AND labels.env = prod --> EntityTypes having both (active: yes) and (env: prod) labels.
- labels.env: * --> Any EntityType which has a label with ‘env’ as the key.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.
Supported fields:
entity_type_id
create_time
update_time
project (str) – Optional. Project to list entityTypes in. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to list entityTypes in. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to list entityTypes. Overrides credentials set in aiplatform.init.
Returns
List[EntityType] - A list of managed entityType resource objects
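A filter string with RFC 3339 timestamps, like the first example for the filter parameter, can be built from a timezone-aware datetime. The helper names `rfc3339` and `changed_since_filter` are hypothetical; only the filter syntax itself comes from this reference.

```python
import datetime


def rfc3339(ts: datetime.datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(datetime.timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def changed_since_filter(ts: datetime.datetime) -> str:
    """Filter expression for resources created or updated after ts."""
    stamp = rfc3339(ts)
    return f'create_time > "{stamp}" OR update_time > "{stamp}"'


cutoff = datetime.datetime(2020, 1, 31, 15, 30, tzinfo=datetime.timezone.utc)
filter_expr = changed_since_filter(cutoff)
```

The resulting string could be passed as the filter argument of EntityType.list.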
list_features(filter: Optional[str] = None, order_by: Optional[str] = None)
Lists existing managed feature resources in this EntityType.
Example Usage:
my_entity_type = aiplatform.EntityType(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
my_entity_type.list_features()
Parameters
filter (str) – Optional. Lists the Features that match the filter expression. The following filters are supported:
- value_type: Supports = and != comparisons.
- create_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
- update_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
- labels: Supports key-value equality as well as key presence.
Examples:
- value_type = DOUBLE --> Features whose type is DOUBLE.
- create_time > "2020-01-31T15:30:00.000000Z" OR update_time > "2020-01-31T15:30:00.000000Z" --> Features created or updated after 2020-01-31T15:30:00.000000Z.
- labels.active = yes AND labels.env = prod --> Features having both (active: yes) and (env: prod) labels.
- labels.env: * --> Any Feature which has a label with ‘env’ as the key.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields:
feature_id
value_type
create_time
update_time
Returns
List[featurestore.Feature] - A list of managed feature resource objects.
read(entity_ids: Union[str, List[str]], feature_ids: Union[str, List[str]] = '*', request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Reads feature values for given feature IDs of given entity IDs in this EntityType.
Parameters
entity_ids (Union[str, List[str]]) – Required. ID for a specific entity, or a list of IDs of entities to read Feature values of. If a list is passed, it may contain at most 100 IDs.
feature_ids (Union[str, List[str]]) – Optional. ID for a specific feature, or a list of IDs of Features in the EntityType for reading feature values. Defaults to “*”, in which case the values of all features are read.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
entities’ feature values in DataFrame
Return type
pd.DataFrame
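Because entity_ids is limited to 100 IDs per call, reading a larger set of entities requires splitting the IDs into batches. A minimal sketch, assuming the caller issues one read per batch; `chunk_entity_ids` is a hypothetical helper.

```python
from typing import List, Union


def chunk_entity_ids(
    entity_ids: Union[str, List[str]],
    max_ids: int = 100,
) -> List[List[str]]:
    """Split entity IDs into batches of at most max_ids each."""
    ids = [entity_ids] if isinstance(entity_ids, str) else list(entity_ids)
    return [ids[i:i + max_ids] for i in range(0, len(ids), max_ids)]


batches = chunk_entity_ids([f"entity_{n}" for n in range(250)])
print([len(batch) for batch in batches])
```

Each batch can be passed as entity_ids in its own read call, and the returned DataFrames can then be combined, for example with pandas.concat.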
update(description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, request_metadata: Sequence[Tuple[str, str]] = ())
Updates an existing managed entityType resource.
Example Usage:
my_entity_type = aiplatform.EntityType(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
my_entity_type.update(
    description='update my description',
)
Parameters
description (str) – Optional. Description of the EntityType.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your EntityTypes. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one EntityType (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
EntityType - The updated entityType resource object.
class google.cloud.aiplatform.Feature(feature_name: str, featurestore_id: Optional[str] = None, entity_type_id: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Managed feature resource for Vertex AI.
Retrieves an existing managed feature given a feature resource name or a feature ID.
Example Usage:
my_feature = aiplatform.Feature(
    feature_name='projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id/features/my_feature_id'
)
or
my_feature = aiplatform.Feature(
    feature_name='my_feature_id',
    featurestore_id='my_featurestore_id',
    entity_type_id='my_entity_type_id',
)
Parameters
feature_name (str) – Required. A fully-qualified feature resource name or a feature ID. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id/features/my_feature_id” or “my_feature_id” when project and location are initialized or passed, with featurestore_id and entity_type_id passed.
featurestore_id (str) – Optional. Featurestore ID of an existing featurestore to retrieve feature from, when feature_name is passed as Feature ID.
entity_type_id (str) – Optional. EntityType ID of an existing entityType to retrieve feature from, when feature_name is passed as Feature ID. The EntityType must exist in the Featurestore if provided by the featurestore_id.
project (str) – Optional. Project to retrieve feature from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve feature from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this Feature. Overrides credentials set in aiplatform.init.
Raises
ValueError – If only one of featurestore_id or entity_type_id is provided.
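The two accepted forms of feature_name (a fully-qualified resource name, or a bare feature ID combined with featurestore_id and entity_type_id) follow the resource path shown in the example above. The sketch below only illustrates how such a path is composed; `feature_resource_name` is a hypothetical helper, and its error handling is an assumption rather than the SDK's actual behavior.

```python
from typing import Optional


def feature_resource_name(
    feature_name: str,
    featurestore_id: Optional[str] = None,
    entity_type_id: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Compose a fully-qualified feature resource name (illustrative only)."""
    # A fully-qualified name is returned unchanged.
    if feature_name.startswith("projects/"):
        return feature_name
    # A bare feature ID needs both parent IDs plus project and location.
    if not (featurestore_id and entity_type_id):
        raise ValueError("featurestore_id and entity_type_id are both needed with a feature ID")
    if not (project and location):
        raise ValueError("project and location are needed with a feature ID")
    return (
        f"projects/{project}/locations/{location}"
        f"/featurestores/{featurestore_id}"
        f"/entityTypes/{entity_type_id}"
        f"/features/{feature_name}"
    )


name = feature_resource_name(
    "my_feature_id",
    featurestore_id="my_featurestore_id",
    entity_type_id="my_entity_type_id",
    project="123",
    location="us-central1",
)
```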
classmethod create(feature_id: str, value_type: str, entity_type_name: str, featurestore_id: Optional[str] = None, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Creates a Feature resource in an EntityType.
Example Usage:
my_feature = aiplatform.Feature.create(
    feature_id='my_feature_id',
    value_type='INT64',
    entity_type_name='projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id'
)
or
my_feature = aiplatform.Feature.create(
    feature_id='my_feature_id',
    value_type='INT64',
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
Parameters
feature_id (str) – Required. The ID to use for the Feature, which will become the final component of the Feature’s resource name, which is immutable.
This value may be up to 60 characters, and valid characters are [a-z0-9_]. The first character cannot be a number. The value must be unique within an EntityType.
value_type (str) – Required. Immutable. Type of Feature value. One of BOOL, BOOL_ARRAY, DOUBLE, DOUBLE_ARRAY, INT64, INT64_ARRAY, STRING, STRING_ARRAY, BYTES.
entity_type_name (str) – Required. A fully-qualified entityType resource name or an entity_type ID of an existing entityType to create Feature in. The EntityType must exist in the Featurestore if provided by the featurestore_id. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id” or “my_entity_type_id” when project and location are initialized or passed, with featurestore_id passed.
featurestore_id (str) – Optional. Featurestore ID of an existing featurestore to create Feature in if entity_type_name is passed as an entity_type ID.
description (str) – Optional. Description of the Feature.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Features. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one Feature (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create Feature in if entity_type_name is passed as an entity_type ID. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to create Feature in if entity_type_name is passed as an entity_type ID. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create Features. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Feature - feature resource object
property entity_type_name: str
Fully qualified resource name of the managed entityType that contains this Feature.
property featurestore_name: str
Fully qualified resource name of the managed featurestore that contains this Feature.
get_entity_type()
Retrieves the managed entityType that contains this Feature.
Returns
featurestore.EntityType - The managed entityType that contains this Feature.
get_featurestore()
Retrieves the managed featurestore that contains this Feature.
Returns
featurestore.Featurestore - The managed featurestore that contains this Feature.
classmethod list(entity_type_name: str, featurestore_id: Optional[str] = None, filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Lists existing managed feature resources in an entityType, given an entityType resource name or an entity_type ID.
Example Usage:
my_features = aiplatform.Feature.list(
    entity_type_name='projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id'
)
or
my_features = aiplatform.Feature.list(
    entity_type_name='my_entity_type_id',
    featurestore_id='my_featurestore_id',
)
Parameters
entity_type_name (str) – Required. A fully-qualified entityType resource name or an entity_type ID of an existing entityType to list features in. The EntityType must exist in the Featurestore if provided by the featurestore_id. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id” or “my_entity_type_id” when project and location are initialized or passed, with featurestore_id passed.
featurestore_id (str) – Optional. Featurestore ID of an existing featurestore to list features in, when entity_type_name is passed as entity_type ID.
filter (str) – Optional. Lists the Features that match the filter expression. The following filters are supported:
value_type: Supports = and != comparisons.
create_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
update_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
labels: Supports key-value equality as well as key presence.
Examples:
value_type = DOUBLE –> Features whose type is DOUBLE.
create_time > "2020-01-31T15:30:00.000000Z" OR update_time > "2020-01-31T15:30:00.000000Z" –> Features created or updated after 2020-01-31T15:30:00.000000Z.
labels.active = yes AND labels.env = prod –> Features having both (active: yes) and (env: prod) labels.
labels.env: * –> Any Feature which has a label with ‘env’ as the key.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields:
feature_id
value_type
create_time
update_time
project (str) – Optional. Project to list features in. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to list features in. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to list features. Overrides credentials set in aiplatform.init.
Returns
List[Feature] - A list of managed feature resource objects
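The filter grammar above takes RFC 3339 timestamps, which are easy to get wrong by hand. The following is a minimal sketch of building such a filter string from a `datetime`. The helper names `rfc3339` and `changed_since_filter` are hypothetical and not part of the SDK; only the resulting string would be passed as the `filter` argument.

```python
from datetime import datetime, timezone


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def changed_since_filter(ts: datetime) -> str:
    """Build a filter matching resources created or updated after `ts`."""
    stamp = rfc3339(ts)
    return f'create_time > "{stamp}" OR update_time > "{stamp}"'


cutoff = datetime(2020, 1, 31, 15, 30, tzinfo=timezone.utc)
feature_filter = changed_since_filter(cutoff)
print(feature_filter)
# The string can then be passed as filter=feature_filter to Feature.list.
```

Building the expression in one place keeps the quoting and the timestamp format consistent across calls.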
classmethod search(query: Optional[str] = None, page_size: Optional[int] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Searches existing managed Feature resources.
Example Usage:
my_features = aiplatform.Feature.search()
Parameters
query (str) – Optional. Query string that is a conjunction of field-restricted queries and/or field-restricted filters. Field-restricted queries and filters can be combined using AND to form a conjunction.
A field query is in the form FIELD:QUERY. This implicitly checks if QUERY exists as a substring within Feature’s FIELD. The QUERY and the FIELD are converted to a sequence of words (i.e. tokens) for comparison. This is done by:
Removing leading/trailing whitespace and tokenizing the search value. Characters that are not one of alphanumeric [a-zA-Z0-9], underscore _, or asterisk * are treated as delimiters for tokens. * is treated as a wildcard that matches characters within a token.
Ignoring case.
Prepending an asterisk to the first and appending an asterisk to the last token in QUERY.
A QUERY must be either a singular token or a phrase. A phrase is one or multiple words enclosed in double quotation marks ("). With phrases, the order of the words is important. Words in the phrase must be matching in order and consecutively.
Supported FIELDs for field-restricted queries:
feature_id
description
entity_type_id
Examples:
feature_id: foo –> Matches a Feature with ID containing the substring foo (e.g. foo, foofeature, barfoo).
feature_id: foo*feature –> Matches a Feature with ID containing the substring foo*feature (e.g. foobarfeature).
feature_id: foo AND description: bar –> Matches a Feature with ID containing the substring foo and description containing the substring bar.
Besides field queries, the following exact-match filters are supported. The exact-match filters do not support wildcards. Unlike field-restricted queries, exact-match filters are case-sensitive.
feature_id: Supports = comparisons.
description: Supports = comparisons. Multi-token filters should be enclosed in quotes.
entity_type_id: Supports = comparisons.
value_type: Supports = and != comparisons.
labels: Supports key-value equality as well as key presence.
featurestore_id: Supports = comparisons.
Examples:
description = "foo bar" –> Any Feature with description exactly equal to foo bar.
value_type = DOUBLE –> Features whose type is DOUBLE.
labels.active = yes AND labels.env = prod –> Features having both (active: yes) and (env: prod) labels.
labels.env: * –> Any Feature which has a label with env as the key.
This corresponds to the query field on the request instance; if request is provided, this should not be set.
page_size (int) – Optional. The maximum number of Features to return. The service may return fewer than this value. If unspecified, at most 100 Features will be returned. The maximum value is 100; any value greater than 100 will be coerced to 100.
project (str) – Optional. Project to list features in. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to list features in. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to list features. Overrides credentials set in aiplatform.init.
Returns
List[Feature] - A list of managed feature resource objects
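The tokenization rules described for `query` can be mirrored locally to reason about what a field query will match. This is only an illustrative sketch of the documented steps (strip, ignore case, split on delimiters, wrap in wildcards), not the service's implementation; the function names `tokenize` and `query_tokens` are hypothetical.

```python
import re
from typing import List

# Any character other than [a-zA-Z0-9], underscore, or asterisk delimits tokens.
_DELIMITERS = re.compile(r"[^a-zA-Z0-9_*]+")


def tokenize(value: str) -> List[str]:
    """Strip surrounding whitespace, ignore case, and split on delimiters."""
    return [tok for tok in _DELIMITERS.split(value.strip().lower()) if tok]


def query_tokens(query: str) -> List[str]:
    """Tokenize QUERY, then prepend * to the first token and append * to the last."""
    tokens = tokenize(query)
    if tokens:
        tokens[0] = "*" + tokens[0]
        tokens[-1] = tokens[-1] + "*"
    return tokens


print(tokenize("My-Feature.ID_1"))  # ['my', 'feature', 'id_1']
print(query_tokens("Foo"))          # ['*foo*']
print(query_tokens("foo bar"))      # ['*foo', 'bar*']
```

The wildcard wrapping is what makes `feature_id: foo` behave as a substring match against `foofeature` and `barfoo`.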
update(description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Updates an existing managed feature resource.
Example Usage:
my_feature = aiplatform.Feature(
    feature_name='my_feature_id',
    featurestore_id='my_featurestore_id',
    entity_type_id='my_entity_type_id',
)
my_feature.update(
    description='update my description',
)
Parameters
description (str) – Optional. Description of the Feature.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Features. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one Feature (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
Feature - The updated feature resource object.
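The label constraints stated for `labels` can be checked client-side before calling `update`. The sketch below encodes only the rules quoted in this reference (length, allowed characters, label count, reserved prefix); the service may enforce further rules, and the helper name `label_problems` is hypothetical.

```python
from typing import Dict, List

MAX_LABEL_LENGTH = 64  # per key and per value, in Unicode code points
MAX_USER_LABELS = 64   # per resource, system labels excluded
SYSTEM_PREFIX = "aiplatform.googleapis.com/"


def _allowed(ch: str) -> bool:
    # Lowercase letters (international ones included), digits, underscores, dashes.
    return ch in "_-" or ch.isdigit() or (ch.isalpha() and not ch.isupper())


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable violations of the documented label rules."""
    problems = []
    user = {k: v for k, v in labels.items() if not k.startswith(SYSTEM_PREFIX)}
    if len(user) > MAX_USER_LABELS:
        problems.append(f"{len(user)} user labels exceed the limit of {MAX_USER_LABELS}")
    for key, value in user.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{kind} {text!r} is longer than {MAX_LABEL_LENGTH} characters")
            if not all(_allowed(ch) for ch in text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
    return problems


print(label_problems({"env": "prod", "team": "ml-platform"}))  # []
print(label_problems({"Env": "prod"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid label at once.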
class google.cloud.aiplatform.Featurestore(featurestore_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Managed featurestore resource for Vertex AI.
Retrieves an existing managed featurestore given a featurestore resource name or a featurestore ID.
Example Usage:
my_featurestore = aiplatform.Featurestore(
    featurestore_name='projects/123/locations/us-central1/featurestores/my_featurestore_id'
)
or
my_featurestore = aiplatform.Featurestore(
    featurestore_name='my_featurestore_id'
)
Parameters
featurestore_name (str) – Required. A fully-qualified featurestore resource name or a featurestore ID. Example: “projects/123/locations/us-central1/featurestores/my_featurestore_id” or “my_featurestore_id” when project and location are initialized or passed.
project (str) – Optional. Project to retrieve featurestore from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve featurestore from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this Featurestore. Overrides credentials set in aiplatform.init.
batch_serve_to_bq(bq_destination_output_uri: str, serving_feature_ids: Dict[str, List[str]], read_instances_uri: str, pass_through_fields: Optional[List[str]] = None, feature_destination_fields: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Batch serves feature values to BigQuery destination
Parameters
bq_destination_output_uri (str) – Required. BigQuery URI to the destination table.
Example: 'bq://project.dataset.table_name'
It requires an existing BigQuery destination Dataset, under the same project as the Featurestore.
serving_feature_ids (Dict[str, List[str]]) – Required. A user-defined dictionary to define the entity_types and their features for batch serve/read. The keys of the dictionary are the serving entity_type ids and the values are lists of serving feature ids in each entity_type.
Example:
serving_feature_ids = {
    'my_entity_type_id_1': ['feature_id_1_1', 'feature_id_1_2'],
    'my_entity_type_id_2': ['feature_id_2_1', 'feature_id_2_2'],
}
read_instances_uri (str) – Required. Read_instances_uri can be either a BigQuery URI to an input table or a Google Cloud Storage URI to a CSV file.
Example: 'bq://project.dataset.table_name' or 'gs://my_bucket/my_file.csv'
Each read instance should consist of exactly one read timestamp and one or more entity IDs identifying entities of the corresponding EntityTypes whose Features are requested.
Each output instance contains Feature values of requested entities concatenated together as of the read time.
An example read instance may be foo_entity_id, bar_entity_id, 2020-01-01T10:00:00.123Z.
An example output instance may be foo_entity_id, bar_entity_id, 2020-01-01T10:00:00.123Z, foo_entity_feature1_value, bar_entity_feature2_value.
Timestamp in each read instance must be millisecond-aligned.
The columns can be in any order.
Values in the timestamp column must use the RFC 3339 format, e.g. 2012-07-30T10:43:17.123Z.
pass_through_fields (List[str]) – Optional. When not empty, the specified fields in the read_instances source will be joined as-is in the output, in addition to those fields from the Featurestore Entity.
For BigQuery source, the type of the pass-through values will be automatically inferred. For CSV source, the pass-through values will be passed as opaque bytes.
feature_destination_fields (Dict[str, str]) – Optional. A user-defined dictionary to map a feature’s fully qualified resource name to its destination field name. If the destination field name is not defined, the feature ID will be used as its destination field name.
Example:
feature_destination_fields = {
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id1/features/f_id11': 'foo',
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id2/features/f_id22': 'bar',
}
Returns
The featurestore resource object from which feature values were batch read.
Return type
Featurestore
Raises
NotFound – if the BigQuery destination Dataset does not exist.
FailedPrecondition – if the BigQuery destination Dataset/Table is in a different project.
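Read instances require RFC 3339 timestamps that are millisecond-aligned, while Python's `datetime` carries microseconds. A minimal sketch of producing a conforming timestamp follows; the helper name `read_timestamp` is hypothetical, and truncating (rather than rounding) to milliseconds is an assumption made for illustration.

```python
from datetime import datetime, timezone


def read_timestamp(ts: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 UTC, truncated to milliseconds."""
    utc = ts.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


ts = datetime(2020, 1, 1, 10, 0, 0, 123456, tzinfo=timezone.utc)
row = ["foo_entity_id", "bar_entity_id", read_timestamp(ts)]
print(", ".join(row))  # foo_entity_id, bar_entity_id, 2020-01-01T10:00:00.123Z
```

The printed row has the same shape as the example read instance given for `read_instances_uri`.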
batch_serve_to_df(serving_feature_ids: Dict[str, List[str]], read_instances_df: pd.DataFrame, pass_through_fields: Optional[List[str]] = None, feature_destination_fields: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Batch serves feature values to pandas DataFrame
NOTE: Calling this method will automatically create and delete a temporary BigQuery dataset in the same GCP project, which will be used as intermediary storage for batch serving feature values from the featurestore to the DataFrame.
Parameters
serving_feature_ids (Dict[str, List[str]]) – Required. A user-defined dictionary to define the entity_types and their features for batch serve/read. The keys of the dictionary are the serving entity_type ids and the values are lists of serving feature ids in each entity_type.
Example:
serving_feature_ids = {
    'my_entity_type_id_1': ['feature_id_1_1', 'feature_id_1_2'],
    'my_entity_type_id_2': ['feature_id_2_1', 'feature_id_2_2'],
}
read_instances_df (pd.DataFrame) – Required. Read_instances_df is a pandas DataFrame containing the read instances.
Each read instance should consist of exactly one read timestamp and one or more entity IDs identifying entities of the corresponding EntityTypes whose Features are requested.
Each output instance contains Feature values of requested entities concatenated together as of the read time.
An example read_instances_df may be
pd.DataFrame(
    data=[
        {
            "my_entity_type_id_1": "my_entity_type_id_1_entity_1",
            "my_entity_type_id_2": "my_entity_type_id_2_entity_1",
            "timestamp": "2020-01-01T10:00:00.123Z",
        }
    ],
)
An example batch_serve_output_df may be
pd.DataFrame(
    data=[
        {
            "my_entity_type_id_1": "my_entity_type_id_1_entity_1",
            "my_entity_type_id_2": "my_entity_type_id_2_entity_1",
            "foo": "feature_id_1_1_feature_value",
            "feature_id_1_2": "feature_id_1_2_feature_value",
            "feature_id_2_1": "feature_id_2_1_feature_value",
            "bar": "feature_id_2_2_feature_value",
            "timestamp": "2020-01-01T10:00:00.123Z",
        }
    ],
)
Timestamp in each read instance must be millisecond-aligned.
The columns can be in any order.
Values in the timestamp column must use the RFC 3339 format, e.g. 2012-07-30T10:43:17.123Z.
pass_through_fields (List[str]) – Optional. When not empty, the specified fields in the read_instances source will be joined as-is in the output, in addition to those fields from the Featurestore Entity.
For BigQuery source, the type of the pass-through values will be automatically inferred. For CSV source, the pass-through values will be passed as opaque bytes.
feature_destination_fields (Dict[str, str]) – Optional. A user-defined dictionary to map a feature’s fully qualified resource name to its destination field name. If the destination field name is not defined, the feature ID will be used as its destination field name.
Example:
feature_destination_fields = {
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id1/features/f_id11': 'foo',
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id2/features/f_id22': 'bar',
}
Returns
The pandas DataFrame containing feature values from batch serving.
Return type
pd.DataFrame
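Because `feature_destination_fields` is keyed by fully qualified Feature resource names, it can be less error-prone to assemble those keys from their components. The sketch below follows the resource-name pattern shown in the examples in this reference; the helper names `feature_resource_name` and `build_destination_fields` are hypothetical and not part of the SDK.

```python
from typing import Dict, Tuple


def feature_resource_name(project: str, location: str, featurestore_id: str,
                          entity_type_id: str, feature_id: str) -> str:
    """Assemble a Feature's fully qualified resource name from its components."""
    return (
        f"projects/{project}/locations/{location}/featurestores/{featurestore_id}"
        f"/entityTypes/{entity_type_id}/features/{feature_id}"
    )


def build_destination_fields(project: str, location: str, featurestore_id: str,
                             renames: Dict[Tuple[str, str], str]) -> Dict[str, str]:
    """Turn {(entity_type_id, feature_id): destination} into a resource-name-keyed dict."""
    return {
        feature_resource_name(project, location, featurestore_id, et_id, f_id): dest
        for (et_id, f_id), dest in renames.items()
    }


feature_destination_fields = build_destination_fields(
    "123", "us-central1", "fs_id",
    {("et_id1", "f_id11"): "foo", ("et_id2", "f_id22"): "bar"},
)
for name, dest in feature_destination_fields.items():
    print(dest, "<-", name)
```

Features left out of the mapping keep their feature ID as the output column name, which is why the example output DataFrame mixes `foo` and `bar` with plain feature IDs.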
batch_serve_to_gcs(gcs_destination_output_uri_prefix: str, gcs_destination_type: str, serving_feature_ids: Dict[str, List[str]], read_instances_uri: str, pass_through_fields: Optional[List[str]] = None, feature_destination_fields: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Batch serves feature values to GCS destination
Parameters
gcs_destination_output_uri_prefix (str) – Required. Google Cloud Storage URI to output directory. If the uri doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
Example
”gs://bucket/path/to/prefix”
gcs_destination_type (str) – Required. The type of the destination file(s). The value of gcs_destination_type can only be either csv or tfrecord.
For CSV format: Array Feature value types are not allowed.
For TFRecord format: the mapping from Feature value type in Featurestore to Feature value type in TFRecord is:
Value type in Featurestore | Value type in TFRecord
DOUBLE, DOUBLE_ARRAY | FLOAT_LIST
INT64, INT64_ARRAY | INT64_LIST
STRING, STRING_ARRAY, BYTES | BYTES_LIST
BOOL, BOOL_ARRAY (true, false) | BYTES_LIST (true -> byte_string("true"), false -> byte_string("false"))
serving_feature_ids (Dict[str, List[str]]) – Required. A user-defined dictionary to define the entity_types and their features for batch serve/read. The keys of the dictionary are the serving entity_type ids and the values are lists of serving feature ids in each entity_type.
Example:
serving_feature_ids = {
    'my_entity_type_id_1': ['feature_id_1_1', 'feature_id_1_2'],
    'my_entity_type_id_2': ['feature_id_2_1', 'feature_id_2_2'],
}
read_instances_uri (str) – Required. Read_instances_uri can be either a BigQuery URI to an input table or a Google Cloud Storage URI to a CSV file.
Example: 'bq://project.dataset.table_name' or 'gs://my_bucket/my_file.csv'
Each read instance should consist of exactly one read timestamp and one or more entity IDs identifying entities of the corresponding EntityTypes whose Features are requested.
Each output instance contains Feature values of requested entities concatenated together as of the read time.
An example read instance may be foo_entity_id, bar_entity_id, 2020-01-01T10:00:00.123Z.
An example output instance may be foo_entity_id, bar_entity_id, 2020-01-01T10:00:00.123Z, foo_entity_feature1_value, bar_entity_feature2_value.
Timestamp in each read instance must be millisecond-aligned.
The columns can be in any order.
Values in the timestamp column must use the RFC 3339 format, e.g. 2012-07-30T10:43:17.123Z.
pass_through_fields (List[str]) – Optional. When not empty, the specified fields in the read_instances source will be joined as-is in the output, in addition to those fields from the Featurestore Entity.
For BigQuery source, the type of the pass-through values will be automatically inferred. For CSV source, the pass-through values will be passed as opaque bytes.
feature_destination_fields (Dict[str, str]) – Optional. A user-defined dictionary to map a feature’s fully qualified resource name to its destination field name. If the destination field name is not defined, the feature ID will be used as its destination field name.
Example:
feature_destination_fields = {
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id1/features/f_id11': 'foo',
    'projects/123/locations/us-central1/featurestores/fs_id/entityTypes/et_id2/features/f_id22': 'bar',
}
Returns
The featurestore resource object from which feature values were batch read.
Return type
Featurestore
Raises
ValueError – if gcs_destination_type is not supported.
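The destination rules for batch serving to Cloud Storage (two allowed formats, no array types in CSV, the TFRecord type mapping, and the trailing slash on the prefix) can be captured in a small lookup. This is a sketch of the rules as documented here, not the SDK's internal code; all names in it are hypothetical.

```python
SUPPORTED_DESTINATION_TYPES = {"csv", "tfrecord"}

# Featurestore value type -> TFRecord value type, per the mapping table.
TFRECORD_TYPE = {
    "DOUBLE": "FLOAT_LIST", "DOUBLE_ARRAY": "FLOAT_LIST",
    "INT64": "INT64_LIST", "INT64_ARRAY": "INT64_LIST",
    "STRING": "BYTES_LIST", "STRING_ARRAY": "BYTES_LIST", "BYTES": "BYTES_LIST",
    "BOOL": "BYTES_LIST", "BOOL_ARRAY": "BYTES_LIST",
}


def normalize_prefix(uri: str) -> str:
    """Append '/' when the output prefix does not already end with one."""
    return uri if uri.endswith("/") else uri + "/"


def check_destination(destination_type: str, value_types: list) -> None:
    """Raise ValueError for an unsupported type or for array features with CSV."""
    if destination_type not in SUPPORTED_DESTINATION_TYPES:
        raise ValueError(f"unsupported gcs_destination_type: {destination_type!r}")
    arrays = [vt for vt in value_types if vt.endswith("_ARRAY")]
    if destination_type == "csv" and arrays:
        raise ValueError(f"array value types are not allowed in CSV: {arrays}")


print(normalize_prefix("gs://bucket/path/to/prefix"))  # gs://bucket/path/to/prefix/
print(TFRECORD_TYPE["BOOL_ARRAY"])                     # BYTES_LIST
check_destination("tfrecord", ["DOUBLE", "INT64_ARRAY"])  # passes silently
```

Checking the value types up front surfaces the CSV restriction before a long-running batch read is started.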
classmethod create(featurestore_id: str, online_store_fixed_node_count: Optional[int] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a Featurestore resource.
Example Usage:
my_featurestore = aiplatform.Featurestore.create(
    featurestore_id='my_featurestore_id',
)
Parameters
featurestore_id (str) – Required. The ID to use for this Featurestore, which will become the final component of the Featurestore’s resource name.
This value may be up to 60 characters, and valid characters are [a-z0-9_]. The first character cannot be a number.
The value must be unique within the project and location.
online_store_fixed_node_count (int) – Optional. Config for online serving resources. When not specified, default node count is 1. The number of nodes will not scale automatically but can be scaled manually by providing different values when updating.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Featurestore. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one Featurestore (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create Featurestore in. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to create Featurestore in. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create Featurestores. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
encryption_spec_key_name (str) – Optional. Customer-managed encryption key spec for data storage. If set, both the online and offline data storage will be secured by this key.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Featurestore - Featurestore resource object
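The ID constraints for `featurestore_id` (at most 60 characters, only `[a-z0-9_]`, first character not a number) fit in a single regular expression. This is a sketch of a local pre-check under those stated rules; the helper name `is_valid_resource_id` is hypothetical, and the service remains the authority on validity and uniqueness.

```python
import re

# First character: a lowercase letter or underscore (not a digit);
# then up to 59 more characters from [a-z0-9_], for 60 in total.
_RESOURCE_ID = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def is_valid_resource_id(resource_id: str) -> bool:
    """Check an ID against the documented length and character rules."""
    return _RESOURCE_ID.fullmatch(resource_id) is not None


for candidate in ("my_featurestore_id", "1store", "My_Store", "a" * 61):
    print(f"{candidate[:20]!r}: {is_valid_resource_id(candidate)}")
```

`fullmatch` is used so that an otherwise valid prefix followed by extra characters is rejected.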
create_entity_type(entity_type_id: str, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), sync: bool = True)
Creates an EntityType resource in this Featurestore.
Example Usage:
my_featurestore = aiplatform.Featurestore.create(
    featurestore_id='my_featurestore_id'
)
my_entity_type = my_featurestore.create_entity_type(
    entity_type_id='my_entity_type_id',
)
Parameters
entity_type_id (str) – Required. The ID to use for the EntityType, which will become the final component of the EntityType’s resource name.
This value may be up to 60 characters, and valid characters are [a-z0-9_]. The first character cannot be a number.
The value must be unique within a featurestore.
description (str) – Optional. Description of the EntityType.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your EntityTypes. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one EntityType (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
sync (bool) – Optional. Whether to execute this creation synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
Returns
featurestore.EntityType - EntityType resource object
delete(sync: bool = True, force: bool = False)
Deletes this Featurestore resource. If force is set to True, all entityTypes in this Featurestore will be deleted prior to featurestore deletion, and all features in each entityType will be deleted prior to each entityType deletion.
WARNING: This deletion is permanent.
Parameters
force (bool) – If set to True, any EntityTypes and Features for this Featurestore will also be deleted. (Otherwise, the request will only work if the Featurestore has no EntityTypes.)
sync (bool) – Whether to execute this deletion synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
delete_entity_types(entity_type_ids: List[str], sync: bool = True, force: bool = False)
Deletes entity_type resources in this Featurestore given their entity_type IDs. WARNING: This deletion is permanent.
Parameters
entity_type_ids (List[str]) – Required. The list of entity_type IDs to be deleted.
sync (bool) – Optional. Whether to execute this deletion synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
force (bool) – Optional. If force is set to True, all features in each entityType will be deleted prior to entityType deletion. Default is False.
get_entity_type(entity_type_id: str)
Retrieves an existing managed entityType in this Featurestore.
Parameters
entity_type_id (str) – Required. The managed entityType resource ID in this Featurestore.
Returns
featurestore.EntityType - The managed entityType resource object.
list_entity_types(filter: Optional[str] = None, order_by: Optional[str] = None)
Lists existing managed entityType resources in this Featurestore.
Example Usage:
my_featurestore = aiplatform.Featurestore(
    featurestore_name='my_featurestore_id',
)
my_featurestore.list_entity_types()
Parameters
filter (str) – Optional. Lists the EntityTypes that match the filter expression. The following filters are supported:
create_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
update_time: Supports =, !=, <, >, >=, and <= comparisons. Values must be in RFC 3339 format.
labels: Supports key-value equality as well as key presence.
Examples:
create_time > "2020-01-31T15:30:00.000000Z" OR update_time > "2020-01-31T15:30:00.000000Z" –> EntityTypes created or updated after 2020-01-31T15:30:00.000000Z.
labels.active = yes AND labels.env = prod –> EntityTypes having both (active: yes) and (env: prod) labels.
labels.env: * –> Any EntityType which has a label with ‘env’ as the key.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.
Supported fields:
entity_type_id
create_time
update_time
Returns
List[featurestore.EntityType] - A list of managed entityType resource objects.
update(labels: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Updates an existing managed featurestore resource.
Example Usage:
my_featurestore = aiplatform.Featurestore(
    featurestore_name='my_featurestore_id',
)
my_featurestore.update(
    labels={'update my key': 'update my value'},
)
Parameters
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Featurestores. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information on and examples of labels. No more than 64 user labels can be associated with one Featurestore (system labels are excluded). System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
Featurestore - The updated featurestore resource object.
update_online_store(fixed_node_count: int, request_metadata: Optional[Sequence[Tuple[str, str]]] = ())
Updates the online store of an existing managed featurestore resource.
Example Usage:
my_featurestore = aiplatform.Featurestore(
    featurestore_name='my_featurestore_id',
)
my_featurestore.update_online_store(
    fixed_node_count=2,
)
Parameters
fixed_node_count (int) – Required. Config for online serving resources, can only update the node count to >= 1.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
Featurestore - The updated featurestore resource object.
class google.cloud.aiplatform.HyperparameterTuningJob(display_name: str, custom_job: google.cloud.aiplatform.jobs.CustomJob, metric_spec: Dict[str, str], parameter_spec: Dict[str, google.cloud.aiplatform.hyperparameter_tuning._ParameterSpec], max_trial_count: int, parallel_trial_count: int, max_failed_trial_count: int = 0, search_algorithm: Optional[str] = None, measurement_selection: Optional[str] = 'best', project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None)
Bases: google.cloud.aiplatform.jobs._RunnableJob
Vertex AI Hyperparameter Tuning Job.
Configures a HyperparameterTuning Job.
Example usage:
```
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# container_image_uri must be set to the URI of your training container image.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_image_uri,
            "command": [],
            "args": [],
        },
    }
]

custom_job = aiplatform.CustomJob(
    display_name='my_job',
    worker_pool_specs=worker_pool_specs,
    labels={'my_key': 'my_value'},
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name='hp-test',
    custom_job=custom_job,
    metric_spec={
        'loss': 'minimize',
    },
    parameter_spec={
        'lr': hpt.DoubleParameterSpec(min=0.001, max=0.1, scale='log'),
        'units': hpt.IntegerParameterSpec(min=4, max=128, scale='linear'),
        'activation': hpt.CategoricalParameterSpec(values=['relu', 'selu']),
        'batch_size': hpt.DiscreteParameterSpec(values=[128, 256], scale='linear'),
    },
    max_trial_count=128,
    parallel_trial_count=8,
    labels={'my_key': 'my_value'},
)

hp_job.run()

print(hp_job.trials)
```
For more information on using hyperparameter tuning please visit: https://cloud.google.com/ai-platform-unified/docs/training/using-hyperparameter-tuning
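The `measurement_selection` parameter documented below chooses between the 'best' and the 'last' intermediate measurement of a trial. The following sketch illustrates the difference on a list of reported values. It is not the service's selection logic; the function name `select_final_measurement` and the sample values are hypothetical.

```python
from typing import Sequence


def select_final_measurement(values: Sequence[float], goal: str,
                             selection: str = "best") -> float:
    """Pick a trial's final measurement from its intermediate measurements."""
    if not values:
        raise ValueError("at least one measurement is required")
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    if selection == "last":
        return values[-1]
    if selection == "best":
        return min(values) if goal == "minimize" else max(values)
    raise ValueError(f"unknown selection: {selection!r}")


# A trial whose loss improves and then degrades ("over-training").
losses = [0.90, 0.42, 0.35, 0.51]
print(select_final_measurement(losses, "minimize", "best"))  # 0.35
print(select_final_measurement(losses, "minimize", "last"))  # 0.51
```

With monotonically improving measurements both selections return the same value, which is why the choice only matters for over-training or noisy metrics.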
Parameters
display_name (str) – Required. The user-defined name of the HyperparameterTuningJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
custom_job (aiplatform.CustomJob) – Required. Configured CustomJob. The worker pool spec from this custom job applies to the CustomJobs created in all the trials.
metric_spec (Dict[str, str]) – Required. Dictionary representing metrics to optimize. The dictionary key is the metric_id, which is reported by your training job, and the dictionary value is the optimization goal of the metric (‘minimize’ or ‘maximize’). Example:
metric_spec = {'loss': 'minimize', 'accuracy': 'maximize'}
parameter_spec (Dict[str, hyperparameter_tuning._ParameterSpec]) – Required. Dictionary representing parameters to optimize. The dictionary key is the parameter_id, which is passed into your training job as a command-line keyword argument, and the dictionary value is the parameter specification. Example:
from google.cloud.aiplatform import hyperparameter_tuning as hpt
parameter_spec = {
    'decay': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'),
    'learning_rate': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'),
    'batch_size': hpt.DiscreteParameterSpec(values=[4, 8, 16, 32, 64, 128], scale='linear'),
}
Supported parameter specifications can be found under aiplatform.hyperparameter_tuning. These parameter specifications are currently supported: DoubleParameterSpec, IntegerParameterSpec, CategoricalParameterSpec, DiscreteParameterSpec.
max_trial_count (int) – Required. The desired total number of Trials.
parallel_trial_count (int) – Required. The desired number of Trials to run in parallel.
max_failed_trial_count (int) – Optional. The number of failed Trials that need to be seen before failing the HyperparameterTuningJob. If set to 0, Vertex AI decides how many Trials must fail before the whole job fails.
search_algorithm (str) – The search algorithm specified for the Study. Accepts one of the following:
None - If you do not specify an algorithm, your job uses the default Vertex AI algorithm. The default algorithm applies Bayesian optimization to arrive at the optimal solution with a more effective search over the parameter space.
‘grid’ - A simple grid search within the feasible space. This option is particularly useful if you want to specify a quantity of trials that is greater than the number of points in the feasible space. In such cases, if you do not specify a grid search, the Vertex AI default algorithm may generate duplicate suggestions. To use grid search, all parameter specs must be of type IntegerParameterSpec, CategoricalParameterSpec, or DiscreteParameterSpec.
‘random’ - A simple random search within the feasible space.
measurement_selection (str) – This indicates which measurement to use if/when the service automatically selects the final measurement from previously reported intermediate measurements.
Accepts: ‘best’, ‘last’
Choose this based on two considerations: A) Do you expect your measurements to monotonically improve? If so, choose ‘last’. On the other hand, if you’re in a situation where your system can “over-train” and you expect the performance to get better for a while but then start declining, choose ‘best’. B) Are your measurements significantly noisy and/or irreproducible? If so, ‘best’ will tend to be over-optimistic, and it may be better to choose ‘last’. If both or neither of (A) and (B) apply, it doesn’t matter which selection type is chosen.
project (str) – Optional. Project to run the HyperparameterTuningJob in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to run the HyperparameterTuningJob in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to call the HyperparameterTuningJob service. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize HyperparameterTuningJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (str) – Optional. Customer-managed encryption key options for a HyperparameterTuningJob. If this is set, then all resources created by the HyperparameterTuningJob will be encrypted with the provided encryption key.
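The trade-off between ‘best’ and ‘last’ described for measurement_selection can be illustrated with a small, self-contained sketch. It does not show how the Vertex AI service is implemented; the function name select_final_measurement and its goal argument are hypothetical and exist only for this illustration.

```python
from typing import List


def select_final_measurement(
    values: List[float], selection: str, goal: str = "minimize"
) -> float:
    """Pick a final value from intermediate measurements (illustrative only)."""
    if not values:
        raise ValueError("at least one measurement is required")
    if selection == "last":
        # 'last' trusts the most recent report, which suits metrics that
        # improve monotonically or that are noisy.
        return values[-1]
    if selection == "best":
        # 'best' guards against over-training, but tends to be
        # over-optimistic when measurements are noisy.
        return min(values) if goal == "minimize" else max(values)
    raise ValueError(f"unsupported selection: {selection!r}")


# A loss curve that improves and then degrades as the model over-trains.
losses = [0.9, 0.5, 0.3, 0.4, 0.6]
print(select_final_measurement(losses, "best"))  # 0.3
print(select_final_measurement(losses, "last"))  # 0.6
```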
property network: Optional[str]
The full name of the Google Compute Engine network to which this HyperparameterTuningJob should be peered.
Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.
Private services access must already be configured for the network. If left unspecified, the HyperparameterTuningJob is not peered with any network.
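A quick local check of the network name format can catch mistakes before a job is submitted. The sketch below is an assumption-laden helper, not part of the SDK: parse_network_name is a hypothetical function, and the regular expression encodes only what the description above states (a numeric project number followed by a network name).

```python
import re
from typing import Tuple

# Assumed pattern: numeric project number, then a network name without slashes.
_NETWORK_PATTERN = re.compile(r"projects/(\d+)/global/networks/([^/]+)")


def parse_network_name(network: str) -> Tuple[str, str]:
    """Return (project_number, network_name), or raise ValueError."""
    match = _NETWORK_PATTERN.fullmatch(network)
    if match is None:
        raise ValueError(
            "expected projects/{project}/global/networks/{network} "
            f"with a numeric project, got {network!r}"
        )
    return match.group(1), match.group(2)


print(parse_network_name("projects/12345/global/networks/myVPC"))
```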
run(service_account: Optional[str] = None, network: Optional[str] = None, timeout: Optional[int] = None, restart_job_on_worker_restart: bool = False, enable_web_access: bool = False, tensorboard: Optional[str] = None, sync: bool = True)
Run this configured CustomJob.
Parameters
service_account (str) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
timeout (int) – Optional. The maximum job running time in seconds. The default is 7 days.
restart_job_on_worker_restart (bool) – Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
enable_web_access (bool) – Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
tensorboard (str) – Optional. The name of a Vertex AI [Tensorboard][google.cloud.aiplatform.v1beta1.Tensorboard] resource to which this CustomJob will upload Tensorboard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write Tensorboard logs to the directory given by the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required with provided tensorboard. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
sync (bool) – Whether to execute this method synchronously. If False, this method will unblock and it will be executed in a concurrent Future.
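Inside the training script, the Tensorboard log directory can be read from the AIP_TENSORBOARD_LOG_DIR environment variable named above. The following is a minimal sketch; the helper name tensorboard_log_dir and the local fallback directory are assumptions added so the script can also run outside Vertex AI.

```python
import os


def tensorboard_log_dir(default: str = "./tb_logs") -> str:
    """Return the directory the training script should write Tensorboard logs to."""
    # AIP_TENSORBOARD_LOG_DIR is the variable named in the run() documentation;
    # the fallback value is an assumption for local runs.
    return os.environ.get("AIP_TENSORBOARD_LOG_DIR", default)


log_dir = tensorboard_log_dir()
print(f"Writing Tensorboard logs to {log_dir}")
```

A framework-specific writer or callback would then be pointed at log_dir.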
class google.cloud.aiplatform.ImageDataset(dataset_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.datasets.dataset._Dataset
Managed image dataset resource for Vertex AI.
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
dataset_name (str) – Required. A fully-qualified dataset resource name or dataset ID. Example: “projects/123/locations/us-central1/datasets/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init.
classmethod create(display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, import_schema_uri: Optional[str] = None, data_item_labels: Optional[Dict] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a new image dataset and optionally imports data into dataset when source and import_schema_uri are passed.
Parameters
display_name (str) – Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source (Union[str, Sequence[str]]) – Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:
str: “gs://bucket/file.csv”
Sequence[str]: [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”]
import_schema_uri (str) – Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
data_item_labels (Dict) – Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and if key collision happens in this case, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. jsonl file.
project (str) – Project to upload this model to. Overrides project set in aiplatform.init.
location (str) – Location to upload this model to. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed image dataset resource.
Return type
image_dataset (ImageDataset)
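The merge rule described for data_item_labels (new labels are appended to an existing identical DataItem, and a re-imported key overwrites the old value) behaves like a dictionary update. The sketch below models only that case; merge_data_item_labels is a hypothetical helper, and it does not model the random pick for collisions within a single import or the override by Annotation labels.

```python
from typing import Dict


def merge_data_item_labels(
    existing: Dict[str, str], imported: Dict[str, str]
) -> Dict[str, str]:
    """Combine labels of an existing DataItem with newly imported ones."""
    merged = dict(existing)  # keep labels that are not re-imported
    merged.update(imported)  # identical keys take the newly imported value
    return merged


existing = {"source": "camera", "split": "train"}
imported = {"split": "test", "batch": "7"}
print(merge_data_item_labels(existing, imported))
# {'source': 'camera', 'split': 'test', 'batch': '7'}
```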
class google.cloud.aiplatform.Model(model_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Retrieves the model resource and instantiates its representation.
Parameters
model_name (str) – Required. A fully-qualified model resource name or model ID. Example: “projects/123/locations/us-central1/models/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve model from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve model from. If not set, location set in aiplatform.init will be used.
credentials – Optional[auth_credentials.Credentials]=None, Custom credentials to use to upload this model. If not set, credentials set in aiplatform.init will be used.
batch_predict(job_display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, bigquery_source: Optional[str] = None, instances_format: str = 'jsonl', gcs_destination_prefix: Optional[str] = None, bigquery_destination_prefix: Optional[str] = None, predictions_format: str = 'jsonl', model_parameters: Optional[Dict] = None, machine_type: Optional[str] = None, accelerator_type: Optional[str] = None, accelerator_count: Optional[int] = None, starting_replica_count: Optional[int] = None, max_replica_count: Optional[int] = None, generate_explanation: Optional[bool] = False, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, labels: Optional[Dict[str, str]] = None, credentials: Optional[google.auth.credentials.Credentials] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a batch prediction job using this Model and outputs prediction results to the provided destination prefix in the specified predictions_format. One source and one destination prefix are required.
Example usage:
my_model.batch_predict(
    job_display_name="prediction-123",
    gcs_source="gs://example-bucket/instances.csv",
    instances_format="csv",
    bigquery_destination_prefix="projectId.bqDatasetId.bqTableId",
)
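The source and destination rules for batch_predict can be sketched as a small pre-flight check. This is not the SDK's own validation: resolve_batch_formats is a hypothetical helper, and reading "one source and one destination prefix are required" as "exactly one of each" is an assumption. The format overrides follow the instances_format and predictions_format descriptions.

```python
from typing import Optional, Tuple


def resolve_batch_formats(
    gcs_source: Optional[str] = None,
    bigquery_source: Optional[str] = None,
    gcs_destination_prefix: Optional[str] = None,
    bigquery_destination_prefix: Optional[str] = None,
    instances_format: str = "jsonl",
    predictions_format: str = "jsonl",
) -> Tuple[str, str]:
    """Return the effective (instances_format, predictions_format)."""
    # Assumption: exactly one source and exactly one destination.
    if (gcs_source is None) == (bigquery_source is None):
        raise ValueError("provide exactly one of gcs_source or bigquery_source")
    if (gcs_destination_prefix is None) == (bigquery_destination_prefix is None):
        raise ValueError(
            "provide exactly one of gcs_destination_prefix or "
            "bigquery_destination_prefix"
        )
    # A BigQuery source or destination overrides the corresponding format.
    if bigquery_source is not None:
        instances_format = "bigquery"
    if bigquery_destination_prefix is not None:
        predictions_format = "bigquery"
    return instances_format, predictions_format


print(
    resolve_batch_formats(
        gcs_source="gs://example-bucket/instances.csv",
        instances_format="csv",
        bigquery_destination_prefix="projectId.bqDatasetId.bqTableId",
    )
)  # ('csv', 'bigquery')
```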
Parameters
job_display_name (str) – Required. The user-defined name of the BatchPredictionJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source – Optional[Sequence[str]] = None Google Cloud Storage URI(-s) to your instances to run batch prediction on. They must match instances_format. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.
bigquery_source – Optional[str] = None BigQuery URI to a table, up to 2000 characters long. For example: bq://projectId.bqDatasetId.bqTableId
instances_format – str = “jsonl” The format in which instances are provided. Must be one of the formats listed in Model.supported_input_storage_formats. Default is “jsonl” when using gcs_source. If a bigquery_source is provided, this is overridden to “bigquery”.
gcs_destination_prefix – Optional[str] = None The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field which as value has google.rpc.Status containing only code and message fields.
bigquery_destination_prefix – Optional[str] = None The BigQuery project location where the output is to be written to. In the given project a new dataset is created with name prediction_<model-display-name>_<job-create-time> where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores), and timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables will be created, predictions and errors. If the Model has both instance and prediction schemata defined then the tables have columns as follows: The predictions table contains instances for which the prediction succeeded; it has columns as per a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction has failed; it has instance columns, as per the instance schema, followed by a single “errors” column, which as values has google.rpc.Status represented as a STRUCT, and containing only code and message.
predictions_format – str = “jsonl” Required. The format in which Vertex AI outputs the predictions, must be one of the formats specified in Model.supported_output_storage_formats. Default is “jsonl” when using gcs_destination_prefix. If a bigquery_destination_prefix is provided, this is overridden to “bigquery”.
model_parameters – Optional[Dict] = None Optional. The parameters that govern the predictions. The schema of the parameters may be specified via the Model’s parameters_schema_uri.
machine_type – Optional[str] = None Optional. The type of machine for running batch prediction on dedicated resources. Not specifying machine type will result in batch prediction job being run with automatic resources.
accelerator_type – Optional[str] = None Optional. The type of accelerator(s) that may be attached to the machine as per accelerator_count. Only used if machine_type is set.
accelerator_count – Optional[int] = None Optional. The number of accelerators to attach to the machine_type. Only used if machine_type is set.
starting_replica_count – Optional[int] = None The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides starting number, not greater than max_replica_count. Only used if machine_type is set.
max_replica_count – Optional[int] = None The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set. Default is 10.
generate_explanation (bool) – Optional. Generate explanation along with the batch prediction results. This will cause the batch prediction output to include explanations based on the predictions_format:
- bigquery: output includes a column named explanation. The value is a struct that conforms to the [aiplatform.gapic.Explanation] object.
- jsonl: The JSON objects on each line include an additional entry keyed explanation. The value of the entry is a JSON object that conforms to the [aiplatform.gapic.Explanation] object.
- csv: Generating explanations for CSV format is not supported.
explanation_metadata (explain.ExplanationMetadata) – Optional. Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited. For more details, see Ref docs http://tinyurl.com/1an4zake
labels – Optional[Dict[str, str]] = None Optional. The labels with user-defined metadata to organize your BatchPredictionJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
credentials – Optional[auth_credentials.Credentials] = None Optional. Custom credentials to use to create this batch prediction job. Overrides credentials set in aiplatform.init.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
Returns
Instantiated representation of the created batch prediction job.
Return type
(jobs.BatchPredictionJob)
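The Cloud Storage output layout described for gcs_destination_prefix can be reproduced locally, for example to predict where results will land. The sketch below follows only the naming pattern stated in the parameter description; both helper names are hypothetical, and the service's exact timestamp value is not something client code can know in advance.

```python
from datetime import datetime, timezone
from typing import List


def expected_output_directory(model_display_name: str, create_time: datetime) -> str:
    """Build prediction-<model-display-name>-<job-create-time>."""
    # Millisecond-precision UTC timestamp in YYYY-MM-DDThh:mm:ss.sssZ form.
    utc_time = create_time.astimezone(timezone.utc)
    stamp = utc_time.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return f"prediction-{model_display_name}-{stamp}"


def expected_file_names(kind: str, count: int, extension: str) -> List[str]:
    """Build names such as predictions_0001.jsonl or errors_0001.jsonl."""
    return [f"{kind}_{index:04d}.{extension}" for index in range(1, count + 1)]


created = datetime(2021, 6, 1, 12, 30, 45, 123000, tzinfo=timezone.utc)
print(expected_output_directory("my-model", created))
print(expected_file_names("predictions", 2, "jsonl"))
```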
property container_spec: Optional[google.cloud.aiplatform_v1.types.model.ModelContainerSpec]
The specification of the container that is to be used when deploying this Model. Not present for AutoML Models.
deploy(endpoint: Optional[google.cloud.aiplatform.models.Endpoint] = None, deployed_model_display_name: Optional[str] = None, traffic_percentage: Optional[int] = 0, traffic_split: Optional[Dict[str, int]] = None, machine_type: Optional[str] = None, min_replica_count: int = 1, max_replica_count: int = 1, accelerator_type: Optional[str] = None, accelerator_count: Optional[int] = None, service_account: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, metadata: Optional[Sequence[Tuple[str, str]]] = (), encryption_spec_key_name: Optional[str] = None, sync=True)
Deploys model to endpoint. Endpoint will be created if unspecified.
Parameters
endpoint ("Endpoint") – Optional. Endpoint to deploy model to. If not specified, endpoint display name will be model display name+’_endpoint’.
deployed_model_display_name (str) – Optional. The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used.
traffic_percentage (int) – Optional. Desired traffic to newly deployed model. Defaults to 0 if there are pre-existing deployed models. Defaults to 100 if there are no pre-existing deployed models. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate new deployed model’s traffic. Should not be provided if traffic_split is provided.
traffic_split (Dict[str, int]) – Optional. A map from a DeployedModel’s ID to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel. If a DeployedModel’s ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or map must be empty if the Endpoint is to not accept any traffic at the moment. Key for model being deployed is “0”. Should not be provided if traffic_percentage is provided.
machine_type (str) – Optional. The type of machine. Not specifying machine type will result in model to be deployed with automatic resources.
min_replica_count (int) – Optional. The minimum number of machine replicas this deployed model will be always deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
max_replica_count (int) – Optional. The maximum number of replicas this deployed model may be deployed on when the traffic against it increases. If requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the smaller value of min_replica_count or 1 will be used.
accelerator_type (str) – Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
accelerator_count (int) – Optional. The number of accelerators to attach to a worker replica.
service_account (str) – The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Endpoint with the deployed model.
Return type
endpoint (“Endpoint”)
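The interaction between traffic_percentage and existing deployments (existing traffic is scaled down so that the total stays at 100, with the new model keyed “0”) can be sketched as follows. This is an illustration under stated assumptions, not the SDK's algorithm: scale_traffic_split is a hypothetical helper, and the rounding strategy (floor, then hand leftover points to the largest remainders) is one reasonable choice that may differ from what the SDK does.

```python
from typing import Dict


def scale_traffic_split(existing: Dict[str, int], new_percentage: int) -> Dict[str, int]:
    """Make room for a new model (key "0") by scaling existing traffic down."""
    if not 0 <= new_percentage <= 100:
        raise ValueError("new_percentage must be between 0 and 100")
    if not existing:
        # Assumption: with no deployed models, the new model takes all
        # traffic, matching the documented default of 100.
        return {"0": 100}
    if sum(existing.values()) != 100:
        raise ValueError("existing traffic split must add up to 100")

    remaining = 100 - new_percentage
    scaled = {key: value * remaining // 100 for key, value in existing.items()}
    # Integer flooring can lose a few points; give them back to the entries
    # with the largest remainders so the total is exactly 100.
    leftover = remaining - sum(scaled.values())
    by_remainder = sorted(
        existing, key=lambda key: existing[key] * remaining % 100, reverse=True
    )
    for key in by_remainder[:leftover]:
        scaled[key] += 1
    scaled["0"] = new_percentage
    return scaled


print(scale_traffic_split({"a": 60, "b": 40}, 20))
# {'a': 48, 'b': 32, '0': 20}
```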
property description: str
Description of the model.
export_model(export_format_id: str, artifact_destination: Optional[str] = None, image_destination: Optional[str] = None, sync: bool = True)
Exports a trained, exportable Model to a location specified by the user. A Model is considered to be exportable if it has at least one supported_export_formats. Either artifact_destination or image_destination must be provided.
Usage:
my_model.export_model(
    export_format_id='tf-saved-model',
    artifact_destination='gs://my-bucket/models/',
)
or
my_model.export_model(
    export_format_id='custom-model',
    image_destination='us-central1-docker.pkg.dev/projectId/repo/image',
)
Parameters
export_format_id (str) – Required. The ID of the format in which the Model must be exported. The list of export formats that this Model supports can be found by calling Model.supported_export_formats.
artifact_destination (str) – The Cloud Storage location where the Model artifact is to be written to. Under the directory given as the destination a new one with name “model-export-<model-display-name>-<timestamp-of-export-call>”, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format, will be created. Inside, the Model and any of its supporting files will be written. This field should only be set when, in [Model.supported_export_formats], the value for the key given in export_format_id contains ARTIFACT.
image_destination (str) – The Google Container Registry or Artifact Registry URI where the Model container image will be copied to. Accepted forms:
- Google Container Registry path. For example: gcr.io/projectId/imageName:tag.
- Artifact Registry path. For example: us-central1-docker.pkg.dev/projectId/repoName/imageName:tag.
This field should only be set when, in [Model.supported_export_formats], the value for the key given in export_format_id contains IMAGE.
sync (bool) – Whether to execute this export synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Details of the completed export with output destination paths to the artifacts or container image.
Return type
Raises
ValueError – If model does not support exporting.
ValueError – If invalid arguments or export formats are provided.
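Which destination argument to pass depends on the content types listed for the chosen export format. The sketch below shows that decision on plain data; required_destinations is a hypothetical helper, and the strings "ARTIFACT" and "IMAGE" stand in for the ExportableContent enum values that Model.supported_export_formats actually returns.

```python
from typing import Dict, List


def required_destinations(
    supported_export_formats: Dict[str, List[str]], export_format_id: str
) -> List[str]:
    """Return the export_model() destination arguments a format calls for."""
    if not supported_export_formats:
        raise ValueError("model does not support exporting")
    if export_format_id not in supported_export_formats:
        raise ValueError(f"unsupported export format: {export_format_id!r}")
    content = supported_export_formats[export_format_id]
    needed = []
    if "ARTIFACT" in content:
        needed.append("artifact_destination")
    if "IMAGE" in content:
        needed.append("image_destination")
    return needed


formats = {"tf-saved-model": ["ARTIFACT"], "custom-model": ["IMAGE"]}
print(required_destinations(formats, "tf-saved-model"))  # ['artifact_destination']
print(required_destinations(formats, "custom-model"))    # ['image_destination']
```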
classmethod list(filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
List all Model resource instances.
Example Usage:
aiplatform.Model.list(
    filter='labels.my_label="my_label_value" AND display_name="my_model"',
)
Parameters
filter (str) – Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields: display_name, create_time, update_time
project (str) – Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
Returns
List[models.Model] - A list of Model resource objects
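Filter strings like the one in the example usage can be assembled programmatically. The helper below is hypothetical and covers only the two clause shapes shown in that example (label equality and display_name equality, joined with AND); it does not escape quotes in values and makes no claim about the rest of the filter grammar.

```python
from typing import Dict, Optional


def build_model_filter(
    display_name: Optional[str] = None, labels: Optional[Dict[str, str]] = None
) -> str:
    """Join label and display_name equality clauses with AND."""
    clauses = [f'labels.{key}="{value}"' for key, value in (labels or {}).items()]
    if display_name is not None:
        clauses.append(f'display_name="{display_name}"')
    return " AND ".join(clauses)


print(build_model_filter("my_model", {"my_label": "my_label_value"}))
# labels.my_label="my_label_value" AND display_name="my_model"
```

The resulting string would be passed as the filter argument of Model.list.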
property predict_schemata: Optional[google.cloud.aiplatform_v1.types.model.PredictSchemata]
The schemata that describe formats of the Model’s predictions and explanations, if available.
property supported_deployment_resources_types: List[google.cloud.aiplatform_v1.types.model.Model.DeploymentResourcesType]
List of deployment resource types accepted for this Model.
When this Model is deployed, its prediction resources are described by the prediction_resources field of the objects returned by Endpoint.list_models(). Because not all Models support all resource configuration types, the configuration types this Model supports are listed here.
If no configuration types are listed, the Model cannot be deployed to an Endpoint and does not support online predictions (Endpoint.predict() or Endpoint.explain()). Such a Model can serve predictions by using a BatchPredictionJob, if it has at least one entry each in Model.supported_input_storage_formats and Model.supported_output_storage_formats.
property supported_export_formats: Dict[str, List[google.cloud.aiplatform_v1.types.model.Model.ExportFormat.ExportableContent]]
The formats and content types in which this Model may be exported. If empty, this Model is not available for export.
For example, if this model can be exported as a Tensorflow SavedModel and have the artifacts written to Cloud Storage, the expected value would be:
{'tf-saved-model': [<ExportableContent.ARTIFACT: 1>]}
property supported_input_storage_formats: List[str]
The formats this Model supports in the input_config field of a BatchPredictionJob. If Model.predict_schemata.instance_schema_uri exists, the instances should be given as per that schema.
Read the docs for more on batch prediction formats
If this Model doesn’t support any of these formats it means it cannot be used with a BatchPredictionJob. However, if it has supported_deployment_resources_types, it could serve online predictions by using Endpoint.predict() or Endpoint.explain().
property supported_output_storage_formats: List[str]
The formats this Model supports in the output_config field of a BatchPredictionJob.
If both Model.predict_schemata.instance_schema_uri and Model.predict_schemata.prediction_schema_uri exist, the predictions are returned together with their instances. In other words, the prediction has the original instance data first, followed by the actual prediction content (as per the schema).
Read the docs for more on batch prediction formats
If this Model doesn’t support any of these formats it means it cannot be used with a BatchPredictionJob. However, if it has supported_deployment_resources_types, it could serve online predictions by using Endpoint.predict() or Endpoint.explain().
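Taken together, the three supported_* properties determine how a Model can serve predictions. The sketch below restates that rule on plain lists; serving_modes is a hypothetical helper, and the string values stand in for whatever the properties return on a real Model.

```python
from typing import List


def serving_modes(
    deployment_resources_types: List[str],
    input_storage_formats: List[str],
    output_storage_formats: List[str],
) -> List[str]:
    """Return which of 'online' and 'batch' prediction a model can serve."""
    modes = []
    # Online prediction requires at least one deployment resource type.
    if deployment_resources_types:
        modes.append("online")
    # Batch prediction requires at least one input and one output format.
    if input_storage_formats and output_storage_formats:
        modes.append("batch")
    return modes


print(serving_modes(["DEDICATED_RESOURCES"], ["jsonl"], ["jsonl"]))  # ['online', 'batch']
print(serving_modes([], ["jsonl", "csv"], ["bigquery"]))             # ['batch']
```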
property training_job: Optional[google.cloud.aiplatform.training_jobs._TrainingJob]
The TrainingJob that uploaded this Model, if any.
Raises
api_core.exceptions.NotFound – If the Model’s training job resource cannot be found on the Vertex service.
update(display_name: Optional[str] = None, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None)
Updates a model.
Example usage:
my_model = my_model.update(
    display_name='my-model',
    description='my description',
    labels={'key': 'value'},
)
Parameters
display_name (str) – The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – The description of the model.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
Returns
Updated model resource.
Return type
model
Raises
ValueError – If labels is not in the correct format.
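The label rules described above can be checked locally before calling update(). The following is a minimal sketch, not the SDK's own validation: `validate_labels` and `_is_allowed_char` are hypothetical helpers, and treating "lowercase letters" as any alphabetic character that equals its own lowercase form is an assumed reading of "international characters are allowed".

```python
MAX_LABEL_LENGTH = 64  # limit stated above, counted in Unicode codepoints


def _is_allowed_char(ch: str) -> bool:
    # Underscores and dashes are allowed explicitly.
    if ch in "_-":
        return True
    # Numeric characters.
    if ch.isdigit():
        return True
    # Lowercase letters, including international ones (assumed interpretation).
    return ch.isalpha() and ch == ch.lower()


def validate_labels(labels: dict) -> None:
    """Raise ValueError if any key or value breaks the documented rules."""
    for key, value in labels.items():
        for part in (key, value):
            if not isinstance(part, str):
                raise ValueError(f"Label keys and values must be str, got {part!r}")
            if len(part) > MAX_LABEL_LENGTH:
                raise ValueError(f"Label longer than {MAX_LABEL_LENGTH} characters: {part!r}")
            if not all(_is_allowed_char(ch) for ch in part):
                raise ValueError(f"Label contains disallowed characters: {part!r}")


validate_labels({"team": "forecasting", "env": "prod-1"})  # passes silently
```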
classmethod upload(display_name: str, serving_container_image_uri: str, *, artifact_uri: Optional[str] = None, serving_container_predict_route: Optional[str] = None, serving_container_health_route: Optional[str] = None, description: Optional[str] = None, serving_container_command: Optional[Sequence[str]] = None, serving_container_args: Optional[Sequence[str]] = None, serving_container_environment_variables: Optional[Dict[str, str]] = None, serving_container_ports: Optional[Sequence[int]] = None, instance_schema_uri: Optional[str] = None, parameters_schema_uri: Optional[str] = None, prediction_schema_uri: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, sync=True)
Uploads a model and returns a Model representing the uploaded Model resource.
Example usage:
my_model = Model.upload(
    display_name='my-model',
    artifact_uri='gs://my-model/saved-model',
    serving_container_image_uri='tensorflow/serving',
)
Parameters
display_name (str) – Required. The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
serving_container_image_uri (str) – Required. The URI of the Model serving container.
artifact_uri (str) – Optional. The path to the directory containing the Model artifact and any of its supporting files. Leave blank for custom container prediction. Not present for AutoML Models.
serving_container_predict_route (str) – Optional. An HTTP path to send prediction requests to the container, and which must be supported by it. If not specified a default HTTP path will be used by Vertex AI.
serving_container_health_route (str) – Optional. An HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by Vertex AI.
description (str) – The description of the model.
serving_container_command (Sequence[str]) – Optional. The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
serving_container_args (Sequence[str]) – Optional. The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
serving_container_environment_variables (Dict[str, str]) – Optional. The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
serving_container_ports (Sequence[int]) – Optional. Declaration of ports that are exposed by the container. This field is primarily informational; it gives Vertex AI information about the network connections the container uses. Whether or not a port is listed here has no impact on whether the port is actually exposed: any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances, and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters, and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
project (str) – Optional. Project to upload this model to. Overrides project set in aiplatform.init.
location (str) – Optional. Location to upload this model to. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Optional. Bucket to stage local model artifacts. Overrides staging_bucket set in aiplatform.init.
Returns
Instantiated representation of the uploaded model resource.
Return type
model
Raises
ValueError – If only one of explanation_metadata or explanation_parameters is specified, or if the model directory does not contain a supported model file.
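The $(VAR_NAME) expansion rules described for serving_container_command and serving_container_args can be illustrated with a small stand-alone function. This is a sketch of the documented behavior, not the code that Vertex AI runs: `expand_references` is a hypothetical helper, the accepted variable-name pattern is an assumption, and collapsing an escaped `$$(VAR_NAME)` to a literal `$(VAR_NAME)` is also an assumption (the text above only states that escaped references are never expanded).

```python
import re

# Matches either an escaped reference "$$(NAME)" or a plain one "$(NAME)".
# The allowed characters in NAME are an assumption for this sketch.
_REFERENCE = re.compile(r"(\$\$|\$)\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_references(text: str, env: dict) -> str:
    """Expand $(NAME) references in text using the env mapping."""

    def _replace(match):
        prefix, name = match.group(1), match.group(2)
        if prefix == "$$":
            # Escaped reference: never expanded (assumed to become "$(NAME)").
            return f"$({name})"
        # Unresolvable references are left unchanged in the output.
        return env.get(name, match.group(0))

    return _REFERENCE.sub(_replace, text)


env = {"PORT": "8080"}
print(expand_references("--port=$(PORT)", env))
print(expand_references("--flag=$(MISSING)", env))
print(expand_references("echo $$(PORT)", env))
```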
classmethod upload_scikit_learn_model_file(model_file_path: str, sklearn_version: Optional[str] = None, display_name: str = 'Scikit-learn model', description: Optional[str] = None, instance_schema_uri: Optional[str] = None, parameters_schema_uri: Optional[str] = None, prediction_schema_uri: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, sync=True)
Uploads a model and returns a Model representing the uploaded Model resource.
Note: This function is experimental and can be changed in the future.
Example usage:
my_model = Model.upload_scikit_learn_model_file(
model_file_path="iris.sklearn_model.joblib"
)
Parameters
model_file_path (str) – Required. Local file path of the model.
sklearn_version (str) – Optional. The version of the Scikit-learn serving container. Supported versions: [“0.20”, “0.22”, “0.23”, “0.24”, “1.0”]. If the version is not specified, the latest version is used.
display_name (str) – Optional. The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – The description of the model.
instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances, and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters, and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
project (str) – Optional. Project to upload this model to. Overrides project set in aiplatform.init.
location (str) – Optional. Location to upload this model to. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Optional. Bucket to stage local model artifacts. Overrides staging_bucket set in aiplatform.init.
Returns
Instantiated representation of the uploaded model resource.
Return type
model
Raises
ValueError – If only one of explanation_metadata or explanation_parameters is specified, or if the model directory does not contain a supported model file.
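The requirement that explanation_metadata and explanation_parameters be passed together can be expressed as a simple pre-flight check. The sketch below is illustrative only: `check_explanation_args` is a hypothetical helper, and it treats `None` as "not provided", which is an assumption about how the arguments are interpreted.

```python
def check_explanation_args(explanation_metadata=None, explanation_parameters=None) -> bool:
    """Return True if explanations are configured, False if neither argument is set.

    Raises ValueError when exactly one of the two arguments is provided.
    """
    # The comparison is True only when one argument is None and the other is not.
    if (explanation_metadata is None) != (explanation_parameters is None):
        raise ValueError(
            "explanation_metadata and explanation_parameters must be passed together."
        )
    return explanation_metadata is not None


# Plain dicts stand in for the real ExplanationMetadata / ExplanationParameters objects.
print(check_explanation_args())
print(check_explanation_args({"inputs": {}}, {"sampled_shapley_attribution": {}}))
```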
classmethod upload_tensorflow_saved_model(saved_model_dir: str, tensorflow_version: Optional[str] = None, use_gpu: bool = False, display_name: str = 'Tensorflow model', description: Optional[str] = None, instance_schema_uri: Optional[str] = None, parameters_schema_uri: Optional[str] = None, prediction_schema_uri: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, sync=True)
Uploads a model and returns a Model representing the uploaded Model resource.
Note: This function is experimental and can be changed in the future.
Example usage:
my_model = Model.upload_tensorflow_saved_model(
    saved_model_dir="iris.tensorflow_model.SavedModel"
)
Parameters
saved_model_dir (str) – Required. Local directory of the Tensorflow SavedModel.
tensorflow_version (str) – Optional. The version of the Tensorflow serving container. Supported versions: [“1.15”, “2.1”, “2.2”, “2.3”, “2.4”, “2.5”, “2.6”, “2.7”]. If the version is not specified, the latest version is used.
use_gpu (bool) – Whether to use GPU for model serving.
display_name (str) – Optional. The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – The description of the model.
instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances, and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters, and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
project (str) – Optional. Project to upload this model to. Overrides project set in aiplatform.init.
location (str) – Optional. Location to upload this model to. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Optional. Bucket to stage local model artifacts. Overrides staging_bucket set in aiplatform.init.
Returns
Instantiated representation of the uploaded model resource.
Return type
model
Raises
ValueError – If only one of explanation_metadata or explanation_parameters is specified, or if the model directory does not contain a supported model file.
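The encryption_spec_key_name format and the same-region requirement can be checked locally with a small parser. This is a sketch under stated assumptions: `parse_kms_key_name` and `key_matches_region` are hypothetical helpers, and the regular expression only mirrors the form shown above rather than Cloud KMS's full naming rules.

```python
import re

# Mirrors the documented form:
# projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_kms_key_name(name: str) -> dict:
    """Split a key resource name into its components, or raise ValueError."""
    match = _KMS_KEY.match(name)
    if match is None:
        raise ValueError(f"Not a Cloud KMS key resource name: {name!r}")
    return match.groupdict()


def key_matches_region(name: str, region: str) -> bool:
    """Check that the key lives in the region where compute will be created."""
    return parse_kms_key_name(name)["location"] == region


key_name = "projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key"
print(parse_kms_key_name(key_name))
print(key_matches_region(key_name, "my-region"))
```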
classmethod upload_xgboost_model_file(model_file_path: str, xgboost_version: Optional[str] = None, display_name: str = 'XGBoost model', description: Optional[str] = None, instance_schema_uri: Optional[str] = None, parameters_schema_uri: Optional[str] = None, prediction_schema_uri: Optional[str] = None, explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None, explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, sync=True)
Uploads a model and returns a Model representing the uploaded Model resource.
Note: This function is experimental and can be changed in the future.
Example usage:
my_model = Model.upload_xgboost_model_file(
model_file_path="iris.xgboost_model.bst"
)
Parameters
model_file_path (str) – Required. Local file path of the model.
xgboost_version (str) – Optional. The version of the XGBoost serving container. Supported versions: [“0.82”, “0.90”, “1.1”, “1.2”, “1.3”, “1.4”]. If the version is not specified, the latest version is used.
display_name (str) – Optional. The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – The description of the model.
instance_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances, and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
parameters_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters, and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
prediction_schema_uri (str) – Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
explanation_metadata (explain.ExplanationMetadata) – Optional. Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used. For more details, see Ref docs http://tinyurl.com/1igh60kt
explanation_parameters (explain.ExplanationParameters) – Optional. Parameters to configure explaining for Model’s predictions. For more details, see Ref docs http://tinyurl.com/1an4zake
project (str) – Optional. Project to upload this model to. Overrides project set in aiplatform.init.
location (str) – Optional. Location to upload this model to. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to upload this model. Overrides credentials set in aiplatform.init.
labels (Dict[str, str]) – Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
staging_bucket (str) – Optional. Bucket to stage local model artifacts. Overrides staging_bucket set in aiplatform.init.
Returns
Instantiated representation of the uploaded model resource.
Return type
model
Raises
ValueError – If only one of explanation_metadata or explanation_parameters is specified, or if the model directory does not contain a supported model file.
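The "latest version is used when none is specified" rule for the framework version parameters can be sketched as below. This is an illustration rather than the SDK's implementation: `resolve_version` is a hypothetical helper, the version list is copied from the xgboost_version description above, and raising ValueError for an unsupported version is an assumption.

```python
# Copied from the xgboost_version parameter description above.
SUPPORTED_XGBOOST_VERSIONS = ["0.82", "0.90", "1.1", "1.2", "1.3", "1.4"]


def _version_key(version: str) -> tuple:
    # Compare versions numerically, so "1.10" would sort after "1.9".
    return tuple(int(part) for part in version.split("."))


def resolve_version(requested, supported):
    """Return the requested version, or the latest supported one if None."""
    if requested is None:
        return max(supported, key=_version_key)
    if requested not in supported:
        # Assumed behavior for this sketch; the real SDK may differ.
        raise ValueError(f"Unsupported version {requested!r}; choose from {supported}")
    return requested


print(resolve_version(None, SUPPORTED_XGBOOST_VERSIONS))
print(resolve_version("0.90", SUPPORTED_XGBOOST_VERSIONS))
```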
property uri: Optional[str]
Path to the directory containing the Model artifact and any of its supporting files. Not present for AutoML Models.
class google.cloud.aiplatform.PipelineJob(display_name: str, template_path: str, job_id: Optional[str] = None, pipeline_root: Optional[str] = None, parameter_values: Optional[Dict[str, Any]] = None, enable_caching: Optional[bool] = None, encryption_spec_key_name: Optional[str] = None, labels: Optional[Dict[str, str]] = None, credentials: Optional[google.auth.credentials.Credentials] = None, project: Optional[str] = None, location: Optional[str] = None)
Bases: google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager
Retrieves a PipelineJob resource and instantiates its representation.
Parameters
display_name (str) – Required. The user-defined name of this Pipeline.
template_path (str) – Required. The path of PipelineJob or PipelineSpec JSON file. It can be a local path or a Google Cloud Storage URI. Example: “gs://project.name”
job_id (str) – Optional. The unique ID of the job run. If not specified, pipeline name + timestamp will be used.
pipeline_root (str) – Optional. The root of the pipeline outputs. Defaults to the staging bucket.
parameter_values (Dict[str, Any]) – Optional. The mapping from runtime parameter names to their values that control the pipeline run.
enable_caching (bool) – Optional. Whether to turn on caching for the run.
If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks.
If this is set, the setting applies to all tasks in the pipeline.
Overrides the compile time settings.
encryption_spec_key_name (str) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the job. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If this is set, then all resources created by the PipelineJob will be encrypted with the provided encryption key. Overrides encryption_spec_key_name set in aiplatform.init.
labels (Dict[str, str]) – Optional. The user-defined metadata to organize the PipelineJob.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create this PipelineJob. Overrides credentials set in aiplatform.init.
project (str) – Optional. The project that you want to run this PipelineJob in. If not set, the project set in aiplatform.init will be used.
location (str) – Optional. Location to create PipelineJob. If not set, location set in aiplatform.init will be used.
Raises
ValueError – If job_id or labels have incorrect format.
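The way enable_caching interacts with compile-time settings can be pictured with a small function. This is a conceptual sketch, not the SDK's code: `effective_caching` is a hypothetical helper, and the dictionary of per-task settings is an invented stand-in for what a compiled pipeline spec carries.

```python
def effective_caching(compiled_task_caching: dict, enable_caching=None) -> dict:
    """Return the caching setting that applies to each task.

    compiled_task_caching maps task name -> caching flag chosen at compile time.
    """
    if enable_caching is None:
        # Not set: keep whatever each task was compiled with.
        return dict(compiled_task_caching)
    # Set: the same value overrides the compile-time setting of every task.
    return {task: enable_caching for task in compiled_task_caching}


compiled = {"preprocess": True, "train": True, "evaluate": False}
print(effective_caching(compiled))
print(effective_caching(compiled, enable_caching=False))
```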
cancel()
Starts asynchronous cancellation on the PipelineJob. The server makes a best effort to cancel the job, but success is not guaranteed. On successful cancellation, the PipelineJob is not deleted; instead it becomes a job with state set to CANCELLED.
classmethod get(resource_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Get a Vertex AI Pipeline Job for the given resource_name.
Parameters
resource_name (str) – Required. A fully-qualified resource name or ID.
project (str) – Optional. Project to retrieve the PipelineJob from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve the PipelineJob from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this PipelineJob. Overrides credentials set in aiplatform.init.
Returns
A Vertex AI PipelineJob.
property has_failed: bool
Returns True if the pipeline has failed, False otherwise.
classmethod list(filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
List all instances of this PipelineJob resource.
Example Usage:
aiplatform.PipelineJob.list(
    filter='display_name="experiment_a27"',
    order_by='create_time desc'
)
Parameters
filter (str) – Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields: display_name, create_time, update_time
project (str) – Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
Returns
List[PipelineJob] - A list of PipelineJob resource objects
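The order_by string format described above (comma-separated fields, ascending by default, "desc" after a field name for descending) can be assembled programmatically. The helper below is a hypothetical convenience, not part of the SDK; the set of supported fields is copied from the parameter description.

```python
# Fields listed as supported in the order_by description above.
SUPPORTED_ORDER_FIELDS = {"display_name", "create_time", "update_time"}


def build_order_by(fields) -> str:
    """Build an order_by string from (field_name, descending) pairs."""
    parts = []
    for name, descending in fields:
        if name not in SUPPORTED_ORDER_FIELDS:
            raise ValueError(f"Unsupported order_by field: {name!r}")
        # Ascending is the default; append "desc" only for descending order.
        parts.append(f"{name} desc" if descending else name)
    return ",".join(parts)


order_by = build_order_by([("create_time", True), ("display_name", False)])
print(order_by)
```

The resulting string could then be passed as the order_by argument of PipelineJob.list().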
run(service_account: Optional[str] = None, network: Optional[str] = None, sync: Optional[bool] = True)
Run this configured PipelineJob and monitor the job until completion.
Parameters
service_account (str) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC.
Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
sync (bool) – Optional. Whether to execute this method synchronously. If False, this method will unblock and it will be executed in a concurrent Future.
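The difference between synchronous execution and execution in a concurrent Future can be demonstrated with the standard library alone. This sketch does not call Vertex AI and does not reflect how the SDK is implemented internally: `run_work` and `fake_pipeline` are hypothetical stand-ins, and the returned state string is just a placeholder.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)


def run_work(work, sync=True):
    """Run work() inline when sync is True; otherwise return a Future for it."""
    if sync:
        return work()
    return _executor.submit(work)


def fake_pipeline():
    # Stand-in for a long-running pipeline; sleeps briefly, then reports a state.
    time.sleep(0.1)
    return "PIPELINE_STATE_SUCCEEDED"


blocking_result = run_work(fake_pipeline)        # blocks until finished
future = run_work(fake_pipeline, sync=False)     # returns without waiting
background_result = future.result()              # blocks here, in the spirit of wait()
print(blocking_result, background_result)
```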
property state: Optional[google.cloud.aiplatform_v1.types.pipeline_state.PipelineState]
Current pipeline state.
submit(service_account: Optional[str] = None, network: Optional[str] = None)
Run this configured PipelineJob.
Parameters
service_account (str) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
network (str) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC.
Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
wait()
Wait for this PipelineJob to complete.
wait_for_resource_creation()
Waits until resource has been created.
class google.cloud.aiplatform.TabularDataset(dataset_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.datasets.column_names_dataset._ColumnNamesDataset
Managed tabular dataset resource for Vertex AI.
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
dataset_name (str) – Required. A fully-qualified dataset resource name or dataset ID. Example: “projects/123/locations/us-central1/datasets/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init.
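The two accepted forms of dataset_name can be illustrated with a small helper that expands a bare ID into a fully-qualified name. This is only a sketch of the idea: resolve_dataset_name is a hypothetical function, and treating any string that contains a slash as already fully qualified is an assumption made for brevity.

```python
from typing import Optional


def resolve_dataset_name(
    dataset_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Return a fully-qualified dataset resource name."""
    # Assumption: a name containing "/" is already fully qualified.
    if "/" in dataset_name:
        return dataset_name
    if not project or not location:
        raise ValueError("A bare dataset ID needs both project and location.")
    return f"projects/{project}/locations/{location}/datasets/{dataset_name}"


full_name = resolve_dataset_name("456", project="123", location="us-central1")
```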
classmethod create(display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, bq_source: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a new tabular dataset.
Parameters
display_name (str) – Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source (Union[str, Sequence[str]]) – Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: as a str, “gs://bucket/file.csv”; as a Sequence[str], [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”].
bq_source (str) – BigQuery URI of the input table. Example: “bq://project.dataset.table_name”.
project (str) – Project to create this dataset in. Overrides project set in aiplatform.init.
location (str) – Location to create this dataset in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to create this dataset. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed tabular dataset resource.
Return type
tabular_dataset (TabularDataset)
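The gcs_source and bq_source formats described above can be checked locally before a dataset is created. The helpers below are hypothetical and not part of the SDK; the sketch assumes that every Cloud Storage URI starts with gs:// and that the last two dot-separated parts of a BigQuery URI are the dataset and the table.

```python
from typing import List, Sequence, Tuple, Union


def normalize_gcs_source(gcs_source: Union[str, Sequence[str]]) -> List[str]:
    """Accept one URI or several and always return a list of URIs."""
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    return uris


def parse_bq_source(bq_source: str) -> Tuple[str, str, str]:
    """Split 'bq://project.dataset.table_name' into its three parts."""
    prefix = "bq://"
    if not bq_source.startswith(prefix):
        raise ValueError(f"Not a BigQuery URI: {bq_source!r}")
    # Split from the right so that dots inside the project part are preserved.
    parts = bq_source[len(prefix):].rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected bq://project.dataset.table, got {bq_source!r}")
    return parts[0], parts[1], parts[2]


single = normalize_gcs_source("gs://bucket/file.csv")
several = normalize_gcs_source(["gs://bucket/file1.csv", "gs://bucket/file2.csv"])
table = parse_bq_source("bq://project.dataset.table_name")
```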
import_data()
Upload data to existing managed dataset.
Parameters
gcs_source (Union[str, Sequence[str]]) – Required. Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: as a str, “gs://bucket/file.csv”; as a Sequence[str], [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”].
import_schema_uri (str) – Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
data_item_labels (Dict) – Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported before, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels are overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed dataset resource.
Return type
dataset (Dataset)
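The merge rule stated for data_item_labels (new labels are added to the existing ones, and a repeated key takes the newly imported value) behaves like a dictionary update. The sketch below only models that rule locally; merge_data_item_labels is a hypothetical name, and the actual merge is performed by the service.

```python
from typing import Dict


def merge_data_item_labels(
    existing: Dict[str, str], imported: Dict[str, str]
) -> Dict[str, str]:
    """Model the documented rule: append new labels, overwrite repeated keys."""
    merged = dict(existing)   # copy so the caller's dict is left untouched
    merged.update(imported)   # imported values win on a key collision
    return merged


merged = merge_data_item_labels(
    {"source": "batch-1", "reviewed": "no"},
    {"reviewed": "yes", "split": "train"},
)
```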
class google.cloud.aiplatform.Tensorboard(tensorboard_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.tensorboard.tensorboard_resource._TensorboardServiceResource
Managed tensorboard resource for Vertex AI.
Retrieves an existing managed tensorboard given a tensorboard name or ID.
Parameters
tensorboard_name (str) – Required. A fully-qualified tensorboard resource name or tensorboard ID. Example: “projects/123/locations/us-central1/tensorboards/456” or “456” when project and location are initialized or passed.
project (str) – Optional. Project to retrieve tensorboard from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve tensorboard from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this Tensorboard. Overrides credentials set in aiplatform.init.
classmethod create(display_name: str, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), encryption_spec_key_name: Optional[str] = None)
Creates a new tensorboard.
Example Usage:
tb = aiplatform.Tensorboard.create(
    display_name='my display name',
    description='my description',
    labels={
        'key1': 'value1',
        'key2': 'value2',
    },
)
Parameters
display_name (str) – Required. The user-defined name of the Tensorboard. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – Optional. Description of this Tensorboard.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create this Tensorboard in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to create this Tensorboard in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create this Tensorboard. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
encryption_spec_key_name (str) – Optional. Cloud KMS resource identifier of the customer-managed encryption key used to protect the tensorboard. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Tensorboard and all sub-resources of this Tensorboard will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
Returns
Instantiated representation of the managed tensorboard resource.
Return type
tensorboard (Tensorboard)
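The label constraints listed for the labels parameter can be checked on the client side before a request is made. This is a rough sketch rather than the service's own validation: validate_labels is a hypothetical helper, and reading "lowercase letters" plus "international characters" as "any letter that is unchanged by lowercasing" is an assumption.

```python
from typing import Dict, List

MAX_LABEL_LENGTH = 64      # per key and per value, in Unicode code points
MAX_USER_LABELS = 64       # system labels are excluded from this count
SYSTEM_LABEL_PREFIX = "aiplatform.googleapis.com/"


def _is_allowed_char(ch: str) -> bool:
    """Assumed reading of the rule: lowercase or uncased letters, digits, _ and -."""
    if ch in "_-" or ch.isdigit():
        return True
    return ch.isalpha() and ch == ch.lower()


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    user_labels = {
        k: v for k, v in labels.items() if not k.startswith(SYSTEM_LABEL_PREFIX)
    }
    if len(user_labels) > MAX_USER_LABELS:
        problems.append(f"{len(user_labels)} user labels exceed {MAX_USER_LABELS}")
    for key, value in user_labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"label {kind} {text!r} is longer than 64 characters")
            if not all(_is_allowed_char(ch) for ch in text):
                problems.append(f"label {kind} {text!r} has a disallowed character")
    return problems


ok = validate_labels({"key1": "value1", "key2": "value2"})
bad = validate_labels({"Key1": "value1"})
```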
update(display_name: Optional[str] = None, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), encryption_spec_key_name: Optional[str] = None)
Updates an existing tensorboard.
Example Usage:
tb = aiplatform.Tensorboard(tensorboard_name='123456')
tb.update(
    display_name='update my display name',
    description='update my description',
)
Parameters
display_name (str) – Optional. User-defined name of the Tensorboard. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – Optional. Description of this Tensorboard.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
encryption_spec_key_name (str) – Optional. Cloud KMS resource identifier of the customer-managed encryption key used to protect the tensorboard. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Tensorboard and all sub-resources of this Tensorboard will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
Returns
The managed tensorboard resource.
Return type
Tensorboard
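Because encryption_spec_key_name has a fixed shape and the key must live in the same region as the resource, both conditions can be checked up front. The following sketch uses hypothetical helper names (parse_kms_key_name, key_is_in_region) and assumes the key name matches the documented form exactly.

```python
import re

# Pattern taken from the documented form
# projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_kms_key_name(name: str) -> dict:
    """Split a Cloud KMS key resource name into its components."""
    match = _KMS_KEY_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected Cloud KMS key name: {name!r}")
    return match.groupdict()


def key_is_in_region(name: str, region: str) -> bool:
    """True when the key's location equals the region of the resource."""
    return parse_kms_key_name(name)["location"] == region


key_name = "projects/my-project/locations/us-central1/keyRings/my-kr/cryptoKeys/my-key"
same_region = key_is_in_region(key_name, "us-central1")
```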
class google.cloud.aiplatform.TensorboardExperiment(tensorboard_experiment_name: str, tensorboard_id: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.tensorboard.tensorboard_resource._TensorboardServiceResource
Managed tensorboard experiment resource for Vertex AI.
Retrieves an existing tensorboard experiment given a tensorboard experiment name or ID.
Example Usage:
tb_exp = aiplatform.TensorboardExperiment(
    tensorboard_experiment_name="projects/123/locations/us-central1/tensorboards/456/experiments/678"
)
tb_exp = aiplatform.TensorboardExperiment(
    tensorboard_experiment_name="678",
    tensorboard_id="456"
)
Parameters
tensorboard_experiment_name (str) – Required. A fully-qualified tensorboard experiment resource name or resource ID. Example: “projects/123/locations/us-central1/tensorboards/456/experiments/678” or “678” when tensorboard_id is passed and project and location are initialized or passed.
tensorboard_id (str) – Optional. A tensorboard resource ID.
project (str) – Optional. Project to retrieve tensorboard from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve tensorboard from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this Tensorboard. Overrides credentials set in aiplatform.init.
classmethod create(tensorboard_experiment_id: str, tensorboard_name: str, display_name: Optional[str] = None, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Sequence[Tuple[str, str]] = ())
Creates a new TensorboardExperiment.
Example Usage:
tb = aiplatform.TensorboardExperiment.create(
    tensorboard_experiment_id='my-experiment',
    tensorboard_name='456',
    display_name='my display name',
    description='my description',
    labels={
        'key1': 'value1',
        'key2': 'value2',
    },
)
Parameters
tensorboard_experiment_id (str) – Required. The ID to use for the Tensorboard experiment, which will become the final component of the Tensorboard experiment’s resource name. This value should be 1-128 characters, and valid characters are /[a-z][0-9]-/. This corresponds to the tensorboard_experiment_id field on the request instance; if request is provided, this should not be set.
tensorboard_name (str) – Required. The resource name or ID of the Tensorboard to create the TensorboardExperiment in. Format of resource name: projects/{project}/locations/{location}/tensorboards/{tensorboard}
display_name (str) – Optional. The user-defined name of the Tensorboard Experiment. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description (str) – Optional. Description of this Tensorboard Experiment.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create this TensorboardExperiment in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to create this TensorboardExperiment in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create this TensorboardExperiment. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
The TensorboardExperiment resource.
Return type
TensorboardExperiment
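The constraint on tensorboard_experiment_id (1-128 characters drawn from lowercase letters, digits and dashes) maps onto a single regular expression. The sketch below is a local pre-check under that reading of the rule; is_valid_experiment_id is a hypothetical helper and the service remains the final authority.

```python
import re

# Assumed reading of "/[a-z][0-9]-/": lowercase ASCII letters, digits, dashes.
_EXPERIMENT_ID_RE = re.compile(r"[a-z0-9-]{1,128}")


def is_valid_experiment_id(experiment_id: str) -> bool:
    """Check length and character set of a Tensorboard experiment ID."""
    return _EXPERIMENT_ID_RE.fullmatch(experiment_id) is not None


valid = is_valid_experiment_id("my-experiment")
invalid = is_valid_experiment_id("My_Experiment")
```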
classmethod list(tensorboard_name: str, filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
List TensorboardExperiments in a Tensorboard resource.
Example Usage:
aiplatform.TensorboardExperiment.list(
    tensorboard_name='projects/my-project/locations/us-central1/tensorboards/123'
)
Parameters
tensorboard_name (str) – Required. The resource name or resource ID of the Tensorboard to list TensorboardExperiments. Format, if resource name: ‘projects/{project}/locations/{location}/tensorboards/{tensorboard}’
filter (str) – Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields: display_name, create_time, update_time
project (str) – Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
Returns
List[TensorboardExperiment] - A list of TensorboardExperiments
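An order_by string for the list call can be assembled from field names and sort directions. This is a minimal sketch: build_order_by is a hypothetical helper, the set of supported fields is copied from the parameter description above, and joining with a bare comma is an assumption about the accepted spacing.

```python
from typing import Iterable, Tuple

# Fields named as supported in the order_by description.
SUPPORTED_ORDER_FIELDS = ("display_name", "create_time", "update_time")


def build_order_by(fields: Iterable[Tuple[str, bool]]) -> str:
    """Build an order_by string from (field_name, descending) pairs."""
    parts = []
    for name, descending in fields:
        if name not in SUPPORTED_ORDER_FIELDS:
            raise ValueError(f"Unsupported order_by field: {name!r}")
        # Ascending is the default; "desc" is appended only when requested.
        parts.append(f"{name} desc" if descending else name)
    return ",".join(parts)


order_by = build_order_by([("create_time", True), ("display_name", False)])
```

The resulting string could then be passed as the order_by argument of list.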
class google.cloud.aiplatform.TensorboardRun(tensorboard_run_name: str, tensorboard_id: Optional[str] = None, tensorboard_experiment_id: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.tensorboard.tensorboard_resource._TensorboardServiceResource
Managed tensorboard resource for Vertex AI.
Retrieves an existing tensorboard run given a tensorboard run name or ID.
Example Usage:
tb_run = aiplatform.TensorboardRun(
    tensorboard_run_name="projects/123/locations/us-central1/tensorboards/456/experiments/678/runs/8910"
)
tb_run = aiplatform.TensorboardRun(
    tensorboard_run_name="8910",
    tensorboard_id="456",
    tensorboard_experiment_id="678"
)
Parameters
tensorboard_run_name (str) – Required. A fully-qualified tensorboard run resource name or resource ID. Example: “projects/123/locations/us-central1/tensorboards/456/experiments/678/runs/8910” or “8910” when tensorboard_id and tensorboard_experiment_id are passed and project and location are initialized or passed.
tensorboard_id (str) – Optional. A tensorboard resource ID.
tensorboard_experiment_id (str) – Optional. A tensorboard experiment resource ID.
project (str) – Optional. Project to retrieve tensorboard from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve tensorboard from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve this Tensorboard. Overrides credentials set in aiplatform.init.
Raises
ValueError – if only one of tensorboard_id or tensorboard_experiment_id is provided.
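The rule behind this ValueError, together with the two accepted forms of tensorboard_run_name, can be sketched as a small resolver. resolve_run_name is a hypothetical function written for illustration; treating any name that contains a slash as fully qualified is an assumption, and the SDK's own checks may differ in detail.

```python
from typing import Optional


def resolve_run_name(
    tensorboard_run_name: str,
    tensorboard_id: Optional[str] = None,
    tensorboard_experiment_id: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Return a fully-qualified TensorboardRun resource name."""
    # Documented rule: the two parent IDs must be given together or not at all.
    if bool(tensorboard_id) != bool(tensorboard_experiment_id):
        raise ValueError(
            "Provide both tensorboard_id and tensorboard_experiment_id, or neither."
        )
    # Assumption: a name containing "/" is already fully qualified.
    if "/" in tensorboard_run_name:
        return tensorboard_run_name
    if not (tensorboard_id and project and location):
        raise ValueError("A bare run ID needs parent IDs, project and location.")
    return (
        f"projects/{project}/locations/{location}"
        f"/tensorboards/{tensorboard_id}"
        f"/experiments/{tensorboard_experiment_id}"
        f"/runs/{tensorboard_run_name}"
    )


run_name = resolve_run_name(
    "8910",
    tensorboard_id="456",
    tensorboard_experiment_id="678",
    project="123",
    location="us-central1",
)
```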
classmethod create(tensorboard_run_id: str, tensorboard_experiment_name: str, tensorboard_id: Optional[str] = None, display_name: Optional[str] = None, description: Optional[str] = None, labels: Optional[Dict[str, str]] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Sequence[Tuple[str, str]] = ())
Creates a new tensorboard run.
Example Usage:
tb_run = aiplatform.TensorboardRun.create(
    tensorboard_run_id='my-run',
    tensorboard_experiment_name='my-experiment',
    tensorboard_id='456',
    display_name='my display name',
    description='my description',
    labels={
        'key1': 'value1',
        'key2': 'value2',
    },
)
Parameters
tensorboard_run_id (str) – Required. The ID to use for the Tensorboard run, which will become the final component of the Tensorboard run’s resource name.
This value should be 1-128 characters, and valid characters are /[a-z][0-9]-/.
tensorboard_experiment_name (str) – Required. The resource name or ID of the TensorboardExperiment to create the TensorboardRun in. Resource name format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}/experiments/{experiment}
If resource ID is provided then tensorboard_id must be provided.
tensorboard_id (str) – Optional. The resource ID of the Tensorboard to create the TensorboardRun in.
display_name (str) – Optional. The user-defined name of the Tensorboard Run. This value must be unique among all TensorboardRuns belonging to the same parent TensorboardExperiment.
If not provided tensorboard_run_id will be used.
description (str) – Optional. Description of this Tensorboard Run.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
project (str) – Optional. Project to create this TensorboardRun in. Overrides project set in aiplatform.init.
location (str) – Optional. Location to create this TensorboardRun in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to create this TensorboardRun. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.
Returns
The TensorboardRun resource.
Return type
TensorboardRun
classmethod list(tensorboard_experiment_name: str, tensorboard_id: Optional[str] = None, filter: Optional[str] = None, order_by: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
List all instances of TensorboardRun in TensorboardExperiment.
Example Usage:
aiplatform.TensorboardRun.list(
    tensorboard_experiment_name='projects/my-project/locations/us-central1/tensorboards/123/experiments/456'
)
Parameters
tensorboard_experiment_name (str) – Required. The resource name or resource ID of the TensorboardExperiment to list TensorboardRun. Format, if resource name: ‘projects/{project}/locations/{location}/tensorboards/{tensorboard}/experiments/{experiment}’
If resource ID is provided then tensorboard_id must be provided.
tensorboard_id (str) – Optional. The resource ID of the Tensorboard that contains the TensorboardExperiment to list TensorboardRun.
filter (str) – Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by (str) – Optional. A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending. Supported fields: display_name, create_time, update_time
project (str) – Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location (str) – Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
Returns
List[TensorboardRun] - A list of TensorboardRun
class google.cloud.aiplatform.TextDataset(dataset_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.datasets.dataset._Dataset
Managed text dataset resource for Vertex AI.
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
dataset_name (str) – Required. A fully-qualified dataset resource name or dataset ID. Example: “projects/123/locations/us-central1/datasets/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init.
classmethod create(display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, import_schema_uri: Optional[str] = None, data_item_labels: Optional[Dict] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a new text dataset and optionally imports data into dataset when source and import_schema_uri are passed.
Example Usage:
ds = aiplatform.TextDataset.create(
    display_name='my-dataset',
    gcs_source='gs://my-bucket/dataset.csv',
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.multi_label_classification
)
Parameters
display_name (str) – Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source (Union[str, Sequence[str]]) – Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: as a str, “gs://bucket/file.csv”; as a Sequence[str], [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”].
import_schema_uri (str) – Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
data_item_labels (Dict) – Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported before, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels are overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.
project (str) – Project to create this dataset in. Overrides project set in aiplatform.init.
location (str) – Location to create this dataset in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to create this dataset. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed text dataset resource.
Return type
text_dataset (TextDataset)
class google.cloud.aiplatform.TimeSeriesDataset(dataset_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.datasets.column_names_dataset._ColumnNamesDataset
Managed time series dataset resource for Vertex AI.
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
dataset_name (str) – Required. A fully-qualified dataset resource name or dataset ID. Example: “projects/123/locations/us-central1/datasets/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init.
classmethod create(display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, bq_source: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a new time series dataset.
Parameters
display_name (str) – Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source (Union[str, Sequence[str]]) – Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: as a str, “gs://bucket/file.csv”; as a Sequence[str], [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”].
bq_source (str) – BigQuery URI of the input table. Example: “bq://project.dataset.table_name”.
project (str) – Project to create this dataset in. Overrides project set in aiplatform.init.
location (str) – Location to create this dataset in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to create this dataset. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed time series dataset resource.
Return type
time_series_dataset (TimeSeriesDataset)
import_data()
Upload data to existing managed dataset.
Parameters
gcs_source (Union[str, Sequence[str]]) – Required. Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: as a str, “gs://bucket/file.csv”; as a Sequence[str], [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”].
import_schema_uri (str) – Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
data_item_labels (Dict) – Labels that will be applied to newly imported DataItems. If a DataItem identical to the one being imported already exists in the Dataset, these labels are appended to those of the existing one; if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical in the same import data operation, the labels are combined, and if a key collision happens in this case, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a jsonl file.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed dataset resource.
Return type
dataset (Dataset)
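Because gcs_source accepts either a single URI or a sequence of URIs, calling code often normalizes it to a list first. The sketch below shows one way to do that; normalize_gcs_source is a hypothetical helper and the gs:// prefix check is an assumption about what a caller may want to enforce, not documented SDK behavior.

```python
from typing import List, Sequence, Union


def normalize_gcs_source(gcs_source: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: return gcs_source as a list of gs:// URIs."""
    # A single string becomes a one-element list; other sequences are copied.
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    if not uris:
        raise ValueError("gcs_source must contain at least one URI")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return uris
```

Wildcards are passed through unchanged, since expanding them is left to the service.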
class google.cloud.aiplatform.VideoDataset(dataset_name: str, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None)
Bases: google.cloud.aiplatform.datasets.dataset._Dataset
Managed video dataset resource for Vertex AI.
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
dataset_name (str) – Required. A fully-qualified dataset resource name or dataset ID. Example: “projects/123/locations/us-central1/datasets/456” or “456” when project and location are initialized or passed.
project (str) – Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used.
location (str) – Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used.
credentials (auth_credentials.Credentials) – Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init.
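The two accepted forms of dataset_name (a fully-qualified resource name, or a bare ID combined with project and location) can be illustrated with a small resolver. This is a sketch under stated assumptions: resolve_dataset_name is hypothetical and does not reproduce the SDK's internal lookup, which also falls back to values set in aiplatform.init.

```python
import re
from typing import Optional

# Shape of a fully-qualified dataset resource name, as in the example above.
_FULL_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/datasets/[^/]+$")


def resolve_dataset_name(
    dataset_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Hypothetical helper: expand a dataset ID into a full resource name."""
    if _FULL_NAME.match(dataset_name):
        return dataset_name  # already fully qualified
    if "/" in dataset_name:
        raise ValueError(f"malformed dataset resource name: {dataset_name!r}")
    if not project or not location:
        raise ValueError("a bare dataset ID needs both project and location")
    return f"projects/{project}/locations/{location}/datasets/{dataset_name}"
```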
classmethod create(display_name: str, gcs_source: Optional[Union[str, Sequence[str]]] = None, import_schema_uri: Optional[str] = None, data_item_labels: Optional[Dict] = None, project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, request_metadata: Optional[Sequence[Tuple[str, str]]] = (), labels: Optional[Dict[str, str]] = None, encryption_spec_key_name: Optional[str] = None, sync: bool = True)
Creates a new video dataset and optionally imports data into the dataset when gcs_source and import_schema_uri are passed.
Parameters
display_name (str) – Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
gcs_source (Union[str, Sequence[str]]) – Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
import_schema_uri (str) – Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
data_item_labels (Dict) – Labels that will be applied to newly imported DataItems. If a DataItem identical to the one being imported already exists in the Dataset, these labels are appended to those of the existing one; if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical in the same import data operation, the labels are combined, and if a key collision happens in this case, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a jsonl file.
project (str) – Project to create this dataset in. Overrides project set in aiplatform.init.
location (str) – Location to create this dataset in. Overrides location set in aiplatform.init.
credentials (auth_credentials.Credentials) – Custom credentials to use to create this dataset. Overrides credentials set in aiplatform.init.
request_metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
labels (Dict[str, str]) – Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync (bool) – Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
Returns
Instantiated representation of the managed video dataset resource.
Return type
video_dataset (VideoDataset)
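The first rule described for data_item_labels (new labels are appended to an existing identical DataItem, and an imported value replaces the old value for the same key) behaves like a dictionary update. The sketch below models only that rule; merge_data_item_labels is a hypothetical name, the merge itself happens on the service side, and neither the random choice on same-import collisions nor the Annotation-label override is modeled here.

```python
from typing import Dict


def merge_data_item_labels(
    existing: Dict[str, str], imported: Dict[str, str]
) -> Dict[str, str]:
    """Labels on an existing DataItem after an identical item is imported again."""
    merged = dict(existing)  # keep the labels already on the DataItem
    merged.update(imported)  # add new keys; imported values overwrite old ones
    return merged
```

For example, merging {"source": "cam-1", "split": "train"} with imported labels {"split": "test", "batch": "7"} keeps source, adds batch, and replaces the value of split.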
google.cloud.aiplatform.get_experiment_df(experiment: Optional[str] = None)
Returns a Pandas DataFrame of the parameters and metrics associated with one experiment.
Example:
aiplatform.init(experiment='exp-1')
aiplatform.start_run(run='run-1')
aiplatform.log_params({'learning_rate': 0.1})
aiplatform.log_metrics({'accuracy': 0.9})
aiplatform.start_run(run='run-2')
aiplatform.log_params({'learning_rate': 0.2})
aiplatform.log_metrics({'accuracy': 0.95})
Resulting DataFrame:
experiment_name | run_name | param.learning_rate | metric.accuracy
exp-1           | run-1    | 0.1                 | 0.9
exp-1           | run-2    | 0.2                 | 0.95
Parameters
experiment (str) – Name of the Experiment to filter results. If not set, return results of current active experiment.
Returns
Pandas Dataframe of Experiment with metrics and parameters.
Raises
NotFound – If the experiment does not exist.
ValueError – If the given experiment is associated with a wrong schema.
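The column layout in the example (one row per run, with parameters prefixed by param. and metrics prefixed by metric.) can be reproduced with plain dictionaries. The following sketch only illustrates that layout; experiment_rows and the shape of its runs argument are hypothetical and are not part of the SDK.

```python
from typing import Any, Dict, List


def experiment_rows(
    experiment: str, runs: Dict[str, Dict[str, Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Flatten logged params and metrics into one row per run."""
    rows = []
    for run_name, logged in runs.items():
        row = {"experiment_name": experiment, "run_name": run_name}
        # Prefix each key so parameters and metrics cannot collide.
        row.update({f"param.{k}": v for k, v in logged.get("params", {}).items()})
        row.update({f"metric.{k}": v for k, v in logged.get("metrics", {}).items()})
        rows.append(row)
    return rows


# Same values as the example for get_experiment_df.
rows = experiment_rows(
    "exp-1",
    {
        "run-1": {"params": {"learning_rate": 0.1}, "metrics": {"accuracy": 0.9}},
        "run-2": {"params": {"learning_rate": 0.2}, "metrics": {"accuracy": 0.95}},
    },
)
```

Passing rows to pandas.DataFrame(rows) would give a table with the same four columns as the example.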
google.cloud.aiplatform.get_pipeline_df(pipeline: str)
Returns a Pandas DataFrame of the parameters and metrics associated with one pipeline.
Parameters
pipeline (str) – Name of the Pipeline to filter results.
Returns
Pandas Dataframe of Pipeline with metrics and parameters.
Raises
NotFound – If the pipeline does not exist.
ValueError – If the given pipeline is associated with a wrong schema.
google.cloud.aiplatform.init(*, project: Optional[str] = None, location: Optional[str] = None, experiment: Optional[str] = None, experiment_description: Optional[str] = None, staging_bucket: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, encryption_spec_key_name: Optional[str] = None)
Updates common initialization parameters with provided options.
Parameters
project (str) – The default project to use when making API calls.
location (str) – The default location to use when making API calls. If not set, defaults to us-central1.
experiment (str) – The experiment name.
experiment_description (str) – The description of the experiment.
staging_bucket (str) – The default staging bucket to use to stage artifacts when making API calls. In the form gs://…
credentials (google.auth.credentials.Credentials) – The default custom credentials to use when making API calls. If not provided, credentials will be ascertained from the environment.
encryption_spec_key_name (Optional[str]) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect a resource. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this resource and all sub-resources will be secured by this key.
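Since the key must be in the same region as the compute resource, it can help to compare the location embedded in encryption_spec_key_name with the location passed to init before making any API call. The sketch below is one way to do that check locally; kms_key_location and check_kms_key are hypothetical helpers, and the strict regular expression is an assumption based only on the form shown above.

```python
import re

# Matches the documented form of a Cloud KMS key resource identifier.
_KMS_KEY = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def kms_key_location(key_name: str) -> str:
    """Return the location segment of a KMS key name, or raise ValueError."""
    match = _KMS_KEY.match(key_name)
    if match is None:
        raise ValueError(f"not a KMS key resource identifier: {key_name!r}")
    return match.group("location")


def check_kms_key(key_name: str, location: str) -> None:
    """Raise ValueError if the key is not in the given location."""
    key_location = kms_key_location(key_name)
    if key_location != location:
        raise ValueError(
            f"key is in {key_location!r} but resources are created in {location!r}"
        )
```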
google.cloud.aiplatform.log_metrics(metrics: Dict[str, Union[float, int]])
Log single or multiple Metrics with specified key and value pairs.
Parameters
metrics (Dict) – Required. Metrics key/value pairs. Only float and int values are supported.
Raises
TypeError – If value contains unsupported types.
ValueError – If Experiment or Run is not set.
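The type rule for metric values can be checked before calling log_metrics, so that a TypeError surfaces early with a clear message. This is a minimal sketch, not the SDK's validation: check_metrics is a hypothetical helper, and rejecting bool (which is a subclass of int in Python) is an assumption rather than documented behavior.

```python
from typing import Any, Dict


def check_metrics(metrics: Dict[str, Any]) -> None:
    """Raise TypeError if any metric value is not a float or an int."""
    for key, value in metrics.items():
        # bool is a subclass of int; rejecting it here is an assumption.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(
                f"metric {key!r} has unsupported type {type(value).__name__}"
            )
```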
google.cloud.aiplatform.log_params(params: Dict[str, Union[float, int, str]])
Log single or multiple parameters with specified key and value pairs.
Parameters
params (Dict) – Required. Parameter key/value pairs.
google.cloud.aiplatform.start_run(run: str)
Sets up a run for the current session.
Parameters
run (str) – Required. Name of the run to assign to the current session.
Raises
ValueError – If experiment is not set, or if the run execution or metrics artifact is already created but with a different schema.