- 1.71.0 (latest)
- 1.70.0
- 1.69.0
- 1.68.0
- 1.67.1
- 1.66.0
- 1.65.0
- 1.63.0
- 1.62.0
- 1.60.0
- 1.59.0
- 1.58.0
- 1.57.0
- 1.56.0
- 1.55.0
- 1.54.1
- 1.53.0
- 1.52.0
- 1.51.0
- 1.50.0
- 1.49.0
- 1.48.0
- 1.47.0
- 1.46.0
- 1.45.0
- 1.44.0
- 1.43.0
- 1.39.0
- 1.38.1
- 1.37.0
- 1.36.4
- 1.35.0
- 1.34.0
- 1.33.1
- 1.32.0
- 1.31.1
- 1.30.1
- 1.29.0
- 1.28.1
- 1.27.1
- 1.26.1
- 1.25.0
- 1.24.1
- 1.23.0
- 1.22.1
- 1.21.0
- 1.20.0
- 1.19.1
- 1.18.3
- 1.17.1
- 1.16.1
- 1.15.1
- 1.14.0
- 1.13.1
- 1.12.1
- 1.11.0
- 1.10.0
- 1.9.0
- 1.8.1
- 1.7.1
- 1.6.2
- 1.5.0
- 1.4.3
- 1.3.0
- 1.2.0
- 1.1.1
- 1.0.1
- 0.9.0
- 0.8.0
- 0.7.1
- 0.6.0
- 0.5.1
- 0.4.0
- 0.3.1
AutoMLTabularTrainingJob(
display_name: str,
optimization_prediction_type: str,
optimization_objective: Optional[str] = None,
column_specs: Optional[Dict[str, str]] = None,
column_transformations: Optional[Union[Dict, List[Dict]]] = None,
optimization_objective_recall_value: Optional[float] = None,
optimization_objective_precision_value: Optional[float] = None,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
training_encryption_spec_key_name: Optional[str] = None,
model_encryption_spec_key_name: Optional[str] = None,
)
Constructs a AutoML Tabular Training Job.
Example usage:
job = training_jobs.AutoMLTabularTrainingJob( display_name="my_display_name", optimization_prediction_type="classification", optimization_objective="minimize-log-loss", column_specs={"column_1": "auto", "column_2": "numeric"}, )
Parameters
Name | Description |
display_name |
str
Required. The user-defined name of this TrainingPipeline. |
optimization_prediction_type |
str
The type of prediction the Model is to produce. "classification" - Predict one out of multiple target values is picked for each row. "regression" - Predict a value based on its relation to other values. This type is available only to columns that contain semantically numeric values, i.e. integers or floating point number, even if stored as e.g. strings. |
optimization_objective |
str
Optional. Objective function the Model is to be optimized towards. The training task creates a Model that maximizes/minimizes the value of the objective function over the validation set. The supported optimization objectives depend on the prediction type, and in the case of classification also the number of distinct values in the target column (two distint values -> binary, 3 or more distinct values -> multi class). If the field is not set, the default objective function is used. Classification (binary): "maximize-au-roc" (default) - Maximize the area under the receiver operating characteristic (ROC) curve. "minimize-log-loss" - Minimize log loss. "maximize-au-prc" - Maximize the area under the precision-recall curve. "maximize-precision-at-recall" - Maximize precision for a specified recall value. "maximize-recall-at-precision" - Maximize recall for a specified precision value. Classification (multi class): "minimize-log-loss" (default) - Minimize log loss. Regression: "minimize-rmse" (default) - Minimize root-mean-squared error (RMSE). "minimize-mae" - Minimize mean-absolute error (MAE). "minimize-rmsle" - Minimize root-mean-squared log error (RMSLE). |
column_specs |
Dict[str, str]
Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating transformation for BigQuery Struct column, the column should be flattened using "." as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. |
column_transformations |
Union[Dict, List[Dict]]
Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using "." as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. |
optimization_objective_recall_value |
float
Optional. Required when maximize-precision-at-recall optimizationObjective was picked, represents the recall value at which the optimization is done. The minimum value is 0 and the maximum is 1.0. |
optimization_objective_precision_value |
float
Optional. Required when maximize-recall-at-precision optimizationObjective was picked, represents the precision value at which the optimization is done. The minimum value is 0 and the maximum is 1.0. |
project |
str
Optional. Project to run training in. Overrides project set in aiplatform.init. |
location |
str
Optional. Location to run training in. Overrides location set in aiplatform.init. |
credentials |
auth_credentials.Credentials
Optional. Custom credentials to use to run call training service. Overrides credentials set in aiplatform.init. |
training_encryption_spec_key_name |
Optional[str]
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: |
model_encryption_spec_key_name |
Optional[str]
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: |
Inheritance
builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > google.cloud.aiplatform.training_jobs._TrainingJob > AutoMLTabularTrainingJobMethods
get_auto_column_specs
get_auto_column_specs(
dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset,
target_column: str,
)
Returns a dict with all non-target columns as keys and 'auto' as values.
Example usage:
column_specs = training_jobs.AutoMLTabularTrainingJob.get_auto_column_specs( dataset=my_dataset, target_column="my_target_column", )
Name | Description |
dataset |
datasets.TabularDataset
Required. Intended dataset. |
target_column |
str
Required. Intended target column. |
run
run(
dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset,
target_column: str,
training_fraction_split: float = 0.8,
validation_fraction_split: float = 0.1,
test_fraction_split: float = 0.1,
predefined_split_column_name: Optional[str] = None,
weight_column: Optional[str] = None,
budget_milli_node_hours: int = 1000,
model_display_name: Optional[str] = None,
disable_early_stopping: bool = False,
sync: bool = True,
)
Runs the training job and returns a model.
Data fraction splits:
Any of training_fraction_split
, validation_fraction_split
and
test_fraction_split
may optionally be provided, they must sum to up to 1. If
the provided ones sum to less than 1, the remainder is assigned to sets as
decided by Vertex AI. If none of the fractions are set, by default roughly 80%
of data will be used for training, 10% for validation, and 10% for test.
Name | Description |
dataset |
datasets.TabularDataset
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline's [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from. |
target_column |
str
Required. The name of the column values of which the Model is to predict. |
training_fraction_split |
float
Required. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided. |
validation_fraction_split |
float
Required. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided. |
test_fraction_split |
float
Required. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided. |
predefined_split_column_name |
str
Optional. The key is a name of one of the Dataset's data columns. The value of the key (either the label's value or value in the column) must be one of { |
weight_column |
str
Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusively, and 0 value means that the row is ignored. If the weight column field is not set, then all rows are assumed to have equal weight of 1. |
budget_milli_node_hours |
int
Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend's discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won't be attempted and will error. The minimum value is 1000 and the maximum is 72000. |
model_display_name |
str
Optional. If the script produces a managed Vertex AI Model. The display name of the Model. The name can be up to 128 characters long and can be consist of any UTF-8 characters. If not provided upon creation, the job's display_name is used. |
disable_early_stopping |
bool
Required. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training does no longer brings significant improvement to the model. |
sync |
bool
Whether to execute this method synchronously. If False, this method will be executed in concurrent Future and any downstream object will be immediately returned and synced when the Future has completed. |
Type | Description |
RuntimeError | If Training job has already been run or is waiting to run. |
Type | Description |
model | The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model. |