Reference for built-in linear learner algorithm

This page provides detailed reference information about arguments you submit to AI Platform Training when running a training job using the built-in linear learner algorithm.

Versioning

The built-in linear learner algorithm uses TensorFlow 1.14.

Data format arguments

The following arguments are used for data formatting and automatic preprocessing:

Arguments Details
preprocess
Specify this flag to enable automatic preprocessing, which does the following:
  • Splits the data according to the validation_split and test_split percentages.
  • Fills in missing values (with the column mean for numerical columns).
  • Removes rows that are missing more than 10% of their column values.

Default: Unset
Type: Boolean flag. If set to true, enables automatic preprocessing.
training_data_path
Cloud Storage path to a CSV file. The CSV file must meet the following requirements:
  • It must not contain a header row.
  • It must contain only categorical or numerical columns.
  • Its first column must be the target column.
  • Blank values are treated as missing.

Required
Type: String
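For illustration, a valid training file might look like the following (hypothetical data; the target is in the first column and there is no header row):

```
0,3.5,red,12
1,2.1,blue,7
0,4.8,red,15
```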
validation_data_path
Cloud Storage path to a CSV file. The CSV file must have the same format as the file at training_data_path.

Optional
Type: String
test_data_path
Cloud Storage path to a CSV file. The CSV file must have the same format as the files at training_data_path and validation_data_path.

Optional
Type: String
job-dir
Cloud Storage path where the model, checkpoints, and other training artifacts are written. The following directories are created there:
  • model: contains the trained model.
  • processed_data: contains the training, validation, and test data files, if automatic preprocessing was enabled.
  • artifacts: contains preprocessing artifacts that help you perform client-side preprocessing.
  • experiment: contains checkpoints and summaries related to TensorFlow model training.

Required
Type: String
PREPROCESSING PARAMETERS
(apply only when preprocess is set)
validation_split
Fraction of the training data to use as validation data.

Default: 0.20
Type: Float
Note: validation_split + test_split must be <= 0.40.
Specify this only if you are not specifying validation_data_path.
test_split
Fraction of the training data to use as test data.

Default: 0.20
Type: Float
Note: validation_split + test_split must be <= 0.40.
Specify this only if you are not specifying test_data_path.
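Putting the data format arguments together, a job submission might look like the following sketch. The bucket paths and job name are placeholders, and the container image URI is an assumption; check the AI Platform Training documentation for the current linear learner image.

```shell
# Sketch: submit a training job with the built-in linear learner.
# BUCKET paths, the job name, and IMAGE_URI are placeholder assumptions.
IMAGE_URI="gcr.io/cloud-ml-algos/linear_learner_cpu:latest"

gcloud ai-platform jobs submit training my_linear_job \
  --master-image-uri=$IMAGE_URI \
  --region=us-central1 \
  --job-dir=gs://my-bucket/linear_learner/output \
  -- \
  --preprocess \
  --training_data_path=gs://my-bucket/data/train.csv \
  --model_type=classification \
  --max_steps=1000
```

Arguments before the bare `--` configure the job itself; arguments after it are passed through to the algorithm.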

Hyperparameters

The built-in linear learner algorithm has the following hyperparameters:

Hyperparameter Details
BASIC PARAMETERS
model_type
The learning task.

Required
Type: String
Options: one of {classification, regression}
max_steps
Number of steps (batches) to run the trainer for.

Default: The number of steps corresponding to 10 epochs, where one epoch is one pass over the whole training dataset.
Type: Integer
Options: [1, ∞)
learning_rate
A scalar used to determine the gradient step size in gradient descent training.

Default: 0.001
Type: Float
Options: (0, ∞)
eval_steps
Number of steps (batches) to run evaluation for.
If not specified, evaluation runs on the whole validation dataset.

Type: Integer
Options: [1, ∞)
batch_size
The number of data rows to process in each training step.

Default: 100
Type: Integer
Options: [1, ∞)
eval_frequency_secs
Frequency, in seconds, at which evaluation and checkpointing take place.

Default: 100
Type: Integer
Options: [1, ∞)
optimizer_type
The optimizer to use for training. An optimizer is a specific implementation of the gradient descent algorithm.

Default: 'adam'
Type: String
Options: one of {ftrl, adam, sgd}
FTRL OPTIMIZER PARAMETERS
(arguments for optimizer_type='ftrl')
l1_regularization_strength
A type of regularization that helps remove irrelevant or barely relevant features from the model.

Default: 0
Type: Float
Options: [0, ∞)
l2_regularization_strength
A type of regularization that improves generalization in linear models.

Default: 0
Type: Float
Options: [0, ∞)
l2_shrinkage_regularization_strength
L2 shrinkage regularization strength.

Default: 0
Type: Float
Options: [0, ∞)
ADAM OPTIMIZER PARAMETERS
(arguments for optimizer_type='adam')
beta_1
The exponential decay rate for the first-moment estimates.

Default: 0.99
Type: Float
Options: [0, ∞)
beta_2
The exponential decay rate for the second-moment estimates.

Default: 0.999
Type: Float
Options: [0, ∞)
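Because the max_steps default is expressed in epochs, it can help to translate between the two. A sketch of that conversion (the per-epoch rounding shown here is an assumption; the service's exact behavior may differ):

```python
import math

def steps_for_epochs(num_rows: int, batch_size: int, epochs: int) -> int:
    """Approximate number of training steps (batches) for `epochs` passes
    over a dataset of `num_rows` rows, rounding each epoch up to whole
    batches."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * epochs

# With the default batch_size of 100, 10 epochs over 5,000 rows
# corresponds to 500 steps.
print(steps_for_epochs(5000, 100, 10))
```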

Hyperparameter tuning

Hyperparameter tuning tests different hyperparameter configurations when training your model. It finds hyperparameter values that are optimal for the goal metric you choose. For each tunable argument, you can specify a range of values to restrict and focus the possibilities AI Platform Training can try.

Learn more about hyperparameter tuning on AI Platform Training.

Goal metrics

The following metrics can be optimized:

Objective metric   Direction   Details
loss               MINIMIZE    Loss of the training job
accuracy           MAXIMIZE    Accuracy of training (classification jobs only)

Tunable hyperparameters

When training with the built-in linear learner algorithm, you can tune the following hyperparameters. Start by tuning parameters with "high tunable value". These have the greatest impact on your goal metric.

Hyperparameter                          Type      Valid values

PARAMETERS WITH HIGH TUNABLE VALUE
(greatest impact on goal metric)
learning_rate                           DOUBLE    [0.0005, 0.05]
max_steps                               INTEGER   [1, ∞)

OTHER PARAMETERS
l1_regularization_strength              DOUBLE    [0, ∞)
l2_regularization_strength              DOUBLE    [0, ∞)
l2_shrinkage_regularization_strength    DOUBLE    [0, ∞)
beta_1                                  DOUBLE    [0, ∞)
beta_2                                  DOUBLE    [0, ∞)
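When you submit a tuning job, ranges like those above go in the `trainingInput.hyperparameters` section of the job configuration. A sketch of such a config; the metric tag, trial counts, bounds, and scale types shown here are illustrative assumptions:

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy   # assumed goal metric for a classification job
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.0005
        maxValue: 0.05
        scaleType: UNIT_LOG_SCALE
      - parameterName: max_steps
        type: INTEGER
        minValue: 100
        maxValue: 10000
        scaleType: UNIT_LINEAR_SCALE
```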