Introduction to built-in algorithms

This page provides an overview of training with built-in algorithms. With built-in algorithms on AI Platform Training, you can run training jobs on your data without writing any code for a training application. You can submit your training data, select an algorithm, and then allow AI Platform Training to handle the preprocessing and training for you. After that, it's easy to deploy your model and get predictions on AI Platform Training.

How training with built-in algorithms works

AI Platform Training runs your training job on computing resources in the cloud. Here is the overall process:

  1. Compare the available built-in algorithms to determine whether they fit your specific dataset and use case.
  2. Format your input data for training with the built-in algorithm. You must submit your data as a CSV file with its header row removed, and the target column must be set as the first column. If applicable, follow any additional formatting requirements specific to the built-in algorithm you're using.
  3. Create a Cloud Storage bucket where AI Platform Training can store your training output, if you do not already have one.
  4. Select options to customize your training job. First, make selections to configure the overall training job, and then make further selections to configure the algorithm specifically. Optionally, you can make additional selections to configure hyperparameter tuning for your job.
    • For the overall training job, select a job name, the built-in algorithm to use, the machine(s) to use, the region where the job should run, and the Cloud Storage bucket location where you want AI Platform Training to store your training outputs.
    • For the algorithm-specific selections, you can enable AI Platform Training to perform automatic preprocessing on your dataset. You can also specify arguments such as the learning rate, training steps, and batch size.
    • For hyperparameter tuning, you can select a goal metric, such as maximizing your model's predictive accuracy or minimizing the training loss. Additionally, you can tune specific hyperparameters and set ranges for their values.
  5. Submit the training job, and view logs to monitor its progress and status.
  6. When your training job has completed successfully, you can deploy your trained model on AI Platform Training to set up a prediction server and get predictions on new data.
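
Step 2 is usually the only step that needs code of your own. The following is a minimal sketch, using only Python's standard `csv` module, of rewriting a CSV file so that the header row is dropped and the target column comes first. The helper name `format_for_builtin_algorithm`, the file paths, and the column names are illustrative assumptions, not part of AI Platform Training.

```python
import csv


def format_for_builtin_algorithm(src_path, dst_path, target_column):
    """Write a copy of a CSV file with no header row and the target column first.

    Illustrative helper (hypothetical name); not part of any Google library.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader)  # consume the header row so it is not written out
        target_index = header.index(target_column)  # raises ValueError if absent

        for row in reader:
            # Move the target value to the front; keep the other columns in order.
            reordered = [row[target_index]] + row[:target_index] + row[target_index + 1:]
            writer.writerow(reordered)
```

For example, calling `format_for_builtin_algorithm("raw.csv", "train.csv", "label")` on a file whose header is `age,income,label` would write rows ordered as label, age, income, with no header line. Any additional algorithm-specific formatting requirements still apply.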

Limitations

Please note the following limitations for training with built-in algorithms:

  • Distributed training is not supported. To run a distributed training job on AI Platform Training, you must create a training application.
  • Training jobs submitted through the Google Cloud console use only legacy machine types. You can use Compute Engine machine types with training jobs submitted through gcloud or the Google API Client Library for Python. Learn more about machine types for training.
  • GPUs are supported for some algorithms. Refer to the detailed comparison of all the built-in algorithms for more information.
  • Multi-GPU machines do not yield greater speed with built-in algorithm training. If you're using GPUs, select machines with a single GPU.
  • TPUs are not supported for tabular built-in algorithm training. To train on tabular data with TPUs, you must create a training application. Learn how to run a training job with TPUs.

Any further limitations for specific built-in algorithms are noted in the corresponding guides for each algorithm.

Hyperparameter tuning

Hyperparameter tuning is supported for training with built-in algorithms. First, specify a goal metric, along with whether to minimize or maximize it. For example, you can maximize your model's accuracy for classification, or minimize your training loss. Then, list the hyperparameters you want to adjust, along with the range of values to try for each one.

When you submit your training job with hyperparameter tuning, AI Platform Training runs multiple trials, tracking and adjusting the hyperparameters after each trial. When the hyperparameter tuning job is complete, AI Platform Training reports values for the most effective configuration of your hyperparameters, as well as a summary for each trial.
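
As a rough illustration, a tuning configuration pairs one goal metric with a search range per hyperparameter. The sketch below builds such a configuration as a plain Python dictionary. The field names follow the `HyperparameterSpec` resource of the AI Platform Training API as understood here and should be checked against the API reference. The helper name `make_hyperparameter_spec`, the metric tag `loss`, and the hyperparameter name `learning_rate` are assumptions for illustration.

```python
def make_hyperparameter_spec(goal, metric_tag, params, max_trials=10):
    """Build a hyperparameter tuning configuration as a plain dict.

    `params` maps a hyperparameter name to a (min, max) range.
    Hypothetical helper; verify field names against the API reference.
    """
    if goal not in ("MAXIMIZE", "MINIMIZE"):
        raise ValueError("goal must be 'MAXIMIZE' or 'MINIMIZE'")

    specs = []
    for name, (low, high) in params.items():
        if low >= high:
            raise ValueError(f"empty range for {name}: [{low}, {high}]")
        specs.append({
            "parameterName": name,
            "type": "DOUBLE",
            "minValue": low,
            "maxValue": high,
            "scaleType": "UNIT_LINEAR_SCALE",
        })

    return {
        "goal": goal,
        "hyperparameterMetricTag": metric_tag,
        "maxTrials": max_trials,
        "params": specs,
    }


spec = make_hyperparameter_spec(
    goal="MINIMIZE",
    metric_tag="loss",                         # assumed metric name
    params={"learning_rate": (0.0001, 0.1)},   # assumed hyperparameter name
)
```

Each trial then corresponds to one concrete choice of values drawn from these ranges. The metric and hyperparameter names that a given built-in algorithm accepts are listed in that algorithm's guide.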

Learn more about hyperparameter tuning on AI Platform Training.

Overview of the algorithms

Built-in algorithms help you train models for a variety of use cases that are commonly solved with classification and regression. The following built-in algorithms are available for training on AI Platform Training:

  • Linear learner
  • Wide and deep
  • TabNet
  • XGBoost
  • Image classification
  • Object detection

Linear learner

The linear learner built-in algorithm is used for logistic regression, binary classification, and multiclass classification. AI Platform Training uses an implementation based on a TensorFlow Estimator.

A linear learner model assigns one weight to each input feature and sums the weighted feature values to predict a numerical target value. For logistic regression, this sum is converted into a value between 0 and 1. This simple type of model is easy to interpret, because you can compare the feature weights to determine which input features have significant impacts on your predictions.
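
The arithmetic behind such a model is small enough to write out. The following sketch is not the TensorFlow Estimator implementation. It only illustrates a weighted sum followed by the logistic (sigmoid) function, using made-up weights and a conventional bias (intercept) term that the text above does not mention explicitly.

```python
import math


def linear_score(weights, bias, features):
    """Weighted sum of the feature values plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Map a real-valued score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


weights = [0.8, -0.5]   # made-up values for illustration
bias = 0.1
features = [2.0, 1.0]

score = linear_score(weights, bias, features)   # 0.1 + 0.8*2.0 - 0.5*1.0
probability = sigmoid(score)                    # squashed into (0, 1)
```

Here the first feature pushes the prediction up and the second pulls it down, which is the kind of comparison that makes the weights interpretable, provided the features are on comparable scales.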

Learn more about how large-scale linear models work.

Wide and deep

The wide and deep built-in algorithm is used for large-scale classification and regression problems, such as recommender systems, search, and ranking problems. AI Platform Training uses an implementation based on a TensorFlow Estimator.

This type of model combines a linear model that learns and "memorizes" a wide range of rules with a deep neural network that "generalizes" the rules and applies them correctly to similar features in new, unseen data.
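
To make the combination concrete, the sketch below computes one prediction from a toy wide and deep model for binary classification: a linear "wide" part and a one-hidden-layer "deep" part each produce a logit, and the two are added before a sigmoid. This is a simplified illustration under assumed layer sizes and made-up parameters, not the Estimator-based implementation that AI Platform Training uses.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def wide_and_deep_probability(wide_features, deep_features, params):
    # Wide part: a linear model over sparse (often crossed) features.
    wide_logit = sum(w * x for w, x in zip(params["wide_w"], wide_features))

    # Deep part: a small feed-forward network over dense features.
    hidden = relu(dense(deep_features, params["hidden_w"], params["hidden_b"]))
    deep_logit = dense(hidden, [params["out_w"]], [0.0])[0]

    # The two logits are added before the final sigmoid.
    logit = wide_logit + deep_logit + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))


params = {  # made-up parameters, not trained values
    "wide_w": [0.5, -0.25],
    "hidden_w": [[1.0, -1.0], [0.5, 0.5]],
    "hidden_b": [0.0, 0.0],
    "out_w": [1.0, -1.0],
    "bias": 0.0,
}
p = wide_and_deep_probability([1.0, 0.0], [2.0, 1.0], params)
```

In a real model, both parts are trained jointly, so the wide weights learn specific feature combinations while the deep network learns representations that carry over to unseen combinations.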

Learn more about wide and deep learning.

TabNet

The TabNet built-in algorithm is used for classification and regression problems on tabular data. AI Platform Training uses an implementation based on TensorFlow.

The TabNet built-in algorithm also provides feature attributions to help interpret the model's behavior and explain its predictions.

Learn more about TabNet as a new built-in algorithm.

XGBoost

XGBoost (eXtreme Gradient Boosting) is a framework that implements a gradient boosting algorithm. XGBoost enables efficient supervised learning for classification, regression, and ranking tasks. XGBoost training is based on decision tree ensembles, which combine the results of multiple classification and regression models.
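
The idea of combining many small trees can be shown with plain gradient boosting on a single feature. The sketch below fits one-split trees ("stumps") to the residuals of the current ensemble under squared-error loss. It is a deliberately simplified illustration with hypothetical function names. XGBoost itself adds regularization, second-order gradient information, and many engineering optimizations that are not shown here.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, rounds=20, learning_rate=0.5):
    """Return a predict(x) function backed by an additive ensemble of stumps."""
    base = sum(ys) / len(ys)  # start from the mean target value
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        # Each new stump is fit to what the current ensemble still gets wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


predict = fit_boosted_stumps([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```

The final prediction is the base value plus the scaled sum of every stump's output, which is the sense in which the ensemble combines the results of multiple models.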

Learn more about how XGBoost works.

Image classification

The image classification built-in algorithm uses the TensorFlow image classification models. You can train an image classification model based on a TensorFlow implementation of EfficientNet or ResNet.

Object detection

The object detection built-in algorithm uses the TensorFlow Object Detection API to build a model that can identify multiple objects within a single image.

Comparing built-in algorithms

The following table provides a quick comparison of the built-in algorithms:

| Algorithm name | ML model used | Type of problem | Example use case(s) | Supported accelerators for training |
| --- | --- | --- | --- | --- |
| Linear learner | TensorFlow Estimator (LinearClassifier and LinearRegressor) | Classification, regression | Sales forecasting | GPU |
| Wide and deep | TensorFlow Estimator (DNNLinearCombinedClassifier, DNNLinearCombinedEstimator, and DNNLinearCombinedRegressor) | Classification, regression, ranking | Recommendation systems, search | GPU |
| TabNet | TensorFlow Estimator | Classification, regression | Advertising click-through rate (CTR) prediction, fraud detection | GPU |
| XGBoost | XGBoost | Classification, regression | Advertising click-through rate (CTR) prediction | GPU (only supported by the distributed version of the algorithm) |
| Image classification | TensorFlow image classification models | Classification | Classifying images | GPU, TPU |
| Object detection | TensorFlow Object Detection API | Object detection | Detecting objects within complex image scenes | GPU, TPU |

Algorithm containers

When you submit your training job to AI Platform Training, you select the algorithm by specifying the URI to its corresponding Docker container hosted in Container Registry. Built-in algorithms are available through the following containers:

| Algorithm | Container Registry URI |
| --- | --- |
| Linear learner | gcr.io/cloud-ml-algos/linear_learner_cpu:latest |
| Linear learner | gcr.io/cloud-ml-algos/linear_learner_gpu:latest |
| Wide and deep | gcr.io/cloud-ml-algos/wide_deep_learner_cpu:latest |
| Wide and deep | gcr.io/cloud-ml-algos/wide_deep_learner_gpu:latest |
| TabNet | gcr.io/cloud-ml-algos/tab_net:latest |
| XGBoost | gcr.io/cloud-ml-algos/boosted_trees:latest |
| XGBoost | gcr.io/cloud-ml-algos/xgboost_dist:latest |
| Image classification | gcr.io/cloud-ml-algos/image_classification:latest |
| Object detection | gcr.io/cloud-ml-algos/image_object_detection:latest |
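
As a sketch of how a container URI fits into a job submission, the following Python builds a training job request body as a plain dictionary and makes no API call. The field names (`jobId`, `trainingInput`, `masterConfig.imageUri`, and so on) follow the AI Platform Training jobs resource as understood here. The helper name, the bucket paths, and the algorithm arguments are illustrative assumptions, so check the guide for your algorithm for the arguments it actually accepts.

```python
CONTAINER_URIS = {
    # Copied from the table above; GPU and distributed variants omitted.
    "linear_learner": "gcr.io/cloud-ml-algos/linear_learner_cpu:latest",
    "wide_deep": "gcr.io/cloud-ml-algos/wide_deep_learner_cpu:latest",
    "tabnet": "gcr.io/cloud-ml-algos/tab_net:latest",
    "xgboost": "gcr.io/cloud-ml-algos/boosted_trees:latest",
    "image_classification": "gcr.io/cloud-ml-algos/image_classification:latest",
    "object_detection": "gcr.io/cloud-ml-algos/image_object_detection:latest",
}


def make_training_job(job_id, algorithm, region, job_dir, algorithm_args):
    """Build a training job request body as a plain dict (hypothetical helper)."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            "region": region,
            "jobDir": job_dir,  # Cloud Storage location for training output
            "masterConfig": {"imageUri": CONTAINER_URIS[algorithm]},
            "args": list(algorithm_args),
        },
    }


job = make_training_job(
    job_id="linear_learner_example_1",
    algorithm="linear_learner",
    region="us-central1",
    job_dir="gs://your-bucket/output",  # placeholder bucket
    algorithm_args=[
        # Assumed argument names; confirm them in the linear learner guide.
        "--training_data_path=gs://your-bucket/train.csv",
        "--preprocess",
        "--model_type=classification",
    ],
)
```

With the Google API Client Library for Python, a dictionary like this would typically be passed as the `body` of a `projects().jobs().create(...)` request on the `ml` v1 service. That call is left out here so the example stays runnable without credentials.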

What's next