Introduction to built-in algorithms

This page provides an overview of training with built-in algorithms. With built-in algorithms on AI Platform, you can run training jobs on your data without writing any code for a training application. You can submit your training data, select an algorithm, and then allow AI Platform to handle the preprocessing and training for you. After that, it's easy to deploy your model and get predictions on AI Platform.

Built-in image algorithms allow you to train on TPUs with minimal configuration. The resulting TensorFlow SavedModel can then be served on CPUs and GPUs.

Types of built-in algorithms

The following built-in algorithms accept tabular data (numerical and categorical data):

  • Linear learner
  • Wide and deep
  • XGBoost

The following built-in algorithms accept image data:

  • Image classification
  • Object detection

See a comparison of the built-in algorithms.

How training with built-in algorithms works

AI Platform runs your training job on computing resources in the cloud. Here is the overall process:

  1. Compare the available built-in algorithms to determine whether they fit your specific dataset and use case.
  2. If you do not already have one, create a Cloud Storage bucket where AI Platform can store your training output.
  3. Store your training and validation data in the Cloud Storage bucket (see the sketch after this list).
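
As an illustration of steps 2 and 3, the following sketch uses the google-cloud-storage Python client library. The project ID, bucket name, region, and file names are placeholders of my own, not values required by AI Platform.

```python
# Minimal sketch of steps 2 and 3, using the google-cloud-storage client library.
# The project ID, bucket name, region, and file names below are placeholders.
from google.cloud import storage

client = storage.Client(project="my-project-id")

# Step 2: create a bucket for training data and output (skip if you already have one).
bucket = client.create_bucket("my-training-bucket", location="us-central1")

# Step 3: upload training and validation data to the bucket.
for local_file, gcs_path in [
    ("train.csv", "data/train.csv"),
    ("eval.csv", "data/eval.csv"),
]:
    blob = bucket.blob(gcs_path)
    blob.upload_from_filename(local_file)
    print(f"Uploaded gs://{bucket.name}/{gcs_path}")
```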

How training with tabular built-in algorithms works

Here is the overall process for using tabular built-in algorithms:

  1. Format your input data for training with the built-in algorithm. If applicable, follow any additional formatting requirements specific to the built-in algorithm you're using.
  2. Specify the location(s) of your training and validation data within your Cloud Storage bucket.
  3. Set options to configure the built-in algorithm you selected.
    • You can enable AI Platform to perform automatic preprocessing on your dataset.
    • You can also specify arguments such as the learning rate, training steps, and batch size.
  4. Optionally, you can make additional selections to configure hyperparameter tuning for your job.
    • You can select a goal metric, such as maximizing your model's predictive accuracy or minimizing the training loss.
    • Additionally, you can tune specific hyperparameters and set ranges for their values.
  5. Make selections to configure the overall training job. Select a job name, the machine(s) to use, and the region where the job should run.
  6. Submit the training job, and view logs to monitor its progress and status (see the sketch after this list).
  7. When your training job has completed successfully, you can deploy your trained model on AI Platform to set up a prediction server and get predictions on new data.
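
To make steps 3, 5, and 6 concrete, here is a hedged sketch that submits a linear learner job through the Google APIs Client Library for Python. The project ID, bucket paths, and especially the algorithm argument names (--preprocess, --training_data_path, and so on) are illustrative placeholders; check the guide for your chosen algorithm for the exact flags it expects.

```python
# Sketch of submitting a linear learner training job through the AI Platform
# Training API (Google APIs Client Library for Python). The project ID, bucket
# paths, and the algorithm argument names/values are illustrative placeholders.
from googleapiclient import discovery

project_id = "my-project-id"
job_id = "linear_learner_example_001"

training_inputs = {
    "scaleTier": "BASIC",
    "masterConfig": {
        # Built-in algorithm container hosted in Container Registry
        # (see the containers table later on this page).
        "imageUri": "gcr.io/cloud-ml-algos/linear_learner_cpu:latest"
    },
    "args": [
        # Hypothetical argument names -- verify them against the algorithm's guide.
        "--preprocess",
        "--model_type=classification",
        "--training_data_path=gs://my-training-bucket/data/train.csv",
        "--validation_data_path=gs://my-training-bucket/data/eval.csv",
        "--batch_size=128",
        "--learning_rate=0.01",
        "--max_steps=10000",
    ],
    "region": "us-central1",
    "jobDir": "gs://my-training-bucket/linear_learner_output",
}

ml = discovery.build("ml", "v1")
request = ml.projects().jobs().create(
    parent=f"projects/{project_id}",
    body={"jobId": job_id, "trainingInput": training_inputs},
)
response = request.execute()
print(response)
```

The gcloud command-line tool can submit an equivalent request; either way, the container URI is what selects the algorithm.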

How training with image built-in algorithms works

Here is the overall process for using image built-in algorithms:

  1. Format your input data as TFRecords (see the sketch after this list).
  2. Specify the location(s) of your training and validation data within your Cloud Storage bucket.
  3. Set options to configure the built-in algorithm you selected. Available options include the batch size, learning rate, and optimizer.
  4. For a hyperparameter tuning job, configure the hyperparameter tuning settings for your job.
    • Only one goal metric is available, and the tuning job maximizes it: model accuracy for image classification, or average precision for object detection.
    • Additionally, you can tune specific hyperparameters and set ranges for their values.
  5. Make selections to configure the overall training job. Select a job name, the machine(s) to use, and the region where the job should run.
  6. Submit the training job, and view logs to monitor its progress and status.
  7. When your training job has completed successfully, you can deploy your trained model on AI Platform to set up a prediction server and get predictions on new data.
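
As a sketch of step 1, the snippet below packs labeled images into a TFRecord file with TensorFlow. The feature keys used here (image/encoded and image/class/label) are a common convention rather than something confirmed by this page; check the guide for the image algorithm you're using for the exact schema it expects.

```python
# Sketch of step 1: packing labeled images into a TFRecord file with TensorFlow.
# The feature keys below ("image/encoded", "image/class/label") are a common
# convention, not confirmed here -- check the algorithm's guide for its schema.
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

labeled_images = [("cat.jpg", 0), ("dog.jpg", 1)]  # (file path, integer label)

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for path, label in labeled_images:
        with open(path, "rb") as f:
            encoded_image = f.read()
        example = tf.train.Example(features=tf.train.Features(feature={
            "image/encoded": _bytes_feature(encoded_image),
            "image/class/label": _int64_feature(label),
        }))
        writer.write(example.SerializeToString())
```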

Limitations

Note the following limitations for training with built-in algorithms:

  • Distributed training is not supported for TensorFlow built-in algorithms. To run a TensorFlow distributed training job on AI Platform, you must create a training application.
  • Training jobs submitted through the Google Cloud Console use only legacy machine types. You can use Compute Engine machine types with training jobs submitted through gcloud or the Google APIs Client Library for Python. Learn more about machine types for training.
  • GPUs are supported for some algorithms. Refer to the detailed comparison of all the built-in algorithms for more information.
  • Multi-GPU machines do not yield greater speed with built-in algorithm training. If you're using GPUs, select machines with a single GPU.
  • TPUs are not supported for tabular built-in algorithms. To use TPUs with tabular data, you must create a training application. Learn how to run a training job with TPUs.

Any further limitations for specific built-in algorithms are noted in the corresponding guides for each algorithm.

Hyperparameter tuning

Hyperparameter tuning is supported for training with built-in algorithms. For tabular built-in algorithms, you can specify a goal metric, along with whether to minimize or maximize it. You can maximize your model accuracy for classification, or minimize your training loss.

For each image built-in algorithm, there is only one goal metric available:

  • For image classification, you can maximize your model accuracy.
  • For object detection, you can maximize your average precision.

You also list the hyperparameters you want to adjust, along with a range of possible values for each one.

When you submit your training job with hyperparameter tuning, AI Platform runs multiple trials, tracking and adjusting the hyperparameters after each trial. When the hyperparameter tuning job is complete, AI Platform reports values for the most effective configuration of your hyperparameters, as well as a summary for each trial.
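
For orientation, here is a hedged sketch of what that configuration can look like as the hyperparameters section of a training request submitted through the AI Platform Training API. The metric tag and parameter names are illustrative and must match what your chosen algorithm actually reports and accepts.

```python
# Sketch of a hyperparameter tuning section inside trainingInput, as accepted
# by the AI Platform Training API. The metric tag and parameter names are
# illustrative -- they must match what the chosen built-in algorithm reports.
hyperparameter_spec = {
    "goal": "MAXIMIZE",                 # or "MINIMIZE" for a loss metric
    "hyperparameterMetricTag": "accuracy",
    "maxTrials": 20,
    "maxParallelTrials": 2,
    "params": [
        {
            "parameterName": "learning_rate",
            "type": "DOUBLE",
            "minValue": 0.0001,
            "maxValue": 0.1,
            "scaleType": "UNIT_LOG_SCALE",
        },
        {
            "parameterName": "batch_size",
            "type": "DISCRETE",
            "discreteValues": [64, 128, 256],
        },
    ],
}

# Attach it to the job request sketched earlier:
# training_inputs["hyperparameters"] = hyperparameter_spec
```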

Learn more about hyperparameter tuning on AI Platform.

Overview of the algorithms

Built-in algorithms help you train models for a variety of use cases that are commonly solved with classification and regression. The following built-in algorithms are available for training on AI Platform:

| Data type | Algorithm name       | Underlying ML framework |
|-----------|----------------------|-------------------------|
| Tabular   | Linear learner       | TensorFlow              |
| Tabular   | Wide and deep        | TensorFlow              |
| Tabular   | XGBoost              | XGBoost                 |
| Image     | Image classification | TensorFlow              |
| Image     | Object detection     | TensorFlow              |

Linear learner

The linear learner built-in algorithm is used for linear regression, binary classification, and multiclass classification. AI Platform uses an implementation based on a TensorFlow Estimator.

A linear learner model assigns one weight to each input feature and sums the weights to predict a numerical target value. For logistic regression, this value is converted into a value between 0 and 1. This simple type of model is easy to interpret, because you can compare the feature weights to determine which input features have significant impacts on your predictions.
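
As a tiny worked example of that computation, with invented weights and feature values:

```python
# Tiny illustration of a linear model's prediction, using made-up numbers.
import math

weights = {"sq_meters": 0.8, "num_rooms": 0.3}   # one weight per input feature
bias = -1.2
features = {"sq_meters": 2.5, "num_rooms": 3.0}

# A weighted sum of the input features predicts the numerical target (regression).
linear_prediction = bias + sum(weights[name] * value for name, value in features.items())

# For logistic regression, the sigmoid converts that value into the range (0, 1).
probability = 1.0 / (1.0 + math.exp(-linear_prediction))
print(linear_prediction, probability)
```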

Learn more about how large-scale linear models work.

Wide and deep

The wide and deep built-in algorithm is used for large-scale classification and regression problems, such as recommender systems, search, and ranking problems. AI Platform uses an implementation based on a TensorFlow Estimator.

This type of model combines a linear model that learns and "memorizes" a wide range of rules with a deep neural network that "generalizes" the rules and applies them correctly to similar features in new, unseen data.
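
For intuition, this is roughly the estimator family behind the algorithm when constructed directly with the TensorFlow 1.x tf.estimator API. The feature columns here are invented placeholders; the built-in algorithm assembles an equivalent model for you from your configuration.

```python
# Rough sketch of the estimator family behind the wide and deep algorithm
# (TensorFlow 1.x tf.estimator API). Feature names are invented placeholders.
import tensorflow as tf

# "Wide" part: sparse/categorical features the linear model can memorize.
user_country = tf.feature_column.categorical_column_with_vocabulary_list(
    "user_country", ["US", "DE", "IN", "BR"])
wide_columns = [user_country]

# "Deep" part: dense representations the neural network can generalize from.
deep_columns = [
    tf.feature_column.numeric_column("num_purchases"),
    tf.feature_column.embedding_column(user_country, dimension=8),
]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[128, 64, 32],
)
```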

Learn more about wide and deep learning.

XGBoost

XGBoost (eXtreme Gradient Boosting) is a framework that implements a gradient boosting algorithm. XGBoost enables efficient supervised learning for classification, regression, and ranking tasks. XGBoost training is based on decision tree ensembles, which combine the results of multiple classification and regression models.
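
For orientation, here is what training a gradient-boosted tree ensemble looks like with the open-source xgboost package on synthetic data; the built-in algorithm trains this kind of model for you without requiring this code.

```python
# Sketch of gradient-boosted tree training with the open-source xgboost
# package, on synthetic data.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(500, 10))            # 500 rows, 10 numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary label

# An ensemble of shallow decision trees, fit by gradient boosting.
model = xgb.XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X, y)
print(model.predict(X[:5]))
```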

Learn more about how XGBoost works.

Image classification

The image classification built-in algorithm is used to classify images. AI Platform uses an implementation based on a TensorFlow Estimator.

Object detection

The image object detection built-in algorithm is used to detect objects within images. AI Platform uses an implementation based on a TensorFlow Estimator.

Comparing built-in algorithms

The following table provides a quick comparison of the built-in algorithms:

| Algorithm name | ML model used | Type of problem | Example use case(s) | Supported accelerators |
|---|---|---|---|---|
| Linear learner | TensorFlow Estimator: LinearClassifier and LinearRegressor | Classification, regression | Sales forecasting | GPU |
| Wide and deep | TensorFlow Estimator: DNNLinearCombinedClassifier, DNNLinearCombinedEstimator, and DNNLinearCombinedRegressor | Classification, regression, ranking | Recommendation systems, search | GPU |
| XGBoost (single-replica) | XGBoost | Classification, regression | Advertising click-through rate (CTR) prediction | None (CPU only) |
| Distributed XGBoost | XGBoost | Classification, regression | Advertising click-through rate (CTR) prediction | GPU |
| Image classification | TensorFlow | Classification | Classifying images | GPUs and TPUs |
| Object detection | TensorFlow | Classification | Recognizing objects within images | GPUs and TPUs |

Algorithm containers

When you submit your training job to AI Platform, you select the algorithm by specifying the URI of its corresponding Docker container hosted in Container Registry. Built-in algorithms are available through the following containers (a short sketch follows the table):

| Algorithm | Container Registry URI |
|---|---|
| Linear learner | gcr.io/cloud-ml-algos/linear_learner_cpu:latest (CPU), gcr.io/cloud-ml-algos/linear_learner_gpu:latest (GPU) |
| Wide and deep | gcr.io/cloud-ml-algos/wide_deep_learner_cpu:latest (CPU), gcr.io/cloud-ml-algos/wide_deep_learner_gpu:latest (GPU) |
| XGBoost (single-replica) | gcr.io/cloud-ml-algos/boosted_trees:latest (CPU only) |
| Distributed XGBoost | gcr.io/cloud-ml-algos/xgboost_dist:latest (CPU or GPU training) |
| Image classification | gcr.io/cloud-ml-algos/image_classification:latest |
| Object detection | gcr.io/cloud-ml-algos/image_object_detection:latest |
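
As a brief sketch, switching algorithms amounts to pointing the job at a different container URI from the table above; this reuses the hypothetical training_inputs dictionary from the earlier submission sketch.

```python
# Selecting a different built-in algorithm only changes the container URI in
# masterConfig (this reuses the hypothetical training_inputs dict from the
# earlier sketch).
training_inputs["masterConfig"]["imageUri"] = (
    "gcr.io/cloud-ml-algos/wide_deep_learner_gpu:latest")
training_inputs["scaleTier"] = "CUSTOM"
training_inputs["masterType"] = "standard_gpu"  # legacy machine type with a single GPU
```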

What's next
