Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns. Unlike unsupervised learning, supervised learning algorithms are given labeled training data to learn the relationship between the inputs and the outputs.
Supervised machine learning algorithms make it easier for organizations to create complex models that can make accurate predictions. As a result, they are widely used across various industries and fields, including healthcare, marketing, financial services, and more.
Here, we’ll cover the fundamentals of supervised learning in AI, how supervised learning algorithms work, and some of its most common use cases.
The data used in supervised learning is labeled, meaning that it contains examples of both inputs (called features) and correct outputs (called labels). The algorithms analyze a large dataset of these training pairs to learn how to infer the correct output value when asked to make a prediction on new data.
For instance, let’s pretend you want to teach a model to identify pictures of trees. You provide a labeled dataset that contains many different examples of types of trees and the names of each species. You let the algorithm try to define what set of characteristics belongs to each tree based on the labeled outputs. You can then test the model by showing it a tree picture and asking it to guess what species it is. If the model provides an incorrect answer, you can continue training it and adjusting its parameters with more examples to improve its accuracy and minimize errors.
Once the model has been trained and tested, you can use it to make predictions on unknown data based on the previous knowledge it has learned.
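The train, test, and predict cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production approach: it assumes each tree is described by two made-up numeric features (leaf length and leaf width), uses invented values, and swaps in a simple nearest-centroid rule as the learning algorithm. The names `train`, `predict`, and `accuracy` are hypothetical helpers defined here.

```python
from math import dist

# Hypothetical labeled training pairs: (leaf_length_cm, leaf_width_cm) -> species.
# All numbers are invented for illustration.
training_data = [
    ((12.0, 8.0), "oak"),
    ((13.0, 9.0), "oak"),
    ((11.5, 8.5), "oak"),
    ((6.0, 1.0), "willow"),
    ((7.0, 1.5), "willow"),
    ((6.5, 1.2), "willow"),
]


def train(pairs):
    """Learn one centroid (mean feature vector) per label."""
    sums = {}
    counts = {}
    for features, label in pairs:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in total)
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the input features."""
    return min(model, key=lambda label: dist(model[label], features))


def accuracy(model, pairs):
    """Fraction of labeled examples the model classifies correctly."""
    correct = sum(predict(model, features) == label for features, label in pairs)
    return correct / len(pairs)


# 1. Train on the labeled examples.
model = train(training_data)

# 2. Test on held-out labeled examples the model has not seen.
test_data = [((12.5, 8.2), "oak"), ((6.2, 1.1), "willow")]
test_accuracy = accuracy(model, test_data)

# 3. Predict the label of a new, unlabeled example.
new_leaf = (6.8, 1.4)
predicted_species = predict(model, new_leaf)

print(f"test accuracy: {test_accuracy:.2f}")
print(f"predicted species: {predicted_species}")
```

If the test accuracy were too low, the next step would be to add more labeled examples or features and train again, which mirrors the improvement loop described above.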
Supervised learning in machine learning is generally divided into two categories: classification and regression.
Classification algorithms are used to group data by predicting a categorical label or output variable based on the input data. Classification is used when output variables are categorical, meaning there are two or more classes.
One of the most common examples of classification algorithms in use is the spam filter in your email inbox. Here, a supervised learning model is trained to predict whether an email is spam or not with a dataset that contains labeled examples of both spam and legitimate emails. The algorithm extracts information about each email, including the sender, the subject line, body copy, and more. It then uses these features and corresponding output labels to learn patterns and assign a score that indicates whether an email is real or spam.
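To make the spam filter example more concrete, here is a small sketch of a classifier that learns from labeled emails and assigns a spam score to new text. It rests on several assumptions: the emails are invented, only the words of the text are used as features (a real filter would also use the sender and other signals), and the learning method is a basic naive Bayes model with add-one smoothing, chosen here for simplicity rather than because any particular email provider uses it. The function names `train` and `spam_score` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled emails (short texts only), invented for illustration.
labeled_emails = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win money", "spam"),
    ("meeting agenda for monday", "legit"),
    ("project status and meeting notes", "legit"),
    ("lunch on monday", "legit"),
]


def train(emails):
    """Count word frequencies per label and how often each label occurs."""
    word_counts = {"spam": Counter(), "legit": Counter()}
    label_counts = Counter()
    for text, label in emails:
        word_counts[label].update(text.split())
        label_counts[label] += 1
    vocabulary = set(word_counts["spam"]) | set(word_counts["legit"])
    return word_counts, label_counts, vocabulary


def spam_score(model, text):
    """Return the estimated probability (0 to 1) that the text is spam."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    log_probs = {}
    for label in ("spam", "legit"):
        total_words = sum(word_counts[label].values())
        # Start from the prior: the share of training emails with this label.
        log_prob = math.log(label_counts[label] / total_emails)
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[label][word] + 1
            log_prob += math.log(count / (total_words + len(vocabulary)))
        log_probs[label] = log_prob
    # Normalize the two log scores into a probability of spam.
    max_log = max(log_probs.values())
    weights = {label: math.exp(lp - max_log) for label, lp in log_probs.items()}
    return weights["spam"] / (weights["spam"] + weights["legit"])


model = train(labeled_emails)
score_spam_like = spam_score(model, "claim your free prize")
score_legit_like = spam_score(model, "monday meeting notes")

print(f"spam-like text score: {score_spam_like:.3f}")
print(f"legitimate-like text score: {score_legit_like:.3f}")
```

A filter built this way would typically compare the score to a threshold (for example 0.5) to decide which folder an email goes to.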
Regression algorithms are used to predict a continuous numeric value. The algorithm learns the relationship between one or more input variables and the output variable.
A common example of a regression task might be predicting a salary based on work experience. For instance, a supervised learning algorithm would be fed inputs related to work experience (e.g., length of time, the industry or field, location, etc.) and the corresponding assigned salary amount. After the model is trained, it could be used to predict the average salary based on work experience.
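The salary example can be sketched as a simple linear regression fitted with ordinary least squares. This is a deliberately simplified illustration: it assumes a single input (years of experience) rather than the several inputs mentioned above, the salary figures are invented, and `fit_line` and `predict_salary` are hypothetical helper names.

```python
# Hypothetical training pairs: years of experience -> salary in thousands.
# The values are invented for illustration.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salaries = [42.0, 44.0, 51.0, 54.0, 61.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_salary(slope, intercept, experience_years):
    """Predict a continuous salary value for a given amount of experience."""
    return slope * experience_years + intercept


slope, intercept = fit_line(years, salaries)
predicted = predict_salary(slope, intercept, 6.0)

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"predicted salary for 6 years: {predicted:.1f}")
```

Unlike the classification case, the output here is a number on a continuous scale rather than one of a fixed set of classes.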
Supervised learning models can be used for a number of different business use cases that help address a wide range of problems. Common supervised learning examples include the following:
The primary difference between supervised learning and unsupervised learning is the type of input data used to train the model. Supervised learning uses labeled training datasets to teach a model a specific, predefined goal.
By comparison, unsupervised learning uses unlabeled data and operates autonomously to try to learn the structure of the data without being given any explicit instructions.
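To illustrate the contrast, the sketch below shows what an unsupervised method sees: only inputs, with no correct outputs attached. It assumes a one-dimensional dataset of invented values and uses a basic k-means loop with two clusters as one example of an unsupervised technique. The function name `two_means` is hypothetical.

```python
# Unlabeled data: only inputs, no correct outputs. Values are invented.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]


def two_means(points, iterations=10):
    """Cluster 1-D points into two groups with a basic k-means loop."""
    # Simple deterministic starting guess: the smallest and largest values.
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for point in points:
            # Assign each point to the nearest current center.
            if abs(point - centers[0]) <= abs(point - centers[1]):
                groups[0].append(point)
            else:
                groups[1].append(point)
        # Move each center to the mean of its assigned points.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


centers, groups = two_means(values)

print(f"cluster centers: {centers}")
print(f"cluster members: {groups}")
```

The algorithm finds two groups in the data, but it has no way of naming them or checking them against a correct answer. Supplying that correct answer for each example is what a labeled dataset adds in supervised learning.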