What is Supervised Learning?

Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns. Unlike unsupervised learning, supervised learning algorithms are given labeled training data so they can learn the relationship between the inputs and the outputs.

Supervised machine learning algorithms make it easier for organizations to create complex models that can make accurate predictions. As a result, they are widely used across various industries and fields, including healthcare, marketing, financial services, and more.

Here, we’ll cover the fundamentals of supervised learning in AI, how supervised learning algorithms work, and some of the most common use cases.

How does supervised learning work?

The data used in supervised learning is labeled, meaning that each example contains both inputs (called features) and the correct output (the label). The algorithm analyzes a large dataset of these training pairs to learn a mapping from inputs to outputs, which it then uses to infer the output value when asked to make a prediction on new data.

For instance, let’s pretend you want to teach a model to identify pictures of trees. You provide a labeled dataset that contains many different examples of types of trees and the names of each species. You let the algorithm try to define what set of characteristics belongs to each tree based on the labeled outputs. You can then test the model by showing it a tree picture and asking it to guess what species it is. If the model provides an incorrect answer, you can continue training it and adjusting its parameters with more examples to improve its accuracy and minimize errors.

Once the model has been trained and tested, you can use it to make predictions on unknown data based on the previous knowledge it has learned.
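The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration, not a production approach: the leaf measurements, the species names, and the function names are all made up for this example, and a simple 1-nearest-neighbor rule stands in for whatever algorithm you would actually choose.

```python
import math

# Hypothetical labeled data: (leaf_length_cm, leaf_width_cm) -> species.
# The features are the inputs; the species names are the labels.
TRAIN = [
    ((5.0, 4.5), "oak"), ((5.5, 5.0), "oak"), ((6.0, 4.8), "oak"),
    ((10.0, 1.0), "willow"), ((11.0, 1.2), "willow"), ((9.5, 0.9), "willow"),
]

# Labeled examples held back from training, used only to test the model.
TEST = [((5.8, 4.6), "oak"), ((10.5, 1.1), "willow")]


def predict(features, training_pairs):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_pairs, key=lambda pair: math.dist(features, pair[0]))
    return nearest[1]


def accuracy(test_pairs, training_pairs):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(predict(x, training_pairs) == y for x, y in test_pairs)
    return correct / len(test_pairs)


print("Test accuracy:", accuracy(TEST, TRAIN))
print("New leaf (9.8, 1.0) looks like:", predict((9.8, 1.0), TRAIN))
```

If the test accuracy were low, you would add more labeled examples or adjust the model, then test again before using it on unknown data.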

Types of supervised learning

Supervised learning in machine learning is generally divided into two categories: classification and regression.

Classification

Classification algorithms are used to group data by predicting a categorical label or output variable based on the input data. Classification is used when output variables are categorical, meaning each output belongs to one of two or more discrete classes.

One of the most common examples of classification algorithms in use is the spam filter in your email inbox. Here, a supervised learning model is trained to predict whether an email is spam or not with a dataset that contains labeled examples of both spam and legitimate emails. The algorithm extracts information about each email, including the sender, the subject line, body copy, and more. It then uses these features and corresponding output labels to learn patterns and assign a score that indicates whether an email is real or spam.
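As a rough sketch of the spam example, the code below trains a tiny logistic regression classifier on word-presence features and returns a score between 0 and 1. The six emails, the labels, and the function names are invented for illustration, and real spam filters use far richer features (such as the sender and subject line mentioned above) and much more data.

```python
import math

# Hypothetical labeled emails: (text, label), where 1 = spam and 0 = legitimate.
EMAILS = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer", 1),
    ("meeting agenda for monday", 0),
    ("project status report", 0),
    ("lunch on monday", 0),
]

# Vocabulary built from the training emails; each word becomes one feature.
VOCAB = sorted({word for text, _ in EMAILS for word in text.split()})


def featurize(text):
    """Turn an email into a vector of 1.0/0.0 word-presence features."""
    words = set(text.split())
    return [1.0 if word in words else 0.0 for word in VOCAB]


def spam_score(weights, bias, text):
    """Return a score in (0, 1); higher means more likely to be spam."""
    z = bias + sum(w * x for w, x in zip(weights, featurize(text)))
    return 1.0 / (1.0 + math.exp(-z))


def train(emails, epochs=200, learning_rate=0.5):
    """Fit the weights with stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, label in emails:
            error = spam_score(weights, bias, text) - label
            features = featurize(text)
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(EMAILS)
for text in ("free money now", "monday project meeting"):
    label = "spam" if spam_score(weights, bias, text) > 0.5 else "legitimate"
    print(f"{text!r}: {label}")
```

The 0.5 threshold is an arbitrary choice for this sketch; in practice it would be tuned to balance missed spam against legitimate emails that are wrongly flagged.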

Regression

Regression algorithms are used to predict a real or continuous value. The algorithm learns the relationship between one or more input variables and a numeric output variable.

A common example of a regression task might be predicting a salary based on work experience. For instance, a supervised learning algorithm would be fed inputs related to work experience (e.g., length of time, the industry or field, location, etc.) and the corresponding salary amount. After the model is trained, it could be used to predict the expected salary for a given level of work experience.
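To make the salary example concrete, here is a minimal sketch that fits a straight line with ordinary least squares. It assumes a single input (years of experience) and uses made-up salary figures; the function names are hypothetical, and a realistic model would also include inputs such as industry and location.

```python
# Hypothetical training data: years of experience and the matching salary.
YEARS = [1, 2, 3, 4, 5]
SALARIES = [40000, 45000, 50000, 55000, 60000]


def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_salary(years, slope, intercept):
    """Predict a continuous salary value for a given number of years."""
    return slope * years + intercept


slope, intercept = fit_line(YEARS, SALARIES)
print("Predicted salary at 6 years:", predict_salary(6, slope, intercept))
```

Unlike the classification case, the output here is a number on a continuous scale rather than one of a fixed set of classes.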

Real-world supervised learning examples

Supervised learning models can be used for a number of different business use cases that help address a wide range of problems. Common supervised learning examples include the following:

Risk assessment: Supervised machine learning models can help banks and other financial services companies determine whether customers are likely to default on loans, helping to minimize risk in their portfolios.
Image classification: Supervised machine learning algorithms are often trained to classify objects in images and videos. For example, an algorithm might be used to recognize a person in an image and automatically tag them on a social media platform.
Fraud detection: Supervised learning models underpin many fraud detection systems, enabling enterprises to recognize fraudulent activity. These models are trained on datasets that contain both fraudulent and non-fraudulent activity so they can be used to flag suspicious activity in real time.
Recommendation systems: Supervised learning algorithms are used by online platforms and streaming services to power recommendations based on previous customer behavior or shopping history. The models extract important information about a user's behavior and suggest similar products and content.

Supervised learning vs. unsupervised learning

The primary difference between supervised learning and unsupervised learning is the type of input data used to train the model. Supervised learning uses labeled training datasets to teach a model a specific, pre-defined goal.

By comparison, unsupervised learning uses unlabeled data and operates autonomously to learn the structure of the data without being given any explicit instructions.
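The contrast can be illustrated on the same small set of numbers. In the sketch below, the supervised function is told which group each value belongs to, while the unsupervised function sees only the values and has to find two groups on its own using a basic two-cluster k-means loop. The data, labels, and function names are assumptions made for this example.

```python
# Hypothetical one-dimensional measurements.
VALUES = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Labels are available only in the supervised setting.
LABELS = ["small", "small", "small", "large", "large", "large"]


def supervised_centroids(values, labels):
    """Supervised: use the given labels to compute one average per class."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def unsupervised_centroids(values, iterations=10):
    """Unsupervised: two-cluster k-means discovers groups without labels."""
    centers = [min(values), max(values)]  # simple starting guess
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            closer_to_first = abs(value - centers[0]) <= abs(value - centers[1])
            clusters[0 if closer_to_first else 1].append(value)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


print("Supervised class averages:", supervised_centroids(VALUES, LABELS))
print("Unsupervised cluster centers:", unsupervised_centroids(VALUES))
```

Both approaches end up with similar group centers on this toy data, but only the supervised version knows what the groups are called, because that information comes from the labels.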