Anomaly detection overview

Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.

One challenge when you use anomaly detection is determining what counts as anomalous data. If you have labeled data that identifies anomalies, you can perform anomaly detection with one of the following supervised machine learning models:

  • Linear regression and logistic regression models
  • Boosted trees models
  • Random forest models
  • DNNs and Wide & Deep models
  • AutoML models

If you aren't certain what counts as anomalous data, or you don't have labeled data to train a model on, you can use unsupervised machine learning to perform anomaly detection. Use the ML.DETECT_ANOMALIES function with one of the following models to detect anomalies in training data or new serving data:

Data type | Model types | What ML.DETECT_ANOMALIES does
Time series | ARIMA_PLUS | Detect the anomalies in the time series.
Time series | ARIMA_PLUS_XREG | Detect the anomalies in the time series with external regressors.
Independent and identically distributed random variables (IID) | K-means | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see The k-means model output for the ML.DETECT_ANOMALIES function.
Independent and identically distributed random variables (IID) | Autoencoder | Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. The ML.RECONSTRUCTION_LOSS function can retrieve all types of reconstruction loss.
Independent and identically distributed random variables (IID) | PCA | Detect anomalies based upon the reconstruction loss in terms of mean squared error.
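
To make the two IID scoring rules more concrete, the following Python sketch computes a nearest-centroid distance (the idea behind the k-means row) and a mean squared reconstruction error (the idea behind the autoencoder and PCA rows), and then flags the highest-scoring rows. It is an illustration under stated assumptions, not the ML.DETECT_ANOMALIES implementation: it uses raw Euclidean distances rather than the normalized distances described above, the centroids and sample data are made up, and the helper names and the `contamination` cutoff rule are hypothetical choices for this example.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Return the shortest Euclidean distance from `point` to any centroid.

    Assumption: raw (unnormalized) distance, used here only for illustration.
    """
    return min(math.dist(point, centroid) for centroid in centroids)


def reconstruction_mse(original, reconstructed):
    """Return the mean squared error between a row and its reconstruction."""
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_errors) / len(squared_errors)


def flag_anomalies(scores, contamination):
    """Flag roughly the top `contamination` fraction of scores as anomalous.

    Hypothetical thresholding rule: at least one row is always flagged, and
    tied scores at the cutoff are all flagged.
    """
    n_flag = max(1, round(len(scores) * contamination))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [score >= cutoff for score in scores]


# Made-up data: two tight clusters plus one point far from both centroids.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (4.9, 5.0), (9.0, 0.5)]
centroids = [(1.1, 0.95), (4.95, 5.05)]

distances = [nearest_centroid_distance(p, centroids) for p in points]
flags = flag_anomalies(distances, contamination=0.2)

for point, distance, is_anomaly in zip(points, distances, flags):
    print(point, round(distance, 3), is_anomaly)
```

In this sketch only the last point, which lies far from both centroids, is flagged. The same `flag_anomalies` helper could be applied to a list of `reconstruction_mse` values to mimic the reconstruction-loss approach.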