Anomaly detection overview
Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.
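
As a rough illustration of the return-rate example, the following sketch flags an observation that sits far from a product's baseline. It uses a plain z-score check, which is only one simple way to define "substantially different" and is not a method prescribed here; the function name `deviates_from_baseline`, the threshold of 3 standard deviations, and the sample rates are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, observed, z_threshold=3.0):
    """Return True when `observed` lies more than `z_threshold` sample
    standard deviations away from the mean of `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical weekly return rates for one product (fraction of units sold).
baseline_rates = [0.021, 0.019, 0.020, 0.022, 0.018, 0.020]

print(deviates_from_baseline(baseline_rates, 0.021))  # close to the baseline
print(deviates_from_baseline(baseline_rates, 0.080))  # far above the baseline
```

A fixed threshold like this only works when you already have a stable baseline, which is why the choice of what counts as anomalous is the harder problem.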
One challenge when you use anomaly detection is determining what counts as
anomalous data. If you have labeled data that identifies anomalies, you can
perform anomaly detection by using the ML.PREDICT function with one of the
following supervised machine learning models:
- Linear and logistic regression models
- Boosted trees models
- Random forest models
- Deep neural network (DNN) models
- Wide & Deep models
- AutoML models
If you aren't certain what counts as anomalous data, or you don't have labeled
data to train a model on, you can use unsupervised machine learning to perform
anomaly detection. Use the ML.DETECT_ANOMALIES function with one of the
following models to detect anomalies in training data or new serving data:
Data type | Model types | What ML.DETECT_ANOMALIES does |
---|---|---|
Time series | ARIMA_PLUS | Detect the anomalies in the time series. |
Time series | ARIMA_PLUS_XREG | Detect the anomalies in the time series with external regressors. |
Independent and identically distributed random variables (IID) | K-means | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see the k-means model output for the ML.DETECT_ANOMALIES function. |
IID | Autoencoder | Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. The ML.RECONSTRUCTION_LOSS function can retrieve all types of reconstruction loss. |
IID | PCA | Detect anomalies based upon the reconstruction loss in terms of mean squared error. |
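
The two scoring ideas listed for IID data can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how ML.DETECT_ANOMALIES is implemented. It assumes that a normalized distance is the Euclidean distance to a centroid divided by that cluster's radius (taken here as the root mean square distance of the cluster's points to the centroid); the authoritative definition is in the function's k-means output documentation. The reconstruction part assumes centered data and a single, already known principal direction as a stand-in for a trained PCA or autoencoder model. All function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_radius(points, centroid):
    """Assumed radius: RMS distance from a cluster's points to its centroid."""
    return math.sqrt(sum(euclidean(p, centroid) ** 2 for p in points) / len(points))


def normalized_distance(point, centroids, radii):
    """Shortest radius-normalized distance from `point` to any centroid."""
    return min(euclidean(point, c) / r for c, r in zip(centroids, radii))


def project_and_reconstruct(point, direction):
    """Project a centered point onto one unit direction and map it back."""
    score = sum(p * d for p, d in zip(point, direction))
    return [score * d for d in direction]


def reconstruction_mse(original, reconstructed):
    """Mean squared error between a point and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


# K-means style score: two hypothetical clusters with known centroids.
cluster_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
cluster_b = [(10, 10), (10, 12), (12, 10), (12, 12)]
centroids = [(1, 1), (11, 11)]
radii = [cluster_radius(cluster_a, centroids[0]),
         cluster_radius(cluster_b, centroids[1])]

for point in [(1, 2), (6, 6)]:
    print(point, round(normalized_distance(point, centroids, radii), 3))

# Reconstruction style score: data assumed to vary mostly along y = x.
direction = (1 / math.sqrt(2), 1 / math.sqrt(2))

for point in [(3, 3), (1, -1)]:
    rebuilt = project_and_reconstruct(point, direction)
    print(point, round(reconstruction_mse(point, rebuilt), 3))
```

In both cases a larger score means the row is harder to explain with the trained model, so rows with the largest scores are the anomaly candidates.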