Anomaly detection overview

Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.
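
As a rough illustration of the return-rate example above, the following sketch flags a new observation when it lies far from a product's historical baseline. It uses a plain z-score, which is an assumption made for illustration and not how `ML.DETECT_ANOMALIES` works internally. The function name, the sample rates, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(baseline_rates, new_rate, z_threshold=3.0):
    """Return True if new_rate is more than z_threshold sample standard
    deviations away from the mean of baseline_rates."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return new_rate != mu
    return abs(new_rate - mu) / sigma > z_threshold


# Hypothetical weekly return rates (fraction of units returned) for one product.
baseline = [0.020, 0.022, 0.019, 0.021, 0.020, 0.023, 0.021]

print(is_anomalous(baseline, 0.022))  # a rate close to the baseline
print(is_anomalous(baseline, 0.080))  # a rate far above the baseline
```

A fixed threshold like this only works when you already know what a normal range looks like, which is the difficulty that the rest of this page addresses.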

One challenge when you use anomaly detection is determining what counts as anomalous data. If you have labeled data that identifies anomalies, you can perform anomaly detection by using the `ML.PREDICT` function with one of the following supervised machine learning models:

If you aren't certain what counts as anomalous data, or you don't have labeled data to train a model on, you can use unsupervised machine learning to perform anomaly detection. Use the `ML.DETECT_ANOMALIES` function with one of the following models to detect anomalies in training data or new serving data:

Data type	Model types	What `ML.DETECT_ANOMALIES` does
Time series	`ARIMA_PLUS`	Detect the anomalies in the time series.
Time series	`ARIMA_PLUS_XREG`	Detect the anomalies in the time series with external regressors.
Independent and identically distributed random variables (IID)	K-means	Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see the k-means model output for the `ML.DETECT_ANOMALIES` function.
Independent and identically distributed random variables (IID)	Autoencoder	Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see `ML.RECONSTRUCTION_LOSS`. The `ML.RECONSTRUCTION_LOSS` function can retrieve all types of reconstruction loss.
Independent and identically distributed random variables (IID)	PCA	Detect anomalies based on the reconstruction loss in terms of mean squared error.
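
To make the k-means row more concrete, the sketch below scores points by the shortest normalized distance to any cluster centroid. It assumes that "normalized" means the Euclidean distance to a centroid divided by that cluster's radius, with the radius taken as the root mean square distance of the cluster's training points to the centroid. That definition is an assumption here, so check the k-means model output documentation for the authoritative one. The centroids, training points, and helper names are hypothetical, and this is not the BigQuery ML implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_radii(points, centroids):
    """Assign each point to its nearest centroid, then return one radius per
    cluster: the root mean square distance of its points to the centroid."""
    sq_sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        dists = [euclidean(p, c) for c in centroids]
        k = dists.index(min(dists))
        sq_sums[k] += dists[k] ** 2
        counts[k] += 1
    return [math.sqrt(s / n) if n else 0.0 for s, n in zip(sq_sums, counts)]


def normalized_distance(point, centroids, radii):
    """Shortest of the per-centroid distances, each divided by that cluster's
    radius. Clusters with a zero radius are skipped."""
    return min(
        euclidean(point, c) / r for c, r in zip(centroids, radii) if r > 0
    )


# Hypothetical, already-trained centroids and the data they were trained on.
centroids = [(0.0, 0.0), (10.0, 10.0)]
training = [
    (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
    (11.0, 10.0), (9.0, 10.0), (10.0, 11.0), (10.0, 9.0),
]
radii = cluster_radii(training, centroids)

near = normalized_distance((0.5, 0.5), centroids, radii)  # inside a cluster
far = normalized_distance((5.0, 5.0), centroids, radii)   # between clusters
print(near, far)
```

A point that sits well inside a cluster gets a score below 1, while a point far from every centroid gets a large score. Where to draw the cutoff between normal and anomalous is a separate decision that this sketch leaves open.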

Recommended knowledge

By using the default settings in the `CREATE MODEL` statements and the inference functions, you can create and use an anomaly detection model even without much ML knowledge. However, having basic knowledge about ML development helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes: