Forseti intelligent agents: an open-source anomaly detection module
Google Cloud Machine Learning team
Among security professionals, one way to identify a breach or spurious entity is to detect anomalies and abnormalities in customers’ usage trends. At Google, we use Forseti, a community-driven collection of open-source tools, to improve the security of Google Cloud Platform (GCP) environments. Recently, we launched the “Forseti Intelligent Agents” initiative to identify anomalies, enable systems to take advantage of common user usage patterns, and identify other outlier data points. In this way, we hope to help security specialists, for whom manually flagging these data points is otherwise cumbersome and time-consuming.
Anomaly detection is a classic and common solution implemented across multiple business domains. We tested several machine-learning (ML) techniques for use in anomaly detection, analyzing existing data that had been used to create firewall rules and identifying outliers in it. The approach, the results of which you can find in this whitepaper, was experimental and based on static analysis.
At a high level, our goal is to use Forseti inventory data to achieve the following:
- Detect unusual instances between snapshots.
- Alert users to unusual firewall rules and provide comparisons with expected behavior.
- Provide potential remediation steps.
Below is our solution. Note that it uses static data for now, but we can transform it to use dynamic data, if needed.
The Forseti intelligent agents workflow
To build this solution, we took a multi-phase approach: we imported firewall data into a BigQuery table, prepared and manipulated the data, then generated and evaluated a model. At the same time, we built “feature-level decision stumps” (i.e., decision trees built after considering one feature as the label and all the rest as regular features) and performed bucketing and sample detection. Figure 1 is a high-level depiction of our initial workflow. For pre-processing, we experimented with approaches such as penalizing subnets with a wider range. We also looked at supernets, an example of which is depicted below.
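The "feature-level decision stump" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the field names (`direction`, `protocol`, `port`), the sample rules, and the helpers `fit_stump`, `best_stump`, and `disagreements` are all hypothetical. The sketch treats one feature as the label, splits on a single other feature, predicts the majority label for each value of that feature, and reports the rules whose actual label disagrees with the prediction.

```python
from collections import Counter, defaultdict

def fit_stump(rows, label, feature):
    """One-level tree: for each value of `feature`, predict the majority `label`."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def best_stump(rows, label):
    """Try every other feature as the split and keep the most accurate stump."""
    best = None
    for feature in rows[0]:
        if feature == label:
            continue
        stump = fit_stump(rows, label, feature)
        correct = sum(stump[r[feature]] == r[label] for r in rows)
        if best is None or correct > best[2]:
            best = (feature, stump, correct)
    return best[0], best[1]

def disagreements(rows, label):
    """Rows whose label differs from what the best stump predicts for them."""
    feature, stump = best_stump(rows, label)
    return [r for r in rows if stump[r[feature]] != r[label]]

# Hypothetical flattened rules; real inventory data would have more fields.
rules = [
    {"direction": "INGRESS", "protocol": "tcp", "port": "443"},
    {"direction": "INGRESS", "protocol": "tcp", "port": "443"},
    {"direction": "INGRESS", "protocol": "tcp", "port": "22"},
    {"direction": "EGRESS", "protocol": "udp", "port": "53"},
    {"direction": "EGRESS", "protocol": "udp", "port": "53"},
    {"direction": "INGRESS", "protocol": "udp", "port": "443"},
]

flagged = disagreements(rules, "protocol")
print(flagged)
```

Repeating this with each feature in turn as the label gives one stump per feature; a rule that disagrees with several stumps is a natural candidate for closer review.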
Some of these flattened firewall rules that we used to train the model can be depicted as follows:
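As a rough illustration of what flattening and subnet pre-processing might involve, the sketch below expands one nested rule into flat rows and attaches a width penalty to each source range. It makes several assumptions: the input dictionary is only loosely modeled on a GCP firewall rule and may not match the Forseti inventory schema, the penalty formula (share of address bits left open) is our own choice rather than the one used in the study, and `flatten_rule` and `common_supernet` are hypothetical helpers. It relies only on Python's standard `ipaddress` module.

```python
import ipaddress

def flatten_rule(rule):
    """Expand one nested rule into a flat row per (source range, protocol, port)."""
    rows = []
    for cidr in rule["sourceRanges"]:
        network = ipaddress.ip_network(cidr)
        # Assumed penalty: fraction of address bits left open
        # (0.0 = a single host, 1.0 = the whole address space).
        open_bits = network.max_prefixlen - network.prefixlen
        penalty = open_bits / network.max_prefixlen
        for allowed in rule["allowed"]:
            for port in allowed.get("ports", ["all"]):
                rows.append({
                    "name": rule["name"],
                    "direction": rule["direction"],
                    "source_range": cidr,
                    "protocol": allowed["IPProtocol"],
                    "port": port,
                    "width_penalty": penalty,
                })
    return rows

def common_supernet(cidr_a, cidr_b):
    """Smallest network that contains both inputs (same IP version only)."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    while not b.subnet_of(a):
        a = a.supernet()  # widen by one prefix bit at a time
    return a

# Hypothetical rule used only for this example.
rule = {
    "name": "allow-web",
    "direction": "INGRESS",
    "sourceRanges": ["0.0.0.0/0", "10.0.0.0/24"],
    "allowed": [{"IPProtocol": "tcp", "ports": ["80", "443"]}],
}

flat_rows = flatten_rule(rule)
for row in flat_rows:
    print(row)
print(common_supernet("10.0.0.0/24", "10.0.1.0/24"))
```

Here the rule open to `0.0.0.0/0` receives the maximum penalty, while the `/24` range receives a much smaller one, which is one simple way to make wide-open ranges stand out as a numeric feature.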
Then, for unsupervised learning, we experimented with techniques including k-means clustering, decision stumps, and visualization in low-dimensional space.
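To make the clustering step concrete, here is a minimal pure-Python k-means sketch. It is not the model from the study: the two numeric features per rule are hypothetical (think of a width penalty and a normalized port count), the initialization simply takes the first `k` points, and scoring each rule by its distance to the assigned centroid is our assumption about how outliers could be surfaced from the clusters.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means with a naive deterministic init (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

def distance_to_centroid(points, centroids, assignment):
    """Assumed anomaly score: distance from each point to its own centroid."""
    return [math.dist(p, centroids[a]) for p, a in zip(points, assignment)]

# Hypothetical feature vectors; the last one sits away from both groups.
points = [
    (0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0),
    (0.0, 0.1), (1.0, 0.9), (0.1, 0.1), (0.9, 0.9),
    (0.2, 0.9),
]

centroids, assignment = kmeans(points, k=2)
scores = distance_to_centroid(points, centroids, assignment)
most_unusual = max(range(len(points)), key=scores.__getitem__)
print(points[most_unusual], round(scores[most_unusual], 3))
```

A production setup would more likely use a library implementation with better initialization and feature scaling; the point of the sketch is only that rules far from every cluster center are the ones worth a second look.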
Feature weights for both principal components:
Based on these results, we looked at a normal organization with thousands of firewall rules and, by examining the points and clusters to the right, found anomalies such as the following (marked in red below):
*Model output has been anonymized for privacy and security.
We conducted these experiments with firewall rules to prototype different approaches. You can read about these approaches in detail in the whitepaper.
One way to follow up on this framework would be to use semi-supervised learning. The data points that our models can confidently flag as anomalous could serve as annotated data for that kind of more detailed analysis. Since we only used firewall rules in this initial study, we also plan to use other features, such as the hierarchical location of the firewall rules and network-related metadata.
If you’re interested in contributing to the Forseti intelligent agents initiative, you can play around with any sample inventory data (or even your own), helping us generate broader anomaly detection mechanisms. By enlisting the community’s help with intelligent agents, we hope to continue to expand the Forseti toolset to help ensure the security of your cloud environment.
For more details about this initiative, check out the solution here.
Joe Cheuk, Cloud Application Engineer; Praneet Dutta, Cloud Machine Learning Engineer; and Nitin Aggarwal, Technical Program Manager, Cloud Machine Learning, contributed to this report.