Use Sensitive Data Protection data in context-aware analytics

This document demonstrates how to use entity context data from Sensitive Data Protection and additional log sources to add contextual understanding about the impact and scope of a potential threat when performing an investigation.

The use case described in this document detects the execution of a malicious file by a user (MITRE ATT&CK Technique T1204.002) and whether that user also has access to sensitive data elsewhere on the network.

This example requires that the following data has been ingested and normalized in Google Security Operations:

User activity data using network and EDR logs.
Resource relationships from data sources like Google Cloud IAM Analysis.
Sensitive Data Protection logs that contain labels about the type and sensitivity of the stored data.

Google Security Operations must be able to parse the raw data into Unified Data Model (UDM) entity and event records.

For information about ingesting Sensitive Data Protection data into Google Security Operations, see Exporting Sensitive Data Protection data to Google Security Operations.

Google Cloud IAM Analysis data

The Google Cloud IAM Analysis log data in this example identifies users in the organization and captures relationships each user has to other systems on the network. The following is a snippet of an IAM Analysis log stored as a UDM entity record. It stores information about the user, mikeross, who administers a BigQuery table called analytics:claim.patients.

metadata.vendor_name: "Google Cloud Platform"
metadata.product_name: "GCP IAM Analysis"
metadata.entity_type: "USER"
entity.user.userid: "mikeross"
relations[2].entity.resource.name: "analytics:claim.patients"
relations[2].entity.resource.resource_type: "TABLE"
relations[2].entity_type: "RESOURCE"
relations[2].relationship: "ADMINISTERS"

Sensitive Data Protection data

The Sensitive Data Protection log data in this example stores information about a BigQuery table. The following is a snippet of a Sensitive Data Protection log stored as a UDM entity record. It represents the BigQuery table called analytics:claim.patients with the Predicted InfoType label US_SOCIAL_SECURITY_NUMBER, indicating that the table stores United States Social Security numbers.

metadata.vendor_name: "Google Cloud Platform"
metadata.product_name: "GCP DLP CONTEXT"
metadata.entity_type: "RESOURCE"
metadata.description: "RISK_HIGH"
entity.resource.resource_type: "TABLE"
entity.resource.resource_subtype: "BigQuery Table"
entity.resource.attribute.cloud.environment"GOOGLE_CLOUD_PLATFORM"
entity.resource.attribute.labels[0].key: "Sensitivity Score"
entity.resource.attribute.labels[0].value: "SENSITIVITY_HIGH"
entity.resource.attribute.labels[1].key: "Predicted InfoType"
entity.resource.attribute.labels[1].value: "US_SOCIAL_SECURITY_NUMBER"
entity.resource.product_object_id: "analytics:claim.patients"

Web proxy events

The web proxy event in this example captures network activity. The following snippet is of a Zscaler web proxy log stored as a UDM event record. It captures a network download event of an executable file by user with the userid value mikeross where the received_bytes value is 514605.

metadata.log_type = "ZSCALER_WEBPROXY"
metadata.product_name = "NSS"
metadata.vendor_name = "Zscaler"
metadata.event_type = "NETWORK_HTTP"
network.http.response_code = 200
network.received_bytes = 514605
principal.user.userid = "mikeross"
target.url = "http://manygoodnews.com/dow/Client%20Update.exe"

EDR events

The EDR event in this example captures activity on an endpoint device. The following snippet is of a CrowdStrike Falcon EDR log stored as a UDM event record. It captures a network event involving the Microsoft Excel application and a user with the userid value mikeross.

metadata.log_type = "CS_EDR"
metadata.product_name = "Falcon"
metadata.vendor_name = "Crowdstrike"
metadata.event_type = "NETWORK_HTTP"
target.process.file.full_path = "\\Device\\HarddiskVolume1\\Program Files\\C:\\Program Files\\Microsoft Office\\Office16\\EXCEL.exe"
target.url = "http://manygoodnews.com/dow/Client%20Update.exe"
target.user.userid = "mikeross"

Notice that there is common information across these records, both the user identifier mikeross and table name, analytics:claim.patients. The next section in this document demonstrates how these values are used in the rule to join the records.

Detection engine rule in this example

This example rule detects the execution of a malicious file by a user (MITRE ATT&CK Technique T1204.002.

The rule assigns a higher risk score to a detection when the user also has access to sensitive data elsewhere on the network. The rule correlates the following information:

User activity, such as the download or launch of an executable.
The relationship between resources, for example the user's relationship to a BigQuery table.
Presence of sensitive information in the resource a user has access to, for example the type of data stored in the BigQuery table.

Here is a description of each section in the example rule.

The events section specifies the pattern of data that the rule looks for and includes the following:
- Group 1 and Group 2 identify network and EDR events that capture the download of a large amount of data or an executable that is also related to activity in the Excel application.
- Group 3 identifies records where the user identified in the network and EDR events also has permission to a BigQuery table.
- Group 4 identifies Sensitive Data Protection records for the BigQuery table that the user has access to.
Each group of expressions uses either the $table_name variable or the $user variable to join records related to the same user and database table.
In the outcome section, the rule creates a $risk_score variable and sets a value based on the sensitivity of the data in the table. In this case, it checks whether the data is labeled with the US_SOCIAL_SECURITY_NUMBER Sensitive Data Protection infoType.

The outcome section also sets additional variables such as $principalHostname and $entity_resource_name. These variables are returned and stored with the detection, so that when you view it in Google Security Operations you can also display the variable values as columns.
The condition section indicates that the pattern looks for all UDM records specified in the events section.

  rule high_risk_user_download_executable_from_macro {
 meta:
   author = "Google Cloud Security Demos"
   description = "Executable downloaded by Microsoft Excel from High Risk User"
   severity = "High"
   technique = "T1204.002"

 events:
   //Group 1. identify a proxy event with suspected executable download
   $proxy_event.principal.user.userid = $user
   $proxy_event.target.url =  /.*\.exe$/ or
   $proxy_event.network.received_bytes > 102400

   //Group 2. correlate with an EDR event indicating Excel activity
   $edr_event.target.user.userid  = $user
   $edr_event.target.process.file.full_path = /excel/ nocase
   $edr_event.metadata.event_type = "NETWORK_HTTP"

   //Group 3. Use the entity to find the permissions
   $user_entity.graph.entity.user.userid = $user
   $user_entity.graph.relations.entity.resource.name = $table_name

   //Group 4. the entity is from Cloud DLP data
   $table_context.graph.entity.resource.product_object_id = $table_name
   $table_context.graph.metadata.product_name = "GCP DLP CONTEXT"

 match:
    $user over 5m

 outcome:
   //calculate risk score
   $risk_score = max(
       if( $table_context.graph.entity.resource.attribute.labels.value = "US_SOCIAL_SECURITY_NUMBER", 80)
       )
   $technique = array_distinct("T1204.002")
   $principalHostname = array_distinct($proxy_event.principal.hostname)
   $principalIp = array_distinct($proxy_event.principal.ip)
   $principalMac = array_distinct($proxy_event.principal.mac)
   $targetHostname = array_distinct($proxy_event.target.hostname)
   $target_url = array_distinct($proxy_event.target.url)
   $targetIp = array_distinct($proxy_event.target.ip)
   $principalUserUserid =  array_distinct($proxy_event.principal.user.userid)
   $entity_resource_name = array_distinct($table_context.graph.entity.resource.name)

condition:
   $proxy_event and $edr_event and $user_entity and $table_context
}

About the detection

If you test the rule against existing data and it identifies the pattern of activity specified in the definition, it generates a detection. The Detection panel displays the detection generated after testing the rule. The Detection panel also displays the event and entity records that caused the rule to create a detection. In this example, the following records are displayed:

Google Cloud IAM Analysis UDM entity
Sensitive Data Protection UDM entity
Zscaler web proxy UDM event
CrowdStrike Falcon EDR UDM event

In the Detection panel, select any event or entity record to see details.

The detection also stores the variables defined in the outcome section of the rule. To display the variables in the Detection panel, select Columns, and then select one or more variable names from the Columns menu. The selected columns appear in the Detection panel.

What's next

To write custom rules, see Overview of the YARA-L 2.0 language.

To create custom context-aware analytics, see Create context-aware analytics

To use predefined threat analytics, see Using Google Security Operations curated detections.