How Google Security Operations enriches event and entity data

This document describes how Google Security Operations enriches data and the Unified Data Model (UDM) fields where data is stored.

To enable a security investigation, Google Security Operations ingests contextual data from different sources, performs analysis on the data, and provides additional context about artifacts in a customer environment. Analysts can use contextually enriched data in Detection Engine rules, investigative searches, or reports.

Google Security Operations performs the following types of enrichment:

  • Enriches entities by using the entity graph and merging.
  • Calculates and enriches each entity with a prevalence statistic that indicates its popularity in the environment.
  • Calculates the first time and the most recent time that certain entity types were seen in the environment.
  • Enriches entities with information from Safe Browsing threat lists.
  • Enriches events with geolocation data.
  • Enriches entities with WHOIS data.
  • Enriches events with VirusTotal file metadata.
  • Enriches entities with VirusTotal relationship data.
  • Ingests and stores Google Cloud Threat Intelligence data.

Enriched data from WHOIS, Safe Browsing, Google Cloud Threat Intelligence (GCTI), VirusTotal metadata, and VirusTotal relationship sources is identified by event_type, product_name, and vendor_name. When you create a rule that uses this enriched data, we recommend that you include a filter that identifies the specific enrichment type. This filter helps improve rule performance. For example, include the following filter fields in the events section of a rule that joins WHOIS data:

$enrichment.graph.metadata.entity_type = "DOMAIN_NAME"
$enrichment.graph.metadata.product_name = "WHOISXMLAPI Simple Whois"
$enrichment.graph.metadata.vendor_name = "WHOIS"

Enrich entities by using the entity graph and merging

The entity graph identifies relationships between entities and resources in your environment. When entities from different sources are ingested into Google Security Operations, the entity graph maintains an adjacency list based on the relationship between the entities. The entity graph performs context enrichment by performing deduplication and merging.

During deduplication, redundant data is eliminated and intervals are formed to create a common entity. For example, consider two entities, e1 and e2, with timestamps t1 and t2 respectively. The entities e1 and e2 are deduplicated, and the differing timestamps are ignored. The following fields are not used during deduplication:

  • collected_timestamp
  • creation_timestamp
  • interval

During merging, relationships between entities are formed for a time interval of one day. For example, consider an entity record of user A who has access to a Cloud Storage bucket. There is another entity record of user A who owns a device. After merging, these two records result in a single entity, user A, that has two relations: user A has access to the Cloud Storage bucket, and user A owns the device. Google Security Operations performs a five-day lookback when it creates entity context data. This handles late-arriving data and creates an implicit time to live on entity context data.
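
As a rough illustration of the deduplication and merging described above, the following Python sketch models entity records as dictionaries. The record layout, the helper names (dedup_key, deduplicate, merge_by_user), and the relation strings are hypothetical. The sketch also assumes that the later of two duplicate records is kept. It approximates the described behavior and is not the product's implementation.

```python
import json

# Fields that the text says are not used during deduplication.
IGNORED_FIELDS = {"collected_timestamp", "creation_timestamp", "interval"}


def dedup_key(record):
    """Build a comparison key from every field except the ignored ones."""
    kept = {k: v for k, v in record.items() if k not in IGNORED_FIELDS}
    return json.dumps(kept, sort_keys=True)


def deduplicate(records):
    """Keep one record per key (assumption: a later record replaces an earlier one)."""
    unique = {}
    for record in records:
        unique[dedup_key(record)] = record
    return list(unique.values())


def merge_by_user(records):
    """Merge records that share a userid into one entity holding all relations."""
    merged = {}
    for record in records:
        entity = merged.setdefault(
            record["userid"], {"userid": record["userid"], "relations": []}
        )
        for relation in record["relations"]:
            if relation not in entity["relations"]:
                entity["relations"].append(relation)
    return list(merged.values())


# Hypothetical records: the first two differ only in collected_timestamp.
records = [
    {"userid": "userA", "relations": ["has access to bucket-1"], "collected_timestamp": "t1"},
    {"userid": "userA", "relations": ["has access to bucket-1"], "collected_timestamp": "t2"},
    {"userid": "userA", "relations": ["owns device-1"], "collected_timestamp": "t1"},
]

deduplicated = deduplicate(records)
merged = merge_by_user(deduplicated)
print(len(deduplicated))
print(merged)
```

In this sketch, the two bucket records collapse into one because they differ only in an ignored timestamp, and merging then yields a single user A entity with both relations.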

Google Security Operations uses aliasing to enrich the telemetry data and uses entity graphs to enrich the entities. The detection engine rules join the merged entities against the enriched telemetry data to provide context-aware analytics.

An event that contains an entity noun is considered an entity. Here are some event types and their corresponding entity types:

  • ASSET_CONTEXT corresponds to ASSET.
  • RESOURCE_CONTEXT corresponds to RESOURCE.
  • USER_CONTEXT corresponds to USER.
  • GROUP_CONTEXT corresponds to GROUP.

The entity graph distinguishes between contextual data and indicators of compromise (IOC) using the threat information.

When you use contextually enriched data, consider the following entity graph behavior:

  • Don't add intervals in the entity, and instead let the entity graph create intervals. This is because intervals are generated during deduplication unless otherwise specified.
  • If the intervals are specified, only the same events are deduplicated, and the most recent entity is retained.
  • To ensure that live rules and retrohunts work as expected, entities must be ingested at least once daily.
  • If entities are ingested only once every two or more days instead of daily, live rules might work as expected, but retrohunts might lose the context of the event.
  • If entities are ingested more than once daily, then the entity is deduplicated to a single entity.
  • If the event data is missing for a day, the data from the previous day is used temporarily so that live rules continue to work as expected.

The entity graph also merges events having similar identifiers to get a consolidated view of the data. This merging happens based on the following list of identifiers:

  • Asset
    • entity.asset.product_object_id
    • entity.asset.hostname
    • entity.asset.asset_id
    • entity.asset.mac
  • User
    • entity.user.product_object_id
    • entity.user.userid
    • entity.user.windows_sid
    • entity.user.email_addresses
    • entity.user.employee_id
  • Resource
    • entity.resource.product_object_id
    • entity.resource.name
  • Group
    • entity.group.product_object_id
    • entity.group.email_addresses
    • entity.group.windows_sid

Calculate prevalence statistics

Google Security Operations performs statistical analysis on existing and incoming data and enriches entity context records with prevalence-related metrics.

Prevalence is a numeric value that indicates how popular an entity is. Popularity is defined by the number of assets that access an artifact, such as a domain, file hash, or IP address. The larger the number, the more popular the entity. For example, google.com has high prevalence values because it is accessed frequently. A domain that is accessed infrequently has lower prevalence values. More popular entities are usually less likely to be malicious.

These enriched values are supported for domain, IP address, and file (hash) entities.

Prevalence statistics for each entity are updated each day. Values are stored in a separate entity context that can be used by Detection Engine, but is not shown in Google Security Operations investigative views and UDM search.

The following fields can be used when creating Detection Engine rules.

  • Domain
    • entity.domain.prevalence.day_count
    • entity.domain.prevalence.day_max
    • entity.domain.prevalence.day_max_sub_domains
    • entity.domain.prevalence.rolling_max
    • entity.domain.prevalence.rolling_max_sub_domains
  • File (hash)
    • entity.file.prevalence.day_count
    • entity.file.prevalence.day_max
    • entity.file.prevalence.rolling_max
  • IP address
    • entity.artifact.prevalence.day_count
    • entity.artifact.prevalence.day_max
    • entity.artifact.prevalence.rolling_max

The day_max and rolling_max values are calculated differently. The fields are calculated as follows:

  • day_max is calculated as the maximum prevalence score for the artifact during the day, where a day is defined as 12:00:00 AM - 11:59:59 PM UTC.
  • rolling_max is calculated as the maximum per-day prevalence score (that is, day_max) for the artifact over the previous 10-day window.
  • day_count is used to calculate rolling_max and is always the value 10.

When calculated for a domain, the difference between day_max and day_max_sub_domains (and between rolling_max and rolling_max_sub_domains) is as follows:

  • rolling_max and day_max represent the number of daily unique internal IP addresses accessing a given domain (excluding subdomains).
  • rolling_max_sub_domains and day_max_sub_domains represent the number of unique internal IP addresses accessing a given domain (including subdomains).
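
To make the relationship between these fields concrete, the following Python sketch computes daily unique internal IP counts for a domain, with and without subdomains, and then a rolling maximum. The access records, the function names, and the choice to treat the 10-day window as ending on (and including) the given day are assumptions made for illustration. The sketch also simplifies day_max to the day's unique-IP count. It is not the product's calculation.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 10  # matches the day_count value described above


def is_same_or_subdomain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def daily_unique_ips(accesses, domain, include_subdomains):
    """Count unique internal IPs per UTC day that accessed the domain."""
    ips_by_day = defaultdict(set)
    for day, internal_ip, host in accesses:
        if include_subdomains:
            matches = is_same_or_subdomain(host, domain)
        else:
            matches = host == domain
        if matches:
            ips_by_day[day].add(internal_ip)
    return {day: len(ips) for day, ips in ips_by_day.items()}


def rolling_max(day_values, as_of):
    """Maximum daily value over the WINDOW_DAYS days ending on as_of (assumed inclusive)."""
    window = [as_of - timedelta(days=offset) for offset in range(WINDOW_DAYS)]
    return max(day_values.get(day, 0) for day in window)


# Hypothetical access records: (UTC day, internal IP, host name accessed).
accesses = [
    (date(2024, 1, 1), "10.0.0.1", "example.com"),
    (date(2024, 1, 1), "10.0.0.2", "example.com"),
    (date(2024, 1, 1), "10.0.0.3", "mail.example.com"),
    (date(2024, 1, 5), "10.0.0.1", "example.com"),
    (date(2024, 1, 5), "10.0.0.1", "example.com"),
]

day_max = daily_unique_ips(accesses, "example.com", include_subdomains=False)
day_max_sub_domains = daily_unique_ips(accesses, "example.com", include_subdomains=True)

print(day_max[date(2024, 1, 5)])
print(rolling_max(day_max, date(2024, 1, 5)))
print(rolling_max(day_max_sub_domains, date(2024, 1, 5)))
print(rolling_max(day_max, date(2024, 1, 12)))
```

In this sketch, the rolling value on January 5 still reflects the busier January 1, and it drops once January 1 falls outside the window.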

Prevalence statistics are calculated on newly ingested entity data. Calculations are not performed retroactively on previously ingested data. It takes approximately 36 hours for the statistics to be calculated and stored.

Calculate the first-seen and last-seen time of entities

Google Security Operations performs statistical analysis on incoming data and enriches entity context records with the first-seen and last-seen times of an entity. The first_seen_time field stores the date and time when the entity was first seen in the customer environment. The last_seen_time field stores the date and time of the most recent observation.

Because multiple indicators (UDM fields) can identify an asset or a user, the first-seen time is the first time any of the indicators that identify the user or asset was seen in the customer environment.

All UDM fields that describe an asset are the following:

  • entity.asset.hostname
  • entity.asset.ip
  • entity.asset.mac
  • entity.asset.asset_id
  • entity.asset.product_object_id

All UDM fields that describe a user are the following:

  • entity.user.windows_sid
  • entity.user.product_object_id
  • entity.user.userid
  • entity.user.employee_id
  • entity.user.email_addresses
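
The following Python sketch illustrates the rule that the first-seen time of an asset or user is the earliest time that any of its identifying indicators was seen. The observation tuples, the set of indicators assumed to belong to one asset, and the first_seen_time helper are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone

# Hypothetical observations: (UDM indicator field, value, time seen).
observations = [
    ("entity.asset.hostname", "host-1", datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)),
    ("entity.asset.mac", "00:11:22:33:44:55", datetime(2024, 3, 1, 8, 30, tzinfo=timezone.utc)),
    ("entity.asset.ip", "10.0.0.7", datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)),
    ("entity.asset.hostname", "other-host", datetime(2024, 2, 1, 0, 0, tzinfo=timezone.utc)),
]

# Indicators assumed to identify the same asset.
asset_indicators = {
    ("entity.asset.hostname", "host-1"),
    ("entity.asset.mac", "00:11:22:33:44:55"),
    ("entity.asset.ip", "10.0.0.7"),
}


def first_seen_time(observations, indicators):
    """Return the earliest time any identifying indicator was observed, or None."""
    times = [seen for field, value, seen in observations if (field, value) in indicators]
    return min(times) if times else None


first_seen = first_seen_time(observations, asset_indicators)
print(first_seen.isoformat())
```

Here the MAC address was observed before the hostname or IP address, so its timestamp becomes the first-seen time for the asset. The earlier observation of other-host is ignored because it does not identify this asset.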

The first-seen time and last-seen time enable an analyst to correlate certain activity that occurred after a domain, file (hash), asset, user, or IP address was first seen or that stopped occurring after the domain, file (hash), or IP address was last seen.

The first_seen_time and last_seen_time fields are populated for entities that describe a domain, IP address, or file (hash). For entities that describe a user or asset, only the first_seen_time field is populated. These values are not calculated for entities that describe other types, such as a group or resource.

The statistics are calculated for each entity across all namespaces. Google Security Operations does not calculate the statistics for each entity within individual namespaces. These statistics are not currently exported to the Google Security Operations events schema in BigQuery.

The enriched values are calculated and stored in the following UDM fields:

  • Domain
    • entity.domain.first_seen_time
    • entity.domain.last_seen_time
  • File (hash)
    • entity.file.first_seen_time
    • entity.file.last_seen_time
  • IP address
    • entity.artifact.first_seen_time
    • entity.artifact.last_seen_time
  • Asset
    • entity.asset.first_seen_time
  • User
    • entity.user.first_seen_time

Enrich events with geolocation data

Incoming log data can include external IP addresses without corresponding location information. This is common when an event is logging information about device activity that is not in an enterprise network. For example, a login event to a cloud service would contain a source or client IP address based on the external IP address of a device returned by the carrier NAT.

Google Security Operations provides geolocation-enriched data for external IP addresses to enable more powerful rule detections and greater context for investigations. For example, Google Security Operations might use an external IP address to enrich the event with information about the country (such as the United States), a specific state (such as Alaska), and the network the IP address is in (such as the ASN and carrier name).

Google Security Operations uses location data supplied by Google to provide an approximate geographic location and network information for an IP address. You can write Detection Engine rules against these fields in the events. The enriched event data is also exported to BigQuery where it can be used in Google Security Operations dashboards and reporting.

The following IP addresses are not enriched:

  • RFC 1918 private IP address spaces because they are internal to the enterprise network.
  • RFC 5771 multicast IP address space because multicast addresses do not belong to a single location.
  • IPv6 Unique Local addresses.
  • Google Cloud service IP addresses. Exceptions are Google Cloud Compute Engine external IP addresses, which are enriched.
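
The address-range exclusions above can be approximated with Python's standard ipaddress module. This is a minimal sketch: the is_geo_enrichment_candidate function name is hypothetical, the CIDR blocks are the standard ranges for the RFCs named in the list, and Google Cloud service addresses are not modeled because they cannot be derived from the address alone.

```python
import ipaddress

# Ranges named in the list above. Google Cloud service ranges are
# intentionally not modeled in this sketch.
NOT_ENRICHED = [
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918 private space
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918 private space
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918 private space
    ipaddress.ip_network("224.0.0.0/4"),     # RFC 5771 IPv4 multicast
    ipaddress.ip_network("fc00::/7"),        # IPv6 Unique Local addresses
]


def is_geo_enrichment_candidate(ip_text):
    """Return False if the address falls in one of the excluded ranges."""
    address = ipaddress.ip_address(ip_text)
    return not any(
        address.version == network.version and address in network
        for network in NOT_ENRICHED
    )


for candidate in ["192.168.1.10", "224.0.0.251", "fd12:3456::1", "203.0.113.5"]:
    print(candidate, is_geo_enrichment_candidate(candidate))
```

Only the last address in the loop is outside every excluded range, so it is the only one that this sketch treats as a candidate for enrichment.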

Google Security Operations enriches the following UDM fields with geolocation data:

  • principal
  • target
  • src
  • observer

Type of data UDM field
Location (for example, United States) ( principal | target | src | observer ).ip_geo_artifact.location.country_or_region
State (for example, New York) ( principal | target | src | observer ).ip_geo_artifact.location.state
Longitude ( principal | target | src | observer ).ip_geo_artifact.location.region_coordinates.longitude
Latitude ( principal | target | src | observer ).ip_geo_artifact.location.region_coordinates.latitude
ASN (autonomous system number) ( principal | target | src | observer ).ip_geo_artifact.network.asn
Carrier name ( principal | target | src | observer ).ip_geo_artifact.network.carrier_name
DNS domain ( principal | target | src | observer ).ip_geo_artifact.network.dns_domain
Organization name ( principal | target | src | observer ).ip_geo_artifact.network.organization_name

The following example shows the type of geographic information that would be added to a UDM event with an IP address tagged to the Netherlands:

UDM field Value
principal.ip_geo_artifact.location.country_or_region Netherlands
principal.ip_geo_artifact.location.region_coordinates.latitude 52.132633
principal.ip_geo_artifact.location.region_coordinates.longitude 5.291266
principal.ip_geo_artifact.network.asn 8455
principal.ip_geo_artifact.network.carrier_name schuberg philis

Inconsistencies

Google's proprietary IP geolocation technology uses a combination of networking data and other inputs and methods to provide IP address location and network resolution for our users. Other organizations may use different signals or methods, which might occasionally lead to different results.

If you notice an inconsistency in the IP geolocation results that Google provides, open a customer support case so that we can investigate and, if appropriate, correct our records going forward.

Enrich entities with information from Safe Browsing threat lists

Google Security Operations ingests data from Safe Browsing related to file hashes. The data for each file is stored as an entity and provides additional context about the file. Analysts can create Detection Engine rules that query against this entity context data to build context-aware analytics.

The following information is stored with the entity context record.

UDM field Description
entity.metadata.product_entity_id A unique identifier for the entity.
entity.metadata.entity_type This value is FILE, indicating that the entity describes a file.
entity.metadata.collected_timestamp The date and time that the entity was observed or the event occurred.
entity.metadata.interval Stores the start time and end time for which this data is valid. Because threat list content changes over time, the start_time and end_time reflect the time interval during which the data about the entity is valid. For example, a file hash was observed to be malicious or suspicious between start_time and end_time.
entity.metadata.threat.category This is the Google Security Operations SecurityCategory. This is set to one or more of the following values:
  • SOFTWARE_MALICIOUS: indicates that the threat is related to malware.
  • SOFTWARE_PUA: indicates that the threat is related to unwanted software.
entity.metadata.threat.severity This is the Google Security Operations ProductSeverity. If the value is CRITICAL, this indicates the artifact appears malicious. If the value is not specified, there is not enough confidence to indicate that the artifact is malicious.
entity.metadata.product_name Stores the value Google Safe Browsing.
entity.file.sha256 The SHA256 hash value for the file.

Enrich entities with WHOIS data

Google Security Operations ingests WHOIS data daily. During the ingestion of incoming customer device data, Google Security Operations evaluates domains in customer data against the WHOIS data. When there is a match, Google Security Operations stores the related WHOIS data with the entity record for the domain. For each entity, where entity.metadata.entity_type = DOMAIN_NAME, Google Security Operations enriches the entity with information from WHOIS.

Google Security Operations populates enriched WHOIS data into the following fields in the entity record:

  • entity.domain.admin.attribute.labels
  • entity.domain.audit_update_time
  • entity.domain.billing.attribute.labels
  • entity.domain.billing.office_address.country_or_region
  • entity.domain.contact_email
  • entity.domain.creation_time
  • entity.domain.expiration_time
  • entity.domain.iana_registrar_id
  • entity.domain.name_server
  • entity.domain.private_registration
  • entity.domain.registrant.company_name
  • entity.domain.registrant.office_address.state
  • entity.domain.registrant.office_address.country_or_region
  • entity.domain.registrant.email_addresses
  • entity.domain.registrant.user_display_name
  • entity.domain.registrar
  • entity.domain.registry_data_raw_text
  • entity.domain.status
  • entity.domain.tech.attribute.labels
  • entity.domain.update_time
  • entity.domain.whois_record_raw_text
  • entity.domain.whois_server
  • entity.domain.zone

For a description of these fields, see the Unified Data Model field list document.

Ingest and store Google Cloud Threat Intelligence data

Google Security Operations ingests data from Google Cloud Threat Intelligence (GCTI) data sources that provide you with contextual information you can use when investigating activity in your environment. You can query the following data sources:

  • GCTI Tor Exit Nodes: IP addresses that are known Tor exit nodes.
  • GCTI Benign Binaries: files that are either part of the operating system original distribution or were updated by an official operating system patch. Some official operating system binaries that have been abused by an adversary through activity common in living-off-the-land attacks are excluded from this data source, such as those focused on initial entry vectors.
  • GCTI Remote Access Tools: files that have frequently been used by malicious actors. These tools are generally legitimate applications that are sometimes abused to remotely connect to compromised systems.

This contextual data is stored globally as entities. You can query the data using Detection Engine rules. Include the following UDM fields and values in the rule to query these global entities:

  • graph.metadata.vendor_name = Google Cloud Threat Intelligence

  • graph.metadata.product_name = GCTI Feed

In this document, the placeholder <variable_name> represents the unique variable name used in a rule to identify a UDM record.

Timed versus timeless Google Cloud Threat Intelligence data sources

Google Cloud Threat Intelligence data sources are either timed or timeless.

Timed data sources have a time range associated with each entry. This means that if a detection is generated on day 1, the same detection is expected to be generated for day 1 during a retrohunt that runs on any later day.

Timeless data sources have no time range associated with them because only the latest set of data should be considered. Timeless data sources are frequently used for data such as file hashes that are not expected to change. If no detection is generated on day 1, a detection might still be generated for day 1 during a retrohunt on day 2 because a new entry was added.
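
The difference between the two kinds of data source can be sketched in Python as follows. The entry structures, the sample values, and the timed_match and timeless_match helpers are hypothetical. They only model the matching behavior described above and do not reflect how the data is stored.

```python
from datetime import date

# Hypothetical timed entries: value -> (first valid day, last valid day).
timed_entries = {"198.51.100.7": (date(2024, 1, 1), date(2024, 1, 1))}

# Hypothetical timeless entries: only the latest set of values matters.
timeless_entries = {"hash-a"}


def timed_match(value, event_day):
    """A timed entry matches only events inside its validity range."""
    valid = timed_entries.get(value)
    return valid is not None and valid[0] <= event_day <= valid[1]


def timeless_match(value):
    """A timeless entry matches regardless of when the event occurred."""
    return value in timeless_entries


day1 = date(2024, 1, 1)

# Timed: an event from day 1 matches whenever the retrohunt runs,
# and an event from a later day does not match the day 1 entry.
timed_day1 = timed_match("198.51.100.7", day1)
timed_day2 = timed_match("198.51.100.7", date(2024, 1, 2))

# Timeless: hash-b is not listed when the rule first runs for day 1 ...
before_update = timeless_match("hash-b")
# ... but a later update adds it, so a retrohunt over day 1 now matches.
timeless_entries.add("hash-b")
after_update = timeless_match("hash-b")

print(timed_day1, timed_day2, before_update, after_update)
```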

Data about Tor exit node IP addresses

Google Security Operations ingests and stores IP addresses that are known Tor exit nodes. Tor exit nodes are points at which traffic exits the Tor network. Information ingested from this data source is stored in the following UDM fields. Data in this source is timed.

UDM field Description
<variable_name>.graph.metadata.vendor_name Stores the value Google Cloud Threat Intelligence.
<variable_name>.graph.metadata.product_name Stores the value GCTI Feed.
<variable_name>.graph.metadata.threat.threat_feed_name Stores the value Tor Exit Nodes.
<variable_name>.graph.entity.artifact.ip Stores the IP address ingested from the GCTI data source.

Data about benign operating system files

Google Security Operations ingests and stores file hashes from the GCTI Benign Binaries data source. Information ingested from this data source is stored in the following UDM fields. Data in this source is timeless.

UDM field Description
<variable_name>.graph.metadata.vendor_name Stores the value Google Cloud Threat Intelligence.
<variable_name>.graph.metadata.product_name Stores the value GCTI Feed.
<variable_name>.graph.metadata.threat.threat_feed_name Stores the value Benign Binaries.
<variable_name>.graph.entity.file.sha256 Stores the SHA256 hash value of the file.
<variable_name>.graph.entity.file.sha1 Stores the SHA1 hash value of the file.
<variable_name>.graph.entity.file.md5 Stores the MD5 hash value of the file.

Data about remote access tools

The GCTI Remote Access Tools data source includes file hashes for known remote access tools, such as VNC clients, that have frequently been used by malicious actors. These tools are generally legitimate applications that are sometimes abused to remotely connect to compromised systems. Information ingested from this data source is stored in the following UDM fields. Data in this source is timeless.

UDM field Description
<variable_name>.graph.metadata.vendor_name Stores the value Google Cloud Threat Intelligence.
<variable_name>.graph.metadata.product_name Stores the value GCTI Feed.
<variable_name>.graph.metadata.threat.threat_feed_name Stores the value Remote Access Tools.
<variable_name>.graph.entity.file.sha256 Stores the SHA256 hash value of the file.
<variable_name>.graph.entity.file.sha1 Stores the SHA1 hash value of the file.
<variable_name>.graph.entity.file.md5 Stores the MD5 hash value of the file.

Enrich events with VirusTotal file metadata

Google Security Operations enriches file hashes into UDM events and provides additional context during an investigation. UDM events are enriched through hash aliasing in a customer environment. Hash aliasing combines all types of file hashes and provides information about a file hash during a search.

The integration of VirusTotal file metadata and relationship enrichment with Google Security Operations can be used to identify patterns of malicious activity and to track malware movements across a network.

A raw log provides limited information about the file. VirusTotal enriches the event with file metadata, providing known-bad hashes along with metadata about the malicious file. The metadata includes information such as filenames, types, imported functions, and tags. You can use this information in UDM search and in Detection Engine rules written in YARA-L to understand malicious file events, and more generally during threat hunting. An example use case is to detect any modifications to the original file which would, in turn, import the file metadata for threat detection.

The following information is stored with the record. For a list of all UDM fields, see Unified Data Model field list.

Type of data UDM field
SHA-256 ( principal | target | src | observer ).file.sha256
MD5 ( principal | target | src | observer ).file.md5
SHA-1 ( principal | target | src | observer ).file.sha1
Size ( principal | target | src | observer ).file.size
ssdeep ( principal | target | src | observer ).file.ssdeep
vhash ( principal | target | src | observer ).file.vhash
authentihash ( principal | target | src | observer ).file.authentihash
File type ( principal | target | src | observer ).file.file_type
Tags ( principal | target | src | observer ).file.tags
Capabilities tags ( principal | target | src | observer ).file.capabilities_tags
Names ( principal | target | src | observer ).file.names
First-seen time ( principal | target | src | observer ).file.first_seen_time
Last-seen time ( principal | target | src | observer ).file.last_seen_time
Last modification time ( principal | target | src | observer ).file.last_modification_time
Last analysis time ( principal | target | src | observer ).file.last_analysis_time
Embedded URLs ( principal | target | src | observer ).file.embedded_urls
Embedded IPs ( principal | target | src | observer ).file.embedded_ips
Embedded domains ( principal | target | src | observer ).file.embedded_domains
Signature information ( principal | target | src | observer ).file.signature_info
Signature information > Sigcheck ( principal | target | src | observer ).file.signature_info.sigcheck
Signature information > Sigcheck > Verification message ( principal | target | src | observer ).file.signature_info.sigcheck.verification_message
Signature information > Sigcheck > Verified ( principal | target | src | observer ).file.signature_info.sigcheck.verified
Signature information > Sigcheck > Signers ( principal | target | src | observer ).file.signature_info.sigcheck.signers
Signature information > Sigcheck > Signers > Name ( principal | target | src | observer ).file.signature_info.sigcheck.signers.name
Signature information > Sigcheck > Signers > Status ( principal | target | src | observer ).file.signature_info.sigcheck.signers.status
Signature information > Sigcheck > Signers > Valid usage for certificate ( principal | target | src | observer ).file.signature_info.sigcheck.signers.valid_usage
Signature information > Sigcheck > Signers > Certificate issuer ( principal | target | src | observer ).file.signature_info.sigcheck.signers.cert_issuer
Signature information > Sigcheck > X509 ( principal | target | src | observer ).file.signature_info.sigcheck.x509
Signature information > Sigcheck > X509 > Name ( principal | target | src | observer ).file.signature_info.sigcheck.x509.name
Signature information > Sigcheck > X509 > Algorithm ( principal | target | src | observer ).file.signature_info.sigcheck.x509.algorithm
Signature information > Sigcheck > X509 > Thumbprint ( principal | target | src | observer ).file.signature_info.sigcheck.x509.thumbprint
Signature information > Sigcheck > X509 > Certificate issuer ( principal | target | src | observer ).file.signature_info.sigcheck.x509.cert_issuer
Signature information > Sigcheck > X509 > Serial number ( principal | target | src | observer ).file.signature_info.sigcheck.x509.serial_number
Signature information > Codesign ( principal | target | src | observer ).file.signature_info.codesign
Signature information > Codesign > ID ( principal | target | src | observer ).file.signature_info.codesign.id
Signature information > Codesign > Format ( principal | target | src | observer ).file.signature_info.codesign.format
Signature information > Codesign > Compilation time ( principal | target | src | observer ).file.signature_info.codesign.compilation_time
Exiftool information ( principal | target | src | observer ).file.exif_info
Exiftool information > Original file name ( principal | target | src | observer ).file.exif_info.original_file
Exiftool information > Product name ( principal | target | src | observer ).file.exif_info.product
Exiftool information > Company name ( principal | target | src | observer ).file.exif_info.company
Exiftool information > File description ( principal | target | src | observer ).file.exif_info.file_description
Exiftool information > Entry point ( principal | target | src | observer ).file.exif_info.entry_point
Exiftool information > Compilation time ( principal | target | src | observer ).file.exif_info.compilation_time
PDF information ( principal | target | src | observer ).file.pdf_info
PDF information > Number of /JS tags ( principal | target | src | observer ).file.pdf_info.js
PDF information > Number of /JavaScript tags ( principal | target | src | observer ).file.pdf_info.javascript
PDF information > Number of /Launch tags ( principal | target | src | observer ).file.pdf_info.launch_action_count
PDF information > Number of object streams ( principal | target | src | observer ).file.pdf_info.object_stream_count
PDF information > Number of object definitions (endobj keyword) ( principal | target | src | observer ).file.pdf_info.endobj_count
PDF information > PDF version ( principal | target | src | observer ).file.pdf_info.header
PDF information > Number of /AcroForm tags ( principal | target | src | observer ).file.pdf_info.acroform
PDF information > Number of /AA tags ( principal | target | src | observer ).file.pdf_info.autoaction
PDF information > Number of /EmbeddedFile tags ( principal | target | src | observer ).file.pdf_info.embedded_file
PDF information > /Encrypt tag ( principal | target | src | observer ).file.pdf_info.encrypted
PDF information > Number of /RichMedia tags ( principal | target | src | observer ).file.pdf_info.flash
PDF information > Number of /JBIG2Decode tags ( principal | target | src | observer ).file.pdf_info.jbig2_compression
PDF information > Number of object definitions (obj keyword) ( principal | target | src | observer ).file.pdf_info.obj_count
PDF information > Number of defined stream objects (endstream keyword) ( principal | target | src | observer ).file.pdf_info.endstream_count
PDF information > Number of pages in the PDF ( principal | target | src | observer ).file.pdf_info.page_count
PDF information > Number of defined stream objects (stream keyword) ( principal | target | src | observer ).file.pdf_info.stream_count
PDF information > Number of /OpenAction tags ( principal | target | src | observer ).file.pdf_info.openaction
PDF information > Number of startxref keywords ( principal | target | src | observer ).file.pdf_info.startxref
PDF information > Number of colors expressed with more than 3 bytes (CVE-2009-3459) ( principal | target | src | observer ).file.pdf_info.suspicious_colors
PDF information > Number of trailer keywords ( principal | target | src | observer ).file.pdf_info.trailer
PDF information > Number of /XFA tags found ( principal | target | src | observer ).file.pdf_info.xfa
PDF information > Number of xref keywords ( principal | target | src | observer ).file.pdf_info.xref
PE file metadata ( principal | target | src | observer ).file.pe_file
PE file metadata
  • Imphash
( principal | target | src | observer ).file.pe_file.imphash
PE file metadata
  • Entry point
( principal | target | src | observer ).file.pe_file.entry_point
PE file metadata
  • Entry point exiftool
( principal | target | src | observer ).file.pe_file.entry_point_exiftool
PE file metadata
  • Compilation time
( principal | target | src | observer ).file.pe_file.compilation_time
PE file metadata
  • Compilation exiftool time
( principal | target | src | observer ).file.pe_file.compilation_exiftool_time
PE file metadata
  • Sections
( principal | target | src | observer ).file.pe_file.section
PE file metadata
  • Sections
    • Name
( principal | target | src | observer ).file.pe_file.section.name
PE file metadata
  • Sections
    • Entropy
( principal | target | src | observer ).file.pe_file.section.entropy
PE file metadata
  • Sections
    • Raw size in bytes
( principal | target | src | observer ).file.pe_file.section.raw_size_bytes
PE file metadata
  • Sections
    • Virtual size in bytes
( principal | target | src | observer ).file.pe_file.section.virtual_size_bytes
PE file metadata
  • Sections
    • MD5 hex
( principal | target | src | observer ).file.pe_file.section.md5_hex
PE file metadata
  • Imports
( principal | target | src | observer ).file.pe_file.imports
PE file metadata
  • Imports
    • Library
( principal | target | src | observer ).file.pe_file.imports.library
PE file metadata
  • Imports
    • Functions
( principal | target | src | observer ).file.pe_file.imports.functions
PE file metadata
  • Resource information
( principal | target | src | observer ).file.pe_file.resource
PE file metadata
  • Resource information
    • SHA-256 hex
( principal | target | src | observer ).file.pe_file.resource.sha256_hex
PE file metadata
  • Resource information
    • Resource type identified by magic Python module
( principal | target | src | observer ).file.pe_file.resource.filetype_magic
PE file metadata
  • Resource information
    • Human-readable version of the language and sublanguage identifiers, as defined in the Windows PE specification
( principal | target | src | observer ).file.pe_file.resource.language_code
PE file metadata
  • Resource information
    • Entropy
( principal | target | src | observer ).file.pe_file.resource.entropy
PE file metadata
  • Resource information
    • File type
( principal | target | src | observer ).file.pe_file.resource.file_type
PE file metadata
  • Number of resources by resource type
( principal | target | src | observer ).file.pe_file.resources_type_count_str
PE file metadata
  • Number of resources by language
( principal | target | src | observer ).file.pe_file.resources_language_count_str
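
As an illustration of how the PE file metadata fields above might be used in a Detection Engine rule, the following YARA-L sketch matches file creation events whose enriched PE metadata includes a section with high entropy, which can indicate packed or encrypted content. The rule name, the choice of the FILE_CREATION event type, the use of the target noun, and the 7.0 threshold are assumptions made for this example, not recommended values.

```
rule pe_high_entropy_section_sketch {
  meta:
    description = "Illustrative only: file creation events with a high-entropy PE section"
    severity = "Low"

  events:
    $e.metadata.event_type = "FILE_CREATION"
    // section is a repeated field, so this comparison is satisfied
    // when any section of the PE file exceeds the assumed threshold.
    $e.target.file.pe_file.section.entropy > 7.0

  condition:
    $e
}
```

The same pattern applies to the other enriched fields in the table, for example comparing pe_file.imphash against a value of interest.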

Enrich entities with VirusTotal relationship data

VirusTotal helps analyze suspicious files, domains, IP addresses, and URLs to detect malware and other breaches, and share the findings with the security community. Google Security Operations ingests data about VirusTotal related connections. This data is stored as an entity and describes the relationships between file hashes and files, domains, IP addresses, and URLs.

Analysts can use this data to determine whether a file hash is malicious based on information about the related URL or domain from other sources. This information can be used to create Detection Engine rules that query the entity context data to build context-aware analytics.

This data is only available for certain VirusTotal and Google Security Operations licenses. Check your entitlements with your account manager.

The following information is stored with the entity context record:

UDM field Description
entity.metadata.product_entity_id A unique identifier for the entity
entity.metadata.entity_type Stores the value FILE, indicating that the entity describes a file
entity.metadata.interval start_time is the beginning and end_time is the end of the period for which this data is valid
entity.metadata.source_labels This field stores a list of key-value pairs of source_id and target_id for this entity. source_id is the file hash, and target_id can be the hash or the value of the URL, domain name, or IP address that this file is related to. You can search for the URL, domain name, IP address, or file at virustotal.com.
entity.metadata.product_name Stores the value 'VirusTotal Relationships'
entity.metadata.vendor_name Stores the value 'VirusTotal'
entity.file.sha256 Stores the SHA-256 hash value for the file
entity.file.relations A list of child entities that the parent file entity is related to
entity.relations.relationship This field describes the type of relationship between the parent and child entities. The value can be EXECUTES, DOWNLOADED_FROM, or CONTACTS.
entity.relations.direction Stores the value 'UNIDIRECTIONAL' and indicates the direction of the relationship with the child entity
entity.relations.entity.url The URL that the file in the parent entity contacts (if the relationship between the parent entity and the URL is CONTACTS) or the URL from which the file in the parent entity was downloaded (if the relationship between the parent entity and the URL is DOWNLOADED_FROM).
entity.relations.entity.ip A list of IP addresses that the file in the parent entity contacts or was downloaded from. The list contains only one IP address.
entity.relations.entity.domain.name The domain name that the file in the parent entity contacts or was downloaded from
entity.relations.entity.file.sha256 Stores the SHA-256 hash value for the file in the relation
entity.relations.entity_type This field contains the type of entity in the relation. The value can be URL, DOMAIN_NAME, IP_ADDRESS, or FILE. The entity.relations.entity fields are populated according to the entity_type. For example, if entity_type is URL, then entity.relations.entity.url is populated.
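
The fields in this table can be joined with events in a Detection Engine rule. The following YARA-L sketch looks for a user who makes an HTTP request to a URL and then creates a file that VirusTotal relates to that URL with a DOWNLOADED_FROM relationship. It assumes that the entity.* fields in the table are addressed in rules through the graph prefix (for example, graph.metadata.product_name, graph.entity.file.sha256, and graph.relations.relationship), following the filter example earlier in this document. The rule name, event types, event variable names, and the 5-minute match window are illustrative assumptions.

```
rule vt_relationships_downloaded_from_url_sketch {
  meta:
    description = "Illustrative only: file created after an HTTP request to a URL that VirusTotal relates to the file"

  events:
    // HTTP request made by a user to a URL.
    $http.metadata.event_type = "NETWORK_HTTP"
    $http.principal.user.userid = $userid
    $http.target.url = $url

    // File created by the same user.
    $file.metadata.event_type = "FILE_CREATION"
    $file.principal.user.userid = $userid
    $file.target.file.sha256 = $hash

    // The HTTP request happens at or before the file creation.
    $http.metadata.event_timestamp.seconds <= $file.metadata.event_timestamp.seconds

    // Filter for the VirusTotal relationship enrichment type.
    $vt.graph.metadata.entity_type = "FILE"
    $vt.graph.metadata.vendor_name = "VirusTotal"
    $vt.graph.metadata.product_name = "VirusTotal Relationships"

    // Join the entity to the events on the file hash and the URL.
    $vt.graph.entity.file.sha256 = $hash
    $vt.graph.relations.entity_type = "URL"
    $vt.graph.relations.relationship = "DOWNLOADED_FROM"
    $vt.graph.relations.entity.url = $url

  match:
    $userid over 5m

  condition:
    $http and $file and $vt
}
```

Filtering on entity_type, vendor_name, and product_name follows the recommendation earlier in this document to identify the specific enrichment type, which helps rule performance.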

What's next

For information about how to use enriched data with other Google Security Operations features, see the following: