Enrich event and entity data with Google SecOps

Supported in:

Google SecOps SIEM

This document describes how Google Security Operations enriches data and the Unified Data Model (UDM) fields where data is stored.

To enable a security investigation, Google SecOps ingests contextual data from different sources, performs analysis on the data, and provides additional context about artifacts in a customer environment. Analysts can use contextually enriched data in Detection Engine rules, investigative searches, or reports.

Google SecOps performs the following types of enrichment:

Enriches entities by using the entity graph and merging.
Calculates and enriches each entity with a prevalence statistic that indicates its popularity in the environment.
Enriches events with geolocation data.
Calculates the entity first- and last-seen timestamps.
Enriches entities with information from Safe Browsing threat lists.
Enriches entities with WHOIS data.
Enriches events with VirusTotal file metadata.
Enriches entities with VirusTotal relationship data.
Enriches events and entities from various sources within your environment (for example, Windows AD, Azure AD, Okta, Google Cloud, IAM).
Ingests and stores Google Cloud Threat Intelligence data.

The entity_type, product_name, and vendor_name parameters identify enriched data from the following sources:

Safe Browsing
WHOIS
Google Cloud Threat Intelligence (GCTI)
VirusTotal metadata
VirusTotal relationship data

When you create a rule that uses enriched data from Safe Browsing, WHOIS, GCTI, VirusTotal metadata, or VirusTotal relationship data, we recommend that you include a filter in the rule that identifies the specific enrichment type. This filter helps improve rule performance. For example, include the following filter fields in the events section of a rule that joins WHOIS data:

$enrichment.graph.metadata.entity_type = "DOMAIN_NAME"
$enrichment.graph.metadata.product_name = "WHOISXMLAPI Simple Whois"
$enrichment.graph.metadata.vendor_name = "WHOIS"

Enrich entities by using the entity graph and merging

The entity graph identifies relationships between entities and resources in your environment. When entities from different sources are ingested into Google SecOps, the entity graph maintains an adjacency list based on the relationship between the entities. The entity graph performs context enrichment by performing deduplication and merging.

During deduplication, redundant data is eliminated and intervals are formed to create a common entity. For example, consider two entities, e1 and e2, with timestamps t1 and t2 respectively. If e1 and e2 are otherwise identical, they are deduplicated into a single entity, and the differing timestamps are ignored. The following fields are not used during deduplication:

collected_timestamp
creation_timestamp
interval
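
As a rough illustration of the deduplication step, the following Python sketch treats two records as duplicates when they match on every field except the three listed above. The flat record layout, the `dedup_key` and `deduplicate` helpers, and the choice to keep the later record are assumptions made for this example; they are not the product's data model or algorithm.

```python
import json

# Fields the text says are not used during deduplication.
IGNORED_FIELDS = {"collected_timestamp", "creation_timestamp", "interval"}


def dedup_key(record: dict) -> str:
    """Build a comparison key from every field except the ignored ones."""
    kept = {k: v for k, v in record.items() if k not in IGNORED_FIELDS}
    return json.dumps(kept, sort_keys=True)


def deduplicate(records: list) -> list:
    """Keep one record per key (assumption: a later record replaces an earlier one)."""
    unique = {}
    for record in records:
        unique[dedup_key(record)] = record
    return list(unique.values())


# Hypothetical records: e1 and e2 differ only in collected_timestamp.
e1 = {"hostname": "host-1", "collected_timestamp": "2024-05-01T08:00:00Z"}
e2 = {"hostname": "host-1", "collected_timestamp": "2024-05-01T17:30:00Z"}
e3 = {"hostname": "host-2", "collected_timestamp": "2024-05-01T09:00:00Z"}

result = deduplicate([e1, e2, e3])
print(len(result))  # 2
```

Here e1 and e2 collapse into one record because their only difference is a field that is ignored during comparison.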

During merging, relationships between entities are formed for a time interval of one day. For example, consider an entity record of user A who has access to a Cloud Storage bucket. There is another entity record of user A who owns a device. After merging, these two entities result in a single entity user A that has two relations. One relation is that user A has access to the Cloud Storage bucket and the other relation is that user A owns the device. Google SecOps creates entity-context data with a five-day lookback window. This process handles late-arriving data and creates an implicit time to live for the entity-context data.
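
The user A example can be sketched as a merge keyed on a shared identifier. This is a minimal Python illustration only: the record layout, the relation names (`HAS_ACCESS_TO`, `OWNS`), and the `merge_entities` helper are hypothetical, and the one-day merge interval is not modeled.

```python
from collections import defaultdict


def merge_entities(records):
    """Group records by user ID and collect the distinct relations of each user."""
    merged = defaultdict(list)
    for record in records:
        for relation in record["relations"]:
            if relation not in merged[record["userid"]]:
                merged[record["userid"]].append(relation)
    return dict(merged)


# Two hypothetical entity records that describe the same user.
records = [
    {"userid": "user-a", "relations": [("HAS_ACCESS_TO", "storage-bucket")]},
    {"userid": "user-a", "relations": [("OWNS", "device-1")]},
]

merged = merge_entities(records)
print(merged["user-a"])
```

The result is a single entry for user A that carries both relations.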

Google SecOps uses aliasing to enrich the telemetry data and uses entity graphs to enrich the entities. The detection engine rules join the merged entities against the enriched telemetry data to provide context-aware analytics.

An event that contains an entity noun is considered an entity. Here are some event types and their corresponding entity types:

ASSET_CONTEXT corresponds to ASSET.
RESOURCE_CONTEXT corresponds to RESOURCE.
USER_CONTEXT corresponds to USER.
GROUP_CONTEXT corresponds to GROUP.

The entity graph distinguishes between contextual data and indicators of compromise (IOC) using the threat information.

When you use contextually enriched data, consider the following entity graph behavior:

Don't add intervals in the entity, and instead let the entity graph create intervals. This is because intervals are generated during deduplication unless otherwise specified.
If the intervals are specified, only the same events are deduplicated, and the most recent entity is retained.
To ensure that live rules and retrohunts work as expected, entities must be ingested at least once daily.
If entities are ingested only once every two or more days, live rules might work as expected, but retrohunts might lose the context of the event.
If entities are ingested more than once daily, then the entity is deduplicated to a single entity.
If the event data is missing for a day, the data of the past day is used temporarily to ensure that live rules work as expected.

The entity graph also merges events having similar identifiers to get a consolidated view of the data. This merging happens based on the following list of identifiers:

Asset
- entity.asset.product_object_id
- entity.asset.hostname
- entity.asset.asset_id
- entity.asset.mac
User
- entity.user.product_object_id
- entity.user.userid
- entity.user.windows_sid
- entity.user.email_addresses
- entity.user.employee_id
Resource
- entity.resource.product_object_id
- entity.resource.name
Group
- entity.group.product_object_id
- entity.group.email_addresses
- entity.group.windows_sid

Note: Google SecOps creates entity-context data with a five-day lookback window to handle late-arriving data. This process also creates an implicit time to live for the data if the end_time of the entity.metadata.interval is omitted. When there is an implicit time to live on entity-context data, Google SecOps stores the data for the full, five-day lookback, but the Google SecOps user interface displays the interval.end_time field with the timestamp truncated to the beginning of the day after the interval.start_time. For example, suppose you store an entity-context record with an interval.start_time on May 1 without the interval.end_time field. Google SecOps stores the record through May 5. However, the UI displays interval.end_time as 00:00:00 on May 2. The record is no longer visible on May 6.
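
The date arithmetic in the May 1 example can be checked with a short Python sketch. The helper names, the year 2024, and the use of UTC are assumptions chosen for illustration; the sketch only reproduces the dates described in the note.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK_DAYS = 5  # lookback window stated in the note


def displayed_end_time(start_time: datetime) -> datetime:
    """Beginning of the day after start_time, as the UI is described to show it."""
    next_day = start_time + timedelta(days=1)
    return next_day.replace(hour=0, minute=0, second=0, microsecond=0)


def last_stored_day(start_time: datetime):
    """Last calendar day on which the record is still stored."""
    return (start_time + timedelta(days=LOOKBACK_DAYS - 1)).date()


# Hypothetical record stored on May 1 without an interval.end_time.
start = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)

print(displayed_end_time(start))  # 2024-05-02 00:00:00+00:00
print(last_stored_day(start))     # 2024-05-05
```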

Calculate prevalence statistics

Google SecOps performs statistical analysis on existing and incoming data and enriches entity-context records with prevalence-related metrics.

Prevalence is a numeric value that indicates how popular an entity is. Popularity is defined by the number of assets accessing an artifact, such as a domain, file hash, or IP address. The larger the number, the more popular the entity. For example, google.com has high prevalence values because it is accessed frequently. A domain that is accessed infrequently has lower prevalence values. More popular entities are usually less likely to be malicious.

These enriched values are supported for domain, IP address, and file (hash) entities. The calculated values are stored in the UDM fields listed in the table later in this section.

Prevalence statistics for each entity are updated each day. Values are stored in a separate entity context that can be used by Detection Engine, but is not shown in Google SecOps investigative views and UDM search.

The following fields can be used when creating Detection Engine rules.

Entity type	UDM fields
Domain	`entity.domain.prevalence.day_count` `entity.domain.prevalence.day_max` `entity.domain.prevalence.day_max_sub_domains` `entity.domain.prevalence.rolling_max` `entity.domain.prevalence.rolling_max_sub_domains`
File (Hash)	`entity.file.prevalence.day_count` `entity.file.prevalence.day_max` `entity.file.prevalence.rolling_max`
IP address	`entity.artifact.prevalence.day_count` `entity.artifact.prevalence.day_max` `entity.artifact.prevalence.rolling_max`

The day_max and rolling_max values are calculated differently. The fields are calculated as follows:

day_max is calculated as the maximum prevalence score for the artifact during the day, where a day is defined as 12:00:00 AM - 11:59:59 PM UTC.
rolling_max is calculated as the maximum per-day prevalence score (that is, day_max) for the artifact over the previous 10-day window.
day_count is used to calculate rolling_max and is always the value 10.
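
The relationship between day_max and rolling_max can be sketched in Python as a maximum over a fixed window. The `rolling_max` function and the sample history below are hypothetical, and the sketch assumes the window is simply the 10 most recent daily values.

```python
WINDOW_DAYS = 10  # matches the fixed day_count value


def rolling_max(day_max_values):
    """Maximum of the most recent WINDOW_DAYS daily day_max values."""
    window = day_max_values[-WINDOW_DAYS:]
    return max(window) if window else 0


# Hypothetical day_max history for one artifact, oldest first (12 days).
history = [40, 3, 5, 2, 9, 4, 7, 1, 6, 8, 2, 3]

print(rolling_max(history))  # 9
```

The early spike of 40 falls outside the 10-day window, so it no longer contributes to rolling_max.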

When calculated for a domain, the difference between day_max versus day_max_sub_domains (and rolling_max versus rolling_max_sub_domains) is as follows:

rolling_max and day_max represent the number of daily unique internal IP addresses accessing a given domain (excluding subdomains).
rolling_max_sub_domains and day_max_sub_domains represent the number of unique internal IP addresses accessing a given domain (including subdomains).
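
To make the distinction concrete, the following Python sketch counts unique internal IP addresses for a domain with and without its subdomains. The access log format and the `unique_ip_counts` helper are assumptions for illustration, not the product's calculation.

```python
def unique_ip_counts(accesses, domain):
    """Return (exact-domain count, domain-plus-subdomains count) of unique IPs."""
    exact = set()
    with_subdomains = set()
    for ip, host in accesses:
        if host == domain:
            exact.add(ip)
            with_subdomains.add(ip)
        elif host.endswith("." + domain):
            with_subdomains.add(ip)
    return len(exact), len(with_subdomains)


# Hypothetical (internal IP, requested host) pairs observed in one day.
accesses = [
    ("10.0.0.1", "example.com"),
    ("10.0.0.2", "example.com"),
    ("10.0.0.2", "mail.example.com"),
    ("10.0.0.3", "cdn.example.com"),
    ("10.0.0.4", "notexample.com"),
]

print(unique_ip_counts(accesses, "example.com"))  # (2, 3)
```

In this sample, the first number corresponds to the day_max style of count and the second to the day_max_sub_domains style of count.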

Prevalence statistics are calculated on newly ingested entity data. Calculations are not performed retroactively on previously ingested data. It takes approximately 36 hours for the statistics to be calculated and stored.

Calculate the first-seen and last-seen times of entities

Google SecOps performs statistical analysis on incoming data and enriches entity context records with the first-seen and last-seen times of an entity. The first_seen_time field stores the date and time when the entity was first seen in the customer environment. The last_seen_time field stores the date and time of the most recent observation.

Because multiple indicators (UDM fields) can identify an asset or a user, the first-seen time is the first time any of the indicators that identify the user or asset was seen in the customer environment.

All UDM fields that describe an asset are the following:

entity.asset.hostname
entity.asset.ip
entity.asset.mac
entity.asset.asset_id
entity.asset.product_object_id

All UDM fields that describe a user are the following:

entity.user.windows_sid
entity.user.product_object_id
entity.user.userid
entity.user.employee_id
entity.user.email_addresses
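
Because any of the listed indicators can identify the entity, its first-seen time is the earliest observation across all of them. A minimal Python sketch, using hypothetical timestamps and a hypothetical `entity_first_seen` helper:

```python
from datetime import datetime, timezone


def entity_first_seen(indicator_first_seen):
    """Earliest first-seen time across all indicators of one entity."""
    return min(indicator_first_seen.values())


# Hypothetical first observations for the indicators of one asset.
asset_indicators = {
    "entity.asset.hostname": datetime(2024, 3, 10, 9, 0, tzinfo=timezone.utc),
    "entity.asset.ip": datetime(2024, 3, 8, 16, 45, tzinfo=timezone.utc),
    "entity.asset.mac": datetime(2024, 3, 12, 11, 20, tzinfo=timezone.utc),
}

print(entity_first_seen(asset_indicators))  # 2024-03-08 16:45:00+00:00
```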

The first-seen time and last-seen time enable an analyst to correlate certain activity that occurred after a domain, file (hash), asset, user, or IP address was first seen or that stopped occurring after the domain, file (hash), or IP address was last seen.

The first_seen_time and last_seen_time fields are populated for entities that describe a domain, IP address, or file (hash). For entities that describe a user or asset, only the first_seen_time field is populated. These values are not calculated for entities that describe other types, such as a group or resource.

The statistics are calculated for each entity across all namespaces. Google SecOps does not calculate the statistics for each entity within individual namespaces. These statistics are not currently exported to the Google SecOps events schema in BigQuery.

The enriched values are calculated and stored in the following UDM fields:

Entity type	UDM fields
Domain	`entity.domain.first_seen_time` `entity.domain.last_seen_time`
File (hash)	`entity.file.first_seen_time` `entity.file.last_seen_time`
IP address	`entity.artifact.first_seen_time` `entity.artifact.last_seen_time`
Asset	`entity.asset.first_seen_time`
User	`entity.user.first_seen_time`

Enrich events with geolocation data

Incoming log data can include external IP addresses without corresponding location information. This is common when an event is logging information about device activity that is not in an enterprise network. For example, a login event to a cloud service would contain a source or client IP address based on the external IP address of a device returned by the carrier NAT.

Google SecOps provides geolocation-enriched data for external IP addresses to enable more powerful rule detections and greater context for investigations. For example, Google SecOps might use an external IP address to enrich the event with information about the country (such as the United States), a specific state (such as Alaska), and the network the IP address is in (such as the ASN and carrier name).

Google SecOps uses location data supplied by Google to provide an approximate geographic location and network information for an IP address. You can write Detection Engine rules against these fields in the events. The enriched event data is also exported to BigQuery where it can be used in Google SecOps dashboards and reporting.

The following IP addresses are not enriched:

RFC 1918 private IP address spaces because they are internal to the enterprise network.
RFC 5771 multicast IP address space because multicast addresses do not belong to a single location.
IPv6 Unique Local addresses.
Google Cloud service IP addresses. Exceptions are Google Cloud Compute Engine external IP addresses, which are enriched.
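
If you want to predict which addresses in your own logs fall into the first three categories, a check like the following Python sketch can help. It uses the standard `ipaddress` module; the `skipped_reason` helper is hypothetical, and Google Cloud service ranges are not checked because that would require a published range list that this sketch does not assume.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
IPV4_MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # RFC 5771 space
IPV6_UNIQUE_LOCAL = ipaddress.ip_network("fc00::/7")


def skipped_reason(address: str):
    """Return why an address is in a non-enriched category, or None."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        if any(ip in net for net in RFC1918_NETWORKS):
            return "RFC 1918 private address"
        if ip in IPV4_MULTICAST:
            return "RFC 5771 multicast address"
    elif ip in IPV6_UNIQUE_LOCAL:
        return "IPv6 Unique Local address"
    return None


for sample in ("10.1.2.3", "224.0.0.251", "fd12:3456::1", "203.0.113.10"):
    print(sample, "->", skipped_reason(sample))
```

A return value of `None` means only that the address is outside these three categories; it does not guarantee that geolocation data is available for it.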

Google SecOps enriches the following UDM fields with geolocation data:

principal
target
src
observer

Type of data	UDM field
Location (for example, United States)	`( principal \| target \| src \| observer ).ip_geo_artifact.location.country_or_region`
State (for example, New York)	`( principal \| target \| src \| observer ).ip_geo_artifact.location.state`
Longitude	`( principal \| target \| src \| observer ).ip_geo_artifact.location.region_coordinates.longitude`
Latitude	`( principal \| target \| src \| observer ).ip_geo_artifact.location.region_coordinates.latitude`
ASN (autonomous system number)	`( principal \| target \| src \| observer ).ip_geo_artifact.network.asn`
Carrier name	`( principal \| target \| src \| observer ).ip_geo_artifact.network.carrier_name`
DNS domain	`( principal \| target \| src \| observer ).ip_geo_artifact.network.dns_domain`
Organization name	`( principal \| target \| src \| observer ).ip_geo_artifact.network.organization_name`

The following example shows the type of geographic information that would be added to a UDM event with an IP address tagged to the Netherlands:

UDM field	Value
`principal.ip_geo_artifact.location.country_or_region`	`Netherlands`
`principal.ip_geo_artifact.location.region_coordinates.latitude`	`52.132633`
`principal.ip_geo_artifact.location.region_coordinates.longitude`	`5.291266`
`principal.ip_geo_artifact.network.asn`	`8455`
`principal.ip_geo_artifact.network.carrier_name`	`schuberg philis`

Inconsistencies

Google proprietary IP geolocation technology uses a combination of networking data and other inputs and methods to provide IP address location and network resolution for our users. Other organizations may use different signals or methods, which might occasionally lead to different results.

If cases arise in which you experience an inconsistency in IP geolocation results that Google provides, please open a customer support case, so that we can investigate and, if appropriate, correct our records moving forward.

Enrich entities with information from Safe Browsing threat lists

Google SecOps ingests data from Safe Browsing related to file hashes. The data for each file is stored as an entity and provides additional context about the file. Analysts can create Detection Engine rules that query against this entity context data to build context-aware analytics.

The following information is stored with the entity context record.

UDM field	Description
`entity.metadata.product_entity_id`	A unique identifier for the entity.
`entity.metadata.entity_type`	This value is `FILE`, indicating that the entity describes a file.
`entity.metadata.collected_timestamp`	The date and time that the entity was observed or the event occurred.
`entity.metadata.interval`	Stores the start time and end time that this data is valid. Because threat list content changes over time, the `start_time` and `end_time` reflect the time interval during which the data about the entity is valid. For example, a file hash was observed to be malicious or suspicious between `start_time` and `end_time`.
`entity.metadata.threat.category`	The Google SecOps `SecurityCategory`. This is set to one or more of the following values: `SOFTWARE_MALICIOUS`: indicates that the threat is related to malware. `SOFTWARE_PUA`: indicates that the threat is related to unwanted software.
`entity.metadata.threat.severity`	This is the Google SecOps `ProductSeverity`. If the value is `CRITICAL`, this indicates the artifact appears malicious. If the value is not specified, there is not enough confidence to indicate that the artifact is malicious.
`entity.metadata.product_name`	Stores the value `Google Safe Browsing`.
`entity.file.sha256`	The SHA256 hash value for the file.

Enrich entities with WHOIS data

Google SecOps ingests WHOIS data daily. During the ingestion of incoming customer device data, Google SecOps evaluates domains in customer data against the WHOIS data. When there is a match, Google SecOps stores the related WHOIS data with the entity record for the domain. For each entity, where entity.metadata.entity_type = DOMAIN_NAME, Google SecOps enriches the entity with information from WHOIS.

Google SecOps populates enriched WHOIS data into the following fields in the entity record:

entity.domain.admin.attribute.labels
entity.domain.audit_update_time
entity.domain.billing.attribute.labels
entity.domain.billing.office_address.country_or_region
entity.domain.contact_email
entity.domain.creation_time
entity.domain.expiration_time
entity.domain.iana_registrar_id
entity.domain.name_server
entity.domain.private_registration
entity.domain.registrant.company_name
entity.domain.registrant.office_address.state
entity.domain.registrant.office_address.country_or_region
entity.domain.registrant.email_addresses
entity.domain.registrant.user_display_name
entity.domain.registrar
entity.domain.registry_data_raw_text
entity.domain.status
entity.domain.tech.attribute.labels
entity.domain.update_time
entity.domain.whois_record_raw_text
entity.domain.whois_server
entity.domain.zone

For a description of these fields, see the Unified Data Model field list document.

Ingest and store Google Cloud Threat Intelligence data

Google SecOps ingests data from Google Cloud Threat Intelligence (GCTI) data sources that provide you with contextual information you can use when investigating activity in your environment.

You can query the following data sources:

GCTI Tor Exit Nodes: IP addresses that are known Tor exit nodes.
GCTI Benign Binaries: files that are either part of the operating system original distribution or were updated by an official operating system patch. Some official operating system binaries that have been abused by an adversary through activity common in living-off-the-land attacks are excluded from this data source, such as those focused on initial entry vectors.
GCTI Remote Access Tools: files that have frequently been used by malicious actors. These tools are generally legitimate applications that are sometimes abused to remotely connect to compromised systems.

This contextual data is stored globally as entities. You can query the data using detection engine rules. Include the following UDM fields and values in the rule to query these global entities:

graph.metadata.vendor_name = "Google Cloud Threat Intelligence"
graph.metadata.product_name = "GCTI Feed"

In this document, the placeholder <variable_name> represents the unique variable name used in a rule to identify a UDM record.

Timed versus timeless Google Cloud Threat Intelligence data sources

Google Cloud Threat Intelligence data sources are either timed or timeless.

Timed data sources have a time range associated with each entry. This means that if a detection is generated on day 1, the same detection is expected to be generated for day 1 during a retrohunt run on any later day.

Timeless data sources have no time range associated with them, because only the latest set of data should be considered. Timeless data sources are frequently used for data such as file hashes that are not expected to change. If no detection is generated on day 1, a detection might still be generated for day 1 during a retrohunt on day 2 because a new entry was added.
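
The difference in retrohunt behavior can be modeled with a small Python sketch. The entry layout, the sample IP address and hash, and both helper functions are hypothetical; they illustrate the matching semantics described here, not how Detection Engine evaluates rules.

```python
from datetime import date


def timed_match(entries, ip, event_day):
    """Timed source: an entry matches only if the event day is within its range."""
    return any(
        e["ip"] == ip and e["start"] <= event_day <= e["end"] for e in entries
    )


def timeless_match(latest_hashes, file_hash):
    """Timeless source: only the latest set matters, whatever the event day."""
    return file_hash in latest_hashes


day1 = date(2024, 5, 1)
day2 = date(2024, 5, 2)

# Timed: the entry is valid on day 1 only, so a later retrohunt over day 1
# gives the same result as the original run.
tor_entries = [{"ip": "198.51.100.7", "start": day1, "end": day1}]
print(timed_match(tor_entries, "198.51.100.7", day1))  # True
print(timed_match(tor_entries, "198.51.100.7", day2))  # False

# Timeless: a day-1 event matches only after the hash is added to the set.
hashes_on_day1 = set()
hashes_on_day2 = {"abc123"}
print(timeless_match(hashes_on_day1, "abc123"))  # False
print(timeless_match(hashes_on_day2, "abc123"))  # True
```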

Data about Tor exit node IP addresses

Google SecOps ingests and stores IP addresses that are known Tor exit nodes. Tor exit nodes are points at which traffic exits the Tor network. Information ingested from this data source is stored in the following UDM fields. Data in this source is timed.

UDM field	Description
`<variable_name>.graph.metadata.vendor_name`	Stores the value `Google Cloud Threat Intelligence`.
`<variable_name>.graph.metadata.product_name`	Stores the value `GCTI Feed`.
`<variable_name>.graph.metadata.threat.threat_feed_name`	Stores the value `Tor Exit Nodes`.
`<variable_name>.graph.entity.artifact.ip`	Stores the IP address ingested from the GCTI data source.

Data about benign operating system files

Google SecOps ingests and stores file hashes from the GCTI Benign Binaries data source. Information ingested from this data source is stored in the following UDM fields. Data in this source is timeless.

UDM field	Description
`<variable_name>.graph.metadata.vendor_name`	Stores the value `Google Cloud Threat Intelligence`.
`<variable_name>.graph.metadata.product_name`	Stores the value `GCTI Feed`.
`<variable_name>.graph.metadata.threat.threat_feed_name`	Stores the value `Benign Binaries`.
`<variable_name>.graph.entity.file.sha256`	Stores the SHA256 hash value of the file.
`<variable_name>.graph.entity.file.sha1`	Stores the SHA1 hash value of the file.
`<variable_name>.graph.entity.file.md5`	Stores the MD5 hash value of the file.

Data about remote access tools

Remote access tools include file hashes for known remote access tools such as VNC clients that have frequently been used by malicious actors. These tools are generally legitimate applications that are sometimes abused to remotely connect to compromised systems. Information ingested from this data source is stored in the following UDM fields. Data in this source is timeless.

UDM field	Description
`<variable_name>.graph.metadata.vendor_name`	Stores the value `Google Cloud Threat Intelligence`.
`<variable_name>.graph.metadata.product_name`	Stores the value `GCTI Feed`.
`<variable_name>.graph.metadata.threat.threat_feed_name`	Stores the value `Remote Access Tools`.
`<variable_name>.graph.entity.file.sha256`	Stores the SHA256 hash value of the file.
`<variable_name>.graph.entity.file.sha1`	Stores the SHA1 hash value of the file.
`<variable_name>.graph.entity.file.md5`	Stores the MD5 hash value of the file.

Enrich events with VirusTotal file metadata

Google SecOps enriches file hashes into UDM events and provides additional context during an investigation. UDM events are enriched through hash aliasing in a customer environment. Hash aliasing combines all types of file hashes and provides information about a file hash during a search.

The integration of VirusTotal file metadata and relationship enrichment with Google SecOps can be used to identify patterns of malicious activity and to track malware movements across a network.

A raw log provides limited information about the file. VirusTotal enriches the event with file metadata to provide a dump of bad hashes along with metadata about the bad file. The metadata includes information such as filenames, types, imported functions, and tags. You can use this information in the UDM search and detection engine with YARA-L to understand bad file events and in general during threat hunting. An example use case is to detect any modifications to the original file which would, in turn, import the file metadata for threat detection.

The following information is stored with the record. For a list of all UDM fields, see Unified Data Model field list.

Type of data	UDM field
SHA-256	`( principal \| target \| src \| observer ).file.sha256`
MD5	`( principal \| target \| src \| observer ).file.md5`
SHA-1	`( principal \| target \| src \| observer ).file.sha1`
Size	`( principal \| target \| src \| observer ).file.size`
ssdeep	`( principal \| target \| src \| observer ).file.ssdeep`
vhash	`( principal \| target \| src \| observer ).file.vhash`
authentihash	`( principal \| target \| src \| observer ).file.authentihash`
File type	`( principal \| target \| src \| observer ).file.file_type`
Tags	`( principal \| target \| src \| observer ).file.tags`
Capabilities tags	`( principal \| target \| src \| observer ).file.capabilities_tags`
Names	`( principal \| target \| src \| observer ).file.names`
First-seen time	`( principal \| target \| src \| observer ).file.first_seen_time`
Last-seen time	`( principal \| target \| src \| observer ).file.last_seen_time`
Last modification time	`( principal \| target \| src \| observer ).file.last_modification_time`
Last analysis time	`( principal \| target \| src \| observer ).file.last_analysis_time`
Embedded URLs	`( principal \| target \| src \| observer ).file.embedded_urls`
Embedded IPs	`( principal \| target \| src \| observer ).file.embedded_ips`
Embedded domains	`( principal \| target \| src \| observer ).file.embedded_domains`
Signature information	`( principal \| target \| src \| observer ).file.signature_info`
Signature information Sigcheck	`( principal \| target \| src \| observer ).file.signature_info.sigcheck`
Signature information Sigcheck Verification message	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.verification_message`
Signature information Sigcheck Verified	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.verified`
Signature information Sigcheck Signers	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.signers`
Signature information Sigcheck Signers Name	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.signers.name`
Signature information Sigcheck Signers Status	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.signers.status`
Signature information Sigcheck Signers Valid usage for certificate	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.signers.valid_usage`
Signature information Sigcheck Signers Certificate issuer	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.signers.cert_issuer`
Signature information Sigcheck X509	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509`
Signature information Sigcheck X509 Name	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509.name`
Signature information Sigcheck X509 Algorithm	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509.algorithm`
Signature information Sigcheck X509 Thumbprint	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509.thumbprint`
Signature information Sigcheck X509 Certificate issuer	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509.cert_issuer`
Signature information Sigcheck X509 Serial number	`( principal \| target \| src \| observer ).file.signature_info.sigcheck.x509.serial_number`
Signature information Codesign	`( principal \| target \| src \| observer ).file.signature_info.codesign`
Signature information Codesign ID	`( principal \| target \| src \| observer ).file.signature_info.codesign.id`
Signature information Codesign Format	`( principal \| target \| src \| observer ).file.signature_info.codesign.format`
Signature information Codesign Compilation time	`( principal \| target \| src \| observer ).file.signature_info.codesign.compilation_time`
Exiftool information	`( principal \| target \| src \| observer ).file.exif_info`
Exiftool information Original file name	`( principal \| target \| src \| observer ).file.exif_info.original_file`
Exiftool information Product name	`( principal \| target \| src \| observer ).file.exif_info.product`
Exiftool information Company name	`( principal \| target \| src \| observer ).file.exif_info.company`
Exiftool information File description	`( principal \| target \| src \| observer ).file.exif_info.file_description`
Exiftool information Entry point	`( principal \| target \| src \| observer ).file.exif_info.entry_point`
Exiftool information Compilation time	`( principal \| target \| src \| observer ).file.exif_info.compilation_time`
PDF information	`( principal \| target \| src \| observer ).file.pdf_info`
PDF information Number of /JS tags	`( principal \| target \| src \| observer ).file.pdf_info.js`
PDF information Number of /JavaScript tags	`( principal \| target \| src \| observer ).file.pdf_info.javascript`
PDF information Number of /Launch tags	`( principal \| target \| src \| observer ).file.pdf_info.launch_action_count`
PDF information Number of object streams	`( principal \| target \| src \| observer ).file.pdf_info.object_stream_count`
PDF information Number of object definitions (endobj keyword)	`( principal \| target \| src \| observer ).file.pdf_info.endobj_count`
PDF information PDF version	`( principal \| target \| src \| observer ).file.pdf_info.header`
PDF information Number of /AcroForm tags	`( principal \| target \| src \| observer ).file.pdf_info.acroform`
PDF information Number of /AA tags	`( principal \| target \| src \| observer ).file.pdf_info.autoaction`
PDF information Number of /EmbeddedFile tags	`( principal \| target \| src \| observer ).file.pdf_info.embedded_file`
PDF information /Encrypt tag	`( principal \| target \| src \| observer ).file.pdf_info.encrypted`
PDF information Number of /RichMedia tags	`( principal \| target \| src \| observer ).file.pdf_info.flash`
PDF information Number of /JBIG2Decode tags	`( principal \| target \| src \| observer ).file.pdf_info.jbig2_compression`
PDF information Number of object definitions (obj keyword)	`( principal \| target \| src \| observer ).file.pdf_info.obj_count`
PDF information Number of defined stream objects (endstream keyword)	`( principal \| target \| src \| observer ).file.pdf_info.endstream_count`
PDF information Number of pages in the PDF	`( principal \| target \| src \| observer ).file.pdf_info.page_count`
PDF information Number of defined stream objects (stream keyword)	`( principal \| target \| src \| observer ).file.pdf_info.stream_count`
PDF information Number of /OpenAction tags	`( principal \| target \| src \| observer ).file.pdf_info.openaction`
PDF information Number of startxref keywords	`( principal \| target \| src \| observer ).file.pdf_info.startxref`
PDF information Number of colors expressed with more than 3 bytes (CVE-2009-3459)	`( principal \| target \| src \| observer ).file.pdf_info.suspicious_colors`
PDF information Number of trailer keywords	`( principal \| target \| src \| observer ).file.pdf_info.trailer`
PDF information Number of /XFA tags found	`( principal \| target \| src \| observer ).file.pdf_info.xfa`
PDF information Number of xref keywords	`( principal \| target \| src \| observer ).file.pdf_info.xref`
PE file metadata	`( principal \| target \| src \| observer ).file.pe_file`
PE file metadata Imphash	`( principal \| target \| src \| observer ).file.pe_file.imphash`
PE file metadata Entry point	`( principal \| target \| src \| observer ).file.pe_file.entry_point`
PE file metadata Entry point exiftool	`( principal \| target \| src \| observer ).file.pe_file.entry_point_exiftool`
PE file metadata Compilation time	`( principal \| target \| src \| observer ).file.pe_file.compilation_time`
PE file metadata Compilation exiftool time	`( principal \| target \| src \| observer ).file.pe_file.compilation_exiftool_time`
PE file metadata Sections	`( principal \| target \| src \| observer ).file.pe_file.section`
PE file metadata Sections Name	`( principal \| target \| src \| observer ).file.pe_file.section.name`
PE file metadata Sections Entropy	`( principal \| target \| src \| observer ).file.pe_file.section.entropy`
PE file metadata Sections Raw size in bytes	`( principal \| target \| src \| observer ).file.pe_file.section.raw_size_bytes`
PE file metadata Sections Virtual size in bytes	`( principal \| target \| src \| observer ).file.pe_file.section.virtual_size_bytes`
PE file metadata Sections MD5 hex	`( principal \| target \| src \| observer ).file.pe_file.section.md5_hex`
PE file metadata Imports	`( principal \| target \| src \| observer ).file.pe_file.imports`
PE file metadata Imports Library	`( principal \| target \| src \| observer ).file.pe_file.imports.library`
PE file metadata Imports Functions	`( principal \| target \| src \| observer ).file.pe_file.imports.functions`
PE file metadata Resource information	`( principal \| target \| src \| observer ).file.pe_file.resource`
PE file metadata Resource information SHA-256 hex	`( principal \| target \| src \| observer ).file.pe_file.resource.sha256_hex`
PE file metadata Resource information Resource type identified by magic Python module	`( principal \| target \| src \| observer ).file.pe_file.resource.filetype_magic`
PE file metadata Resource information Human-readable version of the language and sublanguage identifiers, as defined in the Windows PE specification	`( principal \| target \| src \| observer ).file.pe_file.resource.language_code`
PE file metadata Resource information Entropy	`( principal \| target \| src \| observer ).file.pe_file.resource.entropy`
PE file metadata Resource information File type	`( principal \| target \| src \| observer ).file.pe_file.resource.file_type`
PE file metadata Number of resources by resource type	`( principal \| target \| src \| observer ).file.pe_file.resources_type_count_str`
PE file metadata Number of resources by language	`( principal \| target \| src \| observer ).file.pe_file.resources_language_count_str`
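
The file metadata fields in the preceding table can be referenced in Detection Engine rules like any other UDM event field. The following is a minimal sketch, not a production rule: it assumes that the file of interest is recorded under `target` in `FILE_CREATION` events, and the rule name and the import hash value are placeholders chosen for illustration.

```
rule pe_imphash_match_sketch {
  meta:
    description = "Sketch: file creation where the enriched PE import hash equals a known value"

  events:
    // Assumption: the log source records the new file under target.file
    $e.metadata.event_type = "FILE_CREATION"

    // Placeholder value; replace it with an import hash of interest
    $e.target.file.pe_file.imphash = "00000000000000000000000000000000"

  condition:
    $e
}
```

The same pattern applies to the other noun prefixes (`principal`, `src`, `observer`), depending on where your log source records the file.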

Enrich entities with VirusTotal relationship data

VirusTotal helps analyze suspicious files, domains, IP addresses, and URLs to detect malware and other breaches, and share the findings with the security community. Google SecOps ingests relationship data from VirusTotal. This data is stored as an entity and describes the relationships between file hashes and files, domains, IP addresses, and URLs.

Analysts can use this data to determine if a file hash is malicious based on information about the URL or domain from other sources. This information can be used to create Detection Engine rules that query against the entity-context data to build context-aware analytics.

This data is only available for certain VirusTotal and Google SecOps licenses. Check your entitlements with your account manager.

The following information is stored with the entity-context record:

UDM field	Description
`entity.metadata.product_entity_id`	A unique identifier for the entity
`entity.metadata.entity_type`	Stores the value `FILE`, indicating that the entity describes a file
`entity.metadata.interval`	`start_time` is the beginning and `end_time` is the end of the time interval during which this data is valid
`entity.metadata.source_labels`	Stores a list of key-value pairs of `source_id` and `target_id` for this entity. `source_id` is the file hash, and `target_id` is the hash or value of the URL, domain name, or IP address that the file is related to. You can search for the URL, domain name, IP address, or file at virustotal.com.
`entity.metadata.product_name`	Stores the value `VirusTotal Relationships`
`entity.metadata.vendor_name`	Stores the value `VirusTotal`
`entity.file.sha256`	Stores the SHA-256 hash value for the file
`entity.file.relations`	A list of child entities that the parent file entity is related to
`entity.relations.relationship`	The type of relationship between the parent and child entities. The value can be `EXECUTES`, `DOWNLOADED_FROM`, or `CONTACTS`.
`entity.relations.direction`	Stores the value `UNIDIRECTIONAL`, which indicates the direction of the relationship with the child entity
`entity.relations.entity.url`	The URL that the file in the parent entity contacts (if the relationship between the parent entity and the URL is `CONTACTS`) or the URL from which the file in the parent entity was downloaded (if the relationship between the parent entity and the URL is `DOWNLOADED_FROM`).
`entity.relations.entity.ip`	A list of IP addresses that the file in the parent entity contacts or was downloaded from. The list contains only one IP address.
`entity.relations.entity.domain.name`	The domain name that the file in the parent entity contacts or was downloaded from
`entity.relations.entity.file.sha256`	Stores the SHA-256 hash value for the file in the relation
`entity.relations.entity_type`	The type of the entity in the relation. The value can be `URL`, `DOMAIN_NAME`, `IP_ADDRESS`, or `FILE`. The `entity.relations.entity` fields are populated according to the `entity_type`. For example, if `entity_type` is `URL`, then `entity.relations.entity.url` is populated.
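
The fields in the preceding table can be combined in a Detection Engine rule that joins events with the VirusTotal relationship entity. The following is a minimal sketch under stated assumptions, not a definitive rule: it assumes that entity-context fields are addressed through the `graph` prefix (as in the WHOIS filter shown earlier), that both event types record the user in `principal.user.userid`, and that a 5-minute match window suits your environment. The rule name and variable names are illustrative.

```
rule vt_relationship_file_downloaded_from_url_sketch {
  meta:
    description = "Sketch: a user requests a URL, then creates a file that VirusTotal relates to that URL"

  events:
    // HTTP request made by a user to a URL
    $http.metadata.event_type = "NETWORK_HTTP"
    $http.principal.user.userid = $userid
    $http.target.url = $url

    // File created by the same user (assumption: userid is in principal)
    $file.metadata.event_type = "FILE_CREATION"
    $file.principal.user.userid = $userid
    $file.target.file.sha256 = $hash

    // The file creation must not precede the HTTP request
    $http.metadata.event_timestamp.seconds <= $file.metadata.event_timestamp.seconds

    // Select only VirusTotal relationship entities (improves rule performance)
    $vt.graph.metadata.entity_type = "FILE"
    $vt.graph.metadata.vendor_name = "VirusTotal"
    $vt.graph.metadata.product_name = "VirusTotal Relationships"

    // Join the created file's hash with the parent file entity
    $vt.graph.entity.file.sha256 = $hash

    // Join the requested URL with the related child entity
    $vt.graph.relations.entity_type = "URL"
    $vt.graph.relations.relationship = "DOWNLOADED_FROM"
    $vt.graph.relations.entity.url = $url

  match:
    $userid over 5m

  condition:
    $http and $file and $vt
}
```

Filtering on `entity_type`, `vendor_name`, and `product_name` follows the recommendation at the start of this document to identify the specific enrichment type in the rule. To match other relationship types, change the `relationship` value and the child field (for example, `CONTACTS` with `relations.entity.domain.name`).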

What's next

For information about how to use enriched data with other Google SecOps features, see the following:
