Understand data availability for search
This document details the data ingestion lifecycle, including end-to-end data flow and latency, and how these factors impact the availability of recently ingested data for querying and analysis.
Ingest and process data in Google Security Operations
This section describes how Google SecOps ingests, processes, and analyzes security data.
Data ingestion
The data ingestion pipeline begins by collecting your raw security data from sources such as:
- Security logs from your internal systems
- Data stored in Cloud Storage
- Your Security Operations Center (SOC) and other internal systems
Google SecOps brings this data to the platform using one of its secure ingestion methods.
The primary ingestion methods are:
Direct Google Cloud ingestion
Google SecOps uses direct Google Cloud ingestion to automatically pull in logs and telemetry data from your Google Cloud organization, including Cloud Logging, Cloud Asset Inventory metadata, and Security Command Center Premium findings.
Ingestion APIs
Send data directly to Google SecOps using its public REST Ingestion APIs. You use this method for custom integrations or to send data as either unstructured logs or pre-formatted Unified Data Model (UDM) events.
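As a rough illustration of sending unstructured logs, the following Python sketch only builds a JSON request body and makes no network call. The field names (`customer_id`, `log_type`, `entries`, `log_text`), the helper name `build_unstructured_batch`, and the sample values are assumptions made for illustration; confirm the actual request format in the Ingestion API reference before relying on it.

```python
import json


def build_unstructured_batch(customer_id: str, log_type: str, lines: list[str]) -> str:
    """Serialize raw log lines into a single JSON request body (hypothetical shape)."""
    if not lines:
        raise ValueError("at least one log line is required")
    body = {
        "customer_id": customer_id,  # assumed field name
        "log_type": log_type,        # assumed field name
        "entries": [{"log_text": line} for line in lines],  # assumed structure
    }
    return json.dumps(body)


# Placeholder values; none of these identify a real tenant or log type.
payload = build_unstructured_batch(
    "example-customer-id",
    "EXAMPLE_LOG_TYPE",
    ["first raw log line", "second raw log line"],
)
```

Batching several lines into one body keeps the number of requests low, which is usually preferable for custom integrations.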
Bindplane agent
You can deploy the versatile Bindplane agent in your environment (on-prem or other clouds) to collect logs from a wide variety of sources and forward them to Google SecOps.
Data feeds
In Google SecOps, you configure data feeds to pull logs from third-party sources, such as cloud storage buckets (like Amazon S3) or third-party APIs (like Okta or Microsoft 365).
Normalization and data enrichment
Once data arrives in Google SecOps, the platform processes it through the following stages:
Parsing and normalization
Raw log data is first processed by a parser to validate, extract, and transform the data from its original format into the standardized UDM. Parsing and normalization lets you analyze disparate data sources (for example, firewall logs, endpoint data, cloud logs) using a single, consistent schema. The original raw log remains stored alongside the UDM event.
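To make the idea concrete, here is a minimal Python sketch of what normalization does conceptually. It is not the Google SecOps parser: the raw log format, the `normalize` function, and the `raw_log` key are hypothetical, and the output is only a simplified, UDM-like dictionary.

```python
import re

# Hypothetical raw format: "<timestamp> host=<name> dst=<ip> action=<verb>"
RAW_PATTERN = re.compile(
    r"^(?P<ts>\S+) host=(?P<host>\S+) dst=(?P<dst>\S+) action=(?P<action>\S+)$"
)


def normalize(raw_line: str) -> dict:
    """Validate a raw line and map its fields onto a simplified UDM-like record."""
    match = RAW_PATTERN.match(raw_line.strip())
    if match is None:
        raise ValueError(f"unparseable log line: {raw_line!r}")
    return {
        "metadata": {"event_timestamp": match["ts"], "description": match["action"]},
        "principal": {"hostname": match["host"]},
        "target": {"ip": match["dst"]},
        "raw_log": raw_line,  # the original line is kept alongside the parsed fields
    }


event = normalize("2024-01-01T00:00:00Z host=win-server dst=10.1.2.3 action=connect")
```

Once every source is mapped to the same field names, a single query on `principal.hostname` or `target.ip` works regardless of which product produced the log.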
Indexing
After normalization, Google SecOps indexes the UDM data to deliver fast query speeds across massive datasets, making the UDM events searchable. It also indexes the original raw logs for "raw log" searches.
Data enrichment
Google SecOps enriches your data with valuable context as follows:
- Entity context (aliasing): Aliasing enriches UDM log records by identifying and adding context data and indicators for log entities. For example, it connects a user's login name to their various IP addresses, hostnames, and MAC addresses, building a consolidated "entity graph."
- Threat intelligence: Data is automatically compared against Google's vast threat intelligence, including sources like VirusTotal and Safe Browsing, to identify known malicious domains, IPs, file hashes, and more.
- Geolocation: IP addresses are enriched with geographic location data.
- WHOIS: Domain names are enriched with their public registration (WHOIS) information.
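The aliasing idea in the list above can be sketched in a few lines of Python. This is an illustrative toy, not how Google SecOps builds its entity graph: the event shape, the key names, and the `build_entity_graph` function are all hypothetical.

```python
from collections import defaultdict


def build_entity_graph(events):
    """Group every indicator seen for a user under that user's login name."""
    graph = defaultdict(lambda: {"ip": set(), "hostname": set(), "mac": set()})
    for event in events:
        indicators = graph[event["user"]]
        for kind in ("ip", "hostname", "mac"):
            if event.get(kind):
                indicators[kind].add(event[kind])
    return dict(graph)


# Two made-up events for the same user, each carrying different indicators.
events = [
    {"user": "bob", "ip": "10.1.2.3", "hostname": "win-server"},
    {"user": "bob", "ip": "10.1.2.4", "mac": "00:11:22:33:44:55"},
]
graph = build_entity_graph(events)
```

After this step, `graph["bob"]` holds both IP addresses, the hostname, and the MAC address, which is the kind of consolidated view that lets an analyst pivot from a user to related assets.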
Data availability for analysis
After being processed and enriched, UDM data is immediately available for analysis:
Detection
The Detection Engine runs your custom detection rules (and Google's built-in rules) against all incoming data to identify threats and generate alerts.
Search and investigation
An analyst can now search across all this normalized and enriched data using UDM search, pivot between related entities (like a user, to their asset, to a malicious domain), and investigate alerts.
Search methods
Google SecOps provides several distinct methods for searching your data, each serving a different purpose.
UDM search
UDM search is the primary and fastest search method, used for most investigations.
- What it searches: It queries the normalized and indexed UDM events. Because all data is parsed into this standard format, you can write one query to find the same activity (like a login) across all your different products (for example, Windows, Okta, Linux).
- How it works: You use a specific syntax to query fields, operators, and values.
- Example:
principal.hostname = "win-server" AND target.ip = "10.1.2.3"
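If you generate such queries from scripts, a small helper can assemble them. The following Python sketch is an assumption-laden convenience, not part of Google SecOps: the `udm_query` function is hypothetical, it supports only equality conditions joined with AND, and it rejects values containing double quotes rather than guessing at escaping rules.

```python
def udm_query(conditions: dict[str, str]) -> str:
    """Join field/value pairs into a single equality query string."""
    parts = []
    for field, value in conditions.items():
        if '"' in value:
            # Escaping rules are not covered here, so refuse instead of guessing.
            raise ValueError(f"unsupported character in value for {field}")
        parts.append(f'{field} = "{value}"')
    return " AND ".join(parts)


query = udm_query({"principal.hostname": "win-server", "target.ip": "10.1.2.3"})
print(query)  # principal.hostname = "win-server" AND target.ip = "10.1.2.3"
```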
Raw log search
Use Raw log search to find something in the original, unparsed log message that may not have been mapped to a UDM field.
- What it searches: It scans the original, raw text of the logs before they were parsed and normalized. This is useful for finding specific strings, command-line arguments, or other artifacts that aren't indexed UDM fields.
- How it works: You use the raw = prefix. It can be slower than UDM search because it doesn't search indexed fields.
- Example (String):
raw = "PsExec.exe"
- Example (Regex):
raw = /admin\$/
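To see what the regular expression in `raw = /admin\$/` is looking for, you can try the same pattern locally. The sketch below uses Python's `re` module as a stand-in; the regex dialect used by Google SecOps may differ, and the sample log lines are invented for illustration.

```python
import re

# The backslash escapes "$" so it matches a literal dollar sign,
# as in the Windows administrative share name "admin$".
ADMIN_SHARE = re.compile(r"admin\$")

samples = [
    r"net use \\fileserver\admin$ /user:bob",  # references the admin$ share
    "user admin logged in",                    # contains "admin" but no "$"
]
matches = [line for line in samples if ADMIN_SHARE.search(line)]
```

Only the first sample ends up in `matches`. Without the backslash, `$` would act as an end-of-line anchor and the pattern would instead match any line ending in "admin".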
Natural language search (Gemini)
Natural language search (Gemini) lets you use plain English to ask questions, which Gemini then translates into a formal UDM query.
- What it searches: It provides a conversational interface to query UDM data.
- How it works: You type a question, and Gemini generates the underlying UDM search query for you, which you can then run or refine.
- Example: "Show me all failed logins from user 'bob' in the last 24 hours"
SOAR search
SOAR search is specific to SOAR components. You use it to manage security incidents, not to hunt in logs.
- What it searches: It searches for Cases and Entities (like users, assets, IPs) within the SOAR platform.
- How it works: You can use free-text or field-based filters to find cases by, for example, their ID, alert name, status, and assigned user.
- Example: Search for CaseIds:180 or AlertName:Brute Force
Data ingestion pipeline to search availability
The system processes newly ingested data through several steps. The duration of these steps determines when newly ingested data becomes available for querying and analysis.
The following table breaks down the processing steps for newly ingested data by search method. Newly ingested data becomes searchable after these steps are complete.
| Search method | Data being searched | Processing steps contributing to availability time |
|---|---|---|
| UDM search | Normalized & enriched UDM events | |
| Raw log search | Original, unparsed log text | |
| SOAR search | Cases and entities | This is a different lifecycle, as it searches for alerts and cases, not logs. The time is based on: |
Example data flow
The following example demonstrates how Google SecOps ingests, processes, enhances, and analyzes your security data, making it available for searches and further analysis.
Example of data processing steps
- Retrieves security data from cloud services like Amazon S3 or Google Cloud. Google SecOps encrypts this data in transit.
- Separates and stores your encrypted security data in your account. Access is limited to you and a small number of Google personnel for product support, development, and maintenance.
- Parses and validates raw security data, making it easier to process and view.
- Indexes and normalizes the data for quick searches.
- Stores the parsed and indexed data within your account.
- Enriches the data with context.
- Offers secure access for users to search and review their security data.
- Compares your security data with the VirusTotal malware database to identify matches. In a Google SecOps event view, such as the Asset view, click VT Context to see VirusTotal information. Google SecOps doesn't share your security data with VirusTotal.
Examples of the expected time until Search availability
The expected time until newly ingested data becomes available to Search is the sum of the flow durations along the data flow.
For example, data typically becomes available in UDM search approximately 5 minutes and 30 seconds, on average, after it is sent to the Google SecOps ingestion service.
| Data flow step | Description | Flow duration |
|---|---|---|
| Cloud Storage to Raw logs | Ingests raw logs from Cloud Storage. | Less than 30 seconds |
| Security logs to Data forwarding service | Transmits security logs from internal systems to the platform. | N/A |
| Data forwarding service to Raw logs | Sends raw security data received from various sources to the ingestion pipeline. | Less than 30 seconds |
| Raw logs to Parse and validate | Parses and validates raw logs into the UDM format. | Less than 3 minutes |
| Parse and validate to Index | Indexes the parsed UDM data for fast searching. | N/A |
| Index to Parsed customer data | Makes the indexed data available as parsed customer data for analysis. | Less than 2 minutes |
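The 5 minute 30 second figure can be reproduced by adding the stated durations along one path through the table. The Python sketch below assumes a single entry path into raw logs (either Cloud Storage or the data forwarding service, each listed as less than 30 seconds) and treats steps marked N/A as contributing no stated duration; the variable names are illustrative.

```python
from datetime import timedelta

# Upper bounds taken from the table, for one path through the pipeline.
steps = {
    "Source to Raw logs": timedelta(seconds=30),
    "Raw logs to Parse and validate": timedelta(minutes=3),
    "Index to Parsed customer data": timedelta(minutes=2),
}

# Sum the durations, starting from a zero timedelta.
total = sum(steps.values(), timedelta())
minutes, seconds = divmod(int(total.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # 5 min 30 s
```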