Google Security Operations data in BigQuery
Google Security Operations provides a managed data lake of normalized, threat-intelligence-enriched telemetry by exporting data to BigQuery. This lets you do the following:
- Run ad-hoc queries directly in BigQuery.
- Use your own business intelligence tools, such as Looker or Microsoft Power BI, to create dashboards, reports, and analytics.
- Join Google Security Operations data with third-party datasets.
- Run analytics using data science or machine learning tools.
- Run reports using predefined default dashboards and custom dashboards.
Google Security Operations exports the following categories of data to BigQuery:
- UDM event records: UDM records created from log data ingested by customers. These records are enriched with aliasing information.
- Rule matches (detections): instances where a rule matches one or more events.
- IoC matches: artifacts (for example, domains and IP addresses) from events that match Indicator of Compromise (IoC) feeds. This includes matches from global feeds and customer-specific feeds.
- Ingestion metrics: statistics such as the number of log lines ingested, the number of events produced from logs, the number of log errors indicating that logs could not be parsed, and the state of Google Security Operations forwarders. For more information, see Ingestion metrics BigQuery schema.
- Entity graph and entity relationships: descriptions of entities and their relationships with other entities.
Data export flow
The data export flow is as follows:
- A set of Google Security Operations data, specific to a use case, is exported to a BigQuery instance that exists in a customer-specific Google Cloud project managed by Google. Data for each use case is exported to a separate table.
- As part of the export, Google Security Operations creates a predefined Looker data model for each use case.
- Google Security Operations default dashboards are built using the predefined Looker data models. You can create custom dashboards in Google Security Operations using the predefined Looker data models.
- Customers can write ad-hoc queries against Google Security Operations data stored in BigQuery tables.
Customers can also create more advanced analytics using other third-party tools that integrate with BigQuery.
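As a minimal sketch of such an ad-hoc query, the following Python builds a parameterized standard SQL query that counts UDM events by event type over a time window and, optionally, runs it with the google-cloud-bigquery client library. The project ID and the column paths metadata.event_type and metadata.event_timestamp.seconds are assumptions about your project and the events schema, not confirmed names; check them against the events schema before use.

```python
"""Sketch: count UDM events by event type over a time window.

Assumptions (hypothetical, verify against your events schema):
- the events table lives at `<project>.datalake.events`
- `metadata.event_type` holds the enumerated event type as a number
- `metadata.event_timestamp.seconds` holds the event time in epoch seconds
"""
from datetime import datetime, timedelta, timezone


def build_event_count_query(project_id: str) -> str:
    """Return standard SQL that expects @start and @end TIMESTAMP parameters."""
    table = f"`{project_id}.datalake.events`"
    return (
        "SELECT metadata.event_type AS event_type, COUNT(*) AS event_count\n"
        f"FROM {table}\n"
        "WHERE metadata.event_timestamp.seconds >= UNIX_SECONDS(@start)\n"
        "  AND metadata.event_timestamp.seconds < UNIX_SECONDS(@end)\n"
        "GROUP BY event_type\n"
        "ORDER BY event_count DESC"
    )


def run_event_count_query(project_id: str, days: int = 1):
    """Run the query for the last `days` days (needs google-cloud-bigquery)."""
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery

    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # Uses Application Default Credentials; query jobs run in `project_id`.
    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start", "TIMESTAMP", start),
            bigquery.ScalarQueryParameter("end", "TIMESTAMP", end),
        ]
    )
    rows = client.query(build_event_count_query(project_id), job_config=job_config).result()
    return [(row["event_type"], row["event_count"]) for row in rows]


if __name__ == "__main__":
    # "my-secops-project" is a placeholder project ID.
    print(build_event_count_query("my-secops-project"))
```

Query parameters keep caller-supplied values out of the SQL text. The numeric event_type values can be translated to names with the udm_enum_value_to_name_mapping table.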
The BigQuery instance is created in the same region as the Google Security Operations tenant, and one BigQuery instance is created for each customer ID. Raw logs are not exported to the Google Security Operations data lake in BigQuery. Data is exported on a fill-forward basis: as data is ingested and normalized in Google Security Operations, it is exported to BigQuery. You cannot backfill previously ingested data. The retention period for data in all BigQuery tables is 365 days.
For Looker connections, contact your Google Security Operations representative for service account credentials that enable you to connect your Looker instance to Google Security Operations data in BigQuery. The service account has read-only permissions.
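Because the tables keep 365 days of data and earlier data cannot be backfilled, a query window that starts before the retention horizon cannot return older rows. The following Python sketch clamps a requested date window to that horizon. The helper names are hypothetical, and the sketch cannot know when export began for your tenant, so the first available row may still be later than the horizon.

```python
"""Sketch: clamp a requested query window to the 365-day retention period.

The 365-day figure comes from the retention period described above; the
function names are hypothetical helpers, not part of any Google API.
"""
from datetime import date, timedelta

RETENTION_DAYS = 365


def earliest_retained_date(today: date, retention_days: int = RETENTION_DAYS) -> date:
    """Oldest date that can still be present in the BigQuery tables."""
    return today - timedelta(days=retention_days)


def clamp_window(start: date, end: date, today: date) -> tuple[date, date]:
    """Move `start` forward to the retention horizon if it falls before it.

    Rows older than the horizon have expired, so an earlier start date
    cannot return additional rows. A window that ends before the horizon
    has no data left at all, so it is rejected.
    """
    if end < start:
        raise ValueError("end must not be before start")
    horizon = earliest_retained_date(today)
    if end < horizon:
        raise ValueError("window ends before the retention horizon; no data remains")
    return max(start, horizon), end
```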
Overview of the tables
Google Security Operations creates the datalake dataset in BigQuery and the following tables:
- entity_enum_value_to_name_mapping: for enumerated types in the entity_graph table, maps the numerical values to the string values.
- entity_graph: stores data about UDM entities.
- events: stores data about UDM events.
- ingestion_metrics: stores statistics related to ingestion and normalization of data from specific ingestion sources, such as Google Security Operations forwarders, feeds, and the Ingestion API.
- ioc_matches: stores IoC matches found against UDM events.
- job_metadata: an internal table used to track the export of data to BigQuery.
- rule_detections: stores detections returned by rules run in Google Security Operations.
- rulesets: stores information about Google Security Operations curated detections, including the category each rule set belongs to, whether it is enabled, and its current alerting status.
- udm_enum_value_to_name_mapping: for enumerated types in the events table, maps the numerical values to the string values.
- udm_events_aggregates: stores aggregated data of normalized events, summarized by hour.
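If you script against these tables, it can help to keep their names in one place. The following Python sketch builds fully qualified table IDs in the project.dataset.table form that BigQuery uses, from the dataset and table names listed above. The project ID is a placeholder for your customer-specific project, the helper names are hypothetical, and the category-to-table mapping is inferred from the table descriptions rather than stated by the product.

```python
"""Sketch: fully qualified table IDs for the `datalake` dataset.

Dataset and table names come from the table list; the project ID passed
to `table_id` is a placeholder for your customer-specific project.
"""
DATASET = "datalake"

DATALAKE_TABLES = frozenset({
    "entity_enum_value_to_name_mapping",
    "entity_graph",
    "events",
    "ingestion_metrics",
    "ioc_matches",
    "job_metadata",  # internal export-tracking table
    "rule_detections",
    "rulesets",
    "udm_enum_value_to_name_mapping",
    "udm_events_aggregates",
})

# Exported data categories mapped (by inference) to the tables that hold them.
CATEGORY_TABLES = {
    "UDM event records": "events",
    "Rule matches (detections)": "rule_detections",
    "IoC matches": "ioc_matches",
    "Ingestion metrics": "ingestion_metrics",
    "Entity graph and entity relationships": "entity_graph",
}


def table_id(project_id: str, table: str) -> str:
    """Return `project.datalake.table`, rejecting names not in the dataset."""
    if table not in DATALAKE_TABLES:
        raise ValueError(f"unknown datalake table: {table!r}")
    return f"{project_id}.{DATASET}.{table}"
```

Rejecting unknown names catches typos early; for example, there is no raw-logs table, because raw logs are not exported.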
Access data in BigQuery
You can run queries directly in BigQuery or connect your own business intelligence tool, such as Looker or Microsoft Power BI, to BigQuery.
To enable access to the BigQuery instance, use either the Google Security Operations CLI or the Google Security Operations BigQuery Access API. You can provide an email address for either a user or a group that you own. If you configure access to a group, use the group to manage which team members can access the BigQuery instance.
To connect Looker or another business intelligence tool to BigQuery, contact your Google Security Operations representative for service account credentials that enable you to connect an application to the Google Security Operations BigQuery dataset. The service account has the BigQuery Data Viewer IAM role (roles/bigquery.dataViewer) and the BigQuery Job User IAM role (roles/bigquery.jobUser).
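The following Python sketch shows one way an application could authenticate as that service account with the google-cloud-bigquery and google-auth libraries, together with a hypothetical has_query_roles helper that checks a list of granted roles for the two roles above. It assumes the credentials are delivered as a JSON key file; the key path and project ID are placeholders.

```python
"""Sketch: connect to BigQuery as the provided service account.

Assumptions: the credentials are a JSON key file, and google-cloud-bigquery
and google-auth are installed when `make_client` is called.
`has_query_roles` is a hypothetical helper, not a Google API.
"""
REQUIRED_ROLES = {
    "roles/bigquery.dataViewer",  # read table data and metadata
    "roles/bigquery.jobUser",     # run jobs, including queries, in the project
}


def has_query_roles(granted_roles) -> bool:
    """True if the granted roles include both roles needed for read-only queries."""
    return REQUIRED_ROLES.issubset(set(granted_roles))


def make_client(key_path: str, project_id: str):
    """Create a BigQuery client authenticated with the service account key."""
    # Imported lazily so `has_query_roles` works without these libraries.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        key_path,
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    return bigquery.Client(project=project_id, credentials=credentials)
```

The OAuth scope only sets what the token may request; what the client can actually do is still limited by the service account's IAM roles, so it can read data and run query jobs but cannot modify the datalake tables.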
What's next
- Learn more about the following schemas:
- For information about accessing and running queries in BigQuery, see Run interactive and batch query jobs.
- For information about how to query partitioned tables, see Query partitioned tables.
- For information about how to connect Looker to BigQuery, see the Looker documentation about connecting to BigQuery.