Scenarios for exporting logging data: Security and access analytics

This document shows how to export logs from Cloud Logging to destinations such as BigQuery, Chronicle, or a third-party SIEM to meet the security and analytics requirements of your organization's cloud infrastructure environment. Organizations often use such security analytics tools to help prevent, detect, and respond to threats like malware, phishing, and ransomware, as well as to potential asset misconfigurations. To help you meet these security and analytics requirements, this document first introduces the variety of security-relevant logs in your Google Cloud environment, from application logs and platform logs to audit logs such as Admin Activity and Data Access logs.

Using the log scoping tool, you can first evaluate these security-relevant log types in terms of the visibility and threat detection coverage they provide. The tool maps threat tactics and techniques from the popular MITRE ATT&CK® threat model to the specific Google Cloud log type(s) that help you investigate these common threats. Similarly, it maps Security Command Center's Event Threat Detection modules to the corresponding Google Cloud log sources that they rely on.

In this scenario, the exported logs are delivered to a BigQuery dataset that you configure as part of the export, along with a user-defined logging filter. You grant permissions to limit access to the logs as appropriate. You can simplify managing and querying the data by configuring the log sink to export data into partitioned tables, where data is partitioned by day based on the log entry's timestamp field. This approach can help reduce query costs by reducing the amount of data scanned by each query. Another benefit of partitioned tables is that you can set a partition expiration on an individual table or on the entire dataset, so that you keep the logging data only as long as you find it useful. For example, you might keep audit logging data for three years; older data is then automatically deleted as partitions age out and expire.
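
For example, once the logging export dataset exists (you create it in the Set up the dataset in BigQuery section), you could enforce a three-year retention with a default partition expiration. The following is a minimal sketch that assumes the dataset gcp_logging_export and the project compliance-logging-export used later in this document:

# Set a default partition expiration of 3 years (94,608,000 seconds)
# for partitioned tables in the logging export dataset.
bq update \
    --default_partition_expiration 94608000 \
    compliance-logging-export:gcp_logging_export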

This scenario is part of the series Design patterns for exporting Cloud Logging.

Evaluate which logs to export

Log scoping tool

The following interactive log scoping tool lists valuable security-relevant logs across Google Cloud, including Cloud Audit Logs, Access Transparency logs, and several platform logs. To help you evaluate which logs to export and analyze to meet your own security and compliance needs, this tool maps each log type to the corresponding:

  • MITRE ATT&CK® threat tactics and techniques that the log type helps you investigate.
  • Security Command Center Event Threat Detection modules that rely on the log type.

Select the desired log types to automatically generate a suitable log filter to use in the Set up the logging export section.

Record the log filter

In Cloud Shell, create a variable to save the auto-generated log filter from the preceding section. You can also refine the filter further depending on your requirements. For example, you could restrict it to (or exclude) resources in one or more specific projects, as shown in the sketch after the following command.

export LOG_FILTER='log-filter'
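
For instance, one hypothetical refinement restricts an aggregated filter to entries from a single project by matching the log name prefix; my-project-id is a placeholder:

# Keep only entries whose log name belongs to one project
export LOG_FILTER="${LOG_FILTER} AND logName:\"projects/my-project-id/\""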

Set up the logging export

The following diagram shows the steps for enabling logging export to BigQuery.

  • Set up the logging export dataset in BigQuery.
  • Turn on audit logging for all Google Cloud services.
  • Turn on platform logs for specific Google Cloud services.
  • Configure the logging export.
  • Set IAM policy permissions for the BigQuery dataset.

Enabling logging export to BigQuery.

Set up the dataset in BigQuery

Follow the instructions to set up a dataset that will host your exported logs. If you are using an aggregated export, the BigQuery dataset should be located in one of the Google Cloud projects within your organization. If you are using a log export for a single project, the BigQuery dataset should be in that same project.
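
If you prefer the command line to the Cloud Console, the following minimal sketch creates such a dataset with the bq tool. The dataset name gcp_logging_export and project compliance-logging-export match the examples later in this document; the US location is an assumption:

# Create the dataset that will receive the exported logs
bq --location=US mk \
    --dataset \
    --description "Cloud Logging security export" \
    compliance-logging-export:gcp_logging_export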

Turn on audit logging for all services

Data Access audit logs—except for BigQuery—are disabled by default. To configure Data Access audit logs directly in Google Cloud Console, see Configure Data Access audit logs. Alternatively, you can configure Data Access audit logs with gcloud using IAM policy objects. For an example broad IAM policy configuration that enables Data Access audit logs for all services and all users, see Enable all Data Access audit logs.
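
As an illustration, the following sketch enables Data Access audit logs for all services across the organization by editing the auditConfigs section of the organization's IAM policy. The organization ID matches the example used later in this document:

# Download the current organization IAM policy
gcloud organizations get-iam-policy 324989855333 > /tmp/policy.yaml

# Add (or merge) an auditConfigs section like the following into /tmp/policy.yaml:
#
# auditConfigs:
# - service: allServices
#   auditLogConfigs:
#   - logType: ADMIN_READ
#   - logType: DATA_READ
#   - logType: DATA_WRITE
#
# Then write the updated policy back to the organization
gcloud organizations set-iam-policy 324989855333 /tmp/policy.yaml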

Turn on service-specific platform logs

Platform logs must be enabled on a service-by-service basis, typically at the resource level. For example, Cloud DNS logging is enabled at the VPC network level, VPC Flow Logs are enabled at the subnet level for all VMs in the subnet, and Firewall Rules Logging is enabled at the individual firewall rule level. For details on how to enable a specific log type, see the log scoping tool and click the Enable link in the corresponding row.
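
For example, two of these platform logs can be turned on from the command line as follows; the subnet, region, and firewall rule names are placeholders:

# Enable VPC Flow Logs on an existing subnet
gcloud compute networks subnets update my-subnet \
    --region=us-central1 \
    --enable-flow-logs

# Enable Firewall Rules Logging on an existing firewall rule
gcloud compute firewall-rules update my-firewall-rule \
    --enable-logging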

Configure the logging export

You use the gcloud logging sinks create command or the organizations.sinks.create API method to create a sink with the appropriate logging filter. The following example gcloud command creates a sink called gcp_logging_sink_bq for the organization. The sink includes all child projects and uses the log filter saved in the $LOG_FILTER variable that you created previously.

gcloud logging sinks create gcp_logging_sink_bq \
     bigquery.googleapis.com/projects/compliance-logging-export/datasets/gcp_logging_export \
     --use-partitioned-tables \
     --log-filter="$LOG_FILTER" \
     --include-children \
     --organization=324989855333

The command output is similar to the following:

Created [https://logging.googleapis.com/v2/organizations/324989855333/sinks/gcp_logging_sink_bq].
Please remember to grant `serviceAccount:gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com` the WRITER role on the dataset.
More information about sinks can be found at /logging/docs/export/configure_export

The serviceAccount entry in the output includes the identity gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com. This identity is a Google Cloud service account that was created specifically for your log export. Until you grant this identity write access to the BigQuery dataset, log entry exports from this sink will fail. To learn how, see the following section.

Set IAM policy permissions for the BigQuery dataset

By adding the service account gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com to the gcp_logging_export dataset with the BigQuery Data Editor role, you grant the service account permission to write to the destination. Until you add these permissions, the sink export will fail.

To add the permissions to the service account, follow these steps:

  1. In the Cloud Console, go to BigQuery:

    Go to BigQuery

  2. Open your newly created gcp_logging_export dataset.

  3. In the Dataset info tab, click the Sharing drop-down menu, and then click Permissions.

  4. In the Dataset Permissions side panel, click Add Principal.

  5. In the New principals field, enter the service account. Under the Role drop-down menu, select BigQuery Data Editor, as shown in the following image:

    IAM policy permissions - Editor

After you create the logging export by using this filter, log entries begin to populate the BigQuery dataset in the configured project.

As an example set of permissions for the dataset, you can do the following (a command-line sketch follows this list):

  • Remove all nonessential users from the BigQuery dataset permissions.
  • Add full control for the BigQuery admin.
  • Grant the export service account permissions to write the export logs.
  • Grant other individual users viewer access to the Google Cloud logging exports.
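
The following is a minimal sketch of applying such a permission set with the bq tool, assuming the dataset and service account names used earlier in this document; the admin and viewer identities are placeholders:

# Export the current dataset definition
bq show --format=prettyjson \
    compliance-logging-export:gcp_logging_export > dataset.json

# Edit dataset.json so that its "access" list contains only the entries you need,
# for example:
#
#   {"role": "OWNER",  "userByEmail": "bq-admin@example.com"},
#   {"role": "WRITER", "userByEmail": "gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com"},
#   {"role": "READER", "userByEmail": "log-viewer@example.com"}
#
# Then apply the updated definition to the dataset
bq update --source dataset.json \
    compliance-logging-export:gcp_logging_export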

Using the exported logs

When you export logs to a BigQuery dataset, Cloud Logging creates tables to hold the exported log entries. Log entries are placed in tables whose names are derived from the entries' log names, and the tables are partitioned based on the entries' timestamps. For example, Data Access audit logs are routed to the cloudaudit_googleapis_com_data_access table, as shown in the following image:

List of tables with schema.

Both Admin Activity and Data Access logs are loaded into BigQuery using the protoPayload format. See Fields in exported audit logs for details on the schema conversions that Cloud Logging performs before writing to BigQuery.
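
To confirm which tables the sink has created and to inspect the converted schema, you can use the bq tool; this is a minimal sketch assuming the dataset and project names used earlier:

# List the tables created by the log sink
bq ls compliance-logging-export:gcp_logging_export

# Show the converted schema of the Data Access audit log table
bq show --schema --format=prettyjson \
    compliance-logging-export:gcp_logging_export.cloudaudit_googleapis_com_data_access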

Sample questions and queries

You can run a broad range of queries against the audit logs to understand who is accessing or modifying data in Google Cloud. In the following sample queries, replace gcp_logging_export with the name of your BigQuery dataset.

Which users most frequently accessed data in the past week?

The following query uses the Data Access audit logs to find the user identities that most frequently accessed BigQuery table data over the past week.

SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS COUNTER
FROM `gcp_logging_export.cloudaudit_googleapis_com_data_access`
WHERE
  (protopayload_auditlog.methodName = "google.cloud.bigquery.v2.JobService.InsertJob" OR
   protopayload_auditlog.methodName = "google.cloud.bigquery.v2.JobService.Query")
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  1
ORDER BY
  2 DESC, 1
LIMIT
  100

Which users accessed the data in the "accounts" table last month?

The following query uses the Data Access audit logs to find the user identities that most frequently queried the "accounts" table over the past month. Replace MY_PROJECT_ID and MY_DATASET with your project ID and dataset name, respectively.

SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS COUNTER
FROM `gcp_logging_export.cloudaudit_googleapis_com_data_access`,
  UNNEST(protopayload_auditlog.authorizationInfo) authorizationInfo
WHERE
  (protopayload_auditlog.methodName = "google.cloud.bigquery.v2.JobService.InsertJob" OR
   protopayload_auditlog.methodName = "google.cloud.bigquery.v2.JobService.Query")
  AND authorizationInfo.permission = "bigquery.tables.getData"
  AND authorizationInfo.resource = "projects/MY_PROJECT_ID/datasets/MY_DATASET/tables/accounts"
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  1
ORDER BY
  2 DESC, 1
LIMIT
  100

Which users deleted VMs in the last week?

The following query uses the Admin Activity audit logs to find the user identities that deleted VMs over the past week.

SELECT
  timestamp,
  resource.labels.instance_id,
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.resourceName,
  protopayload_auditlog.methodName
FROM `gcp_logging_export.cloudaudit_googleapis_com_activity`
WHERE
  resource.type = "gce_instance"
  AND protopayload_auditlog.methodName = "v1.compute.instances.delete"
  AND operation.first IS TRUE
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY
  timestamp DESC,
  resource.labels.instance_id
LIMIT
  1000

How often has autoscaling been used over the past month?

The following query uses the Admin Activity audit logs to find how often autoscaling has been used over the past month.

SELECT
  protopayload_auditlog.methodName,
  COUNT(*) AS counter
FROM `gcp_logging_export.cloudaudit_googleapis_com_activity`
WHERE
  resource.type = "gce_instance_group_manager"
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  1
ORDER BY
  1
LIMIT
  1000

Using the following query, you can visualize the trend for Compute Engine instance group manager operations over time, broken down by day.

SELECT
  TIMESTAMP_TRUNC(timestamp, DAY) AS day,
  protopayload_auditlog.methodName AS methodName,
  COUNT(*) AS counter
FROM `gcp_logging_export.cloudaudit_googleapis_com_activity`
WHERE
  resource.type = "gce_instance_group_manager"
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  1, 2
ORDER BY
  1, 2

You can use the preceding query as a custom query data source in Google Data Studio to visualize the trend over time, as shown in the following graph.

Data Studio visualization.

See Data Studio custom queries for more information.

What tables are most frequently accessed and by whom?

The following query uses the Data Access audit logs to find the BigQuery tables whose data was most frequently read and modified over the past month. It displays the associated user identity, along with a breakdown of the total number of times data was read versus modified.

SELECT
  protopayload_auditlog.resourceName,
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL) AS dataReadEvents,
  COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL) AS dataChangeEvents,
  COUNT(*) AS totalEvents
FROM `gcp_logging_export.cloudaudit_googleapis_com_data_access`
WHERE
  STARTS_WITH(resource.type, 'bigquery') IS TRUE
  AND (JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL
    OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL)
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  1, 2
ORDER BY
  5 DESC, 1, 2
LIMIT 1000

What are the top 10 queries against BigQuery in the past week?

The following query uses the Data Access audit logs to find the most common queries over the past week. It also lists the corresponding users and the referenced tables.

SELECT
  COALESCE(
   JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, "$.jobChange.job.jobConfig.queryConfig.query"),
   JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, "$.jobInsertion.job.jobConfig.queryConfig.query")) as query,
  STRING_AGG(DISTINCT protopayload_auditlog.authenticationInfo.principalEmail, ',') as users,
  ANY_VALUE(COALESCE(
   JSON_EXTRACT_ARRAY(protopayload_auditlog.metadataJson, "$.jobChange.job.jobStats.queryStats.referencedTables"),
   JSON_EXTRACT_ARRAY(protopayload_auditlog.metadataJson, "$.jobInsertion.job.jobStats.queryStats.referencedTables"))) as tables,
  COUNT(*) AS counter
FROM `gcp_logging_export.cloudaudit_googleapis_com_data_access`
WHERE
  (resource.type = 'bigquery_project' OR resource.type = 'bigquery_dataset')
  AND operation.last IS TRUE
  AND (JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.jobChange") IS NOT NULL
    OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.jobInsertion") IS NOT NULL)
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  query
ORDER BY
  counter DESC
LIMIT 10

What are the most common actions recorded in the audit logs over the past month?

The following query uses all Cloud Audit Logs tables to find the 100 most frequent actions recorded over the past month.

SELECT
  ANY_VALUE(_TABLE_SUFFIX),
  protopayload_auditlog.methodName,
  protopayload_auditlog.serviceName,
  resource.type,
  COUNT(*) AS counter
FROM `gcp_logging_export.cloudaudit_googleapis_com_*`
WHERE
  timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  protopayload_auditlog.methodName,
  protopayload_auditlog.serviceName,
  resource.type
ORDER BY
  counter DESC
LIMIT 100

What's next