This scenario shows how to export logs from Cloud Logging to BigQuery to meet your organization's security and analytics requirements. Organizations often use analytical tools to help identify unauthorized configuration changes and inappropriate access to data. To help you meet these requirements, Cloud Logging can capture two types of audit logs: Admin Activity logs and Data Access logs.
In this scenario, the exported logs are delivered to a dataset in BigQuery that you configure as a part of the export. You grant permissions to limit access to the logs as appropriate. You can simplify managing and querying the data by organizing the exported data into date-partitioned tables. This approach can help reduce query costs by reducing the amount of data scanned as a part of queries. One benefit of partitioning is that you can set an expiration date for the partitions so that you maintain the logging data only as long as you find it useful. For example, you might keep audit logging data for 3 years and then delete it.
This scenario is part of the series Design patterns for exporting Cloud Logging.
Set up the logging export
Enabling logging export to BigQuery involves the following steps:
- Set up the logging export dataset in BigQuery.
- Turn on audit logging for all Google Cloud services.
- Configure the logging export.
- Set IAM policy permissions for the BigQuery dataset.
Set up the dataset in BigQuery
Follow the instructions to set up a dataset that will host your exported logs. If you are using aggregated logs, the BigQuery dataset should be located in one of the Google Cloud projects within your organization. When you are using log exports for a single project, the BigQuery dataset should be in the same project.
Best practice: When you set up the dataset, configure a partition expiration to limit the size of the logging export storage and the cumulative storage costs over time.
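For example, you can create the export dataset and set a default partition expiration with the bq command-line tool. The following is a minimal sketch that assumes the compliance-logging-export project and gcp_logging_export dataset names used later in this scenario; the expiration value is in seconds.

# Create the export dataset with a default partition expiration of
# 94608000 seconds (about 3 years) and a short description.
bq mk --dataset \
    --default_partition_expiration 94608000 \
    --description "Cloud Logging audit log export" \
    compliance-logging-export:gcp_logging_export

The default partition expiration applies to partitioned tables created in the dataset afterward, so partitions older than the expiration are deleted automatically.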
Turn on audit logging for all services
Data Access audit logs (except for BigQuery) are disabled by default. To enable all audit logs, follow the instructions to update the IAM policy with the configuration listed in the audit policy documentation. The steps include the following:
- Downloading the current IAM policy as a file.
- Adding the audit log policy JSON or YAML object to the current policy file.
- Updating the project with the updated policy file.
The following is an example JSON object that enables all audit logs for all services.
"auditConfigs": [ { "service": "allServices", "auditLogConfigs": [ { "logType": "ADMIN_READ" }, { "logType": "DATA_READ" }, { "logType": "DATA_WRITE" }, ] }, ]
Configure the logging export
After you set up an aggregated export or a single-project log export, you need to refine the logging filters to export audit logs. The following logging filter includes the Admin Activity and Data Access audit logs and the logs for specific resource types.
logName:"/logs/cloudaudit.googleapis.com" OR resource.type:gce OR resource.type=gcs_bucket OR resource.type=cloudsql_database OR resource.type=bigquery_resource
From the gcloud command-line tool, you use the gcloud logging sinks create command or the organizations.sinks.create API call to create a sink with the appropriate filters. The following example gcloud command creates a sink called gcp_logging_sink_bq for the organization. The sink includes all children projects and specifies filtering to select specific audit logs.
gcloud logging sinks create gcp_logging_sink_bq \
    bigquery.googleapis.com/projects/compliance-logging-export/datasets/gcp_logging_export \
    --log-filter='logName:"/logs/cloudaudit.googleapis.com"' \
    --include-children \
    --organization=324989855333
The command output is similar to the following:
Created [https://logging.googleapis.com/v2/organizations/324989855333/sinks/gcp_logging_sink_bq].
Please remember to grant `serviceAccount:gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com` the WRITER role on the dataset.
More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
The command output includes a serviceAccount entry with the identity gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com. This identity represents a Google Cloud service account that was created for the export. Until you grant this identity write access to the destination dataset, log entry exports from this sink will fail. For more information, see the next section and the required permissions for a BigQuery dataset.
Set IAM policy permissions for the BigQuery dataset
By adding the service account gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com to the gcp_logging_export dataset with Editor permissions, you grant the service account permission to write to the destination. Until you add these permissions, the sink export will fail.
To add the permissions to the service account, follow these steps:
- In the Cloud Console, go to BigQuery.
- Next to the gcp_logging_export dataset, click the down arrow, and then click Share Dataset.
- In the Add people field, enter the service account.
After you create the logging export by using this filter, log entries begin to populate the BigQuery dataset in the configured project.
Best practice: Apply the principle of least privilege based on your needs. You can configure the dataset permissions for specific Google Cloud user accounts, Google Groups, or Google Cloud service accounts. Use IAM permissions to grant access to the BigQuery dataset.
For an example set of permissions, you can do the following:
- Remove all nonessential users from the BigQuery dataset permissions.
- Add full control for the BigQuery admin.
- Grant the export user permissions to write the export logs.
- Grant other individual users viewer access to the Google Cloud logging exports.
You can update the IAM permissions for the dataset directly in the Cloud Console, through the bq command-line tool, or through the IAM API.
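For example, a minimal sketch using the bq tool: download the dataset's metadata, add an access entry for the export service account (or for a Google Group of viewers), and apply the change. The dataset.json file name is only an illustration.

# Download the current dataset metadata, including its access list.
bq show --format=prettyjson compliance-logging-export:gcp_logging_export > dataset.json

# Edit dataset.json: add an entry such as
#   {"role": "WRITER", "userByEmail": "gcp-logging-sink-bq@logging-o324989855333.iam.gserviceaccount.com"}
# to the "access" array, then apply the updated metadata.
bq update --source dataset.json compliance-logging-export:gcp_logging_export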
Using the exported logs
When you export logs to a BigQuery dataset, Cloud Logging creates dated tables to hold the exported log entries. Log entries are placed in tables whose names are based on the entries' log names.
The table names include a YYYYMMDD date suffix (for example, cloudaudit_googleapis_com_activity_YYYYMMDD).
Because the exported logs are audit logs, the Admin Activity and Data Access logs are loaded in BigQuery using the protoPayload format.
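To verify that the export is populating the dataset, you can list the dated tables with the bq tool. This is a sketch that assumes the dataset created earlier in this scenario.

# List the tables in the logging export dataset.
bq ls compliance-logging-export:gcp_logging_export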
Granting external access
You might want to grant specific users access to exported logs—for example, security analysts, your DevOps team, and auditors. BigQuery offers a number of options to grant secure access to the logs.
Logs location strategies
There are several options for selecting which logs users can view in BigQuery.
Create copies of the logs to share.
Manually or programmatically create a copy of an individual log table or set of log tables, and place the copies in a separate BigQuery dataset. Then use the separate BigQuery dataset permissions to share logs with specific users as appropriate.
Advantages: You can limit the amount of data that is exposed to only the copied data.
Disadvantages: You have to create, share, and manage separate data sets and permissions, which can lead to higher costs.
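For example, a minimal sketch that copies one day's Data Access table into a separate dataset shared with auditors; the auditor_share dataset and the 20250101 date suffix are placeholders.

# Copy a single dated log table into a dataset that is shared separately.
bq cp \
    compliance-logging-export:gcp_logging_export.cloudaudit_googleapis_com_data_access_20250101 \
    compliance-logging-export:auditor_share.cloudaudit_googleapis_com_data_access_20250101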
Grant read-only access to all logs.
Manually or programmatically set viewer permissions to the BigQuery logging export tables, which grants access to all log exports.
Advantages: Access is easy to grant.
Disadvantages: You must grant access to all of the logs rather than specific log files.
User access control strategies
There are several options for granting access to the logs in BigQuery.
Use a Google Group.
Create a Google Group such as auditors@example.com with read-only access to the logging export BigQuery dataset. You then manage the list of Google Accounts by adding auditors to or removing them from the Google Group.
Advantages: It's easy to manage access through a group, and the group has a clear user-access purpose.
Disadvantages: It's not possible to tell who has access without looking at the group's membership.
Use individual Google Accounts.
Grant individual Google Account access to the logging export BigQuery dataset for each individual user that requires it.
Advantages: It's easy to add each user manually or programmatically.
Disadvantages: It's not possible to discern audit users from other viewers.
Sample questions and queries
You can run a broad range of queries against the audit logs, for example to understand who is accessing data in Google Cloud or how your autoscalers are operating over time.
Which users most frequently accessed data in the past week?
The following query uses the Data Access audit logs to find the user accounts that most frequently appear in the Data Access logs over the past week.
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_data_access_,
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  protopayload_auditlog.methodName = "jobservice.query"
GROUP BY
  protopayload_auditlog.authenticationInfo.principalEmail
ORDER BY
  counter DESC
LIMIT
  1000
Which users accessed the data in the "accounts" table last month?
The following query uses the Data Access audit logs to find the user accounts that most frequently queried the "accounts" table over the past month. Replace your-project-id with your project ID.
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_data_access_,
    DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  protopayload_auditlog.methodName = "jobservice.query"
  AND protopayload_auditlog.authorizationInfo.permission = "bigquery.tables.getData"
  AND protopayload_auditlog.authorizationInfo.resource = "projects/your-project-id/datasets/bqtesting/tables/accounts"
GROUP BY
  protopayload_auditlog.authenticationInfo.principalEmail
ORDER BY
  counter DESC
LIMIT
  1000
Which users deleted virtual machines in the last week?
The following query uses the Admin Activity audit logs to find the user accounts that deleted virtual machines over the past week.
SELECT
  timestamp,
  resource.labels.instance_id,
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.resourceName,
  protopayload_auditlog.methodName
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_activity_,
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  resource.type = "gce_instance"
  AND operation.first IS TRUE
  AND protopayload_auditlog.methodName = "v1.compute.instances.delete"
ORDER BY
  timestamp,
  resource.labels.instance_id
LIMIT
  1000
The following follow-up query summarizes the number of deletions by user account.
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_activity_,
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  resource.type = "gce_instance"
  AND operation.first IS TRUE
  AND protopayload_auditlog.methodName = "v1.compute.instances.delete"
GROUP BY
  protopayload_auditlog.authenticationInfo.principalEmail
ORDER BY
  protopayload_auditlog.authenticationInfo.principalEmail,
  counter
LIMIT
  1000
How often has autoscaling been used in the past month? Is autoscaling trending up or down, or is it holding steady?
The following query uses the Admin Activity audit logs to find how often autoscaling has been used over the past month.
SELECT
  protopayload_auditlog.methodName,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_activity_,
    DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  resource.type = "gce_instance_group_manager"
GROUP BY
  protopayload_auditlog.methodName
ORDER BY
  protopayload_auditlog.methodName
LIMIT
  1000
Using the following query, you can visualize the trend of Compute Engine instance manager operations over time.
SELECT
  timestamp,
  protopayload_auditlog.methodName AS methodName,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_activity_,
    DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  resource.type = "gce_instance_group_manager"
GROUP BY
  timestamp,
  methodName
ORDER BY
  timestamp,
  methodName
You can use the preceding query as a custom query data source in Google Data Studio to visualize the trend over time.
See Data Studio custom queries for more information.
Which datasets are most frequently accessed, and by whom?
The following query uses the Data Access audit logs to find the BigQuery datasets that are most frequently accessed, and displays the associated user accounts along with the access counts.
SELECT
  protopayload_auditlog.authorizationInfo.resource,
  protopayload_auditlog.authenticationInfo.principalEmail,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_data_access_,
    DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  protopayload_auditlog.authorizationInfo.permission = "bigquery.tables.getData"
GROUP BY
  protopayload_auditlog.authorizationInfo.resource,
  protopayload_auditlog.authenticationInfo.principalEmail
ORDER BY
  counter DESC
LIMIT
  1000
What are the top 10 queries against BigQuery in the past week?
The following query uses the Data Access audit logs to find the most common queries run against BigQuery over the past week.
SELECT
  protopayload_auditlog.servicedata_v1_bigquery.jobQueryRequest.query,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_data_access_,
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
WHERE
  protopayload_auditlog.methodName = "jobservice.query"
GROUP BY
  protopayload_auditlog.servicedata_v1_bigquery.jobQueryRequest.query
ORDER BY
  counter DESC
LIMIT
  1000
What are the most common actions recorded in the data access log over the past 30 days?
The following query uses the Data Access audit logs to find the most common actions recorded over the past 30 days.
SELECT
  protopayload_auditlog.methodName,
  resource.type,
  COUNT(*) AS counter
FROM
  (TABLE_DATE_RANGE(gcp_logging_export.cloudaudit_googleapis_com_data_access_,
    DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
GROUP BY
  protopayload_auditlog.methodName,
  resource.type
ORDER BY
  counter DESC
LIMIT
  1000
What's next
- Exporting Cloud Logging logs to Elastic Cloud
- Look at the other export scenarios in the series Design patterns for exporting Cloud Logging.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.