Cloud Audit Logs are a collection of logs provided by Google Cloud that provide insight into operational concerns related to your use of Google Cloud services. This page provides details about BigQuery specific log information, and it demonstrates how to use BigQuery to analyze logged activity.
The audit log message system relies on structured logs, and the BigQuery service provides two distinct kinds of messages:
AuditData: The old version of logs, which reports API invocations.
BigQueryAuditMetadata: The new version of logs, which reports resource interactions such as which tables were read from and written to by a given query job and which tables expired due to having an expiration time configured.
In general, users will want to leverage the new
In the Stackdriver Logging logs, the
protoPayload.serviceData information is
not set or used. In
BigQueryAuditMetadata messages, there is more information:
resource.typeis set to one of the following values:
resource.labels.locationcontains the location of the job.
resource.labels.dataset_idcontains the encapsulating dataset.
protoPayload.methodNameis set to one of the following values:
protoPayload.resourceNamenow contains the URI for the referenced resource. For example, a table created by using an insert job reports the resource URI of the table The earlier format reported the API resource (the job identifier).
protoPayload.authorizationInfoonly includes information relevant to the specific event. With earlier AuditData messages, you could merge multiple records when source and destination tables were in the same dataset in a query job.
Mapping audit entries to log streams
Audit logs are organized into the following three streams. For more information about the streams, see the Cloud Audit Logs documentation.
- Data access
- System event
- Admin activity
Data access (data_access)
data_access stream contains entries about jobs by using the
JobChange events and about table data modifications
by using the
For example, when a load job appends data to a table, the
TableDataChange event. A
TableDataRead event indicates when
a consumer reads a table.
Note: BigQuery does not emit data access log entries in the following scenarios:
Data appended to a table by using the streaming insert mechanism does not generate
Recursive dataset deletions, such as removing a dataset and its contents in a single API call, do not yield deletion entries for each resource contained in the dataset. The dataset removal is present in the activity log.
Partitioned tables do not generate
TableDataChangeentries for partition expirations.
Wildcard tables access generates a single
TableDataReadentry and doesn't write a separate entry for each queried table.
System event (system_event)
You can set an expiration time on tables to remove them at a specified time.
system_event stream reports a
TableDeletion event when
the table expires and is removed.
Admin activity (activity)
activity stream reports all remaining activities and events
such as table and dataset creation.
Visibility and access control
BigQuery audit logs can include information that users might consider sensitive, such as SQL text, schema definitions, and identifiers for resources such as table and datasets. For information about managing access to this information, see the Stackdriver Logging access control documentation.
Caller identities and resource names
All caller identities and IP addresses are redacted from the audit logs, unless at least one of the following conditions is met:
- This isn't a read-only access.
- The identity is a service account that belongs to the project.
- The identity belongs to the same customer as the project.
For cross-project access, there are additional rules that apply:
The billing project must be the project that sends the request, and the data project must be the project whose resources are also accessed during the job. For example, a query job in a billing project reads some table data from the data project.
The billing project resource ID is redacted from the data project log unless the projects have the same domain associated with them or are in the same organization.
Identities and caller IP addresses are redacted from the data project log unless one of the preceding or one of the following conditions apply:
- The billing project and the data project have the same domain associated with them or are in the same organization, and the billing project already includes the identity and caller IP address.
- The identity has permission to run queries in the project and the action
Stackdriver Logging exports
BigQuery automatically sends audit logs to Stackdriver Logging. Stackdriver Logging lets users filter and export messages to other services, including Pub/Sub, Cloud Storage, and BigQuery.
With long term log retention and log exports to BigQuery, you can do aggregated analysis on logs data. Stackdriver Logging documents how messages are transformed when exported to BigQuery.
To filter relevant BigQuery Audit messages, you can express filters as part of the export.
For example, the following advanced filter represents an export that
only includes the newer
You can express additional filters based on the fields within the log messages. For more information about crafting advanced filters, see the advanced log filter documentation.
Defining a BigQuery log sink using gcloud
The following example command line shows how you can use the
gcloud command-line tool to
create a logging sink
in a dataset named
auditlog_dataset that only
gcloud logging sinks create my-example-sink bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset --log-filter='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
Querying exported logs
The following examples show how you can use
to analyze BigQuery usage.
AuditData fields are present in the
protopayload_auditlog.servicedata_v1_bigquery record in the schema.
Note: Change the
to the dataset and table date you've configured in the Stackdriver Logging
Example: Query cost breakdown by identity
This query shows estimated query costs by user identity. It estimates costs based on the list price for on-demand queries in the US. This pricing might not be accurate for other locations or for customers who are leveraging flat-rate billing.
#standardSQL WITH data as ( SELECT protopayload_auditlog.authenticationInfo.principalEmail as principalEmail, protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD` ) SELECT principalEmail, FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost FROM data WHERE jobCompletedEvent.eventName = 'query_job_completed' GROUP BY principalEmail ORDER BY Estimated_USD_Cost DESC
Example: Hourly cost breakdown
This query shows estimated query costs by hour.
#standardSQL SELECT TIMESTAMP_TRUNC(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime, HOUR) AS time_window, FORMAT('%9.2f',5.0 * (SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD` WHERE protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.eventName = 'query_job_completed' GROUP BY time_window ORDER BY time_window DESC
The following examples show how you can use
to analyze BigQuery usage. Because of the schema conversion done during
the export from Stackdriver Logging into BigQuery, the message bodies are
presented in semi-structured form. The
STRING field, and it contains the JSON representation of the message. You
in standard SQL to analyze this content.
Note: Change the
FROM clause in each of these examples to the
corresponding exported tables in your project.
Example: Report expired tables
BigQueryAuditMetadata messages log when a table is deleted because its
expiration time was reached. The following sample query shows when these
messages occur and includes a URI that references the table resource
that was removed.
#standardSQL SELECT protopayload_auditlog.resourceName AS resourceName, receiveTimestamp as logTime FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_system_event_201901*` WHERE protopayload_auditlog.methodName = 'InternalTableExpired' ORDER BY resourceName
Example: Most popular datasets
This query shows coarse, per-dataset statistics about table reads and table modifications.
#standardSQL SELECT REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/([^/]+)/tables') AS datasetRef, COUNT(DISTINCT REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/[^/]+/tables/(.*)$')) AS active_tables, COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL) AS dataReadEvents, COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL) AS dataChangeEvents FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_2019*` WHERE JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL GROUP BY datasetRef ORDER BY datasetRef