Cloud Audit Logs are a collection of logs provided by Google Cloud Platform that provide insight into operational concerns related to your use of Google Cloud services. This page provides details about BigQuery-specific log information, and it demonstrates how to use BigQuery to analyze logged activity.
The audit log message system relies on structured logs, and the BigQuery service
provides two distinct kinds of messages. The first, AuditData, represents an
older version of the logs and is oriented around reporting API invocations.
The newer format of log messages, BigQueryAuditMetadata, makes it simpler to
understand resource interactions, such as identifying which tables were read
from and written to by a given query job. The newer format also exposes
capabilities unavailable in the older message format, such as being able to
report tables that expired due to having an expiration time configured.
In general, most users should leverage the newer BigQueryAuditMetadata
messages, but the system will continue to report the older AuditData logs to
support existing users of the audit log functionality.
In the newer BigQueryAuditMetadata logs, the protoPayload.serviceData
information is not set or used. There are additional changes present in these
messages:
- resource.type is set to one of the following: "bigquery_project" for jobs, or "bigquery_dataset" for storage-related information.
- resource.labels.location is set to the location of the job. For storage events, resource.labels.dataset_id contains the encapsulating dataset.
- protoPayload.methodName is set to one of a new set of values, such as InternalTableExpired.
- protoPayload.resourceName now contains the URI for the referenced resource. The V1 format reported the API resource instead. For example, a table created by using an insert job reports the resource URI of the table in the new logging format, whereas the old format reported the job identifier.
- protoPayload.authorizationInfo only includes information relevant to the specific event. In older AuditData messages, multiple records could be merged in some cases, such as when the source and destination tables of a query job were in the same dataset.
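As an illustration of these differences, the following Python sketch shows one way to distinguish the two message formats by inspecting the payload. The sample entries are hypothetical, minimal stand-ins for real log entries, which contain many more fields:

```python
# Hypothetical, minimal log entries for illustration only.
OLD_ENTRY = {
    "protoPayload": {
        # AuditData messages populate protoPayload.serviceData.
        "serviceData": {"jobInsertRequest": {}},
        "methodName": "jobservice.insert",
    }
}

NEW_ENTRY = {
    "protoPayload": {
        # BigQueryAuditMetadata messages carry a typed metadata payload instead.
        "metadata": {
            "@type": "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata",
        },
        "methodName": "InternalTableExpired",
        "resourceName": "projects/my-project/datasets/my_dataset/tables/my_table",
    }
}

def message_format(entry):
    """Classify an entry as old-style AuditData or new-style BigQueryAuditMetadata."""
    payload = entry.get("protoPayload", {})
    metadata_type = payload.get("metadata", {}).get("@type", "")
    if metadata_type == "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata":
        return "BigQueryAuditMetadata"
    if "serviceData" in payload:
        return "AuditData"
    return "unknown"

print(message_format(OLD_ENTRY))  # AuditData
print(message_format(NEW_ENTRY))  # BigQueryAuditMetadata
```

The `@type` check shown here mirrors the advanced log filter used later in this page to export only the newer messages.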
Mapping audit entries to log streams
Audit logs are organized into three streams: Admin Activity, System Event, and Data access. For more background on the streams, see the Cloud Audit Logging documentation.
Data access (data_access)
The data_access stream provides information on data access. For BigQuery, this
stream contains entries about jobs as well as entries about table data reads
and modifications. For example, when a load job appends data to a table, a
TableDataChange event is reported. When a consumer reads a table or the
results of a query, a TableDataRead event is reported.
Note: BigQuery does not emit data access log entries in the following scenarios:
- Data appended to a table via the streaming insert mechanism does not generate TableDataChange entries.
- Recursive dataset deletions, such as removing a dataset and its contents in a single API call, do not yield deletion entries for each resource contained in the dataset. The dataset removal itself is present in the activity log.
- Partitioned tables do not generate TableDataChange entries for partition expirations.
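In the exported form of these messages (described later in this page), the event type can be recovered from the JSON metadata payload. A minimal Python sketch of that classification, using hypothetical rows in which metadataJson is a JSON string as it is in the exported table:

```python
import json

# Hypothetical exported rows; the metadataJson column holds a JSON string.
# The empty payloads stand in for the full event bodies.
rows = [
    {"metadataJson": json.dumps({"tableDataRead": {}})},
    {"metadataJson": json.dumps({"tableDataChange": {}})},
    {"metadataJson": json.dumps({"tableDataRead": {}})},
]

def classify(row):
    """Return which data_access event the entry represents, if any."""
    metadata = json.loads(row["metadataJson"])
    if "tableDataRead" in metadata:
        return "TableDataRead"
    if "tableDataChange" in metadata:
        return "TableDataChange"
    return "other"

counts = {}
for row in rows:
    kind = classify(row)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)  # {'TableDataRead': 2, 'TableDataChange': 1}
```

The SQL examples at the end of this page perform the same check with JSON_EXTRACT against the $.tableDataRead and $.tableDataChange paths.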
System event (system_event)
You can set an expiration time on tables to remove them at a specified time.
When the table expires and is removed, a
TableDeletion event is written to the
system event stream.
Admin activity (activity)
All remaining activities and events are reported as part of the main activity stream. Events such as job insertions and completions are reported in this stream, as are other resource events such as dataset creation.
Visibility and Access Control
BigQuery audit logs may include information that users may consider sensitive, such as SQL text, schema definitions, and identifiers for resources such as tables and datasets. Consult the Stackdriver Logging access control documentation for more information about managing access to this information.
Stackdriver Logging exports
BigQuery automatically sends audit logs to Stackdriver Logging. Stackdriver Logging allows users to filter and export messages to other services, including Cloud Pub/Sub, Cloud Storage, and BigQuery.
In addition to providing long-term log retention functionality, log exports to BigQuery enable aggregated analysis of log data. The Stackdriver Logging documentation describes how messages are transformed when exported to BigQuery.
To filter relevant BigQuery audit messages, you can express filters as part of the export.
For example, the following advanced filter restricts an export to only the newer BigQueryAuditMetadata messages:

    protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
You can express additional filters based on the fields within the log messages. See the advanced log filter documentation for more information on crafting advanced filters.
Defining a BigQuery log sink using gcloud
An example of using the
gcloud tool to create a logging sink
in a dataset named "auditlog_dataset" that only includes BigQueryAuditMetadata
messages can be expressed as follows:
```shell
gcloud logging sinks create my-example-sink \
    bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset \
    --log-filter='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
```
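The destination argument and the log filter are just strings composed from your project and dataset identifiers. A small Python sketch of how those pieces fit together (the project and dataset names are placeholders, as in the command above):

```python
def bigquery_sink_destination(project_id, dataset_id):
    """Build the BigQuery destination URI passed to `gcloud logging sinks create`."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"

# Advanced log filter matching only the newer BigQueryAuditMetadata messages.
METADATA_FILTER = (
    'protoPayload.metadata."@type"='
    '"type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
)

destination = bigquery_sink_destination("my-project-id", "auditlog_dataset")
print(destination)
# bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset
```

Keeping the filter in a named constant like this makes it easy to reuse the same expression across multiple sinks or scripted deployments.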
Querying exported logs
This section shows examples of analyzing BigQuery usage by using
the older AuditData messages. The AuditData fields are presented within the
protopayload_auditlog.servicedata_v1_bigquery record in the schema.
Note: Change the FROM clause in each of these examples to reference the dataset
and table names you've configured via the Stackdriver Logging export.
Example: Query cost breakdown by identity
This query shows estimated query costs by user identity. It estimates costs based on the list price for on-demand queries in the US. This pricing may not be accurate for other locations or for customers leveraging flat-rate billing.
```sql
#standardSQL
WITH data AS (
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
  FROM
    `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
)
SELECT
  principalEmail,
  FORMAT('%9.2f', 5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
  principalEmail
ORDER BY
  Estimated_USD_Cost DESC
```
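The cost expression in this query converts billed bytes to tebibytes (POWER(2, 40) bytes) and multiplies by the assumed US on-demand list price of $5.00 per TiB. The same arithmetic in Python, for checking the query's output by hand:

```python
def estimated_usd_cost(total_billed_bytes, usd_per_tib=5.0):
    """Estimate on-demand query cost: bytes -> TiB (2**40 bytes), times list price.

    The $5.00/TiB rate mirrors the assumption in the sample query above; it is
    not accurate for all locations or for flat-rate customers.
    """
    return usd_per_tib * (total_billed_bytes / 2**40)

# One TiB billed costs $5.00 at the assumed rate.
print(f"{estimated_usd_cost(2**40):9.2f}")

# 250 GiB billed is 250/1024 TiB, roughly $1.22.
print(f"{estimated_usd_cost(250 * 2**30):9.2f}")
```

The '%9.2f' format in the SQL and the ':9.2f' format here both right-align the dollar amount in a nine-character field with two decimal places.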
Example: Hourly cost breakdown
This query shows estimated query costs by hour.
```sql
#standardSQL
SELECT
  TIMESTAMP_TRUNC(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime, HOUR) AS time_window,
  FORMAT('%9.2f', 5.0 * (SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
  `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
WHERE
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
  time_window
ORDER BY
  time_window DESC
```
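TIMESTAMP_TRUNC(endTime, HOUR) buckets each job into the hour in which it finished, and the costs within each bucket are summed. A Python sketch of the same bucketing, over hypothetical (endTime, totalBilledBytes) pairs:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (endTime, totalBilledBytes) pairs for illustration.
job_stats = [
    (datetime(2019, 1, 10, 14, 5), 2**40),   # 1 TiB finishing at 14:05
    (datetime(2019, 1, 10, 14, 40), 2**39),  # 0.5 TiB finishing at 14:40
    (datetime(2019, 1, 10, 15, 1), 2**40),   # 1 TiB finishing at 15:01
]

def truncate_to_hour(ts):
    """Python analog of TIMESTAMP_TRUNC(endTime, HOUR)."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Sum the estimated cost ($5.00 per TiB, as assumed above) per hourly window.
hourly_cost = defaultdict(float)
for end_time, billed_bytes in job_stats:
    hourly_cost[truncate_to_hour(end_time)] += 5.0 * billed_bytes / 2**40

for window in sorted(hourly_cost, reverse=True):
    print(window, f"{hourly_cost[window]:9.2f}")
```

With the sample data, the 14:00 window accumulates $7.50 (1.5 TiB) and the 15:00 window $5.00 (1 TiB).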
This section shows examples of analyzing BigQuery usage by using the newer
BigQueryAuditMetadata messages. Because of the schema conversion done during
the export from Stackdriver Logging into BigQuery, the message bodies are
presented in semi-structured form. The metadataJson field is a
STRING field, and it contains the JSON representation of the message. You
can leverage JSON functions
in standard SQL to analyze this content.
Note: Change the
FROM clause in each of these examples to the
corresponding exported tables in your project.
Example: Report expired tables
Tables deleted because their expiration time was reached are logged in the new log format. This sample query reports when these deletions occurred, along with a URI referencing the table resource that was removed.
```sql
#standardSQL
SELECT
  protopayload_auditlog.resourceName AS resourceName,
  receiveTimestamp AS logTime
FROM
  `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_system_event_201901*`
WHERE
  protopayload_auditlog.methodName = 'InternalTableExpired'
ORDER BY
  resourceName
```
Example: Most popular datasets
This query shows coarse, per-dataset statistics about table reads and table modifications.
```sql
#standardSQL
SELECT
  REGEXP_EXTRACT(protopayload_auditlog.resourceName,
                 '^projects/[^/]+/datasets/([^/]+)/tables') AS datasetRef,
  COUNT(DISTINCT REGEXP_EXTRACT(protopayload_auditlog.resourceName,
                 '^projects/[^/]+/datasets/[^/]+/tables/(.*)$')) AS active_tables,
  COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson,
                 "$.tableDataRead") IS NOT NULL) AS dataReadEvents,
  COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson,
                 "$.tableDataChange") IS NOT NULL) AS dataChangeEvents
FROM
  `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_2019*`
WHERE
  JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL
  OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL
GROUP BY
  datasetRef
ORDER BY
  datasetRef
```
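The REGEXP_EXTRACT calls in this query pull the dataset and table components out of the resourceName URI. The same extraction in Python, over hypothetical resourceName values, shows what the patterns match:

```python
import re

# Hypothetical resourceName values in the new logging format.
resource_names = [
    "projects/my-project/datasets/sales/tables/orders",
    "projects/my-project/datasets/sales/tables/customers",
    "projects/my-project/datasets/hr/tables/employees",
]

# Same patterns as the REGEXP_EXTRACT calls in the query above.
DATASET_RE = re.compile(r"^projects/[^/]+/datasets/([^/]+)/tables")
TABLE_RE = re.compile(r"^projects/[^/]+/datasets/[^/]+/tables/(.*)$")

def dataset_of(resource_name):
    """Extract the dataset component of a table resource URI, or None."""
    match = DATASET_RE.match(resource_name)
    return match.group(1) if match else None

# Count distinct tables per dataset, mirroring COUNT(DISTINCT ...) GROUP BY.
tables_per_dataset = {}
for name in resource_names:
    dataset = dataset_of(name)
    table = TABLE_RE.match(name).group(1)
    tables_per_dataset.setdefault(dataset, set()).add(table)

print({d: len(t) for d, t in sorted(tables_per_dataset.items())})
# {'hr': 1, 'sales': 2}
```

Note that both patterns anchor at the start of the URI, so entries whose resourceName refers to a non-table resource simply fail to match and can be filtered out.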