Overview

Cloud Audit Logs are a collection of logs provided by Google Cloud Platform that give insight into operational concerns related to your use of Google Cloud services. This page provides details about BigQuery-specific log information, and it demonstrates how to use BigQuery to analyze logged activity.

Versions

BigQuery supports two versions of log messages: AuditData (the older version) and BigQueryAuditMetadata (the newer version). The older AuditData log entries tend to map directly to individual API calls made against the BigQuery service.

The newer format, BigQueryAuditMetadata, represents a better view into BigQuery operations. BigQueryAuditMetadata provides details on how resources are changed indirectly by other resources, particularly when reporting on asynchronous events and events that are not strongly coupled to a particular API call. For example, the BigQueryAuditMetadata entries can report when BigQuery tables are removed because of a configured expiration time. This is not possible with the older logging format.

In general, most users should leverage the newer BigQueryAuditMetadata logs, but the system will continue to report the older AuditData logs to support existing users of the audit log functionality.

Message formats

AuditData

The older AuditData is reported as the protoPayload.serviceData submessage within a LogEntry message.

BigQueryAuditMetadata

The newer BigQueryAuditMetadata is reported as the protoPayload.metadata submessage within a LogEntry message. In these logs, the protoPayload.serviceData information is not set or used.

There are additional changes present in the log entry for BigQueryAuditMetadata messages:

  • resource.type is set to one of the following: "bigquery_project" for jobs, or "bigquery_dataset" for storage-related information.

  • For jobs, resource.labels.location is set to the location of the job.

  • For storage events, resource.labels.dataset_id contains the encapsulating dataset.

  • The protoPayload.methodName is set to one of the following values:

      • google.cloud.bigquery.v2.TableService.InsertTable
      • google.cloud.bigquery.v2.TableService.UpdateTable
      • google.cloud.bigquery.v2.TableService.PatchTable
      • google.cloud.bigquery.v2.TableService.DeleteTable
      • google.cloud.bigquery.v2.DatasetService.InsertDataset
      • google.cloud.bigquery.v2.DatasetService.UpdateDataset
      • google.cloud.bigquery.v2.DatasetService.PatchDataset
      • google.cloud.bigquery.v2.DatasetService.DeleteDataset
      • google.cloud.bigquery.v2.TableDataService.List
      • google.cloud.bigquery.v2.JobService.InsertJob
      • google.cloud.bigquery.v2.JobService.Query
      • google.cloud.bigquery.v2.JobService.GetQueryResults
      • InternalTableExpired

  • The protoPayload.resourceName now contains the URI for the referenced resource, whereas the older AuditData format reported the API resource. For example, a table created by using an insert job reports the resource URI of the table in the new logging format; the old format reported the job identifier.

  • The protoPayload.authorizationInfo only includes information relevant to the specific event. In older AuditData messages, multiple records could be merged in some cases such as when source and destination tables were in the same dataset in a query job.

Mapping audit entries to log streams

Audit logs are organized into three streams: Admin Activity, System Event, and Data Access. For more background on the streams, see the Cloud Audit Logging documentation.

Data access (data_access)

The data_access stream provides information on data access. For BigQuery, this stream contains entries about jobs via the JobInsertion and JobChange events. The data_access stream also contains entries about table data modifications via the TableDataChange and TableDataRead events.

For example, when a load job appends data to a table, a TableDataChange event is reported. When a consumer reads a table or the results of a query, a TableDataRead event is reported.

Note: BigQuery does not emit data access log entries in the following scenarios:

  • Data appended to a table via the streaming insert mechanism does not generate TableDataChange log entries.

  • Recursive dataset deletions, such as removing a dataset and its contents in a single API call, do not yield deletion entries for each resource contained in the dataset. The dataset removal itself is present in the activity log.

  • Partitioned tables do not generate TableDataChange entries for partition expirations.

System event (system_event)

You can set an expiration time on tables to remove them at a specified time. When the table expires and is removed, a TableDeletion event is written to the system event stream.

Admin activity (activity)

All remaining activities and events are reported as part of the main activity stream. Events such as job insertions and completions are reported in this stream, as are other resource events such as dataset creation.

Stackdriver Logging exports

BigQuery automatically sends audit logs to Stackdriver Logging. Stackdriver Logging allows users to filter and export messages to other services, including Cloud Pub/Sub, Cloud Storage, and BigQuery.

In addition to providing long-term log retention, log exports to BigQuery enable aggregated analysis of log data. The Stackdriver Logging documentation describes how messages are transformed when they are exported to BigQuery.
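For example, you can create an export sink to BigQuery from the command line with the gcloud CLI. This is a minimal sketch: the sink name, project, and dataset below are placeholders, and the filter matches only the newer BigQueryAuditMetadata messages:

  gcloud logging sinks create my-audit-sink \
    bigquery.googleapis.com/projects/MYPROJECTID/datasets/MYDATASETID \
    --log-filter='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'

After the sink is created, grant its writer identity access to the destination dataset so that log entries can be written.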

Filtering exports

To filter relevant BigQuery Audit messages, you can express filters as part of the export.

For example, this advanced filter represents an export that only includes the newer BigQueryAuditMetadata format:

protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"

You can express additional filters based on the fields within the log messages. See the advanced log filter documentation for more information on crafting advanced filters.
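For example, the following filter (a sketch that combines the message-type filter above with one of the methodName values listed earlier) restricts an export to table deletion events reported in the newer format:

protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata" AND
protoPayload.methodName="google.cloud.bigquery.v2.TableService.DeleteTable"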

Querying exported logs

AuditData examples

This section shows examples of analyzing BigQuery usage by using the older AuditData messages. The AuditData fields are presented within the protopayload_auditlog.servicedata_v1_bigquery record in the schema.

Note: Change the FROM clause, MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD, to match the dataset and table dates you've configured via the Stackdriver Logging export.

Example: Query cost breakdown by identity

This query shows estimated query costs by user identity. It estimates costs based on the list price for on-demand queries in the US. This pricing may not be accurate for other locations or for customers leveraging flat-rate billing.

  #standardSQL
  WITH data as
  (
    SELECT
      protopayload_auditlog.authenticationInfo.principalEmail as principalEmail,
      protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
    FROM
      `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
  )
  SELECT
    principalEmail,
    FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
  FROM
    data
  WHERE
    jobCompletedEvent.eventName = 'query_job_completed'
  GROUP BY principalEmail
  ORDER BY Estimated_USD_Cost DESC
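Because exported tables are sharded by date, you can reduce the amount of data scanned by querying a table wildcard and filtering on _TABLE_SUFFIX. This variation is a sketch, assuming the same export dataset as above; it restricts the per-identity cost breakdown to the last seven days:

  #standardSQL
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    FORMAT('%9.2f',5.0 * (SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
  FROM
    `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    AND protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.eventName = 'query_job_completed'
  GROUP BY principalEmail
  ORDER BY Estimated_USD_Cost DESC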

Example: Hourly cost breakdown

This query shows estimated query costs by hour.

  #standardSQL
  SELECT
    TIMESTAMP_TRUNC(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime, HOUR) AS time_window,
    FORMAT('%9.2f',5.0 * (SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
  FROM
    `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
  WHERE
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.eventName = 'query_job_completed'
  GROUP BY time_window
  ORDER BY time_window DESC

BigQueryAuditMetadata examples

This section shows examples of analyzing BigQuery usage by using the newer BigQueryAuditMetadata messages. Because of the schema conversion done during the export from Stackdriver Logging into BigQuery, the message bodies are presented in semi-structured form. The protopayload_auditlog.metadataJson is a STRING field, and it contains the JSON representation of the message. You can leverage JSON functions in standard SQL to analyze this content.
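For example, JSON_EXTRACT returns a JSON fragment as a STRING, while JSON_EXTRACT_SCALAR returns the scalar value itself. The snippet below is a self-contained sketch that illustrates the difference on a simplified, hypothetical metadataJson payload:

  #standardSQL
  WITH sample AS (
    SELECT '{"tableDataRead": {"reason": "JOB"}}' AS metadataJson
  )
  SELECT
    JSON_EXTRACT(metadataJson, "$.tableDataRead") AS jsonFragment,
    JSON_EXTRACT_SCALAR(metadataJson, "$.tableDataRead.reason") AS scalarValue
  FROM sample

Here jsonFragment contains the nested JSON object, while scalarValue contains the unquoted string JOB.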

Note: Change the FROM clause in each of these examples to the corresponding exported tables in your project.

Example: Report expired tables

Tables deleted because their expiration time was reached are logged in the new log format. This sample query reports when these messages occurred, along with a URI that references the table resource that was removed.

  #standardSQL
  SELECT
    protopayload_auditlog.resourceName AS resourceName,
    receiveTimestamp as logTime
  FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_system_event_201901*`
  WHERE
    protopayload_auditlog.methodName = 'InternalTableExpired'
  ORDER BY resourceName

Example: Most popular datasets

This query shows coarse, per-dataset statistics about table reads and table modifications.

  #standardSQL
  SELECT
    REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/([^/]+)/tables') AS datasetRef,
    COUNT(DISTINCT REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/[^/]+/tables/(.*)$')) AS active_tables,
    COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL) AS dataReadEvents,
    COUNTIF(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL) AS dataChangeEvents
  FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_2019*`
  WHERE
    JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL
    OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL
  GROUP BY datasetRef
  ORDER BY datasetRef
