BigQuery schema for logs

This page details the formatting and rules that apply when routing log entries from Cloud Logging to BigQuery.

Overview

You can route log entries from Cloud Logging to BigQuery using sinks. When you create a sink, you define a BigQuery dataset as the destination. Logging sends log entries that match the sink's rules to partitioned tables that are created for you in that BigQuery dataset.

BigQuery table schemas for data received from Cloud Logging are based on the structure of the LogEntry type and the contents of the log entry payloads. Cloud Logging also applies rules to shorten BigQuery schema field names for audit logs and for certain structured payload fields.

Logging sinks stream logging data into BigQuery in small batches, which lets you query data without running a load job. For details, see Streaming data into BigQuery. For pricing information, see the streaming inserts section found in BigQuery pricing: Data ingestion pricing.

Field naming conventions

The following naming conventions apply to log entry fields when routing logs to BigQuery:

  • Log entry field names can't exceed 128 characters.

  • Log entry field names can consist only of alphanumeric characters. Unsupported characters are replaced with underscore characters. For example, jsonPayload.foo%% would be transformed to jsonPayload.foo__.

    Log entry field names must begin with an alphanumeric character, even after transformation; any leading underscores are removed.

  • For log entry fields that are part of the LogEntry type, the corresponding BigQuery field names are exactly the same as the log entry fields.

  • For any user-supplied log entry fields, the corresponding BigQuery field names are normalized to lowercase, but naming is otherwise preserved.

  • For fields in structured payloads, when the @type specifier isn't present, the corresponding BigQuery field names are normalized to lowercase, but naming is otherwise preserved.

    For information on structured payloads where the @type specifier is present, see Payload fields with @type on this page.

The following examples show how these naming conventions are applied:

Log entry field                 | LogEntry type mapping            | BigQuery field name
insertId                        | insertId                         | insertId
textPayload                     | textPayload                      | textPayload
httpRequest.status              | httpRequest.status               | httpRequest.status
httpRequest.requestMethod.GET   | httpRequest.requestMethod.[ABC]  | httpRequest.requestMethod.get
resource.labels.moduleid        | resource.labels.[ABC]            | resource.labels.moduleid
jsonPayload.MESSAGE             | jsonPayload.[ABC]                | jsonPayload.message
jsonPayload.myField.mySubfield  | jsonPayload.[ABC].[XYZ]          | jsonPayload.myfield.mysubfield
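
For illustration, here is a minimal Python sketch of these naming conventions. It is an approximation, not Logging's actual implementation; the helper name, the user_supplied flag, and the truncation behavior for over-long names are assumptions.

  import re

  def to_bigquery_field_name(name, user_supplied=False):
      """Approximate the field naming conventions listed above (illustrative only)."""
      # Unsupported characters are replaced with underscore characters.
      name = re.sub(r"[^A-Za-z0-9]", "_", name)
      # Names must begin with an alphanumeric character, so drop leading underscores.
      name = name.lstrip("_")
      # Names can't exceed 128 characters (truncation here is an assumption).
      name = name[:128]
      # User-supplied field names are normalized to lowercase.
      return name.lower() if user_supplied else name

  print(to_bigquery_field_name("foo%%", user_supplied=True))   # foo__
  print(to_bigquery_field_name("MESSAGE", user_supplied=True)) # message
  print(to_bigquery_field_name("insertId"))                    # insertId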

Payload fields with @type

This section discusses special BigQuery schema field names for log entries whose payloads contain the specifier @type. This includes audit log entries routed to BigQuery.

Payloads in log entries can contain structured data. Any structured field can include an optional type specifier in the following format:

@type: type.googleapis.com/[TYPE]

These naming rules explain, for example, why an audit log entry's protoPayload field is mapped to the BigQuery schema field protopayload_auditlog.

Naming rules for @type

Structured fields that have type specifiers are customarily given BigQuery field names with [TYPE] appended to their field name. The value of [TYPE] can be any string.

The naming rules for @type apply only to the top level of jsonPayload or protoPayload; nested fields are ignored. When processing top-level structured payload fields, Logging removes the prefix type.googleapis.com.

For example, the following table shows the mapping of the top-level structured payload fields to BigQuery field names:

Payload       | Payload @type                | Payload field | BigQuery field name
jsonPayload   | (none)                       | statusCode    | jsonPayload.statusCode
jsonPayload   | type.googleapis.com/abc.Xyz  | statusCode    | jsonpayload_abc_xyz.statuscode
protoPayload  | (none)                       | statusCode    | protoPayload.statuscode
protoPayload  | type.googleapis.com/abc.Xyz  | statusCode    | protopayload_abc_xyz.statuscode
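
The transformation of a top-level payload field with a @type specifier can be sketched as follows. This is illustrative only; the helper name is an assumption.

  def payload_field_name(payload_name, type_specifier=None):
      """Approximate the BigQuery field name for a top-level payload field."""
      if not type_specifier:
          return payload_name
      # Remove the type.googleapis.com/ prefix, append the remaining [TYPE] to the
      # payload name, then lowercase and replace dots with underscores.
      type_name = type_specifier.replace("type.googleapis.com/", "")
      return "{}_{}".format(payload_name, type_name.replace(".", "_")).lower()

  print(payload_field_name("jsonPayload", "type.googleapis.com/abc.Xyz"))
  # jsonpayload_abc_xyz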

A few exceptions apply to the preceding rules for fields with type specifiers:

  • In App Engine request logs, the payload's name in logs routed to BigQuery is protoPayload, even though the payload includes a type specifier.

  • Cloud Logging applies some special rules to shorten BigQuery schema field names for audit logs. This is discussed in the Fields in exported audit logs section on this page.

Example

This example shows how structured payload fields are named and used when received by BigQuery.

Assume that a log entry's payload is structured like the following:

jsonPayload: {
  @type: "type.googleapis.com/google.cloud.v1.CustomType"
  name_a: {
    sub_a: "A value"
  }
  name_b: {
    sub_b: 22
  }
}

The mapping to BigQuery fields is as follows:

  • The top-level structured field jsonPayload contains a @type specifier. Its BigQuery name is jsonpayload_v1_customtype.

  • The nested fields are treated with the standard BigQuery naming rules, as type-specifier rules don't apply to nested fields.

Thus, the following BigQuery names are defined for the log entry's payload:

  jsonpayload_v1_customtype
  jsonpayload_v1_customtype._type
  jsonpayload_v1_customtype.name_b
  jsonpayload_v1_customtype.name_b.sub_b
  jsonpayload_v1_customtype.name_a
  jsonpayload_v1_customtype.name_a.sub_a
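
A query against a table that received this entry could then reference the renamed fields directly. The following sketch uses the BigQuery Python client; the project, dataset, and table names are placeholders.

  from google.cloud import bigquery

  client = bigquery.Client()
  query = """
      SELECT
        jsonpayload_v1_customtype.name_a.sub_a,
        jsonpayload_v1_customtype.name_b.sub_b
      FROM `my-project.my_dataset.my_log_20250101`  -- placeholder table
      LIMIT 10
  """
  for row in client.query(query).result():
      print(row.sub_a, row.sub_b)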

Fields in exported audit logs

If you aren't working with audit logs that have been routed to BigQuery, then you can skip this section.

The audit log payload fields protoPayload.request, protoPayload.response, and protoPayload.metadata have @type specifiers but are treated as JSON data. That is, their BigQuery schema names are their field names with Json appended to them, and they contain string data in JSON format.

The two sets of audit log payload field names are listed in the following table:

Log entry field           | BigQuery field name
protoPayload              | protopayload_auditlog
protoPayload.metadata     | protopayload_auditlog.metadataJson
protoPayload.serviceData  | protopayload_auditlog.servicedata_v1_bigquery
                          |   Example: protopayload_auditlog.servicedata_v1_bigquery.tableInsertRequest
protoPayload.request      | protopayload_auditlog.requestJson
protoPayload.response     | protopayload_auditlog.responseJson

Note that the serviceData naming convention is specific to audit logs that are generated by BigQuery and that are then routed from Cloud Logging to BigQuery. Those audit log entries contain a serviceData field that has the @type specifier type.googleapis.com/google.cloud.bigquery.logging.v1.AuditData.

Example

An audit log entry generated by BigQuery has a field with the following name:

protoPayload.serviceData.tableInsertRequest

If this log entry were then routed to BigQuery, how would the tableInsertRequest field be referenced? Before the name shortening, the corresponding field name in BigQuery would be:

protopayload_google_cloud_audit_auditlog.servicedata_google_cloud_bigquery_logging_v1_auditdata.tableInsertRequest

After the name shortening, the same field is referenced in BigQuery tables like this:

protopayload_auditlog.servicedata_v1_bigquery.tableInsertRequest
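
For example, a query like the following sketch could reference the shortened field and parse the JSON string fields. The project, dataset, and table names are placeholders, and the audit log table name shown is only an assumption about how your sink named it.

  import json
  from google.cloud import bigquery

  client = bigquery.Client()
  query = """
      SELECT
        protopayload_auditlog.metadataJson,
        protopayload_auditlog.servicedata_v1_bigquery.tableInsertRequest
      FROM `my-project.my_dataset.cloudaudit_googleapis_com_data_access_20250101`
      LIMIT 10
  """
  for row in client.query(query).result():
      # metadataJson contains string data in JSON format, so parse it before use.
      metadata = json.loads(row.metadataJson) if row.metadataJson else None
      print(metadata, row.tableInsertRequest)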

Partitioned tables

This section provides an overview of partitioned tables for logs that are routed to BigQuery.

When you route logs to a BigQuery dataset, Logging creates tables to hold the log entries. There are two table types by which Logging organizes the data it routes: date-sharded tables and partitioned tables. Both table types partition the log data based on the log entries' timestamp fields. However, there are two key differences between the table types:

  • Performance: A partitioned table divides a large table into smaller partitions, which improves query performance and helps you control BigQuery costs by reducing the number of bytes read by a query.

  • Table nomenclature: The table types use different naming conventions, as discussed in the section below.

Table organization

Log entries are sharded into BigQuery tables whose organization and names are based on the entries' log names and timestamps.

Date-sharded table names are suffixed with the calendar date of the log entry's UTC timestamp, in the ISO 8601 basic format (YYYYMMDD); partitioned table names have no date suffix.

The following table shows examples of how log names and sample timestamps are mapped to table names in BigQuery:

Log name                             | Log entry timestamp       | BigQuery table name (date-sharded)            | BigQuery table name (partitioned)
syslog                               | 2017-05-23T18:19:22.135Z  | syslog_20170523                               | syslog
apache-access                        | 2017-01-01T00:00:00.000Z  | apache_access_20170101                        | apache_access
compute.googleapis.com/activity_log  | 2017-12-31T23:59:59.999Z  | compute_googleapis_com_activity_log_20171231  | compute_googleapis_com_activity_log
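
The naming shown in the table can be sketched in Python as follows. This is an approximation; the helper name is an assumption.

  import re
  from datetime import datetime, timezone

  def bigquery_table_name(log_name, timestamp_utc, partitioned=False):
      """Approximate the BigQuery table name for a routed log entry."""
      # Unsupported characters in the log name become underscores.
      base = re.sub(r"[^A-Za-z0-9]", "_", log_name)
      if partitioned:
          return base
      # Date-sharded table names get the UTC calendar date as a YYYYMMDD suffix.
      return "{}_{:%Y%m%d}".format(base, timestamp_utc)

  ts = datetime(2017, 5, 23, 18, 19, 22, 135000, tzinfo=timezone.utc)
  print(bigquery_table_name("syslog", ts))                           # syslog_20170523
  print(bigquery_table_name("apache-access", ts, partitioned=True))  # apache_access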

Creating partitioned tables

When creating a sink to route your logs to BigQuery, you can use either date-sharded tables or partitioned tables. The default selection is a date-sharded table.

For instructions using the Google Cloud Console, see Configure sinks.

For instructions using the Cloud SDK command-line interface, see gcloud logging sinks create.
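
If you prefer to create the sink programmatically, the following is a hedged sketch using the Cloud Logging Python client library. The import paths, the LogSink and BigQueryOptions fields, and the names and filter shown are assumptions; verify them against your installed client version.

  from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
  from google.cloud.logging_v2.types import BigQueryOptions, LogSink

  client = ConfigServiceV2Client()
  sink = LogSink(
      name="my-bq-sink",  # placeholder sink name
      destination="bigquery.googleapis.com/projects/my-project/datasets/my_dataset",
      filter="severity>=ERROR",  # placeholder filter
      # Request partitioned tables instead of the default date-sharded tables.
      bigquery_options=BigQueryOptions(use_partitioned_tables=True),
  )
  client.create_sink(parent="projects/my-project", sink=sink)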

Mismatches in schema

The first log entry received by BigQuery determines the schema for the destination BigQuery table. BigQuery creates a table whose columns are based on the first log entry's fields and their types.

A schema mismatch occurs when log entries are written to the destination table and either of the following errors occurs:

  • A later log entry changes the field type for an existing field in the table.

    For example, if the initial log entry's jsonPayload.user_id field is a string, then that log entry generates a table with a string type for that field. If you later start logging jsonPayload.user_id as an array, then that causes a schema mismatch.

  • New fields are added to a log entry and that causes the number of columns in the destination table to exceed the BigQuery column limit. For more information about the column limit, see BigQuery quotas and limits.

When BigQuery identifies a schema mismatch, it creates a table within the corresponding dataset to store the error information. The table type determines the error table's name: for date-sharded tables, the format is export_errors_YYYYMMDD; for partitioned tables, it is export_errors. For more information, see Table organization.

When exporting log entries, Logging sends messages as a batch to BigQuery. BigQuery uses the following rules to determine to which table the log entries in the current batch of messages are written:

  • When a field type change occurs, only those log entries that caused a schema mismatch are written to the error table. Log entries in the current batch of messages that don't cause a schema mismatch are written to the original destination table.

  • When the column limit is exceeded, all log entries in the current batch of messages are written to the error table.
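
A simplified sketch of this batching behavior follows. It is illustrative only, not Logging's implementation; log entries are represented here as mappings from field name to field type, and the helper and constant names are assumptions.

  MAX_COLUMNS = 10000  # see BigQuery quotas and limits for the current column limit

  def route_batch(batch, table_schema):
      """Split a batch into entries for the destination table and the error table."""
      # If new fields would push the table past the column limit, the whole
      # batch is written to the error table.
      all_fields = set(table_schema) | {f for entry in batch for f in entry}
      if len(all_fields) > MAX_COLUMNS:
          return [], batch
      to_destination, to_error = [], []
      for entry in batch:
          # An entry whose field type conflicts with the existing schema is a mismatch.
          mismatch = any(
              field in table_schema and table_schema[field] != field_type
              for field, field_type in entry.items()
          )
          (to_error if mismatch else to_destination).append(entry)
      return to_destination, to_error

  # The jsonPayload.user_id example: the table was created with a string field,
  # so a later entry that logs an array for the same field goes to the error table.
  schema = {"jsonPayload.user_id": "STRING"}
  batch = [{"jsonPayload.user_id": "STRING"}, {"jsonPayload.user_id": "ARRAY"}]
  kept, errored = route_batch(batch, schema)
  print(len(kept), len(errored))  # 1 1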

The error table contains data from the LogEntry and information about the mismatch:

  • LogEntry fields written to the error table:

    • logName
    • timestamp
    • receiveTimestamp
    • severity
    • insertId
    • trace
    • resource.type
  • Schema mismatch information written to the error table:

    • Full resource path for the log sink
    • The complete error message returned by BigQuery
    • The complete log entry, converted from JSON into a string
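
You can query the error table like any other table in the dataset. The following sketch assumes the LogEntry field names listed above are the column names; the project, dataset, and table names are placeholders.

  from google.cloud import bigquery

  client = bigquery.Client()
  # Date-sharded sinks write errors to export_errors_YYYYMMDD;
  # sinks that use partitioned tables write to export_errors.
  query = """
      SELECT logName, timestamp, insertId
      FROM `my-project.my_dataset.export_errors_20250101`
      ORDER BY timestamp DESC
      LIMIT 10
  """
  for row in client.query(query).result():
      print(row.logName, row.timestamp, row.insertId)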

Logging communicates schema mismatches to the Cloud project that contains the routing sink in the following ways:

  • Project Owners receive an email. Details include: Google Cloud project ID, sink name, and destination.
  • The Google Cloud Console Activity page displays an error, Stackdriver Config error. Details include the sink name and destination, and a link to an example of a log entry that caused the error.
  • The system logs-based metric exports/error_count informs you of the total number of log entries that weren't routed due to errors.

To correct field-type mismatches for later log entries, fix the field type so that it matches the current schema. You can also rename the table or change the sink's parameters, so that Logging recreates the table in a different dataset. For instructions, see Managing sinks.

Viewing your logs

You can view your logs and their BigQuery schema by selecting, in the BigQuery web UI, a table that has received log entries from Cloud Logging.
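
You can also inspect a table's schema programmatically, as in this sketch (the table path is a placeholder).

  from google.cloud import bigquery

  client = bigquery.Client()
  table = client.get_table("my-project.my_dataset.syslog_20170523")  # placeholder
  for field in table.schema:
      print(field.name, field.field_type, field.mode)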