
Overview of log parsing

This document provides a conceptual overview of how Chronicle parses raw logs into Unified Data Model (UDM) format.

Chronicle can receive log data originating from the following ingestion sources:

  • Chronicle forwarder
  • Chronicle API Feed
  • Chronicle Ingestion API
  • Third-party technology partner

In general, customers send data as original raw logs. Chronicle uniquely identifies the device that generated the logs using the LogType. The LogType identifies both:

  • the vendor and device that generated the log, such as Cisco Firewall, Linux DHCP Server, or Bro DNS.
  • which parser converts the raw log to structured Unified Data Model (UDM). There is a one-to-one relationship between a parser and a LogType. Each parser converts data received by a single LogType.

Chronicle provides a set of default parsers that read original raw logs and generate structured UDM records using data in the original raw log. Chronicle maintains these parsers. Customers can also define custom data mapping instructions by creating a customer-specific parser. Contact your Chronicle representative for information about creating a customer-specific parser.

Ingestion and normalization workflow

The parser contains data mapping instructions. It defines how data is mapped from the original raw log to one or more fields in the UDM data structure.

If there are no parsing errors, Chronicle creates a UDM-structured record using data from the raw log. The process of converting a raw log to a UDM record is called "normalization".

A default parser might map a subset of core values from the raw log. Typically, these core fields are the most important to provide security insights in Chronicle. Unmapped values may remain in the raw log, but are not stored in the UDM record.

Using the Ingestion API, a customer can also send data in the structured Unified Data Model (UDM) format.

A parsing example using a Squid Web Proxy log

This section provides an example Squid web proxy log and describes how the values are mapped to a UDM record. For a description of all fields in the UDM schema, see Unified Data Model field list.

The example Squid web proxy log contains space-separated values. Each record represents one event and stores the following data: timestamp, duration, client, result code/result status, bytes transmitted, request method, URL, user, hierarchy code, and content type. In this example, the following fields are extracted and mapped into a UDM record: time, client, result status, bytes, request method, and URL.

1588059648.129 23 192.168.23.4 TCP_HIT/200 904 GET www.google.com/images/sunlogo.png - NONE/- image/jpeg
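As a minimal sketch in plain Python (not Chronicle parser syntax), the space-separated record above can be split into the ten values listed earlier. The labels in SQUID_FIELDS are illustrative names chosen for this example; they are not Chronicle or Squid identifiers.

```python
# Illustrative labels for the ten space-separated values (assumed names).
SQUID_FIELDS = [
    "timestamp", "duration", "client", "result", "bytes",
    "method", "url", "user", "hierarchy", "content_type",
]


def split_squid_line(line: str) -> dict:
    """Split one Squid access-log line into a dict keyed by SQUID_FIELDS."""
    values = line.split()
    if len(values) != len(SQUID_FIELDS):
        raise ValueError(
            f"expected {len(SQUID_FIELDS)} values, got {len(values)}"
        )
    return dict(zip(SQUID_FIELDS, values))


raw_log = (
    "1588059648.129 23 192.168.23.4 TCP_HIT/200 904 GET "
    "www.google.com/images/sunlogo.png - NONE/- image/jpeg"
)
fields = split_squid_line(raw_log)
```

Here fields["result"] holds the combined "TCP_HIT/200" value, which still needs to be split before the status code can be used on its own.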

Example squid web proxy

As you compare these structures, notice that only a subset of the original log data is included in the UDM record. Certain fields are required and others are optional. In addition, only a subset of the sections in the UDM record contain data. If the parser does not map data from the original log to the UDM record, then you do not see that section of the UDM record in Chronicle.

Log values mapped to UDM

The metadata section stores the event timestamp. Notice that the value was converted from EPOCH to RFC 3339 format. This conversion is optional. The timestamp can be stored as EPOCH format, with preprocessing to separate the seconds and milliseconds portions into separate fields.
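The timestamp conversion can be illustrated with a short Python sketch. This is not how the Chronicle parser performs the conversion; the function name epoch_to_rfc3339 is hypothetical. It splits the seconds and milliseconds portions as text, which avoids floating-point rounding, and formats the result as an RFC 3339 UTC string.

```python
from datetime import datetime, timezone


def epoch_to_rfc3339(epoch: str) -> str:
    """Convert 'seconds.milliseconds' epoch text to an RFC 3339 UTC string."""
    # Keep the fractional part as text so it is not subject to float rounding.
    seconds, _, millis = epoch.partition(".")
    millis = (millis + "000")[:3]  # pad or truncate to exactly three digits
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis}Z"


rfc3339 = epoch_to_rfc3339("1588059648.129")
print(rfc3339)  # 2020-04-28T07:40:48.129Z
```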

The metadata.event_type field stores the value "NETWORK_HTTP", which is an enum value that identifies the type of event. The value of metadata.event_type determines which additional UDM fields are required and which are optional. The product_name and vendor_name values contain user-friendly descriptions of the device that recorded the original log.

The metadata.event_type in a UDM Event record is not the same as the log_type defined when ingesting data using the Ingestion API. These two attributes store different information.

The network section contains values from the original log event. Notice in this example that the status value from the original log was parsed from the 'result code/status' field before being written to the UDM record. Only the numeric code (200) was included in the UDM record, where it is stored in network.http.response_code.
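A minimal Python sketch of that split, assuming the combined field always has the form status/code as in the sample log. The function name split_result is hypothetical.

```python
def split_result(result: str) -> tuple:
    """Split a Squid 'STATUS/code' value into (status, numeric code)."""
    status, _, code = result.partition("/")
    return status, int(code)


status, response_code = split_result("TCP_HIT/200")
```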

Log values mapped to UDM

The principal section stores the client information from the original log. The target section stores both the fully qualified URL and the hostname. The parser extracts the hostname from the URL and then stores both hostname and URL to separate UDM fields.
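The hostname extraction can be sketched in Python with the standard library. This only illustrates the idea and is not the parser's own implementation; hostname_from_url is a hypothetical helper.

```python
from urllib.parse import urlsplit


def hostname_from_url(url: str) -> str:
    """Return the host portion of a URL, with or without a scheme."""
    # urlsplit only recognizes the host after a "//" authority marker,
    # so add one when the logged value has no scheme.
    if "://" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


url = "www.google.com/images/sunlogo.png"
target = {"hostname": hostname_from_url(url), "url": url}
```

The resulting target dictionary carries both values, mirroring the two separate UDM fields described above.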

This is the UDM record formatted as JSON. Notice that only sections which contain data are included. The src, observer, intermediary, about, security_results, and extensions sections are not included.

{
        "metadata": {
            "event_timestamp": "2020-04-28T07:40:48.129Z",
            "event_type": "NETWORK_HTTP",
            "product_name": "Squid Proxy",
            "vendor_name": "Squid"
        },
        "principal": {
            "ip": "192.168.23.4"
        },
        "target": {
            "hostname": "www.google.com",
            "url": "www.google.com/images/sunlogo.png"
        },
        "network": {
            "http": {
                "method": "GET",
                "response_code": 200,
                "received_bytes": 904
            }
        }
}

Customize how ingested data is parsed

Chronicle provides the following capabilities that enable customers to customize data parsing on incoming original log data.

  • Customer-specific parsers: Customers create a custom parser configuration for a specific log type that meets their specific requirements. A customer-specific parser replaces the default parser for the specific LogType. Contact your Chronicle representative for information about creating a customer-specific parser.
  • Parser extensions: Customers can add custom mapping instructions in addition to the default parser configuration. Each customer can create their own unique set of custom mapping instructions. These mapping instructions define how to extract and transform additional fields from original raw logs to UDM fields. A parser extension does not replace the default or customer-specific parser.

Structure of a parser

Data mapping instructions within a parser are organized into four steps:

  • Step 1: Extract the data from the original log.
  • Step 2: Manipulate the extracted data. This includes using conditional logic to selectively parse values, convert data types, replace sub-strings in a value, convert to uppercase or lowercase, etc.
  • Step 3: Assign processed values to UDM fields.
  • Step 4: Output the mapped UDM record to the @output key.

Step 1: Extract fields from the log

Chronicle provides a set of filters, based on Logstash, to extract fields from original log files. Depending on the format of the log, you use one or multiple extraction filters to extract all data from the log. If the string is:

  • native JSON, parser syntax is similar to the JSON filter which supports JSON formatted logs. Nested JSON is not supported.
  • XML format, parser syntax is similar to the XML filter which supports XML formatted logs.
  • key-value pairs, parser syntax is similar to the Kv filter which supports key-value formatted messages.
  • CSV format, parser syntax is similar to the Csv filter which supports csv formatted messages.
  • all other formats, parser syntax is similar to the GROK filter with GROK built-in patterns. This uses Regex-style extraction instructions.

Chronicle provides a subset of the capabilities available in each filter. Chronicle also provides custom data mapping syntax not available in the filters. See the Parser syntax reference document for a description of features that are supported and custom functions.

Continuing with the Squid web proxy log example, the following data extraction instruction combines Logstash GROK syntax and regular expressions to extract data from each log.

grok {
  match => {
    "message" =>
      "%{NUMBER:when}\\s+\\d+\\s%{SYSLOGHOST:srcip} %{WORD:action}\\/%{NUMBER:returnCode} %{NUMBER:size} %{WORD:method} (?P<url>\\S+) (?P<username>.*?) %{WORD}\\/(?P<tgtip>\\S+).*"
  }
}

This data extraction statement stores values in the following intermediate variables:

  • when
  • srcip
  • action
  • returnCode
  • size
  • method
  • username
  • url
  • tgtip
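To see what the extraction produces for the sample log, here is a rough Python equivalent of the GROK statement above. The sub-patterns standing in for NUMBER, SYSLOGHOST, and WORD are simplified assumptions; the real GROK patterns are more thorough.

```python
import re

# Simplified stand-ins for the GROK patterns (assumptions for this sketch):
#   NUMBER -> digits with an optional fraction, SYSLOGHOST -> \S+, WORD -> \w+
SQUID_PATTERN = re.compile(
    r"(?P<when>\d+(?:\.\d+)?)\s+\d+\s"
    r"(?P<srcip>\S+) "
    r"(?P<action>\w+)/(?P<returnCode>\d+) "
    r"(?P<size>\d+) "
    r"(?P<method>\w+) "
    r"(?P<url>\S+) "
    r"(?P<username>.*?) "
    r"\w+/(?P<tgtip>\S+).*"
)

raw_log = (
    "1588059648.129 23 192.168.23.4 TCP_HIT/200 904 GET "
    "www.google.com/images/sunlogo.png - NONE/- image/jpeg"
)
match = SQUID_PATTERN.match(raw_log)
# Every captured value is a string at this point; type conversion is a
# separate step.
extracted = match.groupdict() if match else {}
```

For the sample log, username and tgtip both capture the placeholder "-", because the record has no user and no hierarchy peer address.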

Step 2: Manipulate and transform the extracted values

Chronicle uses the capabilities of the Logstash mutate filter plug-in to manipulate values extracted from the original log. Chronicle provides a subset of the capabilities available in the plug-in. See the Parser syntax document for a description of features that are supported and custom functions, such as:

  • cast values to a different data type
  • replace values in the string
  • merge two arrays or append a string to an array. String values are converted to an array before merging.
  • convert to either lowercase or uppercase

Here are a few data transformation examples that build on the Squid web proxy log presented earlier.

The following statement converts the value in the action variable to lowercase.

mutate {
  lowercase => [ "action" ]
}

The following statement casts the value in the size variable to uinteger.

mutate {
  convert => {
    "size" => "uinteger"
  }
}
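The effect of these two transformations on the intermediate variables can be sketched in Python. The helper names are hypothetical, and rejecting negative numbers is an assumption about what an unsigned-integer cast should do in this sketch, not a statement about Chronicle's behavior.

```python
def lowercase(state: dict, keys: list) -> None:
    """Lowercase the string values stored under each of the given keys."""
    for key in keys:
        if isinstance(state.get(key), str):
            state[key] = state[key].lower()


def convert_uinteger(state: dict, key: str) -> None:
    """Replace a numeric string with a non-negative int (sketch assumption)."""
    value = int(state[key])
    if value < 0:
        raise ValueError(f"{key} must be non-negative, got {value}")
    state[key] = value


# Values as extracted from the sample log: everything starts as a string.
state = {"action": "TCP_HIT", "size": "904"}
lowercase(state, ["action"])
convert_uinteger(state, "size")
```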

Step 3: Assign values to UDM fields in an event

After values are pre-processed, assign them to fields in a UDM record. You can assign both extracted values and hardcoded values to a UDM field. The following examples build on the Squid web proxy log example above.

The following statement sets the metadata.event_type field of the event to the string value "NETWORK_HTTP". This is one of the valid values defined in the metadata.event_type enum.

mutate {
  replace => {
    "event.idm.read_only_udm.metadata.event_type" => "NETWORK_HTTP"
  }
}

The following statement sets the network.http.method field of the UDM record to the value of the method variable.

mutate {
  replace => {
    "event.idm.read_only_udm.network.http.method" => "%{method}"
  }
}

The following statement adds the value in the srcip variable to the principal.ip field of the UDM record. The principal.ip field can store an array of values, so the srcip value is appended to the list.

mutate {
  merge => {
    "event.idm.read_only_udm.principal.ip" => "srcip"
  }
}

The following statement stores the value in the url variable to the target.url field in the UDM record. It does this by renaming "url" to "event.idm.read_only_udm.target.url".

mutate {
  rename => {
    "url" => "event.idm.read_only_udm.target.url"
  }
}
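The three assignment styles differ in what happens to the source and target values. The following Python sketch models them on a nested dictionary; the helpers replace and merge and the variable names are hypothetical, and the sketch only approximates the behavior described above.

```python
def _parent(record: dict, dotted_path: str):
    """Walk to the parent dict of a dotted path, creating levels as needed."""
    *parents, leaf = dotted_path.split(".")
    node = record
    for name in parents:
        node = node.setdefault(name, {})
    return node, leaf


def replace(record: dict, dotted_path: str, value) -> None:
    """Set a single-valued field, overwriting any earlier value."""
    node, leaf = _parent(record, dotted_path)
    node[leaf] = value


def merge(record: dict, dotted_path: str, value) -> None:
    """Append a value to a repeated (array) field."""
    node, leaf = _parent(record, dotted_path)
    node.setdefault(leaf, []).append(value)


UDM = "event.idm.read_only_udm."
state = {
    "method": "GET",
    "srcip": "192.168.23.4",
    "url": "www.google.com/images/sunlogo.png",
}
record = {}

replace(record, UDM + "metadata.event_type", "NETWORK_HTTP")  # hardcoded value
replace(record, UDM + "network.http.method", state["method"])  # variable value
merge(record, UDM + "principal.ip", state["srcip"])  # appended to a list
# rename: the value moves, so the intermediate variable no longer exists
replace(record, UDM + "target.url", state.pop("url"))

udm = record["event"]["idm"]["read_only_udm"]
```

After these calls, udm["principal"]["ip"] is a one-element list, while the other three fields hold single values.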

Step 4: Bind the UDM record to the output

This final statement in the data mapping instruction outputs the processed data to a UDM record.

  mutate {
    merge => {
      "@output" => "event"
    }
  }