Parser syntax reference
This document describes the functions, parsing patterns, and other syntax supported in data mapping instructions. See Overview of parsing for a conceptual explanation of how Chronicle parses raw logs into Unified Data Model (UDM) format.
Default parsers, customer-specific parsers, and 'code snippet' parser extensions use code-like data mapping instructions to convert fields in the original raw log to Unified Data Model (UDM) format. The parser syntax is similar to Logstash, but not identical.
Extract data using the GROK function
With GROK, you can use predefined patterns and regular expressions to match log messages and extract values from them into tokens. GROK data extraction requires that field labels (tokens) are defined as part of the extraction pattern.
Syntax for predefined patterns in GROK
%{pattern:token}
%{IP:hostip}
%{NUMBER:event_id}
Syntax for regular expressions in GROK
The following regex pattern examples can be used to extract values from log messages.
Regex | Data |
---|---|
\s | Space |
\S | Not space |
\d | Digit |
\D | Not digit |
\w | Word |
\W | Not word |
(?P<token>regex_pattern)
(?P<eventId>\\S+)
Sample log message and GROK patterns
Example original raw log
Mar 15 11:08:06 hostdevice1: FW-112233: Accepted connection TCP 10.100.123.45/9988 to 8.8.8.8/53
GROK pattern to extract data from the log
%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}
The following table describes the tokens and values extracted by the GROK pattern.
Token | Value |
---|---|
when | Mar 15 11:08:06 |
deviceName | hostdevice1 |
messageid | 112233 |
action | Accepted |
protocol | TCP |
srcAddr | 10.100.123.45 |
srcPort | 9988 |
dstAddr | 8.8.8.8 |
dstPort | 53 |
GROK extraction syntax
grok {
match => {
"message" => "<grok pattern>"
}
}
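For example, the sample pattern shown earlier can be placed directly into the filter. This is a minimal sketch; the grok_failed token name is an assumption, using the on_error property described later in this document:
grok {
  match => {
    "message" => "%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}"
  }
  on_error => "grok_failed"
}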
GROK overwrite option
The overwrite option used with the GROK syntax allows you to overwrite a field that already exists. This feature can be used to replace a default value with the value extracted by the GROK pattern.
mutate {
replace => {
"fieldName" => "<default value>"
}
}
grok {
match => { "message" => ["(?P<fieldName>.*)"] }
overwrite => ["fieldName"]
}
Extract JSON formatted logs
JSON extraction syntax
json {
source => "message"
}
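As a minimal sketch, assuming a raw log such as {"srcUser": "alice", "action": "login"} (the log shape, the json_failed token name, and the UDM field choice are assumptions), the extracted tokens can then be assigned to UDM fields:
json {
  source => "message"
  on_error => "json_failed"
}
mutate {
  replace => {
    "event.idm.read_only_udm.principal.user.userid" => "%{srcUser}"
  }
}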
Manipulating JSON arrays
JSON arrays can be accessed by adding an array_function parameter.
json {
source => "message"
array_function => "split_columns"
}
The split_columns function makes elements of an array accessible through an index. So, if you have an array that looks like the following:
{ "ips" : ["1.2.3.4","1.2.3.5"] .. }
you can access the two values using the ips.0 and ips.1 tokens.
If you have nested arrays or multiple arrays, the function will unnest arrays recursively. Here is a nested array example.
{
"devices: [
{
"ips": ["1.2.3.4"]
}
]
}
In this case, you access the IP address using the devices.0.ips.0 token. Because devices.1 doesn't exist, it behaves the same as any other nonexistent JSON element.
If an element in a JSON log doesn't exist, then:
- You cannot access it in an if statement unless you initialize the token to an empty string, "", before calling the JSON filter.
- You cannot use it in the mutate plugin's replace filter, because doing so causes an error.
- You can use it in the mutate plugin's rename filter, because nonexistent elements are ignored.
- You can use it in the mutate plugin's merge filter, because nonexistent elements are ignored.
mutate {
merge => {
"event.idm.read_only_udm.observer.ip" => "ips.0" }
}
mutate {
merge => {
"event.idm.read_only_udm.observer.ip" => "ips.1" }
}
# this doesn't fail even though this element doesn't exist.
mutate {
merge => {
"event.idm.read_only_udm.observer.ip" => "ips.2" }
}
# this doesn't fail even though this element doesn't exist.
mutate {
merge => {
"event.idm.read_only_udm.observer.ip" => "ips.3" }
}
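The first item in the list above (initializing a token so it can be tested in an if statement) might look like the following sketch, which assumes the ips array from the earlier example and that the dotted token name can be referenced in a conditional like any other token:
# initialize the token to an empty string before the json filter runs
mutate {
  replace => {
    "ips.0" => ""
  }
}
json {
  source => "message"
  array_function => "split_columns"
}
# safe even if the array is empty, because the token was initialized
if [ips.0] != "" {
  mutate {
    merge => {
      "event.idm.read_only_udm.observer.ip" => "ips.0"
    }
  }
}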
Extract XML formatted logs
XML extraction syntax
Define the path to the field in the original log using XPath expression syntax.
xml {
source => "message"
xpath => {
"/Event/System/EventID" => "eventId"
"/Event/System/Computer" => "hostname"
}
}
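A minimal sketch of mapping the extracted tokens to UDM fields; the xml_failed token name and the UDM field choice are assumptions:
xml {
  source => "message"
  xpath => {
    "/Event/System/EventID" => "eventId"
    "/Event/System/Computer" => "hostname"
  }
  on_error => "xml_failed"
}
mutate {
  replace => {
    "event.idm.read_only_udm.principal.hostname" => "%{hostname}"
  }
}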
Extract key-value formatted logs
Key-value extraction syntax
kv {
source => "message"
field_split => "|"
value_split => ":"
whitespace => "strict"
trim_value => "\""
}
Key-value extraction example
# initialize the token
mutate {
replace => {
"destination" => ""
}
}
# use the kv filter to split the log.
kv {
source => "message"
field_split => " "
trim_value => "\""
on_error => "kvfail"
}
# assign one of the field values to a UDM field
mutate {
replace => {
"event.idm.read_only_udm.target.hostname" => "%{destination}"
}
}
The kv filter includes the following options:
- field_split option: Use the field_split option to specify the delimiter that separates each key-value pair, for example when extracting parameters from a URL query string or key-value pairs from a log.
- value_split option: Use the value_split option to identify the delimiter between the key and the value (both split options are shown in the sketch after this list).
- whitespace option: Use the whitespace option to control how whitespace around key-value pairs is handled. The default is lenient, which ignores surrounding whitespace. If whitespace should not be ignored, set the option to "strict".
- trim_value option: Use the trim_value option to remove extraneous leading and trailing characters from the value, such as quotation marks.
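To illustrate the field_split and value_split options together, here is a sketch that assumes a raw log of the form src:10.1.1.1|dst:10.2.2.2|proto:tcp (the log shape, the kv_failed token name, and the UDM field choice are assumptions):
kv {
  source => "message"
  field_split => "|"
  value_split => ":"
  on_error => "kv_failed"
}
mutate {
  merge => {
    "event.idm.read_only_udm.principal.ip" => "src"
  }
}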
Extract CSV formatted logs
# parse the message into individual variables, identified as column1, column2, column3, etc.
csv {
source => "message"
}
# assign each value to a token
mutate {
replace => {
"resource_id" => "%{column1}"
"principal_company_name" => "%{column3}"
"location" => "%{column4}"
"transaction_amount" => "%{column6}"
"status" => "%{column9}"
"meta_description" => "%{column11}"
"target_userid" => "%{column24}"
"target_company_name" => "%{column13}"
"principal_userid" => "%{column15}"
"date" => "%{column16}"
"time" => "%{column17}"
}
}
Transform data using the mutate plugin
Use the mutate filter plugin to transform and consolidate data into a single block or to break the data into separate mutate blocks. When using a single block for the mutate functions, be aware that the mutations are executed in the order described in the Logstash mutate plugin documentation.
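For example, splitting transformations into separate mutate blocks guarantees that they run in the order written, independent of the plugin's internal ordering. A minimal sketch using the replace and uppercase functions described below:
# runs first: copy the value into a new token
mutate {
  replace => {
    "proto" => "%{protocol}"
  }
}
# runs second: uppercase the copied value
mutate {
  uppercase => [ "proto" ]
}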
Convert functions
Use the convert function to transform values to different data types. This conversion is needed to assign values into fields within the respective data type schemas. Legacy proto definitions (EDR, Webproxy, etc.) require data type conversions to match the target data type. For example, IP address fields need to be converted before assigning the value to the target schema field. UDM allows for handling of most fields as strings, including IP address fields. Supported data types include the following:
- boolean
- float
- hash
- integer
- ipaddress
- macaddress
- string
- uinteger
Convert example
mutate {
convert => {
"jsonPayload.packets_sent" => "uinteger"
}
}
Gsub function
Match a regular expression against a field value and replace all matches with a replacement string. This applies only to string fields.
Gsub syntax
This configuration takes an array consisting of 3 elements per field/substitution. In other words, for every substitution you want to make you add three elements to the gsub configuration array, specifically the name of the field, the regular expression to replace, and the substitution string.
The gsub function supports RE2 syntax. You can use simple strings most of the time, as long as they don't contain characters that have a special meaning, such as brackets ( [ or ] ). If you need special characters to be interpreted literally, they must be "escaped" by preceding each character with a backslash ( \ ). One important exception to this rule is the backslash itself: both the literal backslash and the backslash that escapes it must themselves be escaped, so to refer to a literal backslash you need four backslashes ( \\\\ ).
mutate {
gsub => [
# replace all occurrences of the three letters "cat" with the three letters "dog"
"fieldname1", "cat", "dog",
# replace all forward slashes with underscore
"fieldname2", "/", "_",
# replace backslashes, question marks, hashes, and minuses with a dot "."
"fieldname3", "[\\\\?#-]", "."
]
}
Lowercase functions
The lowercase function is used to transform a value into a lowercase value.
Lowercase syntax
mutate {
lowercase => [ "token" ]
}
Lowercase example
mutate {
lowercase => [ "protocol" ]
}
Merge function
The merge function is used to join multiple fields. When parsing repeated fields, such as IP address fields, use the merge function to assign the values to the token. The merge function is also used to generate the normalized output message that is ingested into Chronicle, and it can be used to generate multiple events from the same log line.
Merge syntax
mutate {
merge => {
"destinationToken" => "addedToken"
}
}
Merge function example using a repeated field
mutate {
  merge => {
    "event.idm.read_only_udm.target.ip" => "dstAddr"
  }
}
Merge function example - output to a UDM record
mutate {
merge => {
"@output" => "event"
}
}
Rename function
The rename function renames a token, assigning its value to a new token. Use this function when the tokenized value can be assigned directly to the schema-defined token. The original token and the new token must be of the same data type before the rename transformation is performed. The rename function destroys the original token and replaces it with the new token.
Rename function syntax
mutate {
rename => {
"originalToken" => "newToken"
}
}
Rename function example
mutate {
rename => {
"proto" => "event.idm.read_only_udm.network.ip_protocol"
"srcport" => "event.idm.read_only_udm.network.target.port"
}
}
Replace function
The replace function assigns a value to a token. The assignment can be based on constants, existing field values, or a combination of the two. The replace function can also be used to declare (initialize) a token. This function can only be used with string values.
Replace function syntax - Assign a constant
mutate {
replace => {
"token" => "newConstantValue"
}
}
Replace function syntax - assign a variable value
mutate {
replace => {
"token" => "%{otherTokenValue}"
}
}
Replace function example - assign a constant
mutate {
replace => {
"event.idm.read_only_udm.security_result.action" => "ALLOWED"
}
}
Replace function example - assign a variable value
mutate {
replace => {
"shost" => "%{dhost}"
}
}
Uppercase function
The uppercase function is used to transform a value into an uppercase value.
Uppercase function syntax
mutate {
uppercase => [ "token" ]
}
Uppercase function example
mutate {
uppercase => [ "protocol" ]
}
Transform data using other functions
Date function
The date function is required to handle the date and timestamp from the log extraction. UDM fields that store a Timestamp require a properly normalized date value. The date function supports a variety of date formats, including ISO8601, UNIX, and others, along with custom-defined date and time formats.
Chronicle supports the following pre-defined date formats:
- ISO8601
- RFC 3339
- UNIX
- UNIX_MS
Chronicle also provides a set of system-supplied timestamps that can be used in mapping instructions.
- @createTimestamp - Always included and represents the time Chronicle received the logs.
- @timestamp - Optional. The timestamp provided by Splunk or PCAP collection, if it exists.
- @collectionTimestamp - Optional value at the log entry level. This timestamp represents the time the Forwarder collected the log entry. However, this value might not be present for logs that are ingested using the out-of-band processor.
At the end of processing, Chronicle uses the timestamp in the @timestamp field as the timestamp value for all events. By default, the value produced by the date filter takes precedence when populating the @timestamp field. However, if you need to use the log receipt time as the event timestamp, you can use the rename function to rename @createTimestamp to @timestamp. Best practice is to extract the date value from the log message itself. If the log doesn't include a timestamp value, you might need to use @createTimestamp instead.
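A sketch of that fallback, renaming the system-supplied receipt time into the event timestamp when the log itself carries no usable date:
mutate {
  rename => {
    "@createTimestamp" => "@timestamp"
  }
}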
Date function syntax
date {
match => ["token", "format"]
}
Date function example
date {
match => ["when", "yyyy-MM-dd HH:mm:ss"]
}
Date example using a timezone specification
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss"]
timezone => "America/New_York"
}
Specify multiple date formats
date {
  match => ["ts", "yyyy-MM-dd HH:mm:ss", "UNIX", "ISO8601", "UNIX_MS"]
}
Handle timestamps without a year value - rebase option
The rebase option for the date filter supports timestamps that don't include a year value. It sets the year based on the time the data was ingested.
Rebase option example
date {
match => ["when", "MMM dd HH:mm:ss"]
rebase => true
}
Drop function
This function is used to drop all messages that reach this filter logic.
Drop syntax
drop {}
Drop example
if [domain] == "-" {
drop {}
}
Conditional logic
Conditional statements are consistent with those described in the Logstash documentation. However, in parser syntax, use conditionals only as part of the filter logic for event transformation. Currently, the only conditional statements available are if, if/else, and if/else if/else.
Operator | Syntax |
---|---|
Equal | == |
Not equal | != |
Less than | < |
Greater than | > |
Less than or equal | <= |
Greater than or equal | >= |
Regular expression match | =~ |
Regular expression does not match | !~ |
Conditional syntax - If
if [token] == "value" {
<code block>
}
Conditional syntax - if/else
if [token1] == "value1" and [token2] == "value2" {
<code block 1>
} else {
<code block 2>
}
Conditional syntax - if/else if/else
if [token] == "value1" {
<code block 1>
} else if [token] == "value2" {
<code block 2>
} else {
<code block 3>
}
Conditional syntax - If example
if [protocol] == "tcp" or [protocol] == "udp" or [protocol] == "icmp" {
mutate {
uppercase => [ "protocol" ]
}
}
Conditional syntax - if/else if/else example
if [action] == "drop" or [action] == "deny" or [action] == "drop ICMP" {
mutate {
replace => {
"event.idm.read_only_udm.security_result.action" => "BLOCK"
}
}
} else if [action] == "allow" {
mutate {
replace => {
"event.idm.read_only_udm.security_result.action" => "ALLOW"
}
}
} else {
mutate {
replace => {
"event.idm.read_only_udm.security_result.action" => "UNKNOWN_ACTION"
}
}
}
Error handling - on_error
Set the on_error property on any filter to catch errors. The property names a token that is set to true if an error was encountered, and false otherwise.
on_error syntax
on_error => "<value>"
on_error example
For example, you can use the on_error function to check if a value is an IP address without causing a failure. A use case for this function is to determine if the field value is an IP address or hostname, and then handle the field value appropriately.
mutate {
convert => {
"host" => "ipaddress"
}
on_error => "is_not_ip"
}
if [is_not_ip] {
# This means it's not an IP
}
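Continuing that sketch, each branch might assign the value to a different UDM field; the field choices here are assumptions:
if [is_not_ip] {
  # the convert failed, so treat the value as a hostname
  mutate {
    replace => {
      "event.idm.read_only_udm.target.hostname" => "%{host}"
    }
  }
} else {
  # the convert succeeded, so treat the value as an IP address
  mutate {
    merge => {
      "event.idm.read_only_udm.target.ip" => "host"
    }
  }
}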
Output data to a UDM record
Use the merge function to generate output. It is possible to generate more than one event message based on a single log line. For example, you might want to create the web proxy event message, but also generate an alert message.
Generating output - single event
mutate {
merge => {
"@output" => "event"
}
}
Generating output - multiple events
mutate {
merge => {
"@output" => "event1"
}
}
mutate {
merge => {
"@output" => "event2"
}
}
When generating multiple events as output, instead of assigning field values to event.*, use designations such as event1.* and event2.* to differentiate between the values assigned to the first event and the second event.
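A sketch of that differentiation; the metadata.event_type values here are illustrative choices:
mutate {
  replace => {
    "event1.idm.read_only_udm.metadata.event_type" => "NETWORK_CONNECTION"
  }
}
mutate {
  replace => {
    "event2.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"
  }
}
mutate {
  merge => {
    "@output" => "event1"
  }
}
mutate {
  merge => {
    "@output" => "event2"
  }
}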