Parser syntax reference

This document describes the functions, parsing patterns, and other syntax supported in data mapping instructions. See Overview of parsing for a conceptual overview of how Google Security Operations parses raw logs to Unified Data Model (UDM) format.

Default parsers, customer-specific parsers, and 'code snippet' parser extensions use code-like data mapping instructions to convert fields in the original raw log to the Unified Data Model (UDM) format. The parser syntax is similar to Logstash, but not identical.

Extract data using the Grok function

With Grok, you can use predefined patterns and regular expressions to match log messages and extract values from them into tokens. Grok data extraction requires that field labels (tokens) be defined as part of the extraction process.

Syntax for predefined patterns in Grok

%{pattern:token}
%{IP:hostip}
%{NUMBER:event_id}

Syntax for regular expressions in Grok

The following regex pattern examples can be used to extract values from log messages.

Regex Data
\s Space
\S Not space
\d Digit
\D Not digit
\w Word
\W Not word

Named capture groups use the following syntax:

(?P<token>regex_pattern)
(?P<eventId>\\S+)

Sample log message and Grok patterns

Example original raw log

Mar 15 11:08:06 hostdevice1: FW-112233: Accepted connection TCP 10.100.123.45/9988 to 8.8.8.8/53

Grok pattern to extract data from the log

%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}

The following table describes the tokens extracted by this Grok pattern and their values.

Token Value
when Mar 15 11:08:06
deviceName hostdevice1
messageid 112233
action Accepted
protocol TCP
srcAddr 10.100.123.45
srcPort 9988
dstAddr 8.8.8.8
dstPort 53

Grok extraction syntax

grok {
  match => {
    "message" => "<grok pattern>"
  }
}
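
For example, the following filter applies the sample pattern shown earlier to the message field. The on_error token name is illustrative; on_error is described later in this document.

grok {
  match => {
    "message" => "%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}"
  }
  on_error => "grok_failed"
}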

Grok overwrite option

The overwrite option used with the Grok syntax allows you to overwrite a field that already exists. This feature can be used to replace a default value with the value extracted by the Grok pattern.

mutate {
  replace => {
    "fieldName" => "<default value>"
  }
}

grok {
  match => { "message" => ["(?P<fieldName>.*)"] }
  overwrite => ["fieldName"]
}
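
As a concrete sketch (the token name and default value are illustrative), the following seeds the action token with a default and lets the Grok pattern overwrite it when the log matches:

# seed the token with a default value (illustrative)
mutate {
  replace => {
    "action" => "UNKNOWN"
  }
}

# overwrite the default when the pattern matches
grok {
  match => { "message" => ["(?P<action>Accepted|Denied)"] }
  overwrite => ["action"]
}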

Extract JSON formatted logs

JSON extraction syntax

json {
  source => "message"
}
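
As a minimal sketch, assume a hypothetical JSON log such as {"user":"alice","srcip":"10.0.0.1"}. After the json filter runs, the keys are available as tokens; the error token name is illustrative and the UDM path follows the style used elsewhere in this document.

json {
  source => "message"
  # illustrative error token, set to true if the message is not valid JSON
  on_error => "not_json"
}

# assign an extracted token to a UDM field
mutate {
  replace => {
    "event.idm.read_only_udm.principal.user.userid" => "%{user}"
  }
}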

Manipulating JSON arrays

JSON arrays can be accessed by adding an array_function parameter.

json {
  source => "message"
  array_function => "split_columns"
}

The split_columns function makes elements of an array accessible through an index. So, if you have an array that looks like the following:

{ "ips" : ["1.2.3.4","1.2.3.5"] .. }

you will be able to access the two values using the ips.0 and ips.1 tokens.

If you have nested arrays or multiple arrays, the function will unnest arrays recursively. Here is a nested array example.

{
  "devices": [
    {
      "ips": ["1.2.3.4"]
    }
  ]
}

In this case, you access the IP address using the devices.0.ips.0 token. Because devices.1 doesn't exist, it behaves the same as any other nonexistent JSON element.

If an element in a JSON log doesn't exist, then:

  • You cannot access it using an if statement unless you initialize the token to an empty string, "", before calling the JSON filter.
  • You cannot use it in the mutate plugin's replace filter because it will cause an error.
  • You can use it in the mutate plugin's rename filter because these references are ignored.
  • You can use it in the mutate plugin's merge filter because these references are ignored.

The following merge examples reference both existing and nonexistent elements of the ips array:

mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.0"
  }
}

mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.1"
  }
}

# this doesn't fail even though this element doesn't exist.
mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.2"
  }
}

# this doesn't fail even though this element doesn't exist.
mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.3"
  }
}

Extracting XML formatted logs

XML extraction syntax

Define the path to the field in the original log using XPath expression syntax.

xml {
  source => "message"
  xpath => {
    "/Event/System/EventID" => "eventId"
    "/Event/System/Computer" => "hostname"
  }
}

Manipulating XML with iteration

For example, if the sample log looks like this:

message -
<Event>
    <HOST_LIST>
        <HOST>
            <ID>iD1</ID>
            <IP>iP1</IP>
        </HOST>
        <HOST>
            <ID>iD2</ID>
            <IP>iP2</IP>
        </HOST>
    </HOST_LIST>
</Event>

To iterate over the above sample log, use the following:

for index, _ in xml(message, /Event/HOST_LIST/HOST) {
  xml {
    source => "message"
    xpath => {
      "/Event/HOST_LIST/HOST[%{index}]/ID" => "IDs"
      "/Event/HOST_LIST/HOST[%{index}]/IP" => "IPs"
    }
  }
}
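
Inside the loop you can also map each extracted value to a repeated UDM field with merge. A hedged sketch; the intermediate token name and UDM path are illustrative:

for index, _ in xml(message, /Event/HOST_LIST/HOST) {
  xml {
    source => "message"
    xpath => {
      "/Event/HOST_LIST/HOST[%{index}]/IP" => "hostIp"
    }
  }
  # append each extracted IP to a repeated UDM field
  mutate {
    merge => {
      "event.idm.read_only_udm.principal.ip" => "hostIp"
    }
  }
}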

Nested for loops are also supported. See the sample below:

message -
<Event>
    <HOST_LIST>
        <HOST>
            <ID>id1</ID>
            <IP>ip1</IP>
            <Hashes>
                <Hash>hash1</Hash>
                <Hash>hash2</Hash>
            </Hashes>
        </HOST>
        <HOST>
            <ID>id2</ID>
            <IP>ip2</IP>
            <Hashes>
                <Hash>hash1</Hash>
                <Hash>hash2</Hash>
            </Hashes>
        </HOST>
    </HOST_LIST>
</Event>
"
for index, _ in xml(message, /Event/HOST_LIST/HOST) {
  xml {
    source => "message"
    xpath => {
      "/Event/HOST_LIST/HOST[%{index}]/ID" => "IDs"
    }
  }
  for i, _ in xml(message, /Event/HOST_LIST/HOST[%{index}]/Hashes/Hash) {
    xml {
      source => "message"
      xpath => {
        "/Event/HOST_LIST/HOST[%{index}]/Hashes/Hash[%{i}]" => "data"
      }
    }
  }
}

Note: the index starts at 1.

Extract key-value formatted logs

Key-value extraction syntax

kv {
  source => "message"
  field_split => "|"
  value_split => ":"
  whitespace => "strict"
  trim_value => "\""
}

Key-value extraction example

# initialize the token
mutate {
  replace => {
    "destination" => ""
  }
}

# use the kv filter to split the log.
kv {
  source => "message"
  field_split => " "
  trim_value => "\""
  on_error => "kvfail"
}

# assign one of the field values to a UDM field
mutate {
  replace => {
    "event.idm.read_only_udm.target.hostname" => "%{destination}"
  }
}

The kv filter includes the following options (see the sketch after this list):

  • field_split option: Specify the delimiter that separates each key-value pair, for example when extracting parameters from a URL query string or key-value pairs from the log.
  • value_split option: Specify the delimiter between the key and the value.
  • whitespace option: Controls how whitespace around the key-value pair is handled. The default is lenient, which ignores surrounding whitespace. If whitespace should not be ignored, set the option to "strict".
  • trim_value option: Removes extraneous leading and trailing characters from the value, such as quotation marks.
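
For instance, a hypothetical pipe-delimited log such as src:10.0.0.1|dst:8.8.8.8 could be split with the delimiters from the syntax example above; the error token name is illustrative.

kv {
  source => "message"
  field_split => "|"
  value_split => ":"
  on_error => "kv_failed"
}

After this filter runs, the values are available as the src and dst tokens.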

Extract CSV formatted logs

# parse the message into individual variables, identified as column1, column2, column3, etc.
csv {
  source => "message"
}


# assign each value to a token
mutate {
  replace => {
    "resource_id" => "%{column1}"
    "principal_company_name" => "%{column3}"
    "location" => "%{column4}"
    "transaction_amount" => "%{column6}"
    "status" => "%{column9}"
    "meta_description" => "%{column11}"
    "target_userid" => "%{column24}"
    "target_company_name" => "%{column13}"
    "principal_userid" => "%{column15}"
    "date" => "%{column16}"
    "time" => "%{column17}"
  }
}

Transform data using the mutate plugin

Use the mutate filter plugin to transform and consolidate data into a single block or to break the data into separate mutate blocks. When using a single block for the mutate functions, be aware that the mutations are executed in the order described in the Logstash mutate plugin documentation.

Convert functions

Use the convert function to transform values to different data types. This conversion is needed to assign values into fields within the respective data type schemas. Legacy proto definitions (EDR, Webproxy, etc.) require data type conversions to match the target data type. For example, IP address fields need to be converted before assigning the value to the target schema field. UDM allows for handling of most fields as strings, including IP address fields. Supported data types include the following:

  • boolean
  • float
  • hash
  • integer
  • ipaddress
  • macaddress
  • string
  • uinteger
  • hextodec
  • hextoascii

Convert example

mutate {
  convert => {
    "jsonPayload.packets_sent" => "uinteger"
  }
}
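
Conversions fail on values that don't fit the target type, so it is common to pair convert with on_error (described later in this document). A minimal sketch; the token names are illustrative:

mutate {
  convert => {
    "srcPort" => "integer"
  }
  # illustrative error token, set to true if the conversion fails
  on_error => "port_not_integer"
}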

Gsub function

Match a regular expression against a field value and replace all matches with a replacement string. This applies only to string fields.

Gsub syntax

This configuration takes an array consisting of 3 elements per field/substitution. In other words, for every substitution you want to make you add three elements to the gsub configuration array, specifically the name of the field, the regular expression to replace, and the substitution string.

The gsub function supports RE2 syntax. You can use simple strings most of the time, as long as they don't contain characters that have a special meaning, such as brackets ( [ or ] ). If you need special characters to be interpreted literally, they must be escaped by preceding each character with a backslash ( \ ). One important exception to that rule is escaping backslashes themselves. Both the literal backslash and the backslash that escapes it must themselves be escaped, so to refer to a literal backslash you need four backslashes ( \\\\ ).

mutate {
  gsub => [
    # replace all occurrences of the three letters "cat" with the three letters "dog"
    "fieldname1", "cat", "dog",
    # replace all forward slashes with underscores
    "fieldname2", "/", "_",
    # replace backslashes, question marks, hashes, and minuses with a dot "."
    "fieldname3", "[\\\\?#-]", "."
  ]
}
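
As an illustration of the backslash rule above, the following hypothetical substitution replaces each literal backslash in a path value with a forward slash; the field name is illustrative.

mutate {
  gsub => [
    # four backslashes in the pattern match one literal backslash
    "filePath", "\\\\", "/"
  ]
}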

Lowercase functions

The lowercase function is used to transform a value into a lowercase value.

Lowercase syntax

mutate {
  lowercase => [ "token" ]
}

Lowercase example

mutate {
  lowercase => [ "protocol" ]
}

Merge function

The merge function joins multiple fields. When parsing repeated fields, such as ip_address fields, use the merge function to assign IP addresses to the token. The merge function is also used to generate the normalized output message that is ingested into Google Security Operations, and it can be used to generate multiple events from the same log line.

Merge syntax

mutate {
  merge => {
    "destinationToken" => "addedToken"
   }
}

Merge function example - using a repeated field

mutate {
  merge => {
    "event.idm.read_only_udm.target.ip" => "dstAddr"
  }
}

Merge function example - output to a UDM record

mutate {
  merge => {
    "@output" => "event"
  }
}

Rename function

The rename function renames a token, assigning its value to a new token. Use this function when the tokenized value can be directly assigned to the schema-defined token. The original token and the new token must be of the same data type before the rename transformation is performed. The rename function destroys the original token and replaces it with the new token.

Rename function syntax

mutate {
  rename => {
    "originalToken" => "newToken"
  }
}

Rename function example

mutate {
  rename => {
    "proto" => "event.idm.read_only_udm.network.ip_protocol"
    "srcport" => "event.idm.read_only_udm.network.target.port"
  }
}

Replace function

The replace function assigns a value to a token. The assignment can be based on constants, existing field values, or a combination of the two. The replace function can also be used to declare a token. This function can only be used for string values.

Replace function syntax - assign a constant

mutate {
  replace => {
    "token" => "newConstantValue"
  }
}

Replace function syntax - assign a variable value

mutate {
  replace => {
    "token" => "%{otherTokenValue}"
  }
}

Replace function example - assign a constant

mutate {
  replace => {
    "event.idm.read_only_udm.security_result.action" => "ALLOWED"
  }
}

Replace function example - assign a variable value

mutate {
  replace => {
    "shost" => "%{dhost}"
  }
}
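
The replacement value can also combine constants and token values in a single string. A hypothetical sketch that assembles a URL from previously extracted tokens (all token names are illustrative):

mutate {
  replace => {
    "full_url" => "%{protocol}://%{hostname}%{path}"
  }
}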

Uppercase function

The uppercase function is used to transform a value into an uppercase value.

Uppercase function syntax

mutate {
  uppercase => [ "token" ]
}

Uppercase function example

mutate {
  uppercase => [ "protocol" ]
}

RemoveField function

The remove_field function destroys a token. The name of the token to be destroyed can be static, or dynamic by using existing token values. No action is performed if the token doesn't exist.

RemoveField function syntax - remove a static token

mutate {
  remove_field => [ "token" ]
}

RemoveField function syntax - remove a dynamic token

mutate {
  remove_field => [ "%{someTokenValue}" ]
}

RemoveField function example - remove a static token

mutate {
  remove_field => [ "event.webproxy.protocol" ]
}

RemoveField function example - remove a dynamic token

mutate {
  remove_field => [ "network.%{application_protocol}" ]
}

Copy function

The copy function deep copies the value of the source token into the destination token. There is no restriction on the type of value that can be copied. Because the value is deep copied, any subsequent change to the destination token's value has no effect on the source token's value, and vice versa. The source token must exist before the copy function is applied. If the destination token does not exist, a new token is created; otherwise, the existing token's value is overwritten.

Copy syntax

mutate {
  copy => {
    "destinationToken" => "sourceToken"
  }
}
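
Copy function example

A minimal sketch with illustrative token names. The destination token appears on the left and the source token on the right:

mutate {
  copy => {
    "backup_hostname" => "hostname"
  }
}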

Split function

The split function splits a string into an iterable array.

mutate {
  split => {
    source => "src_field"
    separator => ","
    target => "target_field"
  }
}
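
Split function example

A hedged sketch: assume a token ipList holds a comma-separated string such as 10.0.0.1,10.0.0.2. Indexed access in the ips.0 style is an assumption here, mirroring the JSON split_columns behavior shown earlier; the token names and UDM path are illustrative.

mutate {
  split => {
    source => "ipList"
    separator => ","
    target => "ips"
  }
}

# assumption: array elements are addressable by index, as with JSON arrays
mutate {
  merge => {
    "event.idm.read_only_udm.principal.ip" => "ips.0"
  }
}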

Transform data using other functions

base64 function

The base64 function converts a base64 encoded value to a string. This function is based on the Go language base64 package.

The source field identifies the variable where the input value is stored. The target field identifies the variable where the output is stored. By default, the function uses Standard decoding, but it can be configured to use URL decoding.

base64 {
  source => "ip_address"
  target => "ip_address_string"
  encoding => "Standard"
}

The following example converts a base64-encoded IP address. The example checks whether the ip_address variable is populated, decodes the value, and stores the decoded value in the ip_address_string variable. The value stored in the ip_address_string variable is appended to the event.idm.read_only_udm.target.ip UDM field.

if [ip_address] != "" {
  base64 {
    source => "ip_address"
    target => "ip_address_string"
  }
  mutate {
    merge => {
      "event.idm.read_only_udm.target.ip" => "%{ip_address_string}"
    }
  }
}

Date function

The date function is required to handle the date and timestamp from the log extraction. UDM fields that store a Timestamp require a properly normalized date value. The date function supports a variety of date formats, including ISO8601, UNIX and others along with custom-defined date and time formats.

Google Security Operations supports the following predefined date formats:

  • ISO 8601
  • RFC 3339
  • UNIX
  • UNIX_MS

Google Security Operations also provides a set of system-supplied timestamps that can be used in mapping instructions.

  • @createTimestamp: Always included and represents the time Google Security Operations received the logs.
  • @timestamp: Optional. The timestamp provided by Splunk or PCAP collection, if it exists.
  • @collectionTimestamp: Optional value at the log entry level. This timestamp represents the time the Forwarder collected the log entry. However, this value might not be present for logs that are ingested using the out-of-band processor.

At the end of processing, Google Security Operations uses the timestamp in the @timestamp field as the timestamp value for all events. By default, the date filter takes precedence when populating the @timestamp field. However, if you need to use the log receipt time as the event timestamp, you can use the rename function to rename @createTimestamp to @timestamp. The best practice is to extract the date value from the log message itself. If the log doesn't include a timestamp value, you might need to use @createTimestamp instead.

For the UNIX and UNIX_MS date formats, use the on_error statement to handle errors.

Date function syntax

date {
  match => ["token", "format"]
  on_error => "no_match"
}

Date function example

date {
  match => ["when", "yyyy-MM-dd HH:mm:ss"]
  on_error => "no_match"
}

Date example using a timezone specification

date {
  match => ["logtime", "yyyy-MM-dd HH:mm:ss"]
  timezone => "America/New_York"
  on_error => "no_match"
}

Specify multiple date formats

date {
  match => ["ts", "yyyy-MM-dd HH:mm:ss", "UNIX", "ISO8601", "UNIX_MS"]
  on_error => "no_match"
}

Handle timestamps without a year value - rebase option

The rebase option for the date filter handles timestamps that lack a year value by setting the year based on the time the data was ingested.

rebase option example

date {
  match => ["when", "MMM dd HH:mm:ss"]
  rebase => true
  on_error => "no_match"
}

Drop function

This function is used to drop all messages that reach this filter logic.

Drop syntax

drop {}

Drop example

if [domain] == "-" {
  drop {}
}

Conditional logic

Conditional statements are consistent with the Logstash documentation. However, in parser syntax, conditionals can only be used as part of the filter logic for event transformation. Currently, the only conditional statements available are if, if/else, and if/else if/else.

Operator Syntax
Equal ==
Not equal !=
Less than <
Greater than >
Less than or equal <=
Greater than or equal >=
Regular expression match =~
Regular expression does not match !~
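
The regular expression operators =~ and !~ compare a token's value against an RE2 pattern. A hypothetical sketch; the token names and pattern are illustrative:

if [hostname] =~ "^dc[0-9]+$" {
  # illustrative intermediate token
  mutate {
    replace => {
      "deviceType" => "domain_controller"
    }
  }
}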

Conditional syntax - If

if [token] == "value" {
 <code block>
}

Conditional syntax - if/else

if [token1] == "value1" and [token2] == "value2" {
  <code block 1>
} else {
  <code block 2>
}

Conditional syntax - if/else if/else

if [token] == "value1" {
  <code block 1>
} else if [token] == "value2" {
  <code block 2>
} else {
  <code block 3>
}

Conditional syntax - If example

if [protocol] == "tcp" or [protocol] == "udp" or [protocol] == "icmp" {
  mutate {
    uppercase => [ "protocol" ]
  }
}

Conditional syntax - if/else if/else example

if [action] == "drop" or [action] == "deny" or [action] == "drop ICMP" {
  mutate {
    replace => {
      "event.idm.read_only_udm.security_result.action" => "BLOCK"
    }
  }
} else if [action] == "allow" {
  mutate {
    replace => {
      "event.idm.read_only_udm.security_result.action" => "ALLOW"
    }
  }
} else {
  mutate {
    replace => {
      "event.idm.read_only_udm.security_result.action" => "UNKNOWN_ACTION"
    }
  }
}

Error handling - on_error

Set the on_error property on any filter to catch errors. The property names a token that is set to true if an error was encountered, and false otherwise.

on_error syntax

on_error => "<value>"

on_error example

For example, you can use on_error to check whether a value is an IP address without causing a failure. A common use case is to determine whether the field value is an IP address or a hostname, and then handle the value appropriately.

mutate {
  convert => {
    "host" => "ipaddress"
  }
  on_error => "is_not_ip"
}

if [is_not_ip] {
  # This means it's not an IP
}
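
A fuller sketch of that use case, routing the value to a hostname field or an IP field depending on the check; the UDM paths follow the style used elsewhere in this document.

mutate {
  convert => {
    "host" => "ipaddress"
  }
  on_error => "is_not_ip"
}

if [is_not_ip] {
  # the value is not an IP address; treat it as a hostname
  mutate {
    replace => {
      "event.idm.read_only_udm.target.hostname" => "%{host}"
    }
  }
} else {
  # the value is an IP address; append it to the repeated ip field
  mutate {
    merge => {
      "event.idm.read_only_udm.target.ip" => "host"
    }
  }
}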

Output data to a UDM record

Use the merge function to generate output. It is possible to generate more than one event message based on a single log line. For example, you might want to create the web proxy event message, but also generate an alert message.

Generating output - single event

mutate {
  merge => {
    "@output" => "event"
  }
}

Generating output - multiple events

mutate {
  merge => {
    "@output" => "event1"
  }
}
mutate {
  merge => {
    "@output" => "event2"
  }
}

When generating multiple events as output, instead of assigning field values to event.*, use a designation such as event1.* and event2.* to differentiate between the values assigned to the first event versus the second event.
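
A hedged end-to-end sketch: populate fields on two event variables, then emit each one. The event types shown are illustrative.

mutate {
  replace => {
    "event1.idm.read_only_udm.metadata.event_type" => "NETWORK_CONNECTION"
    "event2.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"
  }
}

mutate {
  merge => {
    "@output" => "event1"
  }
}

mutate {
  merge => {
    "@output" => "event2"
  }
}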

Validate data using statedump plugin

Use the statedump filter plugin to validate the internal state of a parser. The filter prints all the values set by the parser up to the point where it appears, which helps during troubleshooting. You can use multiple statedump filter blocks, distinguishing them with the label property.

You can use the statedump filter for troubleshooting only. You must remove the statedump filter blocks before validation.

Statedump filter example with label

statedump {
  label => "foo"
}

Statedump output example:

Internal State (label=foo):
{
  "@createTimestamp": {
    "nanos": 0,
    "seconds": 1693549534
  },
  "@enableCbnForLoop": true,
  "@onErrorCount": 0,
  "@output": [],
  "@timezone": "",
  "event": {
    "idm": {
      "read_only_udm": {
        "metadata": {
          "event_type": "GENERIC_EVENT"
        }
      }
    }
  },
  "message": "my sample log"
}

The statedump output includes the following fields:

  • @createTimestamp: The time when this dump was created.
  • @enableCbnForLoop: Internal flag.
  • @onErrorCount: Number of errors encountered so far.
  • @output: The final output from the parser.
  • @timezone: The offset from UTC for log entries.
  • event: The event mapping done by the parser.
  • message: The log message that the parser is run against.