Parser syntax reference

This document describes the functions, parsing patterns, and other syntax supported in data mapping instructions. See Overview of parsing for a conceptual overview of how Google Security Operations parses raw logs to Unified Data Model (UDM) format.

Default parsers, customer-specific parsers, and 'code snippet' parser extensions use code-like data mapping instructions to convert fields in the original raw log to the Unified Data Model (UDM) format. The parser syntax is similar to Logstash, but not identical.

Extract data using the Grok function

With Grok, you can use predefined patterns and regular expressions to match log messages and extract values from them into tokens. Grok data extraction requires that field labels be defined as part of the data extraction process.

Syntax for predefined patterns in Grok

%{pattern:token}
%{IP:hostip}
%{NUMBER:event_id}

Syntax for regular expressions in Grok

The following regex patterns can be used to extract values from log messages.

Regex  Matches
\s     Space
\S     Not space
\d     Digit
\D     Not digit
\w     Word character
\W     Not word character

Syntax for a named capture group, with an example:

(?P<token>regex_pattern)
(?P<eventId>\\S+)

Sample log message and Grok patterns

Example original raw log

Mar 15 11:08:06 hostdevice1: FW-112233: Accepted connection TCP 10.100.123.45/9988 to 8.8.8.8/53

Grok pattern to extract data from the log

%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}

The following table describes the tokens and the values that this Grok pattern extracts from the sample log.

Token      Value
when       Mar 15 11:08:06
deviceName hostdevice1
messageid  112233
action     Accepted
protocol   TCP
srcAddr    10.100.123.45
srcPort    9988
dstAddr    8.8.8.8
dstPort    53

Grok extraction syntax

grok {
  match => {
    "message" => "<grok pattern>"
  }
}
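
For example, a minimal grok filter applying the firewall pattern shown earlier (the grok_failed error token name is illustrative):

grok {
  match => {
    "message" => "%{SYSLOGTIMESTAMP:when} %{HOST:deviceName}: FW-%{INT:messageid}: (?P<action>Accepted|Denied) connection %{WORD:protocol} %{IP:srcAddr}/%{INT:srcPort} to %{IP:dstAddr}/%{INT:dstPort}"
  }
  on_error => "grok_failed"
}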

Grok overwrite option

The overwrite option used with the Grok syntax allows you to overwrite a field that already exists. This feature can be used to replace a default value with the value extracted by the Grok pattern.

mutate {
  replace => {
    "fieldName" => "<default value>"
  }
}

grok {
  match => { "message" => ["(?P<fieldName>.*)"] }
  overwrite => ["fieldName"]
}

Extract JSON formatted logs

JSON extraction syntax

json {
  source => "message"
}

Manipulating JSON arrays

JSON arrays can be accessed by adding an array_function parameter.

json {
  source => "message"
  array_function => "split_columns"
}

The split_columns function makes elements of an array accessible through an index. So, if you have an array that looks like the following:

{ "ips" : ["1.2.3.4","1.2.3.5"] .. }

you will be able to access the two values using ips.0 and ips.1 tokens.
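
For example, a minimal sketch that copies the first element into a token (the first_ip token name is illustrative):

mutate {
  replace => {
    "first_ip" => "%{ips.0}"
  }
}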

If you have nested arrays or multiple arrays, the function will unnest arrays recursively. Here is a nested array example.

{
  "devices": [
    {
      "ips": ["1.2.3.4"]
    }
  ]
}

In this case, you access the IP address using the devices.0.ips.0 token. Because devices.1 doesn't exist, it behaves the same as any other nonexistent JSON element.
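
A minimal sketch that reads the nested value into a token (the device_ip token name is illustrative):

mutate {
  replace => {
    "device_ip" => "%{devices.0.ips.0}"
  }
}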

If an element in a JSON doesn't exist, then:

  • You cannot access it in an if statement unless you initialize the token to an empty string, "", before calling the JSON filter.
  • You cannot use it in the mutate plugin's replace filter because doing so causes an error.
  • You can use it in the mutate plugin's rename filter because nonexistent tokens are ignored.
  • You can use it in the mutate plugin's merge filter because nonexistent tokens are ignored, as the following examples show.
# these elements exist, so the values are merged.
mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.0"
  }
}

mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.1"
  }
}

# this doesn't fail even though the element doesn't exist.
mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.2"
  }
}

# this doesn't fail even though the element doesn't exist.
mutate {
  merge => {
    "event.idm.read_only_udm.observer.ip" => "ips.3"
  }
}

Extract XML formatted logs

XML extraction syntax

Define the path to the field in the original log using XPath expression syntax.

xml {
  source => "message"
  xpath => {
    "/Event/System/EventID" => "eventId"
    "/Event/System/Computer" => "hostname"
  }
}

Manipulating XML with iteration

For example, if the log looks like the following:

message -
<Event>
    <HOST_LIST>
        <HOST>
            <ID>iD1</ID>
            <IP>iP1</IP>
        </HOST>
        <HOST>
            <ID>iD2</ID>
            <IP>iP2</IP>
        </HOST>
    </HOST_LIST>
</Event>

To iterate over the HOST elements in this sample log, use the following:

for index, _ in xml(message, /Event/HOST_LIST/HOST) {
  xml {
    source => "message"
    xpath => {
      "/Event/HOST_LIST/HOST[%{index}]/ID" => "IDs"
      "/Event/HOST_LIST/HOST[%{index}]/IP" => "IPs"
    }
  }
}

Nested for loops are also supported. See the following sample:

message -
<Event>
    <HOST_LIST>
        <HOST>
            <ID>id1</ID>
            <IP>ip1</IP>
            <Hashes>
                <Hash>hash1</Hash>
                <Hash>hash2</Hash>
            </Hashes>
        </HOST>
        <HOST>
            <ID>id2</ID>
            <IP>ip2</IP>
            <Hashes>
                <Hash>hash1</Hash>
                <Hash>hash2</Hash>
            </Hashes>
        </HOST>
    </HOST_LIST>
</Event>
"
for index, _ in xml(message, /Event/HOST_LIST/HOST) {
  xml {
    source => "message"
    xpath => {
      "/Event/HOST_LIST/HOST[%{index}]/ID" => "IDs"
    }
  }
  for i, _ in xml(message, /Event/HOST_LIST/HOST[%{index}]/Hashes/Hash) {
    xml {
      source => "message"
      xpath => {
        "/Event/HOST_LIST/HOST[%{index}]/Hashes/Hash[%{i}]" => "data"
      }
    }
  }
}

Note: The index starts at 1.

Extract key-value formatted logs

Key-value extraction syntax

kv {
  source => "message"
  field_split => "|"
  value_split => ":"
  whitespace => "strict"
  trim_value => "\""
}

Key-value extraction example

# initialize the token
mutate {
  replace => {
    "destination" => ""
    }
}

# use the kv filter to split the log.
kv {
  source => "message"
  field_split => " "
  trim_value => "\""
  on_error => "kvfail"
}

# assign one of the field values to a UDM field
mutate {
  replace => {
    "event.idm.read_only_udm.target.hostname" => "%{destination}"
  }
}

The kv filter includes the following options (a combined sketch follows this list):

  • field_split option: Use the field_split option to extract key-value pairs from a string, for example, parameters from a URL query string or key-value pairs from a log. Specify the delimiter that separates each key-value pair.
  • value_split option: Use the value_split option to identify the delimiter between the key and the value.
  • whitespace option: Use the whitespace option to control how whitespace around keys and values is handled. The default is lenient, which ignores surrounding whitespace. If whitespace should not be ignored, set the option to "strict".
  • trim_value option: Use the trim_value option to remove extraneous leading and trailing characters from the value, such as quotation marks.
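
A minimal sketch that combines these options, assuming a message such as host:server01|user:alice (the sample message, error token name, and UDM mapping are illustrative):

kv {
  source => "message"
  field_split => "|"
  value_split => ":"
  on_error => "kv_failed"
}

# the tokens "host" and "user" now hold "server01" and "alice"
mutate {
  replace => {
    "event.idm.read_only_udm.target.hostname" => "%{host}"
  }
}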

Extract CSV formatted logs

# parse the message into individual variables, identified as column1, column2, column3, etc.
csv {
  source => "message"
}

# assign each value to a token
mutate {
  replace => {
    "resource_id" => "%{column1}"
    "principal_company_name" => "%{column3}"
    "location" => "%{column4}"
    "transaction_amount" => "%{column6}"
    "status" => "%{column9}"
    "meta_description" => "%{column11}"
    "target_userid" => "%{column24}"
    "target_company_name" => "%{column13}"
    "principal_userid" => "%{column15}"
    "date" => "%{column16}"
    "time" => "%{column17}"
  }
}

Loop over a JSON array using a for loop

You can use a for loop to iterate over a JSON array. The syntax is as follows:

for <item> in <array> {
    ...
}

Loop over an array

The following is a sample log containing a businessPhones field, which is an array containing phone numbers.

entries: <
    data: "{\"businessPhones\":[\"(123) 234-2320\", \"(123) 234-2321\"],
    \"displayName\":\"Quinn\",\"employeeId\":null,
    \"givenName\":\"Quinn\",\"id\":\"abc-123\",
    \"jobTitle\":\"Technician\",\"mail\":\"quinn@altostrat.com\",
    \"onPremisesSecurityIdentifier\":\"A-1-5-21\",}"

The JSON representation of the businessPhones field is as follows:

{
   "data":{
      "businessPhones":[
         "(123) 234-2320",
         "(123) 234-2321"
      ]
   }
}

You can use the for keyword to iterate over phone numbers:

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }
  for phoneNumber in data.businessPhones {
    mutate {
      merge => {
        "event.idm.read_only_udm.principal.resource.attribute.labels" => "phoneNumber"
      }
    }
    statedump {}
  }
}

The statedump output (abbreviated version) looks like the following:

"event": {
    "idm": {
      "read_only_udm": {
        "principal": {
          "resource": {
            "attribute": {
              "labels": [
                "(123) 234-2320",
                "(123) 234-2321"
              ]
            }
          }
        }
      }
    }
  }

Get the index of an array

You can also get the index of each array element. The index value starts at 0. The syntax is as follows:

for index, <item> in <array> {
    ...
}

In the following example, you can extract the index of each element in the businessPhones array:

{
   "data":{
      "businessPhones":[
         "(123) 234-2320",
         "(123) 234-2321"
      ]
   }
}

In the following parser code, the for loop iterates over each element of the businessPhones array and extracts its index value:

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }
  for index, phoneNumber in data.businessPhones {
    mutate {
      convert => {
        "index" => "string"
      }
    }
    mutate {
      replace => {
        "phoneNumber_label" => ""
      }
    }
    mutate {
      replace => {
        "phoneNumber_label.key" => "phoneNumber %{index}"
        "phoneNumber_label.value" => "%{phoneNumber}"
      }
      on_error => "phoneNumber_invalid"
    }
    if ![phoneNumber_invalid] {
      mutate {
        merge => {
          "event.idm.read_only_udm.principal.resource.attribute.labels" => "phoneNumber_label"
        }
        on_error => "phoneNumber_label_merge_failed"
      }
    }
    statedump {}
  }
  mutate {
    replace => {
      "event.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"
    }
  }
  mutate {
    merge => {
      "@output" => "event"
    }
  }
}

The statedump output (abbreviated version) looks like the following:

"event": {
   "idm": {
     "read_only_udm": {
       "principal": {
         "resource": {
           "attribute": {
             "labels": [
               {
                 "key": "phoneNumber 0",
                 "value": "(123) 234-2320"
               },
               {
                 "key": "phoneNumber 1",
                 "value": "(123) 234-2321"
               }
             ]
           }
         }
       }
     }
   }
 }

Access nested arrays

You can use a nested for loop to access nested arrays.

In the following sample log, resourceIdentifiers is an array of resource identifier objects. Each resource identifier object contains three keys: tentantid, type, and subnet, where subnet is a nested array of objects.

{
  "records": {
    "hostname": "host"
  },
  "resourceIdentifiers": [
    {
      "tentantid": "a123",
      "type": "access",
      "subnet": [
        {
          "ip": "10.1.1.1"
        },
        {
          "ip": "10.1.1.2"
        }
      ]
    }
  ]
}

In the following parser code, the nested for loops iterate over the array of objects, extracting key-value pairs and organizing them into UDM fields.

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }

  for index, resourceId in resourceIdentifiers {
    for key, value in resourceId map {
      mutate {
        replace => {
          "resId_map_label" => ""
        }
      }
      if [key] != "" {
        mutate {
          replace => {
            "resId_map_label.key" => "resourceId %{key}"
          }
          on_error => "key_invalid"
        }
        if ![key_invalid] and [value] != "" {
          mutate {
            replace => {
              "resId_map_label.value" => "%{value}"
            }
            # Because the key with the name "subnet" is an array, it will produce an error.
            # Hence, the value inside the "subnet" key will not show in the output.
            # To map the array values inside the "subnet" key, we need to run one more for loop over the value of the "subnet" key.
            on_error => "value_nested"
          }
          if ![value_nested] {
            mutate {
              merge => {
                "event.idm.read_only_udm.principal.resource.attribute.labels" => "resId_map_label"
              }
              on_error => "resId_map_label_merge_failed"
            }
          } else {
            for sub_index, subnet in value {
              mutate {
                convert => {
                  "sub_index" => "string"
                }
              }
              for subnet_key, subnet_value in subnet map {
                mutate {
                  replace => {
                    "subnet_map_label" => ""
                  }
                }
                mutate {
                  replace => {
                    "subnet_map_label.key" => "%{key} %{subnet_key} %{sub_index}"
                    "subnet_map_label.value" => "%{subnet_value}"
                  }
                  on_error => "subnet_value_invalid"
                }
                if ![subnet_value_invalid] {
                  mutate {
                    merge => {
                      "event.idm.read_only_udm.principal.resource.attribute.labels" => "subnet_map_label"
                    }
                    on_error => "subnet_map_label_merge_failed"
                  }
                }
              }
            }
            statedump {}
          }
        }
      }
    }
  }
  mutate {
    replace => {
      "event.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"
    }
  }
  mutate {
    merge => {
      "@output" => "event"
    }
  }
}

The statedump output (abbreviated version) looks like the following:

events_for_log_entry:  {
 events:  {
   timestamp:  {
     seconds:  1709619033
     nanos:  818679197
   }
   idm:  {
     read_only_udm:  {
       metadata:  {
         event_timestamp:  {
           seconds:  1709619033
           nanos:  818679197
         }
         event_type:  GENERIC_EVENT
       }
       principal:  {
         resource:  {
           attribute:  {
             labels:  {
               key:  "subnet ip 0"
               value:  "10.1.1.1"
             }
             labels:  {
               key:  "subnet ip 1"
               value:  "10.1.1.2"
             }
             labels:  {
               key:  "resourceId tentantid"
               value:  "a123"
             }
             labels:  {
               key:  "resourceId type"
               value:  "access"
             }
           }
         }
       }
     }
   }
 }
}

The following are some of the salient aspects of the parser code used to access nested arrays:

  • Outer for loop (for index, resourceId in resourceIdentifiers {...}): This loop iterates over the resourceIdentifiers array, which contains objects representing resource identifiers.

  • Inner for loop (for key, value in resourceId map {...}): This loop iterates over each key-value pair within each resourceId object using the map keyword. To learn more about the map keyword, see Loop over key-value pairs of a JSON object using map.

  • Create labels to store key-value pairs using the replace function.

  • Handle potential errors using on_error flags (for example, key_invalid and value_nested).

  • Iterate over the nested subnet array (for sub_index, subnet in value {...}): This for loop iterates over the subnet key, which contains an array of subnet objects. Each key-value pair in the subnet array is then iterated over using the map keyword (for subnet_key, subnet_value in subnet map {...}).

  • Merge transformed data: Using the merge function, join the transformed data back into the event under a specific path (event.idm.read_only_udm.principal.resource.attribute.labels).

Loop over key-value pairs of a JSON object using map

You can use the map keyword to loop over key-value pairs of a JSON object. The syntax is as follows:

for key, value in <object> map {
    ...
}

Both key and value are of the string data type.

Loop over key-value pairs of an object

The following is a sample log containing details of a Kubernetes resource.

entries: <
    message: "{\"resource\": {\"type\": \"k8s_container\",\"labels\":
    {\"container_name\": \"test-container\",\"namespace_name\": \"default\",
    \"location\": \"us-west1-a\",\"project_id\": \"abc-123\",
    \"cluster_name\": \"test-cluster\",\"pod_name\":
    \"test-pod-123\"}}}"

The JSON representation of the log is as follows:

{
   "resource":{
      "type":"k8s_container",
      "labels":{
         "container_name":"test-container",
         "namespace_name":"default",
         "location":"us-west1-a",
         "project_id":"abc-123",
         "cluster_name":"test-cluster",
         "pod_name":"test-pod-123"
      }
   }
}

The map keyword is added as a suffix to the existing for loop expression. This lets you iterate over key-value pairs of an object. In the following parser code, map is used to iterate over the key-value pairs of the resource.labels object.

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }
  for key, value in resource.labels map {
    mutate {
      replace => {
        "test.key" => "%{key}"
        "test.value" => "%{value}"
      }
    }
    statedump {}
  }
}

The output for each iteration is as follows:

First Iteration:

"test": {
    "key": "cluster_name",
    "value": "test-cluster"
  }

Second Iteration:

"test": {
    "key": "container_name",
    "value": "test-container"
  }

Third Iteration:

 "test": {
    "key": "location",
    "value": "us-west1-a"
 }

Fourth Iteration:

"test": {
    "key": "namespace_name",
    "value": "default"
  }

Fifth Iteration:

"test": {
    "key": "pod_name",
    "value": "test-pod-123"
  }

Sixth Iteration:

 "test": {
    "key": "project_id",
    "value": "abc-123"
  }

Loop over key-value pairs of a nested object

You can use the map keyword to loop over key-value pairs of nested objects.

The following is a sample log containing details of a Kubernetes resource.

entries: <
    message: "{\"textPayload\": \"2023-07-18 18:01:27,259 INFO kube_hunter.
    modules.report.collector Found open service \\\"Kubelet API\\\" at 10.64.
    3.1:10250\",\"insertId\": \"3ki6ud2owr8d4afy\",\"resource\": {\"type\":
    \"k8s_container\",\"labels\": {\"container_name\": \"test-container\",
    \"namespace_name\": \"default\",\"location\": {\"country\" : \"US\",
    \"code\" : \"us-west1-a\"},\"cluster_name\": \"test-cluster\",
    \"pod_name\": \"test-pod-123\"}}}"

The JSON representation of the log is as follows:

{
   "resource":{
      "type":"k8s_container",
      "labels":{
         "container_name":"test-container",
         "namespace_name":"default",
         "location":{
            "code":"us-west1-a",
            "country":"US"
         },
         "cluster_name":"test-cluster",
         "pod_name":"test-pod-123"
      }
   }
}

In the following parser code, the map keyword is used to iterate over the key-value pairs of the nested object resource.labels.location.

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }
  for key, value in resource.labels map {
    mutate {
      replace => {
        "test.key" => "%{key}"
      }
    }
    mutate {
      replace => {
        "test.value" => "%{value}"
      }
      on_error => "nested_key"
    }
    if [test][key] == "location" {
      for nestedKey, nestedValue in value map {
        mutate {
          replace => {
            "locationLabel.key" => "%{nestedKey}"
            "locationLabel.value" => "%{nestedValue}"
          }
        }
        statedump {}
      }
    }
  }
}

The statedump output (abbreviated version) for resource.labels.location looks like the following:

"key": "location",
"locationLabel": {
    "key": "code",
    "value": "us-west1-a"
  }
  .
  .
  .
"locationLabel": {
    "key": "country",
    "value": "US"
  }

Transform data using the mutate plugin

Use the mutate filter plugin to transform and consolidate data into a single block or to break the data into separate mutate blocks. When using a single block for the mutate functions, be aware that the mutations are executed in the order described in the Logstash mutate plugin documentation.

Convert functions

Use the convert function to transform values to different data types. This conversion is needed to assign values to fields within the respective data type schemas. Legacy proto definitions (EDR, Webproxy, etc.) require data type conversions to match the target data type. For example, IP address fields need to be converted before the value is assigned to the target schema field. UDM allows most fields, including IP address fields, to be handled as strings. Supported data types include the following:

  • boolean
  • float
  • hash
  • integer
  • ipaddress
  • macaddress
  • string
  • uinteger
  • hextodec
  • hextoascii

Convert example

mutate {
  convert => {
    "jsonPayload.packets_sent" => "uinteger"
  }
}

Gsub function

Match a regular expression against a field value and replace all matches with a replacement string. This applies only to string fields.

Gsub syntax

This configuration takes an array consisting of three elements per substitution. In other words, for every substitution you want to make, you add three elements to the gsub configuration array: the name of the field, the regular expression to match, and the replacement string.

The gsub function supports RE2 syntax. You can use simple strings most of the time, as long as they don't contain characters that have a special meaning, such as brackets ([ or ]). If you need special characters to be interpreted literally, they must be escaped by preceding each one with a backslash (\).

One important exception to this rule is the backslash itself: to match a literal backslash within your text, you need four backslashes (\\\\).

mutate {
  gsub => [
    # replace all occurrences of the three letters "cat" with the three letters "dog"
    "fieldname1", "cat", "dog",
    # replace all forward slashes with underscores
    "fieldname2", "/", "_",
    # replace backslashes, question marks, hashes, and hyphens with a dot "."
    "fieldname3", "[\\\\?#-]", "."
  ]
}

Lowercase functions

The lowercase function transforms a value to lowercase.

Lowercase syntax

mutate {
  lowercase => [ "token" ]
}

Lowercase example

mutate {
  lowercase => [ "protocol" ]
}

Merge function

The merge function is used to join multiple fields. When parsing repeated fields, such as ip_address fields, use the merge function to assign values to the token. Additionally, the merge function is used to generate the normalized output message that is ingested into Google Security Operations, and it can be used to generate multiple events from the same log line.

Merge syntax

mutate {
  merge => {
    "destinationToken" => "addedToken"
   }
}

Merge function example - using a repeated field

mutate {
  merge => {
    "event.idm.read_only_udm.target.ip" => "dstAddr"
  }
}


Merge function example - output to a UDM record

mutate {
  merge => {
    "@output" => "event"
  }
}

Rename function

The rename function renames a token and assigns its value to a new token. Use this function when the tokenized value can be directly assigned to the schema-defined token. The original token and the new token must be of the same data type before the rename transformation is performed. The rename function destroys the original token and replaces it with the new token.

Rename function syntax

mutate {
  rename => {
    "originalToken" => "newToken"
  }
}

Rename function example

mutate {
  rename => {
    "proto" => "event.idm.read_only_udm.network.ip_protocol"
    "srcport" => "event.idm.read_only_udm.network.target.port"
  }
}

Replace function

The replace function assigns a value to a token. The assignment can be based on constants, existing field values, or a combination of values. The replace function can also be used to declare a token. This function can only be used for string values.

Replace function syntax - assign a constant

mutate {
  replace => {
    "token" => "newConstantValue"
  }
}

Replace function syntax - assign a variable value

mutate {
  replace => {
    "token" => "%{otherTokenValue}"
  }
}

Replace function example - assign a constant

mutate {
  replace => {
    "event.idm.read_only_udm.security_result.action" => "ALLOWED"
  }
}

Replace function example - assign a variable value

mutate {
  replace => {
    "shost" => "%{dhost}"
  }
}

Uppercase function

The uppercase function transforms a value to uppercase.

Uppercase function syntax

mutate {
  uppercase => [ "token" ]
}

Uppercase function example

mutate {
  uppercase => [ "protocol" ]
}

RemoveField function

The remove_field function destroys a token. The name of the token to be destroyed can be either static or dynamic (using existing token values). No action is performed if the token doesn't exist.

RemoveField function syntax - remove a static token

mutate {
  remove_field => [ "token" ]
}

RemoveField function syntax - remove a dynamic token

mutate {
  remove_field => [ "%{someTokenValue}" ]
}

RemoveField function example - remove a static token

mutate {
  remove_field => [ "event.webproxy.protocol" ]
}

RemoveField function example - remove a dynamic token

mutate {
  remove_field => [ "network.%{application_protocol}" ]
}

Copy function

The copy function deep copies the value of the source token into the destination token. There is no restriction on the type of value that can be copied. Because the value is deep copied, later changes to the destination token's value have no effect on the source token's value, and vice versa. The source token must exist before the copy function is applied. If the destination token does not exist, a new token is created; otherwise, the existing token's value is overwritten.

Copy syntax

mutate {
  copy => {
    "destinationToken" => "sourceToken"
  }
}
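
Copy function example

A minimal sketch that preserves a parsed value before later filters modify it (the token names are illustrative):

# keep an unmodified copy of the parsed hostname before further edits
mutate {
  copy => {
    "hostname_original" => "hostname"
  }
}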

Split function

The split function splits a string into an iterable array.

mutate {
  split => {
    source => "src_field"
    separator => ","
    target => "target_field"
  }
}
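
Split function example

A minimal sketch that splits a comma-separated value and iterates over the result (the alert_ids token and sample value are illustrative; remember to remove statedump before validation):

# "alert_ids" holds a value such as "id1,id2,id3"
mutate {
  split => {
    source => "alert_ids"
    separator => ","
    target => "alert_id_list"
  }
}

# inspect each element of the resulting array
for alert_id in alert_id_list {
  statedump {}
}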

Transform data using other functions

base64 function

The base64 function converts a base64 encoded value to a string. This function is based on the Go language base64 package.

The source field identifies the variable where the input value is stored. The target field identifies the variable where the output is stored. By default, the function uses Standard decoding, but it can be configured to use URL decoding.

base64 {
  source => "ip_address"
  target => "ip_address_string"
  encoding => "Standard"
}

The following example converts a base64 encoded IP address. The example checks whether the ip_address variable is populated, decodes the value, and then stores the decoded value in the ip_address_string variable. The value stored in the ip_address_string variable is appended to the event.idm.read_only_udm.target.ip UDM field.

if [ip_address] != "" {
  base64 {
    source => "ip_address"
    target => "ip_address_string"
  }
  mutate {
    merge => {
      "event.idm.read_only_udm.target.ip" => "%{ip_address_string}"
    }
  }
}

Date function

The date function is required to handle the date and timestamp from the log extraction. UDM fields that store a Timestamp require a properly normalized date value. The date function supports a variety of date formats, including ISO8601, UNIX and others along with custom-defined date and time formats.

Google Security Operations supports the following predefined date formats:

  • ISO 8601
  • RFC 3339
  • UNIX
  • UNIX_MS

Google Security Operations also provides a set of system-supplied timestamps that can be used in mapping instructions.

  • @createTimestamp: Always included and represents the time Google Security Operations received the logs.
  • @timestamp: Optional. The timestamp provided by Splunk or PCAP collection, if it exists.
  • @collectionTimestamp: Optional value at the log entry level. This timestamp represents the time the Forwarder collected the log entry. However, this value might not be present for logs that are ingested using the out-of-band processor.

At the end of processing, Google Security Operations uses the timestamp in the @timestamp field as the timestamp value for all events. By default, the date filter takes precedence in populating the @timestamp field. However, if you need to use the log receipt time as the event timestamp, you can use the rename function to rename @createTimestamp to @timestamp, as shown in the following sketch. Best practice is to extract the date value from the log message. If the log doesn't include a timestamp value, you might need to use @createTimestamp for log ingestion.
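
A minimal sketch of that rename:

mutate {
  rename => {
    "@createTimestamp" => "@timestamp"
  }
}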

For the UNIX and UNIX_MS date formats, use the on_error statement to handle errors.

Date function syntax

date {
  match => ["token", "format"]
  on_error => "no_match"
}

Date function example

date {
  match => ["when", "yyyy-MM-dd HH:mm:ss"]
  on_error => "no_match"
}

Date example using a timezone specification

date {
  match => ["logtime", "yyyy-MM-dd HH:mm:ss"]
  timezone => "America/New_York"
  on_error => "no_match"
}

Specify multiple date formats

date {
  match => ["ts", "yyyy-MM-dd HH:mm:ss", "UNIX", "ISO8601", "UNIX_MS"]
  on_error => "no_match"
}

Handle timestamps without a year value - rebase option

The rebase option for the date filter supports the ability to handle timestamps without a year value. It sets the year based on the time the data was ingested.

rebase option example

date {
  match => ["when", "MMM dd HH:mm:ss"]
  rebase => true
  on_error => "no_match"
}

Example: Extract a timestamp from a raw log

You can extract a timestamp from a raw log and store the value to a UDM field, such as metadata.collected_timestamp.

Consider the following 1Password raw log containing a timestamp field.

entries:  {
    data:  "{\"country\":\"US\",\"target_user\":
    {\"uuid\":\"FTASPXQHWRF3XMJDLGKWBMZ2LI\",\"name\":\"John Doe\",
    \"email\":\"abc.def.@demo.com\"},\"location\":{\"country\":\"US\",
    \"region\":\"California\",\"city\":\"Hawthorne\",\"latitude\":33.9168,
    \"longitude\":-118.3432},\"category\":\"success\",\"type\":\"mfa_ok\",
    \"details\":null,\"client\":{\"os_name\":\"Windows\",\"os_version\":\"10.
    0\",\"ip_address\":\"2603:8000:7600:c4e1:4db:400b:ff2:6626\",
    \"app_name\":\"1Password Browser Extension\",\"app_version\":\"20216\",
    \"platform_name\":\"Chrome\",\"platform_version\":\"89.0.4389.82\"},
    \"uuid\":\"EPNGUJLHFVHCXMJL5LJQGXTENA\",
    \"session_uuid\":\"UYA65VLTKZAMJAYVODY6BJ36VE\",
    \"timestamp\":\"2022-07-27T22:46:30.312374636Z\"}"
  }

The following is a JSON representation of the raw log.

{
  "country": "US",
  "target_user": {
    "uuid": "FTASPXQHWRF3XMJDLGKWBMZ2LI",
    "name": "Stephanie Badum",
    "email": "abc.def.@demo.com"
  },
  "location": {
    "country": "US",
    "region": "California",
    "city": "Hawthorne",
    "latitude": 33.9168,
    "longitude": -118.3432
  },
  "category": "success",
  "type": "mfa_ok",
  "details": null,
  "client": {
    "os_name": "Windows",
    "os_version": "10.0",
    "ip_address": "2603:8000:7600:c4e1:4db:400b:ff2:6626",
    "app_name": "1Password Browser Extension",
    "app_version": "20216",
    "platform_name": "Chrome",
    "platform_version": "89.0.4389.82"
  },
  "uuid": "EPNGUJLHFVHCXMJL5LJQGXTENA",
  "session_uuid": "UYA65VLTKZAMJAYVODY6BJ36VE",
  "timestamp": "2022-07-27T22:46:30.312374636Z"
}

In the following example, the grok filter is used to extract an ISO 8601 timestamp from the timestamp field in a 1Password raw log and store it in EventTime. The date filter then parses the value in EventTime according to the ISO 8601 date format. The target field is used to designate where the result of the date parsing should be saved. In this case, the parsed date is stored in the event.idm.read_only_udm.metadata.collected_timestamp field.

filter {
  json {
    source => "message"
    array_function => "split_columns"
  }
  grok {
    match => {
      "timestamp" => "%{TIMESTAMP_ISO8601:EventTime}"
    }
    on_error => "time_stamp_failure"
  }
  if [EventTime] != "" {
    date {
      match => ["EventTime", "ISO8601"]
      target => "event.idm.read_only_udm.metadata.collected_timestamp"
    }
  }
}

If the target field is not specified, then the timestamp is mapped to the metadata.event_timestamp UDM field.

Drop function

Use the drop function to drop all messages that reach this point in the filter logic.

Drop syntax

drop {}

Drop example

if [domain] == "-" {
  drop {}
}

Conditional logic

Conditionals are consistent with the Logstash documentation on conditional statements. However, in parser syntax, use conditionals only as part of the filter logic for event transformation. Currently, the only conditional statements available are if, if/else, and if/else if/else.

Operator                            Syntax
Equal                               ==
Not equal                           !=
Less than                           <
Greater than                        >
Less than or equal                  <=
Greater than or equal               >=
Regular expression match            =~
Regular expression does not match   !~

Conditional syntax - If

if [token] == "value" {
 <code block>
}

Conditional syntax - if/else

if [token1] == "value1" and [token2] == "value2" {
  <code block 1>
} else {
  <code block 2>
}

Conditional syntax - if/else if/else

if [token] == "value1" {
  <code block 1>
} else if [token] == "value2" {
  <code block 2>
} else {
  <code block 3>
}

Conditional syntax - If example

if [protocol] == "tcp" or [protocol] == "udp" or [protocol] == "icmp" {
  mutate {
   uppercase => [ "protocol" ]
 }
}

Conditional syntax - if/else if/else example

if [action] == "drop" or [action] == "deny" or [action] == "drop ICMP" {
  mutate {
    replace => {
    "event.idm.read_only_udm.security_result.action" => "BLOCK"
    }
  }
} else if [action] == "allow" {
  mutate {
    replace => {
      "event.idm.read_only_udm.security_result.action" => "ALLOW"
}
  }
} else {
  mutate {
    replace => {
      "event.idm.read_only_udm.security_result.action" => "UNKNOWN_ACTION"
    }
  }
}

Error handling - on_error

Set the on_error property on any filter to catch errors. The property sets the named variable to true if an error is encountered, and to false otherwise.

on_error syntax

on_error => "<value>"

on_error example

For example, you can use the on_error property to check whether a value is an IP address without causing a failure. A use case for this is to determine whether the field value is an IP address or a hostname, and then handle the field value appropriately.

mutate {
  convert => {
    "host" => "ipaddress"
  }
  on_error => "is_not_ip"
}

if [is_not_ip] {
  # This means it's not an IP
}

Output data to a UDM record

Use the merge function to generate output. It is possible to generate more than one event message based on a single log line. For example, you might want to create the web proxy event message, but also generate an alert message.

Generating output - single event

mutate {
  merge => {
    "@output" => "event"
  }
}

Generating output - multiple events

mutate {
  merge => {
    "@output" => "event1"
  }
}
mutate {
  merge => {
    "@output" => "event2"
  }
}

When generating multiple events as output, instead of assigning field values to event.*, use designations such as event1.* and event2.* to differentiate between the values assigned to the first event and the second event, as shown in the sketch below.
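
A minimal sketch that generates two events from one log line (the event types are illustrative):

# populate and emit the first event
mutate {
  replace => {
    "event1.idm.read_only_udm.metadata.event_type" => "NETWORK_CONNECTION"
  }
}
mutate {
  merge => {
    "@output" => "event1"
  }
}

# populate and emit the second event
mutate {
  replace => {
    "event2.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"
  }
}
mutate {
  merge => {
    "@output" => "event2"
  }
}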

Validate data using statedump plugin

Use the statedump filter plugin to validate the internal state of a parser. The filter shows all values set by the parser up to that point, which helps with troubleshooting. You can use multiple statedump filter blocks, distinguishing them with the label property.

You can use the statedump filter for troubleshooting only. You must remove the statedump filter blocks before validation.

Statedump filter example with label

statedump {
  label => "foo"
}

Statedump output example:

Internal State (label=foo):
{
  "@createTimestamp": {
    "nanos": 0,
    "seconds": 1693549534
  },
  "@enableCbnForLoop": true,
  "@onErrorCount": 0,
  "@output": [],
  "@timezone": "",
  "event": {
    "idm": {
      "read_only_udm": {
        "metadata": {
          "event_type": "GENERIC_EVENT"
        }
      }
    }
  },
  "message": "my sample log"
}

The statedump output includes the following fields:

  • @createTimestamp - The time when this dump was created.

  • @enableCbnForLoop - Internal flag.

  • @onErrorCount - Number of errors discovered so far.

  • @output - The final output from the parser.

  • @timezone - The offset from UTC for log entries.

  • event - The event mapping done by the parser.

  • message - The log message that the parser runs against.