Using parser extensions

Google Security Operations provides multiple methods to define how data in original raw logs is parsed and normalized into a Unified Data Model (UDM) record.

  • Default parsers: Prebuilt data mapping instructions managed by Google Security Operations that map original raw log data to UDM fields.
  • Custom parsers: Custom data mapping instructions created and managed by a customer that meet the specific data parsing needs of the customer.
  • Parser extensions: Additional data mapping instructions that extend a default or custom parser to map additional fields in the original raw log. A parser extension does not replace the default or custom parser; it extends the parser's existing mapping instructions.

This document describes how to use parser extensions.

Before you begin

The following documents explain prerequisite concepts that are important to know when working with parser extensions:

About parser extensions

A parser extension lets you create additional mapping instructions beyond those defined in a default or custom parser to satisfy a unique use case. This capability is intended to extend an existing default or custom parser. A parser extension does not replace a default or custom parser. You cannot create a new parser using a parser extension.

The parser extension reads the original raw log and inserts the extracted values into specific fields in the UDM record. The UDM record contains data that is set by both the default or custom parser and the parser extension.

Data mapping instructions in a parser extension take precedence over those in a default or custom parser. If there is a conflict in mapping instructions, the parser extension overwrites the value set by the default or custom parser. For example, if the default parser maps a raw log field to the event.metadata.description UDM field and the parser extension maps a different raw log field to the same UDM field, the parser extension overwrites the value set by the default parser. The exception is repeated fields: you can configure the parser extension to append values when writing data to a repeated field.

You create one parser extension per log type. Each log type is identified by a unique ingestion label. See Supported default parsers for a list of log types.

To create a parser extension, Google Security Operations must be able to ingest and normalize the original raw logs using a default or custom parser. The parser extension extracts additional data from the original raw log, then merges it into the UDM record.

Parser extensions support the following types of mapping instructions:

  • Code snippet type: You write parser code similar to default and custom parsers. The original raw logs can be in any of the supported data formats for the log type.
  • Data field type: You specify the origin and destination fields in the application interface. The original raw logs must be formatted as either of the following:
    • Native JSON, native XML, or CSV.
    • Syslog header plus native JSON, native XML, or CSV. You can create a data field type mapping instruction only for a subset of log types in Supported default parsers: look for those with the format JSON, XML, CSV, SYSLOG + JSON, SYSLOG + XML, or SYSLOG + CSV.

Keep in mind the following as you create a parser extension:

  • Data can be mapped to any UDM field that supports the standard data types and repeated values.
  • You cannot map data to the following UDM fields:
    • event.idm.read_only_udm.additional
    • event.idm.graph.entity.additional
  • Before creating a data mapping instruction, make sure your Google Security Operations instance has ingested original raw logs in the last 30 days for the log type, and that these raw logs contain the field you plan to define in the precondition criteria. These original raw logs are used to validate the data mapping instructions.
  • After a parser extension is live, it begins parsing incoming data. You cannot parse raw log data retroactively.

Lifecycle of a parser extension

Parser extensions have a lifecycle with the following states:

  • DRAFT: Newly created parser extension which has not yet been submitted.
  • VALIDATING: Google Security Operations is validating the mapping instructions against existing raw logs to ensure that fields are parsed with no errors.
  • LIVE: The parser extension passed validation and is now in production. It is extracting and transforming data from incoming raw logs into UDM records.
  • FAILED: The parser extension failed validation.

Open the Parser Extensions page

Perform the steps in one of the following sections to access the Parser Extensions page.

Start from the navigation bar

  1. In the navigation bar, select Settings, SIEM Settings, and then Parsers.
  2. Identify the log type that you want to extend in the Parsers table.
  3. Navigate to that row, then click the Menu.
  4. Click Create Extension.
Start from Raw Log Search

  1. Use Raw Log Search to search for records similar to those that will be parsed.
  2. Select an event from the Events > Timeline panel.
  3. Expand the Event Data panel.
  4. Click the Manage Parser button.
  5. In the Manage Parser dialog, select Create Extension, and then click Next. The Parser Extensions page opens in edit mode. You can begin defining the parser extension.

Create a new parser extension

This section describes how to create a parser extension after you open the Parser Extensions page. The available fields on the Parser Extensions page differ depending on the structure of the raw log.

  1. Review the example raw log in the Raw Log panel to confirm that it is representative of logs the parser extension will process. Use the example raw log as reference when creating the parser extension.

    • If you navigated to the Parser Extensions page from Raw Log Search, the panel displays the original raw log you selected in the search results.

    • If you navigated to the Parser Extensions page from the navigation bar, a sample raw log for that log type is displayed.

  2. Choose the Extension Method. Select one of the following:

    • Map data fields: Create a data field mapping. Use the application fields to define the original raw log field and the destination UDM field.

    • Write code snippet: Create a code snippet for all supported log formats. The code snippet uses the same parser syntax as default and custom parsers. For more information about parser syntax, see Parser syntax.

Continue with one of the following subsections specific to the selected Extension Method.

Create a data field mapping instruction

Create a data field mapping instruction when the incoming raw logs are in JSON, XML, CSV, Syslog header plus JSON, Syslog header plus XML, or Syslog header plus CSV format. Define the path to the original field name and the destination UDM field in the data mapping instruction.

  1. In the Repeated Fields selector, specify how the parser extension saves a value to fields that support an array of values.

    • Append Values: The value is appended to the existing set of values stored in the field.
    • Replace Values: The new value replaces all values previously stored in the field.

    Some UDM fields, such as principal.ip and entity.asset.hostname, store an array of values. These repeated fields are identified by the label repeated in the Unified Data Model field list. For more detailed information, see Repeated Fields selector.

  2. If the Syslog and Target fields appear, Google Security Operations detected that the raw log includes a Syslog header.

    If Google Security Operations identifies that the example raw log is not native JSON, native XML, or CSV and has a Syslog header, it displays the Syslog and Target fields. Use these fields to define Grok and regular expression patterns that preprocess the Syslog header and extract the structured portion of the log. The structured portion of the log can then be mapped using data fields.

    • Syslog field: Specify the extraction pattern, using Grok and regular expressions, that identifies the Syslog header and the raw log message.
    • Target field: Specify the variable name in the extraction pattern that stores the structured portion of the log.

    For information about how to define an extraction pattern using Grok and regular expressions, see Define the Syslog extractor fields.

    The following image provides an example of how to add an extraction pattern and variable name to the Syslog and Target fields, respectively.

    Syslog extractor fields

    Both the Syslog and Target fields are required and work together to separate the Syslog header from the structured portion of the log.

  3. After entering values in the Syslog and Target fields, click the Validate button. The validation process checks for both syntax and parsing errors, then returns either of the following:

    • Success: The data mapping fields appear. Define the remainder of the parser extension.
    • Failure: An error message appears. Correct the error condition before continuing.
  4. Optionally, define a precondition instruction.

    The precondition instruction limits the parser extension to a subset of incoming raw logs by comparing a static value against a field in the raw log. If an incoming raw log meets the precondition criteria, the parser extension applies the mapping instruction. If it does not, the mapping instruction is not applied.

    Complete the following fields:

    • Precondition Field: Enter either the full path to the field if the log data format is JSON or XML, or the column position if the data format is CSV.
    • Precondition Operator: Select either EQUALS or NOT EQUALS.
    • Precondition Value: Enter the value that must match data in the Precondition Field.
  5. Define the data mapping instruction:

    • Raw Data Field: Enter either the full path to the field if the log data format is JSON or XML, or the column position if the data format is CSV.
    • Destination Field: Enter the fully qualified UDM field name where the value will be stored, for example udm.metadata.collected_timestamp.seconds.
  6. Click Submit to save the mapping instruction.

  7. Google Security Operations validates the mapping instruction.

    • If the validation process succeeds, the state changes to Live and the mapping instruction begins processing incoming log data.
    • If the validation process fails, the state changes to Failed and an error is displayed in the Raw Log field.

    The following is an example validation error.

    ERROR: generic::unknown: pipeline.ParseLogEntry failed: LOG_PARSING_CBN_ERROR:
    "generic::invalid_argument: pipeline failed: filter mutate (7) failed: copy failure:
    copy source field \"jsonPayload.dest_instance.region\" must not be empty
    (try using replace to provide the value before calling copy)
    
    "LOG: {"insertId":"14suym9fw9f63r","jsonPayload":{"bytes_sent":"492",
    "connection":{"dest_ip":"10.12.12.33","dest_port":32768,"protocol":6,
    "src_ip":"10.142.0.238","src_port":22},"end_time":"2023-02-13T22:38:30.490546349Z",
    "packets_sent":"15","reporter":"SRC","src_instance":{"project_id":"example-labs",
    "region":"us-east1","vm_name":"example-us-east1","zone":"us-east1-b"},
    "src_vpc":{"project_id":"example-labs","subnetwork_name":"default",
    "vpc_name":"default"},"start_time":"2023-02-13T22:38:29.024032655Z"},
    "logName":"projects/example-labs/logs/compute.googleapis.com%2Fvpc_flows",
    "receiveTimestamp":"2023-02-13T22:38:37.443315735Z","resource":{"labels":
    {"location":"us-east1-b","project_id":"example-labs",
      "subnetwork_id":"00000000000000000000","subnetwork_name":"default"},
      "type":"gce_subnetwork"},"timestamp":"2023-02-13T22:38:37.443315735Z"}
    

See Fields in a data mapping instruction for a list of all possible fields in a parser extension.

Fields in a data mapping instruction

This section describes all fields that can be set in a parser extension.

  • Syslog: A user-defined pattern that preprocesses and separates a Syslog header from the structured portion of a raw log.
  • Target: The variable name in the Syslog field that stores the structured portion of the log.
  • Precondition Field: The field identifier in the raw log that contains the value to be compared. Used in a precondition instruction.
  • Precondition Operator: Either EQUALS or NOT EQUALS. Used in a precondition instruction.
  • Precondition Value: The static value that is compared with the Precondition Field in the raw log. Used in a precondition instruction.
  • Raw Data Field: Used in a mapping instruction. If the data format is JSON, define the path to the field, for example jsonPayload.connection.dest_ip. If the data format is XML, define the full path to the field, for example /Event/Reason-Code. If the data format is CSV, define the index position of the column. Index positions start at 1.
  • Destination Field: Used in a mapping instruction. Define the full path to the UDM field where data will be stored, for example udm.network.dhcp.opcode or graph.entity.asset.hostname.
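As an illustration only, the following sketch shows how these fields might be filled in for the sample VPC flow log shown in the validation error earlier in this document. The destination field is an assumption chosen for the example, not a value prescribed by Google Security Operations.

Precondition Field: jsonPayload.reporter
Precondition Operator: EQUALS
Precondition Value: SRC
Raw Data Field: jsonPayload.src_instance.vm_name
Destination Field: event.idm.read_only_udm.principal.hostname

With this configuration, the parser extension maps the source VM name to principal.hostname only for raw logs whose jsonPayload.reporter field equals SRC.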

Create a code snippet mapping instruction

A code snippet uses Logstash-like syntax to define how to extract and transform values in the original raw log and assign them to the UDM record. A code snippet uses the same syntax and section structure as a default or custom parser. The sections in a code snippet are the following:

  • Section 1. Extract the data from the original log.
  • Section 2. Transform the extracted data.
  • Section 3. Assign one or more values to a UDM field.
  • Section 4. Bind UDM event fields to the @output key.

The following example illustrates a code snippet.

Here is an example raw log:

{
    "insertId": "00000000",
    "jsonPayload": {
        ...section omitted for brevity...
        "packets_sent": "4",
        ...section omitted for brevity...
    },
    "timestamp": "2022-05-03T01:45:00.150614953Z"
}

Here is a sample code snippet that maps the value in jsonPayload.packets_sent to the network.sent_bytes UDM field.

filter {
    # Section 1. extract the data from the original JSON log
    json {
        source => "message"
        array_function => "split_columns"
    }

   # Section 2. transform the extracted data
    mutate {
      convert => {
        "jsonPayload.packets_sent" => "uinteger"
      }
    }

    # Section 3. assign the value to a UDM field
    mutate {
        copy => {
          "event.idm.read_only_udm.network.sent_bytes" => "jsonPayload.packets_sent"
        }
    }

    # Section 4. Bind the UDM fields to the @output key
    mutate {
      merge => {
          "@output" => "event"
      }
    }
}
  1. Click Submit to save the mapping instruction.

  2. Google Security Operations validates the mapping instruction.

    • If the validation process succeeds, the state changes to Live and the mapping instruction begins processing incoming log data.
    • If the validation process fails, the state changes to Failed and an error is displayed in the Raw Log field.
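If the destination is a repeated UDM field, such as principal.ip, and you want the extension to append to the values set by the default or custom parser rather than overwrite them, you can use merge instead of copy in Section 3. The following is a minimal sketch only; it assumes a hypothetical JSON log that contains a jsonPayload.src_ip field.

filter {
    # extract the data from the original JSON log
    json {
        source => "message"
    }

    # merge appends the extracted address to the repeated principal.ip field
    # instead of overwriting the values set by the default or custom parser
    mutate {
        merge => {
            "event.idm.read_only_udm.principal.ip" => "jsonPayload.src_ip"
        }
    }

    # bind the UDM fields to the @output key
    mutate {
        merge => {
            "@output" => "event"
        }
    }
}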

View an existing parser extension

  1. In the navigation bar, select Settings, SIEM Settings, and then Parsers.
  2. In the Parsers list, identify the log type with a parser extension. The row is identified by the text EXTENSION next to the name.
  3. Navigate to that row, then click the Menu.
  4. Click View Extension.
  5. The View Custom/Prebuilt Parser > Extension tab appears with details about the parser extension. The summary panel displays the LIVE parser extension by default.

Edit a parser extension

  1. Open the View Custom/Prebuilt Parser > Extension tab. See View an existing parser extension for instructions about how to open the page.
  2. Click the Edit Extension button. The Parser Extensions page appears.
  3. Edit the parser extension.
    • To cancel editing and discard changes, click Discard Draft.
  4. When you are finished editing the parser extension, click Submit.
  5. If you submit the change, the validation process runs to validate the new configuration.

Delete a parser extension

  1. Open the View Custom/Prebuilt Parser > Extension tab. See View an existing parser extension for instructions about how to open that page.

  2. Click the Edit Extension button. The Parser Extensions page appears.

  3. Click the Delete Extension button.

While editing a parser extension, you can delete it at any time. Click either of the following:

  • Discard Draft
  • Delete Failed Extension

More about the repeated fields selector

Some UDM fields store an array of values, such as principal.ip and entity.asset.hostname. If you create a parser extension to store data in a repeated field, this option lets you control whether the value is appended to the array or whether it replaces all existing values set by the default parser. Repeated fields are identified by the label repeated in the Unified Data Model field list.

If Append Values is selected, the parser extension appends the extracted value to the array of existing values in the UDM field. If Replace Values is selected, the parser extension replaces the array of existing values in the UDM field with the extracted value. The Repeated Fields selector does not affect how data is stored in fields that are not repeated.

A parser extension can map data to a repeated field only when the repeated field is at the lowest level of the hierarchy. For example, mapping values to udm.principal.ip is supported because the repeated ip field is at the lowest level of the hierarchy and principal is not a repeated field. Mapping values to udm.intermediary.hostname is not supported because intermediary is a repeated field and is not at the lowest level of the hierarchy.

The following examples show how the Repeated Fields configuration affects the generated UDM record.

Example 1: Append Values

  Example log:
  {"protoPayload":{"@type":"type.AuditLog","authenticationInfo":{"principalEmail":"admin@cmmar.co"},"requestMetadata":{"callerIp":"1.1.1.1, 2.2.2.2"}}}

  Parser extension configuration:
  Precondition Field: protoPayload.requestMetadata.callerIp
  Precondition Value: " "
  Precondition Operator: NOT_EQUALS
  Raw Data Field: protoPayload.requestMetadata.callerIp
  Destination Field: event.idm.read_only_udm.principal.ip

  Generated result:
  metadata:{event_timestamp:{}.....}principal:{ip:"1.1.1.1, 2.2.2.2"}}}

Example 2: Append Values

  Example log:
  {"protoPayload":{"@type":"type.AuditLog","authenticationInfo":{"principalEmail":"admin@cmmar.co"},"requestMetadata":{"callerIp":"2.2.2.2, 3.3.3.3", "name":"Akamai Ltd"}}}

  Parser extension configuration:
  Precondition 1:
  Precondition Field: protoPayload.requestMetadata.callerIp
  Precondition Value: " "
  Precondition Operator: NOT_EQUALS
  Raw Data Field: protoPayload.requestMetadata.callerIp
  Destination Field: event.idm.read_only_udm.principal.ip

  Precondition 2:
  Raw Data Field: protoPayload.requestMetadata.name
  Destination Field: event.idm.read_only_udm.metadata.product_name

  Generated result:
  Events generated by the prebuilt parser before applying the extension:
  metadata:{event_timestamp:{} ... principal:{ip:"1.1.1.1"}}}

  Output after applying the extension:
  timestamp:{} idm:{read_only_udm:{metadata:{event_timestamp:{} .... product_name: "Akamai Ltd"}principal:{ip:"1.1.1.1, 2.2.2.2, 3.3.3.3"}}}

Example 3: Replace Values

  Example log:
  {"protoPayload":{"@type":"type.AuditLog","authenticationInfo":{"principalEmail":"admin@cmmar.co"},"requestMetadata":{"callerIp":"2.2.2.2"}}}

  Parser extension configuration:
  Precondition Field: protoPayload.authenticationInfo.principalEmail
  Precondition Value: " "
  Precondition Operator: NOT_EQUALS
  Raw Data Field: protoPayload.authenticationInfo.principalEmail
  Destination Field: event.idm.read_only_udm.principal.ip

  Generated result:
  UDM events generated by the prebuilt parser before applying the extension:
  timestamp:{} idm:{read_only_udm:{metadata:{event_timestamp:{} ... principal:{ip:"1.1.1.1"}}}

  UDM output after applying the extension:
  timestamp:{} idm:{read_only_udm:{metadata:{event_timestamp:{} ....} principal:{ip:"2.2.2.2"}}}

More about the Syslog extractor fields

The Syslog extractor fields let you separate the Syslog header from a structured log by defining a pattern, using Grok and regular expression syntax, plus a named token in the pattern that stores the structured output.

Define the Syslog extractor fields

Values in the Syslog and Target fields work together to define how the parser extension separates the Syslog header from the structured portion of a raw log. In the Syslog field, you define an expression using a combination of Grok and regular expression syntax. The expression includes a variable name that identifies the structured portion of the raw log. In the Target field, you specify that variable name.

The following example illustrates how these fields work together.

The following is an example raw log:

<13>1 2022-09-14T15:03:04+00:00 fieldname fieldname - - - {"timestamp": "2021-03-14T14:54:40.842152+0000","flow_id": 1885148860701096, "src_ip": "10.11.22.1","src_port": 51972,"dest_ip": "1.2.3.4","dest_port": 55291,"proto": "TCP"}

The raw log contains the following sections:

  • Syslog header: <13>1 2022-09-14T15:03:04+00:00 fieldname fieldname - - -

  • JSON formatted event: {"timestamp": "2021-03-14T14:54:40.842152+0000","flow_id": 1885148860701096, "src_ip": "10.11.22.1","src_port": 51972,"dest_ip": "1.2.3.4","dest_port": 55291,"proto": "TCP"}

To separate the Syslog header from the JSON portion of the raw log, use the following example expression in the Syslog field: %{TIMESTAMP_ISO8601} %{WORD} %{WORD} ([- ]+)?%{GREEDYDATA:msg}

  • This portion of the expression identifies the Syslog header: %{TIMESTAMP_ISO8601} %{WORD} %{WORD} ([- ]+)?
  • This portion of the expression captures the JSON segment of the raw log: %{GREEDYDATA:msg}

This example includes the variable name msg. You choose the variable name. The parser extension extracts the JSON segment of the raw log and assigns it to the variable msg.

In the Target field, enter the variable name msg. The value stored in the msg variable is input to the data field mapping instructions you create in the parser extension.

Using the example raw log, the following segment is input to the data mapping instructions:

{"timestamp": "2021-03-14T14:54:40.842152+0000","flow_id": 1885148860701096, "src_ip": "10.11.22.1","src_port": 51972,"dest_ip": "1.2.3.4","dest_port": 55291,"proto": "TCP"}

The following image shows the completed Syslog and Target fields:

Syslog extractor fields

The following table provides additional examples with sample logs, the Syslog extraction pattern, Target variable name, and the result.

Example 1

  Sample raw log:
  <13>1 2022-07-14T15:03:04+00:00 suricata suricata - - - {\"timestamp\": \"2021-03-14T14:54:40.842152+0000\",\"flow_id\": 1885148860701096,\"in_iface\": \"enp94s0\",\"event_type\": \"alert\",\"vlan\": 522,\"src_ip\": \"1.1.2.1\",\"src_port\": 51972,\"dest_ip\": \"1.2.3.4\",\"dest_port\": 55291,\"proto\": \"TCP\"}

  Syslog field: %{TIMESTAMP_ISO8601} %{WORD} %{WORD} ([- ]+)?%{GREEDYDATA:msg}
  Target field: msg
  Result: field_mappings { field: "msg" value: "{\"timestamp\": \"2021-03-14T14:54:40.842152+0000\",\"flow_id\": 1885148860701096,\"in_iface\": \"enp94s0\",\"event_type\": \"alert\",\"vlan\": 522,\"src_ip\": \"1.1.2.1\",\"src_port\": 51972,\"dest_ip\": \"1.2.3.4\",\"dest_port\": 55291,\"proto\": \"TCP\"}" }

Example 2

  Sample raw log:
  <13>1 2022-07-14T15:03:04+00:00 suricata suricata - - - {\"timestamp\": \"2021-03-14T14:54:40.842152+0000\"} - - - {\"timestamp\": \"2021-03-14T14:54:40.842152+0000\",\"flow_id\": 1885148860701096,\"in_iface\": \"enp94s0\",\"event_type\": \"alert\",\"vlan\": 522,\"src_ip\": \"1.1.2.1\",\"src_port\": 51972,\"dest_ip\": \"1.2.3.4\",\"dest_port\": 55291,\"proto\": \"TCP\"}

  Syslog field: %{TIMESTAMP_ISO8601} %{WORD} %{WORD} ([- ]+)?%{GREEDYDATA:msg1} ([- ]+)?%{GREEDYDATA:msg2}
  Target field: msg2
  Result: field_mappings { field: "msg2" value: "{\"timestamp\": \"2021-03-14T14:54:40.842152+0000\",\"flow_id\": 1885148860701096,\"in_iface\": \"enp94s0\",\"event_type\": \"alert\",\"vlan\": 522,\"src_ip\": \"1.1.2.1\",\"src_port\": 51972,\"dest_ip\": \"1.2.3.4\",\"dest_port\": 55291,\"proto\": \"TCP\"}" }

Example 3

  Sample raw log:
  <13>1 2022-07-14T15:03:04+00:00 suricata suricata - - - {\"timestamp\": \"2021-03-14T14:54:40.842152+0000\"} - - - {\"timestamp\": \"2021-03-14T14:54:40.842152+0000\",\"flow_id\": 1885148860701096,\"in_iface\": \"enp94s0\",\"event_type\": \"alert\",\"vlan\": 522,\"src_ip\": \"1.1.2.1\",\"src_port\": 51972,\"dest_ip\": \"1.2.3.4\",\"dest_port\": 55291,\"proto\": \"TCP\"}

  Syslog field: %{TIMESTAMP_ISO8601} %{WORD} %{WORD} ([- ]+)?%{GREEDYDATA:message} ([- ]+)?%{GREEDYDATA:msg2}
  Target field: msg2
  Result: Error - message already exists in state and not overwritable. The token name message collides with the message field that already stores the raw log, so the extraction fails.

Control access to parser extensions

New permissions are available that control who can view and manage parser extensions. By default, parser extensions can be accessed by users with the Administrator role. For information about managing Users and Groups or assigning roles, see Role Based Access Control.

The new permissions in Google Security Operations are summarized in the following table.

Feature   Action   Description
Parser    Delete   Delete parser extensions.
Parser    Edit     Create and edit parser extensions.
Parser    View     View parser extensions.