YARA-L 2.0 language syntax

This section describes the major elements of the YARA-L syntax. See also Overview of the YARA-L 2.0 language.

Rule structure

For YARA-L 2.0, you must specify variable declarations, definitions, and usages in the following order:

  1. meta
  2. events
  3. match (optional)
  4. outcome (optional)
  5. condition
  6. options (optional)

The following illustrates the generic structure of a rule:

rule <rule Name>
{
  meta:
    // Stores arbitrary key-value pairs of rule details, such as who wrote
    // it, what it detects on, version control, etc.

  events:
    // Conditions to filter events and the relationship between events.

  match:
    // Values to return when matches are found.

  outcome:
    // Additional information extracted from each detection.

  condition:
    // Condition to check events and the variables used to find matches.

  options:
    // Options to turn on or off while executing this rule.
}

Comments

Designate single-line comments with two slash characters (// comment) and multi-line comments with slash-asterisk delimiters (/* comment */), as you would in C.

Literals

Nonnegative integers (without decimal points), string, boolean, and regex literals are supported.

String and regex literals

You can use either of the following quotation characters to enclose strings in YARA-L 2.0. However, quoted text is interpreted differently depending on which one you use.

  1. Double quotes (") are for normal strings. Escape sequences are interpreted, so special characters must be escaped.
    For example, in "hello\tworld", \t is interpreted as a tab.

  2. Back quotes (`) interpret all characters literally.
    For example, in `hello\tworld`, \t is not interpreted as a tab.

For regular expressions, you have two options.

If you want to use regular expressions directly without the re.regex() function, use /regex/ for the regular expression literals.

You can also use string literals as regex literals when you use the re.regex() function. Note that for double quote string literals, you must escape backslash characters with backslash characters, which can look awkward.

For example, the following regular expressions are equivalent:

  • re.regex($e.network.email.from, `.*altostrat\.com`)
  • re.regex($e.network.email.from, ".*altostrat\\.com")
  • $e.network.email.from = /.*altostrat\.com/

Google recommends using back quote characters for strings in regular expressions for ease of readability.
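Python raw strings behave much like YARA-L back-quoted strings, and ordinary Python strings much like double-quoted ones, so the equivalence above can be checked with a small sketch. Python's re module stands in for the Detection Engine's regex engine here, which is an assumption; the two engines differ in some details.

```python
import re

# A back-quoted YARA-L string keeps every backslash, like a Python raw string.
BACKQUOTED = r".*altostrat\.com"
# A double-quoted YARA-L string needs each backslash escaped, like a normal Python string.
DOUBLE_QUOTED = ".*altostrat\\.com"

# Both spellings yield the same pattern text: .*altostrat\.com
assert BACKQUOTED == DOUBLE_QUOTED

# The same distinction applies to \t: it becomes a tab only in the escaped form.
assert "hello\tworld" != r"hello\tworld"
assert r"hello\tworld" == "hello\\tworld"


def matches(pattern: str, value: str) -> bool:
    """Unanchored (substring) search, with Python's re as a stand-in regex engine."""
    return re.search(pattern, value) is not None


print(matches(BACKQUOTED, "alice@altostrat.com"))  # True
print(matches(BACKQUOTED, "alice@altostratXcom"))  # False: \. matches only a literal dot
```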

Operators

You can use the following operators in YARA-L:

Operator  Description
=         equal/declaration
!=        not equal
<         less than
<=        less than or equal
>         greater than
>=        greater than or equal

Variables

In YARA-L 2.0, all variables are represented as $<variable name>.

You can define the following types of variables:

  • Event variables — Represent groups of events in normalized form (UDM) or entity events. Specify conditions for event variables in the events section. You identify event variables using a name, event source, and event fields. Allowed sources are udm (for normalized events) and graph (for entity events). If the source is omitted, udm is set as the default source. Event fields are represented as a chain of .<field name> (for example, $e.field1.field2). Event field chains always start from the top-level source (UDM or Entity).

  • Match variables — Declare in the match section. Match variables become grouping fields for the query, as one row is returned for each unique set of match variables (and for each time window). When the rule finds a match, the match variable values are returned. Specify what each match variable represents in the events section.

  • Placeholder variables — Declare and define in the events section. Placeholder variables are similar to match variables. However, you can use placeholder variables in the condition section to specify match conditions.

Use match variables and placeholder variables to declare relationships between event fields through transitive join conditions (see Events Section Syntax for more detail).

Keywords

Keywords in YARA-L 2.0 are case-insensitive. For example, "and" and "AND" are equivalent. Variable names must not conflict with keywords, so names such as $AND and $outcome are invalid.

The following are keywords for detection engine rules: rule, meta, match, over, events, condition, outcome, options, and, or, not, nocase, in, regex, cidr, before, after, all, any, if, max, min, sum, array, array_distinct, count, and count_distinct.
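The keyword rule above can be sketched as a small check. This hypothetical helper only tests for keyword collisions (case-insensitively, using the list above); it does not model any other identifier rules the compiler may enforce.

```python
# Keywords listed above; YARA-L keywords are case-insensitive.
KEYWORDS = frozenset({
    "rule", "meta", "match", "over", "events", "condition", "outcome",
    "options", "and", "or", "not", "nocase", "in", "regex", "cidr",
    "before", "after", "all", "any", "if", "max", "min", "sum", "array",
    "array_distinct", "count", "count_distinct",
})


def is_valid_variable_name(name: str) -> bool:
    """Hypothetical check: a $-prefixed name that does not collide with a keyword."""
    if not name.startswith("$") or len(name) < 2:
        return False
    return name[1:].lower() not in KEYWORDS


print(is_valid_variable_name("$AND"))       # False
print(is_valid_variable_name("$outcome"))   # False
print(is_valid_variable_name("$hostname"))  # True
```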

Maps

Structs and Labels

Some UDM fields use either the Struct or Label data type.

To search for a specific key-value pair in both Struct and Label, use the standard map syntax:

// A Struct field.
$e.udm.additional.fields["pod_name"] = "kube-scheduler"
// A Label field.
$e.metadata.ingestion_labels["MetadataKeyDeletion"] = "startup-script"

Supported cases

Events and Outcome Section

// Using a Struct field in the events section
events:
  $e.udm.additional.fields["pod_name"] = "kube-scheduler"

// Using a Label field in the outcome section
outcome:
  $value = array_distinct($e.metadata.ingestion_labels["MetadataKeyDeletion"])

Assigning a map value to a Placeholder

$placeholder = $u1.metadata.ingestion_labels["MetadataKeyDeletion"]

Using a map field in a join condition

// using a Struct field in a join condition between two udm events $u1 and $u2
$u1.metadata.event_type = $u2.udm.additional.fields["pod_name"]

Unsupported cases

Combining any or all keywords with a map

For example, the following is not currently supported: all $e.udm.additional.fields["pod_name"] = "kube-scheduler"

Duplicate value handling

Map access always returns a single value. In the uncommon edge case where a map access could refer to multiple values, it deterministically returns the first value.

This can happen in either of the following cases:

  • A label has a duplicate key.

    The label structure represents a map, but does not enforce key uniqueness. By convention, a map should have unique keys, so Chronicle does not recommend populating a label with duplicate keys.

    The rule text $e.metadata.ingestion_labels["dupe-key"] would return the first possible value, val1, if run over the following data example:

    // Not recommended: a label with a duplicate key.
    event {
      metadata{
        ingestion_labels{
          key: "dupe-key"
          value: "val1" // This is the first possible value for "dupe-key"
        }
        ingestion_labels{
          key: "dupe-key"
          value: "val2"
        }
      }
    }
    
  • A label has an ancestor repeated field.

    A repeated field might contain a label as a child field. Two different entries in the top-level repeated field might contain labels that have the same key. The rule text $e.security_result.rule_labels["key"] would return the first possible value, val3, if run over the following data example:

    event {
      // security_result is a repeated field.
      security_result {
        threat_name: "threat1"
        rule_labels {
          key: "key"
          value: "val3" // This is the first possible value for "key"
        }
      }
      security_result {
        threat_name: "threat2"
        rule_labels {
          key: "key"
          value: "val4"
        }
      }
    }
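Both cases reduce to "scan the key-value pairs in field order and return the first hit". The following Python model is a sketch under that reading; representing labels as lists of (key, value) tuples and a missing key as None are assumptions for illustration, not how Chronicle stores or reports them.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical representation: a label field is a list of (key, value) pairs.
Label = Tuple[str, str]


def map_access(labels: Iterable[Label], key: str) -> Optional[str]:
    """Return the value of the first label whose key matches, in field order.

    None stands in for a missing key; how the engine represents that case is
    not covered here.
    """
    for label_key, value in labels:
        if label_key == key:
            return value
    return None


# Case 1: a label with a duplicate key.
ingestion_labels: List[Label] = [("dupe-key", "val1"), ("dupe-key", "val2")]
print(map_access(ingestion_labels, "dupe-key"))  # val1

# Case 2: labels nested under a repeated ancestor (security_result).
security_result = [
    {"threat_name": "threat1", "rule_labels": [("key", "val3")]},
    {"threat_name": "threat2", "rule_labels": [("key", "val4")]},
]
# Flatten the repeated ancestor in order, then take the first match.
all_rule_labels = [label for result in security_result for label in result["rule_labels"]]
print(map_access(all_rule_labels, "key"))  # val3
```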
    

Functions

This section describes the YARA-L 2.0 functions that Chronicle supports in the Detection Engine.

These functions can be used in the following areas in a rule:

String functions

Chronicle supports the following string manipulation functions:

  • strings.concat(a, b)
  • strings.coalesce(a, b)
  • strings.to_lower(stringText)
  • strings.to_upper(stringText)
  • strings.base64_decode(encodedString)

The following sections describe how to use each.

Concatenate strings or integers

Returns the concatenation of two strings, two integers, or a combination of the two.

strings.concat(a, b)

This function takes two arguments, which can be either strings or integers, and returns the two values concatenated as a string. Integers are cast to a string before concatenation. The arguments can be literals or event fields. If both arguments are fields, the two attributes must be from the same event.

The following example includes a string variable and string literal as arguments.

"google-test" = strings.concat($e.principal.hostname, "-test")

The following example includes a string variable and integer variable as arguments. Both principal.hostname and principal.port are from the same event, $e, and are concatenated to return a string.

"google80" = strings.concat($e.principal.hostname, $e.principal.port)

The following example attempts to concatenate principal.port from event $e1, with principal.hostname from event $e2. It will return a compiler error because the arguments are different event variables.

// returns a compiler error
"test" = strings.concat($e1.principal.port, $e2.principal.hostname)

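As a rough model of the value-level behavior described above, the following Python sketch casts integer arguments to strings and joins them. It does not model the compile-time rule that both field arguments come from the same event; that check happens in the rule compiler, not at evaluation time.

```python
from typing import Union


def concat(a: Union[str, int], b: Union[str, int]) -> str:
    """Model of strings.concat: integers are cast to strings, then joined."""
    for arg in (a, b):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(arg, bool) or not isinstance(arg, (str, int)):
            raise TypeError(f"expected str or int, got {type(arg).__name__}")
    return str(a) + str(b)


print(concat("google", "-test"))  # google-test
print(concat("google", 80))       # google80
```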
Coalesce string values

Returns the value of the first expression that does not evaluate to an empty string (that is, the first non-empty value). If both arguments evaluate to an empty string, the function call returns an empty string.

strings.coalesce(a, b)

The arguments can be literals, event fields, or function calls. Both arguments must be of STRING type. If both arguments are fields, the two attributes must be from the same event.

The following example includes a string variable and string literal as arguments. The condition evaluates to true when (1) $e.network.email.from is suspicious@gmail.com or (2) $e.network.email.from is empty and $e.network.email.to is suspicious@gmail.com.

"suspicious@gmail.com" = strings.coalesce($e.network.email.from, $e.network.email.to)

The following example includes nested coalesce calls. This condition compares the first non-empty IP address from event $e against values in the reference list ip_watchlist. The arguments are coalesced in the same order in which they appear in the condition:

  1. $e.principal.ip is evaluated first.
  2. $e.src.ip is evaluated next.
  3. $e.target.ip is evaluated next.
  4. Finally, the string "No IP" is returned as a default value if the previous IP fields are unset.

strings.coalesce(
  strings.coalesce($e.principal.ip, $e.src.ip),
  strings.coalesce($e.target.ip, "No IP")
) in %ip_watchlist

The following example attempts to coalesce principal.hostname from event $e1 and event $e2. It will return a compiler error because the arguments are different event variables.

// returns a compiler error
"test" = strings.coalesce($e1.principal.hostname, $e2.principal.hostname)

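The evaluation order of the nested example can be traced with a small Python model. It treats each IP field as a single string where "" means unset; that simplification is an assumption, since the real fields are repeated and YARA-L evaluates them per element.

```python
def coalesce(a: str, b: str) -> str:
    """Model of strings.coalesce: the first non-empty argument, else ""."""
    return a if a != "" else b


def first_ip(principal_ip: str, src_ip: str, target_ip: str) -> str:
    """Mirror of the nested coalesce call in the ip_watchlist example."""
    return coalesce(coalesce(principal_ip, src_ip), coalesce(target_ip, "No IP"))


print(first_ip("", "198.51.100.7", "203.0.113.9"))  # 198.51.100.7
print(first_ip("", "", ""))                         # No IP
```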
Convert string to uppercase or lowercase

These functions return string text after changing all characters to either uppercase or lowercase.

  • strings.to_lower(stringText)
  • strings.to_upper(stringText)

"test@google.com" = strings.to_lower($e.network.email.from)
"TEST@GOOGLE.COM" = strings.to_upper($e.network.email.to)

Base64 decode a string

Returns a string containing the base64 decoded version of the encoded string.

strings.base64_decode(encodedString)

This function takes one base64 encoded string as an argument. If encodedString is not a valid base64 encoded string, the function returns encodedString as-is.

This example returns True if principal.domain.name is "dGVzdA==", which is the base64 encoding of the string "test".

"test" = strings.base64_decode($e.principal.domain.name)

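A minimal Python sketch of the "decode, or return the input unchanged" behavior. Decoding the result as UTF-8 and treating a non-UTF-8 result as invalid are assumptions; the text above does not say how the function handles decoded bytes that are not valid text.

```python
import base64


def base64_decode(encoded: str) -> str:
    """Model of strings.base64_decode: invalid input is returned as-is."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return encoded


print(base64_decode("dGVzdA=="))     # test
print(base64_decode("not base64!"))  # not base64!
```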
RegExp functions

Chronicle supports the following regular expression functions:

  • re.regex(stringText, regex)
  • re.capture(stringText, regex)
  • re.replace(stringText, replaceRegex, replacementText)

RegExp match

You can define regular expression matching in YARA-L 2.0 using either of the following forms:

  • YARA syntax, which applies to event fields. The following is a generic representation of this syntax: $e.field = /regex/
  • YARA-L syntax, as a function that takes the following parameters:
    • The field the regular expression is applied to.
    • The regular expression, specified as a string. You can use the nocase modifier after strings to indicate that the search should ignore capitalization. The following is a generic representation of this syntax: re.regex($e.field, `regex`)

Be aware of the following while defining regular expressions in YARA-L 2.0:

  • In either case, the predicate is true if the string contains a substring that matches the regular expression provided. It is unnecessary to add .* to the beginning or end of the regular expression.
  • To match the exact string or only a prefix or suffix, include the ^ (starting) and $ (ending) anchor characters in the regular expression. For example, /^full$/ matches "full" exactly, while /full/ could match "fullest", "lawfull", and "joyfully".
  • If the UDM field includes newline characters, the regexp only matches the first line of the UDM field. To enforce full UDM field matching, add a (?s) to the regular expression. For example, replace /.*allUDM.*/ with /(?s).*allUDM.*/.
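The substring-versus-anchored distinction can be reproduced with Python's re.search, which, like the predicate described above, succeeds when any substring matches. This is a sketch using Python's regex engine as a stand-in; it does not model the first-line-only behavior for multi-line UDM fields.

```python
import re


def regex_match(value: str, pattern: str, nocase: bool = False) -> bool:
    """True if any substring of value matches pattern (re.search, not re.fullmatch)."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, value, flags) is not None


for word in ("full", "fullest", "lawfull", "joyfully"):
    print(word, regex_match(word, "full"), regex_match(word, "^full$"))
```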

RegExp capture

Captures (extracts) data from a string using the regular expression pattern provided in the argument.

re.capture(stringText, regex)

This function takes two arguments:

  • stringText: the original string to search.
  • regex: the regular expression indicating the pattern to search for.

The regular expression can contain 0 or 1 capture groups in parentheses. If the regular expression contains 0 capture groups, the function returns the first entire matching substring. If the regular expression contains 1 capture group, it returns the first matching substring for the capture group. Defining two or more capture groups returns a compiler error.

In this example, if $e.principal.hostname contains "aaa1bbaa2" the following would be True, because the function returns the first instance. This example has no capture groups.

"aaa1" = re.capture($e.principal.hostname, "a+[1-9]")

This example captures everything after the @ symbol in an email. If the $e.network.email.from field is test@google.com, the example returns google.com. This example contains one capture group.

"google.com" = re.capture($e.network.email.from, "@(.*)")

If the regular expression does not match any substring in the text, the function returns an empty string. You can omit events where no match occurs by excluding the empty string, which is especially important when you are using re.capture() with an inequality:

// Exclude the empty string to omit events where no match occurs.
"" != re.capture($e.network.email.from, "@(.*)")

// Exclude a specific string with an inequality.
"google.com" != re.capture($e.network.email.from, "@(.*)")

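The zero-or-one capture group rule maps naturally onto Python's compiled-pattern group count. The sketch below models the documented behavior with Python's re module as a stand-in; returning "" for a capture group that exists but does not participate in the match is an assumption.

```python
import re


def capture(text: str, pattern: str) -> str:
    """Model of re.capture: zero or one capture group, "" when nothing matches."""
    compiled = re.compile(pattern)
    if compiled.groups > 1:
        # YARA-L rejects this at compile time.
        raise ValueError("re.capture allows at most one capture group")
    match = compiled.search(text)
    if match is None:
        return ""
    # group(0) is the whole match; group(1) is the single capture group.
    return match.group(compiled.groups) or ""


print(capture("aaa1bbaa2", "a+[1-9]"))       # aaa1
print(capture("test@google.com", "@(.*)"))   # google.com
print(repr(capture("no-at-sign", "@(.*)")))  # ''
```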
RegExp replacement

Performs a regular expression replacement.

re.replace(stringText, replaceRegex, replacementText)

This function takes three arguments:

  • stringText: the original string.
  • replaceRegex: the regular expression indicating the pattern to search for.
  • replacementText: the text to insert into each match.

Returns a new string derived from the original stringText, where all substrings that match the pattern in replaceRegex are replaced with the value in replacementText. You can use backslash-escaped digits (\1 to \9) within replacementText to insert text matching the corresponding parenthesized group in the replaceRegex pattern. Use \0 to refer to the entire matching text.

The function replaces non-overlapping matches and will prioritize replacing the first occurrence found. For example, re.replace("banana", "ana", "111") returns the string "b111na".

This example replaces com with org in an email address and returns the result. The comparison is true when $e.network.email.from is email@google.com.

"email@google.org" = re.replace($e.network.email.from, "com", "org")

This example uses backslash-escaped digits in the replacementText argument to reference matches to the replaceRegex pattern. Because both patterns are double-quoted strings, their backslashes are escaped.

"test1.com.google" = re.replace(
                       $e.principal.hostname, // holds "test1.test2.google.com"
                       "test2\\.([a-z]*)\\.([a-z]*)",
                       "\\2.\\1"  // \\1 holds "google", \\2 holds "com"
                     )

Note the following cases when dealing with empty strings and re.replace():

Using empty string as replaceRegex:

// In the function call below, if $e.principal.hostname contains "name",
// the result is: 1n1a1m1e1, because an empty string is found next to
// every character in `stringText`.
re.replace($e.principal.hostname, "", "1")

To replace an empty string, you can use "^$" as replaceRegex:

// In the function call below, if $e.principal.hostname contains the empty
// string, "", the result is: "none".
re.replace($e.principal.hostname, "^$", "none")
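The examples in this section can be checked against a small Python model built on re.sub (Python 3.7 or later, where empty matches behave as shown above). Treating Python's regex engine as equivalent for these inputs is an assumption. Note that the YARA-L double-quoted replacement "\\2.\\1" corresponds to the Python raw string r"\2.\1".

```python
import re


def replace(text: str, pattern: str, replacement: str) -> str:
    """Model of re.replace: replace every non-overlapping match, left to right."""
    # Python templates read \1..\9 the same way, but \0 does not mean
    # "whole match" there, so rewrite it as \g<0> first.
    return re.sub(pattern, replacement.replace("\\0", "\\g<0>"), text)


print(replace("banana", "ana", "111"))  # b111na
print(replace("test1.test2.google.com", r"test2\.([a-z]*)\.([a-z]*)", r"\2.\1"))  # test1.com.google
print(replace("name", "", "1"))         # 1n1a1m1e1
print(replace("", "^$", "none"))        # none
```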

Date functions

Chronicle supports the following date-related functions:

  • timestamp.get_minute(unix_seconds [, time_zone])
  • timestamp.get_hour(unix_seconds [, time_zone])
  • timestamp.get_day_of_week(unix_seconds [, time_zone])
  • timestamp.get_week(unix_seconds [, time_zone])
  • timestamp.current_seconds()

Chronicle supports negative integers as the unix_seconds argument. Negative integers represent times before the Unix epoch. If you provide an invalid integer, for example a value that results in an overflow, the function will return -1. This is an uncommon scenario.

Because YARA-L 2.0 doesn't support negative integer literals, check for this condition using the less than or greater than operator. For example:

0 > timestamp.get_hour(123)

Time extraction

The following function returns an integer in the range [0, 59], representing the minute of the hour.

timestamp.get_minute(unix_seconds [, time_zone])

The following function returns an integer in the range [0, 23], representing the hour of day.

timestamp.get_hour(unix_seconds [, time_zone])

The following function returns an integer in the range [1, 7] representing the day of week starting with Sunday. For example, 1 = Sunday; 2 = Monday, etc.

timestamp.get_day_of_week(unix_seconds [, time_zone])

The following function returns an integer in the range [0, 53] representing the week of the year. Weeks begin with Sunday. Dates before the first Sunday of the year are in week 0.

timestamp.get_week(unix_seconds [, time_zone])

These time extraction functions have the same arguments.

  • unix_seconds is an integer representing the number of seconds past Unix epoch, such as $e.metadata.event_timestamp.seconds, or a placeholder containing that value.
  • time_zone is optional and is a string representing a time zone. If omitted, the default is "GMT". You can specify time zones using string literals. The options are:
    • The TZ database name, for example "America/Los_Angeles". For more information, see the "TZ Database Name" column from this page.
    • The time zone offset from UTC, in the format (+|-)H[H][:M[M]], for example: "-08:00".

In this example, the time_zone argument is omitted, so it defaults to "GMT".

$ts = $e.metadata.collected_timestamp.seconds

timestamp.get_hour($ts) = 15

This example uses a string literal to define the time_zone.

$ts = $e.metadata.collected_timestamp.seconds

2 = timestamp.get_day_of_week($ts, "America/Los_Angeles")

Here are examples of other valid time_zone specifiers, which you can pass as the second argument to time extraction functions:

  • "America/Los_Angeles", or "-08:00". ("PST" is not supported)
  • "America/New_York", or "-05:00". ("EST" is not supported)
  • "Europe/London"
  • "UTC"
  • "GMT"
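The ranges and conventions above (Sunday = 1 for day of week, Sunday-based weeks with week 0 before the first Sunday, -1 on overflow) can be sketched in Python. This is a model of the documented behavior, not Chronicle's implementation; accepting TZ database names through zoneinfo assumes time zone data is installed on the host, so the examples use offsets.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable
from zoneinfo import ZoneInfo

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Offset format from the text above: (+|-)H[H][:M[M]], for example "-08:00".
_OFFSET = re.compile(r"^([+-])(\d{1,2})(?::(\d{1,2}))?$")


def _zone(time_zone: str) -> tzinfo:
    if time_zone in ("GMT", "UTC"):
        return timezone.utc
    m = _OFFSET.match(time_zone)
    if m:
        delta = timedelta(hours=int(m.group(2)), minutes=int(m.group(3) or 0))
        return timezone(-delta if m.group(1) == "-" else delta)
    # Otherwise treat it as a TZ database name; needs tz data on the host.
    return ZoneInfo(time_zone)


def _extract(unix_seconds: int, time_zone: str, part: Callable[[datetime], int]) -> int:
    zone = _zone(time_zone)
    try:
        # Works for negative values (times before the Unix epoch) too.
        local = (_EPOCH + timedelta(seconds=unix_seconds)).astimezone(zone)
    except OverflowError:
        return -1  # out-of-range input, mirroring the documented -1 result
    return part(local)


def get_minute(unix_seconds: int, time_zone: str = "GMT") -> int:
    return _extract(unix_seconds, time_zone, lambda d: d.minute)


def get_hour(unix_seconds: int, time_zone: str = "GMT") -> int:
    return _extract(unix_seconds, time_zone, lambda d: d.hour)


def get_day_of_week(unix_seconds: int, time_zone: str = "GMT") -> int:
    # isoweekday(): Monday=1 ... Sunday=7; remap so that Sunday=1 ... Saturday=7.
    return _extract(unix_seconds, time_zone, lambda d: d.isoweekday() % 7 + 1)


def get_week(unix_seconds: int, time_zone: str = "GMT") -> int:
    # %U: Sunday-based week number; days before the first Sunday are week 0.
    return _extract(unix_seconds, time_zone, lambda d: int(d.strftime("%U")))


print(get_hour(0), get_hour(0, "-08:00"))  # 0 16
print(get_day_of_week(0))                  # 5 (1970-01-01 was a Thursday)
```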

Current timestamp

Returns an integer representing the current time in Unix seconds. This is approximately equal to the detection timestamp and is based on when the rule is run.

timestamp.current_seconds()

The following example returns True if the certificate expired more than 24 hours ago. It calculates the time since expiration by subtracting the certificate's not_after time from the current Unix seconds, and then checks that the difference exceeds 86400 seconds (24 hours).

86400 < timestamp.current_seconds() - $e.network.tls.certificate.not_after
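The same arithmetic in Python, as a sketch that assumes not_after is expressed in Unix seconds. The function name is hypothetical, and the optional now parameter exists only so the check can be evaluated against a fixed time instead of time.time().

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def expired_more_than_a_day(not_after: int, now: Optional[int] = None) -> bool:
    """Model of: 86400 < timestamp.current_seconds() - not_after."""
    if now is None:
        now = int(time.time())  # stands in for timestamp.current_seconds()
    return SECONDS_PER_DAY < now - not_after


print(expired_more_than_a_day(1_700_000_000, now=1_700_100_000))  # True (100000 s later)
print(expired_more_than_a_day(1_700_000_000, now=1_700_050_000))  # False (50000 s later)
```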

Math functions

Absolute value

Returns the absolute value of an integer expression.

math.abs(intExpression)

This example returns True if the event was more than 5 minutes from the time specified (in seconds from the Unix epoch), regardless of whether the event came before or after the time specified. A call to math.abs cannot depend on multiple variables or placeholders. For example, you cannot replace the hardcoded time value of 1643687343 in the example below with $e2.metadata.event_timestamp.seconds.

300 < math.abs($e1.metadata.event_timestamp.seconds - 1643687343)

Net functions

Returns true when the given IP address is within the specified subnetwork.

net.ip_in_range_cidr(ipAddress, subnetworkRange)

You can use YARA-L to search for UDM events across all of the IP addresses within a subnetwork using the net.ip_in_range_cidr() function. Both IPv4 and IPv6 are supported.

To search across a range of IP addresses, specify an IP UDM field and a Classless Inter-Domain Routing (CIDR) range. YARA-L can handle both singular and repeating IP address fields.

IPv4 example:

net.ip_in_range_cidr($e.principal.ip, "192.0.2.0/24")

IPv6 example:

net.ip_in_range_cidr($e.network.dhcp.yiaddr, "2001:db8::/32")

For an example rule that uses net.ip_in_range_cidr(), see the example rule Single Event within Range of IP Addresses.
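For a single (non-repeated) value, the CIDR check can be modeled with Python's standard ipaddress module. Returning False for a value that does not parse as an IP address is an assumption; mixing IPv4 addresses with IPv6 ranges simply does not match.

```python
import ipaddress


def ip_in_range_cidr(ip: str, cidr: str) -> bool:
    """Model of net.ip_in_range_cidr for a single value."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # assumption: unparsable values simply do not match
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return address in ipaddress.ip_network(cidr)


print(ip_in_range_cidr("192.0.2.55", "192.0.2.0/24"))    # True
print(ip_in_range_cidr("2001:db8::1", "2001:db8::/32"))  # True
```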

Array functions

Array Length

Returns the number of repeated field elements.

arrays.length($e.principal.ip) = 2

If multiple repeated fields are along the path, returns the total number of repeated field elements.

arrays.length($e.intermediary.ip) = 3
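Counting "the total number of repeated field elements" along a path amounts to summing the leaf counts under every repeated level. This Python sketch models an event as nested dicts and lists (an assumed representation) and counts a missing field as 0, which is also an assumption.

```python
from typing import Any, Sequence


def array_length(value: Any, path: Sequence[str]) -> int:
    """Count the leaf elements reached by following path through nested lists/dicts."""
    if isinstance(value, list):
        # A repeated field: count along the rest of the path for every element.
        return sum(array_length(item, path) for item in value)
    if not path:
        return 1
    if not isinstance(value, dict) or path[0] not in value:
        return 0
    return array_length(value[path[0]], path[1:])


event = {
    "principal": {"ip": ["192.0.2.1", "192.0.2.2"]},
    # intermediary is itself repeated, and each entry has a repeated ip field.
    "intermediary": [{"ip": ["198.51.100.1", "198.51.100.2"]}, {"ip": ["203.0.113.1"]}],
}
print(array_length(event, ["principal", "ip"]))     # 2
print(array_length(event, ["intermediary", "ip"]))  # 3
```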

Function to placeholder assignment

You can assign the result of a function call to a placeholder in the events section. For example:

$placeholder = strings.concat($e.principal.hostname, "my-string")

You can then use the placeholder variables in the match, condition, and outcome sections. However, there are two limitations with function to placeholder assignment:

  1. Every placeholder in function to placeholder assignment must be assigned to an expression containing an event field. For example, the following are valid:

    $ph1 = $e.principal.hostname
    $ph2 = $e.src.hostname
    
    // Both $ph1 and $ph2 have been assigned to an expression containing an event field.
    $ph1 = strings.concat($ph2, ".com")
    
    $ph1 = $e.network.email.from
    $ph2 = strings.concat($e.principal.hostname, "@gmail.com")
    
    // Both $ph1 and $ph2 have been assigned to an expression containing an event field.
    $ph1 = strings.to_lower($ph2)
    

    However, the example below is invalid:

    $ph1 = strings.concat($e.principal.hostname, "foo")
    $ph2 = strings.concat($ph1, "bar") // $ph2 has NOT been assigned to an expression containing an event field.
    
  2. A function call must depend on exactly one event. However, more than one field from the same event can be used in function call arguments. For example, the following are valid:

    $ph = strings.concat($event.principal.hostname, "string2")

    $ph = strings.concat($event.principal.hostname, $event.src.hostname)

    However, the following are invalid:

    $ph = strings.concat("string1", "string2")

    $ph = strings.concat($event.principal.hostname, $anotherEvent.src.hostname)

Reference Lists syntax

See our page on Reference Lists for more information on reference list behavior and reference list syntax.

You can use reference lists in the events or outcome sections. Here is the syntax for using various types of reference lists in a rule:

// STRING reference list
$e.principal.hostname in %string_reference_list

// REGEX reference list
$e.principal.hostname in regex %regex_reference_list

// CIDR reference list
$e.principal.ip in cidr %cidr_reference_list

You can also use the not operator and the nocase operator with reference lists as shown below. The nocase operator is compatible with STRING lists and REGEX lists.

// Exclude events whose hostnames match substrings in my_regex_list.
not $e.principal.hostname in regex %my_regex_list

// Event hostnames must match at least 1 string in my_string_list (case insensitive).
$e.principal.hostname in %my_string_list nocase

For performance reasons, the Detection Engine restricts reference list usage.

  • Maximum in statements in a rule, with or without special operators: 7
  • Maximum in statements with the regex operator: 4
  • Maximum in statements with the cidr operator: 2
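Before saving a rule, these limits could be checked with a rough text-based counter like the hypothetical one below. It is a sketch, not a parser: it strips // comments crudely (including "//" inside string literals such as URLs) and ignores /* */ comments.

```python
import re
from typing import Dict

# Limits listed above.
MAX_IN = 7
MAX_IN_REGEX = 4
MAX_IN_CIDR = 2

# "in %list", "in regex %list", or "in cidr %list"; keywords are case-insensitive.
_IN_STATEMENT = re.compile(r"\bin\s+(?:(regex|cidr)\s+)?%\w+", re.IGNORECASE)
# Crude: also strips text after "//" inside string literals such as URLs.
_LINE_COMMENT = re.compile(r"//[^\n]*")


def reference_list_usage(rule_text: str) -> Dict[str, int]:
    """Count in statements, and how many of them use regex or cidr."""
    text = _LINE_COMMENT.sub("", rule_text)
    counts = {"in": 0, "regex": 0, "cidr": 0}
    for m in _IN_STATEMENT.finditer(text):
        counts["in"] += 1
        if m.group(1):
            counts[m.group(1).lower()] += 1
    return counts


def within_limits(rule_text: str) -> bool:
    counts = reference_list_usage(rule_text)
    return (counts["in"] <= MAX_IN
            and counts["regex"] <= MAX_IN_REGEX
            and counts["cidr"] <= MAX_IN_CIDR)


SAMPLE = """
// STRING reference list
$e.principal.hostname in %string_reference_list
// REGEX reference list
$e.principal.hostname in regex %regex_reference_list
// CIDR reference list
$e.principal.ip in cidr %cidr_reference_list
"""
print(reference_list_usage(SAMPLE))  # {'in': 3, 'regex': 1, 'cidr': 1}
```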

Meta section syntax

The meta section is composed of multiple lines, each of which defines a key-value pair. The key must be an unquoted string, and the value must be a quoted string:

<key> = "<value>"

The following is an example of a valid meta section:

meta:
  author = "Chronicle"
  severity = "HIGH"

Events section syntax

In the events section, list the predicates to specify the following:

  • What each match or placeholder variable represents
  • Simple binary expressions as conditions
  • Function expressions as conditions
  • Reference list expressions as conditions
  • Logical operators

Variable declarations

For variable declarations, use the following syntax:

  • <EVENT_FIELD> = <VAR>
  • <VAR> = <EVENT_FIELD>

Both are equivalent, as shown in the following examples:

  • $e.source.hostname = $hostname
  • $userid = $e.principal.user.userid

This declaration indicates that this variable represents the specified field for the event variable. When the event field is a repeated field, the match variable can represent any value in the array. It is also possible to assign multiple event fields to a single match or placeholder variable. This is a transitive join condition.

For example, the following:

  • $e1.source.ip = $ip
  • $e2.target.ip = $ip

Are equivalent to:

  • $e1.source.ip = $ip
  • $e1.source.ip = $e2.target.ip

Every variable that you use must be declared through a variable declaration. Using a variable without a declaration is a compilation error.

Simple binary expressions as conditions

To use a simple binary expression as a condition, use the following syntax:

  • <EXPR> <OP> <EXPR>

Each expression can be an event field, a variable, a literal, or a function expression.

For example:

  • $e.source.hostname = "host1234"
  • $e.source.port < 1024
  • 1024 < $e.source.port
  • $e1.source.hostname != $e2.target.hostname
  • $e1.metadata.collected_timestamp.seconds > $e2.metadata.collected_timestamp.seconds
  • $port >= 25
  • $host = $e2.target.hostname
  • "google-test" = strings.concat($e.principal.hostname, "-test")
  • "email@google.org" = re.replace($e.network.email.from, "com", "org")

A comparison in which both sides are literals is a compilation error.

Function expressions as conditions

Some function expressions return a boolean value, which can be used as an individual predicate in the events section. Such functions are:

  • re.regex()
  • net.ip_in_range_cidr()

For example:

  • re.regex($e.principal.hostname, `.*\.google\.com`)
  • net.ip_in_range_cidr($e.principal.ip, "192.0.2.0/24")

Reference list expressions as conditions

You can use reference lists in the events section. See the section on Reference Lists for more details.

Logical operators

You can use the logical operators (and, or, not) in the events section, as shown in the following examples:

  • $e.metadata.event_type = "NETWORK_DNS" or $e.metadata.event_type = "NETWORK_DHCP"
  • ($e.metadata.event_type = "NETWORK_DNS" and $e.principal.ip = "192.0.2.12") or ($e.metadata.event_type = "NETWORK_DHCP" and $e.principal.mac = "AB:CD:01:10:EF:22")
  • not $e.metadata.event_type = "NETWORK_DNS"

By default, the precedence order from highest to lowest is not, and, or.

For example, "a or b and c" is evaluated as "a or (b and c)". You can use parentheses to alter the precedence if needed.
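Python's boolean operators happen to use the same precedence order (not, then and, then or), so the grouping rule can be illustrated with plain Python booleans. This demonstrates the precedence rule only; it is Python, not YARA-L evaluation.

```python
from itertools import product

# Python orders its boolean operators the same way: not, then and, then or.
for a, b, c in product((False, True), repeat=3):
    assert (a or b and c) == (a or (b and c))
    assert (not a and b) == ((not a) and b)

# Parentheses change the result, for example when a is True and c is False.
a, b, c = True, False, False
print(a or b and c)    # True
print((a or b) and c)  # False
```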

In the events section, predicates are combined with and by default.

Operators in events

You can use the comparison operators with enumerated types. Using these operators instead of reference lists can simplify rules and improve their performance.

In the following example, 'USER_UNCATEGORIZED' and 'USER_RESOURCE_DELETION' correspond to 15000 and 15014, so the rule looks for all the listed events:

$e.metadata.event_type >= "USER_UNCATEGORIZED" and $e.metadata.event_type <= "USER_RESOURCE_DELETION"

List of events:

  • USER_RESOURCE_DELETION
  • USER_RESOURCE_UPDATE_CONTENT
  • USER_RESOURCE_UPDATE_PERMISSIONS
  • USER_STATS
  • USER_UNCATEGORIZED

Modifiers

nocase

When you have a comparison expression between string values or a regex expression, you can append nocase at the end of the expression to ignore capitalization.

  • $e.principal.hostname != "http-server" nocase
  • $e1.principal.hostname = $e2.target.hostname nocase
  • $e.principal.hostname = /dns-server-[0-9]+/ nocase
  • re.regex($e.target.hostname, `client-[0-9]+`) nocase

You can't use nocase when the field type is an enumerated value. The following examples are invalid and generate compilation errors:

  • $e.metadata.event_type = "NETWORK_DNS" nocase
  • $e.network.ip_protocol = "TCP" nocase

Repeated fields

any, all

In UDM and Entity, some fields are labeled as repeated, which indicates they are lists of values or other types of messages. In YARA-L, each element in the repeated field is treated individually. That means that if a repeated field is used in the rule, the rule is evaluated for each element in the field. This can lead to unexpected behavior. For example, if a rule has both $e.principal.ip = "1.2.3.4" and $e.principal.ip = "5.6.7.8" in the events section, the rule never generates any matches, even if both "1.2.3.4" and "5.6.7.8" are in principal.ip.

To evaluate the repeated field as a whole, you can use any and all operators. When any is used, the predicate is evaluated as true if any value in the repeated field satisfies the condition. When all is used, the predicate is evaluated as true if all values in the repeated field satisfy the condition.

  • any $e.target.ip = "127.0.0.1"
  • all $e.target.ip != "127.0.0.1"
  • re.regex(any $e.about.hostname, `server-[0-9]+`)
  • net.ip_in_range_cidr(all $e.principal.ip, "10.0.0.0/8")

The any and all operators can only be used with repeated fields. In addition, they cannot be used when assigning a repeated field to a placeholder variable or joining with a field of another event.

For example, any $e.principal.ip = $ip and any $e1.principal.ip = $e2.principal.ip are not valid syntax. To match or join a repeated field, use $e.principal.ip = $ip. There will be one match variable value or join for each element of the repeated field.

When writing a condition with any or all, be aware that negating the condition with not might not have the same meaning as using the negated operator.

For example:

  • not all $e.principal.ip = "192.168.12.16" checks if not all IP addresses match "192.168.12.16", meaning the rule is checking whether any IP address does not match "192.168.12.16".
  • all $e.principal.ip != "192.168.12.16" checks if all IP addresses do not match "192.168.12.16", meaning the rule is checking that no IP address matches "192.168.12.16".
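Python's any() and all() over a list give a compact model of these semantics, including why two equalities on the same repeated field never match without any/all. Treating a repeated field as a Python list is an assumption, and Python's all([]) is True for an empty list, which may not match how the engine treats an empty repeated field.

```python
from typing import Sequence


def per_element_match(values: Sequence[str], *targets: str) -> bool:
    """Without any/all: every predicate must hold for the same single element."""
    return any(all(v == t for t in targets) for v in values)


def not_all_equal(values: Sequence[str], target: str) -> bool:
    """not all $e.principal.ip = target"""
    return not all(v == target for v in values)


def all_not_equal(values: Sequence[str], target: str) -> bool:
    """all $e.principal.ip != target"""
    return all(v != target for v in values)


ips = ["192.168.12.16", "10.0.0.1"]
print(not_all_equal(ips, "192.168.12.16"))  # True: 10.0.0.1 does not match
print(all_not_equal(ips, "192.168.12.16"))  # False: 192.168.12.16 is present
```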

Array indexing

You can perform array indexing on repeated fields. To access the n-th repeated field element, use the standard list syntax (elements are 0-indexed). An out-of-bounds element returns the default value.

  • $e.principal.ip[0] = "192.168.12.16"
  • $e.principal.ip[999] = "" returns true if there are fewer than 1000 elements.

An index must be a non-negative integer literal. Values of an integer type, such as a placeholder set to an integer, can't be used as an index. Array indexing can't be combined with any/all or with map syntax. If the field path contains multiple repeated fields, all repeated fields must use array indexing.

The following are all examples of invalid syntax:

  • $e.intermediary.ip[0] is not valid because intermediary is also a repeated field, but it does not use array indexing.
  • $e.principal.ip[-1] is not valid because -1 is not a non-negative integer.
  • any $e.intermediary.ip[0] is not valid because any/all cannot be combined with array indexing.
  • $e.additional.fields[0]["key"] is not valid because array indexing cannot be combined with map syntax.

Event variable join requirements

All event variables used in the rule must be joined with every other event variable in either of the following ways:

  • directly through an equality comparison between event fields of the two joined event variables, for example: $e1.field = $e2.field. The expression must not include arithmetic or function calls.

  • indirectly through a transitive join involving only an event field (see variable declaration for a definition of "transitive join"). The expression must not include arithmetic or function calls.

For example, assuming $e1, $e2, and $e3 are used in the rule, the following events sections are valid.

events:
  $e1.principal.hostname = $e2.src.hostname // $e1 joins with $e2
  $e2.principal.ip = $e3.src.ip // $e2 joins with $e3
events:
  // all of $e1, $e2 and $e3 are transitively joined via the placeholder variable $ip
  $e1.src.ip = $ip
  $e2.target.ip = $ip
  $e3.about.ip = $ip
events:
  $e1.principal.hostname = $e2.src.hostname // $e1 joins with $e2

  // Function to event comparison is not a valid join condition for $e1 and $e2,
  // but the whole events section is valid because we have a valid join condition in the first line.
  re.capture($e1.src.hostname, ".*") = $e2.target.hostname

However, here are examples of invalid events sections.

events:
  // Event to function comparison is an invalid join condition for $e1 and $e2.
  $e1.principal.hostname = re.capture($e2.principal.application, ".*")
events:
  // Event to arithmetic comparison is an invalid join condition for $e1 and $e2.
  $e1.principal.port = $e2.src.port + 1
events:
  $e1.src.ip = $ip
  $e2.target.ip = $ip
  $e3.about.ip = "192.1.2.0" // $e3 is not joined with $e1 or $e2.
events:
  $e1.src.ip = $ip

  // Function to placeholder comparison is an invalid transitive join condition.
  re.capture($e2.target.ip, ".*") = $ip
events:
  $e1.src.port = $port

  // Arithmetic to placeholder comparison is an invalid transitive join condition.
  $e2.principal.port + 800 = $port
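The join requirement amounts to a graph connectivity check. Event variables and placeholders are nodes, and each equality between bare event fields or placeholders is an edge. The following Python sketch uses a union-find structure for this check. It assumes each comparison has already been reduced to the kind of its two sides. The helpers (all_events_joined, ev, ph, OTHER) are hypothetical and are not Chronicle's implementation.

```python
def _find(parent, node):
    # Follow parent links to the component root, halving paths as we go.
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def all_events_joined(event_vars, equalities):
    """Return True if every event variable is joined to every other one.

    Each equality is a (lhs, rhs) pair. A side is ("event", "$e1") for a bare
    event field, ("placeholder", "$ip") for a placeholder variable, or
    ("other", None) for a literal, function call, or arithmetic expression.
    """
    parent = {v: v for v in event_vars}
    for lhs, rhs in equalities:
        if lhs[0] == "other" or rhs[0] == "other":
            continue  # not a valid (direct or transitive) join condition
        for name in (lhs[1], rhs[1]):
            parent.setdefault(name, name)
        parent[_find(parent, lhs[1])] = _find(parent, rhs[1])
    roots = {_find(parent, v) for v in event_vars}
    return len(roots) == 1


def ev(name):
    return ("event", name)


def ph(name):
    return ("placeholder", name)


OTHER = ("other", None)
EVENTS = ["$e1", "$e2", "$e3"]

# Valid: $e1, $e2 and $e3 are transitively joined through $ip.
print(all_events_joined(EVENTS, [(ev("$e1"), ph("$ip")),
                                 (ev("$e2"), ph("$ip")),
                                 (ev("$e3"), ph("$ip"))]))  # True
# Invalid: $e3 is only compared with a literal.
print(all_events_joined(EVENTS, [(ev("$e1"), ph("$ip")),
                                 (ev("$e2"), ph("$ip")),
                                 (ev("$e3"), OTHER)]))  # False
```

Comparisons that involve functions or arithmetic are skipped rather than rejected. This matches the valid example above, where one proper join is enough even though a function comparison also appears.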

Match section syntax

In the match section, list the match variables used to group events before match conditions are checked. Those fields are returned with each match.

  • Specify what each match variable represents in the events section.
  • Specify the time range to use to correlate events after the over keyword. Events outside the time range are ignored.
  • Use the following syntax to specify the time range: <number><m/h/d>

    Where m/h/d means minutes, hours, and days respectively.

  • The minimum time you can specify is 1 minute.

  • The maximum time you can specify is 48 hours.
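The duration rules above can be expressed as a small validator. The following Python sketch parses the <number><m/h/d> syntax into minutes and enforces the 1 minute to 48 hour range. The function name is hypothetical, and Chronicle's own parser may report errors differently.

```python
import re

# Minutes per unit for the <number><m/h/d> syntax.
_MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 24 * 60}
_MIN_MINUTES = 1
_MAX_MINUTES = 48 * 60


def parse_match_window(text):
    """Parse a match window such as '5m', '1h', or '2d' into minutes."""
    parsed = re.fullmatch(r"(\d+)([mhd])", text)
    if parsed is None:
        raise ValueError(f"expected <number><m/h/d>, got {text!r}")
    minutes = int(parsed.group(1)) * _MINUTES_PER_UNIT[parsed.group(2)]
    if not _MIN_MINUTES <= minutes <= _MAX_MINUTES:
        raise ValueError(f"{text!r} is outside the 1 minute to 48 hour range")
    return minutes


print(parse_match_window("5m"))  # 5
print(parse_match_window("48h"))  # 2880
```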

The following is an example of a valid match:

$var1, $var2 over 5m

This statement returns $var1 and $var2 (defined in the events section) when the rule finds a match. The time window specified is 5 minutes. Events that are more than 5 minutes apart are not correlated and are therefore ignored by the rule.

Here is another example of a valid match:

$user over 1h

This statement returns $user when the rule finds a match. The time window specified is 1 hour. Events that are more than an hour apart are not correlated. The rule does not consider them to be a detection.

Here is another example of a valid match:

$source_ip, $target_ip, $hostname over 2m

This statement returns $source_ip, $target_ip, and $hostname when the rule finds a match. The time window specified is 2 minutes. Events that are more than 2 minutes apart are not correlated. The rule does not consider them to be a detection.

The following examples illustrate invalid match sections:

  • var1, var2 over 5m // invalid variable name
  • $user 1h // missing keyword

Sliding window

By default, YARA-L 2.0 rules are evaluated using hop windows. A time range of enterprise event data is divided into a set of overlapping hop windows, each with the duration specified in the match section. Events are then correlated within each hop window. With hop windows, it is impossible to search for events that happen in a specific order (for example, e1 happens up to 2 minutes after e2). An occurrence of event e1 and an occurrence of event e2 are correlated as long as they are within the hop window duration of each other.

Rules can also be evaluated using sliding windows. With sliding windows, windows of the duration specified in the match section are generated so that each one begins or ends at an occurrence of a specified pivot event variable. Events are then correlated within each sliding window. This makes it possible to search for events that happen in a specific order (for example, e1 happens within 2 minutes after e2). An occurrence of event e1 and an occurrence of event e2 are correlated if event e1 occurs within the sliding window duration after event e2.

Specify sliding windows in the match section of a rule as follows:

<match-var-1>, <match-var-2>, ... over <duration> before|after <pivot-event-var>

The pivot event variable is the event variable that sliding windows are based on. If you use the before keyword, a sliding window is generated ending at each occurrence of the pivot event. If you use the after keyword, a sliding window is generated beginning at each occurrence of the pivot event.

The following are examples of valid sliding window usages:

  • $var1, $var2 over 5m after $e1
  • $user over 1h before $e2

Outcome section syntax

In the outcome section, you can define up to 20 outcome variables, with arbitrary names. These outcomes will be stored in the detections generated by the rule. Each detection may have different values for the outcomes.

The outcome name, $risk_score, is special. You can optionally define an outcome with this name, and if you do, it must be an integer type. If populated, the risk_score will be shown in the Enterprise Insights view for alerts that come from rule detections.

If you do not include a $risk_score variable in the outcome section of a rule, one of the following default values is set:

  • If the rule is configured to generate an alert, then $risk_score is set to 40.
  • If the rule is not configured to generate an alert, then $risk_score is set to 15.

The value of $risk_score is stored in the security_result.risk_score UDM field.

Outcome variable data types

Each outcome variable can have a different data type, which is determined by the expression used to compute it. We support the following outcome data types:

  • integer
  • string
  • lists of integers
  • lists of strings

If a match variable is on a repeated field and that repeated field contains duplicate elements, then the duplicate element is considered multiple times when computing outcomes. For example, this could result in unexpected values for outcomes that use sum(), but would not affect outcomes that use max().
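The arithmetic effect is easy to see in a short Python sketch. The values are hypothetical, and Python's built-in sum() and max() stand in for the YARA-L aggregation functions.

```python
# Hypothetical values an outcome aggregates over, where one element of the
# repeated field appears twice.
values = [100, 250, 250]

print(sum(values))       # 600: the duplicate 250 is counted twice
print(max(values))       # 250: the same result with or without the duplicate
print(sum(set(values)))  # 350: what a deduplicated sum would have been
```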

Conditional logic

You can use conditional logic to compute the value of an outcome. Conditionals are specified using the following syntax pattern:

if(BOOL_CLAUSE, THEN_CLAUSE)
if(BOOL_CLAUSE, THEN_CLAUSE, ELSE_CLAUSE)

You can read a conditional expression as "if BOOL_CLAUSE is true, then return THEN_CLAUSE, else return ELSE_CLAUSE".

BOOL_CLAUSE must evaluate to a boolean value. A BOOL_CLAUSE expression takes a similar form to expressions in the events section. For example, it can contain:

  • UDM field names with comparison operator, for example:

    if($context.graph.entity.user.title = "Vendor", 100, 0)

  • placeholder variable that was defined in the events section, for example:

    if($severity = "HIGH", 100, 0)

  • functions that return a boolean, for example:

    if(re.regex($e.network.email.from, `.*altostrat\.com`), 100, 0)

  • a lookup in a reference list, for example:

    if($u.principal.hostname in %my_reference_list_name, 100, 0)

The THEN_CLAUSE and ELSE_CLAUSE must be the same data type. We support integers and strings.

You can omit the ELSE_CLAUSE if the data type is integer. If omitted, the ELSE_CLAUSE evaluates to 0. For example:

`if($e.field = "a", 5)` is equivalent to `if($e.field = "a", 5, 0)`

You must provide the ELSE_CLAUSE if the data type is string.
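The rules for if() can be summarized in a Python sketch. THEN_CLAUSE and ELSE_CLAUSE must share a type, and an omitted ELSE_CLAUSE defaults to 0 and is allowed only for integers. The function yaral_if is hypothetical and only mirrors the rules stated above.

```python
def yaral_if(condition, then_value, else_value=None):
    """Model of if(BOOL_CLAUSE, THEN_CLAUSE[, ELSE_CLAUSE])."""
    if else_value is None:
        # ELSE_CLAUSE may be omitted only for integers; it then defaults to 0.
        if isinstance(then_value, str):
            raise TypeError("ELSE_CLAUSE is required for string results")
        else_value = 0
    if type(then_value) is not type(else_value):
        raise TypeError("THEN_CLAUSE and ELSE_CLAUSE must have the same type")
    return then_value if condition else else_value


print(yaral_if(False, 5))                    # 0, same as if(..., 5, 0)
print(yaral_if(True, "SEVERE", "MODERATE"))  # SEVERE
```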

Mathematical operations

You can use mathematical operations to compute integer values in the outcome and events sections of a rule. Chronicle supports addition, subtraction, multiplication, and division as top-level operators in a computation.

The following snippet is an example computation in the outcome section:

outcome:
  $risk_score = max(100 + if($severity = "HIGH", 10, 5) - if($severity = "LOW", 20, 0))
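To check the arithmetic, the following Python sketch evaluates the same expression for a few hypothetical $severity values. Python's max() over a list stands in for the aggregation across the values seen in one detection.

```python
def risk_for(severity):
    # 100 + if($severity = "HIGH", 10, 5) - if($severity = "LOW", 20, 0)
    high_bonus = 10 if severity == "HIGH" else 5
    low_penalty = 20 if severity == "LOW" else 0
    return 100 + high_bonus - low_penalty


def risk_score(severities):
    # max(...) keeps the largest per-value result across the detection.
    return max(risk_for(s) for s in severities)


print(risk_for("HIGH"), risk_for("MEDIUM"), risk_for("LOW"))  # 110 105 85
print(risk_score(["LOW", "HIGH"]))  # 110
```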

Placeholder variables in outcomes

When computing outcome variables, you can use placeholder variables that were defined in the events section of your rule. In the following examples, assume that the placeholder variables $email_sent_bytes and $file_size were defined in the events section of the rule:

Single-event example:

// No match section, so this is a single-event rule.

outcome:
  // Use placeholder directly as an outcome value.
  $my_outcome = $email_sent_bytes

  // Use placeholder in a conditional.
  $other_outcome = if($file_size > 1024, "SEVERE", "MODERATE")

condition:
  $e

Multi-event example:

match:
  // This is a multi event rule with a match section.
  $hostname over 5m

outcome:
  // Use placeholder directly in an aggregation function.
  $max_email_size = max($email_sent_bytes)

  // Use placeholder in a mathematical computation.
  $total_bytes_exfiltrated = sum(
    1024
    + $email_sent_bytes
    + $file_event.principal.file.size
  )

condition:
  $email_event and $file_event

Aggregations

The outcome section can be used in multi-event rules (rules that contain a match section), and in single-event rules (rules that do not contain a match section). Requirements for aggregations are as follows:

  • Multi-event rules (with match section)

    • Expression to compute outcomes is evaluated over all events that generated a particular detection.
    • Expression must be wrapped in an aggregate function
      • Example: $max_email_size = max($e.network.sent_bytes)
      • If the expression contains a repeated field, the aggregate operates over all elements in the repeated field, over all events that generated the detection
  • Single-event rules (without match section)

    • Expression to compute outcomes is evaluated over the single event that generated a particular detection.
    • Must use an aggregate function for expressions that involve at least one repeated field
      • Example: $suspicious_ips = array($e.principal.ip)
      • The aggregate operates over all elements in the repeated field
    • Cannot use an aggregate function for expressions that do not involve a repeated field
      • Example: $threat_status = if($e.principal.file.size > 1024, "SEVERE", "MODERATE")

You can use the following aggregation functions:

  • max(): outputs the maximum over all possible values. Only works with integers.
  • min(): outputs the minimum over all possible values. Only works with integers.
  • sum(): outputs the sum over all possible values. Only works with integers.
  • count_distinct(): collects all possible values, then outputs the distinct count of possible values.
  • count(): behaves like count_distinct(), but returns a non-distinct count of possible values.
  • array_distinct(): collects all possible values, then outputs a list of these values. It will truncate the list of values to 25 random elements.
  • array(): behaves like array_distinct(), but returns a non-distinct list of values. It also truncates the list of values to 25 random elements.

The aggregate function is important when a rule includes a condition section that specifies multiple events must exist, because the aggregate function will operate on all the events that generated the detection.

For example, if your outcome and condition sections contain:

outcome:
  $asset_id_count = count($event.principal.asset_id)
  $asset_id_distinct_count = count_distinct($event.principal.asset_id)

  $asset_id_list = array($event.principal.asset_id)
  $asset_id_distinct_list = array_distinct($event.principal.asset_id)

condition:
  #event > 1

Since the condition section requires there to be more than one event for each detection, the aggregate functions will operate on multiple events. Suppose the following events generated one detection:

event:
  // UDM event 1
  asset_id="asset-a"

event:
  // UDM event 2
  asset_id="asset-b"

event:
  // UDM event 3
  asset_id="asset-b"

Then the values of your outcomes will be:

  • $asset_id_count = 3
  • $asset_id_distinct_count = 2
  • $asset_id_list = ["asset-a", "asset-b", "asset-b"]
  • $asset_id_distinct_list = ["asset-a", "asset-b"]
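The following Python sketch reproduces these values. It also models the truncation to 25 randomly chosen elements described for array() and array_distinct(). The functions are hypothetical stand-ins named after the YARA-L aggregations. The ordering of elements in Chronicle's output is not specified here, so the sketch keeps input order only for readability.

```python
import random

MAX_ARRAY_ELEMENTS = 25  # array() and array_distinct() keep at most 25 values


def count(values):
    return len(values)


def count_distinct(values):
    return len(set(values))


def array(values):
    values = list(values)
    if len(values) > MAX_ARRAY_ELEMENTS:
        # Truncate to 25 randomly chosen elements.
        values = random.sample(values, MAX_ARRAY_ELEMENTS)
    return values


def array_distinct(values):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return array(dict.fromkeys(values))


asset_ids = ["asset-a", "asset-b", "asset-b"]  # one value per UDM event
print(count(asset_ids), count_distinct(asset_ids))  # 3 2
print(array(asset_ids))           # ['asset-a', 'asset-b', 'asset-b']
print(array_distinct(asset_ids))  # ['asset-a', 'asset-b']
```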

Other notes and restrictions when using the outcome section:

  • The outcome section cannot reference a new placeholder variable which wasn't already defined in the events section.
  • The outcome section cannot use event variables that have not been defined in the events section.
  • The outcome section can use an event field that was not used in the events section, given that the event variable that the event field belongs to was already defined in the events section.
  • The outcome section can only correlate event variables that have already been correlated in the events section. Correlations happen when two event fields from different event variables are equated.

You can find an example using the outcome section in Overview of the YARA-L 2.0 language. See Create context-aware analytics for details on detection deduping with the outcome section.

Condition section syntax

In the condition section, you can:

  • specify a match condition over events and placeholders defined in the events section. See the following section, Event and placeholder conditionals, for more details.
  • (optional) use the and keyword to specify a match condition using outcome variables defined in the outcome section. See the following section, Outcome conditionals, for more details.

The following condition patterns are valid:

condition:
  <event/placeholder conditionals>
condition:
  <event/placeholder conditionals> and <outcome conditionals>

Event and placeholder conditionals

List condition predicates for events and placeholder variables here, joined with the keyword and or or.

The following conditions are bounding conditions. They force the associated event variable to exist, meaning that at least one occurrence of the event must appear in any detection.

  • $var // equivalent to #var > 0
  • #var > n // where n >= 0
  • #var >= m // where m > 0

The following conditions are non-bounding conditions. They allow the associated event variable to not exist, meaning that it is possible that no occurrence of the event appears in a detection. This makes it possible to write non-existence rules, which search for the absence of a variable instead of its presence.

  • !$var // equivalent to #var = 0
  • #var >= 0
  • #var < n // where n > 0
  • #var <= m // where m >= 0
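The two lists follow a simple pattern, which the following Python sketch encodes. A count condition is bounding exactly when a count of zero would fail it. This pattern is derived from the forms listed above and is not a rule stated by Chronicle. The helper is_bounding is hypothetical.

```python
import operator

_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def is_bounding(op, n):
    """Classify the count condition '#var <op> n'.

    The condition is bounding when a count of 0 fails it, so at least one
    occurrence of the event must be present.
    """
    return not _OPS[op](0, n)


print(is_bounding(">", 0))   # True:  $var, i.e. #var > 0
print(is_bounding("=", 0))   # False: !$var, i.e. #var = 0
print(is_bounding("<=", 3))  # False: non-bounding
```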

You can join an event with an entity, and then check for the absence of the event. The following pseudo-code example joins an event field, $u.field, with an entity field, $e.graph.field, in the events section. It then checks for the absence of the event in the condition section with !$u and $e.

  events:
      $u.field = $e.graph.field // joins the event $u with the entity $e
  // ...other sections of the rule...
  condition:
      !$u and $e

You cannot check for the absence of the entity. Considering the example above, the following statement to check for an absence of the entity is not valid: $u and !$e.

In the following example, the special character # on a variable (either the event variable or the placeholder variable) represents the count of distinct events or values of that variable.

$e and #port > 50 or #event1 > 2 or #event2 > 1 or #event3 > 0

The following non-existence example is also valid and evaluates to true if there are more than two distinct events from $event1, and zero distinct events from $event2.

#event1 > 2 and !$event2
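Because # counts distinct events, repeated occurrences of the same event count only once. The following Python sketch evaluates this condition over hypothetical event identifiers.

```python
def distinct_count(occurrences):
    # '#' counts distinct events (or values), so duplicates count once.
    return len(set(occurrences))


def condition_holds(event1_ids, event2_ids):
    # Models the condition: #event1 > 2 and !$event2
    return distinct_count(event1_ids) > 2 and distinct_count(event2_ids) == 0


print(condition_holds(["a", "b", "c"], []))     # True
print(condition_holds(["a", "b", "b"], []))     # False: only 2 distinct events
print(condition_holds(["a", "b", "c"], ["x"]))  # False: $event2 exists
```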

There are restrictions around what type of event or placeholder variables can have non-bounding conditions. The variable must be one of the following:

  • A UDM event variable in a rule with 2 or more UDM event variables.
  • A placeholder variable associated with at least 1 UDM event variable without non-bounding conditions.

The following are examples of invalid predicates:

  • $e, #port > 50 // incorrect keyword usage
  • $e or #port < 50 // or keyword not supported with non-bounding conditions
  • not $e // not keyword is not allowed for event and placeholder conditions

Outcome conditionals

List condition predicates for outcome variables here, joined with the keyword and or or, or preceded by the keyword not.

Specify outcome conditionals differently depending on the type of the outcome variable:

  • integer: compare against an integer literal with operators =, >, >=, <, <=, !=, for example:

    $risk_score > 10

  • string: compare against a string literal with either = or !=, for example:

    $severity = "HIGH"

  • list of integers or list of strings: specify the condition using the arrays.contains function, for example:

    arrays.contains($event_ids, "id_1234")
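The type-to-operator rules above fit in a small lookup table. The following Python sketch checks whether an operator is allowed for a given outcome type. The table and the function outcome_conditional_allowed are hypothetical illustrations of the list above, not part of any Chronicle API.

```python
_ALLOWED_OPERATORS = {
    "integer": {"=", ">", ">=", "<", "<=", "!="},
    "string": {"=", "!="},
}
_LIST_TYPES = {"list of integers", "list of strings"}


def outcome_conditional_allowed(outcome_type, operator_name):
    """Return True if operator_name may be used with this outcome type."""
    if outcome_type in _LIST_TYPES:
        # Lists are tested with arrays.contains() rather than an operator.
        return operator_name == "arrays.contains"
    return operator_name in _ALLOWED_OPERATORS.get(outcome_type, set())


print(outcome_conditional_allowed("integer", ">"))  # True
print(outcome_conditional_allowed("string", ">"))   # False
```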

Rule classification

Specifying an outcome conditional in a rule that has a match section means that the rule will be classified as a multi-event rule for rule quota purposes. See single event rule and multiple event rule for more information about single and multiple event classifications.

Count (#) character

The # character is a special character in the condition section. If it is used before any event or placeholder variable name, it represents the number of distinct events or values that satisfy all the events section conditions.

Value ($) character

The $ character is another special character in the condition section. If it is used before any outcome variable name, it represents the value of that outcome.

If it is used before any event or placeholder variable name (for example, $event), it is a shorthand for #event > 0.

Options section syntax

In the options section, you can specify the options for the rule. The syntax of the options section is similar to that of the meta section. However, each key must be one of the predefined option names, and values are not restricted to the string type.

Currently, the only available option is allow_zero_values.

  • allow_zero_values: If set to true, matches generated by the rule can have zero values as match variable values. Zero values are given to event fields when they are left unpopulated. This option is set to false by default.

The following is a valid options section line:

  • allow_zero_values = true
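The effect of this option can be sketched as a filter over generated matches. The sketch makes two assumptions: zero values are "" for strings and 0 for integers, and a match is dropped by default if any of its match variables holds a zero value. The function filter_matches is hypothetical.

```python
# Assumed zero values: "" for string fields and 0 for integer fields.
_ZERO_VALUES = ("", 0)


def filter_matches(matches, allow_zero_values=False):
    """Drop matches that have a zero value in any match variable.

    Each match is a dict from match variable name to value.
    """
    if allow_zero_values:
        return list(matches)
    return [
        match for match in matches
        if all(value not in _ZERO_VALUES for value in match.values())
    ]


matches = [{"$user": "alice"}, {"$user": ""}]
print(filter_matches(matches))                          # [{'$user': 'alice'}]
print(filter_matches(matches, allow_zero_values=True))  # both matches
```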

Type checking

Chronicle performs type checking against your YARA-L syntax as you create rules within the interface. The type checking errors it displays help you revise the rule so that it works as expected.

The following are examples of invalid predicates:

// $e.target.port is of type integer which cannot be compared to a string.
$e.target.port = "80"

// "LOGIN" is not a valid event_type enum value.
$e.metadata.event_type = "LOGIN"
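The following Python sketch models both checks against a deliberately tiny, hypothetical schema. USER_LOGIN and NETWORK_CONNECTION are real UDM event types, but the real field list and enum are far larger. The function type_error is illustrative only and does not reproduce Chronicle's checker or its messages.

```python
# Hypothetical two-field schema; real UDM has far more fields and enum values.
_SCHEMA = {
    "target.port": int,
    "metadata.event_type": {"USER_LOGIN", "NETWORK_CONNECTION"},
}


def type_error(field, literal):
    """Return an error message for 'field = literal', or None if it type-checks."""
    expected = _SCHEMA[field]
    if isinstance(expected, set):
        # Enum fields accept only one of the listed names.
        if literal not in expected:
            return f"{literal!r} is not a valid {field} enum value"
        return None
    if type(literal) is not expected:
        return f"{field} is of type {expected.__name__}, not {type(literal).__name__}"
    return None


print(type_error("target.port", "80"))             # error: int vs str
print(type_error("target.port", 80))               # None
print(type_error("metadata.event_type", "LOGIN"))  # error: not an enum value
```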