YARA-L 2.0 language syntax
This section describes the major elements of the YARA-L syntax. See also Overview of the YARA-L 2.0 language.
Rule structure
For YARA-L 2.0, you must specify variable declarations, definitions, and usages in the following order:
- meta
- events
- match (optional)
- outcome (optional)
- condition
- options (optional)
The following illustrates the generic structure of a rule:
rule <rule Name>
{
meta:
// Stores arbitrary key-value pairs of rule details, such as who wrote
// it, what it detects on, version control, etc.
events:
// Conditions to filter events and the relationship between events.
match:
// Values to return when matches are found.
outcome:
// Additional information extracted from each detection.
condition:
// Condition to check events and the variables used to find matches.
options:
// Options to turn on or off while executing this rule.
}
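To tie the sections together, here is a minimal sketch of a complete rule in the required order. It assumes the #e event-count syntax of the condition section. The rule name, meta values, event type, threshold, and 10-minute window are hypothetical choices, not recommendations.

```
rule example_repeated_dns_lookups {
  meta:
    author = "example-author"
    description = "Hypothetical rule: many DNS events from one host"

  events:
    // Filter to DNS events and bind the host to a match variable.
    $e.metadata.event_type = "NETWORK_DNS"
    $e.principal.hostname = $host

  match:
    // One result row per host per 10-minute window.
    $host over 10m

  outcome:
    // Aggregate extra context for each detection.
    $queried_names = array_distinct($e.network.dns.questions.name)

  condition:
    // More than 10 matching events in the window.
    #e > 10
}
```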
Comments
Designate comments with two slash characters (// comment) or multi-line comments set off using slash asterisk characters (/* comment */), as you would in C.
Literals
Nonnegative integers (without decimal points), string, boolean, and regex literals are supported.
String and regex literals
You can use either of the following quotation characters to enclose strings in YARA-L 2.0. However, quoted text is interpreted differently depending on which one you use.
- Double quotes ("): Use for normal strings. Escape sequences are interpreted, so a literal backslash must itself be escaped. For example, in "hello\tworld", \t is interpreted as a tab.
- Back quotes (`): Use to interpret all characters literally. For example, in `hello\tworld`, \t is not interpreted as a tab.
For regular expressions, you have two options.
If you want to use regular expressions directly without the re.regex() function, use /regex/ for the regular expression literals.
You can also use string literals as regex literals when you use the re.regex() function. Note that for double quote string literals, you must escape backslash characters with backslash characters, which can look awkward.
For example, the following regular expressions are equivalent:
re.regex($e.network.email.from, `.*altostrat\.com`)
re.regex($e.network.email.from, ".*altostrat\\.com")
$e.network.email.from = /.*altostrat\.com/
Google recommends using back quote characters for strings in regular expressions for ease of readability.
Operators
You can use the following operators in YARA-L:
Operator | Description |
= | equal/declaration |
!= | not equal |
< | less than |
<= | less than or equal |
> | greater than |
>= | greater than or equal |
Variables
In YARA-L 2.0, all variables are represented as $<variable name>.
You can define the following types of variables:
- Event variables: Represent groups of events in normalized form (UDM) or entity events. Specify conditions for event variables in the events section. You identify event variables using a name, event source, and event fields. Allowed sources are udm (for normalized events) and graph (for entity events). If the source is omitted, udm is set as the default source. Event fields are represented as a chain of .<field name> (for example, $e.field1.field2). Event field chains always start from the top-level source (UDM or Entity).
- Match variables: Declare in the match section. Match variables become grouping fields for the query, as one row is returned for each unique set of match variables (and for each time window). When the rule finds a match, the match variable values are returned. Specify what each match variable represents in the events section.
- Placeholder variables: Declare and define in the events section. Placeholder variables are similar to match variables. However, you can use placeholder variables in the condition section to specify match conditions.
Use match variables and placeholder variables to declare relationships between event fields through transitive join conditions (see Events Section Syntax for more detail).
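The following sketch shows all three kinds of variables in one rule. The rule name and the choice of USER_LOGIN and PROCESS_LAUNCH events are hypothetical. The point is to show which variables are event variables, which are match variables, and which are placeholder variables.

```
rule example_variable_kinds {
  meta:
    description = "Hypothetical rule illustrating event, match, and placeholder variables"

  events:
    // $login and $exec are event variables; the source defaults to udm.
    $login.metadata.event_type = "USER_LOGIN"
    $exec.metadata.event_type = "PROCESS_LAUNCH"

    // $user is a match variable because it is also listed in the match section.
    $login.target.user.userid = $user
    $exec.principal.user.userid = $user

    // $host is a placeholder variable: it joins the two events on hostname,
    // but it is not listed in the match section, so it is not returned.
    $login.target.hostname = $host
    $exec.principal.hostname = $host

  match:
    $user over 30m

  condition:
    $login and $exec
}
```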
Keywords
Keywords in YARA-L 2.0 are case-insensitive. For example, and or AND are equivalent. Variable names must not conflict with keywords. For example, $AND or $outcome is invalid.
The following are keywords for detection engine rules: rule, meta, match, over, events, condition, outcome, options, and, or, not, nocase, in, regex, cidr, before, after, all, any, if, max, min, sum, array, array_distinct, count, and count_distinct.
Maps
Structs and Labels
Some UDM fields use either the Struct or Label data type.
To search for a specific key-value pair in both Struct and Label, use the standard map syntax:
// A Struct field.
$e.udm.additional.fields["pod_name"] = "kube-scheduler"
// A Label field.
$e.metadata.ingestion_labels["MetadataKeyDeletion"] = "startup-script"
Supported cases
Events and Outcome Section
// Using a Struct field in the events section
events:
$e.udm.additional.fields["pod_name"] = "kube-scheduler"
// Using a Label field in the outcome section
outcome:
$value = array_distinct($e.metadata.ingestion_labels["MetadataKeyDeletion"])
Assigning a map value to a Placeholder
$placeholder = $u1.metadata.ingestion_labels["MetadataKeyDeletion"]
Using a map field in a join condition
// using a Struct field in a join condition between two udm events $u1 and $u2
$u1.metadata.event_type = $u2.udm.additional.fields["pod_name"]
Unsupported cases
Combining any or all keywords with a map
For example, the following is not currently supported:
all $e.udm.additional.fields["pod_name"] = "kube-scheduler"
Duplicate value handling
Map access always returns a single value. In the uncommon edge case where a map access could refer to multiple values, it deterministically returns the first value.
This can happen in either of the following cases:
A label has a duplicate key.
The label structure represents a map, but does not enforce key uniqueness. By convention, a map should have unique keys, so Chronicle does not recommend populating a label with duplicate keys.
The rule text $e.metadata.ingestion_labels["dupe-key"] would return the first possible value, val1, if run over the following data example:
// Not recommended: a label with a duplicate key.
event {
  metadata {
    ingestion_labels {
      key: "dupe-key"
      value: "val1" // This is the first possible value for "dupe-key"
    }
    ingestion_labels {
      key: "dupe-key"
      value: "val2"
    }
  }
}
A label has an ancestor repeated field.
A repeated field might contain a label as a child field. Two different entries in the top-level repeated field might contain labels that have the same key. The rule text $e.security_result.rule_labels["key"] would return the first possible value, val3, if run over the following data example:
event {
  // security_result is a repeated field.
  security_result {
    threat_name: "threat1"
    rule_labels {
      key: "key"
      value: "val3" // This is the first possible value for "key"
    }
  }
  security_result {
    threat_name: "threat2"
    rule_labels {
      key: "key"
      value: "val4"
    }
  }
}
Functions
This section describes the YARA-L 2.0 functions that Chronicle supports in Detection Engine.
These functions can be used in the following areas in a rule:
- events section.
- BOOL_CLAUSE of a conditional in the outcome section.
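As a sketch of both areas, the following hypothetical rule calls re.regex() in the events section and strings.to_lower() inside an outcome conditional. It assumes the if(BOOL_CLAUSE, THEN, ELSE) form of outcome conditionals and the max aggregation. The hostname pattern, user name, and scores are illustrative only.

```
rule example_function_areas {
  meta:
    description = "Hypothetical rule using functions in events and outcome"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    // Function used as a predicate in the events section.
    re.regex($e.principal.hostname, `^workstation-[0-9]+$`)
    $e.principal.hostname = $host

  match:
    $host over 1h

  outcome:
    // Function used in the BOOL_CLAUSE of an if() conditional.
    $risk_score = max(if(strings.to_lower($e.target.user.userid) = "administrator", 80, 10))

  condition:
    $e
}
```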
String functions
Chronicle supports the following string manipulation functions:
- strings.concat(a, b)
- strings.coalesce(a, b)
- strings.to_lower(stringText)
- strings.to_upper(stringText)
- strings.base64_decode(encodedString)
The following sections describe how to use each.
Concatenate strings or integers
Returns the concatenation of two strings, two integers, or a combination of the two.
strings.concat(a, b)
This function takes two arguments, which can be either strings or integers, and returns the two values concatenated as a string. Integers are cast to a string before concatenation. The arguments can be literals or event fields. If both arguments are fields, the two attributes must be from the same event.
The following example includes a string variable and string literal as arguments.
"google-test" = strings.concat($e.principal.hostname, "-test")
The following example includes a string variable and integer variable as arguments. Both principal.hostname and principal.port are from the same event, $e, and are concatenated to return a string.
"google80" = strings.concat($e.principal.hostname, $e.principal.port)
The following example attempts to concatenate principal.port from event $e1, with principal.hostname from event $e2. It will return a compiler error because the arguments are different event variables.
// returns a compiler error
"test" = strings.concat($e1.principal.port, $e2.principal.hostname)
Coalesce string values
Returns the value of the first expression that does not evaluate to an empty string (for example, "non-zero value"). If both arguments evaluate to an empty string, the function call returns an empty string.
strings.coalesce(a, b)
The arguments can be literals, event fields, or function calls. Both arguments must be of STRING type. If both arguments are fields, the two attributes must be from the same event.
The following example uses two string variables as arguments. The condition evaluates to true when (1) $e.network.email.from is suspicious@gmail.com or (2) $e.network.email.from is empty and $e.network.email.to is suspicious@gmail.com.
"suspicious@gmail.com" = strings.coalesce($e.network.email.from, $e.network.email.to)
The following example includes nested coalesce calls. This condition compares the first non-empty IP address from event $e against values in the reference list ip_watchlist.
The order that the arguments are coalesced in this call is the same as the order they are enumerated in the rule condition:
- $e.principal.ip is evaluated first.
- $e.src.ip is evaluated next.
- $e.target.ip is evaluated next.
- Finally, the string "No IP" is returned as a default value if the previous IP fields are unset.
strings.coalesce(
strings.coalesce($e.principal.ip, $e.src.ip),
strings.coalesce($e.target.ip, "No IP")
) in %ip_watchlist
The following example attempts to coalesce principal.hostname from event $e1 and event $e2. It will return a compiler error because the arguments are different event variables.
// returns a compiler error
"test" = strings.coalesce($e1.principal.hostname, $e2.principal.hostname)
Convert string to uppercase or lowercase
These functions return string text after changing all characters to either uppercase or lowercase.
- strings.to_lower(stringText)
- strings.to_upper(stringText)
"test@google.com" = strings.to_lower($e.network.email.from)
"TEST@GOOGLE.COM" = strings.to_upper($e.network.email.to)
Base64 decode a string
Returns a string containing the base64 decoded version of the encoded string.
strings.base64_decode(encodedString)
This function takes one base64 encoded string as an argument. If encodedString is not a valid base64 encoded string, the function returns encodedString as-is.
This example returns True if principal.domain.name is "dGVzdA==", which is the base64 encoding of the string "test".
"test" = strings.base64_decode($e.principal.domain.name)
RegExp functions
Chronicle supports the following regular expression functions:
- re.regex(stringText, regex)
- re.capture(stringText, regex)
- re.replace(stringText, replaceRegex, replacementText)
RegExp match
You can define regular expression matching in YARA-L 2.0 using either of the following syntaxes:
- Using YARA syntax (related to events). The following is a generic representation of this syntax:
$e.field = /regex/
- Using YARA-L syntax, as a function taking in the following parameters:
  - Field the regular expression is applied to.
  - Regular expression specified as a string. You can append the nocase modifier to the expression to indicate that the search should ignore capitalization.
  The following is a generic representation of this syntax:
re.regex($e.field, `regex`)
Be aware of the following while defining regular expressions in YARA-L 2.0:
- In either case, the predicate is true if the string contains a substring that matches the regular expression provided. It is unnecessary to add .* to the beginning or end of the regular expression.
- To match the exact string or only a prefix or suffix, include the ^ (starting) and $ (ending) anchor characters in the regular expression. For example, /^full$/ matches "full" exactly, while /full/ could match "fullest", "lawfull", and "joyfully".
- If the UDM field includes newline characters, the regular expression only matches the first line of the UDM field. To enforce full UDM field matching, add (?s) to the regular expression. For example, replace /.*allUDM.*/ with /(?s).*allUDM.*/.
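The following events fragment illustrates these points with hypothetical hostnames and a hypothetical command-line pattern. All three predicates apply to the same event $e and are ANDed together.

```
events:
  // Anchored on both sides: matches the hostname "web-01" and nothing longer.
  $e.principal.hostname = /^web-01$/
  // Anchored at the start only: matches "db-01", "db-backup", and so on.
  re.regex($e.target.hostname, `^db-`)
  // (?s) lets . match newline characters, so the pattern is applied across
  // every line of a multi-line value instead of only the first line.
  re.regex($e.target.process.command_line, `(?s)powershell.*-enc`)
```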
RegExp capture
Captures (extracts) data from a string using the regular expression pattern provided in the argument.
re.capture(stringText, regex)
This function takes two arguments:
- stringText: the original string to search.
- regex: the regular expression indicating the pattern to search for.
The regular expression can contain 0 or 1 capture groups in parentheses. If the regular expression contains 0 capture groups, the function returns the first entire matching substring. If the regular expression contains 1 capture group, it returns the first matching substring for the capture group. Defining two or more capture groups returns a compiler error.
In this example, if $e.principal.hostname contains "aaa1bbaa2" the following would be True, because the function returns the first instance. This example has no capture groups.
"aaa1" = re.capture($e.principal.hostname, "a+[1-9]")
This example captures everything after the @ symbol in an email. If the $e.network.email.from field is test@google.com, the example returns google.com. This example contains one capture group.
"google.com" = re.capture($e.network.email.from , "@(.*)")
If the regular expression does not match any substring in the text, the function returns an empty string. You can omit events where no match occurs by excluding the empty string, which is especially important when you are using re.capture() with an inequality:
// Exclude the empty string to omit events where no match occurs.
"" != re.capture($e.network.email.from , "@(.*)")
// Exclude a specific string with an inequality.
"google.com" != re.capture($e.network.email.from , "@(.*)")
RegExp replacement
Performs a regular expression replacement.
re.replace(stringText, replaceRegex, replacementText)
This function takes three arguments:
- stringText: the original string.
- replaceRegex: the regular expression indicating the pattern to search for.
- replacementText: The text to insert into each match.
Returns a new string derived from the original stringText, where all substrings that match the pattern in replaceRegex are replaced with the value in replacementText. You can use backslash-escaped digits (\1 to \9) within replacementText to insert text matching the corresponding parenthesized group in the replaceRegex pattern. Use \0 to refer to the entire matching text.
The function replaces non-overlapping matches and will prioritize replacing the first occurrence found. For example, re.replace("banana", "ana", "111") returns the string "b111na".
This example replaces com with org in the sender's email address and compares the result with a literal. For example, if $e.network.email.from is email@google.com, the function returns email@google.org, so the condition is true.
"email@google.org" = re.replace($e.network.email.from, "com", "org")
This example uses backslash-escaped digits in the replacementText argument to reference matches to the replaceRegex pattern. Because both arguments are double-quoted strings, their backslashes are escaped.
"test1.com.google" = re.replace(
$e.principal.hostname, // holds "test1.test2.google.com"
"test2\\.([a-z]*)\\.([a-z]*)",
"\\2.\\1" // \\1 holds "google", \\2 holds "com"
)
Note the following cases when dealing with empty strings and re.replace():
Using an empty string as replaceRegex:
// In the function call below, if $e.principal.hostname contains "name",
// the result is: 1n1a1m1e1, because an empty string is found next to
// every character in `stringText`.
re.replace($e.principal.hostname, "", "1")
To replace an empty string, you can use "^$" as replaceRegex:
// In the function call below, if $e.principal.hostname contains the empty
// string, "", the result is: "none".
re.replace($e.principal.hostname, "^$", "none")
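The RegExp functions are often combined with a placeholder so that the extracted text can be used for grouping. The following sketch, with a hypothetical rule name and threshold, stores the result of re.capture() in a placeholder, groups email events by sender domain, and excludes events where the capture returns an empty string.

```
rule example_sender_domain {
  meta:
    description = "Hypothetical rule grouping email events by sender domain"

  events:
    $e.metadata.event_type = "EMAIL_TRANSACTION"
    // Capture everything after the @ symbol into a placeholder.
    $domain = re.capture($e.network.email.from, "@(.*)")
    // re.capture() returns "" when nothing matches; drop those events.
    $domain != ""

  match:
    $domain over 1h

  condition:
    #e > 20
}
```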
Date functions
Chronicle supports the following date-related functions:
timestamp.get_minute(unix_seconds [, time_zone])
timestamp.get_hour(unix_seconds [, time_zone])
timestamp.get_day_of_week(unix_seconds [, time_zone])
timestamp.get_week(unix_seconds [, time_zone])
timestamp.current_seconds()
Chronicle supports negative integers as the unix_seconds argument. Negative integers represent times before the Unix epoch. If you provide an invalid integer, for example a value that results in an overflow, the function will return -1. This is an uncommon scenario.
Because YARA-L 2.0 doesn't support negative integer literals, check for this condition using the less than or greater than operator. For example:
0 > timestamp.get_hour(123)
Time extraction
The following function returns an integer in the range [0, 59], representing the minute of the hour.
timestamp.get_minute(unix_seconds [, time_zone])
The following function returns an integer in the range [0, 23], representing the hour of day.
timestamp.get_hour(unix_seconds [, time_zone])
The following function returns an integer in the range [1, 7] representing the day of week starting with Sunday. For example, 1 = Sunday; 2 = Monday, etc.
timestamp.get_day_of_week(unix_seconds [, time_zone])
The following function returns an integer in the range [0, 53] representing the week of the year. Weeks begin with Sunday. Dates before the first Sunday of the year are in week 0.
timestamp.get_week(unix_seconds [, time_zone])
These time extraction functions have the same arguments.
- unix_seconds is an integer representing the number of seconds past Unix epoch, such as $e.metadata.event_timestamp.seconds, or a placeholder containing that value.
- time_zone is optional and is a string representing a time zone. If omitted, the default is "GMT". You can specify time zones using string literals. The options are:
  - The TZ database name, for example "America/Los_Angeles". For more information, see the "TZ Database Name" column from this page.
  - The time zone offset from UTC, in the format (+|-)H[H][:M[M]], for example: "-08:00".
In this example, the time_zone argument is omitted, so it defaults to "GMT".
$ts = $e.metadata.collected_timestamp.seconds
timestamp.get_hour($ts) = 15
This example uses a string literal to define the time_zone.
$ts = $e.metadata.collected_timestamp.seconds
2 = timestamp.get_day_of_week($ts, "America/Los_Angeles")
Here are examples of other valid time_zone specifiers, which you can pass as the second argument to time extraction functions:
- "America/Los_Angeles", or "-08:00". ("PST" is not supported)
- "America/New_York", or "-05:00". ("EST" is not supported)
- "Europe/London"
- "UTC"
- "GMT"
Current timestamp
Returns an integer representing the current time in Unix seconds. This is approximately equal to the detection timestamp and is based on when the rule is run.
timestamp.current_seconds()
The following example returns True if the certificate has been expired for more than 24 hours. It calculates the time difference by subtracting the certificate's not_after time from the current Unix seconds, and then checks whether the difference is greater than 86400 seconds.
86400 < timestamp.current_seconds() - $e.network.tls.certificate.not_after
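Combining the time extraction functions with placeholders, the following sketch flags logins outside 08:00 to 18:00 New York time. The rule name and business hours are assumptions. Because an invalid timestamp makes the function return -1, the sketch also requires a non-negative hour.

```
rule example_after_hours_login {
  meta:
    description = "Hypothetical rule: logins outside 08:00-18:00 New York time"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    $ts = $e.metadata.event_timestamp.seconds
    // Hour of day in the America/New_York time zone, in the range [0, 23].
    $hour = timestamp.get_hour($ts, "America/New_York")
    // Guard against the -1 returned for invalid timestamps.
    $hour >= 0
    ($hour < 8 or $hour >= 18)

  condition:
    $e
}
```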
Math functions
Absolute value
Returns the absolute value of an integer expression.
math.abs(intExpression)
This example returns True if the event was more than 5 minutes from the time specified (in seconds from the Unix epoch), regardless of whether the event came before or after the time specified. A call to math.abs cannot depend on multiple variables or placeholders. For example, you cannot replace the hardcoded time value of 1643687343 in the example below with $e2.metadata.event_timestamp.seconds.
300 < math.abs($e1.metadata.event_timestamp.seconds - 1643687343)
Net functions
IP subnetwork search
Returns true when the given IP address is within the specified subnetwork.
net.ip_in_range_cidr(ipAddress, subnetworkRange)
You can use YARA-L to search for UDM events across all of the IP addresses within a subnetwork using the net.ip_in_range_cidr() statement. Both IPv4 and IPv6 are supported.
To search across a range of IP addresses, specify an IP UDM field and a Classless Inter-Domain Routing (CIDR) range. YARA-L can handle both singular and repeating IP address fields.
IPv4 example:
net.ip_in_range_cidr($e.principal.ip, "192.0.2.0/24")
IPv6 example:
net.ip_in_range_cidr($e.network.dhcp.yiaddr, "2001:db8::/32")
For an example rule using the net.ip_in_range_cidr() statement, see the example rule Single Event within Range of IP Addresses.
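The following is a minimal sketch of a single-event rule in the same spirit. It is not the referenced example rule. The rule name and the choice of NETWORK_CONNECTION events are hypothetical, and 192.0.2.0/24 is a documentation range.

```
rule example_single_event_in_ip_range {
  meta:
    description = "Hypothetical single-event rule for one IPv4 subnetwork"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // principal.ip is repeated; each element is checked individually,
    // so the predicate holds if any principal IP is in the range.
    net.ip_in_range_cidr($e.principal.ip, "192.0.2.0/24")

  condition:
    $e
}
```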
Array functions
Array Length
Returns the number of repeated field elements.
arrays.length($e.principal.ip) = 2
If multiple repeated fields are along the path, returns the total number of repeated field elements.
arrays.length($e.intermediary.ip) = 3
Function to placeholder assignment
You can assign the result of a function call to a placeholder in the events section. For example:
$placeholder = strings.concat($e.principal.hostname, "my-string")
You can then use the placeholder variables in the match, condition, and outcome sections.
However, there are two limitations with function to placeholder assignment:
Every placeholder in function to placeholder assignment must be assigned to an expression containing an event field. For example, the following examples are valid:
$ph1 = $e.principal.hostname
$ph2 = $e.src.hostname
// Both $ph1 and $ph2 have been assigned to an expression containing an event field.
$ph1 = strings.concat($ph2, ".com")

$ph1 = $e.network.email.from
$ph2 = strings.concat($e.principal.hostname, "@gmail.com")
// Both $ph1 and $ph2 have been assigned to an expression containing an event field.
$ph1 = strings.to_lower($ph2)
However, the example below is invalid:
$ph1 = strings.concat($e.principal.hostname, "foo")
$ph2 = strings.concat($ph1, "bar")
// $ph2 has NOT been assigned to an expression containing an event field.
A function call must depend on exactly one event. However, more than one field from the same event can be used in function call arguments. For example, the following are valid:
$ph = strings.concat($event.principal.hostname, "string2")
$ph = strings.concat($event.principal.hostname, $event.src.hostname)
However, the following are invalid:
$ph = strings.concat("string1", "string2")
$ph = strings.concat($event.principal.hostname, $anotherEvent.src.hostname)
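The following sketch uses a function-assigned placeholder in the match section. It satisfies both limitations: $user is assigned an expression containing an event field, and the function call depends on exactly one event, $e. The rule name, window, and threshold are hypothetical, and the condition assumes the #e event-count syntax.

```
rule example_placeholder_from_function {
  meta:
    description = "Hypothetical rule grouping logins on a normalized user ID"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    // Valid: the function depends on exactly one event, $e.
    $user = strings.to_lower($e.target.user.userid)

  match:
    // The placeholder becomes the grouping key.
    $user over 1h

  outcome:
    $source_ips = array_distinct($e.principal.ip)

  condition:
    #e > 5
}
```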
Reference Lists syntax
See our page on Reference Lists for more information on reference list behavior and reference list syntax.
You can use reference lists in the events or outcome sections. Here is the syntax for using various types of reference lists in a rule:
// STRING reference list
$e.principal.hostname in %string_reference_list
// REGEX reference list
$e.principal.hostname in regex %regex_reference_list
// CIDR reference list
$e.principal.ip in cidr %cidr_reference_list
You can also use the not operator and the nocase operator with reference lists as shown below. The nocase operator is compatible with STRING lists and REGEX lists.
// Exclude events whose hostnames match substrings in my_regex_list.
not $e.principal.hostname in regex %my_regex_list
// Event hostnames must match at least 1 string in my_string_list (case insensitive).
$e.principal.hostname in %my_string_list nocase
For performance reasons, the Detection Engine restricts reference list usage.
- Maximum in statements in a rule, with or without special operators: 7
- Maximum in statements with the regex operator: 4
- Maximum in statements with the cidr operator: 2
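The following sketch combines the three reference list types in one events section while staying within these limits: three in statements in total, one with regex and one with cidr. The list names internal_subnets, watched_hosts, and scanner_hostnames are hypothetical, and each would need to exist as a reference list of the matching type.

```
rule example_reference_lists {
  meta:
    description = "Hypothetical rule combining reference list operators"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // Source address must fall inside a range from a CIDR list.
    $e.principal.ip in cidr %internal_subnets
    // Destination host must appear in a string list, ignoring case.
    $e.target.hostname in %watched_hosts nocase
    // Skip source hosts that match any pattern in a regex list.
    not $e.principal.hostname in regex %scanner_hostnames

  condition:
    $e
}
```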
Meta section syntax
The meta section is composed of multiple lines, where each line defines a key-value pair. The key must be an unquoted string, and the value must be a quoted string:
<key> = "<value>"
The following is an example of a valid meta section:
meta:
author = "Chronicle"
severity = "HIGH"
Events section syntax
In the events section, list the predicates to specify the following:
- What each match or placeholder variable represents
- Simple binary expressions as conditions
- Function expressions as conditions
- Reference list expressions as conditions
- Logical operators
Variable declarations
For variable declarations, use the following syntax:
<EVENT_FIELD> = <VAR>
<VAR> = <EVENT_FIELD>
Both are equivalent, as shown in the following examples:
$e.source.hostname = $hostname
$userid = $e.principal.user.userid
This declaration indicates that this variable represents the specified field for the event variable. When the event field is a repeated field, the match variable can represent any value in the array. It is also possible to assign multiple event fields to a single match or placeholder variable. This is a transitive join condition.
For example, the following:
$e1.source.ip = $ip
$e2.target.ip = $ip
Are equivalent to:
$e1.source.ip = $ip
$e1.source.ip = $e2.target.ip
When a variable is used, the variable must be declared through variable declaration. If a variable is used without any declaration, it is regarded as a compilation error.
Simple binary expressions as conditions
To use a simple binary expression as a condition, use the following syntax:
<EXPR> <OP> <EXPR>
An expression can be an event field, a variable, a literal, or a function expression.
For example:
$e.source.hostname = "host1234"
$e.source.port < 1024
1024 < $e.source.port
$e1.source.hostname != $e2.target.hostname
$e1.metadata.collected_timestamp.seconds > $e2.metadata.collected_timestamp.seconds
$port >= 25
$host = $e2.target.hostname
"google-test" = strings.concat($e.principal.hostname, "-test")
"email@google.org" = re.replace($e.network.email.from, "com", "org")
If both sides are literals, it is regarded as a compilation error.
Function expressions as conditions
Some function expressions return a boolean value, which can be used as an individual predicate in the events section. Such functions are:
- re.regex()
- net.ip_in_range_cidr()
For example:
re.regex($e.principal.hostname, `.*\.google\.com`)
net.ip_in_range_cidr($e.principal.ip, "192.0.2.0/24")
Reference list expressions as conditions
You can use reference lists in the events section. See the section on Reference Lists for more details.
Logical operators
You can use the logical and, or, and not operators in the events section as shown in the following examples:
$e.metadata.event_type = "NETWORK_DNS" or $e.metadata.event_type = "NETWORK_DHCP"
($e.metadata.event_type = "NETWORK_DNS" and $e.principal.ip = "192.0.2.12") or ($e.metadata.event_type = "NETWORK_DHCP" and $e.principal.mac = "AB:CD:01:10:EF:22")
not $e.metadata.event_type = "NETWORK_DNS"
By default, the precedence order from highest to lowest is not, and, or.
For example, "a or b and c" is evaluated as "a or (b and c)". You can use parentheses to alter the precedence if needed.
In the events section, all predicates are ANDed together by default.
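To make the precedence rule concrete, the following sketch shows two alternative events sections that differ only in parentheses. They are alternatives for comparison and are not meant to appear in the same rule. The IP address is the same example value used above.

```
// Alternative 1: parsed as A or (B and C).
// Matches every NETWORK_DNS event, plus NETWORK_DHCP events from 192.0.2.12.
events:
  $e.metadata.event_type = "NETWORK_DNS" or $e.metadata.event_type = "NETWORK_DHCP" and $e.principal.ip = "192.0.2.12"

// Alternative 2: parentheses force (A or B) and C.
// Matches NETWORK_DNS or NETWORK_DHCP events, but only from 192.0.2.12.
events:
  ($e.metadata.event_type = "NETWORK_DNS" or $e.metadata.event_type = "NETWORK_DHCP") and $e.principal.ip = "192.0.2.12"
```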
Operators in events
You can use the comparison operators with enumerated types. Using an operator instead of a reference list can simplify a rule and improve its performance.
In the following example, 'USER_UNCATEGORIZED' and 'USER_RESOURCE_DELETION' correspond to 15000 and 15014, so the rule will look for all the listed events:
$e.metadata.event_type >= "USER_UNCATEGORIZED" and $e.metadata.event_type <= "USER_RESOURCE_DELETION"
List of events:
- USER_RESOURCE_DELETION
- USER_RESOURCE_UPDATE_CONTENT
- USER_RESOURCE_UPDATE_PERMISSIONS
- USER_STATS
- USER_UNCATEGORIZED
Modifiers
nocase
When you have a comparison expression between string values or a regex expression, you can append nocase at the end of the expression to ignore capitalization.
$e.principal.hostname != "http-server" nocase
$e1.principal.hostname = $e2.target.hostname nocase
$e.principal.hostname = /dns-server-[0-9]+/ nocase
re.regex($e.target.hostname, `client-[0-9]+`) nocase
This cannot be used when the field type is an enumerated value. The following examples are invalid and will generate compilation errors:
$e.metadata.event_type = "NETWORK_DNS" nocase
$e.network.ip_protocol = "TCP" nocase
Repeated fields
any, all
In UDM and Entity, some fields are labeled as repeated, which indicates they are lists of values or other types of messages.
In YARA-L, each element in the repeated field is treated individually. That means, if the repeated field is used in the rule, we evaluate the rule for each element in the field.
This can lead to unexpected behavior. For example, if a rule has both $e.principal.ip = "1.2.3.4" and $e.principal.ip = "5.6.7.8" in the events section, the rule never generates any matches, even if both "1.2.3.4" and "5.6.7.8" are in principal.ip, because each element is compared individually and no single element can equal both values.
To evaluate the repeated field as a whole, you can use the any and all operators. When any is used, the predicate is evaluated as true if any value in the repeated field satisfies the condition. When all is used, the predicate is evaluated as true if all values in the repeated field satisfy the condition.
any $e.target.ip = "127.0.0.1"
all $e.target.ip != "127.0.0.1"
re.regex(any $e.about.hostname, `server-[0-9]+`)
net.ip_in_range_cidr(all $e.principal.ip, "10.0.0.0/8")
The any and all operators can only be used with repeated fields. In addition, they cannot be used when assigning a repeated field to a placeholder variable or joining with a field of another event.
For example, any $e.principal.ip = $ip and any $e1.principal.ip = $e2.principal.ip are not valid syntax. To match or join a repeated field, use $e.principal.ip = $ip. There will be one match variable value or join for each element of the repeated field.
When writing a condition with any or all, be aware that negating the condition with not might not have the same meaning as using the negated operator.
For example:
- not all $e.principal.ip = "192.168.12.16" checks if not all IP addresses match "192.168.12.16", meaning the rule is checking whether any IP address does not match "192.168.12.16".
- all $e.principal.ip != "192.168.12.16" checks if all IP addresses do not match "192.168.12.16", meaning the rule is checking that no IP addresses match "192.168.12.16".
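The following hypothetical rule applies any to two repeated IP fields, once directly and once under not. Negating an any predicate with not means that no element may match. That is stricter than checking whether some element fails to match. The rule name and event type are assumptions.

```
rule example_any_all {
  meta:
    description = "Hypothetical rule using any and not on repeated IP fields"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // True if at least one element of target.ip is the loopback address.
    any $e.target.ip = "127.0.0.1"
    // True only if no element of principal.ip is inside 10.0.0.0/8.
    not net.ip_in_range_cidr(any $e.principal.ip, "10.0.0.0/8")

  condition:
    $e
}
```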
Array indexing
You can perform array indexing on repeated fields. To access the n-th repeated field element, use the standard list syntax (elements are 0-indexed). An out-of-bounds element returns the default value.
$e.principal.ip[0] = "192.168.12.16"
$e.principal.ip[999] = ""
If there are fewer than 1000 elements, this returns true.
An index must be a non-negative integer literal. Values that have an int type (for example, a placeholder set to an int) are not accepted. Array indexing cannot be combined with any/all. Array indexing cannot be combined with map syntax. If the field path contains multiple repeated fields, all repeated fields must use array indexing.
The following are all examples of invalid syntax:
- $e.intermediary.ip[0] is not valid because intermediary is also a repeated field, and every repeated field in the path must use array indexing.
- $e.principal.ip[-1] is not valid because -1 is not a non-negative integer.
- any $e.intermediary.ip[0] is not valid because any/all cannot be combined with array indexing.
- $e.additional.fields[0]["key"] is not valid because array indexing cannot be combined with map syntax.
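The following events fragment shows valid indexing. The second predicate assumes that, because every repeated field in the path must be indexed, a path through intermediary is written with an index on both intermediary and ip. The IP values are example values.

```
events:
  // Compare only the first element of the repeated principal.ip field.
  $e.principal.ip[0] = "192.168.12.16"
  // intermediary and ip are both repeated, so both take an index:
  // the first IP address of the first intermediary.
  $e.intermediary[0].ip[0] = "192.0.2.1"
```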
Event variable join requirements
All event variables used in the rule must be joined with every other event variable in either of the following ways:
- Directly, through an equality comparison between event fields of the two joined event variables, for example: $e1.field = $e2.field. The expression must not include arithmetic or function calls.
- Indirectly, through a transitive join involving only an event field (see variable declaration for a definition of "transitive join"). The expression must not include arithmetic or function calls.
For example, assuming $e1, $e2, and $e3 are used in the rule, the following events sections are valid.
events:
$e1.principal.hostname = $e2.src.hostname // $e1 joins with $e2
$e2.principal.ip = $e3.src.ip // $e2 joins with $e3
events:
// all of $e1, $e2 and $e3 are transitively joined via the placeholder variable $ip
$e1.src.ip = $ip
$e2.target.ip = $ip
$e3.about.ip = $ip
events:
$e1.principal.hostname = $e2.src.hostname // $e1 joins with $e2
// Function to event comparison is not a valid join condition for $e1 and $e2,
// but the whole events section is valid because we have a valid join condition in the first line.
re.capture($e1.src.hostname, ".*") = $e2.target.hostname
However, the following are examples of invalid events sections.
events:
// Event to function comparison is an invalid join condition for $e1 and $e2.
$e1.principal.hostname = re.capture($e2.principal.application, ".*")
events:
// Event to arithmetic comparison is an invalid join condition for $e1 and $e2.
$e1.principal.port = $e2.src.port + 1
events:
$e1.src.ip = $ip
$e2.target.ip = $ip
$e3.about.ip = "192.1.2.0" // $e3 is not joined with $e1 or $e2.
events:
$e1.src.ip = $ip
// Function to placeholder comparison is an invalid transitive join condition.
re.capture($e2.target.ip, ".*") = $ip
events:
$e1.src.port = $port
// Arithmetic to placeholder comparison is an invalid transitive join condition.
$e2.principal.port + 800 = $port
Match section syntax
In the match section, list the match variables used to group events before match conditions are checked. Those fields are returned with each match.
- Specify what each match variable represents in the events section.
- Specify the time range to use to correlate events after the over keyword. Events outside the time range are ignored. Use the following syntax to specify the time range:
  <number><m/h/d>
  Where m/h/d means minutes, hours, and days respectively.
- The minimum time you can specify is 1 minute.
- The maximum time you can specify is 48 hours.
The following is an example of a valid match:
$var1, $var2 over 5m
This statement returns $var1 and $var2 (defined in the events section) when the rule finds a match. The time window specified is 5 minutes. Events that are more than 5 minutes apart are not correlated and are therefore ignored by the rule.
Here is another example of a valid match:
$user over 1h
This statement returns $user when the rule finds a match. The time window specified is 1 hour. Events that are more than an hour apart are not correlated. The rule does not consider them to be a detection.
Here is another example of a valid match:
$source_ip, $target_ip, $hostname over 2m
This statement returns $source_ip, $target_ip, and $hostname when the rule finds a match. The time window specified is 2 minutes. Events that are more than 2 minutes apart are not correlated. The rule does not consider them to be a detection.
The following examples illustrate invalid match sections:
var1, var2 over 5m // invalid variable name
$user 1h // missing keyword
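To show where the match section sits in a complete rule, here is a minimal sketch of a hop-window rule that groups blocked logins by user over one hour. The rule name, meta value, and threshold of 5 are hypothetical; the UDM fields are standard, but adapt them to your own data.

```yara-l
rule hypothetical_repeated_blocked_logins {
  meta:
    description = "Hypothetical example: more than 5 blocked logins for one user in 1 hour"
  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "BLOCK"
    // Placeholder that is used as the match variable.
    $e.target.user.userid = $user
  match:
    // Group events by $user inside 1-hour hop windows.
    $user over 1h
  condition:
    // More than 5 distinct login events for the same user.
    #e > 5
}
```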
Sliding window
By default, YARA-L 2.0 rules are evaluated using hop windows. A time range of enterprise event data is divided into a set of overlapping hop windows, each with the duration specified in the match section. Events are then correlated within each hop window. With hop windows, it is impossible to search for events that happen in a specific order (for example, e1 happens up to 2 minutes after e2). An occurrence of event e1 and an occurrence of event e2 are correlated as long as they are within the hop window duration of each other.
Rules can also be evaluated using sliding windows. With sliding windows, windows with the duration specified in the match section are generated, each beginning or ending with an occurrence of a specified pivot event variable. Events are then correlated within each sliding window. This makes it possible to search for events that happen in a specific order (for example, e1 happens within 2 minutes after e2). An occurrence of event e1 and an occurrence of event e2 are correlated if event e1 occurs within the sliding window duration after event e2.
Specify sliding windows in the match section of a rule as follows:
<match-var-1>, <match-var-2>, ... over <duration> before|after <pivot-event-var>
The pivot event variable is the event variable that sliding windows are based on. If you use the before keyword, sliding windows are generated ending with each occurrence of the pivot event. If you use the after keyword, sliding windows are generated beginning with each occurrence of the pivot event.
The following are examples of valid sliding window usages:
$var1, $var2 over 5m after $e1
$user over 1h before $e2
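A minimal sketch of a complete sliding-window rule, assuming a hypothetical scenario in which a process launch should be flagged only if it occurs within 10 minutes after a login to the same host. The rule name and event-type choices are illustrative. Because the window is pivoted on $login with after, each window begins at an occurrence of $login, so the order of the two events matters.

```yara-l
rule hypothetical_process_after_login {
  meta:
    description = "Hypothetical example: process launch within 10 minutes after a login on the same host"
  events:
    $login.metadata.event_type = "USER_LOGIN"
    $login.target.hostname = $hostname
    $proc.metadata.event_type = "PROCESS_LAUNCH"
    // Joins $proc to $login through the $hostname placeholder.
    $proc.principal.hostname = $hostname
  match:
    // One 10-minute window starts at each occurrence of $login.
    $hostname over 10m after $login
  condition:
    $login and $proc
}
```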
Outcome section syntax
In the outcome section, you can define up to 20 outcome variables, with arbitrary names. These outcomes are stored in the detections generated by the rule. Each detection may have different values for the outcomes.
The outcome name $risk_score is special. You can optionally define an outcome with this name, and if you do, it must be an integer type. If populated, the risk_score is shown in the Enterprise Insights view for alerts that come from rule detections.
If you do not include a $risk_score variable in the outcome section of a rule, one of the following default values is set:
- If the rule is configured to generate an alert, $risk_score is set to 40.
- If the rule is not configured to generate an alert, $risk_score is set to 15.
The value of $risk_score is stored in the security_result.risk_score UDM field.
Outcome variable data types
Each outcome variable can have a different data type, which is determined by the expression used to compute it. We support the following outcome data types:
- integer
- string
- lists of integers
- lists of strings
If a match variable is on a repeated field and that repeated field contains duplicate elements, then the duplicate element is considered multiple times when computing outcomes. For example, this could result in unexpected values for outcomes that use sum(), but would not affect outcomes that use max().
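The following single-event rule is a sketch, with a hypothetical rule name and illustrative field choices, of one outcome of each supported kind. Because there is no match section, non-repeated fields are used directly, while the repeated principal.ip field is wrapped in an aggregate function, as described in Aggregations.

```yara-l
rule hypothetical_outcome_types {
  meta:
    description = "Hypothetical example of integer, string, and list outcomes"
  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    $e.principal.hostname = $hostname
  outcome:
    // Integer: $risk_score must be an integer if it is defined.
    $risk_score = if($e.target.process.file.size > 1048576, 60, 20)
    // String: taken from a placeholder defined in the events section.
    $host = $hostname
    // List of strings: principal.ip is repeated, so it needs an aggregate.
    $source_ips = array_distinct($e.principal.ip)
  condition:
    $e
}
```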
Conditional logic
You can use conditional logic to compute the value of an outcome. Conditionals are specified using the following syntax pattern:
if(BOOL_CLAUSE, THEN_CLAUSE)
if(BOOL_CLAUSE, THEN_CLAUSE, ELSE_CLAUSE)
You can read a conditional expression as "if BOOL_CLAUSE is true, then return THEN_CLAUSE, else return ELSE_CLAUSE".
BOOL_CLAUSE must evaluate to a boolean value. A BOOL_CLAUSE expression takes a similar form to expressions in the events section. For example, it can contain:
- UDM field names with a comparison operator, for example: if($context.graph.entity.user.title = "Vendor", 100, 0)
- a placeholder variable that was defined in the events section, for example: if($severity = "HIGH", 100, 0)
- functions that return a boolean, for example: if(re.regex($e.network.email.from, `.*altostrat\.com`), 100, 0)
- a lookup in a reference list, for example: if($u.principal.hostname in %my_reference_list_name, 100, 0)
The THEN_CLAUSE and ELSE_CLAUSE must be the same data type. We support integers and strings.
You can omit the ELSE_CLAUSE if the data type is integer. If omitted, the ELSE_CLAUSE evaluates to 0. For example:
`if($e.field = "a", 5)` is equivalent to `if($e.field = "a", 5, 0)`
You must provide the ELSE_CLAUSE if the data type is string.
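As a short sketch of both forms inside a single-event rule's outcome section (the field choices are illustrative, and %my_reference_list_name is the same hypothetical reference list used above):

```yara-l
outcome:
  // Integer result: ELSE_CLAUSE omitted, so a non-login event yields 0.
  $login_score = if($e.metadata.event_type = "USER_LOGIN", 25)
  // String result: ELSE_CLAUSE is required.
  $host_label = if($e.principal.hostname in %my_reference_list_name, "KNOWN_HOST", "UNKNOWN_HOST")
```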
Mathematical operations
You can use mathematical operations to compute integer values in the outcome and events sections of a rule. Chronicle supports addition, subtraction, multiplication, and division as top-level operators in a computation.
The following snippet is an example computation in the outcome section:
outcome:
$risk_score = max(100 + if($severity = "HIGH", 10, 5) - if($severity = "LOW", 20, 0))
Placeholder variables in outcomes
When computing outcome variables, you can use placeholder variables that were defined in the events section of your rule. In these examples, assume that $email_sent_bytes and $file_size were defined in the events section of the rule:
Single-event example:
// No match section, so this is a single-event rule.
outcome:
// Use placeholder directly as an outcome value.
$my_outcome = $email_sent_bytes
// Use placeholder in a conditional.
$other_outcome = if($file_size > 1024, "SEVERE", "MODERATE")
condition:
$e
Multi-event example:
match:
// This is a multi event rule with a match section.
$hostname over 5m
outcome:
// Use placeholder directly in an aggregation function.
$max_email_size = max($email_sent_bytes)
// Use placeholder in a mathematical computation.
$total_bytes_exfiltrated = sum(
1024
+ $email_sent_bytes
+ $file_event.principal.file.size
)
condition:
$email_event and $file_event
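For context, here is one way the events section of the multi-event example might define these names. This is a sketch: the rule name, the event types, and the choice of principal.hostname as the join field are assumptions made for illustration.

```yara-l
rule hypothetical_email_and_file_activity {
  meta:
    description = "Hypothetical example: email and file events on the same host within 5 minutes"
  events:
    $email_event.metadata.event_type = "EMAIL_TRANSACTION"
    $email_event.principal.hostname = $hostname
    // Placeholder that the outcome section aggregates.
    $email_event.network.sent_bytes = $email_sent_bytes
    $file_event.metadata.event_type = "FILE_CREATION"
    // Joins $file_event to $email_event through $hostname.
    $file_event.principal.hostname = $hostname
  match:
    $hostname over 5m
  outcome:
    $max_email_size = max($email_sent_bytes)
    // principal.file.size is not used in events, but $file_event is defined there.
    $total_bytes_exfiltrated = sum(
      1024
      + $email_sent_bytes
      + $file_event.principal.file.size
    )
  condition:
    $email_event and $file_event
}
```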
Aggregations
The outcome section can be used in multi-event rules (rules that contain a match section) and in single-event rules (rules that do not contain a match section). Requirements for aggregations are as follows:
Multi-event rules (with match section)
- The expression to compute outcomes is evaluated over all events that generated a particular detection.
- The expression must be wrapped in an aggregate function.
  - Example: $max_email_size = max($e.network.sent_bytes)
- If the expression contains a repeated field, the aggregate operates over all elements in the repeated field, over all events that generated the detection.
Single-event rules (without match section)
- The expression to compute outcomes is evaluated over the single event that generated a particular detection.
- You must use an aggregate function for expressions that involve at least one repeated field.
  - Example: $suspicious_ips = array($e.principal.ip)
  - The aggregate operates over all elements in the repeated field.
- You cannot use an aggregate function for expressions that do not involve a repeated field.
  - Example: $threat_status = if($e.principal.file.size > 1024, "SEVERE", "MODERATE")
You can use the following aggregation functions:
- max(): outputs the maximum over all possible values. Only works with integers.
- min(): outputs the minimum over all possible values. Only works with integers.
- sum(): outputs the sum over all possible values. Only works with integers.
- count_distinct(): collects all possible values, then outputs the distinct count of possible values.
- count(): behaves like count_distinct(), but returns a non-distinct count of possible values.
- array_distinct(): collects all possible values, then outputs a list of the distinct values. It truncates the list of values to 25 random elements.
- array(): behaves like array_distinct(), but returns a non-distinct list of values. It also truncates the list of values to 25 random elements.
The aggregate function is important when a rule includes a condition section that specifies multiple events must exist, because the aggregate function operates on all the events that generated the detection.
For example, if your outcome and condition sections contain:
outcome:
$asset_id_count = count($event.principal.asset_id)
$asset_id_distinct_count = count_distinct($event.principal.asset_id)
$asset_id_list = array($event.principal.asset_id)
$asset_id_distinct_list = array_distinct($event.principal.asset_id)
condition:
#event > 1
Since the condition section requires more than one event for each detection, the aggregate functions operate on multiple events. Suppose the following events generated one detection:
event:
// UDM event 1
asset_id="asset-a"
event:
// UDM event 2
asset_id="asset-b"
event:
// UDM event 3
asset_id="asset-b"
Then the values of your outcomes will be:
- $asset_id_count = 3
- $asset_id_distinct_count = 2
- $asset_id_list = ["asset-a", "asset-b", "asset-b"]
- $asset_id_distinct_list = ["asset-a", "asset-b"]
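Put together, a complete rule using these outcomes might look like the following sketch. The rule name, the event-type filter, and the hostname grouping are hypothetical; the condition #event > 1 is what guarantees that each detection aggregates over more than one event.

```yara-l
rule hypothetical_asset_id_outcomes {
  meta:
    description = "Hypothetical example of count, count_distinct, array, and array_distinct"
  events:
    $event.metadata.event_type = "NETWORK_CONNECTION"
    $event.principal.hostname = $hostname
  match:
    $hostname over 10m
  outcome:
    $asset_id_count = count($event.principal.asset_id)
    $asset_id_distinct_count = count_distinct($event.principal.asset_id)
    $asset_id_list = array($event.principal.asset_id)
    $asset_id_distinct_list = array_distinct($event.principal.asset_id)
  condition:
    // Requires at least two events per detection.
    #event > 1
}
```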
Other notes and restrictions when using the outcome section:
- The outcome section cannot reference a new placeholder variable that wasn't already defined in the events section.
- The outcome section cannot use event variables that have not been defined in the events section.
- The outcome section can use an event field that was not used in the events section, provided that the event variable that the event field belongs to was already defined in the events section.
- The outcome section can only correlate event variables that have already been correlated in the events section. Correlations happen when two event fields from different event variables are equated.
You can find an example using the outcome section in Overview of the YARA-L 2.0 language. See Create context-aware analytics for details on detection deduplication with the outcome section.
Condition section syntax
In the condition section, you can:
- specify a match condition over events and placeholders defined in the events section. See the following section, Event and placeholder conditionals, for more details.
- (optional) use the and keyword to specify a match condition using outcome variables defined in the outcome section. See the following section, Outcome conditionals, for more details.
The following condition patterns are valid:
condition:
<event/placeholder conditionals>
condition:
<event/placeholder conditionals> and <outcome conditionals>
Event and placeholder conditionals
List condition predicates for events and placeholder variables here, joined with the keyword and or or.
The following conditions are bounding conditions. They force the associated event variable to exist, meaning that at least one occurrence of the event must appear in any detection.
$var // equivalent to #var > 0
#var > n // where n >= 0
#var >= m // where m > 0
The following conditions are non-bounding conditions. They allow the associated event variable to not exist, meaning that it is possible that no occurrence of the event appears in a detection. This makes it possible to write non-existence rules, which search for the absence of a variable instead of its presence.
!$var // equivalent to #var = 0
#var >= 0
#var < n // where n > 0
#var <= m // where m >= 0
You can join an event with an entity, and then check for an absence of the event. The following pseudo-code example joins an event field, $u.field, and an entity field, $e.graph.field, in the events section, and also checks for an absence of the event in the condition section, !$u and $e.
events:
$u.field = "value" // $u is an event
$e.graph.field = "value" // $e is an entity
// ...other sections of the rule...
condition:
!$u and $e
You cannot check for the absence of the entity. Considering the example above, the following statement to check for an absence of the entity is not valid: $u and !$e.
In the following example, the special character # on a variable (either the event variable or the placeholder variable) represents the count of distinct events or values of that variable.
$e and #port > 50 or #event1 > 2 or #event2 > 1 or #event3 > 0
The following non-existence example is also valid and evaluates to true if there are more than two distinct events from $event1 and zero distinct events from $event2.
#event1 > 2 and !$event2
There are restrictions around what type of event or placeholder variables can have non-bounding conditions. The variable must be one of the following:
- A UDM event variable in a rule with 2 or more UDM event variables.
- A placeholder variable associated with at least 1 UDM event variable without non-bounding conditions.
The following are examples of invalid predicates:
$e, #port > 50 // incorrect keyword usage
$e or #port < 50 // or keyword not supported with non-bounding conditions
not $e // not keyword is not allowed for event and placeholder conditions
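The following is a minimal sketch of a non-existence rule that satisfies these restrictions: $proc is one of two UDM event variables, and $hostname is tied to $login, which has a bounding condition. The rule name, event types, and 10-minute window are illustrative assumptions.

```yara-l
rule hypothetical_login_without_process_launch {
  meta:
    description = "Hypothetical example: login with no process launch on the same host within 10 minutes"
  events:
    $login.metadata.event_type = "USER_LOGIN"
    $login.target.hostname = $hostname
    $proc.metadata.event_type = "PROCESS_LAUNCH"
    $proc.principal.hostname = $hostname
  match:
    $hostname over 10m
  condition:
    // Bounding condition on $login, non-bounding condition on $proc.
    $login and !$proc
}
```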
Outcome conditionals
List condition predicates for outcome variables here, joined with the keyword and or or, or preceded by the keyword not.
Specify outcome conditionals differently depending on the type of the outcome variable:
- integer: compare against an integer literal with the operators =, >, >=, <, <=, !=, for example: $risk_score > 10
- string: compare against a string literal with either = or !=, for example: $severity = "HIGH"
- list of integers or strings: specify the condition using the arrays.contains function, for example: arrays.contains($event_ids, "id_1234")
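A sketch of a condition that combines an event conditional with integer and list outcome conditionals. The rule name, thresholds, the "admin" user ID, and the "jump-host-01" hostname are hypothetical.

```yara-l
rule hypothetical_outcome_conditionals {
  meta:
    description = "Hypothetical example of outcome conditionals"
  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.target.user.userid = $user
  match:
    $user over 1h
  outcome:
    // Integer: number of distinct source IPs across all matched events.
    $distinct_source_ips = count_distinct($e.principal.ip)
    // Integer: 80 if any matched login targets "admin", otherwise 40.
    $risk_score = max(if($e.target.user.userid = "admin", 80, 40))
    // List of strings: distinct source hostnames.
    $hostnames = array_distinct($e.principal.hostname)
  condition:
    $e and $distinct_source_ips > 3 and $risk_score >= 80 and arrays.contains($hostnames, "jump-host-01")
}
```

Because array_distinct() truncates its result to 25 random elements, arrays.contains can miss a value in a detection with more distinct hostnames than that.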
Rule classification
Specifying an outcome conditional in a rule that has a match section means that the rule will be classified as a multi-event rule for rule quota. Please see single event rule and multiple event rule for more information about single and multiple event classifications.
Count (#) character
The # character is a special character in the condition section. If it is used before any event or placeholder variable name, it represents the number of distinct events or values that satisfy all the events section conditions.
Value ($) character
The $ character is another special character in the condition section. If it is used before any outcome variable name, it represents the value of that outcome.
If it is used before any event or placeholder variable name (for example, $event), it is a shorthand for #event > 0.
Options section syntax
In the options section, you can specify the options for the rule. The syntax for the options section is similar to that of the meta section, but a key must be one of the predefined option names, and the value is not restricted to the string type.
Currently, the only available option is allow_zero_values.
allow_zero_values: If set to true, matches generated by the rule can have zero values as match variable values. Zero values are given to event fields when they are left unpopulated. This option is set to false by default.
The following is a valid line in the options section:
allow_zero_values = true
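The options section comes last in a rule, after condition. In the following sketch (the rule name and event filter are hypothetical), allow_zero_values = true lets a detection be generated even when $hostname is empty because principal.hostname was left unpopulated.

```yara-l
rule hypothetical_allow_zero_values {
  meta:
    description = "Hypothetical example of the options section"
  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $e.principal.hostname = $hostname
  match:
    $hostname over 1h
  condition:
    $e
  options:
    // Permit matches where $hostname has the zero value "".
    allow_zero_values = true
}
```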
Type checking
Chronicle performs type checking against your YARA-L syntax as you create rules within the interface. The type checking errors it displays help you revise the rule so that it works as expected.
The following are examples of invalid predicates:
// $e.target.port is of type integer which cannot be compared to a string.
$e.target.port = "80"
// "LOGIN" is not a valid event_type enum value.
$e.metadata.event_type = "LOGIN"
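For comparison, here are type-correct versions of the predicates above: the port is compared with an integer literal, and the event type uses USER_LOGIN, a valid UDM event_type enum value. Whether USER_LOGIN is the right value depends on which events you intend to match.

```yara-l
// Integer field compared with an integer literal.
$e.target.port = 80
// USER_LOGIN is a valid metadata.event_type enum value.
$e.metadata.event_type = "USER_LOGIN"
```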