New to Chronicle: Regular expressions and reference lists

John Stoner

Principal Security Strategist, Google Cloud

December 1, 2022


This is the ninth post from Google Cloud Principal Security Strategist John Stoner in his deep-dive "New to Chronicle" series, which helps security teams that are either new to SIEM or replacing their SIEM with Chronicle get up to speed. You can view the entire series here.

The past few blog posts have taken us deeper into functions that let us work with portions of string data as we build rules. These include regular expression functions that match, capture and replace. Along the way, we also discussed decoding base64-encoded strings. In fact, at the end of our last blog, we used all four of these functions in concert to detect suspicious encoded PowerShell commands.

Today, we are going to continue to expand our repertoire by introducing reference lists as well as a pair of functions for case conversion. Let's start with our example rule from last time. It looks for PowerShell processes in particular locations, using the re.regex function to isolate specific events and the re.capture function to grab the base64-encoded commands. From there, we decode them with the strings.base64_decode function and use the re.replace function to remove the null bytes from the result. Finally, the re.regex function finds the suspicious strings. Those criteria, the last two lines of the events section below, are where we will start today.

rule suspicious_encoded_powershell_command {
meta:
  author = "John Stoner"
  description = "Detects the strings sharpnopsexec or rubeus in encoded powershell commands."
  severity = "Low"
events:
  $event.metadata.event_type = "PROCESS_LAUNCH"
  $event.metadata.event_type = $event_type
  re.regex($event.target.process.file.full_path,
    `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase
  re.regex($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*\S*`)
  $encoded_value = re.capture($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*(\S*)`)
  $decoded_value = strings.base64_decode(re.capture($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*(\S*)`))
  $no_null_string = re.replace(strings.base64_decode(re.capture($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*(\S*)`)), `\0`, "")
  re.regex($no_null_string, `sharpnopsexec`) nocase or
  re.regex($no_null_string, `rubeus`) nocase
match:
  $event_type over 1m
outcome:
  $encoded_powershell = array_distinct($encoded_value)
  $decoded_powershell = array_distinct($decoded_value)
  $null_decoded_powershell = array_distinct($no_null_string)
condition:
  $event
}
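To see why the rule strips null bytes before matching, here is a rough local simulation of the events section in Python. It is a sketch under stated assumptions, not how Chronicle evaluates rules: Python's re module stands in for the RE2 engine behind re.regex and re.capture, and the payload and command line are hypothetical.

```python
import base64
import re
from typing import Optional

# Capture pattern copied from the rule; Python's re stands in for RE2 here.
ENC_CAPTURE = re.compile(r"(?i)(?:-enc|-ec|-en)\s*(\S*)")


def decode_encoded_command(command_line: str) -> Optional[str]:
    """Mimic the re.capture, strings.base64_decode and re.replace chain."""
    match = ENC_CAPTURE.search(command_line)
    if match is None:
        return None
    # PowerShell encodes -enc payloads as UTF-16LE, so decoding the bytes
    # one-to-one (latin-1) leaves a null byte after every ASCII character.
    decoded_value = base64.b64decode(match.group(1)).decode("latin-1")
    return decoded_value.replace("\x00", "")


def matches_first_rule(command_line: str) -> bool:
    """The two re.regex ... nocase checks at the end of the events section."""
    no_null_string = decode_encoded_command(command_line)
    if no_null_string is None:
        return False
    return any(
        re.search(pattern, no_null_string, re.IGNORECASE)
        for pattern in ("sharpnopsexec", "rubeus")
    )


# Hypothetical command line, encoded the way PowerShell expects -enc input.
payload = "Rubeus.exe kerberoast"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
command_line = f"powershell.exe -nop -enc {encoded}"
print(decode_encoded_command(command_line))  # Rubeus.exe kerberoast
print(matches_first_rule(command_line))      # True
```

Without the null-byte removal, the decoded value would read "R\x00u\x00b\x00..." and a plain `rubeus` pattern would never match.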

As noted at the end of the last blog, we could expand these final criteria by adding an or statement for each string we want to alert on, but this gets tedious, particularly if other rules use similar criteria.

An alternative approach is to use a reference list. Reference lists aren't new to Chronicle; in fact, many of you may have used them in rules already. At its most basic, the YARA-L syntax for using a list is in %list_name. A very simple use of a reference list is to constrain a rule to only run when the principal.hostname appears in a list of named servers. If those hostnames were in a list named key_servers, the event criteria to focus a rule on those hosts would look like this:

$event.principal.hostname in %key_servers
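Conceptually, a string match list behaves like set membership on the whole field value. The Python sketch below models it that way, assuming the comparison is exact and case-sensitive; the key_servers entries are hypothetical.

```python
# Hypothetical contents of the key_servers reference list.
KEY_SERVERS = {"dc01", "fs02", "sql03"}


def in_string_list(value: str, reference_list: set) -> bool:
    """Model `value in %list` as exact membership (an assumption)."""
    return value in reference_list


print(in_string_list("dc01", KEY_SERVERS))                    # True
print(in_string_list("dc01.stackedpads.local", KEY_SERVERS))  # False
```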

If our principal.hostname field contained fully qualified domain names, but we only wanted to store short hostnames in the key_servers list rather than replicate the full names, we could use the re.replace function to strip out the domain name, leaving just the hostname to compare against the list.

re.replace($event.principal.hostname, `\.stackedpads\.local`, "") in %key_servers
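Here is that normalization step modelled in Python: strip the domain suffix, then test membership. Python's re.sub stands in for re.replace, the escaped dots keep the pattern from matching arbitrary characters, and the list contents and hostnames are hypothetical.

```python
import re

# Hypothetical key_servers list holding short hostnames only.
KEY_SERVERS = {"dc01", "fs02", "sql03"}
# Escaped dots so only the literal domain suffix is removed.
DOMAIN_SUFFIX = re.compile(r"\.stackedpads\.local")


def hostname_in_key_servers(hostname: str) -> bool:
    """Model re.replace(hostname, suffix, "") in %key_servers."""
    short_name = DOMAIN_SUFFIX.sub("", hostname)
    return short_name in KEY_SERVERS


print(hostname_in_key_servers("dc01.stackedpads.local"))   # True
print(hostname_in_key_servers("web05.stackedpads.local"))  # False
```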

Both of these examples use what the list manager calls a string match. YARA-L lets us use functions to finesse event values before comparing them to a list. However, sometimes we need greater flexibility, which brings us back to the PowerShell example we continue to evolve. Below are the detections that matched our PowerShell rule as written above.

The final criteria in our rule look for the strings rubeus or sharpnopsexec without case sensitivity, and each of these detections contains one of those strings. However, unlike the hostname example, the decoded commands don't follow a predictable pattern that a replacement function could carve down to just the strings we are looking for, so we can't compare them against a string match list.

Cue the sad panda emoji…

Wait, wait! The good news is we have an answer for this! Reference lists can now match on regular expressions. With this recently added capability, we can build our list and then look for any of its values within a specific field. Here we can see our list called suspicious_powershell_regex with the radio button under Syntax Type set to RegEx.

As with a string match list, the in operator is used, but the keyword regex is added, so the syntax becomes in regex %list_name.
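A rough Python model of the difference between the two list types follows. It assumes each line of a regex list is tested as an unanchored search, like re.regex, and that the field matches if any line matches; the entries and the decoded string are hypothetical.

```python
import re

# Hypothetical entries, used both as a string list and as a regex list.
ENTRIES = ["sharpnopsexec", "rubeus"]


def in_string_list(value: str) -> bool:
    """`in %list`: the whole value must equal an entry."""
    return value in set(ENTRIES)


def in_regex_list(value: str) -> bool:
    """`in regex %list`: any entry may match anywhere in the value."""
    return any(re.search(pattern, value) for pattern in ENTRIES)


decoded = "rubeus.exe kerberoast /nowrap"
print(in_string_list(decoded))  # False
print(in_regex_list(decoded))   # True
```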

With the addition of our regular expression list, our line of event criteria becomes this:

strings.to_lower(re.replace(strings.base64_decode(re.capture(
  $event.target.process.command_line, `(?i)(?:-enc|-ec|-en)\s*(\S*)`)), `\0`, ""))
  in regex %suspicious_powershell_regex

What is this strings.to_lower doing here?

Ah yes, you noticed that. strings.to_lower and its sibling, strings.to_upper, convert, well, strings, to all lowercase or all uppercase, respectively. This is very handy because the decoded PowerShell contains both uppercase and lowercase characters. To ensure we find our match, we use strings.to_lower to convert the output of the nested functions to lowercase before comparing it against our list, which means the entries in the list should be written in lowercase as well.
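A short Python illustration of why the lowercasing matters, assuming the list entries are lowercase and each is compared case-sensitively. Here str.lower() plays the role of strings.to_lower, and the list entries are hypothetical.

```python
import re

# Hypothetical lowercase entries in suspicious_powershell_regex.
SUSPICIOUS_POWERSHELL_REGEX = ["sharpnopsexec", "rubeus"]


def in_regex_list(value: str) -> bool:
    """True if any list entry matches anywhere in value (case-sensitive)."""
    return any(re.search(p, value) for p in SUSPICIOUS_POWERSHELL_REGEX)


decoded = "Rubeus-Rundll32"           # mixed case, as decoded from the event
print(in_regex_list(decoded))          # False: "rubeus" does not match "Rubeus"
print(in_regex_list(decoded.lower()))  # True once lowercased
```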

But what if I kind of care about case sensitivity?

Kind of care? I'm not entirely sure what that means, but fine. If you look at line 4 in the regex list below, we can apply case insensitivity to a specific line using (?i), just as we did with the re.regex function. By default, each line is matched exactly as written, case included, but adding (?i) to a line lets it match whether the value is all uppercase, all lowercase or some sort of mixed case.

In this instance (I really wanted to say case, but I refrained), we would not add our strings.to_lower function to our event criteria.

However, the comparison still uses the in regex % syntax, so our event criteria become:

re.replace(strings.base64_decode(re.capture(
  $event.target.process.command_line, `(?i)(?:-enc|-ec|-en)\s*(\S*)`)), `\0`, "")
  in regex %suspicious_powershell_regex

With this change, instead of the seven detections we had previously, we get two: the case-insensitive matches for SharpNoPSExec. The rubeus detections drop out because the decoded string was Rubeus-Rundll32, and without marking line 3 of the list as case insensitive or using one of the case conversion functions, we won't match on this value.
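The per-line flag can be modelled the same way. This sketch assumes each list line is handed to the regex engine exactly as written; only the sharpnopsexec entry carries (?i), and no lowercasing is applied, mirroring the criteria above. The entries are hypothetical, and Python's re accepts a leading (?i) just as RE2 does.

```python
import re

# Hypothetical list: only the second line is marked case-insensitive.
SUSPICIOUS_POWERSHELL_REGEX = ["rubeus", "(?i)sharpnopsexec"]


def in_regex_list(value: str) -> bool:
    """True if any entry, with its own inline flags, matches value."""
    return any(re.search(p, value) for p in SUSPICIOUS_POWERSHELL_REGEX)


print(in_regex_list("SharpNoPSExec.exe"))  # True: (?i) ignores case
print(in_regex_list("Rubeus-Rundll32"))    # False: "rubeus" is case-sensitive
```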

I feel that I would be remiss if I didn’t provide the full rule that we have built over the past few blogs as it has evolved, so here it is:

As previously mentioned, this rule will require some additional tuning to cover all the encoded command permutations, but hopefully it serves as a nice starting point as well as inspiration for other rules you might have in mind. Notice that in the rule below, we pass our placeholder variable $no_null_string to the strings.to_lower function before comparing it to the list, whereas above we used the full nesting of functions. Either approach works.

rule suspicious_encoded_powershell_command {
meta:
  author = "John Stoner"
  description = "Detects strings in encoded powershell commands based on a watchlist of suspicious binaries."
  severity = "Low"
events:
  $event.metadata.event_type = "PROCESS_LAUNCH"
  $event.metadata.event_type = $event_type
  re.regex($event.target.process.file.full_path,
    `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase
  re.regex($event.target.process.command_line, `(?i)(?:-enc|-ec|-en)\s*\S*`)
  $encoded_value = re.capture($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*(\S*)`)
  $no_null_string = re.replace(strings.base64_decode(re.capture($event.target.process.command_line,
    `(?i)(?:-enc|-ec|-en)\s*(\S*)`)), `\0`, "")
  strings.to_lower($no_null_string) in regex %suspicious_powershell_regex
match:
  $event_type over 1m
outcome:
  $encoded_powershell = array_distinct($encoded_value)
  $null_decoded_powershell = array_distinct($no_null_string)
condition:
  $event
}
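Finally, a condensed Python model of the complete rule, run against a few made-up events. It is a sketch under several assumptions: events are plain dicts with hypothetical keys (not UDM field names), Python's re approximates RE2, the list entries are hypothetical and lowercase, and array_distinct is treated as order-preserving de-duplication. The match window and Chronicle's grouping are left out.

```python
import base64
import re
from typing import Dict, List

# Path and flag patterns copied from the rule; `nocase` becomes re.IGNORECASE.
POWERSHELL_PATH = re.compile(
    r"(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|_ise)\.exe",
    re.IGNORECASE,
)
ENC_CAPTURE = re.compile(r"(?i)(?:-enc|-ec|-en)\s*(\S*)")
# Hypothetical lowercase entries in suspicious_powershell_regex.
SUSPICIOUS_POWERSHELL_REGEX = ["sharpnopsexec", "rubeus"]


def decode(encoded_value: str) -> str:
    """Base64-decode, then strip the UTF-16LE null bytes ($no_null_string)."""
    raw = base64.b64decode(encoded_value).decode("latin-1")
    return raw.replace("\x00", "")


def run_rule(events: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Apply the events section, then build the two outcome arrays."""
    encoded_values: List[str] = []
    no_null_strings: List[str] = []
    for event in events:
        if event["event_type"] != "PROCESS_LAUNCH":
            continue
        if not POWERSHELL_PATH.search(event["full_path"]):
            continue
        # A successful capture implies the separate re.regex flag check passes.
        match = ENC_CAPTURE.search(event["command_line"])
        if match is None:
            continue
        no_null_string = decode(match.group(1))
        lowered = no_null_string.lower()  # strings.to_lower
        if not any(re.search(p, lowered) for p in SUSPICIOUS_POWERSHELL_REGEX):
            continue
        encoded_values.append(match.group(1))
        no_null_strings.append(no_null_string)
    # array_distinct modelled as order-preserving de-duplication.
    return {
        "encoded_powershell": list(dict.fromkeys(encoded_values)),
        "null_decoded_powershell": list(dict.fromkeys(no_null_strings)),
    }


def enc(text: str) -> str:
    """Encode text the way PowerShell's -enc flag expects (UTF-16LE, base64)."""
    return base64.b64encode(text.encode("utf-16-le")).decode("ascii")


# Hypothetical events: a duplicate suspicious launch and one benign launch.
PATH = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"
events = [
    {"event_type": "PROCESS_LAUNCH", "full_path": PATH,
     "command_line": "powershell.exe -enc " + enc("Rubeus-Rundll32")},
    {"event_type": "PROCESS_LAUNCH", "full_path": PATH,
     "command_line": "powershell.exe -enc " + enc("Rubeus-Rundll32")},
    {"event_type": "PROCESS_LAUNCH", "full_path": PATH,
     "command_line": "powershell.exe -enc " + enc("Get-Date")},
]
print(run_rule(events)["null_decoded_powershell"])  # ['Rubeus-Rundll32']
```

Decoding first and lowercasing only for the comparison keeps the original casing in the outcome, much as the rule reports $no_null_string rather than its lowercased form.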

I hope this has provided you with a better understanding of how you can use regular expressions and their associated string functions and lists to craft YARA-L rules. Stay tuned for more cool capabilities with lists, functions and much more in Chronicle!
