New to Chronicle: Capturing strings for additional analysis

John Stoner

Principal Security Strategist, Google Cloud

October 20, 2022

Ready for Google-speed threat detection and response?

Let's work together

This is the sixth post from Google Cloud Principal Security Strategist John Stoner as part of his deep-dive "New to Chronicle" series, which helps propel forward security teams either new to SIEM or replacing their SIEM with Chronicle. You can view the entire series here.

Previously, we introduced regular expressions to identify matches in events through the use of the re.regex function as well as using the forward slash notation around a string to compare it to a field. If you need a quick refresher, you can find that post here. Today, we are going to build on top of those concepts and introduce additional capabilities that can be used as we build our rules.

If you recall, the re.regex function can be used to find string patterns to identify values that can be used to trigger a rule. This is a great start. However, just matching on strings may not be enough. Perhaps we want to take a portion of a string and compare it to another event or a watchlist, or output that extracted portion to an analyst via the outcome section. These are capabilities we will cover today!

The first function we will cover is re.capture. This function allows us to take a UDM field and capture a substring of the field, based on a regular expression pattern. Once we have captured that string, we can then use the captured value for additional evaluation or output. We ended our last blog with a condition to identify PowerShell processes. Let’s start with that to build an encoded PowerShell rule.

We are going to focus our rule by looking for PROCESS_LAUNCH events in the $event.metadata.event_type field. From there, we are going to use the re.regex function to match the paths that PowerShell is executed from.

events:
   $event.metadata.event_type = "PROCESS_LAUNCH"
   re.regex($event.target.process.file.full_path,
   `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase

Now that we have isolated our criteria on these PowerShell processes, we are going to look at the field target.process.command_line to determine if there are any strings that contain the PowerShell switches that would denote encoded commands. There are a number of permutations for this, so for our example, we have provided a subset of those permutations. This should be expanded before deploying in production, but it provides a good start.

In case you aren’t familiar with regular expressions, let’s briefly go through what is between the back quotes below. We turn on case-insensitive mode with (?i) because those encoding switches could be mixed case, and then we match one of the three combinations provided between the parentheses. Parentheses are frequently used to denote a capture, but the ?: tells the regular expression engine that this group is not a capture; we are just looking for a match. After the encoding switch, we look for optional whitespace (\s*) followed by a run of non-whitespace characters (\S*), which is the encoded string. This encoded string is what we want to capture, so we enclose it in parentheses. Finally, if we are going to capture a value, we need to do something with it, like output it to a field, so we assign it to a placeholder variable. Now let’s test our rule.

$encodedPS = re.capture($event.target.process.command_line, 
`(?i)(?:-enc|-ec|-en)\s*(\S*)`)

Uh, John. We have a problem…

Yes, you see it too. We are seeing all PowerShell launches in our data set, not just ones with those encoded command switches. Depending on what you are trying to achieve, that might not be the end of the world, but if you are building toward a rule and testing or even performing a retrohunt or search, those additional PowerShell processes can create some unwanted noise as well as additional processing overhead. You might be thinking, “…but we specified our capture, why are we getting these additional values?”

The answer, dear reader, is that we didn’t narrow our result set with our capture function above; we just wrote its output to a placeholder. This is the difference between the re.regex function and a function like re.capture. re.regex basically returns a boolean value: does the field match the regular expression pattern? If it does, the condition is true and we keep processing the event; if it doesn’t, the condition is false and the event does not meet our criteria. To be used when assessing events, re.capture needs to be assigned to a placeholder variable, compared to a list, or nested within another function. We won’t cover lists today, but stay tuned for that.

So, how can we refine our rule to remove excessive detections that we really don’t want when looking just for encoded PowerShell? We can add an additional line (the second re.regex call below) in the events section of our rule that checks target.process.command_line for our matching string.

events:
   $event.metadata.event_type = "PROCESS_LAUNCH"
   re.regex($event.target.process.file.full_path,
   `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase
   re.regex($event.target.process.command_line,
   `(?i)(?:-enc|-ec|-en)\s*\S*`) nocase
   $encoded_value = re.capture($event.target.process.command_line,
   `(?i)(?:-enc|-ec|-en)\s*(\S*)`)

Alright, that’s better.
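As an aside, a second re.regex is not the only way to narrow the results. As we read the YARA-L 2.0 function reference, re.capture returns an empty string when the pattern does not match, so comparing its result against the empty string should also drop the unrelated PowerShell launches. The sketch below assumes that behavior; the rule name and meta values are hypothetical, and this is an illustration rather than a tested rule.

```
// Hypothetical rule name and meta values, for illustration only.
rule encoded_powershell_capture_filter {
  meta:
    author = "example"
    description = "Sketch: filter on the capture result instead of a second re.regex"

  events:
    $event.metadata.event_type = "PROCESS_LAUNCH"
    re.regex($event.target.process.file.full_path,
      `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase

    // Assumption: re.capture returns "" when nothing matches, so this
    // comparison keeps only events where the capture group is non-empty.
    "" != re.capture($event.target.process.command_line,
      `(?i)(?:-enc|-ec|-en)\s*(\S*)`)

    // Store the captured value for later use, for example in an outcome section.
    $encoded_value = re.capture($event.target.process.command_line,
      `(?i)(?:-enc|-ec|-en)\s*(\S*)`)

  condition:
    $event
}
```

One difference from the re.regex filter is worth keeping in mind: a command line that ends right after the switch produces an empty capture, so it would be dropped by this comparison but kept by re.regex.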

Could we strip down the re.regex function further? Probably. The use of both nocase and (?i) is redundant and everything after the encoded command switches is probably not needed, but we will leave that to you to tweak further. From here, we could take our matched value that we extracted and output it so an analyst could view the encoded PowerShell when the rule fires. To do this, we can add a line under our outcome section, but for readers of our blog on outcomes, you already knew that.

outcome:
   $encoded_powershell = $encoded_value

Here we can see the detection that includes the encoded PowerShell in its own field.

An analyst or a SOAR could then take that value and decode it manually or run some sort of playbook on it.

Good so far? Wait, you mean you don’t want to manually decode PowerShell? OK, how about this?

Let’s introduce another function. This function is strings.base64_decode. Based on its name, you can tell that it works with string values. It's very powerful but simple to deploy. The basic syntax looks like this:

strings.base64_decode(encodedString)

That’s it. If you have a field in UDM that is encoded in base64, you can use the function to decode the value of the field. Of course, we may not have a whole field in UDM that is base64 encoded, but by coupling it with the re.capture function, we can isolate the base64 string and then decode it in YARA-L, without having to manually decode it or send it to a SOAR playbook. To do this in our rule, we would add the following line to our events section.

$decoded_value = strings.base64_decode(re.capture($event.target.process.command_line, 
`(?i)(?:-enc|-ec|-en)\s*(\S*)`))

Here we are nesting our re.capture function within the strings.base64_decode function with the output written to the placeholder variable, $decoded_value. From there, we can take that decoded value and write it to our console by adding this line to the outcome section of the rule.

$decoded_powershell = $decoded_value

And with that, our output would look something like this.

We can see the full command line, the encoded command and the decoded command being run in PowerShell!
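Putting the pieces from this post together, a complete rule might look something like the sketch below. The events and outcome lines come from the snippets above. The rule name, the meta values and the simple $event condition are assumptions added for illustration, and the command-line filter is trimmed to just the switches with nocase, which matches the same events as the longer pattern.

```
// Hypothetical rule name and meta values, for illustration only.
rule encoded_powershell_decoded {
  meta:
    author = "example"
    description = "PowerShell launched with an encoded command switch"
    severity = "Medium"

  events:
    $event.metadata.event_type = "PROCESS_LAUNCH"
    re.regex($event.target.process.file.full_path,
      `(system32|syswow64)\\WindowsPowerShell\\v1\.0\\powershell(|\_ise)\.exe`) nocase

    // Filter: keep only command lines that contain one of the switches.
    re.regex($event.target.process.command_line, `(?:-enc|-ec|-en)`) nocase

    // Capture the token that follows the switch.
    $encoded_value = re.capture($event.target.process.command_line,
      `(?i)(?:-enc|-ec|-en)\s*(\S*)`)

    // Decode the same captured token from base64.
    $decoded_value = strings.base64_decode(
      re.capture($event.target.process.command_line,
      `(?i)(?:-enc|-ec|-en)\s*(\S*)`))

  outcome:
    $encoded_powershell = $encoded_value
    $decoded_powershell = $decoded_value

  condition:
    $event
}
```

When expanding the list of switches, it may be worth testing the full -EncodedCommand spelling. Because the whitespace is optional (\s*), -enc matches the first four characters of that switch and the capture group would pick up "odedCommand" rather than the encoded payload. Listing the long form first in the alternation and requiring whitespace with \s+ is one way to avoid that.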

I hope that this has shown how we can build on the re.regex function to perform captures of strings and then decode them, all within YARA-L. We will continue to build on this concept as we introduce another regular expression function, called re.replace!

Until next time…
