New To Chronicle: Adding Prevalence to Your Analysis

"New to Chronicle" is a deep-dive series by Google Cloud Principal Security Strategist John Stoner which provides practical guidance for security teams that are either new to SIEM or replacing their SIEM with Chronicle. You can view the entire series here.

Today, we are going to take a look at prevalence in Chronicle. Prevalence is a value that indicates how often an entity is observed in an environment, based on logged events. It was originally focused on domains but has been expanded to IP addresses and file hashes as well.

Entities with high prevalence are typically accessed by a broad set of users because they are part of normal activity (e.g., DNS lookups for google.com), so they are less likely to be malicious. Entities with lower prevalence are accessed far less often, and many attacks involve interactions with these lower-prevalence entities. Below is a prevalence graph for the domain google.com.

Now, before you start checking to see if you have prevalence data sources to send to Chronicle, let me stop you. Chronicle generates prevalence for you. You don’t need to do a thing to get access to it! Prevalence is what we call a derived context. That is, we are calculating it daily and exposing it to you via the entity graph.

Let’s briefly talk about that calculation, because it speaks to Chronicle’s ability to handle large amounts of data at scale to create this derived context. In the case of domains, think about all the logs you generate from every site you visit, as well as all the associated subdomains. That’s just your system. Now combine that with every other system within your organization that is sending logs to Chronicle. Below is the google.com domain from above overlaid with the play.google.com subdomain from the same organization.

Now, let’s perform this same calculation for the IP addresses your organization logs, both internal and external. Add in the file hashes that are part of many endpoint logging suites. That is a tremendous amount of entity data.

To top this off, we also take into account what we refer to as late-arriving data: even if logs are delayed for whatever reason, they are still included in the prevalence calculations, and those calculations are updated to reflect them. We have seen data arrive multiple months late and still be factored into prevalence. It’s a pretty impressive capability that is unique to Chronicle. With that foundation in place, let’s start building some rules!

Building Detections with Domain Prevalence

Since we have been talking about domains, we will use them as our example today, but understand that the same concepts apply to file hashes and IP addresses. In this rule, we are focusing on network DNS query events, but it could be adapted to a wider set of domain use cases.

Using the prevalence data in our rule starts with the entity predicates that specify a metadata.entity_type of DOMAIN_NAME and a metadata.source_type of DERIVED_CONTEXT. Because we have many kinds of entities and different types of entity data, these fields and values narrow the entities that will be joined to our events. The $domain placeholder on the entity's hostname field joins to the same placeholder on the DNS question name in the event. Setting domain.prevalence.day_count to 10 is a best practice that keeps the rule focused on prevalence data, because every prevalence.day_count field, whether for IP addresses, file hashes, or domains, has a value of 10. Finally, we determine our threshold for alerting on prevalence. In this example, we are looking for a rolling_max of 3 or less (and greater than 0), meaning that on no single day over the past 10 days did more than 3 systems interact with this domain. We set a match condition to group similar DNS events over a 60 minute period, add our prevalence entity variable to the condition section, and we are done!

rule domain_prevalence {

  meta:
    author = "Chronicle Security"
    description = "Detects DNS events querying domains that have a low rolling max prevalence."
    severity = "Low"

  events:
    $event.metadata.event_type = "NETWORK_DNS"
    $event.network.dns.questions.name != ""
    $event.network.dns.questions.name = $domain

    $prevalence.graph.metadata.entity_type = "DOMAIN_NAME"
    $prevalence.graph.metadata.source_type = "DERIVED_CONTEXT"
    $prevalence.graph.entity.hostname = $domain
    $prevalence.graph.entity.domain.prevalence.day_count = 10
    $prevalence.graph.entity.domain.prevalence.rolling_max <= 3
    $prevalence.graph.entity.domain.prevalence.rolling_max > 0

  match:
    $domain over 60m

  condition:
    $event and $prevalence
}

When we test our rule, we see five detections, including two for the same domain. Because those two requests occurred more than 60 minutes apart, we get separate alerts. When we expand the alert for cafe-nomade.com, we see a single DNS event associated with this domain during our time frame. When we review the entity data, it appears that this is the only system that has communicated with this domain today, and looking back across the past 10 days, the most systems that connected to this domain on any single day is also one.

Will all alerts with low prevalence have a single event associated with them? No; we could have hundreds or even thousands of events for a low-prevalence entity. Keep in mind that prevalence is calculated as the number of systems associated with the entity on a given day, not the volume of data transmitted or the number of events generated. For example, a domain queried thousands of times by a single host still has a daily prevalence of one. Does a single host generating a large amount of traffic to a domain that no one else in the entire organization is communicating with interest you? I believe that condition would merit additional investigation.

Subdomains are handled in the same manner as domains: the prevalence metric for play.google.com, for example, is based on the number of unique systems associated with that subdomain, whereas the domain-level metric encompasses all subdomains of google.com, regardless of whether the lookup was for play.google.com or drive.google.com. Once you have the hang of building a rule for domain prevalence, the same template can be applied to IP addresses and file hashes, as in the sketch below.
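To make that concrete, here is a minimal sketch of what a file hash variant might look like. It assumes that your process launch events carry a SHA-256 in target.process.file.sha256 and that file entities expose an analogous file.prevalence structure in the derived context; verify these field paths, and the FILE entity type, against the UDM data in your own Chronicle instance before relying on them.

rule file_hash_prevalence {

  meta:
    author = "Chronicle Security"
    description = "Illustrative sketch - process launches of file hashes with a low rolling max prevalence."
    severity = "Low"

  events:
    // Process launch events that carry a SHA-256 for the launched file
    $event.metadata.event_type = "PROCESS_LAUNCH"
    $event.target.process.file.sha256 != ""
    $event.target.process.file.sha256 = $hash

    // Assumed derived-context prevalence entity for the same hash
    $prevalence.graph.metadata.entity_type = "FILE"
    $prevalence.graph.metadata.source_type = "DERIVED_CONTEXT"
    $prevalence.graph.entity.file.sha256 = $hash
    $prevalence.graph.entity.file.prevalence.day_count = 10
    $prevalence.graph.entity.file.prevalence.rolling_max <= 3
    $prevalence.graph.entity.file.prevalence.rolling_max > 0

  match:
    $hash over 60m

  condition:
    $event and $prevalence
}

The structure mirrors the domain rule above; only the event type, the placeholder field, and the entity's prevalence path change.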

I hope this provides a greater understanding of the power you have with Chronicle and the use cases you can unlock with prevalence. While we only looked at correlating prevalence with DNS events today, we could expand the use case to bring in Safe Browsing data or any other threat intelligence to further focus our detections; a rough sketch of what that join might look like follows.
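As a closing illustration, here is a minimal sketch of that idea, assuming a threat intelligence feed has been ingested into the entity graph as DOMAIN_NAME entities. The ENTITY_CONTEXT source type and the absence of a vendor or product filter are assumptions; adjust them to match how your particular feed (Safe Browsing or otherwise) actually lands in your instance.

rule low_prevalence_domain_with_intel {

  meta:
    author = "Chronicle Security"
    description = "Illustrative sketch - DNS queries for low prevalence domains that also appear in an ingested threat intel feed."
    severity = "Medium"

  events:
    $event.metadata.event_type = "NETWORK_DNS"
    $event.network.dns.questions.name != ""
    $event.network.dns.questions.name = $domain

    // Derived-context prevalence for the queried domain, as in the rule above
    $prevalence.graph.metadata.entity_type = "DOMAIN_NAME"
    $prevalence.graph.metadata.source_type = "DERIVED_CONTEXT"
    $prevalence.graph.entity.hostname = $domain
    $prevalence.graph.entity.domain.prevalence.day_count = 10
    $prevalence.graph.entity.domain.prevalence.rolling_max <= 3
    $prevalence.graph.entity.domain.prevalence.rolling_max > 0

    // Hypothetical ingested threat intel entity for the same domain; filter
    // further on graph.metadata.vendor_name or product_name to match your feed
    $intel.graph.metadata.entity_type = "DOMAIN_NAME"
    $intel.graph.metadata.source_type = "ENTITY_CONTEXT"
    $intel.graph.entity.hostname = $domain

  match:
    $domain over 60m

  condition:
    $event and $prevalence and $intel
}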
