Security Analyst Diaries #6: Finding the proverbial needle in a haystack with Chronicle SIEM's domain prevalence

Welcome to another Security Analyst Diary entry. We embarked on a journey to drive context-aware detections and enrich ingested data with actionable information for our customers. A key part of fulfilling that journey has been prevalence, an important capability since the very inception of Chronicle. Check out the video podcast of this diary entry.

Chronicle SIEM, part of our Chronicle Security Operations suite, enables analysts to drive impactful security operations, run context-driven detections and investigations, and respond to threats faster. In today’s Security Analyst Diary entry, we’re going to cover:

  1. What is domain prevalence and UDM implementation?
  2. Detection engine and prevalence
  3. Prevalence isn’t just for domains
  4. SQL queries and how to use them

Let’s begin!

What is domain prevalence and UDM implementation?

Prevalence has been a core feature of the Chronicle Security Operations suite since its inception. It provides the capability to build a unique baseline of network access commonality, and is used by security analysts via the Asset View to quickly identify unusual or beaconing activity.

“The prevalence of a resource within the customer's environment. This measures how common it is for assets to access the resource.”

In this diary, we’re going to cover how you can now use prevalence as part of Chronicle’s entity model using YARA-L Detections.

Example of Chronicle’s Prevalence in Asset View (the golf balls, cotton balls, fluffy clouds)

Prevalence is now a natively indexed value in Chronicle’s entity model, and you can use its derived context value in your detections. The prevalence type includes the following UDM fields:

day_count (int32): The number of days over which rolling_max is calculated.
day_max (int32): The maximum prevalence score in a one-day interval window.
day_max_sub_domains (int32): The maximum prevalence score in a one-day interval window across subdomains.
rolling_max (int32): The maximum number of assets per day accessing the resource over the trailing day_count days.
rolling_max_sub_domains (int32): The maximum number of assets per day accessing the domain, along with its subdomains, over the trailing day_count days. This field is only valid for domains.

https://cloud.google.com/chronicle/docs/reference/udm-field-list#prevalence

Key concepts to note:

day_max: the maximum prevalence score for the artifact during the day identified by entity.metadata.interval, where a day is defined as 12:00:00 AM to 11:59:59 PM UTC.

rolling_max: the maximum per-day prevalence score for the artifact over the previous 10-day window.

day_count: used to calculate rolling_max, and always has the value 10.

From the time data is ingested, it takes approximately 36 hours for the statistics to be calculated and stored. The calculations use the previous 5 days of data, and events older than 14 days are not included in prevalence calculations.

Detection engine and prevalence

The first example of using domain prevalence builds upon Chronicle’s existing CTI IOC matching capabilities, adding a filter so that an alert fires only when an infrequently accessed domain within the enterprise is involved.

An example YARA-L rule for matching IOC data against event data looks as follows:

rule ioc_domain_match_against_dns {
    meta:
        author = "Security Analyst Diaries"
        description = "Lookup Network DNS queries against Entity Graph IOC matches."
        severity = "LOW"

    events:
        $e.metadata.event_type = "NETWORK_DNS"
        $e.network.dns.questions.name = $hostname
        $e.principal.ip = $src

        // only match FQDNs, e.g., exclude chrome dns access tests and other internal hosts
        $e.network.dns.questions.name = /(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]/

        // ioc entity graph lookup
        $i.graph.metadata.vendor_name = "ACME CTI"
        $i.graph.metadata.entity_type = "DOMAIN_NAME"
        $i.graph.entity.hostname = $hostname

    match:
        $src, $hostname over 1d

    outcome:
        $risk_score = max(75)

    condition:
        $e and $i
}

When run, the above rule generates 10,000 detections over the last week against ACME’s CTI feed, which is a lot of detections to triage.

Let’s add domain prevalence into the detection rule to demonstrate how that can filter out common activity to find potential unusual or never before accessed domains.

rule network_prevalence_uncommon_domain_ioc_match {
    meta:
        author = "Security Analyst Diaries"
        description = "Lookup Network DNS queries against Entity Graph for low prevalence domains with a matching IOC entry."
        severity = "MEDIUM"

    events:
        $e.metadata.event_type = "NETWORK_DNS"
        $e.network.dns.questions.name = $hostname
        $e.principal.ip = $src

        // only match FQDNs, e.g., exclude chrome dns access tests and other internal hosts
        $e.network.dns.questions.name = /(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]/

        // prevalence entity graph lookup
        $p.graph.metadata.source_type = "DERIVED_CONTEXT"
        $p.graph.metadata.entity_type = "DOMAIN_NAME"
        $p.graph.entity.domain.name = $hostname

        // >>> START PREVALENCE TUNING

        // day_max: max prevalence SCORE in a day interval window.
        $p.graph.entity.domain.prevalence.day_max <= 5

        // day_max_sub_domains: max prevalence SCORE in a day interval window across subdomains.
        $p.graph.entity.domain.prevalence.day_max_sub_domains <= 5

        // rolling_max: maximum number of ASSETS per day accessing the resource over the trailing day_count days.
        $p.graph.entity.domain.prevalence.rolling_max <= 1

        // rolling_max_sub_domains: maximum number of ASSETS per day accessing the domain along with sub-domains over the trailing day_count days.
        // field is only valid for domains
        $p.graph.entity.domain.prevalence.rolling_max_sub_domains <= 1

        // >>> END PREVALENCE TUNING

        // ioc entity graph lookup
        $i.graph.metadata.vendor_name = "ACME CTI"
        $i.graph.metadata.entity_type = "DOMAIN_NAME"
        $i.graph.entity.hostname = $hostname

    match:
        $src, $hostname over 1d

    outcome:
        $risk_score = max(75)
        $day_max = max($p.graph.entity.domain.prevalence.day_max)
        $day_max_sub_domains = max($p.graph.entity.domain.prevalence.day_max_sub_domains)
        $rolling_max = max($p.graph.entity.domain.prevalence.rolling_max)
        $rolling_max_sub_domains = max($p.graph.entity.domain.prevalence.rolling_max_sub_domains)

    condition:
        $e and $p and $i
}

By applying an additional join against the Chronicle Entity Graph (the $p variable block), looking for domains with a prevalence score of 5 or lower that were accessed by only one asset over the interval (remember, the interval duration for the prevalence score is 10 days), the rule now returns only 22 detections for the same time range, covering two unique domains.

Example of the updated detection rule using prevalence to apply an additional level of filtering, with values blanked out as the matches were against malicious domains.

You may ask: why not use the prevalence outcome variables in the condition section? This comes down to personal preference; applying the same logic as the $p variable block within the outcome and condition sections would be a perfectly valid approach.

For reference, this is an example of the entity graph derived context record that represents domain prevalence:

metadata.product_entity_id = "bad.domain.tld"
metadata.collected_timestamp = "2022-09-14T00:00:00Z"
metadata.entity_type = "DOMAIN_NAME"
metadata.interval.start_time = "2022-09-14T00:00:00Z"
metadata.interval.end_time = "2022-09-14T23:59:59Z"
metadata.source_type = "DERIVED_CONTEXT"
entity.hostname = "bad.domain.tld"
entity.domain.name = "bad.domain.tld"
entity.domain.prevalence.day_count = 10
entity.domain.prevalence.day_max = 3
entity.domain.prevalence.day_max_sub_domains = 3

As part of your detection engineering activities you can consider alternatives to the above example, e.g., inverting the logic to find multiple hosts accessing an unusual domain with an IOC match, or finding IOC matches against sub-domains.
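As an illustrative sketch of that inverted logic (not from the original post — the rule name, thresholds, and the count_distinct outcome are assumptions you would tune for your environment), the rule could alert when more than one host resolves a still-uncommon IOC domain:

```
// Hedged sketch: alert when MULTIPLE hosts resolve a low prevalence IOC domain,
// which may indicate a campaign rather than a one-off lookup.
rule many_hosts_low_prevalence_domain_ioc_match {
    meta:
        author = "Security Analyst Diaries"
        description = "Illustrative variant: multiple hosts resolving an unusual IOC domain."
        severity = "HIGH"

    events:
        $e.metadata.event_type = "NETWORK_DNS"
        $e.network.dns.questions.name = $hostname
        $e.principal.ip = $src

        // still an uncommon domain overall (threshold is illustrative)
        $p.graph.metadata.source_type = "DERIVED_CONTEXT"
        $p.graph.metadata.entity_type = "DOMAIN_NAME"
        $p.graph.entity.domain.name = $hostname
        $p.graph.entity.domain.prevalence.rolling_max <= 3

        $i.graph.metadata.vendor_name = "ACME CTI"
        $i.graph.metadata.entity_type = "DOMAIN_NAME"
        $i.graph.entity.hostname = $hostname

    match:
        $hostname over 1d

    outcome:
        $risk_score = max(90)
        // ...but resolved by more than one distinct source in the match window
        $distinct_hosts = count_distinct($e.principal.ip)

    condition:
        $e and $p and $i and $distinct_hosts > 1
}
```

Note the match section now groups on $hostname alone, so the outcome aggregation can count distinct source IPs per domain.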

The other consideration is that the above rule is based on DNS activity, but you can update it, or create variant rules, to evaluate NETWORK_HTTP events from your proxy, NDR, or EDR (and use unclassified domain categories as an additional filter).
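A minimal sketch of the event criteria for such a variant, assuming the log source populates target.hostname with the requested domain (only the event section changes; the $p and $i blocks stay as in the rule above):

```
// Hedged sketch: swap the DNS event criteria for NETWORK_HTTP, e.g., proxy logs.
// Assumes the parser maps the requested domain to target.hostname.
events:
    $e.metadata.event_type = "NETWORK_HTTP"
    $e.target.hostname = $hostname
    $e.principal.ip = $src
```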

Prevalence isn’t just for domains

The keen-eyed reader may recall that prevalence is available not only for domains and DNS, but also for files, or more specifically hashes.

In the same way derived context is generated for domains, a baseline of activity specific to your organization is generated from file activity, specifically for hash (SHA256) activity.

Let’s start with the original YARA-L detection looking for IOC hash matches against EDR process activity:

rule ioc_file_hash_shas256_match_against_ioc {

 meta:
   author = "Security Analyst Diaries"
   description = "Lookup Process Launch activity against Entity Graph IOC matches."
   severity = "LOW"

 events:
   $e.metadata.event_type = "PROCESS_LAUNCH"
   $e.metadata.vendor_name = "ACME EDR"
   $e.metadata.product_name = "Process Events"
   $e.target.process.file.sha256 = $hash
   $e.principal.hostname = $host

   $p.graph.metadata.source_type = "DERIVED_CONTEXT"
   $p.graph.metadata.entity_type = "FILE"
   $p.graph.entity.file.sha256 = $hash

   $i.graph.metadata.vendor_name = "ACME CTI"
   $i.graph.metadata.entity_type = "FILE"
   $i.graph.entity.file.sha256 = $hash

 match:
   $host, $hash over 1h

 condition:
   $e and $p and $i
}

Let’s add file prevalence to the rule to evaluate whether we can reduce noisy IOCs and detect never-before-seen hash activity.

rule file_prevalence_uncommon_hash_ioc_match {
    meta:
        author = "Security Analyst Diaries"
        description = "Lookup Process Launch activity against Entity Graph for low prevalence hashes with a matching IOC entry."
        severity = "MEDIUM"

    events:
        $e.metadata.event_type = "PROCESS_LAUNCH"
        $e.metadata.vendor_name = "ACME EDR"
        $e.metadata.product_name = "Process Events"
        $e.target.process.file.sha256 = $hash
        $e.principal.hostname = $host

        $p.graph.metadata.source_type = "DERIVED_CONTEXT"
        $p.graph.metadata.entity_type = "FILE"
        $p.graph.entity.file.sha256 = $hash

        // >>> START PREVALENCE TUNING
        $p.graph.entity.file.prevalence.day_max <= 10
        $p.graph.entity.file.prevalence.day_max_sub_domains <= 10
        $p.graph.entity.file.prevalence.rolling_max <= 1
        $p.graph.entity.file.prevalence.rolling_max_sub_domains <= 1
        // >>> END PREVALENCE TUNING

        $i.graph.metadata.vendor_name = "ACME CTI"
        $i.graph.metadata.entity_type = "FILE"
        $i.graph.entity.file.sha256 = $hash

    match:
        $host, $hash over 1h

    outcome:
        $risk_score = max(75)
        $day_max = max($p.graph.entity.file.prevalence.day_max)
        $day_max_sub_domains = max($p.graph.entity.file.prevalence.day_max_sub_domains)
        $rolling_max = max($p.graph.entity.file.prevalence.rolling_max)
        $rolling_max_sub_domains = max($p.graph.entity.file.prevalence.rolling_max_sub_domains)

    condition:
        $e and $p and $i
}

Similar to adding prevalence to our domain IOC match, applying prevalence to file hash activity can help reduce noise and find first-seen hashes. For reference, this is how a file (hash) prevalence entity graph record looks:

metadata.product_entity_id = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
metadata.collected_timestamp = "2022-09-14T00:00:00Z"
metadata.entity_type = "FILE"
metadata.interval.start_time = "2022-09-14T00:00:00Z"
metadata.interval.end_time = "2022-09-14T23:59:59Z"
metadata.source_type = "DERIVED_CONTEXT"
entity.file.sha256 = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
entity.file.prevalence.rolling_max = 4
entity.file.prevalence.day_count = 10
entity.file.prevalence.rolling_max_sub_domains = 4
entity.file.prevalence.day_max = 4
entity.file.prevalence.day_max_sub_domains = 4

The above rule examples can be expanded beyond just process launch by using more powerful EDR logging for activities such as file creation or image loading, e.g., to detect when a file is created but not necessarily run; in other words, to detect potentially malicious binaries and libraries before they can be executed.
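As an illustrative sketch (not from the original post), the event criteria could be adjusted for file creation events, assuming your EDR maps them to the UDM FILE_CREATION event type and populates target.file.sha256 (only the event section changes; the $p and $i blocks stay as in the rule above):

```
// Hedged sketch: match low prevalence IOC hashes at file creation time,
// before the binary is ever executed. Field mappings depend on your EDR parser.
events:
    $e.metadata.event_type = "FILE_CREATION"
    $e.metadata.vendor_name = "ACME EDR"
    $e.target.file.sha256 = $hash
    $e.principal.hostname = $host
```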

SQL queries and how to use them

Prevalence is not supported in UDM Search (as the entity graph is not supported in UDM Search), but it can be viewed via the Detection Results view, i.e., when viewing the results of a detection rule.

To utilize prevalence, either use the Detection Engine or Chronicle’s data lake, aka BigQuery. For example, to find prevalence statistics for derived context sources, you could run a SQL query as follows:

SELECT
  l.enum_name AS source_type,
  e.entity.hostname,
  e.entity.domain.prevalence.day_max,
  e.entity.domain.prevalence.day_max_sub_domains,
  e.entity.domain.prevalence.rolling_max,
  e.entity.domain.prevalence.rolling_max_sub_domains,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds)), DAY) AS start_day,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.interval.end_time.seconds)), DAY) AS end_day
FROM `datalake.entity_graph` e
JOIN `datalake.entity_enum_value_to_name_mapping` AS l
  ON e.metadata.source_type = l.enum_value
WHERE DATE(_PARTITIONTIME) > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
  AND REGEXP_CONTAINS(entity.hostname, 'google\\.com$')
  AND l.field_path = "backstory.EntityMetadata.SourceType"
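A hypothetical variant of the same query for file (hash) prevalence — swapping the domain fields for the entity.file.prevalence fields shown in the record earlier, and filtering on rows that carry a hash rather than on a hostname pattern — could look like this:

```
SELECT
  e.entity.file.sha256,
  e.entity.file.prevalence.day_max,
  e.entity.file.prevalence.rolling_max,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds)), DAY) AS start_day
FROM `datalake.entity_graph` e
JOIN `datalake.entity_enum_value_to_name_mapping` AS l
  ON e.metadata.source_type = l.enum_value
WHERE DATE(_PARTITIONTIME) > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
  -- keep only derived context rows that carry a file hash and its prevalence fields
  AND e.entity.file.sha256 IS NOT NULL
  AND l.field_path = "backstory.EntityMetadata.SourceType"
```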

Summary

Prevalence, no longer just in Asset View, is now a fully featured part of UDM. Using derived context and applying a statistical baseline of activity specific to your organization can really help SOC teams find the proverbial needle in the haystack. We look forward to hearing from customers and working towards the goal of securing the enterprise at scale with these enrichments and use cases.

To learn more about these capabilities, contact your Google Cloud Platform sales or CSM team. You can also learn more about all these new capabilities in Google Chronicle in our product documentation.

Don't forget to look at the video podcast of this entry. We look forward to sharing another story in the next Security Analyst Diary.
