New to Chronicle: Unified data model

John Stoner

Principal Security Strategist, Google Cloud

August 1, 2022

This is the first post from Google Cloud Principal Security Strategist John Stoner in his deep-dive "New to Chronicle" series, which helps security teams that are either new to SIEM or replacing their SIEM with Chronicle.

At Google, our goal is to provide insight and guidance to analysts who are using Chronicle on an everyday basis. However, there are lots of analysts out there. Some are new to security operations as a whole, while others have been in security operations for years but are new to Chronicle. This blog series' goal is to aid any analyst who is new to Chronicle because I, like many of you, am new to Chronicle too!

I’m John Stoner and I recently joined Google as a principal security strategist focusing on developing Chronicle content that helps analysts be as effective as they can be in the face of a continually evolving threatscape. I have worked with other leading SIEMs over many years, so I bring an experienced perspective, but Chronicle is new to me. So let’s learn some fundamentals together!

If you are an analyst new to Chronicle, one of the first things to understand is that you can search for events with either a raw search or a structured search. Anyone who has ever searched millions of event logs knows that a structured search returns results faster and with greater precision because of the specificity in the search. Yes, time is an important factor, and bounding your search with time ranges is essential. But that is beyond the scope of this post.

Chronicle uses the unified data model (UDM) schema on the events it collects. You may have worked with schemas that are flat with 400+ fields, while others may have worked with schemas that break out subsets of data into summary tables. UDM has flexibility built in to accommodate a tremendous number of fields to describe an event. It maintains its nimbleness to effectively handle an event that is focused on an endpoint process as easily as it handles a network communication event without wasting space with fixed fields or multiple tables that contain redundant information. Let’s look at an example.

UDM events are made up of multiple sections. The first section, which will be found in every UDM event, is the metadata section. It provides a basic description of the event. This includes information like the timestamp when the event occurred and the timestamp when it was ingested into Chronicle. It includes product information, version, and description. The parser at ingestion also will establish an event type that is predefined and agnostic of the specific product logging. With just the fields in this metadata section of UDM, you can quickly start searching the data.

metadata.event_type = "USER_LOGIN" AND metadata.product_event_type = "4624"

In this example search, you can see with a field or two how you can generate a listing of Microsoft Windows 4624 successful login events, along with visibility into when the events were generated. Now, you might be wondering why I used a search with a vendor agnostic event type and a vendor specific event code. (I know my editors did!)

There is much more to unlock as you go into UDM, but I wanted to start with a search that just used the fields in the metadata section. As you go further into UDM, you will have more fields at your disposal to craft more vendor-agnostic searches that allow you to search across different solutions for all successful logins, like this:

metadata.event_type = "USER_LOGIN" AND security_result.action = "ALLOW" AND target.user.userid != "SYSTEM" AND target.user.userid != /.*\$/
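To make the last two conditions of that search concrete, here is a minimal Python sketch of which userids they would keep. It reuses the same pattern, which drops values ending in a dollar sign (the convention for Windows computer accounts). The sample userids are made up, and treating the regular expression as a match against the whole value is an assumption for illustration, not a statement about how Chronicle evaluates it.

```python
import re

# Pattern copied from the search above: a userid that ends in a dollar sign.
MACHINE_ACCOUNT = re.compile(r".*\$")


def is_interactive_user(userid: str) -> bool:
    """Mirror the two exclusions: not SYSTEM, and not ending in '$'."""
    if userid == "SYSTEM":
        return False
    # Assumption: the pattern is applied to the whole field value.
    return MACHINE_ACCOUNT.fullmatch(userid) is None


# Hypothetical userids, for illustration only.
sample_userids = ["fkolzig", "SYSTEM", "WIN-SERVER01$", "shasek"]
kept = [u for u in sample_userids if is_interactive_user(u)]
print(kept)  # ['fkolzig', 'shasek']
```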

But before I get too far ahead of myself, let’s keep digging into UDM. In addition to the metadata section, a series of sections describe additional aspects of the event. If a section is not needed to describe the event, it isn’t included, thus saving space. For example, the principal section contains fields representing the entity that originates the activity in the event. Sections that reference the source (src) and destination (target) are also included. If there are systems that events pass through, like a proxy server or SMTP relay, their information would be in the intermediary section, and if you deploy a packet sniffer to passively watch data, its information could be seen in the observer section.
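One way to picture this layout is as a nested structure in which unused sections are simply absent. The Python sketch below is an illustration only: the event is hypothetical and heavily trimmed, the values are made up, and get_field is a helper invented here, not part of any Chronicle tooling.

```python
from typing import Any, Optional

# Hypothetical, trimmed login event. Only the sections needed to describe
# the event are present; there is no "intermediary" or "observer" key.
event = {
    "metadata": {
        "event_type": "USER_LOGIN",
        "product_event_type": "4624",
    },
    "principal": {"hostname": "workstation-01"},  # made-up value
    "target": {"user": {"userid": "jdoe"}},       # made-up value
}


def get_field(evt: dict, path: str) -> Optional[Any]:
    """Walk a dotted path such as 'target.user.userid'; None if absent."""
    node: Any = evt
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_field(event, "metadata.event_type"))    # USER_LOGIN
print(get_field(event, "target.user.userid"))     # jdoe
print(get_field(event, "intermediary.hostname"))  # None
```

The dotted paths are the same names you type into a structured search, which is why learning the section layout pays off quickly.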

Let’s search for the userid of 'fkolzig' to determine when they successfully logged in. To do this you are going to use the target section of UDM. Within the target section (as well as the other sections mentioned above), a series of subsections and fields describe the target. For example, the target in this case is a user and has a number of associated attributes, but in other cases the target could be a file and its attributes, a registry setting, an asset and its location, and more. Since you are focusing on the user subsection of target, the field you would search for 'fkolzig' in would be target.user.userid.

metadata.event_type = "USER_LOGIN" AND metadata.product_event_type = "4624" AND target.user.userid = "fkolzig"
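Because each search is just a set of field and value conditions joined with AND, narrowing a search means appending one more condition. The following Python sketch shows that composition. build_udm_search is a hypothetical helper written for this post; it only handles quoted string equality and does no escaping of special characters.

```python
from typing import List, Tuple


def build_udm_search(conditions: List[Tuple[str, str]]) -> str:
    """Join field = "value" conditions with AND (string equality only)."""
    return " AND ".join(f'{field} = "{value}"' for field, value in conditions)


# The metadata-only search for Windows 4624 logins.
login_4624 = [
    ("metadata.event_type", "USER_LOGIN"),
    ("metadata.product_event_type", "4624"),
]
print(build_udm_search(login_4624))

# Narrow it to a single user by adding one condition from the target section.
user_search = login_4624 + [("target.user.userid", "fkolzig")]
print(build_udm_search(user_search))
```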

Let’s look at another example, but this time, you will use network data. You can search UDM for RDP events with a target.port of 3389 and a principal.ip of 35.235.240.5. Notice you have added fields from the network section, which describes the network event: in this example, the direction of the data (network.direction) and the protocol (network.ip_protocol) being used.

metadata.product_event_type = "3" AND target.port = 3389 AND network.direction = "OUTBOUND" AND principal.ip = "35.235.240.5"
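To see how the four conditions combine, here is a small Python sketch that applies them to a few hand-written events. Everything in it is illustrative: the events are invented, fields are flattened to dotted keys for brevity, and principal.ip is treated as a single value, which is a simplification.

```python
# Each condition from the search above, expressed as field -> required value.
conditions = {
    "metadata.product_event_type": "3",
    "target.port": 3389,
    "network.direction": "OUTBOUND",
    "principal.ip": "35.235.240.5",
}


def matches(event: dict, required: dict) -> bool:
    """True only if every condition holds (logical AND)."""
    return all(event.get(field) == value for field, value in required.items())


# Hypothetical events; only the first satisfies all four conditions.
events = [
    {"metadata.product_event_type": "3", "target.port": 3389,
     "network.direction": "OUTBOUND", "principal.ip": "35.235.240.5",
     "network.ip_protocol": "TCP"},
    {"metadata.product_event_type": "3", "target.port": 443,
     "network.direction": "OUTBOUND", "principal.ip": "35.235.240.5",
     "network.ip_protocol": "TCP"},
    {"metadata.product_event_type": "3", "target.port": 3389,
     "network.direction": "OUTBOUND", "principal.ip": "10.0.0.7",
     "network.ip_protocol": "TCP"},
]

hits = [e for e in events if matches(e, conditions)]
print(len(hits))  # 1
```

Dropping any one condition widens the result set, which is the same trade-off you make when loosening a structured search.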

Perhaps you want to understand which processes are created on your server. You could look for the use of the net.exe command and search for this specific file in its expected path. Applying the logic we covered earlier, the field you are searching would be target.process.file.full_path. When you run the search, you see the specific command issued in the target.process.command_line field. Additionally, you can identify the user issuing the command (principal.user.userid), as well as the MD5 hash (principal.process.file.md5) and command line (principal.process.command_line) of the principal process. You can also add a field from the about section that holds the description of Microsoft Sysmon event code 1 (ProcessCreate).

Let’s wrap this up with one more search. In this case, you are going back to the metadata.event_type of USER_LOGIN that you started with, but this time, you are going to add the field target.user.department with the value of Marketing. This value is not in any of your user login events, but it is in the LDAP data you have ingested about your users, and this is another area where UDM shines.

There is an entire entity data model within the UDM schema that provides context to UDM events automatically. In this example, the user login events contain the target.user.userid of shasek, but the fields target.user.username, target.user.email_addresses, and target.user.office_address.name are all enriched from the entity data model. To complete the view, fields are included from the authentication extension in the extensions section and from the security result section to describe these events in additional detail.

metadata.event_type = "USER_LOGIN" AND target.user.department = "Marketing"
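Conceptually, this enrichment behaves like a lookup keyed on the userid that fills in fields the raw log never carried. The Python sketch below illustrates the idea only; it is not how Chronicle implements enrichment. The context values, the email address, and fkolzig's department are made up, and department is shown as a single string for simplicity.

```python
import copy

# Hypothetical entity context (for example, from ingested LDAP data),
# keyed by userid. All values are invented for illustration.
entity_context = {
    "shasek": {"department": "Marketing",
               "email_addresses": ["shasek@example.com"]},
    "fkolzig": {"department": "Engineering",
                "email_addresses": ["fkolzig@example.com"]},
}


def enrich(event: dict, context: dict) -> dict:
    """Return a copy of the event with missing user fields filled in."""
    enriched = copy.deepcopy(event)
    user = enriched.get("target", {}).get("user", {})
    extra = context.get(user.get("userid"), {})
    for field, value in extra.items():
        user.setdefault(field, value)  # keep anything the log already had
    return enriched


# Raw login events carry only the userid.
raw_events = [
    {"metadata": {"event_type": "USER_LOGIN"},
     "target": {"user": {"userid": "shasek"}}},
    {"metadata": {"event_type": "USER_LOGIN"},
     "target": {"user": {"userid": "fkolzig"}}},
]

enriched_events = [enrich(e, entity_context) for e in raw_events]
marketing_logins = [
    e["target"]["user"]["userid"]
    for e in enriched_events
    if e["metadata"]["event_type"] == "USER_LOGIN"
    and e["target"]["user"].get("department") == "Marketing"
]
print(marketing_logins)  # ['shasek']
```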

I am going to stop here today, but I hope this has provided a better understanding of UDM and how to work with data parsed into the schema. The UDM field list is a handy reference to keep at the ready, as is the usage guide. These searches were very simple and only touched on operators and regular expressions, so there is much more available to further explore the data. The really cool thing about all of this is that the Chronicle rules engine uses these same fields, so learning the UDM fields and structure will not only make your searches easier and quicker, it will also set you up for crafting some awesome rules!
