Identifying the Data You Need to Find Answers Fast
Investigations are all about iterating through evidence that helps you make decisions about what events transpired on your network. That sounds easy enough, but asking the right questions and identifying the data you need to answer them is tricky. This problem manifests in two ways. First, not having enough of the right data means you may be unable to answer the questions that will move the investigation forward. Conversely, having too much data can overwhelm you with a tremendous number of fields and complementary evidence sources to examine. In either case, asking good questions and moving towards a conclusion quickly and accurately depends on knowing what data is available to you in any given scenario. In this post, I’ll address these concerns and discuss how Sqrrl helps you better understand your data so that you know where the gaps are and what options you have within the context of an active investigation.
Your favorite malware blogger has just published a post containing a few IP addresses associated with a recent campaign. By examining flow data, you’ve determined that a host on your network has communicated with one of the hostile IPs. Logically, you pivot to PCAP data to see if you can determine the nature of the communication, but it’s encrypted and doesn’t provide much useful information. So, what now? What other data points can you pivot to from the IP address you have?
You might be surprised at how difficult this question can be to answer when you consider every network, host, transactional, and open source data repository at your fingertips. Nearly everyone would cite firewall log data as an option, but most don’t immediately think of Windows process logs, which can show outbound connections made from processes running on a system. As shown in Figure 1, there are a lot of directions to go.
Figure 1: IP Address Pivots
PIVOTING WITH EXPANSIONS
It behooves any analyst to know what data sources are available to them so that they can ask the right questions. A strength of the Sqrrl exploration interface is its ability to dynamically increase your data awareness by providing all the pivots that are available for a specific input. This is made available through the relationship-focused data model powering Sqrrl and is configured when you define your data sources. In our scenario, instead of not knowing where to go next or accidentally missing a data source, Sqrrl will provide every pivot available to you through its expansions in a context menu.
Figure 2: Sqrrl Expansion Pivoting
Figure 2 shows an example of the pivots available to the analyst investigating the IP address shown. Because Sqrrl populates data using a graph model, each pivot has a sense of direction. This adds relevant context to the pivot. You aren’t simply pivoting from an IP Address to Windows Logs; you’re using two unique data sets to answer the question “What systems and processes have made connections to this IP address?” This moves analysis away from blind pivoting to any data source available and towards a context-rich approach for answering questions that will lead you to meaningful conclusions.
Of the various data pivots you can carry out from the sidebar, here is a list of some of my favorites:
- IP Address > Involved In > Alert
- IP Address > Connected to > IP Address
- Domain Name > Hosted On > IP Address
- MD5 > Calculated from > File Name
- File > Hosted On > IP Address
- Account > Observed On > IP Address
- MAC Address > Resolved to > IP Address
- Process > Connected to > IP Address
- Process > Hosted on > Hostname
- User Account > Authentication [Success/Failure] > Hostname
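To make the idea of directional pivots concrete, here is a minimal Python sketch of how pivots like those listed above could be indexed by entity type so that a tool can offer every expansion available for a given input. This is not Sqrrl's actual data model or API; the `PIVOTS` table, `build_expansions`, and `available_pivots` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical pivot definitions: (source type, relationship, target type).
# They mirror a few entries from the list above, not a real Sqrrl schema.
PIVOTS = [
    ("IP Address", "Involved In", "Alert"),
    ("IP Address", "Connected To", "IP Address"),
    ("Domain Name", "Hosted On", "IP Address"),
    ("Process", "Connected To", "IP Address"),
    ("Process", "Hosted On", "Hostname"),
]


def build_expansions(pivots):
    """Index pivots by entity type, remembering the direction of each edge."""
    expansions = defaultdict(list)
    for source, relationship, target in pivots:
        expansions[source].append(("out", relationship, target))
        expansions[target].append(("in", relationship, source))
    return expansions


def available_pivots(expansions, entity_type):
    """Return readable, de-duplicated pivots that touch the given entity type."""
    lines = []
    for direction, relationship, other in expansions.get(entity_type, []):
        if direction == "out":
            lines.append(f"{entity_type} > {relationship} > {other}")
        else:
            lines.append(f"{other} > {relationship} > {entity_type}")
    # A self-referencing pivot (IP to IP) appears in both directions; keep one.
    return list(dict.fromkeys(lines))


EXPANSIONS = build_expansions(PIVOTS)

for pivot in available_pivots(EXPANSIONS, "IP Address"):
    print(pivot)
```

Because each relationship is stored with its direction, asking for the pivots of an IP address surfaces both the edges that start there (alerts it is involved in) and the edges that end there (processes that connected to it), which is the kind of context a flat list of data sources does not provide.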
Analyst nirvana is the point at which the investigation tool can provide answers to questions before you even know to ask them. For example, I know that any time I am investigating an IP address like the one in the scenario I introduced earlier, I want to know if it appears on any lists provided by the intel services I subscribe to. I also want to know if anybody on my network has communicated with the IP address and if the IP appears in any existing alerts.
In a traditional SIEM, gathering all that information is probably going to require multiple queries, and I likely won’t be able to display all the returned data on the screen at the same time. This slows down the investigation and increases cognitive workload due to the extensive context switching.
Figure 3: Automated Evidence Context
Because Sqrrl is analyst-centered, this process is significantly easier. When you click an IP address in the exploration window, Sqrrl automatically queries all the data sources relevant to that piece of evidence. Data mappings are configured once, and every time you select a piece of evidence you’ll automatically be given results relevant to the investigation. This data is displayed in the slide-out sidebar, allowing you to view the results in the context of the investigation without unnecessary context switching.
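The "configure once, query on every selection" idea can be sketched in a few lines of Python. This is only an illustration under assumed names and data: `CONTEXT_MAPPINGS`, `evidence_context`, and the in-memory intel list, flow records, and alerts are all hypothetical, and the IP addresses come from documentation ranges. It does not represent how Sqrrl implements the feature.

```python
# Hypothetical in-memory data sources; every record here is made up.
INTEL_LIST = {"203.0.113.7"}
FLOW_RECORDS = [
    {"src": "10.0.0.5", "dst": "203.0.113.7"},
    {"src": "10.0.0.9", "dst": "198.51.100.2"},
]
ALERTS = [{"id": 1, "ip": "203.0.113.7", "rule": "example-rule"}]


def on_intel_list(ip):
    """Does the IP appear on a subscribed intel list?"""
    return ip in INTEL_LIST


def internal_peers(ip):
    """Which hosts have sent traffic to the IP?"""
    return sorted({r["src"] for r in FLOW_RECORDS if r["dst"] == ip})


def related_alerts(ip):
    """Which existing alerts mention the IP?"""
    return [a["id"] for a in ALERTS if a["ip"] == ip]


# Configured once: evidence type -> named lookups to run on selection.
CONTEXT_MAPPINGS = {
    "ip": {
        "intel_match": on_intel_list,
        "internal_peers": internal_peers,
        "alerts": related_alerts,
    },
}


def evidence_context(evidence_type, value):
    """Run every lookup mapped to this evidence type and collect the results."""
    lookups = CONTEXT_MAPPINGS.get(evidence_type, {})
    return {name: lookup(value) for name, lookup in lookups.items()}


print(evidence_context("ip", "203.0.113.7"))
```

A single call gathers the intel match, the internal hosts that talked to the address, and the related alerts, which is the set of questions that would otherwise take several separate queries.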
In a similar scenario, consider hunting through DNS data and finding a suspicious-looking domain name that was resolved by a friendly host. The logical next question is, “Did any host on my network communicate with the IP address this domain name resolved to?” This is another example where a traditional SIEM will likely require multiple queries to find the answer you need. In Sqrrl, a single click on the indicator provides the answer. This saves valuable time and better enables you to ask great questions, and potentially even answer them right off the bat.
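The DNS question above is really a two-hop pivot: domain name to resolved IP address, then IP address to the hosts that connected to it. A rough Python sketch, assuming simple dictionaries for DNS and flow records (the field names `query`, `answer`, `src`, and `dst`, the function name, and all sample values are hypothetical), might look like this:

```python
# Hypothetical log records; domains and IPs use reserved documentation values.
DNS_LOGS = [
    {"client": "10.0.0.5", "query": "suspicious.example", "answer": "203.0.113.7"},
    {"client": "10.0.0.8", "query": "benign.example", "answer": "198.51.100.2"},
]
FLOWS = [
    {"src": "10.0.0.5", "dst": "203.0.113.7"},
    {"src": "10.0.0.6", "dst": "203.0.113.7"},
    {"src": "10.0.0.8", "dst": "198.51.100.2"},
]


def hosts_contacting_domain(domain, dns_logs, flows):
    """Chain two pivots: domain -> resolved IPs -> hosts that connected to them."""
    resolved_ips = {r["answer"] for r in dns_logs if r["query"] == domain}
    return sorted({f["src"] for f in flows if f["dst"] in resolved_ips})


print(hosts_contacting_domain("suspicious.example", DNS_LOGS, FLOWS))
```

Note that in this sample data a second host contacted the resolved address without ever issuing the DNS query itself, which is exactly the kind of finding that is easy to miss when the two data sets are queried separately.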
The mark of a good analyst is knowing their data. However, even the most skilled analysts can become overwhelmed by the pivots available to them. While humans will always be better at making decisions based on evidence, machines can augment the analyst by helping them better understand the data available and by automating some of the data retrieval. Sqrrl’s data model is designed to allow you to describe expansions and drill-downs so that the data you are presented with is tailored to the context of the investigation at hand. This improves accuracy and reduces the cost of defending your network by lowering the amount of time required to determine the extent of malicious activity.