Threat Hunting Reference Guide
The Hunting Maturity Model
The Hunting Maturity Model (HMM), developed by Sqrrl’s security technologist and hunter David Bianco, describes five levels of organizational hunting capability, ranging from HM0 (the least capable) to HM4 (the most capable). Each level of maturity corresponds to an increasingly advanced framework for hunting that an organization or security operations center (SOC) can carry out, based on the quality and quantity of the data it collects and its ability to repeat hunts, create new hunts, and automate its hunting process.
Analysts and SOC managers can use the HMM both to measure their current maturity and to chart a roadmap for improvement.
The Hunting Loop
The hunting loop outlines a process for threat hunting. As a loop, it is specifically meant to be repeated continually. Hunters create hypotheses to drive their investigations, which are then carried out via tools and techniques. Over the course of an investigation, hunters look for specific patterns or Tactics, Techniques, and Procedures (TTPs) that might inform them of potential compromises. If a TTP is identified, a hunter will document it and export it to a Threat Intelligence Platform or other systems. The analyst will also update or create new analytics to ensure that the next time a similar attack occurs it will be discovered automatically and a hunt will not be necessary.
The Kill Chain
The kill chain is a framework that models the process attackers typically follow to carry out an attack, from the initial phase of planning and reconnaissance to acting on their objectives. Hunters often use the kill chain to orient themselves against a potential adversary. Many hunt-planning strategies begin hypothesis creation at the end of the kill chain, where an organization is most vulnerable, and work backward up the kill chain toward earlier behavior that is less immediately threatening but still malicious.
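The backward-working planning strategy can be sketched as a simple ordering of a hypothesis backlog. This is an illustration only: it assumes the seven-phase Lockheed Martin Cyber Kill Chain (the guide does not name a specific kill chain), and the phase keys, the `order_hypotheses` helper, and the sample hypotheses are all hypothetical.

```python
# Assumption: the seven phases of the Lockheed Martin Cyber Kill Chain, in attack order.
KILL_CHAIN = [
    "reconnaissance",
    "weaponization",
    "delivery",
    "exploitation",
    "installation",
    "command_and_control",
    "actions_on_objectives",
]


def order_hypotheses(hypotheses):
    """Sort (phase, hypothesis) pairs so the latest kill-chain phase comes first."""
    return sorted(hypotheses, key=lambda h: KILL_CHAIN.index(h[0]), reverse=True)


if __name__ == "__main__":
    backlog = [  # hypothetical hypotheses
        ("delivery", "Phishing attachments opened by finance staff"),
        ("actions_on_objectives", "Large outbound transfers from file servers"),
        ("command_and_control", "Periodic beaconing to rare external domains"),
    ]
    for phase, text in order_hypotheses(backlog):
        print(f"{phase:24} {text}")
```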
Clustering is a statistical technique that separates groups (or clusters) of similar data points out of a larger data set based on certain characteristics. Hunters use clustering for many applications, including outlier detection, because it can find aggregate behaviors, such as an uncommon number of instances of a common occurrence.
This technique is most effective when dealing with a large group of data points that do not explicitly share behavioral characteristics.
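The guide does not name a clustering algorithm, so the sketch below swaps in one of the simplest: single-linkage clustering with a distance threshold, where any two points within `eps` of each other land in the same cluster (roughly DBSCAN with a minimum cluster size of one). Very small clusters are then treated as candidate outliers. The per-host features (logons per day, distinct processes) and the `eps` value are hypothetical; real data would need scaling and a tuned threshold.

```python
import math
from collections import Counter


def cluster(points, eps):
    """Label each point with a cluster id; points within eps of each other
    (directly or through a chain of neighbors) share a cluster. O(n^2)."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def small_clusters(labels, max_size):
    """Return ids of clusters with at most max_size members (outlier candidates)."""
    sizes = Counter(labels)
    return sorted(c for c, n in sizes.items() if n <= max_size)


if __name__ == "__main__":
    # Hypothetical per-host features: (logons per day, distinct processes).
    hosts = ["ws-01", "ws-02", "ws-03", "ws-04", "ws-05"]
    points = [(10, 50), (11, 52), (9, 49), (12, 51), (40, 200)]
    labels = cluster(points, eps=5)
    for c in small_clusters(labels, max_size=1):
        print("outlier cluster:", [h for h, l in zip(hosts, labels) if l == c])
```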
Grouping consists of taking a set of multiple unique artifacts and identifying when several of them appear together based on certain criteria. The major difference between grouping and clustering is that in grouping, your input is an explicit set of items that are each already of interest. Groups discovered within these items of interest may represent a tool or TTP that an attacker is using. An important part of this technique is choosing the specific criteria used to group the items, such as events occurring within a specific time window.
This technique works best when you are hunting for multiple related instances of unique artifacts, for example, isolating specific reconnaissance commands that were executed within a specific timeframe.
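A minimal sketch of grouping with a time-window criterion: given an explicit set of commands of interest, it reports each host where at least `min_distinct` different ones ran within `window` of the first in the run. The command set, event tuple layout, and thresholds are assumptions chosen for illustration, not a recommended detection.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical explicit set of items of interest (reconnaissance commands).
RECON_COMMANDS = {"whoami", "ipconfig", "net user", "net group", "systeminfo"}


def find_groups(events, items, window, min_distinct):
    """events: iterable of (host, timestamp, command).
    Returns (host, window_start, sorted_commands) for each grouped run."""
    by_host = defaultdict(list)
    for host, ts, cmd in events:
        if cmd in items:  # grouping starts from items already of interest
            by_host[host].append((ts, cmd))
    groups = []
    for host, hits in sorted(by_host.items()):
        hits.sort()
        i = 0
        while i < len(hits):
            start = hits[i][0]
            j = i
            while j < len(hits) and hits[j][0] - start <= window:
                j += 1
            cmds = {c for _, c in hits[i:j]}
            if len(cmds) >= min_distinct:
                groups.append((host, start, sorted(cmds)))
                i = j  # skip past this window so overlaps are not reported twice
            else:
                i += 1
    return groups


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0, 0)  # made-up sample data
    events = [
        ("ws-01", t0, "whoami"),
        ("ws-01", t0 + timedelta(seconds=20), "ipconfig"),
        ("ws-01", t0 + timedelta(seconds=45), "net user"),
        ("ws-02", t0, "whoami"),
        ("ws-02", t0 + timedelta(hours=3), "ipconfig"),
    ]
    print(find_groups(events, RECON_COMMANDS, timedelta(minutes=5), 3))
```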
Searching, the simplest method of hunting, consists of querying data for specific artifacts and can be performed in most tools. Unfortunately, it is not always the most effective method because it cannot produce outliers in the result set: you get exactly the results you searched for.
Searching also requires finely defined search criteria to prevent result overload. A search that is too broad will often flood an analyst with more results than they can realistically process.
There are a number of specific factors to keep in mind when carrying out a search:
- Searching too broadly for general artifacts may produce far too many results to be useful
- Searching too specifically for artifacts on specific hosts may produce too few results to be useful
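The trade-off between broad and narrow searches can be sketched as exact-match filtering over records, optionally scoped to a set of hosts. The record fields (`host`, `process`, `user`) and the `search` helper are hypothetical; real tools use their own query languages.

```python
def search(records, scope_hosts=None, **criteria):
    """Return records whose fields exactly match every criterion, optionally
    restricted to hosts in scope_hosts. No outliers: only exact matches come back."""
    return [
        r
        for r in records
        if all(r.get(field) == value for field, value in criteria.items())
        and (scope_hosts is None or r.get("host") in scope_hosts)
    ]


if __name__ == "__main__":
    records = [  # hypothetical process records
        {"host": "ws-01", "process": "powershell.exe", "user": "alice"},
        {"host": "ws-02", "process": "powershell.exe", "user": "bob"},
        {"host": "srv-01", "process": "cmd.exe", "user": "svc"},
    ]
    print(len(search(records, process="powershell.exe")))  # broader search
    print(len(search(records, scope_hosts={"ws-02"}, process="powershell.exe")))  # narrower
```

Widening the criteria (or dropping `scope_hosts`) grows the result set; tightening it shrinks the set and risks missing relevant activity.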
Stack counting, also known as stacking, is one of the most common techniques hunters use to investigate a hypothesis. Stacking involves counting the number of occurrences of each value of a particular type and analyzing the outliers or extremes of those results.
Broadly speaking, this technique loses effectiveness with large and/or diverse data sets; it is most effective with thoughtfully filtered input (endpoints of a similar function, organizational unit, etc.). Analysts should try to understand the input well enough to predict the volume of the output. For example, in an enterprise of 100k endpoints, stack counting the contents of the Windows\Temp\ folder on every endpoint produces an enormous result set.
Stacking is best used with data sets that produce a finite number of results. For example, stacking destination ports seen in connection metadata is often effective. There are a finite number of ports (65,536), and most connections use non-ephemeral destination ports (the definition of the ephemeral port range varies, but generally any port below 1025 is considered non-ephemeral). This means that we can expect at most 65,536 data points in our result set and, because non-ephemeral ports dominate, will likely see far fewer than that.
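Stacking destination ports reduces to counting values and sorting the counts so the rare tail, where hunters usually look first, comes out on top. A minimal sketch; the port list stands in for the destination-port field of real connection metadata and is made up.

```python
from collections import Counter


def stack(values):
    """Count occurrences of each value and return (value, count) pairs, rarest first."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up destination ports from connection records.
    ports = [443] * 50 + [80] * 20 + [53] * 30 + [4444, 8443, 8443]
    for port, count in stack(ports):
        print(f"{port:>5} {count}")
```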
Stacking across multiple unique hosts is most effective with filtered input, and friendly intelligence (knowledge of your own environment) can be used to define these filters. When using this technique, count the number of command artifact executions by hostname or account based on specific criteria. An analyst might want to track factors relevant to their hunting hypotheses, such as C-suite laptops, HR workstations, and non-admin users.
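A sketch of filtered, per-host stacking: only hosts that pass a host filter (here a hypothetical set of HR workstations standing in for friendly intelligence) and commands that pass a command filter are counted. The record layout and both filters are assumptions for illustration.

```python
from collections import Counter

# Hypothetical friendly intelligence: hosts known to be HR workstations.
HR_WORKSTATIONS = {"hr-01", "hr-02", "hr-03"}


def stack_by_host(executions, host_filter, command_filter):
    """Count matching command executions per host, most frequent first.
    Hosts with zero matching executions do not appear in the output."""
    counts = Counter(
        e["host"]
        for e in executions
        if host_filter(e["host"]) and command_filter(e["command"])
    )
    return counts.most_common()


if __name__ == "__main__":
    executions = [  # made-up command execution records
        {"host": "hr-01", "command": "net user /domain"},
        {"host": "hr-01", "command": 'net group "domain admins" /domain'},
        {"host": "hr-02", "command": "outlook.exe"},
        {"host": "dev-07", "command": "net user /domain"},
    ]
    print(stack_by_host(
        executions,
        host_filter=lambda h: h in HR_WORKSTATIONS,
        command_filter=lambda c: c.startswith("net "),
    ))
```

Because HR workstations rarely run `net` commands, even a small count on one of them is worth a look, while the same command on a developer host is filtered out as expected noise.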
Visualization is an extremely broad category of hunting techniques: many types of visualizations can be used to analyze data and determine patterns. In general, visualizations represent data in novel ways that bring the information an analyst is interested in to the fore. Several of the more common methods of visualization are described below.
Box plots are a simple way to visualize the variety of activity across entities. With box plots, you group entities by type or function (which can be as simple as plotting a single type of entity), and the analyst then investigates the top outliers to identify anomalies. Box plots bring the concept of clustering to the visual realm: they visually describe the distribution of data. The box spans the middle 50% of values (the interquartile range) and is divided by a line at the median, whiskers extend toward the high and low values, and points that fall beyond the whiskers are plotted individually as outliers.
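The numbers behind a box plot can be computed directly, which is handy for listing the outliers a plot would draw. This sketch assumes Tukey's common convention that points more than 1.5 × IQR beyond the quartiles are outliers; plotting libraries may use other whisker rules. The per-host logon counts are made up.

```python
import statistics


def box_stats(values, k=1.5):
    """Box-plot statistics with Tukey fences at k * IQR beyond the quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # whiskers stop at the most extreme in-fence points
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


if __name__ == "__main__":
    logons_per_host = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up data
    print(box_stats(logons_per_host))
```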
Sparklines are simple line charts that show the trend of a specific value and its changes, usually over time. They are useful as an aid when performing textual analysis on a data set, especially when looking for spikes or upticks in specific activity.
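Because sparklines are meant to sit alongside textual analysis, a text-only version made of Unicode block characters is often enough. A minimal sketch; the hourly DNS query counts are made up.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Scale each value linearly between min and max onto one of eight bar heights."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # flat series: draw it at the lowest level
        return BARS[0] * len(values)
    return "".join(BARS[round((v - lo) * (len(BARS) - 1) / span)] for v in values)


if __name__ == "__main__":
    hourly_dns_queries = [3, 4, 3, 5, 4, 40, 4, 3]  # made-up data with one spike
    print(sparkline(hourly_dns_queries))
```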
Heat maps are visual representations of the distribution of two or more different types of data, laid out in a matrix and assigned different colors based on value. They are useful for visualizing distinct groups in a data set. Grouped data (in the form of numbers) is assigned to zones along the X and Y axes, forming a grid, and a color is applied to each zone in the grid. Outliers stand out as different colors in the heat map (the colors generally range from a cool blue to a dark red). In hunting, this type of visualization is useful when the input data set is well defined and outliers along both the X and Y axes are of interest.
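Most of the work in a heat map is building the count grid; rendering is secondary. This sketch counts (host, hour-of-day) pairs into a matrix and renders it with text shading characters standing in for the blue-to-red color scale. The hosts, hours, and event data are hypothetical; a plotting library such as matplotlib (`imshow`) could draw the same matrix in color.

```python
from collections import Counter

SHADES = " .:-=+*#%@"  # light to heavy, standing in for cool-to-hot colors


def heat_matrix(events, rows, cols):
    """Count (row, col) pairs, e.g. (host, hour of day), into a rows x cols grid."""
    counts = Counter(events)
    return [[counts[(r, c)] for c in cols] for r in rows]


def render(matrix, row_labels):
    """Map each cell to a shade proportional to the largest cell in the grid."""
    peak = max(max(row) for row in matrix) or 1
    lines = []
    for label, row in zip(row_labels, matrix):
        cells = "".join(SHADES[round(v * (len(SHADES) - 1) / peak)] for v in row)
        lines.append(f"{label:>8} |{cells}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up logon events as (host, hour); ws-02 is busy at 03:00.
    events = [("ws-01", 9)] * 5 + [("ws-01", 10)] * 4 + [("ws-02", 9)] * 5 + [("ws-02", 3)] * 9
    hosts, hours = ["ws-01", "ws-02"], list(range(24))
    print(render(heat_matrix(events, hosts, hours), hosts))
```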
Machine Learning Techniques
Supervised machine learning uses labeled training data (data whose correct labels are already known and that is used to train the model) to make predictions about unlabeled data. This new, unclassified data is what you want the model to label correctly, based on what it learned from the training data. One specific supervised classification algorithm is known as a random forest.
Many machine learning algorithms create decision trees based on features, or predictors, identified in training data, and then apply those trees to new data to determine an answer. In a random forest, random subsamples are taken from the rows of the training data and from its columns (features), and each random subsample is used to build an individual decision tree. This random subsampling is performed many times (which creates many trees, hence the “forest”). Classification is based on the results from all of the trees: each tree produces an answer, and the majority answer is used. Combining many randomized trees makes the model more robust than any single tree, mitigating the overfitting and bias that a single tree built from all of the data can exhibit.
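To make the mechanics concrete, here is a deliberately tiny, dependency-free random forest: bootstrap-sampled rows, a random subset of columns considered at each split (as in the standard random forest formulation), Gini-impurity decision trees, and majority voting. It is a teaching sketch, not a production model; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used. The two features (outbound kilobytes, distinct destination ports) and the labels are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, depth, max_depth, n_features, rng):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    # Consider only a random subset of columns at this split.
    features = rng.sample(range(len(rows[0])), n_features)
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # sampled columns cannot split these rows
        return majority(labels)
    _, f, t = best
    left_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    right_idx = [i for i, row in enumerate(rows) if row[f] > t]

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, max_depth=4, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the majority answer wins."""
    return majority([predict_tree(tree, row) for tree in forest])


if __name__ == "__main__":
    # Hypothetical features: (outbound KB, distinct destination ports).
    rows = [[2, 2], [3, 3], [4, 2], [3, 4], [90, 40], [120, 60], [80, 45], [110, 55]]
    labels = ["benign"] * 4 + ["malicious"] * 4
    forest = train_forest(rows, labels)
    print(predict_forest(forest, [1, 1]), predict_forest(forest, [100, 50]))
```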
Process execution data – Contains information on processes run on specific hosts. Critical metadata associated with process execution includes command-line commands/arguments and process filenames and IDs.
Registry access data – Contains detailed logs of access to files and registry objects.
File data – Information on files and artifacts stored on a local host. This can include when files were created or modified, as well as their size, type, and storage location.
Network session data – Contains information on network connections between hosts. Critical metadata associated with network connections includes the source IP address, destination IP address, destination port, start time of the connection, and end time or duration of the connection. Includes NetFlow, IPFIX, and similar data sources.
Bro logs – Logs from Bro (now Zeek), a network monitoring tool that collects connection-based flow data and application protocol metadata (HTTP, DNS, SMTP).
Proxy logs – HTTP data that contains information on outgoing web requests, including the Internet resources that internal clients are accessing.
DNS logs – Information on domain name resolutions by network hosts that can be used to correlate domain resolution requests with internal clients.
Firewall logs – Connection data that contains information on network traffic at the border of a network, including blocked connections.
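Bro/Zeek writes its logs (such as conn.log) by default as tab-separated text with `#`-prefixed header lines, including a `#fields` line that names the columns. A minimal reader, assuming the default tab separator and `-` as the unset-field marker, and ignoring other header options such as the `(empty)` marker and JSON output. The sample lines are made up.

```python
def parse_zeek_tsv(lines):
    """Yield one dict per record from a Zeek/Bro TSV log such as conn.log.
    Assumes tab separators and "-" for unset fields (the defaults)."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the "#fields" tag
        elif not line or line.startswith("#"):
            continue  # other header/footer lines: #separator, #types, #close, ...
        elif fields is not None:
            values = line.split("\t")
            yield {name: (None if value == "-" else value)
                   for name, value in zip(fields, values)}


if __name__ == "__main__":
    sample = [  # made-up conn.log excerpt with a reduced set of columns
        "#separator \\x09",
        "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice",
        "1700000000.000000\tCabc123\t10.0.0.5\t49732\t93.184.216.34\t443\ttcp\tssl",
        "1700000002.000000\tCghi789\t10.0.0.9\t50112\t203.0.113.7\t4444\ttcp\t-",
        "#close\t2023-11-14-22-13-22",
    ]
    for record in parse_zeek_tsv(sample):
        print(record["id.orig_h"], "->", record["id.resp_h"], record["id.resp_p"])
```

Passing an open conn.log file object instead of the sample list works the same way, and counting the `id.resp_p` values yields the destination-port stack discussed in the stacking section.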