Threat Hunting for Suspicious File Types on the Host
In the first part of this series, I discussed how suspicious file types could lead to the discovery of malicious activity. I also discussed how to hunt for suspicious file types traversing your network using data sources like HTTP proxy events. In this article, I’ll continue our focus on hunting for suspicious file types by examining the presence and execution of files on the host. I’ll also discuss additional steps you can take to investigate suspicious file types once you’ve discovered them on your network or systems.
A file that has been downloaded onto a system will exist on disk, but you’ll primarily rely on log-based data for hunting. The complication is that merely downloading a file doesn’t always create a log on the host. Typically, only executed files generate log data in commonly accessible places such as the native operating system logs or EDR logs. There are a couple of ways to approach this from the hunter’s perspective.
Direct Execution: The first strategy is to look for directly executed processes. This will reveal executions of suspicious file types that are directly executable by a system (e.g., Windows .exe files).
Indirect Execution: Another technique for hunting suspicious file types is looking for those that indirectly execute. These are files that are not executable themselves but can execute code via another application. This includes things like documents, scripts, and archives.
The key point here is that both approaches achieve the same end result. Whether execution is direct or indirect, the attacker arrives at the same place: code execution on the host. Fortunately, each path the attacker can take is also a path you can use to hunt down their code.
Let’s go over some host-based data sources you can use to detect direct and indirect execution.
The most common place to look for process execution is the logs generated by the operating system itself. For example, Microsoft Windows systems generate a log with the event ID 4688 whenever a process is executed on a system, provided process creation auditing is enabled. You might start examining these logs by asking, “What processes were executed by my systems?”
A list of every process won’t be too helpful when examining lots of systems, but you can aggregate this data into a list of all processes and sort them by the least frequent occurrence. That’s what this query does:
SELECT COUNT(*),ProcessName FROM Sqrrl_WindowsEvents WHERE EventID = 4688 GROUP BY ProcessName ORDER BY COUNT(*) ASC LIMIT 100
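If you export process creation events instead of querying them in place, the same least-frequency aggregation can be sketched in a few lines of Python. This is only an illustration: it assumes each event is already parsed into a dictionary with EventID and ProcessName keys (mirroring the column names in the query above), and the function name and sample events are made up for the example.

```python
from collections import Counter


def least_frequent_processes(events, limit=100):
    """Return (process_name, count) pairs for 4688 events, rarest first."""
    counts = Counter(
        e["ProcessName"].lower()  # Windows paths are case-insensitive
        for e in events
        if e.get("EventID") == 4688 and e.get("ProcessName")
    )
    # Sort ascending by count, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:limit]


# Hypothetical sample data; real events would come from your log pipeline.
events = [
    {"EventID": 4688, "ProcessName": r"C:\Windows\System32\svchost.exe"},
    {"EventID": 4688, "ProcessName": r"C:\Windows\System32\svchost.exe"},
    {"EventID": 4688, "ProcessName": r"C:\Users\bob\AppData\Local\Temp\a8f3k2.exe"},
    {"EventID": 4624, "ProcessName": r"C:\Windows\System32\lsass.exe"},
]

rare = least_frequent_processes(events, limit=10)
for name, count in rare:
    print(count, name)
```

The rarest entries appear first, which is where one-off binaries running from temp or user profile directories tend to surface.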
If you’re interested in a single system, you can also just output a list of all processes by executing a more specific query like this:
SELECT Computer,EventID,IpAddress,ProcessName,TimeCreated FROM Sqrrl_WindowsEvents WHERE EventID = 4688 AND IpAddress = '10.1.2.12'
If you’re using Microsoft Sysmon to extend your security logging (and you should be), you can focus on Event ID 1, which similarly covers process executions while providing more information, including a hash of the file. If you’ve incorporated a threat feed into Sqrrl, you can configure it to automatically check this hash against its list of malicious files. If not, you can search for the hash on popular open source intelligence repositories like VirusTotal.
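As a rough sketch of the hash check outside of Sqrrl, the snippet below splits the Hashes field of a Sysmon Event ID 1 record (which Sysmon writes as comma-separated ALGORITHM=VALUE pairs) and compares each value against a local set of known-bad hashes. The function names, the known_bad set, and the placeholder hash values are assumptions for illustration; in practice the set would be loaded from whatever threat feed you use.

```python
def parse_sysmon_hashes(hashes_field):
    """Split a Sysmon 'Hashes' value such as 'MD5=...,SHA256=...' into a dict."""
    result = {}
    for part in hashes_field.split(","):
        algo, sep, value = part.partition("=")
        if sep:  # skip fragments that are not ALGORITHM=VALUE pairs
            result[algo.strip().upper()] = value.strip().lower()
    return result


def match_known_bad(event, known_bad):
    """Return the hash algorithms whose value appears in the known-bad set."""
    hashes = parse_sysmon_hashes(event.get("Hashes", ""))
    return sorted(algo for algo, value in hashes.items() if value in known_bad)


# Placeholder values only; these are not real file hashes.
event = {
    "EventID": 1,
    "Image": r"C:\Temp\x.exe",
    "Hashes": "MD5=" + "A" * 32 + ",SHA256=" + "B" * 64,
}
known_bad = {"b" * 64}

matches = match_known_bad(event, known_bad)
print(matches)
```

Normalizing both sides to lowercase avoids missing a match because one source reports hashes in uppercase hex and the other in lowercase.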
If you use an Endpoint Detection and Response (EDR) platform like Carbon Black, Tanium, or FireEye HX, you can hunt through those logs in a similar fashion. This example is similar to the aggregation of Windows Events discussed above, but uses Carbon Black data:
SELECT COUNT(*),process_name FROM CarbonBlack GROUP BY process_name ORDER BY COUNT(*) ASC LIMIT 20
With each technique, you’re looking for things that stand out. This will vary based on whether you’re looking for direct or indirect execution.
Direct Execution: You’re looking for the execution of undesirable processes.
- Any process that isn’t in your approved list of software
- Processes that are named to look like a legitimate process (svch0st.exe instead of svchost.exe)
- Processes whose names are random letters and numbers
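The last two checks in this list lend themselves to simple heuristics. The sketch below flags names within one character edit of a well-known process name and names that look like random letters and numbers. Everything here is an assumption made for illustration: the short KNOWN_NAMES set, the edit-distance cutoff of 1, and the entropy threshold of 3.0 would all need tuning against your own environment, and both checks will produce false positives.

```python
import math
from collections import Counter

# Illustrative subset; a real list would come from your software baseline.
KNOWN_NAMES = {"svchost.exe", "lsass.exe", "explorer.exe", "services.exe"}


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookalike_of(name, known=KNOWN_NAMES, max_distance=1):
    """Return the legitimate name this one imitates, or None."""
    name = name.lower()
    if name in known:
        return None  # exact match is the real name, not a lookalike
    for legit in sorted(known):
        if edit_distance(name, legit) <= max_distance:
            return legit
    return None


def shannon_entropy(text):
    """Bits of entropy per character in the string."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_random(name, threshold=3.0):
    """Heuristic: long-ish stem mixing letters and digits with high entropy."""
    stem = name.lower().rsplit(".", 1)[0]
    has_digit = any(ch.isdigit() for ch in stem)
    has_alpha = any(ch.isalpha() for ch in stem)
    return (len(stem) >= 8 and has_digit and has_alpha
            and shannon_entropy(stem) >= threshold)


for candidate in ["svch0st.exe", "svchost.exe", "x7k2q9zp4m.exe"]:
    print(candidate, lookalike_of(candidate), looks_random(candidate))
```

Note that a lookalike check on the file name alone will not catch a genuine name running from the wrong directory, so comparing the full path against the expected location is a useful companion check.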
Indirect Execution: You’re looking for the execution of legitimate processes, but with suspicious characteristics or followed by suspicious events.
- Common office apps (Word, Excel, PowerPoint) immediately followed by the execution of another process, including legitimate system applications
- Execution of unknown scripts by scripting engines like Python, Bash, PowerShell, etc.
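The first indicator in this list, an Office application spawning another process, can be sketched as a parent/child check. The example assumes Sysmon Event ID 1 records parsed into dictionaries with the Image and ParentImage fields; the function name, the OFFICE_PARENTS set, and the sample events are illustrative, and the same idea applies to any data source that records the parent process.

```python
import ntpath

# Executable names for Word, Excel, and PowerPoint.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}


def office_child_processes(events):
    """Return (parent, child) name pairs where an Office app launched another process."""
    hits = []
    for e in events:
        # ntpath handles backslash-separated Windows paths on any platform.
        parent = ntpath.basename(e.get("ParentImage", "")).lower()
        child = ntpath.basename(e.get("Image", "")).lower()
        if parent in OFFICE_PARENTS and child and child not in OFFICE_PARENTS:
            hits.append((parent, child))
    return hits


# Hypothetical sample events.
events = [
    {
        "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
        "ParentImage": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
    },
    {
        "Image": r"C:\Windows\System32\notepad.exe",
        "ParentImage": r"C:\Windows\explorer.exe",
    },
]

hits = office_child_processes(events)
print(hits)
```

Legitimate system applications are deliberately not excluded from the results, since a document launching powershell.exe or cmd.exe is exactly the kind of sequence worth reviewing.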
Digging Into Suspicious File Types
Once you’ve found a suspicious file, you’ll want to enumerate the relationships connected to that file to determine if it is actually malicious. This will almost always involve network, host, and open source intelligence data. For much of this, you can pivot through the Sqrrl Explore graph. The example below shows the indirect execution of a PowerShell script followed by net.exe being launched. This may be benign, but it warrants further investigation.
You’ll want to consider:
What is the source of the file?
- Was it downloaded from an IP or domain historically associated with hosting malicious content?
- Was it downloaded from a legitimate organization?
What happened once the file was executed?
- Were any other suspicious processes created?
- Were any autorun processes created?
- Was any strange network activity observed after the process executed?
- Does the process have a clearly obvious legitimate business use?
- Does the user of this system have the ability to install software?
If you can answer these questions, you should be able to reasonably determine the disposition of the file. If all else fails, consider executing the file in a malware sandbox to observe the file system changes it makes. You may be able to determine if it is malicious from this output, and you can also use it to guide additional hunting.
Hunting for suspicious files using host data alone can be quite tricky. In this post, I discussed how to think about this challenge through the lens of direct and indirect execution. Using these query techniques combined with Sqrrl’s Explore graph should leave you well equipped to find malware execution via suspicious file types and execution sequences.