The Best Data Sources and Basic Techniques For Threat Hunting
Jason Smith currently works for Cisco from his home in Nashville, TN. He has worked for multiple US Department of Defense SOCs and served as the lead security monitoring architect for the Commonwealth of Kentucky. He co-wrote Applied Network Security Monitoring and maintains the open source project FlowBAT, a graphical flow data analysis tool.
- Contextual data is important, but a lot of success can be gained by gathering relatively simple forms of data. For example, flow data can be analyzed with tools like FlowBAT and SiLK, and combined with Bro logs, to create a comprehensive picture of your network that is very conducive to hunting.
- SOCs should take steps to avoid information siloing, especially when deployment groups within an organization are geographically separated.
- Hunting can be challenging, but is by no means impossible. A lot of good work can be done by getting set up with simple tools and expanding from there. In other words, “Find weirdness in all that data and you’ll learn a lot.”
Is your SOC unlocking its full potential? How do you ensure that you’re using network data to its fullest extent? Establishing a strategy and gathering the right datasets for threat hunting can prove challenging, but it is incredibly important for analysts who are seeking to increase the scope of their operations. To explore some of these points, we talked to Jason Smith (@Automayt), a threat hunter from Cisco, about his ideas for building conducive environments for successful hunting.
This interview was originally posted in conjunction with the Threat Hunter Spotlight series which features conversations with top-level threat hunters to discuss a range of topics, from spotting adversary tactics, techniques, and procedures to leading hunt teams. Jason’s original “Threat Hunter Profile” can be found on the Sqrrl blog. The original interview is available here.
Question (Q): What is the process of having data laid out in such a way that is the most conducive to hunting?
Jason Smith (JS): So, people say that context is king and that PCAP has everything. In reality, my first big success came from having the simplest of data: flow data. A flow record is essentially just the five-tuple of a traffic flow along with some small statistics about the session. So there’s really minimal data there compared to all the metadata, or everything that would be in a full packet capture. But if you have the tools to actually parse through that minimal data, you can make a lot of statistical sense of it. This stuff is really easy to set up, as long as you have some sort of tap or SPAN port, or something like that, where you can run a tool like SiLK. Then suddenly, as soon as you start getting data in, you can start making sense of it.
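To make the five-tuple idea concrete, here is a minimal sketch of what a single flow record looks like. The field names and sample values are hypothetical, not SiLK’s actual schema, but they cover the same ground: the five-tuple plus the small per-session counters Jason mentions.

```python
from collections import namedtuple

# Hypothetical flow record: the classic five-tuple (source/destination IP
# and port, protocol) plus the small per-session statistics flow data
# carries (byte count, packet count, start time).
FlowRecord = namedtuple(
    "FlowRecord",
    ["src_ip", "dst_ip", "src_port", "dst_port", "proto",
     "bytes", "packets", "start_time"],
)

rec = FlowRecord("10.0.0.5", "203.0.113.9", 49152, 443, "tcp",
                 18234, 40, "2016-03-01T12:00:00")
print(rec.src_ip, "->", rec.dst_ip, rec.bytes, "bytes")
```

Note how little is recorded per session compared to a full packet capture; that thinness is exactly what makes flow data cheap to collect and fast to query in bulk.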
For a new hunter, or even a brand new analyst, your environment is often your house as well. So you’re setting this stuff up via something like Security Onion, which, out of the box, gives you all this metadata as soon as you turn it on. That’s a really, really big step you can take, and it’s a pretty easy one as long as you have a spare computer lying around.
Q: What is it that’s prevented SOCs from having access to different data sources that they need and how do you overcome these blocks?
JS: Our mistake was that we were so siloed, because we were an enclave of another organization. Our requests kind of fell on deaf ears. And if the people at the central location did hear us, they also had their own silos for development and deployment groups, operations, and things like that. Depending on which group owned the data, that was the group that decided who had access to what.
So I’d say the lesson is to beware of siloing things so much that requests from one group to another can’t actually translate in a meaningful way. Even when they can, sometimes the development or deployment group won’t understand the request or why you need it, so you have to have a good conversation and try to prevent animosity from building up to the point where you can’t get things done. I’d say that was the biggest difference.
Q: In terms of deciding what kind of data you want to actually have at your disposal and what you want to work with, how do you go about determining that?
JS: That depends on what deployment and tooling you’re moving toward. A lot of SIEMs get deployed at large shops. I’m a network security monitoring guy first, so I normally go for things like SiLK and Bro. That’s not to take away from local syslog or host-based logs, but if you have a place to feed them, via some sort of log aggregator or SIEM, getting that data into those tools is pretty essential. You don’t want to be fumbling as you’re trying to integrate all this stuff, because that’s where things get messed up.
Q: Do you actually maintain some open source tools to help make sense of flow data?
JS: Yeah. I have FlowBAT, which is basically a front-end. That’s not to say it’s just a prettier form of SiLK; it also does graphing, distills data, and things like that. It does auto-complete of all the terms, and anyone who uses SiLK knows there are a lot of partitioning switches you have to type out. It lets you save your queries in a sensible way, save your tuples to run from a dashboard, and run them periodically. Again, it’s just a front-end, but all the additional features add a lot of new capability for parsing through that data.
Then on the side, there’s also FlowPlotter, a separate plotting tool for flow data that you can build different front ends on and periodically feed data to. Like, let’s say the issue is some sort of reduction in data. A lot of times our issue is that we have too much data and it’s hard to go through, but sometimes the actual problem is that the data has dropped off for some reason, which could be the result of something malicious, or, more likely, something non-malicious. Going to flow is a really quick way to check that, and graphing out that pattern is a great visual to get started.
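The "data has dropped off" check above can be sketched very simply: compare the latest interval’s flow volume against a recent baseline. The hourly byte counts and the 50% threshold here are hypothetical, chosen only to illustrate the pattern, not drawn from FlowPlotter itself.

```python
# Hypothetical hourly byte totals derived from flow data. A sudden drop
# relative to the recent average can flag a dead sensor or broken feed
# (usually non-malicious) before it becomes a blind spot.
hourly_bytes = [9.8e8, 1.1e9, 1.0e9, 9.5e8, 2.0e8]  # last hour falls off

baseline = sum(hourly_bytes[:-1]) / len(hourly_bytes[:-1])
latest = hourly_bytes[-1]

# Threshold is arbitrary; tune it to your network's normal variance.
if latest < 0.5 * baseline:
    print(f"volume drop: {latest:.2e} bytes vs baseline {baseline:.2e}")
```

In practice you would plot the full series, as Jason suggests; the visual makes gradual decay just as obvious as a hard cutoff.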
I always say that flow won’t give you the full picture, but it will give you the who, what, and where for most things, and then you can pivot from that. Even just the timestamps are a great use: using them to narrow in on certain things will make not only Bro data more useful, but will also let you pull PCAP in a meaningful way.
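That timestamp pivot can be sketched as a simple time-window filter: find the flow records near an event of interest, then use their timestamps to scope which Bro logs or PCAP slices to pull. The records, event time, and 15-minute window below are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical flow records as (start_time, src_ip, dst_ip) tuples.
flows = [
    (datetime(2016, 3, 1, 12, 4),  "10.0.0.5", "203.0.113.9"),
    (datetime(2016, 3, 1, 14, 30), "10.0.0.7", "198.51.100.2"),
]

# Narrow to a window around the event of interest; the surviving
# timestamps then tell you exactly which PCAP or Bro data to retrieve.
event = datetime(2016, 3, 1, 12, 0)
window = timedelta(minutes=15)
hits = [f for f in flows if abs(f[0] - event) <= window]
print(hits)
```

The point is that flow answers "when and between whom" cheaply, so the expensive full-content lookups only happen over a small, targeted slice.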
Q: What’s the best way to structure your network to make it understandable, so that an analyst or hunter can actually know what they’re looking through?
JS: A lot of that depends on deployment and how consciously you’re deploying things. But there’s also a lot of after-the-fact friendly intelligence gathering that should always be done. I always go back to flow and Bro data; you can get a lot out of them just to determine what is actually going on right now. Like, if you come in with a completely fresh asset list that’s essentially not complete at all, you can generate something pretty substantial just with that kind of data. So, on the cheap, there are some basic commands, some top-talker types of things, that you can run and then run down to determine what’s on the network just from flow and Bro.
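The "top talkers" idea is just an aggregation over flow records: sum bytes per source address and rank. This is a minimal sketch over hypothetical data; in a real SiLK deployment this job would fall to its statistics tooling rather than hand-rolled code.

```python
from collections import Counter

# Hypothetical flows as (src_ip, bytes) pairs. Summing bytes per source
# and ranking gives a quick "top talkers" view, a cheap first cut at
# mapping who is active on a network you know nothing about.
flows = [("10.0.0.5", 500), ("10.0.0.7", 1200),
         ("10.0.0.5", 800), ("10.0.0.9", 100)]

talkers = Counter()
for src, nbytes in flows:
    talkers[src] += nbytes

for ip, total in talkers.most_common(2):
    print(ip, total)
```

Running the same aggregation on destination ports or external addresses gives similarly cheap starting points for building out an asset picture.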
Q: How do you see automated detection systems and traditional SOC tools (a SIEM, IDS, etc.) fitting into hunting and the hunting cycle?
JS: I do have automation that spurs off some weird hunts. Not all hunts start with somebody sitting at their desk thinking, “I’m going to look at this weird thing today.” Some very, very basic, super-low-fidelity signatures can also kick off some pretty cool stuff, and the results of those can enhance the medium- and high-fidelity signatures that are used for normal alerting. So that can provide a starting point, too.
Q: If there’s an analyst who wants to sort of start doing this stuff, what’s the best thing for them to do?
JS: The best thing is to look at threat hunting not as some sort of magical place that you want to get to, but as something you can do now. You may not be as good as someone more experienced, but you will be eventually. It just takes time, knowing weird versus normal, and bringing things to light. The biggest thing that keeps people from doing anything is the worry that they’re going to fail, or that they’re not smart enough for this kind of work. That’s the thing that holds people back a lot.
What it comes down to is whether you’re willing to step out there and fail quite a bit, because, I’ll be honest, most of my hunts lead to nothing. They lead to a bunch of junk, and I feel like, “Oh, I’ve wasted a little bit of time.” But in no case do I actually waste time. Every time you do something like that and fail, you still learn something. There was never a time I didn’t learn at least one thing, even from a failed hunt, and every successful hunt is really important; it’s worth partying over, basically, especially if it’s something pretty cool.

That’s the way you get better in security: by taking this raw data and learning how to mold it, learning how to parse through it in a way that you’re comfortable with. Because, again, the conducive environment for hunting, and for security in general, is the environment that you’re comfortable in. So get comfortable with all this data. Get Security Onion at home and play around with it. Use tcpreplay to replay a bunch of PCAPs in Security Onion to see what that stuff looks like, then go find it yourself through the data manually. Find weirdness in all that data and you’ll learn a lot.