Thursday, July 5, 2012

Categories for hash databases



By now you have most likely started to generate some hash databases of your own using hashdog, and it is time to put them to use. In this blog post I will describe how I usually categorize my hash databases and how I use them. You might want to do things differently based on the type of forensic investigations you are involved in or the type of environment you are supporting.

Common hash categories
You are probably already using the Reference Data Set (RDS) from the National Software Reference Library (NSRL) as one of your hash databases. These databases have been created by NIST in a controlled environment and contain hashes of application and operating system files, mostly generated from files still on their original media. These files are generally considered good, and their hashes are usually put in a hash category called ‘KnownGood’. This category contains hash databases of files that are known to be benign, files that you are not interested in investigating further.
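As a minimal sketch of what a lookup against such a database involves, the Python below hashes a file and checks whether the digest is in a set of known hashes. The helper names (`file_sha1`, `load_hash_database`, `is_known`) and the input format (a plain text file with one hex digest per line) are assumptions made for illustration; this is not the NSRL RDS file format and not how hashdog stores its databases.

```python
import hashlib
from pathlib import Path


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hash_database(path):
    """Load hashes from a text file with one hex digest per line (assumed format).

    Blank lines and lines starting with '#' are skipped; digests are lowercased.
    """
    hashes = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            hashes.add(line)
    return hashes


def is_known(path, database):
    """Return True if the file's SHA-1 digest is present in the database set."""
    return file_sha1(path) in database
```

Keeping each database as a set makes a lookup a constant-time membership test, which matters when every file on a disk image is checked against millions of hashes.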

You might also have hash databases that contain hashes of malicious files that you want to search for. Those databases are part of a hash category known as ‘KnownBad’. As with the ‘KnownGood’ category, you do not really want to spend your time analyzing any files that match a hash in this category. If you get a match for a hash in any of the databases in this category, chances are high that the malware has already been analyzed by multiple organizations before you came across it. Your time is better spent trying to figure out how the malware got onto the system you are investigating and whether other artifacts, like registry keys and tmp files, are consistent with previous analyses. After all, the file could have been placed on the system just to throw you off and keep you from finding the real anomaly.

Extending the KnownGood and KnownBad categories
When I started to write hashdog and was generating hash databases of my own, it did not seem right to put some of my databases in the ‘KnownGood’ category. When I was creating databases from files downloaded directly from the vendor or extracted from a verified ISO image, I put the hash databases I generated in the ‘KnownGood’ category. However, when I was generating hashes from standard OS build images and files listed in application shares, I had no really good way of guaranteeing that the files were absolutely free from malware. After thinking a lot and discussing it with Glenn, I came up with a solution that works for the kind of forensic investigations I am mostly involved in: looking for anomalies in a system that could indicate that the system has been compromised.

Instead of using just the two hash categories mentioned above, I decided to use a third and a fourth category, calling them ‘KnownUsed’ and ‘KnownForbidden’. The ‘KnownUsed’ category contains databases of hashes generated from files that are actively being used by the organization I am investigating. Any hits I get on hashes in this category are treated differently from matches against one of my ‘KnownGood’ databases. For instance, if a file has a hash that is part of any of the ‘KnownGood’ databases, that file is discarded immediately without any further analysis. If I get a hit on a hash in one of the ‘KnownUsed’ databases, the file is not completely discarded, but I do not pay much attention to it, at least not during my initial analysis. The argument for this is that if I am looking for an anomaly, it is highly unlikely that this anomaly is a file that I have previously generated a hash for and included in my ‘KnownUsed’ database.

The fourth category, which I call ‘KnownForbidden’, contains databases of hashes created from files belonging to applications that are not allowed within the organization. Common hashes to put in this category are generated from files belonging to non-corporate encryption software such as TrueCrypt, penetration testing software like Metasploit, and privacy and cleaning tools like CCleaner. These applications are not malicious in themselves but could indicate malicious use if they are found on a system I am investigating. As with the ‘KnownBad’ category, I want to be alerted if any of these files are detected, but I do not want to analyze them. To sum things up, these are the categories I use and the way I treat any matches I get for files in the hash databases.

  • KnownGood - Discard any files from further analysis.
  • KnownBad - Alert on any matches but do not analyze the file.
  • KnownUsed - Put these files aside for later analysis.
  • KnownForbidden - Alert on any matches but do not analyze the file.
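The list above can be sketched as a small triage function in Python. Everything here is illustrative: the `Action` names, the `triage` function, and the in-memory dictionary of sets are hypothetical, and the order in which categories are checked when a hash appears in more than one of them is my own assumption (alerting categories first), not something the categories themselves dictate.

```python
from enum import Enum


class Action(Enum):
    DISCARD = "discard"   # drop the file from further analysis
    ALERT = "alert"       # report the match, but do not analyze the file
    DEFER = "defer"       # set the file aside for later analysis
    ANALYZE = "analyze"   # unknown file: this is where the attention goes


# Checked in order; the first category containing the hash wins.
# Assumption: alerting categories take precedence over the others.
CATEGORY_ACTIONS = [
    ("KnownBad", Action.ALERT),
    ("KnownForbidden", Action.ALERT),
    ("KnownGood", Action.DISCARD),
    ("KnownUsed", Action.DEFER),
]


def triage(file_hash, databases):
    """Return (category, action) for a hash.

    `databases` maps a category name to a set of lowercase hex digests.
    """
    file_hash = file_hash.lower()
    for category, action in CATEGORY_ACTIONS:
        if file_hash in databases.get(category, set()):
            return category, action
    return "Unknown", Action.ANALYZE
```

A hash that matches none of the four categories falls through to `Action.ANALYZE`, so the files that reach manual analysis are exactly the ones not seen before.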


By breaking up the hash categories this way, it is easier for me to focus on the files that I have not seen before: the unknown files whose functionality has yet to be determined.
