Friday, July 6, 2012

Post-processing hash databases from NIST

As mentioned in my previous blog post, the National Software Reference Library (NSRL), a part of the National Institute of Standards and Technology (NIST), generates and periodically releases hash databases called Reference Data Sets (RDS). Each entry in the RDS contains information such as the MD5, SHA-1 and CRC32 checksums of the file as well as its size and which product it belongs to. The RDS are published as a downloadable .ISO image that holds a .zip archive containing five files:
  • NSRLFile.txt  - Main hash text file.
  • NSRLMfg.txt - Manufacturer listing.
  • NSRLOS.txt - Operating systems listing.
  • NSRLProd.txt - Product listing.
  • hashes.txt - SHA-1 hashes for the files above.
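
Since ‘hashes.txt’ holds SHA-1 hashes for the other files, it can be used to check that the archive was extracted intact. The following is only a minimal sketch in Python: the helper name `sha1_of` is my own, and because the layout of ‘hashes.txt’ is not shown here, the comparison against it is left as a manual step.

```python
import hashlib
import io

def sha1_of(stream, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    # Reading in chunks keeps memory use flat even for a multi-gigabyte file.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Demonstration on an in-memory stream using the standard SHA-1 test vector.
print(sha1_of(io.BytesIO(b"abc")))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

For a real file, open it in binary mode, for example `sha1_of(open("NSRLFile.txt", "rb"))`, and compare the result with the matching entry in ‘hashes.txt’.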


NIST also makes other files available to us, including a .zip archive called the "minimal" hashset. This hash database covers every file that the other RDS databases have, but lists only one example of each file in the NSRL. It is this reduced hash database that I will be using in the examples below.

Products included in the database
Most forensic investigators that I know or have talked to use the RDS from NIST as their ‘KnownGood’ hashset, actively discarding any file that generates a match in these databases. NIST, however, points out that the files used to generate the information in the RDS are neither known to be good nor known to be bad; they are simply known. Entries in the RDS are generated not only from files that are part of operating systems, but also from applications that might be unwanted or even considered malicious by some organizations. NIST recommends that the forensic examiner partition the RDS file so that any unwanted applications are excluded from the database.

To accomplish this we first need to process the ‘NSRLProd.txt’ file, which lists all the products that NIST has generated hashes for in the hash database. To make it easier to understand exactly what is included in the RDS, we create a uniquely sorted list of all the ‘ApplicationType’ values, the last field of the comma-separated file.

pmedina@forensic:~/NSRL/RDS_236m$ head NSRLProd.txt
"ProductCode","ProductName","ProductVersion","OpSystemCode","MfgCode","Language","ApplicationType"
1,"Norton Utilities","2.0 WinNT 4.0","WINNT","SYM","English","Utility"
2,"CRT","2.4","Gen","Unknown","English","Telnet"
7,"Harvard Graphics","3.0 Upgrade","DOS","SPC","English","Presentation"
8,"ScreenShow","N/A","DOS","SPC","English","Screen Saver"
9,"Norton Utilities","8","DOS","SYM","English","Utility"
9,"Norton Utilities","8","Gen","SYM","English","Utility"
9,"Norton Utilities","8","WIN","SYM","English","Utility"
14,"FastTrackSchedule","Windows","WIN","AEC","English","Calendar"
16,"Report Writer","N/A","DOS","CLA","English","Reports"
pmedina@forensic:~/NSRL/RDS_236m$ tail -n +2 NSRLProd.txt | awk -F,\" '{print $NF}' | tr -d '"' | sort -u
3d computer graphics and design
3d computer graphics and design,architectural
3D Landscaping amd Animation
3D Landscaping amd Animation,Design Suite,Tutorial
3D Landscaping amd Animation,Graphics Suite
3D Landscaping amd Animation,Modeling Software
Accessibility
Accessories
Accessories,Configuration & Management
Accounting
..
..
X Server
X Server,X Windows
X Windows
Year 2000
zip
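
The same list can be produced with a CSV parser instead of splitting on a fixed separator, which copes with application types that contain commas themselves (such as ‘X Server,X Windows’). This is only a sketch: the function name `application_types` is my own, and the sample rows are copied from the ‘head’ output above so the example runs without the real file.

```python
import csv
import io

def application_types(prod_file):
    """Return the sorted, unique ApplicationType values of an NSRLProd.txt-style file."""
    reader = csv.DictReader(prod_file)  # the first line supplies the column names
    return sorted({row["ApplicationType"] for row in reader})

# A few rows from the listing above stand in for the real NSRLProd.txt.
sample = io.StringIO(
    '"ProductCode","ProductName","ProductVersion","OpSystemCode","MfgCode","Language","ApplicationType"\n'
    '1,"Norton Utilities","2.0 WinNT 4.0","WINNT","SYM","English","Utility"\n'
    '2,"CRT","2.4","Gen","Unknown","English","Telnet"\n'
    '9,"Norton Utilities","8","DOS","SYM","English","Utility"\n'
)
for app_type in application_types(sample):
    print(app_type)
```

Note that Python sorts by code point, so the order of upper and lower case entries can differ from what ‘sort -u’ prints under your locale.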

Looking through the list, the application types that catch my attention right away are ‘Cryptography’, ‘Disk Wiper’, ‘employee monitoring’, ‘Encryption’, ‘File Sharing’, ‘Hacker Tool’, ‘Keyboard Logger’, ‘p2p client’, ‘password recovery’, ‘privacy tool’ and ‘Steganography’. Finding files that belong to these application types does not automatically mean that the system you are investigating has been compromised. What it does mean is that you need to look into the matter. There might well be a legitimate use for having products of the application type ‘Disk Wiper’, shown below, installed on the system you are investigating.

pmedina@forensic:~/NSRL/RDS_236m$ grep -e 'Disk Wiper\"' NSRLProd.txt
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","190","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","200","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","204","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","209","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","226","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","231","731","English","Disk Wiper"
18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","237","731","English","Disk Wiper"
18641,"Absolute Disk Wiper","None","189","727","English","Disk Wiper"
18746,"Active@KillDisk Professional Suite 5.0","1999-2008","189","1211","English","Disk Wiper"
18748,"Active@ Eraser Professional 4.1","4.1","189","1211","English","Disk Wiper"
18750,"East-Tec Eraser 2008","c.1998-2008","189","1209","English","Disk Wiper"
22369,"Wipedrive Six","c. 2010","231","1543","English","Disk Wiper"
22369,"Wipedrive Six","c. 2010","237","1543","English","Disk Wiper"
22369,"Wipedrive Six","c. 2010","359","1543","English","Disk Wiper"
23309,"DISKExtinguisher","c. 2009","190","1544","English","Disk Wiper"
23309,"DISKExtinguisher","c. 2009","194","1544","English","Disk Wiper"
23309,"DISKExtinguisher","c. 2009","231","1544","English","Disk Wiper"
23309,"DISKExtinguisher","c. 2009","237","1544","English","Disk Wiper"
23310,"File Extinguisher","c. 2011","190","1544","English","Disk Wiper"
23310,"File Extinguisher","c. 2011","231","1544","English","Disk Wiper"
23310,"File Extinguisher","c. 2011","237","1544","English","Disk Wiper"
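
Because every product is listed once per operating system, the ‘grep’ output repeats the same product several times. A small sketch of how the list could be collapsed to one line per product code follows. The function name `products_of_type` is my own, and unlike the ‘grep’ above it compares the whole ‘ApplicationType’ field rather than only the end of the line.

```python
import csv
import io

def products_of_type(prod_file, app_type):
    """Return unique (ProductCode, ProductName) pairs whose ApplicationType equals app_type."""
    found = {}
    for row in csv.DictReader(prod_file):
        if row["ApplicationType"] == app_type:
            # A product appears once per operating system; keep one entry per code.
            found.setdefault(row["ProductCode"], row["ProductName"])
    return list(found.items())

# Rows copied from the listings above stand in for the real NSRLProd.txt.
sample = io.StringIO(
    '"ProductCode","ProductName","ProductVersion","OpSystemCode","MfgCode","Language","ApplicationType"\n'
    '1,"Norton Utilities","2.0 WinNT 4.0","WINNT","SYM","English","Utility"\n'
    '18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","190","731","English","Disk Wiper"\n'
    '18640,"Paragon Disk Wiper 8.5 Personal Edition","2007","200","731","English","Disk Wiper"\n'
    '18641,"Absolute Disk Wiper","None","189","727","English","Disk Wiper"\n'
)
for code, name in products_of_type(sample, "Disk Wiper"):
    print(code, name)
```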
                       
Partitioning the database
Now that we have found the application types that we want to separate from the existing hash database, we need to divide the file ‘NSRLFile.txt’ into two new files, something NIST calls partitioning the hash database. There are many ways to do this, but I will be using a program called ‘nsrlext.pl’, part of the ByteInvestigator toolkit by Tony Rodrigues. This program could initially only search for entries belonging to the application type ‘Hacker Tool’ and separate these entries from the rest of the database. Since I needed to do some more extensive searching and partitioning, I patched Tony’s program so that it can now search for other application types as well. Follow the instructions below to download ‘nsrlext.pl’ and apply my patch.

pmedina@forensic:~/NSRL/RDS_236m$ wget --no-verbose http://downloads.sourceforge.net/project/byteinvestigato/byteinvestigatr/0.1.6/ByteInvestigator0.1.6.zip
2012-07-05 15:08:12 URL:http://iweb.dl.sourceforge.net/project/byteinvestigato/byteinvestigatr/0.1.6/ByteInvestigator0.1.6.zip [103132/103132] -> "ByteInvestigator0.1.6.zip" [1]
pmedina@forensic:~/NSRL/RDS_236m$ md5sum ByteInvestigator0.1.6.zip
4a9d5e3d004f95caabd4fe5ab1a70d2a  ByteInvestigator0.1.6.zip
pmedina@forensic:~/NSRL/RDS_236m$ 7z x ByteInvestigator0.1.6.zip nsrlext.pl

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Processing archive: ByteInvestigator0.1.6.zip

Extracting  nsrlext.pl

Everything is Ok

Size:       2585
Compressed: 103132
pmedina@forensic:~/NSRL/RDS_236m$ cat ../nsrlext.pl-v01.patch
12a13
> use strict;
14c15
< my $ver="0.1";
---
> my $ver="0.1m";
17,18c18,19
< %args = ( );
< getopts("hn:p:g:b:", \%args);
---
> my %args = ( );
> getopts("hn:p:g:b:s:", \%args);
21c22,26
< if ($args{h}) {
---
> my $error_msg="\n";
> unless ($args{n}){$error_msg.="Enter the NSRL hashset file list (comma delimited)\n";$args{h}="";}
> unless ($args{p}){$error_msg.="Enter the NSRL product file list (comma delimited)\n";$args{h}="";}
> 
> if (defined $args{h}) {
29a35
>  -s :string to search for. Default "Hacker Tool". Ex: -s "Web Builder|Presentation"
32a39
>    print "$error_msg\n";
36,41c43,45
< die "Enter the NSRL hashset file list (comma delimited)\n" unless ($args{n});
< die "Enter the NSRL product file list (comma delimited)\n" unless ($args{p});
< 
< die "Enter known good and/or known bad output filenames\n" unless (($args{g}) || ($args{b}));
< 
< my %hack;
---
> my (%hack,$string);
> if ($args{s}){$string=$args{s};}
> else {$string="Hacker Tool";}
47,48c51,52
< 
< foreach $item (@prod) {
---
> print "searching for products labeled: $string\n";
> foreach my $item (@prod) {
56c60,63
<       $hack{$line[0]} = $item if ($line[6] =~ /Hacker Tool/);
---
>       $line[6]=~s/\W+$//;
>       $line[6]=~s/^"//;
>       $line[6]=~s/"$//;
>       $hack{$line[0]} = $_ if ($line[6] =~m/^($string)$/i);
60a68,71
> print "excluding the following products:\n";
> foreach my $product (keys %hack){print "$hack{$product}\n";}
> 
> unless (($args{g}) || ($args{b})){print "\nPlease use '-g' and '-b' to specify the files to store the result of the hash database partition\n";exit;}
71c82,83
< foreach $item (@hset) {
---
> foreach my $item (@hset) {
> print "processing hash file: $item\n";
75a88,94
>       if ($i==0){
>               print BAD $_ if ($args{b});
>               print GOOD $_ if ($args{g});
>                       $i++;
>               next;
>       }
> 
77c96
<       print ">" if (($i % 10000) == 0);
---
>       if ($i == 10000){print STDERR ".";$i=1;}
108a128
> Modified by Par Osterberg Medina
111a132
> return ();
pmedina@forensic:~/NSRL/RDS_236m$ patch nsrlext.pl -i ../nsrlext.pl-v01.patch -o nsrlext-v01m.pl
patching file nsrlext.pl
pmedina@forensic:~/NSRL/RDS_236m$ chmod +x nsrlext-v01m.pl
pmedina@forensic:~/NSRL/RDS_236m$ ./nsrlext-v01m.pl

nsrlext.pl v0.1m
Extracts known good and known bad hashsets from NSRL
Tony Rodrigues
dartagnham at gmail dot com
Modified by Par Osterberg Medina
--------------------------------------------------------------------------

 uso: nsrlext.pl -n nsrl_files_comma_separated -p nsrl_prod_files_comma_separated [-g known_good_txt] [-b known_bad_txt] [-h]

 -n :nsrl files comma separated. Ex: -n c:\nsrl\RDA_225_A\NSRLFile.txt,c:\nsrl\RDA_225_B\NSRLFile.txt
 -p :nsrl prod files comma separated. Ex: -p c:\nsrl\RDA_225_A\NSRLProd.txt,c:\nsrl\RDA_225_B\NSRLProd.txt
 -g :known good txt filename. Ex: -g good.txt
 -b :known bad txt filename. Ex: -b bad.txt
 -s :string to search for. Default "Hacker Tool". Ex: -s "Web Builder|Presentation"
 -h :help


Enter the NSRL hashset file list (comma delimited)
Enter the NSRL product file list (comma delimited)

pmedina@forensic:~/NSRL/RDS_236m$

To partition the hash database and actually split the ‘NSRLFile.txt’ file in two, we need to pass a couple of switches to the ‘nsrlext.pl’ program. First we use the switch ‘-n’ to give the program the path to our hash database, ‘NSRLFile.txt’. Next we use the switch ‘-p’ to specify the path to ‘NSRLProd.txt’, the file that contains all the product codes for the entries in the hash database. We also need to specify which ‘ApplicationType’ strings to search for so we can build the list of products we are going to separate from the hash database. This is done with the switch ‘-s’, followed by the names to search for, separated by ‘|’. In the example below I am searching for the application types that caught my attention above.

pmedina@forensic:~/NSRL/RDS_236m$ ./nsrlext-v01m.pl -n NSRLFile.txt -p NSRLProd.txt -s "Cryptography|Disk Wiper|employee monitoring|Encryption|File Sharing|Hacker Tool|Keyboard Logger|p2p client|password recovery|privacy tool|Steganography"

nsrlext.pl v0.1m
Extracts known good and known bad hashsets from NSRL
Tony Rodrigues
dartagnham at gmail dot com
Modified by Par Osterberg Medina
--------------------------------------------------------------------------

searching for products labeled: Cryptography|Disk Wiper|employee monitoring|Encryption|File Sharing|Hacker Tool|Keyboard Logger|p2p client|password recovery|privacy tool|Steganography
excluding the following products:
19296,"Spector CNE Investigator","1998-2008","254","903","English","employee monitoring"
20722,"uTorrent","0.9.2","395","1429","English","File Sharing"
9755,"ProDiscover Basic and ZeroView","2002-2006","WIN","TPathways","English","Encryption"
21763,"iMesh","10","189","1451","English","p2p client"
6490,"Hack Attacks Denied Second Edition","NA","WIN","Wiley","English","Hacker Tool"
3297,"Guide to Hacking Software Security 2002","1.0","WINXP","Silv","English","Hacker Tool"
..
..
6510,"Invisible KeyLogger 2000","NA","UNK","Amecisco","English","Keyboard Logger"
21805,"Limewire","4.14.8","189","1448","English","p2p client"
21777,"Limewire","5.5.14","189","1448","English","p2p client"
2099,"RPing","NA","WIN","Unknown","English","Hacker Tool"
4525,"Spector CNE","4.1","WINXP","Spectorsoft","Unknown","Keyboard Logger"
2102,"PortScan","NA","WIN","Unknown","English","Hacker Tool"

Please use '-g' and '-b' to specify the files to store the result of the hash database partition
pmedina@forensic:~/NSRL/RDS_236m$
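
The string given to ‘-s’ is used as a regular expression: the patch anchors it as ‘^($string)$’ and ignores case, so each alternative has to match the whole application type. The sketch below mimics that test in Python to show which values would and would not be selected. The function name `matches_search` is my own, and Python's `re.fullmatch` is only an approximation of the Perl expression in the patch.

```python
import re

def matches_search(app_type, search):
    """Return True if the whole application type matches one alternative, ignoring case."""
    return re.fullmatch(f"(?:{search})", app_type, flags=re.IGNORECASE) is not None

search = "Cryptography|Disk Wiper|Hacker Tool"
for candidate in ("Disk Wiper", "hacker tool", "Disk Wiper Utility", "Utility"):
    print(candidate, matches_search(candidate, search))
```

Because the whole field must match, a search for ‘Disk Wiper’ does not select a longer application type that merely contains those words. Such combined types need to be added to the ‘-s’ string explicitly.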

If everything looks correct and the products you want to separate are listed in the command output, the only thing left to do is to specify the path to the files to store the result in. This is done by using the switches ‘-g’ and ‘-b’. The entries that match the products we searched for will be put in the file specified with the ‘-b’ switch and all other entries will be put in the file specified with the ‘-g’ switch. Even though ‘-g’ stands for good and ‘-b’ stands for bad, I will not put the entries that match my search criteria in the ‘KnownBad’ hash category. Instead I will put the database with the matches in the ‘KnownForbidden’ category explained in this blog post.

pmedina@forensic:~/NSRL/RDS_236m$ sudo ./nsrlext-v01m.pl -n NSRLFile.txt -p NSRLProd.txt -s "Cryptography|Disk Wiper|employee monitoring|Encryption|File Sharing|Hacker Tool|Keyboard Logger|p2p client|password recovery|privacy tool|Steganography" -g /opt/forensic/database/knowngood/RDS_236m-good.txt -b /opt/forensic/database/knownforbidden/RDS_236m-forbidden.txt

nsrlext.pl v0.1m
Extracts known good and known bad hashsets from NSRL
Tony Rodrigues
dartagnham at gmail dot com
Modified by Par Osterberg Medina
--------------------------------------------------------------------------

searching for products labeled: Cryptography|Disk Wiper|employee monitoring|Encryption|File Sharing|Hacker Tool|Keyboard Logger|p2p client|password recovery|privacy tool|Steganography
excluding the following products:
19296,"Spector CNE Investigator","1998-2008","254","903","English","employee monitoring"
20722,"uTorrent","0.9.2","395","1429","English","File Sharing"
9755,"ProDiscover Basic and ZeroView","2002-2006","WIN","TPathways","English","Encryption"
..
..
processing hash file: NSRLFile.txt
...................... OUTPUT REMOVED
Done !
pmedina@forensic:~/NSRL/RDS_236m$ wc -l NSRLFile.txt
25892925 NSRLFile.txt
pmedina@forensic:~/NSRL/RDS_236m$ wc -l /opt/forensic/database/knowngood/RDS_236m-good.txt /opt/forensic/database/knownforbidden/RDS_236m-forbidden.txt
  25804471 /opt/forensic/database/knowngood/RDS_236m-good.txt
     88455 /opt/forensic/database/knownforbidden/RDS_236m-forbidden.txt
  25892926 total
pmedina@forensic:~/NSRL/RDS_236m$

The NSRL hash database has now been separated into two hash databases. Their combined line count is one higher than that of the original file because the patched program writes the header line to both output files. The first database is placed in our ‘KnownGood’ category and any files that match entries in that database will be automatically discarded. The second database holds entries for files that are not allowed in our organization, and generates alerts if any of these files are detected on a system we are investigating.
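
For readers who prefer not to use Perl, the partitioning idea can be sketched in Python. This is not a port of ‘nsrlext.pl’, only an illustration of the two steps: collect the product codes whose application type matches, then route each hash entry to one of two outputs. The function names are my own, I am assuming that ‘NSRLFile.txt’ has a header line with a ‘ProductCode’ column (its header is not shown in this post), and the hash rows in the sample are made-up placeholders.

```python
import csv
import io
import re

def forbidden_codes(prod_file, search):
    """Collect the product codes whose ApplicationType fully matches the search pattern."""
    pattern = re.compile(f"(?:{search})", re.IGNORECASE)
    return {row["ProductCode"] for row in csv.DictReader(prod_file)
            if pattern.fullmatch(row["ApplicationType"])}

def partition(hash_file, codes, good, forbidden, code_column="ProductCode"):
    """Copy each line of hash_file to 'forbidden' if its product code is in codes, else to 'good'."""
    header = hash_file.readline()
    good.write(header)        # both outputs get the header line
    forbidden.write(header)
    idx = next(csv.reader([header])).index(code_column)
    for line in hash_file:
        fields = next(csv.reader([line]), [])
        is_forbidden = len(fields) > idx and fields[idx] in codes
        (forbidden if is_forbidden else good).write(line)  # lines are copied unchanged

# Placeholder data: the product rows are shortened and the hash rows are invented.
prod = io.StringIO('"ProductCode","ProductName","ApplicationType"\n'
                   '1,"Norton Utilities","Utility"\n'
                   '18641,"Absolute Disk Wiper","Disk Wiper"\n')
hashes = io.StringIO('"SHA-1","FileName","ProductCode"\n'
                     '"PLACEHOLDER-A","a.exe",1\n'
                     '"PLACEHOLDER-B","b.exe",18641\n')
good, forbidden = io.StringIO(), io.StringIO()
partition(hashes, forbidden_codes(prod, "Disk Wiper|Hacker Tool"), good, forbidden)
print(good.getvalue(), forbidden.getvalue(), sep="---\n")
```

As with the patched program, the header ends up in both outputs, which is why the two result files together hold one line more than the original.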
