Mason Archival Repository Service

A Digital Media Similarity Measure for Triage of Digital Forensic Evidence


dc.contributor.advisor Jones, James H Jr
dc.contributor.author Lim, Myeong Lyel
dc.creator Lim, Myeong Lyel
dc.date 2020-11-23
dc.date.accessioned 2021-01-26T21:21:05Z
dc.date.available 2021-01-26T21:21:05Z
dc.identifier.uri http://hdl.handle.net/1920/11926
dc.description.abstract As the volume of potential digital evidence increases, digital forensics investigators are challenged to find the best allocation of their limited resources. While automation will continue to partially mitigate this problem, the preliminary question of which media should be examined by human or machine remains largely unsolved. Prior work has established various methods to assess digital media similarity which may aid in prioritization decisions. Similarity measures may also be used to establish links between media, and by extension, links between the individuals or organizations associated with that media. Existing similarity measures, however, have high computational costs which can delay identification of digital media warranting immediate attention or render link establishment across large collections of data impractical. In this work, I propose, develop, and validate a methodology for assessing digital media similarity to assist with digital media triage decisions. The application of my work is predicated on the idea that unexamined media is likely to be relevant and interesting to an investigator if this unexamined media is similar to other media previously determined to be interesting and relevant. My methodology builds on prior work using sector hashing and the Jaccard index similarity measure. I combine these methods in a novel way and demonstrate the accuracy of my method against a test set of hard disk images with known ground truth. My method is called Jaccard Index with Normalized Frequency (JINF) and calculates the similarity measure between two disk images by normalizing the frequency of the distinct sectors. I also developed and tested two extensions to improve performance. The first extension randomly samples sectors from digital media under examination and applies a modified JINF method. I demonstrate that the JINF disk similarity measure remains useful with sampling rates as low as 5%. 
The second extension takes advantage of parallel processing: it partitions the digital media, distributes the computation across multiple processors, and then combines the partial results into an overall similarity measure that preserves the accuracy of the original single-processor method. In experiments, this reduced processing time by as much as 51%. My work goes beyond matching interesting files and file fragments; rather, I assess the overall similarity of digital media to identify systems which might share applications and user content, and hence be related, even if some common files of interest are encrypted, deleted, or otherwise not available. In addition to supporting triage decisions, digital media similarity may be used to infer links and associations between the disparate entities owning or using the respective digital devices. en_US
dc.language.iso en en_US
dc.subject Jaccard index en_US
dc.subject link discovery en_US
dc.subject sampling en_US
dc.subject drive similarity en_US
dc.subject sector hash en_US
dc.subject parallel computation en_US
dc.title A Digital Media Similarity Measure for Triage of Digital Forensic Evidence en_US
dc.type Dissertation en_US
thesis.degree.name Doctor of Philosophy in Information Technology en_US
thesis.degree.level Doctoral en_US
thesis.degree.discipline Information Technology en_US
thesis.degree.grantor George Mason University en_US
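
The abstract says the method builds on sector hashing and the Jaccard index. As a rough illustration of those two building blocks only, the sketch below hashes fixed-size sectors of two byte strings and computes the plain Jaccard index of the resulting hash sets. It is not the JINF measure, whose frequency normalization is not specified in this record. The 512-byte sector size, the use of SHA-256, the function names, and the choice to return 0.0 for two empty inputs are all assumptions made for illustration.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size in bytes (not stated in the record)


def sector_hashes(data, sector_size=SECTOR_SIZE):
    """Return the set of distinct sector hashes found in a raw byte string."""
    hashes = set()
    for offset in range(0, len(data), sector_size):
        sector = data[offset:offset + sector_size]
        # SHA-256 is an illustrative choice; any fixed-length digest would do.
        hashes.add(hashlib.sha256(sector).digest())
    return hashes


def jaccard_index(a, b):
    """Plain Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(a & b) / len(union)


# Two toy "images" that share two of their three distinct sectors.
image_a = b"A" * 512 + b"B" * 512 + b"C" * 512
image_b = b"B" * 512 + b"C" * 512 + b"D" * 512
similarity = jaccard_index(sector_hashes(image_a), sector_hashes(image_b))
print(similarity)
```

Because only distinct sector hashes are compared, repeated sectors (for example, runs of zero-filled sectors) count once in this plain version; the abstract indicates that JINF treats sector frequency differently.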

