The CFReDS Project

NIST is developing Computer Forensic Reference Data Sets (CFReDS) for digital evidence. These reference data sets give investigators documented sets of simulated digital evidence for examination. Because the contents of each data set are documented, for example target search strings seeded at known locations, investigators can compare the results of searches for the target strings with the known placement of the strings. Investigators can use CFReDS in several ways, including validating the software tools used in their investigations, checking out equipment, training investigators, and testing investigator proficiency as part of laboratory accreditation. The CFReDS site is a repository of images. Some images are produced by NIST, often from the CFTT (tool testing) project, and some are contributed by other organizations. The National Institute of Justice funded this work in part through an interagency agreement with the NIST Office of Law Enforcement Standards.

In addition to test images, the CFReDS site contains resources to aid in creating your own test images. These creation aids take the form of data files, software tools, and procedures for specific tasks.

IMPORTANT NOTE: This web site is under development and may change or be reorganized at any time.

Data Set Types

Several uses are envisioned for the data sets, but we also expect unforeseen applications. The four most obvious applications are testing forensic tools, establishing that lab equipment is functioning properly, testing proficiency in specific skills, and training laboratory staff. Each type of data set has slightly different requirements, and most data sets can serve more than one function. For example, the Russian Tea Room data set can be used to evaluate how a tool searches or displays Unicode text. It can also be used as a skill test in which an examiner demonstrates proficiency in working with Unicode text, or as a training exercise.
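One reason Unicode searching makes a useful test is that the same text is stored as different byte sequences depending on its encoding, so a tool has to search for each form. The sketch below is illustrative only: the helper name `search_patterns`, the sample word, and the list of encodings are assumptions, not part of any CFReDS data set.

```python
def search_patterns(term, encodings=("utf-8", "utf-16-be", "utf-16-le")):
    """Return the byte sequence to look for under each text encoding."""
    return {enc: term.encode(enc) for enc in encodings}


# Hypothetical search term (the Russian word for "tea"), for illustration.
patterns = search_patterns("чай")
for enc, raw in patterns.items():
    # Print each encoding next to the hexadecimal form of its byte pattern.
    print(enc, raw.hex())
```

A tool that searches for only one of these byte patterns would miss the string wherever it is stored in another encoding.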

Data sets for tool testing

Data sets for tool testing need to be completely documented. The user of the data set needs to know exactly what is in the data set and where it is located. These data sets should also provide a specification for a set of explicit tests. However, the user should have sufficient documentation to develop and execute other test cases if necessary or desirable. These data sets could be part of a realistic investigation scenario, but it is easier to control expected results if each data set is focused on a particular type of tool function. Examples of focused function areas are string searching, deleted file recovery, and email extraction.

There will tend to be many small test images, each focused on a particular feature for the tool function being tested.
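Because the contents and locations in a tool-testing data set are documented, checking a tool's output can be reduced to comparing two sets of locations. The following is a minimal sketch of that comparison; the function name `score_search_results` and all offsets are hypothetical and do not come from any CFReDS documentation.

```python
def score_search_results(expected_offsets, reported_offsets):
    """Compare documented string locations with the locations a tool reported."""
    expected = set(expected_offsets)
    reported = set(reported_offsets)
    return {
        "found": sorted(expected & reported),       # documented and reported
        "missed": sorted(expected - reported),      # documented, not reported
        "unexpected": sorted(reported - expected),  # reported, not documented
    }


# Hypothetical byte offsets, for illustration only.
documented = [512, 4096, 1048576]
tool_hits = [512, 1048576, 2097152]

result = score_search_results(documented, tool_hits)
for outcome, offsets in result.items():
    print(outcome, offsets)
```

An "unexpected" hit is not necessarily a tool error; it may point to a gap in the data set's documentation, which is one reason complete documentation matters.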

Data sets for equipment check out

These data sets need to focus on issues in acquisition, access and restoration of data. These data sets might need to have a strong procedural component.

Data sets for staff training

These data sets would primarily be tests based on investigation scenarios, which gives them a realistic flavor. They would be similar to the data sets for proficiency testing, but generally available.

Proficiency Testing and Skill Testing

These data sets would primarily be tests based on investigation scenarios, which gives them a realistic flavor. They would be similar to, or the same as, the data sets for staff training. There would also be some small images that focus on specific skills. For example, a data set might require the examiner to demonstrate a system skill such as loading a new font onto an analysis computer.

Data Set Documentation

The degree of documentation required for a data set varies depending on the use of the data set. For example, a data set for testing string searching requires absolute disk addresses for strings located in unallocated space, but an investigation scenario data set may only need to say that the file at C:\mystuff\social-security-numbers.txt contains the information to be found.
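To make the idea of absolute addresses concrete, the sketch below scans a raw image for a byte pattern and reports the absolute byte offset of each occurrence, which can then be checked against the documented locations. It is an illustration under stated assumptions: `find_offsets` is a hypothetical helper, and the in-memory "image" is synthetic data standing in for a real image file.

```python
import io


def find_offsets(stream, pattern, chunk_size=1 << 20):
    """Yield the absolute byte offset of every occurrence of pattern."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # tail kept so matches spanning chunks are seen
    base = 0                    # absolute offset of buf[0]
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        start = 0
        while True:
            idx = buf.find(pattern, start)
            if idx == -1:
                break
            yield base + idx
            start = idx + 1
        # Keep only the last len(pattern) - 1 bytes for the next round.
        keep = buf[-overlap:] if overlap else b""
        base += len(buf) - len(keep)
        buf = keep


# Synthetic stand-in for a disk image, with the pattern seeded twice.
image = io.BytesIO(b"\x00" * 10 + b"SECRET" + b"\x00" * 20 + b"SECRET")
offsets = list(find_offsets(image, b"SECRET", chunk_size=8))
print(offsets)
```

Reading in chunks with a small overlap keeps memory use bounded on large images while still catching a string that straddles a chunk boundary.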

Data Set Distribution

Several data set distribution schemes were considered. Using actual hard disk drives was ruled out as too costly and impractical. We will need to balance several factors, including realism, cost, and practicality.

Current Data Sets

(NOTE: THESE DATA SETS ARE NOT FOR FEDERATED TESTING)

These are prototype data sets for public comment ([email protected]). Some test sets are multi-skill holistic cases, e.g., the Hacking Case, while other test sets focus on specific skills, e.g., non-English text searching in the Russian Tea Room case.

Data Set	Description
Hacking Case	Any names in the image are fictional and do not refer to real people.
Data Leakage Case	Large, complex image involving intellectual property theft
Registry Forensics	Data Set for testing MS Windows Registry Extraction Tools
Drone Images	Images from 60 drones and associated controllers, connected mobile devices and computers
Russian Tea Room	Unicode string search in Russian or English (big-endian)
asb image, dd, E01	Unicode string search in Russian (UTF-8)
Create a reference drive	Create a drive with known hash values. The creation process also verifies that the computer hardware and the drive are working as expected.
Basic Mac image	Mac File Systems (Mac OS Extended (Journaled), Mac OS Extended, Mac OS Standard & Unix)
Rhino Hunt	Look for images (of a rhinoceros) in an image file and network traces.
Memory Images	Live memory capture images
DCFL	DCFL Control image
Mobile Device Images	Chip-off/JTAG binary images
Container Files	String searching on container and nested container files
Deleted File Recovery	Metadata based deleted file recovery images
File Carving	Basic file carving images
File Carving CFTT Images	Images used for CFTT file carving test reports
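
Several of the data sets above depend on known hash values, most directly the reference drive. As a rough sketch of how a downloaded or acquired image might be checked against a documented hash, the code below hashes a stream in chunks and compares the result. The choice of SHA-256, the helper names, and the sample data are assumptions for illustration; the hash algorithm actually documented with a given data set should be used.

```python
import hashlib
import io


def hash_stream(stream, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary stream in chunks so large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_documented_hash(stream, documented_hex, algorithm="sha256"):
    """Return True if the stream's hash equals the documented hex digest."""
    computed = hash_stream(stream, algorithm)
    return computed.lower() == documented_hex.strip().lower()


# Synthetic stand-in for an image file; with a real image, open it in
# binary mode and pass the file object instead.
sample = b"simulated image contents"
documented = hashlib.sha256(sample).hexdigest()  # hypothetical documented value
print(matches_documented_hash(io.BytesIO(sample), documented))
```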