The Art of eDiscovery Data Filtering and Culling

By Michael Beumer

Every lawyer wants the smoking-gun email, text, or Facebook post that will decisively win a case or force a settlement. Given the mind-boggling volume of electronic data flooding our world, finding relevant, discoverable information is increasingly difficult and expensive. But if you are strategic about eDiscovery data filtering techniques prior to document review, you can drastically reduce eDiscovery expenses and simplify your search for evidence.


eDiscovery Data Filtering In Three Parts

More than 90 percent of all cases settle before trial, which means discovery, not the courtroom, is where most cases are won and lost. Today, we begin a three-part series on how legal teams can take a massive collection of electronic evidence and whittle it down to a manageable size. This first post covers the technical issues surrounding strategic data reduction. Subsequent posts will cover the role of human judgment in the process and the new technologies transforming the art of eDiscovery data filtering and culling.



Data Filtering Is Not Optional

Data processing is not an end in itself but a means of identifying and organizing potentially responsive information. The next stage of discovery, document review, is where responsive evidence is positively identified. More filtering and culling will happen during review, but the processing stage is where most of the filtering by defensible criteria takes place.

Failing to properly filter a dataset can be fatal to almost any case. After negotiating with opposing counsel at the Rule 26(f) conference (more on that in Part 2), parties can engage in defensible deletion: identifying and eliminating inessential data so that it doesn't overwhelm their efforts to find relevant information.



Data Search And Destroy

The first major step is to extract text and metadata from the collected data. The metadata is loaded into a database, and the collection is made text-searchable by building an index. Once your documents are in a database, you can effectively screen out irrelevant material using a search-and-filter strategy.
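To make the indexing step concrete, here is a minimal Python sketch of an inverted index, the basic structure behind full-text search. It assumes text has already been extracted from each file. The `Document` and `SearchIndex` names are hypothetical, and a plain dictionary stands in for the metadata database; review platforms use dedicated search engines, so treat this only as an illustration of the idea.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Document:
    """One processed item: its extracted text plus metadata fields (hypothetical shape)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class SearchIndex:
    """Toy inverted index mapping each term to the IDs of documents that contain it."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}  # stands in for the metadata database
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc: Document) -> None:
        """Store the document's metadata and index every term in its text."""
        self.docs[doc.doc_id] = doc
        for term in tokenize(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, *terms: str) -> set[str]:
        """Return IDs of documents containing every query term (a simple AND search)."""
        if not terms:
            return set()
        hits = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*hits)

# Usage with placeholder documents.
index = SearchIndex()
index.add(Document("DOC-001", "Draft pricing agreement for Q3", {"custodian": "jsmith"}))
index.add(Document("DOC-002", "Friday lunch order", {"custodian": "akim"}))
print(index.search("pricing", "agreement"))  # {'DOC-001'}
```

Looking up a term in the index is a single dictionary access rather than a scan of every document, which is why filtering a large collection becomes practical once the index exists.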

The goal is simply to eliminate immaterial items and anything that falls outside certain mandated criteria: date ranges, file types, Internet domains, file size, custodian, and other document characteristics.

This helps exclude items that have little or no value as discoverable evidence, and it can shrink your dataset very quickly. Remember that the majority of your database will NOT be relevant. The aim is to reduce the size of the haystack so that finding the needles is easier.


The most common initial filtering strategies include:

• Filtering by file type:

Determine which types of files will not be needed for the purposes of a matter. Some files, like audio files or most graphics, can be set aside for further analysis later. Other examples of immaterial items include container files, such as ZIP archives or mailbox files like Outlook PST and MBOX, which tend to have no relevance apart from their contents.

• Filtering by date:

Identify the date range relevant to a matter so you can cut data that could not possibly be in scope.

• Keyword filtering:

Consider eliminating mass company email blasts sent to distribution lists, as well as generic notifications that contain text such as “Do not reply.”

• Domain filtering:

Like a spam filter, filtering out known domains can eliminate junk mail, newsletters, and other items that cannot possibly be relevant to your case.
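The four filters above can be expressed as simple rules applied to each item's metadata. The Python sketch below is illustrative only: the `Item` fields and the excluded extensions, date range, phrases, and domains are hypothetical placeholders that would come from your own matter and your agreements with opposing counsel. It also assumes container files have already been expanded, so their contents appear as separate items.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Item:
    """Hypothetical processed record; field names are illustrative."""
    doc_id: str
    extension: str   # file extension, e.g. ".docx"
    sent: date       # sent or last-modified date
    sender: str      # e.g. "alice@example.com"
    body: str        # extracted text

# Placeholder criteria; real values come from the matter and the parties' agreements.
EXCLUDED_EXTENSIONS = {".mp3", ".wav", ".jpg", ".png", ".zip", ".pst", ".mbox"}
DATE_RANGE = (date(2019, 1, 1), date(2021, 12, 31))  # inclusive
EXCLUDED_PHRASES = ("do not reply",)
EXCLUDED_DOMAINS = {"newsletter.example.com", "promo.example.net"}

def sender_domain(address: str) -> str:
    """Return the part of an email address after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def exclusion_reason(item: Item) -> Optional[str]:
    """Return the first filter that culls the item, or None if it survives all four."""
    if item.extension.lower() in EXCLUDED_EXTENSIONS:
        return "file type"
    start, end = DATE_RANGE
    if not start <= item.sent <= end:
        return "date"
    body = item.body.lower()
    if any(phrase in body for phrase in EXCLUDED_PHRASES):
        return "keyword"
    if sender_domain(item.sender) in EXCLUDED_DOMAINS:
        return "domain"
    return None

def cull(items: list[Item]) -> tuple[list[Item], list[tuple[str, str]]]:
    """Split items into those kept for review and (doc_id, reason) pairs for those culled."""
    kept, culled = [], []
    for item in items:
        reason = exclusion_reason(item)
        if reason is None:
            kept.append(item)
        else:
            culled.append((item.doc_id, reason))
    return kept, culled

items = [
    Item("A1", ".docx", date(2020, 5, 4), "cfo@acme.example", "Revised pricing terms attached"),
    Item("A2", ".mp3", date(2020, 5, 4), "cfo@acme.example", ""),
    Item("A3", ".msg", date(2015, 2, 1), "cfo@acme.example", "Old memo"),
    Item("A4", ".msg", date(2020, 6, 1), "it@acme.example", "System update. Do not reply."),
    Item("A5", ".msg", date(2020, 6, 2), "deals@newsletter.example.com", "This week's offers"),
]
kept, culled = cull(items)
print([i.doc_id for i in kept])  # ['A1']
print(culled)  # [('A2', 'file type'), ('A3', 'date'), ('A4', 'keyword'), ('A5', 'domain')]
```

Recording a reason for every culled item, rather than silently dropping it, is one way to keep a record that supports defensibility if the culling is later questioned.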


Analytical Tools Of The Trade

In addition to search technology, a variety of analytical tools can separate low-value material from higher-value information. Some of the most important tools include:

• De-NISTing:

The National Institute of Standards and Technology (NIST) maintains a reference list of known system and application files, the National Software Reference Library. De-NISTing is the process of removing these so-called system files, such as executables, operating system files, and DLLs, which are deemed to have no evidentiary value.

• De-duplication:

As the name suggests, de-duplication removes exact copies of files, and sometimes near matches, from a dataset. Exact duplicates are typically identified by comparing hash values computed from each file's contents; near-duplicate detection instead flags documents whose content overlaps with another document above a chosen similarity percentage.

• Email threading:

Email threading groups a string of related emails together into a single chain. If the chain is immaterial, legal teams can eliminate all of it at once.
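De-NISTing and exact de-duplication both rely on cryptographic hash values: two files with the same hash can be treated as identical. The minimal Python sketch below assumes a set of known-file hashes (in practice loaded from NIST's reference data; here a placeholder built from sample bytes) and uses SHA-1 for illustration. For near-duplicates it swaps in a simple word-shingle Jaccard similarity, one common technique but not necessarily what any given review platform uses. All function names are hypothetical.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()

def de_nist(files: dict[str, bytes], known_hashes: set[str]) -> dict[str, bytes]:
    """Drop files whose hash appears in the known-file (NIST-style) hash set."""
    return {name: data for name, data in files.items()
            if sha1_hex(data) not in known_hashes}

def dedupe_exact(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Keep the first file seen for each hash; map each duplicate to the file it copies."""
    first_by_hash: dict[str, str] = {}
    kept: dict[str, bytes] = {}
    duplicates: dict[str, str] = {}
    for name, data in files.items():
        digest = sha1_hex(data)
        if digest in first_by_hash:
            duplicates[name] = first_by_hash[digest]
        else:
            first_by_hash[digest] = name
            kept[name] = data
    return kept, duplicates

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (1.0 means identical shingle sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:  # both texts shorter than k words: fall back to direct comparison
        return 1.0 if a.lower().split() == b.lower().split() else 0.0
    return len(sa & sb) / len(union)

# Usage with placeholder data.
system_dll = b"MZ placeholder bytes for a system library"
known = {sha1_hex(system_dll)}  # stand-in for a real NIST hash set
files = {
    "helper.dll": system_dll,
    "memo.txt": b"Please review the revised pricing terms before Friday",
    "memo_copy.txt": b"Please review the revised pricing terms before Friday",
}
kept, dupes = dedupe_exact(de_nist(files, known))
print(sorted(kept))  # ['memo.txt']
print(dupes)         # {'memo_copy.txt': 'memo.txt'}

a = "Please review the revised pricing terms before Friday"
b = "Please review the revised pricing terms before Monday"
print(round(jaccard(a, b), 2))  # 0.71
```

Documents scoring above a chosen threshold (a matter-specific judgment call) could be grouped as near-duplicates for review rather than deleted outright.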
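Email threading can be approximated from standard message headers: each email carries a `Message-ID`, and a reply usually names its parent in an `In-Reply-To` header. The Python sketch below, with hypothetical `Email` and `build_threads` names, follows those links back to each thread's first message. It is a simplification: real threading tools also consult the `References` header, normalized subjects, and message content, and they handle missing parents more carefully.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    """Hypothetical record holding only the headers this sketch uses."""
    message_id: str
    in_reply_to: Optional[str]
    subject: str

def thread_root(email: Email, by_id: dict[str, Email]) -> str:
    """Follow In-Reply-To links upward until a message has no known parent."""
    current = email
    seen = {current.message_id}  # guards against malformed reply loops
    while current.in_reply_to in by_id and current.in_reply_to not in seen:
        current = by_id[current.in_reply_to]
        seen.add(current.message_id)
    return current.message_id

def build_threads(emails: list[Email]) -> dict[str, list[str]]:
    """Group message IDs under the ID of the message that started each thread."""
    by_id = {e.message_id: e for e in emails}
    threads: dict[str, list[str]] = defaultdict(list)
    for e in emails:
        threads[thread_root(e, by_id)].append(e.message_id)
    return dict(threads)

emails = [
    Email("<1@acme.example>", None, "Holiday party"),
    Email("<2@acme.example>", "<1@acme.example>", "RE: Holiday party"),
    Email("<3@acme.example>", "<2@acme.example>", "RE: RE: Holiday party"),
    Email("<4@acme.example>", None, "Pricing terms"),
]
threads = build_threads(emails)
print(threads["<1@acme.example>"])  # ['<1@acme.example>', '<2@acme.example>', '<3@acme.example>']
```

Marking the holiday-party thread immaterial would then cull all three of its messages in one step, while the pricing thread stays in the review set.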

Not all of these efforts are aimed at removing files. Some filtering can also reveal places where evidence might be hidden. For example, text inside image files or scanned TIFF documents is not searchable until it is extracted, typically with optical character recognition (OCR), and that text may be relevant for review.



Data Reduction = eDiscovery Cost Reduction

Get smart about reducing electronic data with strategic culling and filtering, and you can dramatically reduce the expense of eDiscovery review. Of course, filtering is only as good as the terms and strategies you employ.

Future blog posts will provide more information about how those strategies are formulated, and will highlight technical filtering, substantive filtering, advanced filtering, and defensibility. In the meantime, if you have questions regarding data reduction methods, reach out to the experts at Nextpoint. We are here to help.

Part 2 in our data filtering series is now published: The Human Role in eDiscovery Data Filtering and Culling.