Predictive Coding Wars: Man or Machine?

By Jason Krause

Lawyers need search technology to find documents for litigation. That’s just a fact of life in the digital age. However, it is increasingly clear that the current search methods are not keeping up with the explosion in digital content.

Recognizing this challenge, the Text Retrieval Conference (TREC) Legal Track was launched earlier this decade to test different types of computer searches and create industry best practices for searching electronic records in litigation.

The TREC Legal Track found that Boolean keyword searching, which is the current computer technology lawyers use to find documents for litigation, missed 78 percent of relevant documents. That figure was nearly identical to the results of the only other major study done on the subject, the Blair and Maron study from 1985.

People Are Just as Bad

But when considering whether or not to use technology-assisted review, lawyers should be careful not to make the mistake of thinking that human reviewers are more reliable than computers. Studies show that manual reviewers, whether reviewing documents for legal or general information retrieval purposes, miss a remarkably large number of relevant documents, often 30 percent or more.

A study by Maura Grossman and Gordon Cormack, based on data from the TREC 2009 Legal Track, showed that manual reviewers missed from 21 percent to 75 percent of the responsive documents, depending on the topic. A subsequent study conducted by Grossman and Cormack showed that about 90 percent of the missed documents were clearly due to human error and should easily have been identified.

However, the TREC project also showed that reliance on simple keywords and Boolean strings sometimes results in a large quantity of relevant evidence going missing. The success achieved in the project varied widely, but the results are very promising when compared with the historically lower rates of recall and precision seen across many information retrieval studies.
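For readers unfamiliar with the terms, recall is the share of all relevant documents that a review finds, and precision is the share of retrieved documents that are actually relevant. The short sketch below works through the arithmetic. The function names and document counts are hypothetical, chosen only so that the miss rate matches the 78 percent figure cited earlier; they are not data from TREC.

```python
def recall(relevant_found: int, relevant_total: int) -> float:
    """Share of all relevant documents that the review actually found."""
    return relevant_found / relevant_total


def precision(relevant_found: int, retrieved_total: int) -> float:
    """Share of retrieved documents that are actually relevant."""
    return relevant_found / retrieved_total


# Hypothetical collection (assumed numbers, for illustration only):
# 1,000 relevant documents exist; the search retrieves 400 documents,
# 220 of which turn out to be relevant.
relevant_total = 1000
retrieved_total = 400
relevant_found = 220

r = recall(relevant_found, relevant_total)
p = precision(relevant_found, retrieved_total)
missed = 1 - r  # share of relevant documents the search never surfaced

print(f"recall={r:.2f} precision={p:.2f} missed={missed:.2f}")
```

In this made-up example the search looks reasonably precise, since more than half of what it returns is relevant, yet it still leaves most of the relevant documents unfound. That gap is why both numbers matter when judging a search method.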

TREC Retired Too Soon?

Sadly, TREC was still in the early days of performing research, but the project seems to have dried up and will not likely be rebooted anytime soon. In the meantime, Jason Baron, who led the TREC project, said, “I would be concerned about courts simply taking on faith the results of past research as applicable in all legal settings.”

Unfortunately, that’s what the courts are doing. In a case that got a lot of attention last year, Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y. Feb. 8, 2012), Judge Andrew Peck approved the use of predictive coding to review documents. Despite this approval from the bench, the effectiveness of the technology is not yet certain.

Maybe Humans Are Useful After All

Though there is no magic bullet or simple technological answer to this problem, there is hope. During the last half-decade of research that has gone into TREC’s Legal Track, something important has emerged from the results, namely: we have a much better understanding of what a good “search process” looks like. That includes a “human in the loop” (known in the Legal Track as a “topic authority”) evaluating on an ongoing, iterative basis what automated search software kicks out by way of results.

The debate over this past decade has been whether man or machine is better suited for document review in litigation. It turns out man + machine = the answer.
