The Tactical Ediscovery Data Processing Workflow That Streamlines Document Review
By Elizabeth Guthrie
As a legal practitioner, you understandably want to start looking at files and documents as soon as possible so you can begin developing your legal strategy. You want to see who was talking to whom and what they were saying, and to gain a high-level understanding of the people, places, and events involved in the matter. This early review, known as Early Case Assessment (ECA), also involves helping your clients understand the legal risks and the costs of the matter.
Step 1: Normalizing Your Data (There’s No Such Thing as Normal Data!)
The first step in an ediscovery processing workflow is to “normalize” all the collected data so that the review is consistent and straightforward. For example, when you’re looking at a mix of emails, Word documents, PDF files, pictures, sound recordings, spreadsheets, and more, you need to be able to read, view, and listen to all of those files in an approachable way. It seems like it shouldn’t be complicated, but every file type must be identified accurately so that each file can be properly formatted for your viewing pleasure.
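Accurate file identification usually relies on a file’s content rather than its extension, since files are often renamed or saved without one. The Python sketch below illustrates the idea by checking a file’s leading “magic” bytes against a small signature table. The table and the `identify_file_type` helper are hypothetical and not how any particular platform (Nextpoint included) works; real processing engines recognize hundreds of formats.

```python
# Hypothetical sketch: identify a file from its leading "magic" bytes instead of
# trusting its extension. This signature table is a small illustrative subset.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also DOCX/XLSX/PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy DOC/XLS/MSG)"),
    (b"ID3", "MP3 audio (with ID3 tag)"),
]


def identify_file_type(data: bytes) -> str:
    """Return a coarse file type based on the first bytes of the file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown (needs manual review)"


# Usage: a JPEG renamed to "invoice.pdf" is still identified by its content.
renamed_photo = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(identify_file_type(renamed_photo))  # JPEG image
```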
For email, we also have to ensure that all attachments are linked to their proper messages (what we call the parent/child relationship). Even more important, we need to “normalize” the time zones associated with all the messages. If we processed all the emails in the time zone where you practice law, you could find emails that appear to have been sent after they were received, which is obviously confusing (and even more so once Daylight Saving Time or international time zones come into play). For this reason, we usually process everything in Coordinated Universal Time (UTC), and you’ll need to be comfortable with that format.
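To see why UTC matters, consider a hypothetical exchange: a message sent from London at 4:00 PM local time on November 1, 2022, and a reply sent from Los Angeles at 10:15 AM local time the same day. Compared by wall-clock time, the reply appears to come almost six hours before the original. The sketch below assumes each timestamp’s IANA time zone is known (real messages usually carry a UTC offset in their Date header instead) and converts both to UTC using Python’s standard `datetime` and `zoneinfo` modules; the `to_utc` helper is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system (or tzdata) time zone database


def to_utc(local_timestamp: str, tz_name: str) -> datetime:
    """Hypothetical helper: interpret a naive local timestamp in a known IANA zone, convert to UTC."""
    naive = datetime.fromisoformat(local_timestamp)
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))  # zoneinfo applies the correct DST offset
    return aware.astimezone(timezone.utc)


# Hypothetical exchange: London sends at 4:00 PM, Los Angeles replies at 10:15 AM local time.
original = to_utc("2022-11-01T16:00", "Europe/London")
reply = to_utc("2022-11-01T10:15", "America/Los_Angeles")

print(original.isoformat())  # 2022-11-01T16:00:00+00:00
print(reply.isoformat())     # 2022-11-01T17:15:00+00:00
print(original < reply)      # True: in UTC, the reply correctly follows the original
```

This date also shows the Daylight Saving Time problem: the UK had already left summer time while the US had not, so London was seven hours ahead of Los Angeles that week rather than the usual eight. Local times that fall inside a DST transition are ambiguous; this sketch simply accepts Python’s default interpretation (`fold=0`).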
Additionally, we have to make sure any zipped or compressed files are uncompressed and their contents properly listed, and we have to extract any embedded objects that might have been inserted into Microsoft Word documents or Excel spreadsheets. Another important step is to assign each file a “DocumentID” or control number so the platform can provide analytics and audit trails. Note that this is NOT a Bates number, since Bates numbers are typically assigned when you generate a production set.
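Archive expansion and control numbering can be sketched as a recursive walk: each item gets the next sequential ID, and anything pulled out of a ZIP records the ID of the archive it came from, preserving the parent/child relationship. The `DOC-000001` numbering scheme, the `ProcessedItem` structure, and the `process` function are hypothetical. The sketch expands only files named `.zip` and does not extract embedded objects from Office documents.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessedItem:
    doc_id: str               # hypothetical control number, e.g. "DOC-000001" (not a Bates number)
    name: str
    parent_id: Optional[str]  # control number of the archive this item came from, if any


def process(name: str, data: bytes, items: List[ProcessedItem],
            parent_id: Optional[str] = None) -> None:
    """Assign the next control number to this file, then recurse into .zip archives."""
    doc_id = f"DOC-{len(items) + 1:06d}"
    items.append(ProcessedItem(doc_id, name, parent_id))
    # Only expand files named *.zip; DOCX/XLSX are ZIP-based too, but they are documents.
    if name.lower().endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    process(info.filename, archive.read(info), items, parent_id=doc_id)


# Usage: build a ZIP in memory that contains a text file and a nested ZIP.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("b.txt", "nested file")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("a.txt", "top-level file")
    zf.writestr("inner.zip", inner.getvalue())

items: List[ProcessedItem] = []
process("collection.zip", outer.getvalue(), items)
for item in items:
    print(item.doc_id, item.name, "child of", item.parent_id)
```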
Step 2: Metadata Extraction and File Culling (De-Mystifying the Content)
While lawyers are understandably focused on reading the content of emails and documents, it’s critical that all of the metadata from those files is properly extracted so that it can be populated into a database. The processing stage extracts the From, To, CC, and BCC fields, the Sent and Received dates and times, the Subject line, and several other properties, such as whether the message was opened or replied to and which conversation thread it belongs to. With all of the metadata extracted into a spreadsheet-like database view, you can easily sort and filter the data to focus on just the communications you need to investigate.
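For messages in standard RFC 5322 format (for example, .eml files), Python’s built-in `email` package can pull out the header fields described above. This is a minimal sketch: the `extract_metadata` helper and its output field names are assumptions, and properties such as read or replied status live in mail-store formats like PST rather than in message headers, so they are not covered. The `In-Reply-To` and `References` headers are included because they are one common source of thread information.

```python
from datetime import timezone
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_metadata(raw: bytes) -> dict:
    """Hypothetical helper: parse a raw RFC 5322 message and return selected metadata fields."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def header(name: str) -> str:
        value = msg[name]
        return str(value) if value is not None else ""

    sent = None
    if msg["Date"] is not None and msg["Date"].datetime is not None:
        dt = msg["Date"].datetime
        if dt.tzinfo is None:  # a "-0000" offset parses as naive; treat it as UTC
            dt = dt.replace(tzinfo=timezone.utc)
        sent = dt.astimezone(timezone.utc).isoformat()  # normalize to UTC

    return {
        "from": header("From"),
        "to": header("To"),
        "cc": header("Cc"),
        "bcc": header("Bcc"),  # usually present only in the sender's copy
        "subject": header("Subject"),
        "sent_utc": sent,
        "message_id": header("Message-ID"),
        "in_reply_to": header("In-Reply-To"),
        "references": header("References"),
        # Child items of this parent message (the parent/child relationship).
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Usage: build a sample message in memory (hypothetical addresses and content).
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Cc"] = "carol@example.com"
sample["Subject"] = "Q3 forecast"
sample["Date"] = "Tue, 01 Nov 2022 16:00:00 +0000"
sample["Message-ID"] = "<msg1@example.com>"
sample.set_content("See attached.")
sample.add_attachment(b"%PDF-1.7 placeholder", maintype="application",
                      subtype="pdf", filename="forecast.pdf")

metadata = extract_metadata(sample.as_bytes())
print(metadata["subject"], metadata["sent_utc"], metadata["attachments"])
```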
Your ediscovery processing workflow should also include deduplicating files, and this is where you need to provide some input to your vendor. Let’s say you’ve collected emails from 10 different individuals (custodians), and each of them may have received the same email. Do you want to read that same message 10 times? Or would you rather have the duplicates removed, with an indicator showing each individual who received the message? These are important decisions to discuss with your vendor, who can help you understand your options so you get what works best for your review needs.
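One common way to detect duplicates is to hash a set of normalized fields from each message and compare the hash values. The sketch below assumes a particular field set and SHA-256; vendors choose their own fields and algorithms. It supports both options described above: global deduplication, which keeps one copy and records every custodian who held it (in a hypothetical `all_custodians` field), and per-custodian deduplication, which removes duplicates only within each custodian’s own data. The message dictionaries and their keys are hypothetical.

```python
import hashlib


def dedup_hash(msg: dict) -> str:
    """Hash normalized email fields (an assumed field set, not an industry standard)."""
    fields = [
        msg["from"].strip().lower(),
        ",".join(sorted(addr.strip().lower() for addr in msg["to"])),
        msg["subject"].strip(),
        msg["sent_utc"],
        msg["body"].strip(),
    ]
    # Join with a separator character that is unlikely to appear inside the fields.
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()


def deduplicate(messages: list, per_custodian: bool = False) -> list:
    """Keep the first copy of each message and track every custodian who held it."""
    unique = {}
    for msg in messages:
        digest = dedup_hash(msg)
        key = (msg["custodian"], digest) if per_custodian else digest
        if key not in unique:
            unique[key] = {**msg, "all_custodians": [msg["custodian"]]}
        elif msg["custodian"] not in unique[key]["all_custodians"]:
            unique[key]["all_custodians"].append(msg["custodian"])
    return list(unique.values())


# Usage: the same message collected from two custodians, plus one unrelated message.
memo = {"from": "alice@example.com", "to": ["bob@example.com"], "subject": "Q3 forecast",
        "sent_utc": "2022-11-01T16:00:00+00:00", "body": "See attached."}
collection = [
    {**memo, "custodian": "Smith"},
    {**memo, "custodian": "Jones"},
    {**memo, "subject": "Lunch?", "body": "Friday?", "custodian": "Smith"},
]
global_result = deduplicate(collection)
custodial_result = deduplicate(collection, per_custodian=True)
print(len(global_result), global_result[0]["all_custodians"])
print(len(custodial_result))
```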
Lastly, this is the step where any non-searchable files are run through optical character recognition (OCR) so they are readable and searchable in the platform. Scanned paper documents and pictures may contain text that a human can read easily but that a computer can search only after OCR software has attempted to recognize it. OCR can be attempted on handwriting, but just know that the results won’t be perfect, which means your searches may be incomplete.
Step 3: Indexing and Searching (You Can’t Search What You Don’t Index)
When attorneys think about “searching” documents, they envision typing in a word and having the computer check for that word in every single file. You can’t be blamed for visualizing the task that way, but in reality, having the computer read through every document from scratch each time you ran a search would be so slow on a large collection that it would be a time-wasting disaster.
Instead, when you type in a word and hit the search button, the computer is actually scanning an “index,” a dictionary of all the words found in the files, which was generated during the data processing stage. That way, it only searches for words found in the files you collected, and it only has to consult that index rather than laboriously explore every document every single time. This is much more efficient and gets you the results you’re looking for in fractions of a second. The index records every file in which each word appears, so it can also highlight your search terms in the files during your review.
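An index like this can be sketched as an “inverted index” that maps each word to the documents containing it, along with the word’s positions in each document (one way a platform could support highlighting). This minimal Python version uses a deliberately simple tokenizer and hypothetical document IDs and helper names; real search engines tokenize text far more carefully.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs: dict) -> dict:
    """Build an inverted index: word -> {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def search(index: dict, word: str) -> list:
    """Look the word up in the index instead of rescanning every document."""
    return sorted(index.get(word.lower(), {}))


# Usage with hypothetical documents.
docs = {
    "DOC-000001": "Please shred the Q3 files.",
    "DOC-000002": "The files are on the shared drive.",
    "DOC-000003": "Lunch on Friday?",
}
index = build_index(docs)
print(search(index, "files"))       # ['DOC-000001', 'DOC-000002']
print(index["files"]["DOC-000001"])  # [4]: the word's position, usable for highlighting
```

The indexing work happens once, at processing time; each later search is a single dictionary lookup, no matter how many documents were collected.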
But there’s a flip side: to keep the index efficient and searches fast, many search indexes ignore the most common words, such as “and,” “to,” and “is.” These “noise” words, or “stop” words, show up at an astronomically higher rate than all other words. Since we’re usually not searching for conjunctions, determiners, prepositions, and similar words, the indexes simply ignore them.
This is standard procedure, but you should be aware of this limitation in case you ever need to search for those specific words. Craig Ball offers an excellent example: in most ediscovery document review platforms, you won’t be able to find the phrase “to be or not to be,” even if you put it in quotation marks, because those are noise words that are not indexed during the data processing phase.
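The effect of a stop-word list on phrase searching can be reproduced with a small positional index that skips stop words at indexing time. The stop list below is illustrative only (platforms ship their own lists, and some let you change them), and the `phrase_search` matcher is a simplified, hypothetical sketch: a query word that was never indexed can never match, so the whole phrase fails.

```python
import re
from collections import defaultdict

# Illustrative stop list only; real platforms ship their own, often much longer, lists.
STOP_WORDS = frozenset({"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"})


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs: dict, stop_words=STOP_WORDS) -> dict:
    """Positional inverted index that silently skips stop words."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            if word not in stop_words:
                index[word][doc_id].append(position)
    return index


def phrase_search(index: dict, phrase: str) -> list:
    """Return documents containing the words of `phrase` at consecutive positions."""
    words = tokenize(phrase)
    if not words or any(word not in index for word in words):
        return []  # an unindexed word (such as a stop word) can never be matched
    hits = []
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            if all(start + offset in index[word].get(doc_id, ())
                   for offset, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {"HAMLET-1": "To be, or not to be, that is the question."}
print(phrase_search(build_index(docs), "to be or not to be"))                          # []
print(phrase_search(build_index(docs, stop_words=frozenset()), "to be or not to be"))  # ['HAMLET-1']
```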
At this step, you should consider proactively giving your provider (like Nextpoint) a list of the keywords or search terms you’re interested in so you can receive a “hit report” after processing. This report shows how many times each term occurs in the data, which lets you filter the data on your chosen keywords before diving directly into a manual review.
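A hit report boils down to a per-term count of matching documents and total occurrences. The sketch below computes one over in-memory text using simple whole-word, case-insensitive regular-expression matching; the `hit_report` helper, sample documents, and terms are hypothetical, and a real platform would compute this from its index and support richer search syntax.

```python
import re


def hit_report(docs: dict, terms: list) -> dict:
    """For each term, count matching documents and total whole-word occurrences."""
    report = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        documents = occurrences = 0
        for text in docs.values():
            count = len(pattern.findall(text.lower()))
            if count:
                documents += 1
                occurrences += count
        report[term] = {"documents": documents, "occurrences": occurrences}
    return report


# Usage with hypothetical documents and search terms.
docs = {
    "DOC-000001": "The merger closes Friday. Merger terms attached.",
    "DOC-000002": "Lunch on Friday?",
    "DOC-000003": "No news this week.",
}
report = hit_report(docs, ["merger", "friday", "shred"])
print(f"{'Term':<10}{'Documents':>10}{'Occurrences':>13}")
for term, counts in report.items():
    print(f"{term:<10}{counts['documents']:>10}{counts['occurrences']:>13}")
```

Reporting both numbers helps: a term with many occurrences concentrated in a few documents calls for a different review plan than one spread thinly across the whole collection.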
Step 4: Data Mining and Analytics (Examine What The Data is Telling You)
Lastly, an ediscovery processing workflow should enable you to take advantage of deep-dive analysis of your data. Computational tools, like Nextpoint’s Data Mining, can be used to highlight interesting or significant patterns in your data to provide you with better angles to approach review. There are several advanced tools utilizing artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and a host of other mind-blowing technologies. Just ask your vendor what basic analytical tools they have that can help you.
For example, immediately after data processing, Nextpoint provides you with a set of statistics on how many files and documents you’re faced with, how many email messages, how many attachments, and how many email threads or conversations in total. You can also view a visual, interactive timeline of files and email messages so you can focus on a specific date range. There are data widgets that break down the different file types found in your data, as well as the authors and email domains.
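Statistics like these are simple aggregations over processed metadata. The sketch below assumes each processed item is a dictionary with hypothetical keys (`kind`, `file_type`, `sender`, `sent_utc`, `thread_id`) and uses Python’s `Counter` to tally file types, sender domains, and emails per month, the raw material for a timeline or file-type widget. It is not a description of how Nextpoint computes its statistics.

```python
from collections import Counter


def summarize(items: list) -> dict:
    """Aggregate processed items (hypothetical schema) into ECA-style summary statistics."""
    emails = [item for item in items if item["kind"] == "email"]
    return {
        "total_items": len(items),
        "emails": len(emails),
        "attachments": sum(1 for item in items if item["kind"] == "attachment"),
        "threads": len({email["thread_id"] for email in emails}),
        "file_types": Counter(item["file_type"] for item in items),
        "sender_domains": Counter(email["sender"].rsplit("@", 1)[-1].lower() for email in emails),
        # ISO 8601 UTC timestamps start with "YYYY-MM", enough for a monthly timeline.
        "emails_per_month": Counter(email["sent_utc"][:7] for email in emails),
    }


# Usage with a tiny hypothetical collection.
items = [
    {"kind": "email", "file_type": "msg", "sender": "alice@acme.example",
     "sent_utc": "2022-10-28T09:00:00+00:00", "thread_id": "T1"},
    {"kind": "email", "file_type": "msg", "sender": "bob@vendor.example",
     "sent_utc": "2022-11-01T16:00:00+00:00", "thread_id": "T1"},
    {"kind": "attachment", "file_type": "xlsx"},
    {"kind": "loose_file", "file_type": "pdf"},
]
stats = summarize(items)
for name, value in stats.items():
    print(name, value)
```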
In addition to these features, Nextpoint offers Data Mining, the new groundbreaking technology for Early Case Assessment and comprehensive data analysis. The app generates snapshots of key themes in your data and offers advanced search features that can be used to create custom visual reports. As volumes of electronic data explode in the legal field, advanced tools like this are becoming key parts of handling potential evidence in litigation.
An Ediscovery Processing Workflow To Simplify Your Data Load
As you can see, the “processing” stage of ediscovery has a lot more happening under the hood, and while some of these tasks are standard and run-of-the-mill, it’s also important that you become comfortable with all the options and processes so you can make the best decisions for your clients and their data. With these strategies, there’s no need to be overwhelmed by discovery data – you can simplify and understand it before diving into document review.