Best for large-scale document reviews and/or messy data imports.
Near dedupe differs from our standard “dedupe” in that it goes beyond hash values (electronic thumbprints) to identify not only exact duplicates but also similar documents
The process can be run on native or imaged/produced data
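To illustrate why hash values alone miss similar documents, here is a minimal sketch. The choice of SHA-256 and the sample text are assumptions for illustration only, not a description of how Nextpoint computes its hashes.

```python
import hashlib


def text_hash(text: str) -> str:
    """Return a hex digest for a document's text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


doc_a = "Please review the attached contract by Friday."
doc_b = "Please review the attached contract by Friday."    # exact copy of doc_a
doc_c = "Please review the attached contract by Friday.\n"  # same text plus one newline

# An exact copy produces the same hash, so standard dedupe catches it.
exact_match = text_hash(doc_a) == text_hash(doc_b)

# A one-character difference produces a different hash, so standard dedupe misses it.
near_match_by_hash = text_hash(doc_a) == text_hash(doc_c)

print(exact_match, near_match_by_hash)
```

A hash only answers "identical or not", which is why a near dedupe needs a similarity score instead.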
Steps of analysis:
Extract the text of every document
Text files are then analyzed through a “scoring” process
This creates one “Master” doc and scores all other nearly duplicative docs against it
Each score equates to a percentage of similarity
E.g., Document B is 97% similar to the Master, Document A
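The scoring steps above can be sketched roughly as follows. This is not Nextpoint's actual algorithm, which is not described here; it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the sample texts, and the choice of Document A as the Master are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(master_text: str, other_text: str) -> float:
    """Score two extracted texts from 0 to 100 (100 means identical text)."""
    ratio = SequenceMatcher(None, master_text, other_text).ratio()
    return round(ratio * 100, 1)


def score_against_master(texts: dict, master_id: str) -> dict:
    """Score every document except the Master against the Master's text."""
    master_text = texts[master_id]
    return {
        doc_id: similarity_percent(master_text, text)
        for doc_id, text in texts.items()
        if doc_id != master_id
    }


# Hypothetical extracted text, keyed by document name.
texts = {
    "Document A": "The parties agree to the terms set out in Schedule 1.",
    "Document B": "The parties agree to the terms set out in Schedule 2.",
    "Document C": "Lunch order for Thursday: two salads, one soup.",
}

# Assumption: Document A has already been picked as the Master.
scores = score_against_master(texts, master_id="Document A")
for doc_id, score in scores.items():
    print(f"{doc_id} is {score}% similar to the Master")
```

How the Master is chosen is not covered above, so the sketch simply takes it as an input.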
The client has free rein over how stringent they want to be about what’s considered a Duplicate
E.g., anything above 85% similar could be considered a duplicate in some cases, whereas another client may want to be stricter and set the threshold to 90% and above instead
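Applying a client-chosen threshold might look like the sketch below. The function name and sample scores are hypothetical, and treating the threshold as inclusive (a score equal to the threshold counts as a duplicate) is an assumption.

```python
def split_by_threshold(scores: dict, threshold: float = 85.0):
    """Split scored documents into duplicates and non-duplicates.

    Assumption: a score equal to the threshold counts as a duplicate.
    """
    duplicates = {doc: s for doc, s in scores.items() if s >= threshold}
    not_duplicates = {doc: s for doc, s in scores.items() if s < threshold}
    return duplicates, not_duplicates


# Hypothetical similarity scores against one Master.
sample_scores = {"Document B": 97.0, "Document C": 88.0, "Document D": 60.0}

# A more lenient client: 85% and above is a duplicate.
lenient_dupes, lenient_rest = split_by_threshold(sample_scores, threshold=85.0)

# A stricter client: 90% and above is a duplicate.
strict_dupes, strict_rest = split_by_threshold(sample_scores, threshold=90.0)

print(sorted(lenient_dupes), sorted(strict_dupes))
```

With these sample scores, raising the threshold from 85 to 90 moves Document C out of the duplicate set.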
Once this scoring is complete, the results are viewable in Nextpoint
The Masters are isolated into one folder, and the Duplicates into another
The “Similarity Score” is brought into a coding field for visibility and sorting purposes
The Related Document window will further show you a “cluster” of nearly duplicate documents
The Master will show up on the top level, with all other duplicates showing up as “related” documents underneath
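As a rough picture of that cluster layout, the sketch below models one Master with its related documents. The `Cluster` class, its field names, and the highest-score-first ordering are assumptions for illustration and do not reflect Nextpoint's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One Master plus its nearly duplicate documents (hypothetical structure)."""
    master_id: str
    related: dict = field(default_factory=dict)  # document name -> Similarity Score

    def render(self) -> list:
        """List the Master first, then related documents, highest score first."""
        lines = [self.master_id]
        by_score = sorted(self.related.items(), key=lambda item: item[1], reverse=True)
        for doc_id, score in by_score:
            lines.append(f"  {doc_id} ({score:.0f}%)")
        return lines


cluster = Cluster("Document A", {"Document C": 88.0, "Document B": 97.0})
rendered = cluster.render()
print("\n".join(rendered))
```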