The Only Feature That Matters in eDiscovery

Nextpoint | May 1, 2012 | June 17, 2021

Fourteen years ago, if you visited Google.com, you would have found pretty much the same basic search functionality you see today. Amazingly, since that time Google has been able to offer the same mind-blowing search speeds despite the explosion of content. In 1998, Google indexed 28 million pages. By 2000 that number was 1 billion. And by 2008, just one decade after they started, Google had reached 1 trillion pages.

Today, Google dominates the Internet, but not because they have added lots of new features. They have, in fact, added very few. Google dominates because they focused on scalability. Scale is the first priority in the new world of information management.

Features v. Scale

Last month, I wrote about the failure of many eDiscovery technologies to handle large data volumes because they don’t adequately scale. What I didn’t mention is that, in some cases, a glut of added features is the root cause of this failure to scale.

High feature density equals low scalability.

Feature density can be loosely defined as the number of features within a given product or application. Usually products have a fairly specific functional purpose, and the best products have managed to keep that purpose uncorrupted by extraneous “feature noise.” Time-tested products like the hammer have managed to keep their feature list pretty short. A modern claw hammer has two features: a blunt end for driving nails and a claw end for prying out nails. A hammer can’t do a lot of things, but it can do one or two things very well.
The right tool for the job.

On the other end of the spectrum, there are multi-tools, which have a very high feature density. When you need a little bit of everything and you only have the resources for a single tool, a multi-tool (such as the Leatherman) might be the best choice. But as most of us know, multi-tools don’t do anything quite as well as the standalone tools they are meant to replace. So if you have a big project and you need a good screwdriver, a hammer, and a pair of pliers… the multi-tool isn’t the best answer. Its feature density, while very nifty and portable, doesn’t scale to the big, difficult, repetitive tasks.

Scalable Social Media

An even more relevant example would be social media technologies. Twitter and Facebook are incredibly simple from a feature standpoint. They have grown over the years, but still haven’t introduced a single new feature that hasn’t been available on the web for over a decade. What they do well is scale, and that’s where the substance of their technical innovation really lies. You don’t go to Facebook because they have the best photo viewer. You go because they have the most photos of your friends. And they’ve been able to accomplish that by building the most scalable social platform. The point is that they focus on scale instead of feature density. Scale is the only feature that matters.

Protecting the Core

So why haven’t Google, Facebook, and Twitter, with their endless stores of cash, trained their focus on providing BOTH high scalability and high feature density? As a software engineer, I’ve learned this lesson the hard way. Every little feature you add takes away from a product’s ability to scale. For simple interface changes, it might just be a matter of adding more user behaviors to test. But for systems that need high scalability, the problems are much greater.

At a minimum, computational resource demands grow linearly in proportion to the amount of data. However, in a lot of circumstances the computational requirements grow much faster than the data itself, for example quadratically when every document has to be compared against every other. This is particularly true in heavy processing, indexing, and analytics-based tasks. eDiscovery folks can think about things like native file processing, near-dupe identification, and search functionality as good examples of this behavior.
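
To make the faster-than-linear case concrete, here is a minimal Python sketch of naive near-dupe identification. It is an illustration only, not a description of how any particular eDiscovery product works: the function names, the three-word shingle size, and the 0.5 similarity threshold are all assumptions chosen for the example. Because it compares every document against every other, the number of comparisons grows with the square of the document count.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def naive_near_dupes(docs, threshold=0.5):
    """Compare every pair of documents (hypothetical helper for this sketch).

    Returns the list of index pairs at or above the threshold, plus the
    number of comparisons that were performed.
    """
    sets = [shingles(d) for d in docs]
    pairs = []
    comparisons = 0
    for i, j in combinations(range(len(docs)), 2):
        comparisons += 1
        if jaccard(sets[i], sets[j]) >= threshold:
            pairs.append((i, j))
    return pairs, comparisons


docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "lunch menu for the quarterly offsite",
    "notes from the quarterly budget meeting",
]
pairs, comparisons = naive_near_dupes(docs)
print(pairs, comparisons)
```

Four documents take 6 comparisons; eight documents take 28. Real systems can cut this cost with techniques such as locality-sensitive hashing, but the sketch shows why an analytics feature can demand far more computation than the raw data volume suggests.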

While it’s great to have a cloud computing architecture, any environment carries theoretical limits. The key is to stay focused on the core requirements and ensure that resources are spent largely on those core needs. I could give numerous specific examples of common eDiscovery features whose computational demands grow far faster than the data. But then again, I’m sure most people reading this have experienced the results of this problem: a feature that works great in a demo (on 5,000 or 10,000 documents) but doesn’t work at all on its first real-world implementation.
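
A rough back-of-the-envelope calculation shows how wide the gap between demo and production can be. The Python sketch below assumes a feature that compares every document with every other document and that each comparison takes one microsecond. Both the per-comparison cost and the 1,000,000-document "real-world" size are hypothetical figures picked for illustration, not measurements.

```python
# Assumption for illustration only: one pairwise comparison takes 1 microsecond.
SECONDS_PER_COMPARISON = 1e-6


def pair_count(n):
    """Number of distinct document pairs among n documents: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_seconds(n, seconds_per_comparison=SECONDS_PER_COMPARISON):
    """Estimated run time of an all-pairs feature over n documents."""
    return pair_count(n) * seconds_per_comparison


# Two demo-sized sets from the text, plus a hypothetical production-sized set.
for n in (5_000, 10_000, 1_000_000):
    secs = estimated_seconds(n)
    print(f"{n:>9,} docs: {pair_count(n):>15,} pairs, "
          f"about {secs:,.1f} seconds ({secs / 86_400:.2f} days)")
```

Under those assumptions the 10,000-document demo finishes in under a minute, while the million-document set needs more than five days for the same feature.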

No one tool is the right tool for every situation. High-feature-density products have their place in any product category. But in the eDiscovery environment that most of us face, scale is becoming the number one concern. Look for technologies that prioritize scale above feature density and you’ll be way ahead of the game.
