We all know that big data threatens not only to break corporate IT budgets but also to overwhelm the legal system. As organizations try to solve one problem, like controlling ownership of data, they often create new headaches for the legal department. These self-inflicted wounds are probably inevitable to some degree in the data-intensive world we live in, but the problems behind them are not unsolvable.
Mo’ Data, Mo’ Problems
This morning, IT World published the article, “IT Under Data Barrage,” citing research from the Aberdeen Group, which found companies average “28 sources of incoming data, 14 from internal operations, nine from partners and five from outside their business.” The median company surveyed had 150 terabytes of data, and one-fifth of organizations reported growth rates of more than 75 percent. That’s a lot of data, but the real problem the survey found is not the volume of data; it’s that most of the data being created is not available to be analyzed or reviewed. According to the piece, only 23 percent of data in an average organization is available for analysis.
Where is this data coming from, and why is it so unknowable? One growing source of big data complications, eDiscovery analyst Greg Buckles notes, is the surge in popularity of SharePoint content management systems. While these enterprise-wide systems can simplify corporate data management, ownership of documents is often impossible to track. Even worse, a SharePoint study from Azeleos found that nearly half of organizations (43 percent) said they would find it difficult or very difficult to produce data for a SharePoint eDiscovery request or a regulatory audit. The same study found that IT staffing to manage SharePoint has fallen by 10 percent in just one year. As Buckles says, “Be aware of this trend to migrate file shares to SharePoint and get ahead of your IT bulldozer, not underneath it.”
In addition, Buckles notes that organizations migrating to third-party hosted applications, including Google Docs and Gmail, often fail to impose the controls and limits on content that most document retention policies require. It is tempting to keep more data when the cost of ownership for these cloud applications is low, but that doesn’t mean organizations should suddenly ignore sensible policies, especially since failure to enforce data retention policies caused eDiscovery headaches for many corporations not too long ago. (Arthur Andersen would probably still exist if it had enforced its policies before the Enron debacle.)
Stop Shooting Yourself in the Foot
As organizations increasingly look to preserve data from SharePoint or cloud data sources like Google Docs, Salesforce, and other hosted applications, they are going to need help to collect, archive, and preserve that data. Unfortunately, collecting data from third-party sources or document management systems is not easily achieved without special planning. For example, Nextpoint’s SmartCrawl Collections will provide a custom SmartCrawl Evaluation of an organization’s content to capture these sources of data.
It’s easy for an organization to get excited about the ability of technology to streamline operations and make departments more efficient. New environments work because they break the rules and are not limited by the same factors as older generations of software. But if an organization adopts them without an eye towards making data accessible for litigation or regulatory control, it is just creating dangerous new problems.