Why You Must Archive Website Content

Jason Krause

Digital historians everywhere can breathe a sigh of relief. The first web page in the history of the Internet has been found. You can see the page in all its hypertext glory or see a small screen-capture below. Just look at all those font sizes!

The story of the first web page is that Tim Berners-Lee created a computer language to help researchers in his lab share information. In 1990, he put together the first page in HTML, and he later showed it publicly at a conference called Hypertext 91 in San Antonio. Until recently, however, no web historian (there is apparently such a thing) had managed to archive that first page. Despite efforts to archive website content, such as the Internet Archive, the page had not been seen in decades.

The Disappearing Internet – A Common Fate

It’s not unusual for web pages to disappear forever. According to Brewster Kahle, the founder of the Internet Archive in San Francisco, “The Web was not designed to be preserved. The average life of a Web page is about 100 days.”

One hundred days is not very long. It means that anything you put online is likely to be changed, edited, or deleted entirely within a very short time. Social media is even more ephemeral: your Facebook timeline or Twitter feed has probably changed completely in the time it takes you to read this sentence.

Now imagine that you, your company, or a client is involved in a lawsuit: say, an infringement, intellectual property, or misappropriation case. If you are the defendant, you will probably have to produce web content to defend yourself. If you are the plaintiff, you will need to capture the other party’s web content to prove your case.

However, the existing models for discovery do not provide all of the tools necessary to archive website content successfully and to preserve social media information that is continually changing and disappearing. The very things that make social media popular, namely the instant and shared nature of its communications, create new challenges for organizations concerned with preservation.

The only reasonable and defensible strategy for preserving, reviewing, and producing social media information is to capture it quickly and in such a way that you can identify changes, edits, or deleted content.
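
To make "identify changes, edits, or deleted content" concrete, here is a minimal sketch of one possible approach: fingerprint every captured item, then compare two captures taken at different times. The item layout (an "id" plus free-form fields), the function names, and the sample posts are all assumptions made for illustration. They do not describe any particular platform or product.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(item: dict) -> str:
    """Return a SHA-256 digest of the item's canonical JSON form."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def take_snapshot(items: list[dict]) -> dict:
    """Record the capture time and one fingerprint per item, keyed by item id."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "fingerprints": {item["id"]: fingerprint(item) for item in items},
    }


def compare_snapshots(old: dict, new: dict) -> dict:
    """List the ids that were added, deleted, or edited between two captures."""
    old_fp, new_fp = old["fingerprints"], new["fingerprints"]
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "deleted": sorted(old_fp.keys() - new_fp.keys()),
        "edited": sorted(
            key for key in old_fp.keys() & new_fp.keys()
            if old_fp[key] != new_fp[key]
        ),
    }


# Hypothetical captures of the same feed at two points in time.
first = take_snapshot([
    {"id": "p1", "text": "Grand opening Friday"},
    {"id": "p2", "text": "Our product is patent pending"},
])
second = take_snapshot([
    {"id": "p1", "text": "Grand opening Saturday"},
    {"id": "p3", "text": "Now hiring"},
])

changes = compare_snapshots(first, second)
print(changes)
# {'added': ['p3'], 'deleted': ['p2'], 'edited': ['p1']}
```

The more often captures are taken, the narrower the window in which an edit or deletion can go unrecorded, which is why speed of capture matters as much as completeness.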


With Websites, S.O.P. is S.O.L.

The standard operating procedure for many litigators is still to preserve a copy of a web page as a PDF or image file. We’ve talked about authenticating social media before. Unfortunately, many lawyers still present digital evidence in court as paper printouts. For example, in Griffin v. State, the court overturned a murder conviction at least in part because evidence obtained from a MySpace profile, presented as a printout, had not been properly authenticated.

Accurate preservation of social media is achieved through the application programming interface (API) that each social media platform provides. Legacy approaches to discovery, such as capturing screenshots or creating PDF images from browser-generated content, are insufficient for accurately capturing data from social media platforms. Any attempt to collect content from social media sources must use those APIs to pull the actual data and content from the source, and not just collect an image or screen capture.
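
The difference between keeping data and keeping a picture of it can be sketched in a few lines. The example below assumes the raw bytes of an API response are already in hand. The endpoint URL, the field names, and the sample payload are hypothetical, and no real platform API is called. It stores the untouched payload together with its source, capture time, and SHA-256 digest, so that a later reviewer can check that the stored copy has not changed since capture.

```python
import base64
import hashlib
from datetime import datetime, timezone


def make_capture_record(raw_response: bytes, source: str) -> dict:
    """Wrap an unmodified API response with its source, capture time, and digest."""
    return {
        "source": source,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_response).hexdigest(),
        # Base64 keeps the exact bytes intact inside a text-based record.
        "payload_b64": base64.b64encode(raw_response).decode("ascii"),
    }


def verify_capture_record(record: dict) -> bool:
    """Recompute the digest of the stored payload and compare it with the recorded one."""
    payload = base64.b64decode(record["payload_b64"])
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


# Hypothetical raw response body; a real one would come from the platform's API.
raw = b'{"id": "p2", "author": "example_user", "text": "Our product is patent pending"}'
record = make_capture_record(raw, "https://api.example.com/posts/p2")

# Simulate a stored copy whose payload was altered after capture.
altered = raw.replace(b"pending", b"granted")
tampered = dict(record, payload_b64=base64.b64encode(altered).decode("ascii"))

print(verify_capture_record(record))    # True
print(verify_capture_record(tampered))  # False
```

A matching digest only shows that the stored bytes are the ones originally captured. Whether that is enough to authenticate the evidence in a given matter is a legal question that this sketch does not answer.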

Capturing web and social media content means archiving a complete and forensically accurate copy of the data; without one, authenticating the data later becomes impossible. Nextpoint has been archiving social media for litigation since 2010. Talk to us about how to make social media discovery work in any matter.

For more information about Nextpoint’s social media and website archiving software for eDiscovery, contact us, or download our two free eBooks on Social Media Discovery: