The Wayback Machine is cool, not just because you can go back and see what Ars looked like in May 1999, but because its archive of more than a petabyte contains a treasure trove of data for researchers. Last year, the Internet Archive, which runs the Wayback Machine, was sued by Healthcare Advocates after attorneys for another company used the Wayback Machine to dig up information that might help them in an ongoing legal action.
Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. In some ways, that's disappointing news for onlookers who were hoping to see how a court would sift through the complex issues facing Internet archives, caching systems, and more. More on that later.
Here's the backstory. Healthcare Advocates found itself embroiled in a trademark dispute with Philadelphia-based Health Advocate. The latter company was represented by Harding Earley Follmer & Frailey, which used the Wayback Machine to access Healthcare Advocates' web pages dating back to 1999 in an attempt to find information that would bolster its client's case. Healthcare Advocates then sued both Harding Earley and the Internet Archive, alleging, among other things, violations of the DMCA.
Operated by the Internet Archive, the Wayback Machine dates back to 1996 and archives web sites using Alexa's crawler. Like many other crawlers, Alexa's respects the Robots Exclusion Standard (RES), a voluntary protocol that lets site owners ask crawlers to stay out of all or part of a website. In this case, the lawsuit between the two Advocate companies was filed on June 26, 2003. On July 8, 2003, Healthcare Advocates added a robots.txt file to its site to invoke the RES so that crawlers would stop spidering it.
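Because the RES is voluntary, it only works if the crawler checks robots.txt before fetching a page. Below is a minimal sketch of that check using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `ia_archiver` user-agent string (assumed here for Alexa's crawler), and the example.com URLs are illustrative assumptions; the article does not reproduce Healthcare Advocates' actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one named crawler to stay out of the whole site.
# "ia_archiver" is an assumed user-agent for Alexa's crawler, used for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user-agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/about.html"
    # The named crawler is told to keep out; any other agent has no matching rule.
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Nothing in the protocol enforces the result: a crawler that skips this check can still fetch the page, which is why the standard is described as voluntary.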