Archive.is blog

Blog of http://archive.is/ project
  • Can your spider be stopped in a similar way to IA Archiver: "User-agent: ia_archiver Disallow: /" Thank you.
    Anonymous

    There is no spider (in the sense of a machine that decides what to archive).

    All the URLs are entered manually by users (or taken from https://en.wikipedia.org/wiki/Special:RecentChanges, where they also appear as a result of user edits).

    If the archive checked and obeyed robots.txt, then, whenever archiving is disallowed, an error would have to be shown to the user, right?

    Then, on seeing the error, the user would archive the page indirectly: first feeding the URL to a URL shortener (bit.ly, …), an anonymizer (hidemyass.com, …), or another on-demand archive (peeep.us, …), and then archiving the same content from the new URL, thus bypassing the robots.txt restrictions.

    So, this check would not work the same way it works with IA Archiver (which actually is a machine that makes decisions).
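    For what it's worth, the robots.txt check being discussed is mechanically simple; a sketch of it, using Python's standard-library parser and the exact rule quoted in the question (the agent name `SomeOtherBot` is just an illustrative stand-in):

    ```python
    from urllib import robotparser

    # The rule from the question, fed to the parser directly
    # (no network fetch needed when using parse()).
    rules = [
        "User-agent: ia_archiver",
        "Disallow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # ia_archiver is disallowed everywhere; an unmatched agent
    # falls through to the default, which is "allowed".
    print(rp.can_fetch("ia_archiver", "http://example.com/page"))   # False
    print(rp.can_fetch("SomeOtherBot", "http://example.com/page"))  # True
    ```

    The point above stands regardless: the check only constrains the URL as submitted, so re-submitting the same content from a shortened or proxied URL sidesteps it entirely.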

    • June 16, 2013 (9:29 pm)