Unethical idea of the day: 'The Forbidden Web,' a search engine that only indexes files disallowed by robots.txt files. For example, CNN's robots.txt file asks search engines to avoid their transcripts, jobs, website statistics, and development directories. The Forbidden Web would index only those forbidden (and often intriguing) directories. Evil, isn't it?
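The first step for a Forbidden Web crawler would be to read a site's robots.txt and pull out the Disallow paths. Below is a rough Python sketch of that step. It works on the file's text, so it does no network fetching. The function names `disallowed_paths` and `forbidden_urls` and the sample file are hypothetical and do not come from any real site. The parsing is deliberately simplified. User-agent names must match exactly. There is no fallback to the `*` group, and no wildcard patterns. Python's `urllib.robotparser` only answers `can_fetch()` questions and gives no public way to list the rules, which is why the sketch parses the file by hand.

```python
from __future__ import annotations

from urllib.parse import urljoin


def disallowed_paths(robots_txt: str, agent: str = "*") -> list[str]:
    """Return Disallow paths from groups whose User-agent exactly matches `agent`.

    Simplified sketch: case-insensitive exact agent match, no fallback to the
    "*" group, no wildcard handling.
    """
    paths: list[str] = []
    group_agents: list[str] = []
    in_rules = False  # True once the current group has had a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents = []
                in_rules = False
            group_agents.append(value.lower())
        elif field in ("disallow", "allow"):
            in_rules = True
            # An empty "Disallow:" means "allow everything", so skip it.
            if field == "disallow" and value and agent.lower() in group_agents:
                paths.append(value)
    return paths


def forbidden_urls(base_url: str, robots_txt: str, agent: str = "*") -> list[str]:
    """Turn the disallowed paths into absolute URLs on the same site."""
    return [urljoin(base_url, path) for path in disallowed_paths(robots_txt, agent)]


# Hypothetical robots.txt, not taken from any real site.
sample = """User-agent: *
Disallow: /stats/
Disallow: /dev/   # development area

User-agent: BadBot
Disallow: /
"""
print(forbidden_urls("http://www.example.com/", sample))
```

A real version would then crawl those URLs instead of skipping them. That inversion is the whole joke.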
A glance at the robots.txt files on some popular sites: New York Times, Google, Hotwired, eBay, Slashdot, Verisuck, Kuro5hin, Filepile, ZDNet, Epinions, IMDB, BBC, IBM, USA Today, Jakob Nielsen.
You can search Google for more robots.txt files.

10:48 AM
wow. that IS a great idea.
3:36 PM
I wondered about the same thing back when I had to modify my robots.txt file to keep the Wayback Machine from downloading my entire site *6 times a day* (post-9/11). I bet there are quite a few bots out there that specifically look for items forbidden by the robots.txt file. To what end, I don't know.
12:40 AM
I actually noticed that Google was robots.txt-less for a while a few weeks ago.