Unethical idea of the day: ‘The Forbidden Web,’ a search engine that only indexes files disallowed by robots.txt files. For example, CNN’s robots.txt file asks search engines to avoid their transcripts, jobs, website statistics, and development directories. The Forbidden Web would index only those forbidden (and often intriguing) directories. Evil, isn’t it?
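For the curious, harvesting those forbidden directories starts with parsing each site's Disallow lines. Here is a minimal Python sketch of that step (the `parse_disallowed` function and the sample file are my own illustration; a real crawler would also need to handle per-bot user-agent groups and `Allow` rules):

```python
def parse_disallowed(robots_txt):
    """Return the Disallow paths that apply to all crawlers (User-agent: *)."""
    disallowed = []
    applies = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Simplification: only track rules for the wildcard agent.
            applies = (value == "*")
        elif field == "disallow" and applies and value:
            disallowed.append(value)
    return disallowed

# A made-up robots.txt in the spirit of the example above:
sample = """\
User-agent: *
Disallow: /transcripts/
Disallow: /jobs/
"""
print(parse_disallowed(sample))  # → ['/transcripts/', '/jobs/']
```

The Forbidden Web's index would then be exactly this list, fetched and crawled, for every site on the web.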
You can search Google for more robots.txt files.