Waxy.org
Waxy.org is the sandbox of Andy Baio, an independent journalist and programmer living in Portland, Oregon. I created Upcoming.org and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

The Forbidden Web

Posted May 9, 2002

Unethical idea of the day: 'The Forbidden Web,' a search engine that only indexes files disallowed by robots.txt files. For example, CNN's robots.txt file asks search engines to avoid their transcripts, jobs, website statistics, and development directories. The Forbidden Web would index only those forbidden (and often intriguing) directories. Evil, isn't it?
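For the curious, here is a rough sketch of the first step such a crawler would need: pulling the Disallow paths out of a robots.txt file. This is only an illustration, not a real implementation of The Forbidden Web. The function name `disallowed_paths` and the sample file are made up for this example (the sample is not CNN's actual robots.txt), and the parsing covers only simple `User-agent` and `Disallow` lines.

```python
def disallowed_paths(robots_txt, user_agent="*"):
    """Return the Disallow paths that apply to user_agent in a robots.txt body."""
    paths = []
    agents = []        # user-agents named by the current group
    in_rules = False   # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            # An empty Disallow means nothing is forbidden, so skip it.
            if value and user_agent.lower() in agents:
                paths.append(value)
        elif field == "allow":
            in_rules = True
    return paths


# Hypothetical robots.txt, loosely modeled on the directories described above.
SAMPLE = """\
User-agent: *
Disallow: /transcripts/
Disallow: /jobs/
Disallow: /stats/   # website statistics
Disallow: /dev/

User-agent: examplebot
Disallow: /
"""

print(disallowed_paths(SAMPLE))
```

A well-behaved crawler would treat that list as places to stay out of; The Forbidden Web would treat it as a to-do list.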

A glance at the robots.txt files on some popular sites: New York Times, Google, Hotwired, eBay, Slashdot, Verisuck, Kuro5hin, Filepile, ZDNet, Epinions, IMDB, BBC, IBM, USA Today, Jakob Nielsen.

You can search Google for more robots.txt files.

3 Comments

May 9, 2002
10:48 AM  
mat wrote:

wow. that IS a great idea.


May 9, 2002
3:36 PM  
jkottke wrote:

I wondered about the same thing back when I had to modify my robots.txt file to keep the Wayback Machine from downloading my entire site *6 times a day* (post-9/11). I bet there are quite a few bots out there that specifically look for items forbidden by the robots.txt file. To what end, I don't know.


Sep 9, 2003
12:40 AM  
Reform Blog.com wrote:

I actually noticed that Google was robots.txt-less for a while a few weeks ago


 

Waxy Links
January 6, 2009
The Perils of Zero-Gravity Videography — Matt Harding discovers hard drive-based camcorders don't work in zero-gravity (via)
Screenshot: 4chan hacks MacRumorsLive during Apple keynote — the 4chan thread shows how they found the admin interface, password hashes, and finally cracked a user's password
January 5, 2009
xkcd's Guide to Converting to Metric — even Liberia and Myanmar are mostly metric, compared to the U.S.
Crowdsourcing an Ethical Dilemma — Dolores Labs uses Mechanical Turk to answer the Trolley Problem
January 3, 2009
Stamen's Mike Migurski on extreme programming vs. interaction design — the linked interview is great
January 2, 2009
Jason Scott on the closure of AOL's online communities — like physical evictions, there need to be laws protecting community data in the event of closure
JPG Magazine to stop publishing, turn off website — with only three days' notice; here's the response from Derek and the JPG community
December 31, 2008
Wikipedia over DNS — loony hack serves summaries of Wikipedia articles; also available as JSON and JS
Leap year bug caused every 30GB Zune to crash at 2am this morning — as strange as the Android bug that ran every keystroke as root
Metafilter's exhaustive tour of the early origins of Adult Swim — the Cartoon Network breathed new life into old cartoons, while constantly trying to find the next big thing
December 30, 2008
Infochimps' massive scrape of Twitter's friend network — Twitter gave their blessing on sharing the 56-million records, which includes 10M tweets and 220k hashtags
The Lonely Island's We Like Sportz — the sequel to Just 2 Guyz
Niall Kennedy documents the undocumented Google Reader API — whoops, this was three years ago; here's an updated version
Sakurako Shimizu's Waveform Jewelry — the "I Do" wedding band and Atari chip ring are cute, too
Fimoculous' 30 Most Notable Blogs of 2008 — an incredibly well-researched list, with related recommendations for every entry
December 29, 2008
DJ Earworm's United State of Pop 2008 — mashing up the top 25 singles of the year into a single song and video
Twit 4 Dead, four Twitter bots fight zombies in real-time — watch their collected activity here
Facebook sentiment mining predicts presidential polls — like StateStats, Facebook Lexicon is tons of fun
Giganews reports Usenet upload growth since 2001 — note this doesn't reflect Usenet popularity, but most likely the rise of huge Blu-Ray and HD rips
December 28, 2008
List of Starbucks employee jargon — culled from the Starbucks Gossip blog
December 27, 2008
Rocketboom covers the history of the Lip Dub — the Know Your Meme series is consistently well-researched and fun to watch
Jennifer 8. Lee on the history of General Tso's Chicken — different cultures each localized their own versions of Chinese food around the world (via)
Top 20 freeware games released by Cactus, this year — is Jonatan the most prolific game developer alive?
December 26, 2008
Paul and Storm finish their 25 Days of Randy Newman — hosted on Bandcamp, and now with the solo piano track used in each song
AutoPager, infinite scrolling for Firefox — love the idea, but too clunky for everyday use (via)
December 24, 2008
Net Cafe archives, dot-com nostalgia TV show from 1996-2002 — Sergey Brin in 2000 at the newly-opened Metreon, Mondo 2000 and Boing Boing, awkward Webby broadcasts, and hundreds of dead dot-coms (via)
The Offworld's best indie and overlooked games of 2008 — also: Gamasutra's top 5 indie games
Left 4k Dead — lo-fi zombie shooter in 4k of Java (via)
NORAD's Santa Tracker on Twitter — they just passed through Kazakhstan; also tracking on Google Maps and in 3D on Google Earth
December 22, 2008
ScummVM adds support for 7th Guest — I didn't realize they expanded into non-Scumm engines last year, including the Sierra AGI games

Andy Baio lives here. Some rights reserved, for your pleasure.