Wikipedia History Contest Winners

Two weeks ago, I summoned the Lazyweb for a way to automatically generate a slideshow of Wikipedia revision history. I wanted it so badly, I offered $50. Other people felt the same and kicked in an additional $200 (among other nice prizes)!

Four outstanding entries were submitted: Dan Phiffer’s Wikipedia Animate, Corey’s WikiDiff, John Resig’s AniWiki and Colin Hill’s BetterHistory.

The winner? Dan Phiffer’s Wikipedia Animate. (If you haven’t used it, watch Jon Udell’s brief screencast to see it in action.)

Although John Resig’s AniWiki entry had several innovations, Dan wins because of the elegant Wikipedia integration and the ease of use. Dan’s entry was the first to use a slider for navigation, allowing you to scrub across revisions with changes reflected in real time. I also like the ability to switch between an arbitrary range, selected using the existing Wikipedia buttons, and the entire revision history. It looks like a seamless part of Wikipedia. He’ll receive $200, one Flickr Pro account, a $20 Threadless gift certificate, and the Socialtext Starter package.

Second place goes to John Resig’s innovative AniWiki. Although I didn’t like the slideshow navigation as much, I was blown away by his graphical chart of activity over time and the visual diffs written entirely in JavaScript. (Dan Phiffer later incorporated John’s JavaScript diff algorithm into his own code.) For his excellent work, John will receive $50 and a Flickr Pro account.

These scripts raise an interesting question about the ethics and etiquette of user scripts, since they all generate multiple page requests to Wikipedia. There was some debate about this on the Greasemonkey discussion list.

I think Dan’s entry was an excellent compromise, as the only one that doesn’t load any extra pages without explicit user action (i.e., clicking a button). Not to pick on Corey’s otherwise excellent entry, but its Greasemonkey script loaded at least 30 revisions in the background every time you viewed a Wikipedia entry, whether you wanted the history or not. Whatever the solution, anyone animating the history of a wiki entry with hundreds (or thousands) of revisions could seriously impact the server’s performance. What’s great for users isn’t always great for the website creator.
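
To make the "explicit user action" idea concrete, here is a minimal Python sketch of a loader that fetches a revision only when asked, caches what it has already fetched, and spaces out its requests. It is not how any of the contest entries worked; the class name `PoliteRevisionLoader`, the injected `fetch` function, and the one-second default interval are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional


class PoliteRevisionLoader:
    """Fetch revisions on demand, never more often than `min_interval` seconds.

    Hypothetical helper: `fetch` stands in for whatever actually requests
    a revision's text from the wiki.
    """

    def __init__(
        self,
        fetch: Callable[[int], str],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[int, str] = {}
        self._last_request: Optional[float] = None

    def load(self, rev_id: int) -> str:
        # A revision that was already fetched costs the server nothing.
        if rev_id in self._cache:
            return self._cache[rev_id]
        # Wait out the rest of the interval since the previous request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        text = self._fetch(rev_id)
        self._cache[rev_id] = text
        return text
```

Nothing is requested until `load()` is called, so a script built this way would sit idle on ordinary page views and only touch the server when the reader asks for the history.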

Anyway, thanks to everyone for participating. Go, Lazyweb!

Tom Cruise Kills Oprah

On Tuesday, I posted a link to my local copy of the Tom Cruise Kills Oprah QuickTime video that I found on an unnamed file-sharing site. Since then, the clip’s exploded in popularity. It’s been linked from MSNBC, USA Today, b3ta, and every LiveJournal and message board on the planet. Right now, I’m serving about 200 gigabytes of the 4.1MB video every day. It was downloaded over 52,000 times yesterday and 60,000 times the day before. (Watch my bandwidth implode in real time.)
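
As a rough check on those numbers, the daily transfer is just downloads multiplied by file size. The short Python sketch below uses the figures quoted above; treating 1 GB as 1,000 MB is an assumption, and the counts are "over" figures, so the results are approximate.

```python
# Figures quoted in the post. Decimal units (1 GB = 1000 MB) are an assumption.
VIDEO_SIZE_MB = 4.1
DOWNLOADS = {"yesterday": 52_000, "the day before": 60_000}


def daily_transfer_gb(downloads: int, size_mb: float = VIDEO_SIZE_MB) -> float:
    """Return the data served for the given number of downloads, in gigabytes."""
    return downloads * size_mb / 1000


for day, count in DOWNLOADS.items():
    print(f"{day}: roughly {daily_transfer_gb(count):.0f} GB")
```

Both days land a little above the round "about 200 gigabytes" figure, which is what you would expect from a quick estimate.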

Does anyone know the origins of the video? (Update: Found! Read the update below for details.) I know that it was later used to create the YTMND page, but nobody seems to know the source of the original video.

Between his crazy Scientology ranting, his war on psychiatry and anti-depressants, creeping out Hollywood starlets, frenetic couch jumping, and the conspiracy theories surrounding the brainwashing of Katie Holmes, it seems like Tom Cruise is the new Michael Jackson. Sounds good to me.

Update: Leonard’s hosting the video for a while until traffic dies down a bit. Thanks, Leonard!

June 27, 2005: An anonymous commenter admitted to creating the movie using audio from a downloaded version of Star Wars Episode III. He adds, “The video was created using Adobe Premiere for the shot and sound editing and Adobe After Effects 6.5 for the lightning using the ‘advanced lightning’ feature.”

I can confirm that my source originally found it on Shacknews, so this makes sense.

Yellow Antelope Comment Spam

I consider myself fairly knowledgeable about comment spam, but this one leaves me completely baffled… Two comments were posted, one right after the other, to two different entries, with two different e-mail addresses but identical text. Here it is:

IP Address: 85.65.41.131

Name: yellow antelope

Email Address: [email protected]

Comments:

Think of every yellow antelope you know – they do not match! Enchanted experience of betting and gambling with yellow antelope http://spaces.msn.com/members/rear-animels/ yellow antelope is what I was looking for.

The MSN Spaces blog linked in the comment has only two entries, and they’re complete nonsense. The text files they link to on 50webs.com make even less sense, since they have no hidden links and no apparent purpose.

Theoretically, they could drive up the PageRank of these seemingly benign pages, and then replace them en masse with advertising pages… But why inflate the search engine ranking of the pages for terms like “purple clown” and “yellow antelope”?

June 22, 2005: More bizarre animal spam today, apparently from the same people as the antelope spam. This one uses Blogspot instead of MSN Spaces:

IP Address: 85.64.46.113

Name: protected animals

Email Address: [email protected]

Comments:

The best protected animals in the world. protected animals tournaments are now available.

Automating Wikipedia History

This recent Jon Udell entry about Wikipedia wars mentioned a great idea, but I don’t have the time to code it.

I’d love to see a tool for animating Wikipedia history for a given entry or block of text (see Udell’s screencast for an example). Bonus points for highlighting what changed in each version, and extra special bonus points for a way to scrub backwards and forwards through time. I don’t care if it’s a Greasemonkey extension, Flash or Ajax, as long as it does the job.
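
For the "highlighting what changed" part, here is a minimal Python sketch of a word-level diff between consecutive revisions, using the standard library's `difflib.SequenceMatcher`. The function name `highlight_changes`, the `[-removed-]` and `{+added+}` markers, and the sample revisions are all made up for illustration; a real tool would pull the revision text from Wikipedia.

```python
import difflib


def highlight_changes(old: str, new: str) -> str:
    """Mark words dropped from `old` as [-word-] and words added in `new` as {+word+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        # A "replace" is shown as a removal followed by an addition.
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


# Invented sample revisions, oldest first.
revisions = [
    "The cat sat on the mat",
    "The cat sat on the red mat",
    "The dog sat on the red mat",
]

# Step through the history one pair of revisions at a time.
for older, newer in zip(revisions, revisions[1:]):
    print(highlight_changes(older, newer))
```

Stepping through the pairs in order gives the frames of a slideshow; scrubbing backwards is the same loop run over the reversed list.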

Lazyweb, hear my plea! $250 (raised from the original $50) and a free Flickr Pro account to the best implementation, ruthlessly decided by me in about a week. If anyone else wants to kick in money for the bounty, feel free to post a comment. (If your application meets Jason Scott’s criteria in the comments below, you’ll earn an additional $50.)

Update: Two amazing entries have been submitted so far, both using the Greasemonkey extension for Firefox: Dan Phiffer’s Wikipedia Animate and Corey’s WikiDiff. Others are still in development, and a winner will be announced on Tuesday.

June 21, 2005: Two more entries! John Resig’s AniWiki and Colin Hill’s BetterHistory. Also, note that the first two submissions have had big changes… Give them all a try, and stay tuned for the winner later today.

June 27, 2005: The winners!

E3 Underdogs 2005

E3 isn’t the best place to find innovative new games, largely because it’s a massive marketing event driven by “sure-thing” multimillion-dollar blockbusters and movie/TV franchises. But every year, there are a few underdogs that somehow make it onto the show floor, along with a couple of inspiring big-name titles.

Last year, Namco gave an obscure Japanese title called Katamari Damacy a tiny space in the back of their booth. I fell in love, and dubbed it the Best of Show in my 2004 Oddball Roundup. Katamari ended up being the biggest cult hit of the year.

With that in mind, a little late, here’s a roundup of my favorite underdogs from this year’s E3. (G4TV has their own video underdog roundup. For a more mainstream roundup, try 1UP’s detailed list of the best games at E3.)

Star Wars EP3 Workprint Leaked Online

Though the screener copies of Star Wars Episode III originally rumored to be spreading via BitTorrent turned out to be fakes, a new workprint of the film was leaked online yesterday. Watermarked with timecodes on every frame, the workprint had been released as a DVD-R by a pirate group named VISA by Wednesday afternoon.

To give an idea of the quality of this workprint, the video below is a 19-second sample excerpt, posted along with the complete film to Usenet newsgroups. (Thanks for mirroring, Leonard.)

Download: starwars_ep3_leaked_workprint_sample.avi (7.3 MB, XviD)

Mirror #1: starwars_ep3_leaked_workprint_sample.avi

(Note: I don’t have the movie and have no plans to download it, so please don’t ask me how to download it. Also, please don’t post torrent links in the comments.)

Update: This entry is linked in a new Reuters article and many media outlets that use the Reuters feeds, like CNN Money, News.com, and ZDNet. The story is also on the front page of Yahoo right now. In the blog world, it was linked by Boing Boing, Fark (indirectly), and Kottke, among others.

The results? My bandwidth is going through the roof. Downloads of the sample video are pushing 82 Mb/s. That’s pretty insane, and it’s a testament to EV1 Servers that the site is still online and speedy.

Syndic8's Search Engine Spam

Hot on the heels of the recent WordPress fiasco, it looks like Syndic8.com is doing something very similar. Charles Coxhead pointed out that the popular RSS feed index is hosting hundreds of thousands of junk articles, designed to lure search engine traffic to context-sensitive text ads.

Unlike WordPress, the links aren’t hidden; they’re in plain sight at the bottom of nearly every page, added in late November 2004 according to Archive.org. Also, it’s unclear how much the hosted subdomains are benefiting from Syndic8.com’s PageRank of 7.

Here’s a list of the sites I found, along with the number of articles indexed by Google and each site’s Google PageRank. At current count, over 194,000 articles are indexed.

credit.syndic8.com: 27,700 (PR7)

debt.syndic8.com: 8,780 (PR3)

glasses.syndic8.com: 6,310 (PR6)

insurance.syndic8.com: 38,400 (PR7)

jewelry.syndic8.com: 4,010 (PR3)

loans.syndic8.com: 37,400 (PR3)

marketing.syndic8.com: 14,500 (PR6)

mortgage.syndic8.com: 10,500 (PR6)

personals.syndic8.com: 21,700 (PR4)

training.syndic8.com: 25,500 (PR6)
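
The total is easy to verify by adding up the per-subdomain counts. A small Python sketch, with the numbers copied from the list above (the dictionary layout is just one convenient way to hold them):

```python
# Articles indexed by Google per subdomain, copied from the list above.
indexed = {
    "credit": 27_700,
    "debt": 8_780,
    "glasses": 6_310,
    "insurance": 38_400,
    "jewelry": 4_010,
    "loans": 37_400,
    "marketing": 14_500,
    "mortgage": 10_500,
    "personals": 21_700,
    "training": 25_500,
}

total = sum(indexed.values())
print(f"{total:,} articles across {len(indexed)} subdomains")

# The three largest subdomains by indexed article count.
for name, count in sorted(indexed.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}.syndic8.com: {count:,} ({count / total:.0%} of the total)")
```

The sum comes to 194,800, which matches the "over 194,000" figure.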

Do people really think this is a legitimate form of advertising revenue? Jeff Barr and Bill Kearney, the two Syndic8 leads, are both smart guys and they seem to support the practice. But why? Gaming search engines makes the web less useful for everyone. As Leonard put it, “It’s a simple question, right? Is what you’re doing making the world a better place or not?”

Maybe I’m in the minority here. I’d love to hear what everyone thinks.

Update: Interesting discussion in the comments, with feedback from Jeff Barr and GoogleGuy. Jeff comments, “I fully realize that there are lots of ways to fund a ‘public resource’ site like this, and I simply chose one that worked and was available to me.”

Google representative GoogleGuy adds, “This is absolutely webspam… syndic8.com’s choice to ‘rent out’ subdomains to spammers and link to the spam from their home page will directly impact their reputation in search engines.”

May 6, 2005: Philipp Lenssen reports that Syndic8.com was removed from Google’s index entirely. By e-mail, a Google engineer also confirmed that the Google AdSense account for Syndic8’s ad affiliate was terminated.

I agree with Aaron Wall in the comments, who states that it’s Google’s responsibility to make sure people aren’t cheating AdSense. If their quality control for the AdSense program were more rigorous, this wouldn’t be an issue. They’re passively supporting this practice by allowing people to profit from it.

May 12, 2005: Jeff Barr posted a public response on his blog. My response is in his comments.

May 25, 2005: TDavid posted an hour-long audio interview with Jeff Barr, which covers the history of Syndic8.com and a detailed discussion of the advertising issue. This is a great listen with many insights into Jeff’s frame of mind.

He mentions that Bill Kearney warned him about the subdomain advertising deal: “Bill was actually a fairly cautious person. He said, ‘You know what, we’ve got to be a little bit careful here, Jeff.’ And me being a bit naive or maybe a bit too enthralled by getting checks, I said, ‘You know, I think this’ll be okay.’ So I went ahead and accepted that advertiser.” Another new bit of info was that the subdomains were actually provided by multiple advertisers.

As I said originally, I still maintain that Jeff is a smart guy who made a poor business decision, and I think that comes across in the interview. For him, it sounds like a cautionary tale he wants other people to learn from. “Basically, spreading out this story and telling people what happened and these are the things where you need to be careful… I think there’s always room for people to learn.”

WordPress Followup

I talked to Matt Mullenweg last night by AIM, from a wi-fi connection in Italy. He was just starting to catch up on the story, swamped under a ton of e-mail and interview requests. He wrote something brief on his weblog, and is in the process of composing an official response for the WordPress homepage. Update: Here it is. Please read it! It addresses many questions.

Update: Matt just posted a comment in this thread, addressing some of the conspiracy theories and saying that his response is coming very soon:

Andy did let me know he was working on the article, I was open and answered all his questions however I’d prefer if our AIM conversation wasn’t quoted more just because I don’t have access to it myself and I’m not terribly articulate on IM. It didn’t occur to me that the article would be finished while I was gone and there would be so much feedback. I’m back online and going to be posting a response shortly.

Otherwise, things are settling down after a very busy day yesterday. eWeek, MSNBC, The Register, Slashdot, Ars Technica, and Metafilter all posted articles about it. After the offending articles were deleted, Google added WordPress.org back into its search results and reinstated the 8/10 PageRank.

Chad Jones, the creator of Hot Nacho, contacted me and asked me to post this statement. Several parts of his story were contradicted by Matt himself, and I don’t believe it myself, but I’m happy to reproduce it in full below.

WordPress Website's Search Engine Spam

Disclaimer. I’m hesitant to even write about this, knowing the web’s fondness for angry mob justice, but I feel like it’s an important issue that needs to be addressed. My one request: please be calm and rational. WordPress is a great project, and Matt is a good guy. Think before piling on the hatemail and flames. (Important Update: Followup to this entry, with an official response from WordPress and Hot Nacho.)

The Problem. WordPress is a very popular open-source blogging software package, with a great official website maintained by Matt Mullenweg, its founding developer. I discovered last week that since early February, he’s been quietly hosting at least 168,000 articles (up from my original count of 120,000) on the site. These articles are designed specifically to game the Google AdWords program, written by a third party about high-cost advertising keywords like asbestos, mesothelioma, insurance, debt consolidation, diabetes, and mortgages. (Update: Google is actively removing every article from their results, but here’s a saved copy of the first page of results. You can still view about 25,000 results on Yahoo. Here’s an example of some results in MSN.)
