Waxy.org
Waxy.org is the sandbox of Andy Baio, a journalist/programmer living in Portland, Oregon. I'm the CTO of Kickstarter, created Upcoming.org, and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

Star Wars Kid: The Data Dump

Posted May 21, 2008 (Updated Jun 14, 2008)

This Friday, I'll be speaking at the Webvisions conference in Portland about Internet memes, how they spread, and how their distribution's changed over time.

As part of that research, I've been digging into my original server logs from the Star Wars Kid debacle, five years after I played a major role in what some say is the biggest viral video of all-time.

Be warned, this is more detail than you'll ever want about the origins of the Star Wars Kid meme and how it spread. You don't care about this level of detail, but I'm writing this all down so that I never have to think about it again.

In addition, I've decided to release the first six months of server logs from the meme's spread into the public domain — with dates, times, IP addresses, user agents, and referer information. (Download it below.)

Early Origins

Like I mentioned in my original entry, the video was first released by Ghyslain's schoolmates to Kazaa on April 19, 2003 with the original filename "ghyslain_razaa.wmv." Within three days, it was being passed around in the offices of Raven Software in Madison, Wisconsin, where a game developer named Bryan Dube posted it on his personal website on April 22. Two days later, he created the first Star Wars Kid remix, adding lightsabers and sound effects in a new video titled "TheLastHope.avi."

On April 27, a mostly-NSFW online community called Sensible Erection linked to the video on Bryan's website. Later that evening, an SE user cross-posted it to a private file-sharing community I belong to with the new filename "star_wars_guy.wmv." It quickly became the most popular file on the site, which is where I found it the following day, April 28 at 7:52pm.

On April 29, I renamed it Star_Wars_Kid.wmv and posted it to my site at 4:49pm, inadvertently giving the meme its permanent name. (Yes, I coined the term "Star Wars Kid." It's strange to think it would've been "Star Wars Guy" if I'd been any lazier.) An hour later, Scott Gowell became the first person to link to the video.

From there, for the first week, it spread quickly through news sites, blogs and message boards, mostly oriented around technology, gaming, and movies. Throughout the life of the meme, most of the referers are blank, suggesting people were primarily sending the links by email or instant message.
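If you want to check the blank-referer share against the logs yourself, here's a minimal sketch. It assumes the file uses Apache's combined log format, where a missing referer is logged as "-". The function name, the request path, and the sample lines are all made up for illustration.

```python
import re

# Assumed layout: Apache "combined" log format. The released file may differ.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def blank_referer_share(lines):
    """Return the fraction of parseable requests that arrived without a referer."""
    total = 0
    blank = 0
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        total += 1
        if match.group("referer") in ("", "-"):
            blank += 1
    return blank / total if total else 0.0


def fake_line(referer):
    """Build an invented log line; nothing here comes from the real logs."""
    return (
        '10.0.0.1 - - [29/Apr/2003:17:50:12 -0700] '
        f'"GET /video.wmv HTTP/1.1" 200 1024 "{referer}" "Mozilla/4.0"'
    )


sample = [
    fake_line("-"),
    fake_line("-"),
    fake_line("http://example.com/post"),
    fake_line("-"),
]
print(blank_referer_share(sample))
```

On the real file, you'd pass an open file handle instead of the sample list, since the function only needs something it can loop over line by line.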

The chart below shows the distinct top-level domains that appeared in the referral logs grouped by day.

It's worth noting that the majority of sites sent fewer than 10 referrals in that first month, and 21% of domains referred only one user. (Note: The chart below is on a logarithmic scale for both axes.)
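Here's a rough sketch of how numbers like these could be computed from a list of referer URLs. It's an illustration under assumptions, not the script behind the chart: it groups by full hostname as a stand-in for "domain," and the function names and sample URLs are invented. The last function fits a straight line to referrals against rank on log-log axes, since a power law shows up there as a roughly straight line.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def referrals_per_domain(referers):
    """Count referrals per hostname, ignoring blank ("-" or empty) referers."""
    counts = Counter()
    for ref in referers:
        host = urlsplit(ref).hostname  # None for "-" or an empty string
        if host:
            counts[host] += 1
    return counts


def single_referral_share(counts):
    """Fraction of domains that sent exactly one referral."""
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


def loglog_slope(counts):
    """Least-squares slope of log(referrals) against log(rank)."""
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two domains to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(n) for n in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Invented referers, not taken from the real logs.
sample = (
    ["http://a.example/post"] * 4
    + ["http://b.example/thread"] * 2
    + ["http://c.example/", "http://d.example/", "-"]
)
counts = referrals_per_domain(sample)
print(counts.most_common())
print(single_referral_share(counts))
print(loglog_slope(counts))
```

A slope near -1 would be what Zipf's Law predicts. Grouping by hostname treats www.example.com and example.com as different sites, so a real analysis would probably want to normalize those first.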

Mainstream News Coverage

Here are some of the highlights from the mainstream media coverage. The New York Times was the first major paper to report on it, almost a week after I tracked Ghyslain down, Jish and I interviewed him for the first time, and we started the fundraiser.

May 19, New York Times
May 19, Wired News
May 20, Public Radio International's "The World" (radio program)
May 20, Globe and Mail
May 20, National Post
May 23, The Mirror UK

Jun 6, LA Times

Jul 4, The Independent UK
Jul 12, The Age
Jul 23, Wired News
Jul 25, BBC News
Jul 31, NPR w/Tavis Smiley (radio interview)

Aug 21, USA Today, syndicated Associated Press article
Aug 25, NBC's Today Show (TV program)
Aug 26, MSNBC's Countdown (TV program)
Aug 28, USA Today
Aug 30, Seattle P-I

Sep 8, SF Chronicle
Sep 15, Variety
Sep 16, Globe and Mail

Nov 18, CBS Evening News

Statistics

Here's what the Star Wars Kid meme did to my overall traffic. At its peak, I received almost a million pageviews in a single day.

That includes all pageviews on my weblog entries. Isolating only the video downloads, whether served from my site or later redirected to one of the mirrors, gives the following chart.

Download the Data

This file is a subset of the Apache server logs from April 10 to November 26, 2003. It contains every request for my homepage, the original video, the remix video, the mirror redirector script, the donations spreadsheet, and the seven blog entries I made related to Star Wars Kid. I included a couple weeks of activity before I posted the videos so you can determine the baseline traffic I normally received to my homepage.

The file is 158 megabytes (1.6GB uncompressed), so I'm distributing it with BitTorrent. The data is public domain. If you use it for anything, please drop me a note!

Download: star_wars_kid_logs.zip.torrent
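Once you have the file, one way to get at the baseline traffic is to count requests per day and average the days before April 29, when the video went up. This is only a sketch: it assumes Apache-style timestamps like [29/Apr/2003:16:49:00 -0700], it uses the date as logged without converting time zones, and the function names and sample lines are invented.

```python
import re
from collections import Counter
from datetime import date, datetime

# Assumed timestamp shape: [DD/Mon/YYYY:HH:MM:SS +ZZZZ], as Apache writes it.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def requests_per_day(lines):
    """Count requests per calendar day, using the date as written in the log."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        # %b expects English month abbreviations under the default locale.
        day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
        counts[day] += 1
    return counts


def baseline_per_day(counts, before):
    """Average daily requests over the logged days earlier than `before`.

    Days with no requests at all never appear in the counts, so they
    are not part of the average.
    """
    earlier = [n for day, n in counts.items() if day < before]
    return sum(earlier) / len(earlier) if earlier else 0.0


def fake_line(day):
    """Build an invented log line for the given day."""
    return (
        f'10.0.0.1 - - [{day}:12:00:00 -0700] '
        '"GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0"'
    )


sample = (
    [fake_line("10/Apr/2003")] * 2
    + [fake_line("11/Apr/2003")] * 4
    + [fake_line("29/Apr/2003")] * 5
)
daily = requests_per_day(sample)
print(baseline_per_day(daily, before=date(2003, 4, 29)))
```

Because the uncompressed file is 1.6GB, it's worth streaming it line by line from an open file handle rather than reading it all into memory. Both functions above work that way.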

16 Comments

May 21, 2008
11:48 PM  
Greg Knauss wrote:

Yes, but what about the _details_?


May 22, 2008
5:41 AM  
n0wak wrote:

"private file-sharing community"

What is this secret cabal that you speak of?


May 22, 2008
6:47 AM  
M. wrote:

It's incredible how quickly this spread...

I feel really bad for the guy though. The guy actually goes to my school, and seems to have been really traumatized by the whole thing - all the ipods and court-settlements in the world aren't worth that kind of notoriety.


May 22, 2008
7:22 AM  
John wrote:

Maybe it's time to leave the poor guy alone.


May 22, 2008
9:19 AM  
Andy Baio wrote:

I totally agree. When it became clear he wasn't interested in the attention, I took the videos down and refused all press interviews. I encourage others to do the same.

This entry isn't about Ghyslain or the content of the video, though. It's about how information spreads on the Internet, and the diffusion of one of the most popular viral videos of all-time.


May 22, 2008
10:52 AM  
mu wrote:

Remember Arrested Development! Are we sure the kid's not George Michael?


May 22, 2008
1:34 PM  
Dylan wrote:

It's very interesting to note how that spread... thanks for consolidating this data.


May 22, 2008
2:20 PM  
sixfoot6 wrote:

Great timeline and dataporn. Funny, I remember eating curry with you and the gang, discussing both the memetic and ethical aspects of the Star Wars Kid craze... I can't believe that was 5 years ago.


May 23, 2008
10:48 AM  
Jish wrote:

To this day, I still get about 1-2 "media requests" for information on Ghyslain ... usually asking for his contact info or if I know of his current whereabouts. Much like Andy, I deny them all.


May 27, 2008
2:51 PM  
Brian wrote:

great stuff. thanks!


Jun 13, 2008
4:14 AM  
Peter Ferne wrote:

Interesting graphs. Any chance of a log-log plot of the Number of Referrals per Domain? As it is it's hard to resolve anything useful.


Jun 14, 2008
12:34 AM  
Andy Baio wrote:

Good call, I changed the chart.


Jun 14, 2008
12:07 PM  
Peter Ferne wrote:

That's a lot better, thanks, but if the x-axis were on a log scale too it would make it even easier to distinguish the top half a dozen referers and, assuming it's a power law, the curve would be more or less a straight line. Zipf's Law would suggest a gradient of approximately -1 but I'd be curious to see how close it came.


Jun 14, 2008
4:09 PM  
Andy Baio wrote:

Done. Thanks for the advice.


Jun 22, 2008
8:51 AM  
Ian Woollard wrote:

You should remove the poor guy's real-life name from this webpage. It adds little, and if anything perpetuates the abuse the guy got.


Jun 22, 2008
11:07 AM  
Andy Baio wrote:

Funny you should say that, since I've never mentioned his last name on my site, until today. I've referred to him in seven entries, always by his first name only. I even renamed the file when I first posted it in 2003 so his full name wouldn't be revealed.

80 mainstream articles have used his full name, including the New York Times, USA Today, Globe and Mail, Seattle Post-Intelligencer, Associated Press, NPR, International Herald Tribune, Newsweek, Forbes, BBC News, Variety, and the San Francisco Chronicle. Plus, his name was used in the lawsuit, making it a matter of public record.

Five years later, I don't think stating the original filename of the video (which misspells his last name) makes much difference, and it's relevant to anyone researching the early history of the meme's distribution.


 


Andy Baio lives here. Some rights reserved, for your pleasure.