Pirating the Oscars 2012: Ten Years of Data

Every year, the MPAA tries desperately to stop Oscar screeners — the review copies sent to Academy voters — from leaking online. And every year, teenage boys battling for street cred always seem to defeat whatever obstacles Hollywood throws at them.

For the last 10 years, I’ve tracked the online distribution of Oscar-nominated films, going back to 2003. Using a number of sources (see below for methodology), I’ve compiled a massive spreadsheet, now updated to include 310 films.

This year, for the first time, I’m calling it: after three years of declines, the MPAA seems to be winning the battle to stop screener leaks. But why?

A record 37 films were nominated this year, and the studios sent out screeners for all but four of them. But, so far, only eight of those 33 screeners have leaked online, a record low that continues the downward trend from last year.

(Disclaimer: Any of this could change before the Oscar ceremony, and I’ll keep the data updated until then.)

They may be winning the battle, but they’ve lost the war.

While screeners declined in popularity, 34 of the nominated films (92 percent) were leaked online by nomination day, with 25 of them available as high-quality DVD or Blu-ray rips. Only three films — Extremely Loud & Incredibly Close, My Week with Marilyn and W.E. — haven’t leaked online in any form (yet!).

If the goal of blocking leaks is to keep the films off the internet, then the MPAA still has a long way to go.

There are a number of theories about what’s causing the decline.

It could be attributed to tighter controls — personalized watermarks, the aggressive prosecution of leakers, and greater awareness of the risks for Academy voters.

But the MPAA may have little to do with the decline. Oscar-nominated films could be coming out earlier in the year, making screeners less important.

Or maybe the interests of mainstream downloaders and industry favorites are diverging? If the Oscar nominees are mostly arthouse fare and critical darlings with low gross receipts, they’ll be less desirable to leak online. It would be very interesting to track the historical box office performance of nominees to see how it affects downloading. (Maybe next year!)

The continuously shrinking window between theatrical and retail releases may be to blame. After all, once the retail Blu-ray or DVD is released, there’s no reason for pirate groups to release a lower-quality watermarked screener.

The chart below tracks the window between a film’s U.S. release and its first DVD/Blu-ray leak online, which shows how the window between theatrical and retail release dates has been slowly closing since 2003.

Whatever the reason, online movie releasing groups are taking longer to pirate movies than ever. When I first started tracking releases in the early- to mid-2000s, the median time between a film’s theatrical release and its first leak online was 1 to 2 days. Now, that number’s crept up to over three weeks.

The rise in leak time correlates with a dip in popularity for lower-quality sources, like camcorder-sourced footage. This year, only eight of the 37 nominees (21 percent) were sourced from camcorder footage. (This is likely because there are fewer blockbuster nominees than in the mid-2000s.)

As the industry slowly transitions from physical media to streaming video, it’ll be interesting to see if the downward trend continues, or if the ease of capturing streaming video spawns a new renaissance for screeners. Last year, Fox Searchlight distributed screeners with iTunes, and all were quickly and easily pirated.

The Data Dump

Skeptical of my results? Want to dig into it yourself? Good! Here’s the complete dataset, available on Google Spreadsheets or downloadable as an Excel spreadsheet or comma-separated text file.

Methodology

I include the full-length feature films in every category except documentary and foreign films (even music, makeup, and costume design).

I use Yahoo! Movies for the release dates, always taking the first available U.S. date, even if it was a limited release. Where that fails, I fall back to the first available U.S. date on IMDb.

All the cam, telesync, and screener leak dates are taken from VCD Quality, supplemented by dates in ORLYDB. I always use the first leak date, excluding unviewable or incomplete nuked releases.
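For anyone who wants to reproduce the numbers, here’s a minimal Python sketch of the leak-date rule described above. The record layout (`date`, `nuked`) and the sample dates are hypothetical, not pulled from VCD Quality or ORLYDB; it only shows picking the earliest non-nuked release and counting the days from the U.S. release.

```python
from datetime import date
from statistics import median

def first_leak(releases):
    """Return the earliest leak date, skipping nuked releases (None if none qualify)."""
    valid = [r["date"] for r in releases if not r["nuked"]]
    return min(valid) if valid else None

def days_to_leak(us_release, releases):
    """Days between the U.S. release and the first usable leak."""
    leak = first_leak(releases)
    return None if leak is None else (leak - us_release).days

# Hypothetical sample data for two films.
film_a = days_to_leak(date(2011, 11, 23), [
    {"date": date(2011, 11, 25), "nuked": True},   # ignored: nuked release
    {"date": date(2011, 12, 14), "nuked": False},
])
film_b = days_to_leak(date(2011, 12, 9), [
    {"date": date(2011, 12, 30), "nuked": False},
])

# Median over the films that have leaked at all.
median_days = median(d for d in (film_a, film_b) if d is not None)
```

A film whose screener leaks before its theatrical release would produce a negative number here, which seems like the right behavior for a median.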

The official screener release dates are from Academy member Ken Rudolph, who kindly lists the dates he receives each screener on his personal homepage. Thanks again, Ken!

For previous years, see 2004, 2005, 2007, 2008 (part 1 and part 2), 2009, 2010, and 2011.

Why SOPA and PIPA Must Die

Today, you’re going to hear a million solid reasons why SOPA and PIPA — the two proposed bills sponsored by the entertainment industry to censor the web — have to die. Wikipedia, Google, Reddit, craigslist, Metafilter, and many, many more have made their cases. Here’s mine.

Virtually every project I’ve ever worked on is threatened by this legislation:

Upcoming.org faced copyright complaints for event posters and listings that users added to the site.

Kickstarter gets DMCA takedowns from artists who find their work used in pitch videos, and from project founders quarreling with each other.

Supercut.org indexes hundreds of video remixes that reuse copyrighted content.

Kind of Bloop faced a lawsuit over the cover art.

And here on Waxy.org, I’ve had a number of battles over copyright. Among them, I received a cease-and-desist from EMI for being the first person to host DJ Danger Mouse’s Grey Album on the web, from Disney for hosting the Kleptones’ Night at the Hip-Hopera, and from Bill Cosby for hosting House of Cosbys, which was clearly fair use as a parody.

Every cease-and-desist and DMCA request I’ve received wasn’t fun to get in my inbox, but it allowed me to deal with the issues directly with the copyright holder or using the due process of the court system.

Imagine, instead, a world where a bill like SOPA or PIPA passes. A copyright holder could bypass due process entirely, demanding that search engines stop linking to my sites, that ad providers drop me, and that DNS providers refuse to resolve my domain name. All in the name of stopping piracy.

The chilling effect would be huge.

Every online community that allows for community-contributed content — discussion forums, imageboards, Usenet newsgroups, photo sharing communities, video sites, and many more — would be forced to pre-emptively self-censor, shut down, or risk getting blown off the net entirely.

That fucking sucks.

Everything I love about the web requires the unfettered freedom to build new ways to let people express themselves, and with that, comes the risk of copyright infringement.

Breaking the web isn’t a solution.

Please take 10 minutes today to call your representatives (or show up in person!) and let them know you won’t stand for this. SOPA and PIPA must die.

Spotify vs. Rdio, Part 2: The Billboard Charts

Streaming music services like Spotify and Rdio are transforming the way we listen to music, but spotting differences in their catalogs is nearly impossible for the casual listener. The licensing landscape is constantly shifting, with songs appearing and disappearing as labels try to make up their minds.

To help you decide which service is right for you, I’m using the developer APIs provided by each service to go crate-digging into each catalog to see which service comes out on top.

Last time, we looked at 5,000 critically loved albums on both services, with Rdio barely edging ahead of Spotify. That’s great for music geeks who can’t live without “Marquee Moon,” “Bitches Brew” and “In the Aeroplane Over the Sea.” But it leaves more mainstream, single-oriented music fans out in the cold.

If you love pop music, this is your week. We’re digging into 57 years of Billboard charts, searching Spotify and Rdio for every year’s top 100 from 1955 to 2011, covering everyone from Elvis Presley and Dean Martin to Rick Ross and Waka Flocka Flame.

How It Works

The Billboard chart data comes from the Whitburn Project, a group of obsessive music collectors who have been quietly compiling historical chart data on Usenet since 1998. Originally aiming to complete their MP3 collections, they used multiple sources to create a spreadsheet of over 38,500 songs dating back to 1890, with 112 columns of raw data, including each song’s duration, beats per minute, songwriters, label, and week-by-week chart position.

Here’s a sample of the most recent Whitburn spreadsheet from November 11, 2011, so you can see the fields they entered.

With the spreadsheet, I selected the 100 songs that stayed at the top of the charts the longest each year, starting in 1955, and pulled them into a database for easy manipulation.

With these 5,700 songs, I then wrote a script to search the Rdio and Spotify APIs for each track. To standardize artist and song names, I used the Echo Nest’s Song.search API. As before, I’m only checking U.S. availability, since Rdio is limited to the United States and Canada.

Disclaimer: Variations in artist and song names can lead to some missed results, and false positives can crop up due to karaoke versions and tribute bands. I’ve tried to weed out most of the bad results, but didn’t check all 5,700 results by hand. That said, it doesn’t seem like any error favors Spotify or Rdio, so the results should be fair, if imperfect.
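To give a sense of the kind of cleanup involved, here’s a rough Python sketch of name matching. This is not the script I ran: the `SUSPECT_WORDS` list and both helper functions are hypothetical, and the real standardization went through the Echo Nest. It just shows how a name like “Steve Miller Band, The” can be lined up with a search result, and how obvious karaoke and tribute results can be dropped.

```python
import re

# Hypothetical markers for results that are probably not the original recording.
SUSPECT_WORDS = ("karaoke", "tribute", "made famous by")

def normalize(name):
    """Lowercase, move a trailing ', The' to the front, and strip punctuation."""
    name = name.strip().lower()
    match = re.match(r"^(.*), the$", name)
    if match:
        name = "the " + match.group(1)
    name = name.replace("&", "and")
    name = re.sub(r"[^a-z0-9 ]", "", name)
    return re.sub(r"\s+", " ", name).strip()

def is_match(chart_artist, chart_title, result_artist, result_title):
    """True when a search result plausibly is the charted recording."""
    combined = (result_artist + " " + result_title).lower()
    if any(word in combined for word in SUSPECT_WORDS):
        return False
    return (normalize(chart_artist) == normalize(result_artist)
            and normalize(chart_title) == normalize(result_title))
```

A filter this blunt will also throw out any legitimate song with “tribute” in its title, which is one reason some hand-checking is still needed.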

Results

Of the 5,700 songs in the top 100, 5,026 (88 percent) were available on both Spotify and Rdio. An additional 81 (1.4 percent) were only on Spotify, and 100 (1.7 percent) only available on Rdio. If we limit it to only the 570 top-10 singles, 518 songs (over 90 percent) were available on both Spotify and Rdio.

The chart below shows the percentage of the top 100 available per year on Spotify and Rdio. At a glance, you can see how deep both of their catalogs are. It’s very rare for either service to have less than 80 percent of the top 100 in a given year. (Note that the Beatles singlehandedly lower their coverage in the mid- to late-1960s.)
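If you’d like to recompute the per-year and per-decade percentages yourself, a sketch along these lines should do it. The record layout (`year`, `spotify`, `rdio`) and the sample rows are made up for illustration.

```python
from collections import defaultdict

SERVICES = ("spotify", "rdio")

def coverage_by_year(tracks):
    """Map year -> {service: percentage of that year's charted tracks available}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: dict.fromkeys(SERVICES, 0))
    for track in tracks:
        totals[track["year"]] += 1
        for service in SERVICES:
            if track[service]:
                hits[track["year"]][service] += 1
    return {
        year: {s: 100.0 * hits[year][s] / totals[year] for s in SERVICES}
        for year in sorted(totals)
    }

def average_by_decade(per_year, service):
    """Average the yearly percentages for one service within each decade."""
    buckets = defaultdict(list)
    for year, pct in per_year.items():
        buckets[year // 10 * 10].append(pct[service])
    return {decade: sum(v) / len(v) for decade, v in sorted(buckets.items())}

# Hypothetical sample: four tracks from 1965, two from 1966.
sample = [
    {"year": 1965, "spotify": True, "rdio": True},
    {"year": 1965, "spotify": True, "rdio": False},
    {"year": 1965, "spotify": True, "rdio": True},
    {"year": 1965, "spotify": False, "rdio": False},
    {"year": 1966, "spotify": True, "rdio": True},
    {"year": 1966, "spotify": False, "rdio": True},
]
per_year = coverage_by_year(sample)
```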

Here’s the average percentage by decade:

Let’s start by looking at the holdouts, the top-charting artists that aren’t available for streaming on either service. As in the album analysis, The Beatles top the list with 35 missing hits, but the rest of the list is very different. All 11 of the Eagles’ top hits are unavailable, Bob Seger fans will be bummed to hear his 10 (!) charting singles are missing, and most of the Red Hot Chili Peppers’ post-1991 hits are unavailable for streaming. The Dave Clark Five’s eight hits from the mid-1960s are all missing, and Aaliyah’s estate is apparently protective of her work, blocking access to her eight big singles.

Other surprising holdouts: Hootie and the Blowfish, Joan Jett, and Roberta Flack. A handful of one-hit wonders are missing entirely, depriving the world of songs like Another Bad Creation’s 1990 debut “Iesha” and Rick Dees’ “Disco Duck” from 1976.

The Exclusives

Both services stream virtually every song ever to appear on the Billboard charts, but they don’t overlap perfectly. Each has secured different licenses with record labels, giving each exclusive access to some songs and artists.

If you want to hear the 14 singles released by Paul McCartney, solo and with Wings, you can only hear them on Rdio. Same for LeAnn Rimes, Monica, and Fergie. Spotify, on the other hand, didn’t have exclusive access for any artist with more than two charting singles in the yearly top 100 charts.

Below, I’ve listed the top 20 tracks exclusive to each service, ordered by their overall yearly ranking.

Only on Rdio
Paul McCartney – My Love (#3, 1973)
Paul McCartney – Say Say Say (#4, 1983)
Monica – The First Night (#4, 1998)
Christina Aguilera – Lady Marmalade (#6, 2001)
Kyu Sakamoto – Sukiyaki (#7, 1963)
Monica – Angel Of Mine (#7, 1999)
Fergie – London Bridge (#7, 2006)
*NSYNC – It’s Gonna Be Me (#11, 2000)
Paul McCartney – Coming Up (Live At Glasgow) (#12, 1980)
LeAnn Rimes – How Do I Live (#12, 1997)
Fergie – Big Girls Don’t Cry (#12, 2007)
Wings – With A Little Luck (#13, 1978)
Divine – Lately (#13, 1998)
Red Hot Chili Peppers – Under The Bridge (#20, 1992)
LL Cool J – Loungin’ (#20, 1996)
Monica – For You I Will (#20, 1997)
Enrique Iglesias – Hero (#21, 2001)
Paul McCartney – Band On The Run (#22, 1974)
Merril Bainbridge – Mouth (#23, 1996)
Wings – Listen To What The Man Said (#24, 1975)

Only on Spotify
Mariah Carey – Don’t Forget About Us (#7, 2005)
Steve Miller Band, The – Abracadabra (#9, 1982)
Patti Austin – Baby, Come To Me (#10, 1983)
Dr. Dre – Nuthin’ But A G Thang (#14, 1993)
Shocking Blue, The – Venus (#20, 1970)
Mike & The Mechanics – The Living Years (#24, 1989)
Salt ‘N Pepa – Shoop (#29, 1993)
Ashlee Simpson – Pieces Of Me (#33, 2004)
String-A-Longs, The – Wheels (#36, 1961)
Irene Cara – Fame (#38, 1980)
Climax Blues Band – Couldn’t Get It Right (#42, 1977)
Yael Naim – New Soul (#43, 2008)
Madonna – Don’t Cry For Me Argentina (#48, 1997)
Dr. Dre – Dre Day (#49, 1993)
Technotronic – Move This (#50, 1992)
Erykah Badu – Love Of My Life (An Ode To Hip Hop) (#54, 2003)
Paperboy – Ditty (#55, 1993)
Johnny Thunder – Loop De Loop (#57, 1963)
Tee Set, The – Ma Belle Amie (#59, 1970)
Gerry and the Pacemakers – Ferry Across the Mersey (#61, 1965)

Conclusion

Both services do an extraordinary job at including music history’s most popular songs. Virtually every song was available on Spotify and Rdio, a huge change from the previous album-oriented analysis. Again, much to my surprise, Rdio comes out slightly on top. Spotify’s international catalog fills most of these gaps, so expect things to heat up rapidly over the next year as they secure more of those licenses for the United States.

Have any questions about this analysis, or anything missing you’d like to see? Leave a comment and let me know.

(Note: This was originally published for my column at Wired.)

Spotify vs. Rdio: Who Has the Exclusives?

The new generation of streaming music services like Spotify, Rdio, and MOG have more music than you could consume in a lifetime. But how much of it would you really want to listen to?

There’s no shortage of great roundups and reviews showing the pros and cons of each service, but they rarely talk specifically about the different music you can find on each. They’ve all built impressive catalogs, but it’s nearly impossible to tell from casual browsing which artists and albums are exclusives for each.

Fortunately, both Rdio and Spotify offer powerful developer APIs, making it simple to compare the two. (Sadly, MOG doesn’t offer an API, so it isn’t included.)

For this test, I needed a large set of popular, well-loved albums to test. I used the top 5,000 albums from Rate Your Music, the quirky 11-year-old online community dedicated to rating and reviewing music. These albums span all genres, from klezmer to chiptune, with a total of 2,282 different artists across 70 years of recorded music.

I used the Spotify and Rdio search APIs to look up each album, and checked their streaming availability in the United States. (Rdio uses the IP address to determine country of origin, making it impossible to query other countries. Spotify, on the other hand, returns a list of every region the album’s available in.)
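Checking a region list for the United States is the easy part. Here’s a minimal sketch, assuming the regions arrive as a space-separated string of country codes, with “worldwide” meaning everywhere. That format is an assumption for illustration, not a documented guarantee, and `available_in` is a hypothetical helper.

```python
def available_in(territories, country="US"):
    """Return True if `country` appears in a space-separated territory string.

    The string format is an assumption for illustration; "worldwide" is
    treated as matching every country.
    """
    codes = territories.upper().split()
    return "WORLDWIDE" in codes or country.upper() in codes
```

For example, `available_in("GB SE NO")` is false, while `available_in("GB SE NO US")` is true.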

Note: The results aren’t perfect. Spotify and Rdio often have slight differences between artist and album names, which can deliver false positives. Let me know if you spot anything amiss and I’ll correct it.

Results

Of the top 5,000, about 44% were available on both Spotify and Rdio. 4.8% of the albums were only available on Spotify, while a further 6.8% were only available on Rdio. Overall, 56% of the albums were streamable on at least one of the services.
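The breakdown itself is just set arithmetic. Here’s a small sketch with made-up album IDs (the function name and sample sets are hypothetical) showing how the buckets relate to each other.

```python
def catalog_breakdown(all_albums, on_spotify, on_rdio):
    """Return the share (in percent) of albums in each availability bucket."""
    total = len(all_albums)
    buckets = {
        "both": on_spotify & on_rdio,
        "spotify_only": on_spotify - on_rdio,
        "rdio_only": on_rdio - on_spotify,
        "either": on_spotify | on_rdio,
    }
    return {name: 100.0 * len(albums) / total for name, albums in buckets.items()}

# Hypothetical sample: ten albums, identified by number.
albums = set(range(1, 11))
shares = catalog_breakdown(albums, {1, 2, 3, 4, 5}, {4, 5, 6, 7})
```

The “both,” “Spotify only,” and “Rdio only” buckets always sum to “either,” which is how 44%, 4.8%, and 6.8% add up to roughly 56% above.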

Labels are still withholding most or all of the albums from many popular artists. The Beatles, King Crimson, AC/DC, The Eagles, Tool, De La Soul, Peter Gabriel, Led Zeppelin, and Metallica are nowhere to be found, as well as most of the best albums by The Kinks. Music geeks will be sad to discover that Frank Zappa, Coil, Spacemen 3, and Joanna Newsom are all missing, as well. This landscape will constantly shift as labels change their minds; Arcade Fire was added to Spotify yesterday, and more than 200 indie labels left the streaming services last month.

But what about albums that are exclusive to one service? The results surprised me. Spotify has a reputation for having a deeper catalog, but at least for historic, critically regarded albums, Rdio has a better selection of both popular and obscure artists. More albums in the top 5,000 were available on Rdio, and they offer exclusive access in the U.S. to huge acts like Bob Dylan, Pink Floyd, the White Stripes, and Queen.

Top Exclusive Artists

Here’s a list of the top 20 artists exclusive to each service, with the number of exclusive albums in parentheses.

Only on Rdio
Bob Dylan (12)
Pink Floyd (8)
Bruce Springsteen (7)
Miles Davis (6)
The Gathering (5)
Blind Guardian (4)
Can (4)
William Basinski (4)
Iced Earth (4)
Stars of the Lid (3)
The White Stripes (3)
John Williams (3)
Queen (3)
Nevermore (3)
Thelonious Monk (3)
Charles Mingus (3)
Bill Hicks (3)
John Coltrane (2)
Camel (2)
Keith Jarrett (2)

Only on Spotify
My Dying Bride (4)
Miles Davis (4)
Candlemass (3)
Funkadelic (3)
The Pretty Things (3)
Current 93 (3)
Darkthrone (3)
Underworld (3)
Katatonia (3)
CunninLynguists (3)
Charles Mingus (2)
Mahavishnu Orchestra (2)
The Jesus Lizard (2)
The Misfits (2)
Klaus Schulze (2)
John Coltrane (2)
Galaxie 500 (2)
Silvio Rodríguez (2)
Secos & Molhados (2)
maudlin of the Well (2)

Note that artists like Miles Davis and John Coltrane appear on both lists because of how prolific they were. Both are well-represented in Spotify and Rdio, but some critically-adored out-of-print albums are unavailable on both.

Top Exclusive Albums

Digging into the albums, Rdio wins again. Nine of the top 100 albums are only found on Rdio, while only one is exclusive to Spotify. In fact, there are only 32 albums in the top 1,000 available on Spotify alone. Below is the top 30 for each service, along with their Rate Your Music ranking.

Only on Rdio
Pink Floyd – The Dark Side of the Moon
Pink Floyd – Wish You Were Here
Bob Dylan – Highway 61 Revisited
Bob Dylan – Blonde on Blonde
The Clash – London Calling
Bob Dylan – Bringing It All Back Home
Bob Dylan – Blood on the Tracks
Pink Floyd – Animals
Bob Dylan – The Freewheelin’ Bob Dylan
Bob Dylan – Another Side of Bob Dylan
Bob Dylan – The Times They Are A-Changin’
Dr. Dre – The Chronic
Stars of the Lid – The Tired Sounds Of
Camel – Moonmadness
The White Stripes – Elephant
Bon Iver – For Emma, Forever Ago
John Williams – Raiders of the Lost Ark
Popol Vuh – Hosianna Mantra
Jethro Tull – Nothing Is Easy: Live at the Is…
Albert King – Born Under a Bad Sign
Keith Jarrett – Vienna Concert
Dead Kennedys – Plastic Surgery Disasters
Thin Lizzy – Black Rose: A Rock Legend
Magic Sam – West Side Soul
Bob Dylan & The Band – The Basement Tapes
Eric Dolphy – Out There
Blind Guardian – Live
Devin Townsend – Terria
Strapping Young Lad – City
Pretenders – Pretenders

Only on Spotify
The Zombies – Odessey and Oracle
Candlemass – Nightfall
Funkadelic – Standing on the Verge of Getting…
The Jesus Lizard – Goat
The Pretty Things – Parachute
The Jazz Composer’s Orchestra – The Jazz Comp…
Klaus Schulze – X
Sodom – Agent Orange
Danny Elfman – Edward Scissorhands
Galaxie 500 – Today
Current 93 – All the Pretty Little Horses
Secos & Molhados – Secos & Molhados
maudlin of the Well – Bath
Sun Kil Moon – Ghosts of the Great Highway
Anathema – Alternative 4
Darkthrone – A Blaze in the Northern Sky
The Byrds – Fifth Dimension
The Gun Club – Miami
Autopsy – Severed Survival
My Dying Bride – Turn Loose the Swans
The Jesus Lizard – Liar
Vektor – Black Future
maudlin of the Well – Leaving Your Body Map
Jean Michel Jarre – Oxygene
16 Horsepower – Secret South
Riverside – Out of Myself
Darkthrone – Transilvanian Hunger
Nino Rota – Amarcord
Suede – Suede
Darkthrone – Under a Funeral Moon

Unless you’re a huge fan of Norwegian death metal, it’s hard to see this as anything but a win for Rdio. The fact is that both services have done a tremendous job of building the celestial jukebox — with a couple of high-profile exceptions, nearly everything you’d ever want to listen to is available at your fingertips.

Now, one huge drawback of using the Rate Your Music list is that it skews towards older album-oriented music geeks. That’s great if you like Ornette Coleman and Galaxie 500, but not so great if you like Drake and Katy Perry.

Next week, we’ll set the controls for the heart of mainstream music: the Billboard charts, analyzing every charted single in the top 100 from 1955 to the present. This will give us a completely different view of their catalogs, focused on pop singles, past and present, instead of classic albums.

Want more? Ed Summers did his own fascinating deep-dive into Spotify and Rdio, using top album lists from Alf Eaton’s Album of the Year list collection, and published the results on Google Fusion Tables. Also, try Matt Montag’s Music Smasher, a tool that searches Rdio, Spotify, and Grooveshark.

(Note: This was originally published for my Wired column.)

No Copyright Intended

On October 26, a YouTube user named crimewriter95 posted a full-length version of Pulp Fiction, rearranged in chronological order.

A couple things struck me about this video.

First, I’m surprised that a full-length, 2.5-hour very slight remix of a popular film can survive on YouTube for over six weeks without getting removed. Now that it’s on Kottke and Buzzfeed, I’m guessing it won’t be around for much longer.

But I was just as amused by the video description:

“The legendary movie itself placed into chronological order. If you’d like me to put the full movie itself up, let me know and I’ll be glad to oblige. Please no copyright infringement. I only put this up as a project.”

These “no copyright infringement intended” messages are everywhere on YouTube, and about as effective as a drug dealer asking if you’re a cop. It’s like a little voodoo charm that people post on their videos to ward off evil spirits.

How pervasive is it? There are about 489,000 YouTube videos that say “no copyright intended” or some variation, and about 664,000 videos have a “copyright disclaimer” citing the fair use provision in Section 107 of the Copyright Act.

Judging by his username, I’m guessing crimewriter95 is 16 years old. I wouldn’t be surprised if most of those million videos were uploaded by people under 21.

He’s hardly alone. On YouTube’s support forums, there’s rampant confusion over what copyright is. People are genuinely confused that their videos were blocked even with a disclosure, or that audio was removed even though there was no “intentional copyright infringement.” Some ask for the best wording of a disclaimer, not knowing that virtually all video is blocked without human intervention using ContentID.

YouTube’s tried to combat these misconceptions with its Copyright School, but it seems futile. For most people, sharing and remixing with attribution and no commercial intent is instinctually a-okay.

Under current copyright law, nearly every cover song on YouTube is technically illegal. Every fan-made music video, every mashup album, every supercut, every fanfic story? Quite probably illegal, though largely untested in court.

No amount of lawsuits or legal threats will change the fact that this behavior is considered normal — I’d wager the vast majority of people under 25 see nothing wrong with non-commercial sharing and remixing, or think it’s legal already.

Here’s a thought experiment: Everyone over age 12 when YouTube launched in 2005 is now able to vote.

What happens when — and this is inevitable — a generation completely comfortable with remix culture becomes a majority of the electorate, instead of the fringe youth? What happens when they start getting elected to office? (Maybe “I downloaded but didn’t share” will be the new “I smoked, but didn’t inhale.”)

Remix culture is the new Prohibition, with massive media companies as the lone voices calling for temperance. You can criminalize commonplace activities by law-abiding people, but eventually, something has to give.

Update, February 11: Everybody’s singing the YouTube Disclaimer Blues.

Tracking the U.S. Government’s Response to #Occupy on Twitter

It’s no exaggeration to say that Occupy Wall Street first started on Twitter. As the New York Times reported Monday, the #occupywallstreet hashtag was conceived in July, a full two months before the first tent was pitched at Zuccotti Park.

As it grew from a single camp into a movement, Twitter was essential for getting real-time updates out as events unfolded, for both supporters and local government.

Particularly in the last month, some city officials have used Twitter as a tool to keep people informed. Even as they were dismantling camps, the mayors of New York City and Portland, Oregon were posting real-time updates and responding to citizens directly.

While city officials have actively communicated their positions, the response from the federal government has been muted, at best. The Occupy movement’s concerns are much larger than city politics, with most proposed demands requiring cooperation from Washington.

So far, official statements are isolated and infrequent — an early endorsement from the president, a couple of statements from the White House press secretary, and a range of opinions from individual members of Congress.

But maybe the situation’s different online? Twitter is much more casual and conversational, and social media-savvy federal agencies often respond directly to queries and complaints from their followers. It’s possible that federal employees are addressing questions and concerns about Occupy on Twitter instead.

I decided to find out.

Data Wrangling

I originally gathered this data to build the Federal Social Media Index, a weekly report that compares federal agencies using Twitter, which I’m happy to release today as part of my work at Expert Labs.

Starting with an index of over 450 U.S. government departments and agencies, I asked the anonymous workforce at Amazon Mechanical Turk to find official Twitter accounts for each one.

Three workers researched each agency, and I approved the ones they agreed on and hand-checked the rest.
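The agreement check is simple enough to sketch in a few lines of Python. I’m assuming here that “agreed” means all workers gave the same handle after trimming whitespace, case, and a leading @; the function name and return format are hypothetical.

```python
from collections import Counter

def review_answers(answers):
    """Approve a handle only when every worker agrees; otherwise flag it.

    Assumes a non-empty list of answers, one per worker.
    """
    cleaned = [a.strip().lower().lstrip("@") for a in answers]
    counts = Counter(cleaned)
    handle, votes = counts.most_common(1)[0]
    if votes == len(cleaned):
        return ("approved", handle)
    # Disagreement: return every distinct answer for hand-checking.
    return ("needs_review", sorted(counts))
```

Requiring unanimity rather than a two-out-of-three majority means more hand-checking, but fewer wrong accounts slipping through.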

When I was done, I had a list of 126 official Twitter accounts representing a wide swath of U.S. government, from the Secret Service to the Postal Service. (Browse them all on the Federal Social Media Index or in the spreadsheet below.)

To collect all the tweets, I used ThinkUp, a free, open-source tool for archiving and analyzing social-media activity on Twitter, Facebook, and Google+ that I work on at Expert Labs.

With this dataset, I could easily tell which federal agency is the most popular (NASA), the most prolific (the NEA), and the most likely to reply to you personally (the US Census Bureau).

It also makes it very easy to see who’s talking about Occupy, and who isn’t.

Occupy Silence

Since the Occupy protests started in mid-September, nearly 15,000 messages were posted by the 126 federal Twitter accounts.

Of those accounts, only three have mentioned the Occupy protests in any way — Voice of America, the Smithsonian, and the White House.
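A search like this can be sketched with a regular expression over the archived tweets. The pattern and the (account, text) layout below are hypothetical, not ThinkUp’s actual schema, and the sample accounts are placeholders.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; a real search would need tuning to avoid
# unrelated uses of the word "occupy".
OCCUPY_PATTERN = re.compile(
    r"#occupy\w*|\boccupy wall street\b|\bows\b", re.IGNORECASE
)

def accounts_mentioning(tweets, pattern=OCCUPY_PATTERN):
    """Count matching tweets per account from (account, text) pairs."""
    counts = Counter()
    for account, text in tweets:
        if pattern.search(text):
            counts[account] += 1
    return counts

# Placeholder data, not real tweets.
sample_tweets = [
    ("agency_a", "Police clear #OccupyWallStreet camp overnight"),
    ("agency_b", "Launch scheduled for this afternoon"),
    ("agency_c", "Statement on the Occupy Wall Street protests"),
    ("agency_a", "#occupy protests spread to more cities"),
]
mentions = accounts_mentioning(sample_tweets)
```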

For those unfamiliar with it, VOA is a radio and television news network broadcasting in 100 countries in 59 languages, but banned from airing in the United States because of propaganda laws. As part of their daily news coverage, they’ve tweeted about Occupy nine times since the protests began. (Here’s the most recent.)

Second, the Smithsonian responded to a tweet by Complex Magazine, refuting rumors of an OWS-themed museum exhibit.

The only other mention of the Occupy protests: one tweet from the White House nearly two months ago.

Opening Up

The obvious reason for the silence is that the federal government doesn’t yet have a position on Occupy. If they haven’t issued a formal statement, published a blog post, or held a press conference, then why tweet?

For starters, it’s a humane and natural way to open a dialogue with a generally forgiving audience. Some of these agencies have tens or hundreds of thousands of people who care about what they have to say, or they wouldn’t be following them.

Proactively talking about potentially challenging issues like Occupy is an opportunity to bring some humanity to government, and maybe even help shape policy.

(Note: This was originally published in my column in WIRED.)

Viewing the UC Davis Pepper Spraying from Multiple Angles

I was stunned and appalled by the UC Davis Police spraying protestors, but struck by how many brave, curious people recorded the events. I took the four clearest videos and synchronized them. Citizen journalism FTW. Sources below.

Best viewed in HD fullscreen.

Top

briocloud, http://www.youtube.com/watch?v=K8Uj1cV97XQ

jamiehall1615, http://www.youtube.com/watch?v=wuWEx6Cfn-I

Bottom

OperationLeakS, http://www.youtube.com/watch?v=BjnR7xET7Uo

asucd, http://www.youtube.com/watch?v=6AdDLhPwpp4

Google Analytics A Potential Threat to Anonymous Bloggers

Last month, an anonymous blogger popped up on WordPress and Twitter, aiming a giant flamethrower at Mac-friendly writers like John Gruber, Marco Arment and MG Siegler. As he unleashed wave after wave of spittle-flecked rage at “Apple puppets” and “Cupertino douchebags,” I was reminded again of John Gabriel’s theory about the effects of online anonymity.

Out of curiosity, I tried to see who the mystery blogger was.

He was using all the ordinary precautions for hiding his identity — hiding personal info in the domain record, using a different IP address from his other sites, and scrubbing any shared resources from his WordPress install.

Nonetheless, I found his other blog in under a minute — a thoughtful site about technology and local politics, detailing his full name, employer, photo, and family information. He worked for the local government, and if exposed, his anonymous blog could have cost him his job.

I didn’t identify him publicly, but let him quietly know that he wasn’t as anonymous as he thought he was. He stopped blogging that evening, and deleted the blog a week later.

So, how did I do it? The unlucky blogger slipped up and was ratted out by an unlikely source: Google Analytics.

Reverse Lookups

Typically, Google will only reveal a user’s identity with a federal court order, as they did with a Blogger user who harassed a Vogue model in 2009.

But anonymous bloggers are at serious risk of outing themselves, simply by sharing their Google Analytics ID across the sites they own.

If you’re watching your pageviews, odds are you’re using Google to do it. Launched in 2005, Analytics is the most popular web statistics service online, in use by half of Alexa’s top million domains.

For the last few years, online SEO tools have published Analytics and AdSense IDs for the domains they crawl publicly, typically for competitive intelligence, such as ferreting out your competitor’s other websites.

But in the last year, several free services such as eWhois and Statsie have started offering reverse lookup of Analytics IDs. (Most also allow searching on the Google AdSense ID, though I wasn’t able to find an anonymous blogger sharing an AdSense ID across two sites.)

Finding anonymous bloggers from Analytics is less likely than other methods. It’s still more likely that someone would slip up and leave their personal info in their domain or share a server IP than to share a Google Analytics account. But it’s also more accurate. Hundreds or thousands of people can share an IP address on a single server and domain information can be faked, but a shared Google Analytics ID is solid evidence that both sites are run by the same person.

And unlike any other method, it can unmask people using hosted blogging services. Tumblr, Typepad and Blogger all have built-in support for Google Analytics, though reverse lookup services haven’t comprehensively indexed them. (Note that WordPress.com doesn’t support Analytics or custom JavaScript, so its users aren’t affected.)
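If you run more than one site, you can audit yourself for this. Here’s a minimal Python sketch that pulls classic Analytics IDs (the `UA-<account>-<property>` form) out of saved page source and groups domains by account number. The function names, domains, and IDs are made up, and this is not how eWhois or any other lookup service works internally; it only illustrates the overlap they expose.

```python
import re
from collections import defaultdict

# Classic Google Analytics property IDs look like UA-<account>-<property>.
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")

def analytics_accounts(html):
    """Return the set of Analytics account numbers referenced in a page's HTML."""
    return {account for account, _prop in UA_PATTERN.findall(html)}

def shared_accounts(pages):
    """Map account number -> domains, keeping only accounts seen on 2+ domains."""
    by_account = defaultdict(set)
    for domain, html in pages.items():
        for account in analytics_accounts(html):
            by_account[account].add(domain)
    return {acct: sorted(doms) for acct, doms in by_account.items() if len(doms) > 1}

# Made-up page source for three sites you own.
pages = {
    "anon-blog.example": "<script>_gaq.push(['_setAccount', 'UA-1234567-2']);</script>",
    "personal-site.example": "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>",
    "unrelated.example": "<script>_gaq.push(['_setAccount', 'UA-7654321-1']);</script>",
}
overlap = shared_accounts(pages)
```

Any account number that shows up in `overlap` ties those domains together for anyone who cares to look.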

Just to be clear, this technique isn’t new. The first Google Analytics reverse lookup services started in 2009, so the technique’s been possible for at least two years. My concern is that it isn’t nearly well-known enough. It’s not mentioned in any guide to anonymous blogging I could find and several established bloggers, engineers, and entrepreneurs I spoke to were unaware of it.

Unmasking an anti-Mac blogger may not be life-changing, but if you’re an anonymous blogger writing about Chinese censorship or Mexican drug cartels, the consequences could be dire.

I decided to see how pervasive this problem is. In a sample of 50 anonymous blogs pulled from discussion forums and Google News, only 14 were using Google Analytics, far fewer than average. Half of those, about 14% of the total, were sharing an Analytics ID with one or more other domains.

In about 30 minutes of searching, using only Google and eWhois, I was able to discover the identities of seven of the anonymous or pseudonymous bloggers, and in two cases, their employers. One blog about Anonymous’ hacking operations could easily be tracked to the founder’s consulting firm, while another tracking Mexican cartels was tied to a second domain with the name and address of a San Diego man.

I’ve contacted each to let them know their potential exposure.

Protecting Yourself

Some of the most vital voices online are anonymous, and it’s important to understand how you’re exposed. Overlooking any of the precautions below can lead to lawsuits, firings, or even death.

If you’re aware of the problem, it’s very easy to avoid getting discovered this way. Here are my recommendations for making sure you stay anonymous.

  1. Don’t use Google Analytics or any other third-party embed system. If you have to, create a new account with an anonymous email. At the very least, create a separate Analytics account to track the new domain. (From the “My Analytics Accounts” dropdown, select “Create New Account.”)
  2. Turn on domain privacy with your registrar. Better, use a hosted service to avoid domain payments entirely.
  3. If you’re hosting your own blog, don’t share IP addresses with any of your existing websites. Ideally, use a completely different host; it’s easy to discover sites on neighboring IPs.
  4. Watch your history. Sites like Whois Source track your history of domain and nameserver changes permanently, and Archive.org may archive old versions of your site. Being the first person to follow your anonymous Twitter account or promote the link could also be a giveaway.
  5. Is your anonymity a life-or-death situation? Be aware that any service you use, including your own ISP, could be forced to reveal your IP address and account details under a court order. Use shared computers and an anonymous proxy or Tor when blogging to mask your IP address. Here’s a good guide.
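The shared-IP check in item 3 is easy to run against your own sites. The following is a rough sketch, not a complete audit: it only compares resolved IPv4 addresses, the domain names and addresses are invented, and the fake resolver exists only so the example runs offline. With no resolve argument it falls back to Python's socket.gethostbyname.

```python
import socket
from collections import defaultdict

def shared_ips(domains, resolve=socket.gethostbyname):
    """Group domains by resolved IPv4 address and return the groups
    where more than one domain lands on the same address."""
    by_ip = defaultdict(list)
    for domain in domains:
        try:
            by_ip[resolve(domain)].append(domain)
        except OSError:
            # Names that fail to resolve are left out of the report.
            continue
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}

# Stand-in resolver with made-up addresses so the example runs offline.
fake_dns = {
    "anonymous-blog.example": "203.0.113.10",
    "personal-site.example": "203.0.113.10",
    "other-project.example": "198.51.100.7",
}
overlaps = shared_ips(fake_dns, resolve=fake_dns.__getitem__)
print(overlaps)
```

Any address that comes back with more than one of your domains attached is a link someone else can find just as easily.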

Stay safe.

Arcade Improv: Humans Pretending to Be Videogames

At the PAX East conference last year, a young man approached the microphone during the Q&A with Mike Krahulik and Jerry Holkins, creators of the popular Penny Arcade webcomic.

Instead of asking a question, he bellowed, “Welcome to ACTION CASTLE! You are in a small cottage. There is a fishing pole here. Exits are out.”

An awkward pause, followed by some giggling from the audience. “Is it our turn to say something?” said Mike.

“I don’t understand ‘is it our turn to say something,'” said the young man.

Instantly, Mike and Jerry understood, along with everyone in the audience born before 1978.

“Go out!” said Jerry.

“You go out. You’re on the garden path. There is a rosebush here. There is a cottage here. Exits are north, south, and in.”

The game was afoot.

They were playing Action Castle, the first of a series of live-action games based on classic text adventures from the late ’70s and early ’80s. Game designer Jared Sorensen calls the series Parsely, named after the text parsers that convert player input into something a computer can understand.

In Parsely games, the computer is replaced entirely by a human armed with a simple map and loose outline of the adventure. No hardware and no code; just people talking to people.
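For a sense of what the human is replacing, here is a toy verb-noun parser of the kind those early text adventures relied on. It is only a sketch: the two-room map and the responses are made up to echo the exchange quoted above, and none of it comes from the actual Parsely games.

```python
# Made-up map loosely following the two rooms quoted above.
ROOMS = {
    "cottage": {"text": "You are in a small cottage.", "exits": {"out": "garden path"}},
    "garden path": {"text": "You're on the garden path.", "exits": {"in": "cottage"}},
}

def parse(command):
    """Split input into (verb, noun); noun is None for one-word commands."""
    words = command.lower().strip(" .!?").split()
    if not words:
        return None, None
    return words[0], (words[1] if len(words) > 1 else None)

def step(room, command):
    """Return (new_room, response) for one player command."""
    verb, noun = parse(command)
    if verb == "go" and noun in ROOMS[room]["exits"]:
        new_room = ROOMS[room]["exits"][noun]
        return new_room, ROOMS[new_room]["text"]
    # Anything the parser can't handle is echoed back to the player.
    return room, f"I don't understand '{command}'"

room = "cottage"
room, reply = step(room, "Go out!")
print(reply)
room, reply = step(room, "Is it our turn to say something?")
print(reply)
```

Everything outside the two-word pattern falls through to the same canned reply, which is exactly the rigidity a human narrator can choose to imitate or ignore.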

It’s a clever solution to complex problems that have plagued game designers for decades. How do we understand the player’s intent? Can we make AI characters act human, instead of like idiot robots? Is it possible to handle every edge case the player thinks of without working on this game for the next 10 years?

Making computers think and react like us is hard. So instead of making software more human, some game developers are trying to make humans more like software.

It’s similar to the approach Amazon takes with Mechanical Turk, whose motto is “artificial artificial intelligence.” By layering an API over an anonymous human workforce, developers can solve problems that are best tackled by humans, but without the messiness of actual human communication.

Projects like Soylent add another layer of abstraction, invisibly embedding Mechanical Turk in Microsoft Word to crowdsource tedious tasks like proofreading and summarizing paragraphs of text. The effect feels weirdly magical, like technology that beamed in from the future.

In the gaming world, this substitution usually feels less like magic and more like robotic performance art. These performers are software-inspired actors — people pretending they’re videogames.

Nobody knows more about acting like a videogame than webcomic artist Andrew Hussie. Since 2006, he’s been running MS Paint Adventures, a series of increasingly insane reader-driven comics in the style of text-based graphical adventure games.

His first adventure, Jailbreak, started with a series of simple drawings posted on a discussion forum. With every new post, commenters would suggest new commands to further the gameplay, which he’d rapidly draw.

Hussie didn’t invent the genre — that honor likely goes to Ruby Quest and other denizens of 4chan’s gaming forums — but he certainly popularized it.

In the process, he became the world’s most prolific web cartoonist, sometimes updating up to 10 times a day.

To get a sense of the scale, Problem Sleuth, his second adventure, spanned over 1,600 pages in one year. Homestuck, his latest adventure, contains a staggering 4,100 pages so far, making it the longest webcomic of all time in a mere 2.5 years. And he still has a ways to go, with act five (out of seven) wrapping up just last week. (By comparison, the Guinness Book of World Records cites Mr. Boffo creator Joe Martin as the world’s most prolific cartoonist, with a mere 1,300 comics yearly.)

Over time, Hussie’s experimented with the amount of reader input. With Jailbreak, he drew the first command posted after every image, but as the adventures grew in popularity — it currently averages 600,000 unique visitors daily — this grew wildly impractical.

“When a story begins to get thousands of suggestions, paradoxically, it becomes much harder to call it truly ‘reader-driven,'” wrote Hussie on his website. “This is simply because there is so much available, the author can cherry-pick from what’s there to suit whatever he might have in mind, whether he’s deliberately planning ahead or not.”

With his newest adventure, Hussie leans on reader input less frequently and less directly, but involves the community in other ways. (For example, they just published their eighth soundtrack album of songs entirely created by fans. Don’t get me started on the cosplay.)

MS Paint Adventures goes where no videogame can possibly go, with insane storylines, shifting rules, and a ridiculous number of objects to interact with.

In any game, every object or action added to the game multiplies the number of possible interactions. Add a gun, and the programmer needs to deal with players shooting every single other object in the game. Add a lighter, and you’d better prepare for players burning everything in sight. Math geeks call this combinatorial explosion.
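A quick back-of-the-envelope sketch shows how fast this grows. It makes two simplifying assumptions that are mine, not any particular game's: that the designer only cares about pairs of objects ("use X on Y"), or, in the worst case, about every possible group of objects.

```python
from math import comb

def pairwise_interactions(n_objects):
    """Number of unordered object pairs a designer might need to handle."""
    return comb(n_objects, 2)

def possible_groupings(n_objects):
    """Number of distinct subsets of objects, including the empty one."""
    return 2 ** n_objects

# Doubling the object count roughly quadruples the pairs,
# while the number of subsets squares.
for n in (10, 20, 40, 80):
    print(n, pairwise_interactions(n), possible_groupings(n))
```

Even the pairs-only count goes from 45 interactions at 10 objects to 3,160 at 80, and that is before verbs, locations, or game state enter the picture.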

Homestuck’s bizarre alchemy system supports 280 trillion combinations. But Hussie doesn’t need to draw them all, only the ones readers actually try.

Reader-driven games give the illusion of limitless options, at the cost of scale. Even at 1,600 pages per year, player demand far outstrips the efforts of a single cartoonist.

Frustrated with emotional expression in computer games, game design veteran Chris Crawford set out to build Storytron, a storytelling engine intended to model drama and emotional complexity with computer-generated actors. Eighteen years later, Crawford is still working on it and emotional AI seems just as far out of reach.

Jason Rohrer, creator of the critically acclaimed art-game Passage, tackled the problem of emotional depth in a different way — he replaced the computer AI with a human.

Last year, he released Sleep Is Death, a quirky storytelling environment that connects a single player to a single “controller” over the network. The player has 30 seconds to make any move they can think of, and the controller scrambles to manipulate the scene to respond using a set of drawing tools.

The world is completely open-ended. The only limitation is the imagination of the player and controller.

As you’d expect, the results vary wildly, often depending on the relationship between the participants, but it’s always surprising in a way that many traditional videogames aren’t. Try browsing through SIDTube, the community-contributed gallery of Sleep Is Death playthroughs, and you’ll find everything from a child’s eye view of Hiroshima to meditations on growing old with friends.

Every playthrough is completely unique, a singular experience improvised by two people. Is that a game or performance art?

Earlier this year, a German theater group named Machina eX began staging live performances based on “point-and-click” adventure games like Secret of Monkey Island and Machinarium.

On the surface, Machina eX resembles other immersive performances like Tamara or Punchdrunk’s Sleep No More, with audience members following oblivious actors around elaborately-designed rooms.

In Machina eX’s performances, actors periodically get stuck in a loop, like a game paused. The audience must step in to solve the puzzle by manipulating objects in the room before the story can continue.

Each of these projects pulls together elements of improvisational theater, performance art, and role-playing games.

But it’s the lens of videogames that separates them from Dungeons & Dragons, TheatreSports, and countless other collaborative games.

Each game borrows the conventions of a familiar game genre, preparing anyone who plays it with a set of expectations — the fundamental rules, terminology, constraints, and affordances are all well-known. Even better, storytellers can subvert any of those expectations at any time.

And unlike a game engine, human storytellers can go off-script. In the case of MS Paint Adventures, they can even switch game genres entirely, as Andrew Hussie’s done with Homestuck’s evolution from adventure game to Sims-style simulation to traditional RPG to whatever the hell this is.

Using live, real-time human ingenuity as the engine for videogames creates completely new, unexpected experiences unlike anything you can code.

In The Diamond Age, Neal Stephenson imagines a world where AI is extremely powerful, but still not good enough to convincingly simulate human behavior. Instead, AI characters are replaced by “ractors,” paid human actors who perform in virtual worlds for entertainment and education.

Even the all-powerful Wizard 0.2, the most powerful Turing machine in the land, is actually only used for data collection and processing — the real decisions are made by the man behind the curtain.

Chris Crawford and Peter Molyneux spent years trying to find Milo, but I think we’ll be waiting for a while yet.

In the meantime, I’m going to go pretend a game or two.

Supercut: Anatomy of a Meme

I spent last weekend revisiting the “supercut” meme, with a talk at WFMU’s Radiovision conference in New York and my new Wired column, which you can read below.

To cap it off, I spent a night revamping Supercut.org into a comprehensive, browsable database of supercut videos, with the help of Twitter’s Bootstrap CSS toolkit.

I’m very happy with how the site came out, so let me know if you have any suggestions and please submit any videos I missed. I also just added RSS and you can now follow @supercutorg for updates. Thanks!

For the last few years, I’ve tracked a particular flavor of remix culture that I called “supercuts” — fast-paced video montages that assemble dozens or hundreds of short clips on a common theme.

Many supercuts isolate a word or phrase from a film or TV series — think every “dude” in The Big Lebowski or every profanity from The Sopranos — while others point out tired cliches, like those ridiculous zoom-and-enhance scenes from crime shows.

Since 2008, I’ve added every supercut I could find to a sprawling blog post. With nearly 150 of these videos, and more being added weekly, it’s turned from a blog post into a minor obsession.

Earlier this year, I collaborated with NYC-based artist Michael Bell-Smith on Supercut.org, a 24-hour hack to make a supercut composed entirely out of other supercuts, along with a randomized supercut browser.

Today, I’m happy to announce that I’ve relaunched the site to let you browse the entire collection in different ways, subscribe to updates, or submit your own to the growing list. I’m also releasing the entire dataset publicly, which you can download at the end of this post.

To understand the rise of this new genre, let’s take a look back at how it began and how it’s evolved in the last three years.

The Proto-Cuts

While the web popularized the genre, the art world was experimenting with similar film cut-ups for years before YouTube was a gleam in Chad & Steve’s eyes.

Brooklyn-based critic Tom McCormack wrote the definitive history of the supercut, tracing its origins back to found-footage cinema, like Bruce Conner’s A MOVIE from 1958.

But it wasn’t until the 1990s that clear descendants of the genre emerged. Matthias Müller’s Home Stories (1990) reused scenes from 1950s- and 1960s-era Hollywood melodramas, filmed directly from the TV set, to show actresses in near-identical states of distress.

Christian Marclay’s Telephones (1995) showed famous actors answering ringing telephones in a string of surreal, disjointed conversations throughout Hollywood history. Edited together, the cadence and rhythm of nonstop clips feels very reminiscent of modern supercuts.

Apple tried to license Marclay’s film for the launch of the iPhone in 2007, but he refused. Instead, they made their own, borrowing the idea wholesale. (Marclay decided not to sue.)

As far as I can tell, the earliest supercut native to the web was Chuck Jones’ Buffies from 2002, which isolated every mention of “Buffy” from the first season of Buffy the Vampire Slayer.

While there were rare exceptions, supercuts really didn’t start proliferating online until around 2006. Why then? The likely cause: YouTube.

Before YouTube, it was incredibly difficult to both find and share video. After YouTube’s launch in 2005, searching through big chunks of film and TV’s recorded history became simple. Perhaps more importantly, sharing the video with others didn’t require server space, a huge amount of bandwidth, or a deep knowledge of video codecs. It just worked.

The result was that clips were easy to find and even easier to distribute. Combined with the rise of BitTorrent and the availability of affordable, easy-to-use video editing software like iMovie, it was the perfect environment for video remixing. The only missing ingredient was the time and passion to make it happen.

Supercut as Criticism

When I first started tracking the trend in 2008, almost every example was created by a superfan. Creating videos with hundreds of edits takes a staggering amount of time, and the only people willing to do it were those who were in love with the source material.

In the last three years, the form seems to have evolved from fan culture to criticism.

Rich Juzwiak may have started the trend by calling out reality TV contestants for their overused “I’m not here to make friends” trope. That directly led to supercuts criticizing lazy screenwriting, from “We’ve got company” to “It’s gonna blow!”

But recently, it’s being used for more serious criticism: calling out politicians and the news media. The Daily Show pioneered the reuse of archival news footage and quick edits to point out the absurdity of the news media and political figures, but online video remixers are taking it much further.

Video remixing group Wreck & Salvage took Sarah Palin’s speech about the Arizona shootings and removed everything but the sound of her breathing. The result, Sarah’s Breath, was a creepy example of supercut as political speech.

In March, artist Diran Lyons released one of the most epic supercuts ever — chronicling every time President Obama says “spending” in the complete video archive posted to the White House website. The result is six minutes long with over 600 edits.

The results are effective. Just as it was used to point out film cliches, a supercut sends a message about a public figure’s speech in a very short period of time. For that reason, I wouldn’t be surprised to see supercuts make their way into 2012 campaign ads.

Breaking It Down

I wanted to learn more about the structure of these videos, so I enlisted the help of the anonymous workforce at Amazon’s Mechanical Turk to analyze the videos for me.

Using the database of 146 videos, I asked them to count the number of clips in each video, along with some qualitative questions about their contents. Their results were interesting.

When looking at the source of the videos, nearly half come from film with a little over one-third sourced from TV shows. The rest are a mix of real-life events, videogames, or a combination of multiple types, as you can see below.

According to the turker estimates, the average supercut is composed of about 82 cuts, with more than 100 clips in about 25% of the videos. Some supercuts, about 5%, contain over 300 edits!

I asked the turkers whether each supercut was comprehensive, collecting every possible example, or just a representative sample: for example, every one of Kramer’s entrances from Seinfeld versus a selection of explosions from action films. The results were split, with about 60% comprehensive. The split could be attributed to film cliche supercuts, which don’t attempt to be thorough.

Finally, I was wondering whether each video’s creator was a fan or critic of the source material. The workers surveyed said that most supercuts were created by fans, about 73% of the time. This style of video remixing may be useful for criticism, but for now, it seems to mostly be a labor of love.

The Data Dump

Want to do your own analysis, or do some video remixing of your own?

You can view the full supercut database below or on Google Docs, or download the data as a comma-separated text file or Excel spreadsheet.
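If you want to recompute figures like the average clip count yourself, something along these lines should work on the comma-separated file. Treat it as a sketch: the column names "title" and "clips" are my guesses rather than the real headers, and the four rows are placeholder values, not entries from the database.

```python
import csv
import io
from statistics import mean

# Made-up rows standing in for the downloaded file; the real column
# names may differ, so "title" and "clips" are assumptions.
SAMPLE_CSV = """title,clips
Example A,40
Example B,120
Example C,75
Example D,310
"""

def summarize(csv_file, column="clips"):
    """Return the mean clip count and the share of videos above 100 and 300 clips."""
    counts = [int(row[column]) for row in csv.DictReader(csv_file)]
    return {
        "videos": len(counts),
        "mean_clips": mean(counts),
        "share_over_100": sum(c > 100 for c in counts) / len(counts),
        "share_over_300": sum(c > 300 for c in counts) / len(counts),
    }

summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run it on the real data, pass an open file handle instead of the StringIO object and set the column argument to whatever the clip-count header turns out to be.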

And, of course, let me know if you find any that I missed!