Making Supercut.org

This weekend, I was invited to participate in Rhizome’s Seven on Seven in NYC — an event that pairs seven artists with seven technologists, challenging them to create something in one day and present it to an audience the next day.

The other teams were a humbling roster of creative geeks, including Ricardo “mrdoob” Cabello, Ben Cerveny, Jeri Ellsworth, Zach Lieberman, Kellan Elliott-McCrea, Chris “moot” Poole, Bre Pettis, and Erica Sadun.

I was paired with Michael Bell-Smith, whose digital art I’d admired and linked to in the past. It was a perfect match, and we’re very happy to announce the result of our collaboration: Supercut.org (warning: NSFW audio).

Supercut.org is an automatic supercut composed entirely out of other supercuts, combined with a way to randomly shuffle through all of the supercut sources.

The Idea

When we first sat down on Friday morning, Michael and I brainstormed what we wanted to accomplish: something visual, high-concept (i.e. explainable in a tweet), and hopefully with a sense of humor.

We quickly realized that our interest in supercuts was fertile ground. Michael’s work often touches on structural re-edits and remixes, such as Oonce-Oonce, Battleship Potemkin: Dance Edit, Chapters 1-12 of R. Kelly’s Trapped in the Closet Synced and Played Simultaneously, and his mashup album mixing pop vocals over their ringtone versions.

Both of us were fascinated by this form of Internet folk art. Every supercut is a labor of love. Making one is incredibly time-consuming, taking days or weeks to compile and edit a single video. Most are created by pop culture fans, but they’ve also been used for film criticism and political commentary. It’s a natural byproduct of remix culture: people using sampling to convey a single message, made possible by the ready availability of online video and cheap editing software.

So, supercuts. But what? Making a single supercut seemed cheap. I first suggested making a visual index of supercuts, or a visualization of every clip.

But Michael had a better idea — going meta. We were going to build a SUPERSUPERCUT, a supercut composed entirely out of other supercuts. And, if we had time, we’d make a dedicated supercut index.

Making the SuperSupercut

There were four big parts: index every supercut in a database, download all the supercuts locally, break each video into its original shots, and stitch the clips back together randomly.

Using my blog post as a guide, Michael added every supercut to a new table in MySQL, along with a title, description, and a category. While Michael did that, I wrote a simple script that used youtube-dl to pull the video files from YouTube and store them locally.
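
The download half of that was only a few lines of code. Here’s a minimal sketch of the approach, with placeholder table, column, and credential names rather than our actual schema:

```python
import subprocess
import MySQLdb  # the MySQL client we had handy; any driver works

# Hypothetical schema: the real table and column names differed.
db = MySQLdb.connect(host="localhost", user="supercut", passwd="secret", db="supercut")
cur = db.cursor()
cur.execute("SELECT id, youtube_url FROM supercuts")

for video_id, url in cur.fetchall():
    # -o sets youtube-dl's output filename template; %(ext)s keeps the original container
    subprocess.call(["youtube-dl", "-o", "videos/%d.%%(ext)s" % video_id, url])
```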

To split the clips, we needed a way to do scene detection — identifying cuts in a video by looking at how much changes between consecutive frames. It also had to run headless on my Linux server, which ruled out everything with a GUI.

After some research, I found a very promising lead in an obscure, poorly-maintained French Linux utility called shotdetect. It did exactly what we needed — analyze a video and return an XML file of scene start times and durations.

The most challenging part of the entire day, by far, was simply getting shotdetect to compile. The package only had binaries for Debian, and the source hadn’t been updated since September 2009. Since then, the video libraries it depends on had changed dramatically, and shotdetect wouldn’t compile against them.

Three frustrating hours later, we called in the help of one of the beardiest geeks I know, Daniel Ceregatti, an old friend and coworker. After 20 minutes of hacking the C++ source, we were up and running.

With the timecodes and durations from shotdetect, we used ffmpeg to split each supercut into hundreds of smaller MPEG-2 videos, all normalized to the same 640×480 dimensions with AAC audio. The results weren’t perfect — many scenes were broken up mid-dialogue because of camera changes — but it was good enough.
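
If you want to reproduce that step, the splitting logic is short. The XML element and attribute names below (shot, msbegin, msduration) are my recollection of shotdetect’s output format, so treat them as assumptions and check the file your build actually writes; everything else is a plain ffmpeg call per shot:

```python
import subprocess
import xml.etree.ElementTree as ET

def split_shots(source_video, result_xml, out_dir):
    """Cut one supercut into individual clips using shotdetect's XML output."""
    tree = ET.parse(result_xml)
    # Assumed structure: <shot msbegin="..." msduration="..."/> elements in the result file.
    for i, shot in enumerate(tree.iter("shot")):
        start = float(shot.get("msbegin")) / 1000.0
        duration = float(shot.get("msduration")) / 1000.0
        subprocess.call([
            "ffmpeg", "-y",
            "-ss", "%.3f" % start,     # seek to the start of this shot
            "-t", "%.3f" % duration,   # keep only this shot's duration
            "-i", source_video,
            "-s", "640x480",           # normalize every clip to the same dimensions
            "%s/clip_%04d.mpg" % (out_dir, i),
        ])
```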

As ffmpeg worked, I stored info about each newly-generated clip in MySQL. From there, it was simple to generate a random ten-minute playlist of clips between half a second and three seconds in length.

With that list, we used the Unix `cat` utility to concatenate all the videos together into a finished supersupercut. We tweaked the results after some early tests, which you can see on YouTube.
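
The playlist and concatenation steps are the simplest part of the whole pipeline. A rough sketch of both, again with placeholder table and column names:

```python
import MySQLdb

db = MySQLdb.connect(host="localhost", user="supercut", passwd="secret", db="supercut")
cur = db.cursor()

# Pull clips between half a second and three seconds, in random order.
cur.execute("""
    SELECT filename, duration FROM clips
    WHERE duration BETWEEN 0.5 AND 3.0
    ORDER BY RAND()
""")

playlist, total = [], 0.0
for filename, duration in cur.fetchall():
    playlist.append(filename)
    total += float(duration)
    if total >= 600:  # stop once we have roughly ten minutes of video
        break

# MPEG program streams can simply be appended end-to-end, which is why plain
# `cat` works; this is the same trick in Python.
with open("supersupercut.mpg", "wb") as out:
    for filename in playlist:
        with open(filename, "rb") as clip:
            out.write(clip.read())
```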

While the videos processed overnight, we registered the domain, built the rest of the website, and designed our slides for Saturday’s event — taking time out for wonderful Korean food with the fellow team of Ben Cerveny and Liz Magic-Laser. I finally got to sleep at 5:30am, but I’m thrilled with the results.

The Future

There were several things we talked about, but simply didn’t have time to do.

I’m planning on using the launch of Supercut.org to finally retire my old supercut list by adding a way to browse and sort the entire index of supercuts by date, source, and genre. Most importantly, I’m going to add a way for anyone to submit their own supercuts to the index.

And of course, when any supercut is added, it will automatically become part of the randomized supersupercut on the homepage: an evolving tribute to this unique art form.

ThinkBack, Playing with ThinkUp's New API

The newest beta of ThinkUp adds an API for the first time, letting developers easily build apps on top of the data ThinkUp collects.

The JSON API was created by Sam Rose, a 20-year-old student from Wales and an active contributor in the ThinkUp community. His 7,000-line contribution — composed of 40% new tests and 40% documentation — earned him first place in the ThinkUp bounty contest and a brand new iPad 2. Congrats, Sam!

I thought it’d be fun to try building a hack with his new API, so I made a simple visualization of your entire Twitter archive in ThinkUp — ThinkBack, a ThinkUp-powered time capsule. Take a look at my history, or at the @whitehouse account, to get the gist.

ThinkBack analyzes your entire Twitter history, extracts entities from the text, and colors them based on category. Grouped by month, it also gives you a quick glimpse at your posting activity over time.

For the entity extraction, I used an excellent free web service called AlchemyAPI to extract people, places, events, product names, and other keywords from everything I’ve ever posted. It returns a category for each entity, and I assigned each category a color.
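
For the curious, the extraction itself is one HTTP call per batch of text. This is a Python 2-era sketch to match the vintage of the post; I’m writing the endpoint, parameters, and response fields from memory of AlchemyAPI’s docs, so treat all of them as assumptions and check their documentation before relying on this. The category-to-color mapping is purely illustrative:

```python
import json
import urllib
import urllib2

API_KEY = "your-alchemyapi-key"
# Assumed endpoint; verify against AlchemyAPI's documentation.
ENDPOINT = "http://access.alchemyapi.com/calls/text/TextGetRankedNamedEntities"

# Illustrative palette only; ThinkBack's real category colors differ.
CATEGORY_COLORS = {"Person": "#cc4444", "City": "#4477cc", "Company": "#44aa66"}

def extract_entities(text):
    params = urllib.urlencode({"apikey": API_KEY, "text": text, "outputMode": "json"})
    response = json.load(urllib2.urlopen(ENDPOINT, params))
    # Assumed response shape: {"entities": [{"type": "Person", "text": "..."}, ...]}
    return [(e["text"], e["type"]) for e in response.get("entities", [])]

for name, category in extract_entities("Had lunch with Gina Trapani in New York."):
    print("%s -> %s" % (name, CATEGORY_COLORS.get(category, "#999999")))
```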

I also tested two other free web services that offer entity extraction, text-processing.com and Zemanta. Finding and categorizing keywords from short status updates is no small feat, but AlchemyAPI does a remarkable job. (If you’d like to play around with all three, support for both Zemanta and text-processing.com is commented out in the source code and can easily be swapped in for AlchemyAPI.)

ThinkBack also uses four typefaces from Google Web Fonts. It was my first time using them, and they’re dead simple to implement. For free fonts, the quality’s surprisingly great, with several faces commissioned by Google itself. For a quick, free hack, it’s a great alternative to Typekit.

I also used a very simple PHP templating language called RainTPL, which I chose as a lightweight alternative to Smarty. In practice, I found it too simple. Its handling of complex data structures and loops required me to jump through hoops that shouldn’t be necessary. (I’ll stick with Smarty next time.)

Anyway, you can download the code here; it only requires PHP and access to a recent version of ThinkUp. Feel free to fork it and submit a pull request for anything you add!

Waxy Goes to SXSW Interactive 2011

Every year, I think it’ll be my last, and every year, I keep going. Why? Because what makes SXSW Interactive special isn’t the panels, parties, BBQ, or endless free alcohol (though those all help). It’s the unique group of creative individuals that shows up in Austin every year — a wonderful mess of people, all stuck in the same city at the same time.

That unique set of circumstances creates a serendipity machine, and the people I meet each year keep me coming back, even as the event bursts at the seams. To that end, please track me down or say hi! That’s why I’m there. And here’s where you’ll find me:

Worst Website Ever II: Too Stupid to Fail

Monday, March 14 at 11am

Hilton, Salon D

After a three year absence, I’m bringing Worst Website Ever back to SXSW with an all-star lineup of designers, developers, and entrepreneurs. Join us as these very talented people pitch their worst website/app/startup ideas to a live audience in short, five-minute rounds.

This year, the lineup’s pretty amazing:

  • Gina Trapani (Lifehacker, Expert Labs)
  • Jeffery Bennett (BetamaXmas, 2008’s runner-up with Image Search for the Blind)
  • Josh Millard (Metafilter, viral musician)
  • Jonah Peretti (Buzzfeed, Huffington Post)
  • Mike Lacher (Wonder-Tonic, Wolfenstein 1-D, Geocities-izer)
  • Ze Frank (The Show, Star.me)

And like last time, it’s judged by a real VC, Rob Hayes from First Round Capital. Winner gets funded!

ThinkUp: Austin Meetup

Saturday, March 12 at 5pm

The Ginger Man, outdoor patio

Curious about ThinkUp, or want to meet the people behind it? I’ll be joining Gina Trapani, Amy Unruh, Jed Sundwall, and other developers/users of ThinkUp at our second SXSW meetup. Come on out! We’ll be in the back patio of The Ginger Man, if weather permits.

Stalk Me

This year, I’ll be tracking interesting sessions on Lanyrd, and evening events on Plancast. And, of course, I’ll be mentioning any unusually great activity in real-time over on Twitter.

See you there!

How I Indexed The Daily

For the last three weeks, I’ve indexed The Daily. Now that my free trial’s up, I’ve had an intimate look at what they have to offer and, sad to say, I don’t plan on subscribing. As a result, I’m ending The Daily: Indexed, my unofficial table of contents for every article they published publicly.

I’m surprised and grateful that The Daily executive and legal team never tried to shut it down. On the contrary, when asked directly about it, publisher Greg Clayman said, “If people like our content enough to put it together in a blog and share it with folks, that’s great! It drives people back to us.” They seem like a nice bunch of folks, and I hope they succeed with their big publishing experiment.

But now that I’m ending it, I can finally address the most common question — how did I do it?

The Daily: Indexed is just a list of article headlines, bylines, and links to each article on The Daily’s official website. Anyone can grab the links from the Daily iPad app by clicking each article’s “Share by Email” button, but that would’ve taken me far too long. So, how to automate the process?

When you first start The Daily application, it connects to their central server to check for a new edition, and then downloads a 1.5MB JSON file with the complete metadata for that issue. It includes everything — the complete text of the issue, layout metadata, and the public URLs.

But how can you get access to that file? My first attempt was to proxy all of the iPad’s traffic through my laptop and use Wireshark to inspect it. As it turns out, The Daily encrypts all traffic between your iPad and their servers. I was able to see connections being made to various servers, but couldn’t see what was being sent.

Enter Charles, a brilliantly-designed web debugging proxy for Mac, Windows, and Linux. By default, Charles will listen to all of your HTTP network traffic and show you simple, but powerful, views of all your web requests. But it can also act as an SSL proxy, sitting in the middle of previously-secure transactions between your browser and an SSL server.

After grabbing the JSON, I was able to write a simple Python script to extract the metadata I needed and spit out the HTML for use on the Tumblr page. Here’s how to do it.

Configuring Charles

1. Download and install Charles on your desktop machine. On your iPad, navigate to http://charlesproxy.com/charles.crt to trust Charles’ SSL certificate.

2. On a Mac, open Network Utility to get your desktop’s local IP address. Start your iPad, make sure it’s on the same wireless network as your desktop, and go into Settings > Network > Wi-Fi. Select the wireless network, and tap the arrow next to it to configure advanced settings. Under “HTTP Proxy,” select “Manual.” Enter the IP address of your desktop for “Server” and “8888” for the port.

3. Now, start Charles on your desktop and, on the iPad, try loading any website. You should see assets from that website appear in Charles. If so, you’re ready to sniff The Daily’s iPad app.

Indexing the Daily

1. Start the Daily app on the iPad. Wait for it to download today’s issue. In Charles, drill down to https://app.thedaily.com/ipad/v1/issue/current, and select “JSON Text.”

2. Copy and paste the raw JSON into a text file.

3. This Python script takes the JSON file as input, and spits out a snippet of HTML suitable for blogging. I simply pasted the output from that script into Tumblr, made a thumbnail of the cover, and published.
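
My script really was that simple. I won’t reproduce The Daily’s JSON schema here, so the field names in this sketch (pages, headline, byline, url) are stand-ins for whatever their metadata actually calls them; the idea is just to walk the issue, keep the articles with public URLs, and print HTML list items:

```python
import json
import sys

# Usage: python daily_index.py issue.json > snippet.html
with open(sys.argv[1]) as f:
    issue = json.load(f)

# Assumed structure: a list of article records, each with headline/byline/url fields.
for article in issue.get("pages", []):
    url = article.get("url")
    if not url:
        continue  # skip pages with no public web version
    headline = article.get("headline", "Untitled")
    byline = article.get("byline", "")
    print('<li><a href="%s">%s</a> %s</li>' % (url, headline, byline))
```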

The End

So, that’s it! Hope that was helpful. If any fan of The Daily out there wants to take over publishing duties, I’ll happily pass the Tumblr blog on to you.

The Daily: Indexed

Anybody else think it’s weird that The Daily, News Corp’s new iPad-only magazine, posts almost every article to their official website… but with no index of the articles to be found? They spent $30M on it, but apparently forgot a homepage. (That’s a joke, people.)

So I went ahead and made one for them! Introducing The Daily: Indexed.

Why did I do this? The Daily’s publishing free, web-based versions of every article, but without an index, it’s (deliberately) hard to find or link to individual articles from the web. And since the iPad app only carries today’s edition, finding older articles you’ve paid for is nearly impossible.

I love that this kind of experimentation is happening. I care about journalism dearly and want to see new models emerge, and charging for content is a great way to align a media organization’s interests with those of its readership. That said, if you do charge for access, you can’t publish free versions on the web and hope that people won’t find them.

I’m also very curious about their reaction. This isn’t illegal or a copyright violation — all I’m doing is linking to the versions they’re publishing on their site. The ability to link to any webpage without permission is part of what makes the web great, and it should never be discouraged. It’s also worth noting that Google’s slowly indexing all the articles too, and search engines aren’t blocked in their robots.txt file.

But I’m still recovering from a legal nightmare last year (more on that soon), so if asked to stop publishing and delete the Tumblr, I will. (Lawyers: My email address is at the top of this page.)

In the meantime, enjoy!

(Special thanks to Rex Sorgatz for the inspiration.)

Update: At The Daily’s press conference, editor-in-chief Jesse Angelo addressed the question of public sharing on the web.

“For the pages in the application that we can do it, we create mirror HTML pages. Those pages are out there on the web — they can be shared, they can be searched, you can find them out there… We know there are billions of other people sharing content on the web, and we want to be part of that.”

Thanks to Ryan Tate for the video.

February 4: Some people seem to have been confused by my original post, so I edited it to explain a bit more clearly why I made this. I never thought that The Daily actually forgot to make a homepage/index; that was tongue-in-cheek. I also added a comment answering some of the frequently asked questions about the project.

Metagames: Games About Games

Over the last few years, I’ve been collecting examples of metagames — not the strategy of metagaming, but playable games about videogames. Most of these, like Desert Bus or Quest for the Crown, are one-joke games for a quick laugh. Others, like Cow Clicker and Upgrade Complete, are playable critiques of game mechanics. Some are even (gasp!) fun.

Since I couldn’t find an exhaustive list (this TV Tropes guide to “Deconstruction Games” is the closest), I thought I’d try to pull one together along with some gameplay videos.

This is just a starting point, so please post your additions in the comments or email me and I’ll add them in. Note: I’ve tried to stay away from specific game parodies (like Pong Kombat or Pyst) and stick to games that comment on game design, mechanics, or culture.

Colorblind Leading the Blind

Today, Netflix posted some interesting research, tracking the performance of their streaming service on the top ISPs in the U.S.

Sadly, the charts were completely useless to me — a pile of mostly-indistinguishable lines. Along with one out of every 14 American males (about 7%), I’m red-green colorblind.

This is hard for non-colorblind people to understand, so I pulled together a couple examples. Here’s a split comparison of the original chart, showing what people with normal vision see compared to me and my crappy eyes.

Two simple solutions:

1. Label your lines. When you have more than three data points in a line chart, legends fall apart quickly whether you’re colorblind or not. A label next to each line makes any chart much more readable. Here’s a quick remake I whipped up. (Thanks to Greg for helping me get the colors right.)

2. Pick colorblind-safe colors. If you have to use a legend, be kind and pick something people like us can see. Photoshop has supported drop-dead simple colorblind simulation since CS4, or you can check your images or webpages for free using the Vischeck colorblind simulator. (If you chart with code, a quick sketch of both fixes is below.)
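
Here’s a minimal matplotlib sketch of both ideas: direct labels instead of a legend, and a palette chosen to survive red-green colorblindness. The data is made up, and the hex values are just one commonly recommended colorblind-safe set, not the colors Greg picked:

```python
import matplotlib.pyplot as plt

# Made-up data standing in for per-ISP streaming performance.
months = range(1, 13)
isps = {
    "ISP A": [2.1, 2.2, 2.3, 2.2, 2.4, 2.5, 2.4, 2.6, 2.5, 2.7, 2.6, 2.8],
    "ISP B": [1.8, 1.9, 1.8, 2.0, 2.1, 2.0, 2.2, 2.1, 2.3, 2.2, 2.4, 2.3],
    "ISP C": [1.5, 1.4, 1.6, 1.5, 1.7, 1.6, 1.8, 1.7, 1.9, 1.8, 2.0, 1.9],
}

# A few colorblind-friendly colors (blue, orange, bluish green).
colors = ["#0072B2", "#E69F00", "#009E73"]

fig, ax = plt.subplots()
for (name, values), color in zip(isps.items(), colors):
    ax.plot(months, values, color=color)
    # Label each line directly at its right-hand end instead of using a legend.
    ax.annotate(name, xy=(months[-1], values[-1]),
                xytext=(5, 0), textcoords="offset points",
                va="center", color=color)

ax.set_xlabel("Month")
ax.set_ylabel("Average bitrate (Mbps)")
plt.show()
```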

When doing the right thing is this easy, it’s really disturbing when it’s dismissed as a waste of time.

A couple years ago, I contacted the husband-and-wife team behind Snopes, the essential resource on urban legends, to let them know about a similar issue. The red/green icons they use to indicate true/false urban legends looked absolutely identical to me. I let them know about the problem and prepared alternate GIFs for them, with a darker red and lighter green. (Incidentally, that’s why colorblind people don’t have trouble with stoplights.)

They not only refused the new images, but actually added a new entry to their FAQ, defending their position:

We chose our red-yellow-green coding system because its “traffic light” pattern can be understood by most of our readers with little or no explanation. While we understand that about 8% of our readership experiences some form of color blindness and therefore cannot distinguish the different colors of bullets, other alternatives we have tried have proved confusing to many of our non-color blind readers. Therefore, we have chosen to stick with a system that works very well for 92% of our readers.

Instead, they recommended hovering over every icon to see the tooltip text. I absolutely adore the work they do on Snopes, but that interaction’s left a sour taste in my mouth ever since. It just doesn’t seem defensible — is slightly darkening a shade of red and brightening a green too much to ask?

I wouldn’t expect anyone to be able to perfectly anticipate every person’s needs; accessibility is extremely hard to get right 100% of the time. But if your ultimate goal is conveying information, open ears and a little empathy can go a long way.

Update: Alex Bischoff took the three images I made for Snopes, and wrote a user script that replaces their images with mine. Install it here for your browser of choice.

Pirating the 2011 Oscars

The Oscar nominations were announced yesterday, which means it’s time again to see who’s winning in the eternal fight between the movie studios, the Motion Picture Academy, and the loosely-organized group of spunky kids known as The Scene.

Yesterday morning, along with an anonymous group of spectators, I updated the ever-growing spreadsheet, now spanning the last nine years of Oscar-nominated films. I added this year’s 29 nominees to the list, a collection of 274 films in total. (You can read more about sources and methodology at the end of the entry.)

Don’t miss the Statistics sheet, which covers all the aggregate year-by-year stats. Download or view it below, or read on for my findings. As always, if you have any additions or corrections, let me know.

View full-size on Google Spreadsheets.

Download: Excel (with formulas) or CSV

Findings

Note: These numbers will change as we get closer to the ceremony, and I’ll do my best to keep them updated until Oscar day.

Continuing the trend from the last couple years, fewer screeners are leaking online by nomination day than ever. Last year at this time, only 41% of screeners leaked online; this year, that number drops again slightly to 38%.

But if you include retail DVD releases along with screeners, 66% of this year’s nominated films have already leaked online in high quality. This makes sense; if a retail DVD release is already available, there’s no point in leaking the screener. But I think it’s safe to say that industry efforts to watermark screeners and prosecute leaks by members have almost certainly contributed to the decline.

The gap between theatrical and DVD release dates seems to have stabilized, hovering around 105 days for the last few years. This year, the gap from US theatrical release to first leak seems to have dipped slightly, from a median of 23 days last year to 17 days.
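
If you want to verify these numbers yourself, the calculation is just a date difference and a median over the CSV linked above. The column names and date format in this sketch are guesses, so adjust them to match the spreadsheet’s actual header row:

```python
import csv
from datetime import datetime

def parse(date_string):
    # Adjust to whatever date format the spreadsheet actually uses.
    return datetime.strptime(date_string, "%m/%d/%Y")

gaps = []
with open("oscars.csv") as f:
    for row in csv.DictReader(f):
        # Column names here are assumptions; match them to the real headers.
        released = row.get("US Release Date")
        leaked = row.get("First Leak Date")
        if released and leaked:
            gaps.append((parse(leaked) - parse(released)).days)

gaps.sort()
print("Median days from US release to first leak: %d" % gaps[len(gaps) // 2])
```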

The chart below shows how camcorder and telesync leaks for Oscar-nominated films continue to decline in popularity, while nearly every nominated film is eventually leaked on DVD. (The only exception seems to be 2008’s Il Divo, which never appeared to get a US retail release.)

One prediction: The end of the DVD screener is near. This year, Fox Searchlight distributed three screeners via iTunes — 127 Hours, Black Swan, and Conviction — to all 93,000 voting members of the Screen Actors Guild, marking the first time a major studio’s used Apple’s service for screener distribution.

Voters get the additional convenience of being able to watch films on their computers, Apple TVs, iPads and iPhones, while studios save the time and expense of distributing physical media. If this experiment’s successful, it seems likely other studios will follow.

Miscellanea

Some random notes:

  • This year, three films were leaked online within a day of their theatrical release — Iron Man 2, Alice in Wonderland, and Harry Potter.
  • The Rabbit Hole screener was leaked online eight days before its theatrical release, while Winter’s Bone was the slowest to leak online (so far) at 125 days after its theatrical release.
  • Oscar-nominated films tend to get released late in the year, but how late? More nominated films have been released on December 25 than any other day, but the median date is October 20.
  • For the first time, the first high-quality leak of a film — Harry Potter and the Deathly Hallows — was a PPV rip, most likely captured from a hotel’s pay-per-view system.
  • Retail Blu-ray rips are now frequently leaked online before retail DVDs, so I’ve modified the “Retail DVD” column to include them.

Methodology

As usual, I included the feature films in every category except documentary and foreign films (even makeup and costume design). I used Yahoo! Movies for US release dates, always using the first available date, even if it was a limited release. Cam, telesync, R5, and screener leak dates were taken from VCD Quality, with occasional backup from ORLYDB. I always used the first leak date, with the exception of unviewable or incomplete nuked releases.

Finally, the official screener dates came from Academy member Ken Rudolph, who lists the date he receives every screener on his personal homepage. Thanks again, Ken!

For previous years, see 2004, 2005, 2007, 2008 (part 1 and part 2), 2009, and 2010.

Wikileaks Cablegate Reactions Roundup

I’ve been dealing with a family illness, but couldn’t let the Wikileaks Cablegate incident pass without comment. In between hospital visits, I’ve been jotting down links related to the historic leak.

It’s a stunning experiment in forced transparency, prying open government against its will without much care or concern about the ramifications. Wikileaks is the Pirate Bay of journalism — an unstoppable force disrupting whole industries because they can.

To help make sense of my own opinions about it, I rounded up some of the more interesting responses and visualizations. Enjoy.

Joining Expert Labs

Big news! I’m very happy to announce that I’ve joined Expert Labs as a Project Director, working alongside the wonderful and talented Anil Dash and Gina Trapani. (Read the official announcement.)

Our goal’s to help government make better decisions about policy by listening to citizens in the places they already are: social networks like Twitter and Facebook.

Our first project is ThinkUp, an open-source tool for archiving and visualizing conversations on social networks. It started with Gina scratching a personal itch, a way to parse and filter @replies. But it’s grown to be something more: a tool for policy makers to harness the collective intelligence of experts.

There’s tons to do, but I’m particularly excited to tackle ThinkUp’s ability to separate signal from noise, making it easier to derive meaning from hundreds or thousands of responses, using visualization, clustering, sentiment analysis, and robotic hamsters. I’m planning on building some fun hacks on top of ThinkUp, as well as keeping an eye open for other vectors to tackle our core mission.

Officially, I started on Monday, and it’s already been an incredible week. I flew to Washington, DC, attended the FCC’s first Open Developer Day, and spent a day in meetings with various groups at the White House.

What I found was inspiring: a group of extremely clever and passionate geeks, working from within to make things better. Some agencies are definitely more clueful than others, but it was clear that they want our — and your — help. I was skeptical at first, but they’re sincere: they want meaningful public participation and they need smart people to make it happen.

Want to join in? The easiest thing to do would be to install ThinkUp on your server. Give it a try, see what you think and, if you can, contribute — code, design, and documentation are all welcome.

If you’ve read Waxy for a while, you’ll know I very rarely touch on political issues here. It’s not that I’m apolitical — like anyone, I have opinions, but I don’t often feel engaged enough to write about them.

So, why would I go to a Gov 2.0 non-profit? For three main reasons:

  1. It’s important. To tackle our most serious national issues, we need better communication between government and citizens. I want my son to grow up in a world where he doesn’t feel disconnected and disillusioned by government, and I want government to meet the needs of the people, rather than favoring those with the most money or the loudest voices.
  2. It’s exciting. Technology is quite possibly our best hope of breaking down that divide, using social tools to disrupt the way that governments are run and policy is made. I love designing and building tools that use social connections to tackle difficult problems, and it feels like government is an area ripe for disruption.
  3. I love the team. I’ve known Anil and Gina for years and have long admired their work. They’re both extraordinarily talented and creative people, and I feel lucky to call them both friends.

How can I pass that up?

And what about Kickstarter? I recently stepped back into my original advisory role, and will continue to help out the team however I can — dispensing unsolicited advice, recruiting new projects, writing the occasional article, and evangelizing for them around the world, like I did at Free Culture Forum in Barcelona two weeks ago. Kickstarter’s leading an indie-culture revolution, thanks to amazing leadership and a brilliantly creative team, and it was a pleasure working with them.

This isn’t a change in direction for me, but a change in focus. Both Kickstarter and Expert Labs are bringing smart people together — people who might never connect otherwise — to create things, to change things, to make the world a better place. I can’t wait.

Proof!