Making Supercut.org

Posted May 16, 2011 by Andy Baio

This weekend, I was invited to participate in Rhizome’s Seven on Seven in NYC — an event that pairs seven artists with seven technologists, challenging them to create something in one day and present it to an audience the next day.

The other teams were a humbling roster of creative geeks, including Ricardo “mrdoob” Cabello, Ben Cerveny, Jeri Ellsworth, Zach Lieberman, Kellan Elliott-McCrea, Chris “moot” Poole, Bre Pettis, and Erica Sadun.

I was paired with Michael Bell-Smith, whose digital art I’d admired and linked to in the past. It was a perfect match, and we’re very happy to announce the result of our collaboration: Supercut.org (warning: NSFW audio).

Supercut.org is an automatic supercut composed entirely out of other supercuts, combined with a way to randomly shuffle through all of the supercut sources.

The Idea

When we started work on Friday morning, Michael and I brainstormed what we wanted to accomplish: something visual, high-concept (i.e. explainable in a tweet), and hopefully with a sense of humor.

We quickly realized that our interest in supercuts was fertile ground. Michael’s work often touches on structural re-edits and remixes, such as Oonce-Oonce, Battleship Potemkin: Dance Edit, Chapters 1-12 of R. Kelly’s Trapped in the Closet Synced and Played Simultaneously, and his mashup album mixing pop vocals over their ringtone versions.

Both of us were fascinated by this form of Internet folk art. Every supercut is a labor of love. Making one is incredibly time-consuming, taking days or weeks to compile and edit a single video. Most are created by pop culture fans, but they’ve also been used for film criticism and political commentary. It’s a natural byproduct of remix culture: people using sampling to convey a single message, made possible by the ready availability of online video and cheap editing software.

So, supercuts. But what? Making a single supercut seemed cheap. I first suggested making a visual index of supercuts, or a visualization of every clip.

But Michael had a better idea — going meta. We were going to build a SUPERSUPERCUT, a supercut composed entirely out of other supercuts. And, if we had time, we’d make a dedicated supercut index.

Making the SuperSupercut

There were four big parts: index every supercut in a database, download all the supercuts locally, break each video into its original shots, and stitch the clips back together randomly.

Using my blog post as a guide, Michael added every supercut to a new table in MySQL, along with a title, description, and a category. While Michael did that, I wrote a simple script that used youtube-dl to pull the video files from YouTube and store them locally.
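For the curious, a download step like this can be sketched in Python. It's a rough illustration rather than the script used that day: the function names, the `supercuts` directory, and the filename template are all assumptions, and it expects youtube-dl to be installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_download_command(url, dest_dir="supercuts"):
    """Return the youtube-dl argument list for one video URL."""
    # -o sets the output filename template; youtube-dl fills in
    # %(id)s and %(ext)s with the video ID and file extension.
    template = str(Path(dest_dir) / "%(id)s.%(ext)s")
    return ["youtube-dl", "-o", template, url]


def download_all(urls, dest_dir="supercuts"):
    """Download each URL in turn and return the ones that failed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        result = subprocess.run(build_download_command(url, dest_dir))
        if result.returncode != 0:
            # Keep going; removed or private videos shouldn't stop the batch.
            failed.append(url)
    return failed
```

Collecting failures instead of stopping at the first one matters for a list like this, since some of the indexed videos will have been taken down.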

To split the clips, we needed to find a way to do scene detection — identifying cuts in a video by looking at movement between two consecutive frames. But we needed a way to do it from my Linux server, which ruled out everything with a GUI.

After some research, I found a very promising lead in an obscure, poorly-maintained French Linux utility called shotdetect. It did exactly what we needed — analyze a video and return an XML file of scene start times and durations.

The most challenging part of the entire day, by far, was simply getting shotdetect to compile. The package only had binaries for Debian and the source hadn’t been updated since September 2009. Since then, the video libraries it needed had changed dramatically, and shotdetect wouldn’t compile without locating them.

Three frustrating hours later, we called in the help of one of the beardiest geeks I know, Daniel Ceregatti, an old friend and coworker. After 20 minutes of hacking the C++ source, we were up and running.

With the timecodes and durations from shotdetect, we used ffmpeg to split each supercut into hundreds of smaller MPEG-2 videos, all normalized to the same 640×480 dimensions with AAC audio. The results weren’t perfect (many scenes were broken during dialogue because of camera changes), but they were good enough.
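The splitting step can be sketched as a function that turns shot boundaries into ffmpeg commands. This is an illustration under several assumptions: the shots are already parsed into (start, duration) pairs in seconds, the `.ts` container and the output naming scheme are my own choices, and the exact flags used that day may have differed.

```python
from pathlib import Path


def build_split_commands(source, shots, out_dir="clips"):
    """Build one ffmpeg command per shot.

    `shots` is a list of (start_seconds, duration_seconds) pairs,
    assumed to be parsed from the scene detector's output already.
    """
    stem = Path(source).stem
    commands = []
    for index, (start, duration) in enumerate(shots):
        # Hypothetical naming scheme: <source name>_<shot number>.ts
        output = str(Path(out_dir) / f"{stem}_{index:04d}.ts")
        commands.append([
            "ffmpeg",
            "-ss", f"{start:.3f}",    # seek to the start of the shot
            "-i", source,
            "-t", f"{duration:.3f}",  # keep only this shot's duration
            "-s", "640x480",          # normalize the frame size
            "-vcodec", "mpeg2video",
            "-acodec", "aac",
            output,
        ])
    return commands
```

Each command can then be handed to `subprocess.run`. Building the list first makes it easy to inspect a few commands before committing to hours of encoding.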

As ffmpeg worked, I stored info about each newly-generated clip in MySQL. From there, it was simple to generate a random ten-minute playlist of clips between a half-second and three seconds in length.
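The playlist logic is easy to illustrate. The sketch below is a pure-Python stand-in for whatever query actually ran against MySQL: it assumes the clips are available as (path, duration) pairs, and the function and parameter names are hypothetical.

```python
import random


def build_playlist(clips, target_seconds=600.0, min_len=0.5, max_len=3.0, rng=None):
    """Pick random clips until the playlist reaches the target length.

    `clips` is a list of (path, duration_seconds) pairs. Returns the
    chosen paths in play order and the total running time.
    """
    if rng is None:
        rng = random.Random()
    # Keep only clips within the allowed length range.
    eligible = [clip for clip in clips if min_len <= clip[1] <= max_len]
    rng.shuffle(eligible)
    playlist, total = [], 0.0
    for path, duration in eligible:
        if total >= target_seconds:
            break
        playlist.append(path)
        total += duration
    return playlist, total
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing test edits. If there aren't enough eligible clips, the playlist simply comes out shorter than the target.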

With that list, we used the Unix `cat` utility to concatenate all the videos together into a finished supersupercut. We tweaked the results after some early tests, which you can see on YouTube.
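Plain byte-level concatenation works here because MPEG streams tolerate being joined end to end, which is not true of most other video containers. A Python equivalent of the `cat` step might look like this (the function name is hypothetical):

```python
import shutil


def concatenate(paths, output_path):
    """Append each file's bytes to output_path, in order, like `cat`."""
    with open(output_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as clip:
                # Stream the file across in chunks rather than
                # loading whole clips into memory.
                shutil.copyfileobj(clip, out)
```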

While the videos processed overnight, we registered the domain, built the rest of the website, and designed our slides for Saturday’s event, taking time out for wonderful Korean food with the fellow team of Ben Cerveny and Liz Magic Laser. I finally got to sleep at 5:30am, but I’m thrilled with the results.

The Future

There were several things we talked about, but simply didn’t have time to do.

I’m planning on using the launch of Supercut.org to finally retire my old supercut list by adding a way to browse and sort the entire index of supercuts by date, source, and genre. Most importantly, I’m going to add a way for anyone to submit their own supercuts to the index.

And of course, when any supercut is added, it will automatically become part of the randomized supersupercut on the homepage: an evolving tribute to this unique art form.

ThinkBack, Playing with ThinkUp's New API

Posted May 3, 2011 by Andy Baio

The newest beta of ThinkUp adds an API to the app for the first time, allowing developers to easily build apps on top of data coming from ThinkUp.

The JSON API was created by Sam Rose, a 20-year-old student from Wales and an active contributor in the ThinkUp community. His 7,000 line contribution — composed of 40% new tests and 40% documentation — earned him first place in the ThinkUp bounty contest and a brand new iPad 2. Congrats, Sam!

I thought it’d be fun to try building a hack with his new API, so I made a simple visualization of your entire Twitter archive in ThinkUp: ThinkBack, a ThinkUp-powered time capsule. Take a look at my history, or at the @whitehouse account, to get the gist.

ThinkBack analyzes your entire Twitter history, extracts entities from the text, and colors them based on category. Grouped by month, it also gives you a quick glimpse at your posting activity over time.

For the entity extraction, I used an excellent free web service called AlchemyAPI to extract people, places, events, product names and other keywords from everything I’ve ever posted. They provide a category for each, which I assigned a color.
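The grouping and coloring can be sketched in a few lines. ThinkBack itself is written in PHP; this Python version only illustrates the idea, and it assumes the entities have already been extracted into (text, category) pairs. The category names, colors, and function name are made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical category-to-color mapping. The real categories come from
# the entity extraction service; these colors are arbitrary.
CATEGORY_COLORS = {
    "Person": "#d9534f",
    "City": "#5bc0de",
    "Company": "#5cb85c",
}
DEFAULT_COLOR = "#999999"


def group_by_month(posts):
    """Group posts by month, counting posts and coloring their entities.

    Each post is a dict with a `date` (datetime) and `entities`,
    a list of (text, category) pairs.
    """
    months = defaultdict(lambda: {"post_count": 0, "entities": []})
    for post in posts:
        key = post["date"].strftime("%Y-%m")
        months[key]["post_count"] += 1
        for text, category in post["entities"]:
            color = CATEGORY_COLORS.get(category, DEFAULT_COLOR)
            months[key]["entities"].append((text, color))
    # Sort by month so the timeline reads chronologically.
    return dict(sorted(months.items()))
```

The per-month post count is what gives the quick glimpse of activity over time, and the (text, color) pairs are what a template would render.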

I also tested two other free web services that offer entity extraction, text-processing.com and Zemanta. Finding and categorizing keywords from short status updates is no small feat, but AlchemyAPI does a remarkable job. (If you’d like to play around with all three, support for both Zemanta and text-processing.com is commented out in the source code, but easily swappable with AlchemyAPI.)

ThinkBack also uses four typefaces from Google Web Fonts, my first time using them and dead simple to implement. For free fonts, the quality’s surprisingly great, with several faces commissioned by Google itself. For a quick, free hack, it’s a great alternative to Typekit.

I also used a very simple PHP templating language called RainTPL, which I chose as a lightweight alternative to Smarty. In practice, I found it too simple. Its handling of complex data structures and loops required me to jump through hoops that shouldn’t be necessary. (I’ll stick with Smarty next time.)

Anyway, you can download the code here. It only requires PHP and access to a recent version of ThinkUp. Feel free to fork it and submit a pull request for anything you add!

Waxy Goes to SXSW Interactive 2011

Posted March 7, 2011 by Andy Baio

Every year, I think it’ll be my last, and every year, I keep going. Why? Because what makes SXSW Interactive special isn’t the panels, parties, BBQ, or endless free alcohol (though those all help). It’s the unique group of creative individuals that shows up in Austin every year — a wonderful mess of people, all stuck in the same city at the same time.

That unique set of circumstances creates a serendipity machine, and the people I meet each year keep me coming back, even as it bursts at the seams. To that end, please track me down or say hi! That’s why I’m there. And here’s where you’ll find me:

Worst Website Ever II: Too Stupid to Fail

Monday, March 14 at 11am

Hilton, Salon D

After a three-year absence, I’m bringing Worst Website Ever back to SXSW with an all-star lineup of designers, developers, and entrepreneurs. Join us as these very talented people pitch their worst website/app/startup ideas to a live audience in short, five-minute rounds.

This year, the lineup’s pretty amazing:

Gina Trapani (Lifehacker, Expert Labs)

Jeffery Bennett (BetamaXmas, 2008’s runner-up with Image Search for the Blind)

Josh Millard (Metafilter, viral musician)

Jonah Peretti (Buzzfeed, Huffington Post)

Mike Lacher (Wonder-Tonic, Wolfenstein 1-D, Geocities-izer)

Ze Frank (The Show, Star.me)

And like last time, it’s judged by a real VC, Rob Hayes from First Round Capital. Winner gets funded!

ThinkUp: Austin Meetup

Saturday, March 12 at 5pm

The Ginger Man, outdoor patio

Curious about ThinkUp, or want to meet the people behind it? I’ll be joining Gina Trapani, Amy Unruh, Jed Sundwall, and other developers/users of ThinkUp at our second SXSW meetup. Come on out! We’ll be in the back patio of The Ginger Man, if weather permits.

Stalk Me

This year, I’ll be tracking interesting sessions on Lanyrd, and evening events on Plancast. And, of course, I’ll be mentioning any unusually great activity in real-time over on Twitter.

See you there!

How I Indexed The Daily

Posted February 23, 2011 by Andy Baio

For the last three weeks, I’ve indexed The Daily. Now that my free trial’s up, I’ve had an intimate look at what they have to offer and, sad to say, I don’t plan on subscribing. As a result, I’m ending The Daily: Indexed, my unofficial table of contents for every article they published publicly.

I’m surprised and grateful that The Daily’s executive and legal teams never tried to shut it down. On the contrary, when asked directly about it, publisher Greg Clayman said, “If people like our content enough to put it together in a blog and share it with folks, that’s great! It drives people back to us.” They seem like a nice bunch of folks, and I hope they succeed with their big publishing experiment.

But now that I’m ending it, I can finally address the most common question — how did I do it?

The Daily: Indexed is just a list of article headlines, bylines, and links to each article on The Daily’s official website. Anyone can grab the links from the Daily iPad app by clicking each article’s “Share by Email” button, but that would’ve taken me far too long. So, how to automate the process?

When you first start The Daily application, it connects to their central server to check for a new edition, and then downloads a 1.5MB JSON file with the complete metadata for that issue. It includes everything — the complete text of the issue, layout metadata, and the public URLs.

But how can you get access to that file? My first attempt was to proxy all of the iPad’s traffic through my laptop and use Wireshark to inspect it. As it turns out, The Daily encrypts all traffic between your iPad and their servers. I was able to see connections being made to various servers, but couldn’t see what was being sent.

Enter Charles, a brilliantly-designed web debugging proxy for Mac, Windows, and Linux. By default, Charles will listen to all of your HTTP network traffic and show you simple, but powerful, views of all your web requests. But it can also act as an SSL proxy, sitting in the middle of previously-secure transactions between your browser and an SSL server.

After grabbing the JSON, I was able to write a simple Python script to extract the metadata I needed and spit out the HTML for use on the Tumblr page. Here’s how to do it.

Configuring Charles

1. Download and install Charles on your desktop machine. On your iPad, navigate to http://charlesproxy.com/charles.crt to trust Charles’ SSL certificate.

2. For Mac users, start Network Utility to get your desktop’s local IP address. Start your iPad, make sure it’s on the same wireless network as your desktop, and go into Settings>Network>Wi-Fi. Select the wireless network, and click the right arrow next to it to configure advanced settings. Under “HTTP Proxy,” select “Manual.” Enter the IP address of your desktop for “Server” and enter in “8888” for the port.

3. Now, start Charles on your desktop and, on the iPad, try loading any website. You should see assets from that website appear in Charles. If so, you’re ready to sniff The Daily’s iPad app.

Indexing the Daily

1. Start the Daily app on the iPad. Wait for it to download today’s issue. In Charles, drill down to https://app.thedaily.com/ipad/v1/issue/current, and select “JSON Text.”

2. Copy and paste the raw JSON into a text file.

3. This Python script takes the JSON file as input, and spits out a snippet of HTML suitable for blogging. I simply pasted the output from that script into Tumblr, made a thumbnail of the cover, and published.
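The linked script isn’t reproduced here, but the general shape of that step is simple enough to sketch. The field names below (`articles`, `headline`, `byline`, `url`) are hypothetical placeholders, not The Daily’s actual JSON schema, so treat this as an outline to adapt rather than a working parser for their file.

```python
import html
import json


def issue_to_html(issue):
    """Turn a parsed issue dict into an HTML list of article links.

    Assumes a hypothetical structure: issue["articles"] is a list of
    dicts with "headline", "url", and an optional "byline".
    """
    lines = ["<ul>"]
    for article in issue.get("articles", []):
        # Escape everything that ends up in the markup.
        url = html.escape(article["url"], quote=True)
        headline = html.escape(article["headline"])
        byline = html.escape(article.get("byline", ""))
        item = f'<li><a href="{url}">{headline}</a>'
        if byline:
            item += f" ({byline})"
        lines.append(item + "</li>")
    lines.append("</ul>")
    return "\n".join(lines)


def convert_file(json_path):
    """Read the saved JSON text file and return the HTML snippet."""
    with open(json_path, encoding="utf-8") as handle:
        return issue_to_html(json.load(handle))
```

Calling `print(convert_file("issue.json"))` on the file saved in step 2 would produce a snippet ready to paste into a blog post, once the field names match the real data.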

The End

So, that’s it! Hope that was helpful. If any fan of The Daily out there wants to take over publishing duties, I’ll happily pass the Tumblr blog on to you.

The Daily: Indexed

Posted February 3, 2011 by Andy Baio

Anybody else think it’s weird that The Daily, News Corp’s new iPad-only magazine, posts almost every article to their official website… but with no index of the articles to be found? They spent $30M on it, but apparently forgot a homepage. (That’s a joke, people.)

So I went ahead and made one for them! Introducing, The Daily: Indexed…

Why did I do this? The Daily’s publishing free, web-based versions of every article, but without an index, it’s (deliberately) hard to find or link to the individual articles from the web. And since the iPad app only carries today’s edition, it makes finding any historical articles you’ve paid for nearly impossible.

I love that this kind of experimentation is happening in journalism. I love journalism dearly and want to see new models emerge, and charging for content is a great way to align a media organization’s interests with those of its readership. That said, if you do charge for access, you can’t publish free versions to the web and hope that people don’t find them.

I’m also very curious about their reaction. This isn’t illegal or a copyright violation — all I’m doing is linking to the versions they’re publishing on their site. The ability to link to any webpage without permission is part of what makes the web great, and it should never be discouraged. It’s also worth noting that Google’s slowly indexing all the articles too, and search engines aren’t blocked in their robots.txt file.

But I’m still recovering from a legal nightmare last year (more on that soon), so if asked to stop publishing and delete the Tumblr, I will. (Lawyers: My email address is at the top of this page.)

In the meantime, enjoy!

(Special thanks to Rex Sorgatz for the inspiration.)

Update: At The Daily’s press conference, editor-in-chief Jesse Angelo addressed the question of public sharing on the web.

“For the pages in the application that we can do it, we create mirror HTML pages. Those pages are out there on the web — they can be shared, they can be searched, you can find them out there… We know there are billions of other people sharing content on the web, and we want to be part of that.”

Thanks to Ryan Tate for the video.

February 4: Some people seem to have been confused by my original post, so I edited it to explain more clearly why I made this. I never thought that The Daily actually forgot to make a homepage/index; that was tongue-in-cheek. I also added a comment answering some of the frequently asked questions about the project.