Forgotify — listen to the 4 million songs on Spotify that haven’t been played once
Flagsmith — creative use of OpenType for flag creation, from the creators of Chartwell

Dirty, Fast, and Free Audio Transcription with YouTube

Five years ago, I wrote about how I transcribe audio with Amazon’s Mechanical Turk, splitting interviews into small segments and distributing the work among dozens of anonymous people. It ended up as one of my most popular posts ever, continuing to draw traffic and comments every day.

Lately, I’ve been toying with a free, fast way to generate machine transcriptions: repurposing YouTube’s automatic captions feature.

How It Works

Every time you upload a video, YouTube tries to generate a caption file. If there’s audible speech, you can grab a subtitle file within a few minutes of uploading the video.

But how’s the quality? Pretty mediocre! It’s about as good as you’d expect from a free machine-generated transcript. The caption files have no punctuation between sentences, speakers aren’t broken out separately, and errors are very common.

But if you’re transcribing interviews, it’s often easier to edit a flawed transcript than to start from scratch. And YouTube provides a solid interface for editing your transcript alongside the audio and getting the results in plaintext.
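If you'd rather clean up a downloaded caption file yourself, here's a rough sketch of how that could look in Python. It assumes the captions were saved in SRT format (numbered cues, each with a timing line), which may not match the file you actually get from YouTube. The function name `srt_to_text` and the sample captions are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timing lines, returning the caption text as one line.

    Hypothetical helper: assumes SRT-style input with blank lines between cues.
    """
    pieces = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the numeric cue index, if the cue has one.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        pieces.extend(lines)
    return " ".join(pieces)


# Invented sample captions, not real YouTube output.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
hello and welcome to the show

2
00:00:02,500 --> 00:00:05,000
today we are talking about captions
"""

if __name__ == "__main__":
    print(srt_to_text(SAMPLE))
```

The result is a single unpunctuated run of text, so you'll still need to add sentence breaks and speaker names by hand.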

I used TunesToTube, a free service for uploading MP3s to YouTube, to upload the first 15 minutes of our New Disruptors interview, with permission from Glenn Fleishman.

It took about 30 seconds for TunesToTube to generate the 15-minute-long video, three seconds to upload it, and about a minute for the video to be viewable on my account.

It takes a bit more time for YouTube to generate the audio transcriptions. Testing in the middle of a weekday, it took about six minutes to transcribe a two-minute video, and around 30 minutes for the 15-minute video. Fortunately, there’s nothing you need to do while it processes. Just upload and wait.

I ran a number of familiar film monologues through YouTube’s transcription engine, and the results vary from solid to laughably bad. I’ve posted the videos below with the automatic transcription and their actual text.

As you’d expect, it works best with clear enunciation and spoken word. Soft words over background music, like in the Breakfast Club clip, fall apart pretty quickly. But some, like Independence Day, aren’t terrible.


Downworthy — a browser plugin to take hyperbolic headlines down a notch or two
Squarespace Logo — play with simple logo ideas using Noun Project icons and free typefaces
Losing Aaron — Aaron Swartz’s father on his prosecution and tragic death, one year later (via)
Classroom Aquatic — finally, a free dolphin school test cheating simulator for the Oculus Rift

Ellen DeGeneres' "Walter Mitty" Screener Leaks Online

It’s Oscar piracy season, that time of the year when newly released critical darlings leak online as DVD and Blu-ray screeners are sent out around the world to Academy voters and secretly loaned to their friends and relatives.

Yesterday was a busy day with screener copies of Frozen, Her, and The Wolf of Wall Street all appearing online.

I got a tip that there was a very unusual watermark in the screener of The Secret Life of Walter Mitty that leaked online today. I dug it up, and sure enough, a very familiar name pops up in the first scene of the screener. I made a GIF of it:

Oh, Ellen! If the watermark’s accurate, this screener belonged to Ellen DeGeneres. But was it actually an Oscar screener? Probably not.

The watermark shows that the screener was created on November 26, 2013. According to Ken Rudolph’s Academy screener list, he received the Walter Mitty DVD screener via UPS on December 19.

That’s a pretty huge gap, indicating that Ellen’s screener wasn’t for Oscar consideration, but was instead given to her for review in advance of Ben Stiller’s December 4 appearance on her show.

Of course, there’s a chance, albeit small, that this watermark was added by someone besides 20th Century Fox — by someone trying to hide the identity of the actual source, maybe.

More likely, the watermark is accurate and Ellen’s screener simply ended up in the wrong hands. A postal worker, one of her employees, a friend, a family member, or any of countless others in the production and distribution chain could be responsible for ripping the DVD and putting it online.

It’s very common for screeners to leak, but rare for a celebrity’s name to be identified as the source. In 2011, a screener copy of Super 8 leaked online with Howard Stern’s name clearly watermarked on it. On air, Stern vehemently denied leaking the film.

Curious to see if Ellen responds the same way.

As usual, I’ll update my spreadsheet of Oscar screener piracy statistics as soon as the nominees are announced on the morning of January 16.

OpenEmu — emulator framework for OS X
Movie Code — on-screen source code in movies, and what it actually does