Five years ago, I wrote about how I transcribe audio with Amazon's Mechanical Turk, splitting interviews into small segments and distributing the work among dozens of anonymous people. It ended up as one of my most popular posts ever, continuing to draw traffic and comments every day.
Lately, I've been toying with a free, fast way to generate machine transcriptions: repurposing YouTube's automatic captions feature.
How It Works
Every time you upload a video, YouTube tries to generate a caption file. If there's audible speech, you can grab a subtitle file within a few minutes of uploading the video.
But how's the quality? Pretty mediocre! It's about as good as you'd expect from a free machine-generated transcript. The caption files have no punctuation between sentences, speakers aren't broken out separately, and errors are very common.
But if you're transcribing interviews, it's often easier to edit a flawed transcript than to start from scratch. And YouTube provides a solid interface for editing your transcript against the audio and getting the results in plaintext.
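If you'd rather clean up a downloaded caption file yourself, here's a rough Python sketch of one way to flatten it into plain text. It assumes the file is in SubRip (.srt) format, which may not match what you actually download, and `srt_to_text` is just a name I made up for illustration. It strips the cue numbers and timestamps and joins what's left into a single run of text, ready for you to punctuate and split by speaker.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
# Assumption: captions are in SubRip (.srt) format.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timestamps from SRT captions; return plain text."""
    pieces = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the leading cue number, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line, if present.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        pieces.extend(lines)
    return " ".join(pieces)


# Made-up sample captions, only to show the input shape.
sample = """1
00:00:00,000 --> 00:00:02,500
good morning

2
00:00:02,500 --> 00:00:05,000
in less than an hour
aircraft from here
"""

print(srt_to_text(sample))
```

The output is one unpunctuated line, which is about what you'd expect given that the captions have no sentence breaks to begin with.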
It took about 30 seconds for TunesToTube to generate the 15-minute-long video, three seconds to upload it, and about a minute for the video to be viewable on my account.
It takes a bit more time for YouTube to generate the audio transcriptions. Testing in the middle of a weekday, it took about six minutes to transcribe a two-minute video, and around 30 minutes for the 15-minute video. Fortunately, there's nothing you need to do while it processes. Just upload and wait.
I ran a number of familiar film monologues through YouTube's transcription engine, and the results vary from solid to laughably bad. I've posted the videos below with the automatic transcription and their actual text.
As you'd expect, it works best with clear enunciation and spoken word. Soft words over background music, like in the Breakfast Club clip, fall apart pretty quickly. But some, like Independence Day, aren't terrible.