January 31, 2014
Forgotify — listen to the 4 million songs on Spotify that haven’t been played once
Adrian Holovaty on Chicago and bootstrapping — this could easily apply to Portland, or countless other nascent tech scenes
hey are you cool — one player’s encounters in DayZ
Alexis Madrigal tracks down the teen creators of @HistoryInPics — the only thing saving them from lawsuits, for now, is that it’s non-commercial and poorly-indexed
PBS Idea Channel on the experience of being trolled — Mike Rugnetta’s epiphany makes for a very interesting episode (via)
Five years ago, I wrote about how I transcribe audio with Amazon’s Mechanical Turk, splitting interviews into small segments and distributing the work among dozens of anonymous people. It ended up as one of my most popular posts ever, continuing to draw traffic and comments every day.
Lately, I’ve been toying with a free, fast way to generate machine transcriptions: repurposing YouTube’s automatic captions feature.
How It Works
Every time you upload a video, YouTube tries to generate a caption file. If there’s audible text, you can grab a subtitle file within a few minutes of uploading the video.
But how’s the quality? Pretty mediocre! It’s about as good as you’d expect from a free machine-generated transcript. The caption files have no punctuation between sentences, speakers aren’t broken out separately, and errors are very common.
But if you’re transcribing interviews, it’s often easier to edit a flawed transcript than to start from scratch. And YouTube provides a solid interface for editing the transcript alongside the audio and getting the results in plaintext.
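If you'd rather clean up the downloaded subtitle file yourself, here is a minimal sketch in Python. It assumes the caption file is in SubRip (.srt) format, which is my assumption and not something the workflow above requires; the function name srt_to_text and the sample captions are hypothetical, made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timing lines from SRT captions and join the text.

    Hypothetical helper: assumes SubRip-style input, where cues are
    separated by blank lines.
    """
    pieces = []
    normalized = srt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the numeric cue index, if the block starts with one.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line that follows the index.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        pieces.extend(lines)
    return " ".join(pieces)


if __name__ == "__main__":
    # Made-up sample captions, not real YouTube output.
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nhello and welcome\nto the show\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\ntoday we are talking\n"
    )
    print(srt_to_text(sample))
```

The result is one long run of unpunctuated text, so you would still need to add sentence breaks and speaker names by hand.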
It took about 30 seconds for TunesToTube to generate the 15-minute-long video, three seconds to upload it, and about a minute for the video to be viewable on my account.
It takes a bit more time for YouTube to generate the audio transcriptions. Testing in the middle of a weekday, it took about six minutes to transcribe a two-minute video, and around 30 minutes for the 15-minute video. Fortunately, there’s nothing you need to do while it processes. Just upload and wait.
I ran a number of familiar film monologues through YouTube’s transcription engine, and the results vary from solid to laughably bad. I’ve posted the videos below with the automatic transcription and their actual text.
As you’d expect, it works best with clear enunciation and spoken word. Soft words over background music, like in the Breakfast Club clip, fall apart pretty quickly. But some, like Independence Day, aren’t terrible.
Polygon on the rise of "Early Access" in gaming — a trend that’s leaking outside of games
ConferenceCall.biz — can someone recap what’s been covered so far, thanks
Downworthy — a browser plugin to take hyperbolic headlines down a notch or two
Tracking 20 years of computer history using Law & Order — a preview on Jeffrey Thompson’s blog, including every URL ever mentioned on the show
Squarespace Logo — play with simple logo ideas using Noun Project icons and free typefaces
Real-time face substitution in the browser — be Bieber, Kim, or Cage
Ask Metafilter decrypts a 20-year family mystery — pop in if you can help decode the rest
Mathematician hacks OKCupid to find the perfect mate — scraping survey data with multiple accounts to cluster results himself
Behind the GIFs — comic backstories for animated GIFs
The Verge on angry smartphone fanboys — “But it isn’t necessarily about loving the phone… It’s about what the phone represents.”
Frere-Jones sues Hoefler over ownership — the respected type foundry breaks up
New Republic on the evolving usage of the period — ending sentences with a period can feel abrupt and harsh in text messages
Everpix releases internal metrics, financials, VC feedback — after its recent closure, the ultimate postmortem; fun to see the rejections
EMOJI IRL.LOL. — photo recreations of emoji
Classroom Aquatic — finally, a free dolphin school test cheating simulator for the Oculus Rift
The Museum of Simulation Technology — clever game mechanic that plays with forced perspective
It’s Oscar piracy season, that time of year when DVD and Blu-Ray screeners of newly released critical darlings are sent out around the world to Academy voters, secretly loaned to their friends and relatives, and leaked online.
I got a tip about a very unusual watermark in the screener of The Secret Life of Walter Mitty that leaked online today. I dug it up, and sure enough, a very familiar name pops up in the first scene of the screener. I made a GIF of it:
Oh, Ellen! If the watermark’s accurate, this screener belonged to Ellen DeGeneres. But was it actually an Oscar screener? Probably not.
The watermark shows that the screener was created on November 26, 2013. According to Ken Rudolph’s Academy screener list, he received the Walter Mitty DVD screener via UPS on December 19.
That’s a pretty huge gap, indicating that Ellen’s screener wasn’t for Oscar consideration, but instead given to her for review in advance of Ben Stiller’s December 4 appearance on her show.
Of course, there’s a chance, albeit small, that this watermark was added by someone besides 20th Century Fox — by someone trying to hide the identity of the actual source, maybe.
More likely, the watermark is accurate and Ellen’s screener simply ended up in the wrong hands. A postal worker, an employee, a friend, a family member, or countless others in the production and distribution chain could be responsible for ripping the DVD and putting it online.
It’s very common for screeners to leak, but rare for a celebrity’s name to be identified as the source. In 2011, a screener copy of Super 8 leaked online with Howard Stern’s name clearly watermarked on it. On air, Stern vehemently denied leaking the film.
Curious to see if Ellen responds the same way.
As usual, I’ll update my spreadsheet of Oscar screener piracy statistics as soon as the nominees are announced on the morning of January 16.
The Year in Kickstarter 2013 — Oscars, helicopters, space, VR, and nearly 20,000 more funded projects in the world
Steven Levy's How the NSA Almost Killed the Internet — the story of the Snowden leaks from inside tech giants
OpenEmu — emulator framework for OS X
Content-Aware Typography — using Photoshop for algorithmic art
Movie Code — on-screen source code in movies, and what it actually does
The NYT editorial board argues clemency for Edward Snowden — “He may have committed a crime to do so, but he has done his country a great service.”
The Atlantic reverse-engineers Netflix's subgenre categories — awesome data journalism by Alexis Madrigal with a genre generator by Ian Bogost