Waxy.org
Waxy.org is the sandbox of Andy Baio, a journalist/programmer living in Portland, Oregon. I'm the CTO of Kickstarter, created Upcoming.org, and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM
« October 2008 | November 2008 Archives | January 2009 »

The Faces of Mechanical Turk

Posted Nov 20, 2008

When you experiment with Amazon's Mechanical Turk, it feels like magic. You toss 500 questions into the ether, and the answers instantly start rolling in from anonymous workers around the world. It's great for getting work done, but who are these people? I've seen the demographics, but that was too abstract for me.

Last week, I started a new Turk experiment to answer two questions: what do these people look like, and how much does it cost for someone to reveal their face?

Answer #1. This is what Mechanical Turk looks like:

Answer #2. About $0.50.

Results

Here's my original request:

Upload a photo of yourself holding a handwritten sign that says "I Turk for ...", filling out why you turk. For example, "I Turk for Cash," "I Turk for My Kids," "I Turk to Kill Time," or whatever else you like. Be honest, be funny, be whatever you like.

As a good faith gesture, here's my photo.

If you have a webcam, you can simply go to Cameroid to snap a photo from your web browser, download the JPG, and upload it below. (Don't worry if the text is backwards, I can fix that myself.) DON'T provide any identifiable information, like your name or email, since that's a violation of MTurk policy.

The result will be used in a collage that can be found on my personal weblog, http://waxy.org. By uploading your image and accepting payment for the image, you give permission to me, Andy Baio, to use your image in all forms and media for any lawful purposes. (That's just cover-my-ass language. I'm almost certainly only going to restrict it to this one project.) The collage will show up there shortly after the HIT is complete. Thanks, everybody!

I started the task at $.05, but only two people responded in the first 24 hours. (And one of those was Joshua Schachter, who I'd told about the project.) Clearly, that was too low, so I increased it to $.25, receiving only eight submissions in 48 hours. (For reference, all 500 of my Girl Talk tasks were done in about an hour.) Increasing it to $.50 got me 20 more submissions in about 48 hours, after which it started to drop off quickly. I wasn't about to give dollar bills to random people for their photos, so I ended the experiment there. People aren't willing to give up their anonymity for cheap.

The final results: 30 people total — 10 women, 20 men. Almost all were white, mostly in their 20s and 30s. 21 said they turked for money, 9 for fun or boredom.
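
For anyone who wants to check the arithmetic, here's a rough tally of the three price tiers described above. It assumes every submission was paid at the price in effect when it came in, treats "about 48 hours" as exactly 48, and ignores Amazon's fees, none of which is stated in the post. The `summarize` helper is made up for this illustration.

```python
# Each tier: (price in cents, submissions received, hours the price was live).
# Figures come from the post; the exact hour counts are an approximation.
TIERS = [(5, 2, 24), (25, 8, 48), (50, 20, 48)]


def summarize(tiers):
    """Return total submissions, total payout in cents, and submissions per day by price."""
    total_submissions = sum(count for _, count, _ in tiers)
    total_cents = sum(price * count for price, count, _ in tiers)
    per_day = {price: count * 24 / hours for price, count, hours in tiers}
    return total_submissions, total_cents, per_day


if __name__ == "__main__":
    submissions, cents, per_day = summarize(TIERS)
    print(f"{submissions} submissions, ${cents / 100:.2f} paid out (before fees)")
    for price, rate in per_day.items():
        print(f"  {price} cents: {rate:.1f} submissions per day")
```

Under those assumptions, the submission rate roughly doubles at each price increase, which fits the conclusion that anonymity doesn't come cheap.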

Thanks for pulling back the curtain, Turkers.

34 comments

Musicians Get Meta in Guitar Hero and Rock Band

Posted Nov 19, 2008 (Updated Dec 4, 2008)

There's something satisfyingly self-referential about watching talented musicians try to play their own music in Rock Band and Guitar Hero. Especially when they're worse than you.

Here's a list of every video I could find. Let me know if I missed any.

Anthrax's Scott Ian, "Madhouse" at Best Buy

"You suck. You're going to have to write easier songs... 20 years ago."

Continue reading (260 more words)...
8 comments

Deconstructing Google Mobile's Voice Search on the iPhone

Posted Nov 18, 2008 (Updated Nov 19, 2008)

I've experimented with audio transcription lately, but always with big, clumsy humans. I'd happily use speech recognition software, but even today, automatic conversion of voice-to-text is still flawed. Naturally, I was intrigued when Google announced they were adding voice searching to their Google Mobile iPhone app.

Google's flirted with voice-to-text conversion in the past, with GOOG-411 and their Audio Indexing of political videos on YouTube. But this is the first time they're offering a web-accessible interface for speech conversion, albeit completely undocumented, so I decided to poke around a bit to see what I could find.

Over the last few hours, I've been analyzing the traffic proxied through my network, trying to reverse-engineer it to get to something usable, but I've hit my limits. I'm posting this with the hopes that someone out there can run with it and find out more.

Behind the Scenes

Here's what we know so far: When you first start speaking into the microphone, the app opens a connection to Google's server and starts sending over chunks of audio, almost certainly encoded with the open-source Speex codec.

The waveform image is generated on the phone and displayed along with a "Working" indicator and the adorable "beep-boop" sounds. In the background, a tiny file is being sent as a POST request to http://www.google.com/m/appreq/gmiphone. Here's what the headers look like:

POST /m/appreq/gmiphone HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Content-Type: application/binary
Content-Length: 271
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: www.google.com

The response from Google is an even smaller attachment. These two files are the same for every query, so they don't contain any meaningful information.

HTTP/1.1 200 OK
Content-Type: application/binary
Content-Disposition: attachment
Date: Tue, 18 Nov 2008 13:06:53 GMT
X-Content-Type-Options: nosniff
Expires: Tue, 18 Nov 2008 13:06:53 GMT
Cache-Control: private, max-age=0
Content-Length: 114
Server: GFE/1.3
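
One way to check that the request and response bodies really are the same for every query is to hash each captured payload and compare the digests. This is only a sketch: it assumes you've already saved the raw bodies from your proxy as bytes, and `payloads_identical` is a made-up helper name, not part of any tool mentioned here.

```python
import hashlib


def payloads_identical(captures):
    """Return True if every captured body (bytes) hashes to the same SHA-256 digest.

    An empty list returns False, since there is nothing to compare.
    """
    digests = {hashlib.sha256(body).hexdigest() for body in captures}
    return len(digests) == 1


if __name__ == "__main__":
    # Placeholder bytes standing in for bodies captured from separate voice queries.
    same = [b"\x00\x01example", b"\x00\x01example"]
    different = [b"\x00\x01example", b"\x00\x02example"]
    print(payloads_identical(same))
    print(payloads_identical(different))
```

If the digests match across several different spoken queries, the payload can't be carrying the audio or the recognized text.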

After the audio is sent, Google returns an HTML page with the results, and the app triggers a second request: a GET to clients1.google.com containing the converted voice-to-text string.

GET /complete/search?client=iphoneapp&hjson=t&types=t
    &spell=t&nav=2&hl=en&q=chicken%20soup HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: clients1.google.com

The response is an array of search terms in JSON format, for use in search autocompletion.

["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover's Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"],["chicken soup dog food","462,000 results",0,"4"],["chicken soup with rice","467,000 results",0,"5"],["chicken soup diet","453,000 results",0,"6"],["chicken soup from scratch","364,000 results",0,"7"],["chicken soup for the soul quotes","398,000 results",0,"8"],["chicken soup crock pot","604,000 results",0,"9"]]]
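
As a rough sketch, the request URL and the response above can be rebuilt and picked apart in a few lines of Python. The parameter names and the sample come straight from the capture, but how to read the third field of each row (5 for a site link, 0 for a suggested query) is a guess based on this one response, and the function names are made up for illustration. The code makes no network call, since there's no guarantee the endpoint still answers this way.

```python
import json
from urllib.parse import quote, urlencode

# Shortened copy of the captured response shown above.
SAMPLE = (
    '["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],'
    '["chicken soup recipe","489,000 results",0,"2"]]]'
)


def build_suggest_url(query):
    """Rebuild the GET URL from the capture, encoding spaces as %20."""
    params = {
        "client": "iphoneapp", "hjson": "t", "types": "t",
        "spell": "t", "nav": "2", "hl": "en", "q": query,
    }
    return "http://clients1.google.com/complete/search?" + urlencode(params, quote_via=quote)


def parse_suggestions(body):
    """Split the JSON array into site links and suggested queries.

    Assumption: a third field of 5 marks a site link and 0 marks a query.
    """
    query, rows = json.loads(body)
    links = [(row[0], row[1]) for row in rows if row[2] == 5]
    queries = [(row[0], row[1]) for row in rows if row[2] == 0]
    return {"query": query, "links": links, "queries": queries}


if __name__ == "__main__":
    print(build_suggest_url("chicken soup"))
    print(parse_suggestions(SAMPLE))
```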

Help!

Unfortunately, until we can isolate and decode the audio stream, playing with the voice recognition features is out of reach.

Any ideas on cracking this mystery would be hugely appreciated. Anonymity for Google insiders is guaranteed!

Updates

As several commenters figured out, and as Google confirmed to me, the audio is being sent to Google's servers for voice recognition. The two binaries I posted above aren't the actual transmission and are identical for every query, so they can be disregarded. Sorry about the red herring.

Gummi Hafsteinsson, product manager for Google's Voice Search, says, "I can confirm that we split the audio down to a smaller byte stream, which is then sent to Google for recognition, but we can't really provide any details beyond that." Responding to my request for a public API, he added, "I appreciate the suggestion to provide voice recognition as a service. Right now we have nothing to announce, but we'll take this feedback as we look at future product ideas."

Also, Chris Messina discovered some secret settings in the application's preferences file, including alternate color schemes and sound sets for "Monkey" and "Chicken." Beep-boop!

Next step: As Paul discovered in the comments, the Legal Notices page says clearly that the app uses the open-source Speex codec for voice encoding. Can anyone capture and decode the audio being sent to Google?

November 19: I rewrote most of this entry to reflect the new information, since it was confusing new readers.

29 comments

Yes We Did

Posted Nov 4, 2008 (Updated Nov 6, 2008)

(Credit: Michael Buchino, also available as a shirt)

9 comments
Waxy Links
November 20, 2009
Regretsy gets a book deal — the anonymous author turned out to be April Winchell, collector of audio oddities
Google Chrome OS Demo — a world without a local filesystem and apps; also, the Chrome UI concept video (via)
Patrick Moberg's Internet Vices — funny, Tumblr feels more like beer than wine to me
Charlotte Gainsbourg and Beck's "Heaven Can Wait" — Keith Schofield's surreal video and insane treatment were inspired by FFFFOUND and Reddit, but maybe too explicitly (via)
November 19, 2009
YouTube adds machine-translated automatic captions — starting with some partner channels, but auto-timing is available to everyone today
Microsoft tries to patent Edward Tufte's sparklines — they were recently added to Excel
Leonard Lin's Retweet Avatars for Greasemonkey — a subtle change, but a big improvement
Web-ops god John Allspaw leaves Flickr to join Etsy — he's the last of the original Ludicorp team to go (via)
November 18, 2009
Laptop Steering Wheel Desk — don't miss the product photos
Interview with Ralph Eggleston, Pixar's production designer on WALL-E — from last February, but new to me; I didn't know the Axiom had three passenger classes
NSFW: Animated pixel-art video for Flair's "Trucker's Delight" — warning: very offensive and sexist, but the attention to 16-bit detail by director Jérémie Perin is incredible
NY Observer on Anil Dash's new government 2.0 incubator project — Expert Labs debuted at Web 2.0 today, funded with a $500k grant from the MacArthur Foundation
November 17, 2009
Google's Dan Morrill explains how the Droid autofocus breaks every 24.5 days — this gets second-place for quirkiest Android bug (via)
Conan O'Brien and Andy Richter on Zach Galifianakis' Between Two Ferns — his style of comedy usually makes me uncomfortable, but this made me laugh
The Pirate Bay shuts down their tracker for good — they're switching to DHT instead
November 16, 2009
How Darren at Link Machine Go found Belle de Jour's identity five years ago — Brooke was part of the early UK blog scene
ICU64, real-time visualization of Commodore 64 memory — the developer also posted videos of Paradroid and Boulder Dash (via)
Russell Davies on pretending and "barely games" — his SAP prototype looks like great ambient fun (via)
NYT Magazine on the indie gaming movement — nothing new here, but good overview with a wonderful closing anecdote from Cactus
Tim O'Reilly on the pending War for the Web — "more than that, it's a war against the web as an interoperable platform"
November 14, 2009
Jason Scott rounds up Geocities' top 10 most popular MIDI files — along with a torrent with 51,000 MIDIs rescued by Archive Team
Matt Haughey on the discovery of his brain tumor, treatment, and the Internet's response — there were about 1,000 #mathowielove tweets in 24 hours
Belle de Jour reveals herself after six years of anonymity — only six people in the world knew, she only told her parents yesterday (via)
Paul F. Tompkins debates comedy ethics with Improv Everywhere's Charlie Todd — great discussion, and it's hard not to see where both are coming from (via)
November 13, 2009
Rogue Amoeba stops iPhone app development after App Store idiocy — I'm with Marco, the only fix is allowing external apps, but it's unlikely (via)
Numb3rs on IRC — "Luckily, I speak l33t."
Prank War 8: The Skydiving Prank — hard to say if life-threatening situations are funnier than public humiliation
301 Works, Internet Archive works to preserve URL shortener data — the shorteners will provide regular backups and hand over data on closure, though TinyURL's conspicuously missing
November 12, 2009
Quizipedia — simple game with trivia scraped from Wikipedia entries
Kill Screen, funding a new art magazine about videogames — sounds like the English analogue of Amusement I was hoping for

Andy Baio lives here. Some rights reserved, for your pleasure.