Deconstructing Google Mobile's Voice Search on the iPhone

Posted November 18, 2008 by Andy Baio

I’ve experimented with audio transcription lately, but always with big, clumsy humans. I’d happily use ~~cyborgs~~ speech recognition software, but even today, automatic conversion of voice-to-text is still flawed. Naturally, I was intrigued when Google announced they were adding voice searching to their Google Mobile iPhone app.

Google’s flirted with voice-to-text conversion in the past, with GOOG-411 and their Audio Indexing of political videos on YouTube. But this is the first time they’re offering a web-accessible interface for speech conversion, albeit completely undocumented, so I decided to poke around a bit to see what I could find.

Over the last few hours, I’ve been analyzing the traffic proxied through my network, trying to reverse-engineer it to get to something usable, but I’ve hit my limits. I’m posting this with the hopes that someone out there can run with it and find out more.

Behind the Scenes

Here’s what we know so far: When you first start speaking into the microphone, the app opens a connection to Google’s server and starts sending over chunks of audio, almost certainly encoded with the open-source Speex codec.

The waveform image is generated on the phone and displayed along with a “Working” indicator and the adorable “beep-boop” sounds. In the background, a tiny file is being sent as a POST request to http://www.google.com/m/appreq/gmiphone. Here’s what the headers look like:

POST /m/appreq/gmiphone HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Content-Type: application/binary
Content-Length: 271
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: www.google.com

The response from Google is an even smaller attachment. These two files are the same for every query, so they don’t contain any meaningful information.

HTTP/1.1 200 OK
Content-Type: application/binary
Content-Disposition: attachment
Date: Tue, 18 Nov 2008 13:06:53 GMT
X-Content-Type-Options: nosniff
Expires: Tue, 18 Nov 2008 13:06:53 GMT
Cache-Control: private, max-age=0
Content-Length: 114
Server: GFE/1.3

After the audio is sent, Google returns an HTML page with the results, and a second request is triggered: this time a GET request to clients1.google.com containing the converted voice-to-text string.

GET /complete/search?client=iphoneapp&hjson=t&types=t&spell=t&nav=2&hl=en&q=chicken%20soup HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: clients1.google.com
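To make the parameters in that captured request easier to read, here is a minimal Python sketch that splits the URL into its query parameters. It only decodes what was captured. The http scheme is an assumption, and the meaning of each parameter is undocumented, so nothing here should be read as Google's specification.

```python
from urllib.parse import urlsplit, parse_qs

# Request target as captured above, with the Host header prepended.
# The "http://" scheme is an assumption; the capture only shows path and host.
captured = (
    "http://clients1.google.com/complete/search"
    "?client=iphoneapp&hjson=t&types=t&spell=t&nav=2&hl=en&q=chicken%20soup"
)

parts = urlsplit(captured)

# parse_qs returns a list of values per key and percent-decodes them.
# Each key appears once in this capture, so keep only the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

The q parameter carries the recognized text, here decoded from chicken%20soup to "chicken soup".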

The response is an array of search terms in JSON format, for use in search autocompletion.

["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover's Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"],["chicken soup dog food","462,000 results",0,"4"],["chicken soup with rice","467,000 results",0,"5"],["chicken soup diet","453,000 results",0,"6"],["chicken soup from scratch","364,000 results",0,"7"],["chicken soup for the soul quotes","398,000 results",0,"8"],["chicken soup crock pot","604,000 results",0,"9"]]]
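As a rough illustration of how that response could be consumed, the Python sketch below parses a shortened copy of it and separates the two kinds of entries. The interpretation of the third field (5 for a direct site link, 0 for a suggested query) is my own guess from this one sample, not anything Google has documented.

```python
import json

# A shortened copy of the response shown above (the first four entries only).
raw = '''
["chicken soup", [
  ["http://www.chickensoup.com/", "Chicken Soup for the Soul", 5, ""],
  ["http://www.chickensoupforthepetloverssoul.com/", "Chicken Soup for the Pet Lover's Soul", 5, ""],
  ["chicken soup recipe", "489,000 results", 0, "2"],
  ["chicken soup for the soul", "1,470,000 results", 0, "3"]
]]
'''

# The top level is a two-element array: the query, then a list of suggestions.
query, suggestions = json.loads(raw)

# Assumption: a third field of 5 marks a site link (URL plus title) and
# 0 marks a suggested query (text plus result count). Inferred from the sample.
site_links = [(text, label) for text, label, kind, _ in suggestions if kind == 5]
queries = [(text, label) for text, label, kind, _ in suggestions if kind == 0]

print("Query:", query)
for url, title in site_links:
    print("Site link:", title, "->", url)
for text, count in queries:
    print("Suggestion:", text, f"({count})")
```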

Help!

Unfortunately, until we can isolate and decode the audio stream, playing with the voice recognition features is out of reach.

Any ideas on cracking this mystery would be hugely appreciated. Anonymity for Google insiders is guaranteed!

Updates

As several commenters figured out, and as Google confirmed to me, the audio is being sent to Google’s servers for voice recognition. The two binaries I posted above aren’t the actual transmission; they are identical for every query, so they can be disregarded. Sorry about the red herring.

Gummi Hafsteinsson, product manager for Google’s Voice Search, says, “I can confirm that we split the audio down to a smaller byte stream, which is then sent to Google for recognition, but we can’t really provide any details beyond that.” Responding to my request for a public API, he added, “I appreciate the suggestion to provide voice recognition as a service. Right now we have nothing to announce, but we’ll take this feedback as we look at future product ideas.”

Also, Chris Messina discovered some secret settings in the application’s preferences file, including alternate color schemes and sound sets for “Monkey” and “Chicken.” Beep-boop!

Next step: As Paul discovered in the comments, the Legal Notices page says clearly that the app uses the open-source Speex codec for voice encoding. Can anyone capture and decode the audio being sent to Google?

November 19: I rewrote most of this entry to reflect the new information, since it was confusing new readers.

Yes We Did

Posted November 4, 2008 by Andy Baio

(Credit: Michael Buchino, also available as a shirt)

Girl Talk's Feed the Animals: The Official Sample List

Posted October 29, 2008 (updated November 16, 2020) by Andy Baio

Last month, I dissected Girl Talk’s Feed the Animals using the list of samples lovingly collected by hundreds of Wikipedia users. But that was totally unofficial, a crowdsourced attempt to find musical needles in a giant mashup haystack.

Well, the official CDs were shipped out last week to everyone who donated more than $10. Inside, as promised, was the official sample list — a one-page insert with every single sample on the album. Steve Heil was the first to scan it and contact me.

Unfortunately, a huge block of printed small-caps text isn’t very useful for my kind of fun, so I tried throwing it into several OCR engines on WeOCR to turn the image into text. Tesseract gave the best results, but it was still a mess that needed quite a bit of cleanup.

Anyway, here it is. The complete list of all 322 samples in Girl Talk’s Feed the Animals, available as a CSV, Excel, or Google Spreadsheets document.


Memeorandum Colors: Visualizing Political Bias with Greasemonkey

Posted October 10, 2008 by Andy Baio

Like the rest of the world, I’ve been completely obsessed with the presidential election and nonstop news coverage. My drug of choice? Gabe Rivera’s Memeorandum, the political sister site of Techmeme, which constantly surfaces the most controversial stories being discussed by political bloggers.

While most political blogs are extremely partisan, their biases aren’t immediately obvious to outsiders like me. I wanted to see, at a glance, how conservative or liberal the blogs were without clicking through to every article.

del.icio.us founder Joshua Schachter and I used a recommendation algorithm to score every blog on Memeorandum based on its linking activity in the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets and colorize Memeorandum on the fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing stronger biases. Check out the screenshot below, and install the Greasemonkey script or standalone Firefox extension to try it yourself.

Note: The colors don’t necessarily represent each blogger’s personal views or biases. They reflect linking activity. The algorithm looks at the stories that bloggers linked to before, relative to all other bloggers, and groups them accordingly. People who link to things that only conservatives find interesting will be classified as bright red, even if they are personally moderate or liberal, and vice versa. The algorithm can’t read minds, so don’t be offended if you feel misrepresented. It’s only looking at the data.

For example, while Nate Silver of FiveThirtyEight may be a Democrat, he has a tendency to link to stories conservative bloggers are discussing slightly more often than liberal bloggers, so he’s shaded very slightly red. (Geeks can read on for more details about how this works.)
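To give a feel for the coloring step only, here is a small Python sketch that maps a bias score to a CSS color. Everything in it is an assumption made for illustration: the score range of -1.0 to 1.0, the sign convention, the bias_color name, and the white-to-saturated fade. The actual script is a Greasemonkey script and may work quite differently.

```python
def bias_color(score: float) -> str:
    """Map a bias score to a CSS hex color.

    Hypothetical convention: -1.0 is strongly left-leaning (blue),
    1.0 is strongly right-leaning (red), and 0.0 is neutral (white).
    """
    # Clamp out-of-range scores so the channel math stays within 0..255.
    score = max(-1.0, min(1.0, score))
    strength = abs(score)

    # Fade the two non-dominant channels from 255 (white) toward 0
    # as the bias gets stronger, which deepens the blue or red.
    faded = round(255 * (1.0 - strength))

    if score < 0:
        return f"#{faded:02x}{faded:02x}ff"  # shades of blue
    if score > 0:
        return f"#ff{faded:02x}{faded:02x}"  # shades of red
    return "#ffffff"


for name, score in [("strong left", -1.0), ("neutral", 0.0), ("strong right", 1.0)]:
    print(name, bias_color(score))
```

Under this convention, a blog that leans only slightly right would get a score just above zero and a very pale red.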


Found Footage: Sarah Palin's 1984 Miss Alaska Pageant Video, Swimsuit Competition

Posted September 26, 2008 by Andy Baio

Somehow, a 22-year-old University of Alaska student named Richard Millay got his hands on a videotape that’s eluded the media since John McCain asked Sarah Palin to be his running mate: original footage of her 1984 Miss Alaska Pageant.

Of course, this is all very frivolous and has nothing to do with the current campaign. But like Barack Obama’s high school basketball footage, it’s a little glimpse into the early life of a highly visible national figure.

In the first part added to YouTube, he posted the portion from the swimsuit competition, prefaced by a brief introduction mentioning the demand for the “88 minutes of Alaska Gold.”

Update: The original video was removed, but I managed to save a copy of the relevant footage without Richard’s original intro. YouTube’s removing every copy of this video, so I’m streaming the clip below from my own server. It won’t be removed.
