Attribution and Affiliation on All Things Digital

Posted April 8, 2009 by Andy Baio

Getting linked from a high-profile website is almost always a huge compliment, well-received by any blogger. But Monday morning, I saw two friends taken by surprise when they were featured on the front page of AllThingsD, the Dow Jones-owned news site edited by Kara Swisher and Walt Mossberg from the Wall Street Journal. I talked to Kara, as well as several other writers and bloggers, to understand why.

Background

After Del.icio.us founder Joshua Schachter’s article about URL shorteners was posted on AllThingsD, he asked on Twitter, “What the hell is this?” Danny Sullivan replied, “It’s a compliment. AllThingsD liked your shortener article enough to feature you on their home page.” Joshua responded, “It’s just very unclear to me where that came from, who wrote it, why they are showing ads on it, etc.”

Continue reading “Attribution and Affiliation on All Things Digital” →

Waxy.org at SXSW Interactive 2009

Posted March 10, 2009 by Andy Baio

I’m making the pilgrimage to Austin for SXSW Interactive again this year, though with no crazy Worst Website Ever antics this time. I will be speaking at a couple of events, if you want to get together:

Sunday, 3:30pm

What Do I Do With Myself, Now that the Economy Has Collapsed?

Lane Becker moderates a lineup of web geeks who started projects during the last bust, with some advice and lessons learned from our past successes and failures. I’m very lucky to be on the lineup, along with the wonderful Ben Brown, Michael Sippey, and Jane Mount.

Monday, 7:30pm

The Heather Gold Show

Palmer Events Center, 900 Barton Springs Road

Every year, writer/comedian Heather Gold brings her live, interactive talk show to Austin to interview artists, musicians, coders, and writers around a theme. This year’s subject is “Something From Nothing,” a loose conversation about inspiration and the creative drive, with CD Baby founder Derek Sivers, Huffduffer creator Jeremy Keith, Adaptive Path founder/Emmett Labs CEO Janice Fraser, singer/songwriter Amber Rubarth, and me! The Heather Gold Show is a small part of the huge Plutopia EFF-Austin party, a three-stage art and music extravaganza featuring Bruce Sterling and Ian McLagan from The Faces, so it should be fun. Free admission for SXSW badge holders, $10 for everyone else.

Naturally, I’ll be on Twitter, and my picks for the show are on Upcoming and Sched.org. If you see me, say hi!

Translating "The Economist" Behind China's Great Firewall

Posted February 26, 2009 by Andy Baio

While researching Oscar screeners last month, I stumbled on a remarkable example of online collaboration in China that’s completely undiscovered here. In short, a group of dedicated fans of The Economist newsmagazine are translating each weekly issue cover-to-cover, splitting up the work among a team of volunteers, and redistributing the finished translations as complete PDFs for a Chinese audience.

It reminds me of the scanlation movement, in which groups of fans scan manga, translate it into another language, and redistribute it. But I’ve never seen it applied to a newspaper or magazine, especially one as high-minded as The Economist.

It’s an impressive example of online collaboration with simple tools, a completely non-commercial effort by volunteers interested in spreading knowledge while improving their English skills. In the process, they’re taking a political risk in translating controversial articles about their homeland behind the Great Firewall.

Continue reading “Translating "The Economist" Behind China's Great Firewall” →

John Hodgman on "meh"

Posted February 24, 2009 by Andy Baio

I enjoyed this exchange with John Hodgman on Twitter yesterday, reminiscent of my own rant on “FAIL.”

hodgman: Did I ever tell you people how much I hate the word “meh”? Nothing announces “I have missed the point” more than that word.

hodgman: It is the essence of blinkered Internet malcontentism. And a rejection of joy. Also: 12 hive mehs in the replies SO FAR

hodgman: By definition, it may mean disinterest (although simple silence would be a more damning and sincere response, in that case)

hodgman: But in use, it almost universally seems to signal: I am just interested enough to make one last joyless, nitpicky swipe and then disappear

wordwill: @hodgman Isn’t rejecting joy how one traditionally demonstrates one’s superior cool? Though, at the same time, to hell with that.

hodgman: @wordwill yes. It’s part of the toxic Internet art of constant callous one upsmanship. And it is a sort of art, but not for me.

Robin Hood's "Oo De Lally," Translated Into 16 Languages

Posted February 5, 2009 by Andy Baio

There’s something enchanting about these localized versions of Roger Miller’s “Oo De Lally” from Disney’s 1973 film Robin Hood. While all of these videos were found on YouTube, each was created by a different person around the world. (Bonus points if you can find the ~~Japanese~~, ~~Chinese~~, and ~~Norwegian~~ versions. Got ’em all! Thanks, everyone.) April 7: Added Hebrew, but YouTube removed the Arabic version… Anyone have it? November 12: Added Finnish and Danish, but still missing Arabic.

August 1, 2012: These keep getting knocked offline, but this video combines 16 languages into a single clip.

Original in English

Portuguese, “O-La-Ri-Lo-Le”

Italian, “Urca Tirulero”

Continue reading “Robin Hood's "Oo De Lally," Translated Into 16 Languages” →

Pirating the 2009 Oscars

Posted January 22, 2009 (updated February 8, 2022) by Andy Baio

The Oscar nominees were announced this morning, which means it’s time to get out your scorecards to see who’s winning in the eternal struggle between the MPAA and the Internet. (Hint: It’s not the MPAA.)

I’ve been tracking the distribution of Oscar-nominated films every year, culminating with the release of six years of piracy data last year. I’ve updated those spreadsheets with this year’s 26 nominees, for a total of 211 films from the last seven years.

You can view or download all the data below, including a second sheet with some interesting aggregate stats. As always, I’ll keep it updated until the Oscar broadcast.

View full-size on Google Spreadsheets.

Download: Excel (with formulas) or CSV

Findings

So, how did they do? Out of 26 nominated films, an incredible 23 films are already available in DVD quality on nomination day, ripped either from the screeners or the retail DVDs. (All 26 were available by February 7.) This is the highest percentage since I started tracking.

Only three films are unavailable in DVD quality: Rachel Getting Married wasn’t leaked online in any form, while Changeling is only available as a low-quality telecine transfer and Australia as a terrible-quality camcorder recording. (Update: A DVD screener of Australia was leaked on January 23, a retail DVD rip of Changeling on January 31, and finally, the retail DVD of Rachel Getting Married on February 7.)

Other findings:

Academy members received screeners for at least 20 of the 26 films.
25 out of 26 films leaked in some form online, if you include camcorder recordings.
The average time between Academy members receiving a screener and its leak online is 6 days.
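For a sense of scale, the counts quoted in these findings convert to percentages as follows. This is only a minimal Python sketch using the figures stated in this post; the variable and function names are mine and don't come from the spreadsheet.

```python
# Counts quoted in the findings; the names are illustrative only.
TOTAL_NOMINEES = 26

counts = {
    "DVD quality on nomination day": 23,
    "screener sent to Academy members (at least)": 20,
    "leaked in any form, including cams": 25,
}


def share(count, total=TOTAL_NOMINEES):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count}/{TOTAL_NOMINEES} = {share(count)}%")
```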

Surprisingly, it seems like this year’s Oscar movies took longer to leak online than in previous years. If I had to guess, it’s because far fewer camcorder copies were released for this year’s nominees. This could be because of the theaters cracking down on camcorder recordings, but I suspect it’s because fewer nominees were desirable targets this year for cams. (Aside from the obvious blockbusters, like Dark Knight, Kung Fu Panda, and Tropic Thunder.) The chart below shows the median number of days from a movie’s US release date to its first leak online.
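The median described here is just the middle value of "first leak date minus US release date" across the leaked films. A rough Python sketch of that calculation follows. The rows are made-up placeholders, not values from the spreadsheet, and skipping unleaked films is my assumption about how such a median would reasonably be computed.

```python
from datetime import date
from statistics import median

# Placeholder rows, NOT real spreadsheet values: (film, US release, first leak).
# A film with no leak yet has None as its leak date.
films = [
    ("Film A", date(2008, 7, 18), date(2008, 7, 20)),
    ("Film B", date(2008, 11, 26), date(2009, 1, 10)),
    ("Film C", date(2008, 12, 25), date(2009, 1, 5)),
    ("Film D", date(2008, 10, 3), None),
]


def days_to_leak(release, leak):
    """Days from US release to first leak (negative if it leaked early)."""
    return (leak - release).days


def median_days_to_leak(rows):
    """Median over leaked films only; unleaked films are skipped."""
    gaps = [days_to_leak(r, l) for _, r, l in rows if l is not None]
    return median(gaps)


print(median_days_to_leak(films))
```

A median is a sensible choice for this kind of data because one film that takes months to leak would drag an average upward without changing the typical case.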

Last year, one of the interesting findings was how the release of Region 5 DVDs was reducing the prestige of official screener leaks. This year, only four of the nominated films were released as R5s, compared to eight from last year. The numbers are still too small to tell if this is a trend, but it seems like the popularity of the R5 may have peaked in 2007. (Are the studios releasing fewer R5s in general?)

What other trends in the data am I missing? Feel free to chime in with your conclusions or visualizations in the comments.

Methodology

As usual, I included the feature films in every category except documentary and foreign films. I used Yahoo! Movies for US release dates, always using the first available date, even if it was a limited release. Cam, telesync, R5, and screener leak dates were almost universally taken from VCD Quality. I used the first leak date, with the exception of unviewable or incomplete nuked releases. Finally, the official screener dates came from Academy member Ken Rudolph, who lists the date he receives every screener on his personal homepage. Thanks again, Ken!

For previous years, see 2004, 2005, 2007, 2008 (part 1 and part 2).

Update: The screener for Australia was released today, so I added that date to the spreadsheet, along with some missing retail DVD dates from last year’s Oscars.

February 3, 2009: Some related links of interest… I was interviewed for Future Tense on American Public Media, talking about this entry. Bruce Lidl looked at leaks in the Foreign and Documentary categories, as well as how quickly HD-quality leaks are happening. Finally, Flowing Data is sponsoring a contest to generate information visualizations from this data.

The Faces of Mechanical Turk

Posted November 20, 2008 by Andy Baio

When you experiment with Amazon’s Mechanical Turk, it feels like magic. You toss 500 questions into the ether, and the answers instantly start rolling in from anonymous workers around the world. It’s great for getting work done, but who are these people? I’ve seen the demographics, but they were too abstract for me.

Last week, I started a new Turk experiment to answer two questions: what do these people look like, and how much does it cost for someone to reveal their face?

Answer #1. This is what Mechanical Turk looks like (click for full-size):

Answer #2. About $0.50.

Results

Here’s my original request:

Upload a photo of yourself holding a handwritten sign that says “I Turk for …”, filling out why you turk. For example, “I Turk for Cash,” “I Turk for My Kids,” “I Turk to Kill Time,” or whatever else you like. Be honest, be funny, be whatever you like.

As a good faith gesture, here’s my photo.

If you have a webcam, you can simply go to Cameroid to snap a photo from your web browser, download the JPG, and upload it below. (Don’t worry if the text is backwards, I can fix that myself.) DON’T provide any identifiable information, like your name or email, since that’s a violation of MTurk policy.

The result will be used in a collage that can be found on my personal weblog, http://waxy.org. By uploading your image and accepting payment for the image, you give permission to me, Andy Baio, to use your image in all forms and media for any lawful purposes. (That’s just cover-my-ass language. I’m almost certainly only going to restrict it to this one project.) The collage will show up there shortly after the HIT is complete. Thanks, everybody!

I started the task at $.05, but only two people responded in the first 24 hours. (And one of those was Joshua Schachter, who I’d told about the project.) Clearly, that was too low, so I increased it to $.25, receiving only eight submissions in 48 hours. (For reference, all 500 of my Girl Talk tasks were done in about an hour.) Increasing it to $.50 got me 20 more submissions in about 48 hours, after which it started to drop off quickly. I wasn’t about to give dollar bills to random people for their photos, so I ended the experiment there. People aren’t willing to give up their anonymity for cheap.
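The numbers in that paragraph can be tallied with a quick back-of-the-envelope Python sketch. It assumes every submission was paid at the reward in effect when it arrived and ignores Amazon's commission; neither detail is stated above, so treat the dollar total as an estimate.

```python
# Price tiers from the experiment: (reward in cents, submissions, hours open).
# Assumes each submission was paid at the reward in effect when it arrived,
# and ignores Amazon's commission; neither detail is stated in the post.
tiers = [
    (5, 2, 24),
    (25, 8, 48),
    (50, 20, 48),
]

total_submissions = sum(n for _, n, _ in tiers)
total_cents = sum(cents * n for cents, n, _ in tiers)

for cents, n, hours in tiers:
    per_day = n / hours * 24
    print(f"${cents / 100:.2f}: {n} submissions, about {per_day:.0f} per day")

print(f"{total_submissions} photos for ${total_cents / 100:.2f}")
```

Working in whole cents keeps the sums exact instead of relying on floating-point dollars.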

The final results: 30 people total — 10 women, 20 men. Almost all were white, mostly in their 20s and 30s. 21 said they turked for money, 9 for fun or boredom.

Thanks for pulling back the curtain, Turkers.

Musicians Get Meta in Guitar Hero and Rock Band

Posted November 19, 2008 by Andy Baio

There’s something satisfyingly self-referential about watching talented musicians try to play their own music in Rock Band and Guitar Hero. Especially when they’re worse than you.

Here’s a list of every video I could find. Let me know if I missed any.

Anthrax’s Scott Ian, “Madhouse” at Best Buy

“You suck. You’re going to have to write easier songs… 20 years ago.”

Continue reading “Musicians Get Meta in Guitar Hero and Rock Band” →

Deconstructing Google Mobile's Voice Search on the iPhone

Posted November 18, 2008 by Andy Baio

I’ve experimented with audio transcription lately, but always with big, clumsy humans. I’d happily use ~~cyborgs~~ speech recognition software, but even today, automatic conversion of voice-to-text is still flawed. Naturally, I was intrigued when Google announced they were adding voice searching to their Google Mobile iPhone app.

Google’s flirted with voice-to-text conversion in the past, with GOOG-411 and their Audio Indexing of political videos on YouTube. But this is the first time they’re offering a web-accessible interface for speech conversion, albeit completely undocumented, so I decided to poke around a bit to see what I could find.

Over the last few hours, I’ve been analyzing the traffic proxied through my network, trying to reverse-engineer it to get to something usable, but I’ve hit my limits. I’m posting this with the hopes that someone out there can run with it and find out more.

Behind the Scenes

Here’s what we know so far: When you first start speaking into the microphone, the app opens a connection to Google’s server and starts sending over chunks of audio, almost certainly encoded with the open-source Speex codec.

The waveform image is generated on the phone and displayed along with a “Working” indicator and the adorable “beep-boop” sounds. In the background, a tiny file is being sent as a POST request to http://www.google.com/m/appreq/gmiphone. Here’s what the headers look like:

POST /m/appreq/gmiphone HTTP/1.1

User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1

Content-Type: application/binary

Content-Length: 271

Accept: */*

Accept-Language: en-us

Accept-Encoding: gzip, deflate

Pragma: no-cache

Connection: keep-alive

Connection: keep-alive

Host: www.google.com

The response from Google is an even smaller attachment. These two files are the same for every query, so they don’t contain any meaningful information.

HTTP/1.1 200 OK

Content-Type: application/binary

Content-Disposition: attachment

Date: Tue, 18 Nov 2008 13:06:53 GMT

X-Content-Type-Options: nosniff

Expires: Tue, 18 Nov 2008 13:06:53 GMT

Cache-Control: private, max-age=0

Content-Length: 114

Server: GFE/1.3

After the audio’s sent to Google, they return an HTML page with the results and a second request is triggered, this time a GET request to clients1.google.com with the converted voice-to-text string.

GET /complete/search?client=iphoneapp&hjson=t&types=t&spell=t&nav=2&hl=en&q=chicken%20soup HTTP/1.1

User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1

Accept: */*

Accept-Language: en-us

Accept-Encoding: gzip, deflate

Pragma: no-cache

Connection: keep-alive

Connection: keep-alive

Host: clients1.google.com
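If you want to experiment with other queries, the captured request line can be rebuilt programmatically. Here's a minimal Python sketch that only constructs the URL and doesn't send anything. The parameter names and values are copied from the capture; the `http://` scheme and the helper name `build_suggest_url` are my assumptions, and there's no guarantee the endpoint accepts requests from other clients.

```python
from urllib.parse import urlencode, quote

# Host and path as seen in the captured request.
HOST = "clients1.google.com"
PATH = "/complete/search"


def build_suggest_url(query):
    """Rebuild the captured GET URL for an arbitrary query string."""
    params = {
        "client": "iphoneapp",
        "hjson": "t",
        "types": "t",
        "spell": "t",
        "nav": "2",
        "hl": "en",
        "q": query,
    }
    # quote_via=quote encodes spaces as %20, matching the capture
    # (the default, quote_plus, would emit "+" instead).
    return f"http://{HOST}{PATH}?{urlencode(params, quote_via=quote)}"


print(build_suggest_url("chicken soup"))
```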

The response is an array of search terms in JSON format, for use in search autocompletion.

["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover's Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"],["chicken soup dog food","462,000 results",0,"4"],["chicken soup with rice","467,000 results",0,"5"],["chicken soup diet","453,000 results",0,"6"],["chicken soup from scratch","364,000 results",0,"7"],["chicken soup for the soul quotes","398,000 results",0,"8"],["chicken soup crock pot","604,000 results",0,"9"]]]
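The response is easy to pick apart with a JSON parser. The sketch below uses a shortened copy of the response above and splits it into link entries and query suggestions. Reading type code 5 as "direct link" and 0 as "query suggestion" is my guess from this single sample, not documented behavior, and the function name is hypothetical.

```python
import json

# A shortened copy of the response above (one link, two query suggestions).
raw = '''["chicken soup",[
  ["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],
  ["chicken soup recipe","489,000 results",0,"2"],
  ["chicken soup for the soul","1,470,000 results",0,"3"]
]]'''


def parse_suggestions(text):
    """Split the response into link entries and query suggestions.

    Treating type code 5 as a direct link and 0 as a query suggestion
    is a guess based on this one sample, not documented behavior.
    """
    query, entries = json.loads(text)
    links, queries = [], []
    for value, label, kind, _rank in entries:
        if kind == 5:
            links.append({"url": value, "title": label})
        elif kind == 0:
            # "489,000 results" -> 489000
            count = int(label.split()[0].replace(",", ""))
            queries.append({"query": value, "results": count})
    return query, links, queries


query, links, queries = parse_suggestions(raw)
print(query, links, queries)
```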

Help!

Unfortunately, until we can isolate and decode the audio stream, playing with the voice recognition features is out of reach.

Any ideas on cracking this mystery would be hugely appreciated. Anonymity for Google insiders is guaranteed!

Updates

As several commenters figured out, and as Google confirmed to me, the audio is being sent to Google’s servers for voice recognition. The two binaries I posted above aren’t the actual transmission; they’re identical for every query, so they can be disregarded. Sorry about the red herring.

Gummi Hafsteinsson, product manager for Google’s Voice Search, says, “I can confirm that we split the audio down to a smaller byte stream, which is then sent to Google for recognition, but we can’t really provide any details beyond that.” Responding to my request for a public API, he added, “I appreciate the suggestion to provide voice recognition as a service. Right now we have nothing to announce, but we’ll take this feedback as we look at future product ideas.”

Also, Chris Messina discovered some secret settings in the application’s preferences file, including alternate color schemes and sound sets for “Monkey” and “Chicken.” Beep-boop!

Next step: As Paul discovered in the comments, the Legal Notices page says clearly that the app uses the open-source Speex codec for voice encoding. Can anyone capture and decode the audio being sent to Google?

November 19: I rewrote most of this entry to reflect the new information, since it was confusing new readers.

Yes We Did

Posted November 4, 2008 by Andy Baio

(Credit: Michael Buchino, also available as a shirt)