Waxy.org
Waxy.org is the sandbox of Andy Baio. I run XOXO, built Playfic and Supercut, helped build Kickstarter, founded Upcoming, made an album, and some other stuff too.


How I Indexed The Daily

Posted Feb 23, 2011

For the last three weeks, I've indexed The Daily. Now that my free trial's up, I've had an intimate look at what they have to offer and, sad to say, I don't plan on subscribing. As a result, I'm ending The Daily: Indexed, my unofficial table of contents for every article they published publicly.

I'm surprised and grateful that The Daily's executive and legal teams never tried to shut it down. On the contrary, when asked directly about it, publisher Greg Clayman said, "If people like our content enough to put it together in a blog and share it with folks, that's great! It drives people back to us." They seem like a nice bunch of folks, and I hope they succeed with their big publishing experiment.

But now that I'm ending it, I can finally address the most common question — how did I do it?


The Daily: Indexed is just a list of article headlines, bylines, and links to each article on The Daily's official website. Anyone can grab the links from the Daily iPad app by clicking each article's "Share by Email" button, but that would've taken me far too long. So, how to automate the process?

When you first start The Daily application, it connects to their central server to check for a new edition, and then downloads a 1.5MB JSON file with the complete metadata for that issue. It includes everything — the complete text of the issue, layout metadata, and the public URLs.
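Once a copy of that file is saved locally (how to get it is covered next), one way to explore it without knowing its layout is to walk the whole structure and list every string that looks like a URL. This is only a sketch: the sample data and its key names (`issue`, `pages`, `share_url`) are made up for illustration and are not The Daily's actual schema.

```python
import json


def find_urls(node, path=""):
    """Recursively yield (path, value) pairs for string values that look like URLs."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_urls(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_urls(value, f"{path}/{index}")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield path, node


# Hypothetical stand-in for the real file; the key names here are assumptions.
sample = json.loads(
    '{"issue": {"pages": [{"title": "Example", "share_url": "http://example.com/a"}]}}'
)

for path, url in find_urls(sample):
    print(path, url)
```

The printed paths show where in the structure the links live, which is enough to decide which fields a later script should read.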

But how can you get access to that file? My first attempt was to proxy all of the iPad's traffic through my laptop and use Wireshark to inspect it. As it turns out, The Daily encrypts all traffic between your iPad and their servers. I was able to see connections being made to various servers, but couldn't see what was being sent.

Enter Charles, a brilliantly designed web debugging proxy for Mac, Windows, and Linux. By default, Charles will listen to all of your HTTP network traffic and show you simple, but powerful, views of all your web requests. But it can also act as an SSL proxy, sitting in the middle of previously secure transactions between your browser and an SSL server, so you can read the decrypted requests and responses.

After grabbing the JSON, I was able to write a simple Python script to extract the metadata I needed and spit out the HTML for use on the Tumblr page. Here's how to do it.


Configuring Charles

1. Download and install Charles on your desktop machine. On your iPad, navigate to http://charlesproxy.com/charles.crt and install the certificate so the iPad trusts Charles' SSL certificate.

2. For Mac users, start Network Utility to get your desktop's local IP address. Start your iPad, make sure it's on the same wireless network as your desktop, and go into Settings>Network>Wi-Fi. Select the wireless network, and tap the right arrow next to it to configure advanced settings. Under "HTTP Proxy," select "Manual." Enter the IP address of your desktop for "Server" and enter "8888" for the port.

3. Now, start Charles on your desktop and, on the iPad, try loading any website. You should see assets from that website appear in Charles. If so, you're ready to sniff The Daily's iPad app.
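If Network Utility isn't handy for step 2 (on Windows or Linux, say), the desktop's local IP address can also be guessed from Python. This is a best-effort sketch, not part of Charles: `local_ip` and `CHARLES_PORT` are names invented here, and the address 192.0.2.1 is just a reserved documentation address used to make the OS pick an outgoing interface.

```python
import socket

CHARLES_PORT = 8888  # the port entered on the iPad in step 2


def local_ip():
    """Best-effort guess at this machine's LAN address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the interface it would use for outgoing traffic.
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network); fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


print(f"Server: {local_ip()}  Port: {CHARLES_PORT}")
```

On a machine with several network interfaces this may not pick the Wi-Fi one, so it's worth checking the result against your network settings before typing it into the iPad.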


Indexing the Daily

1. Start the Daily app on the iPad. Wait for it to download today's issue. In Charles, drill down to https://app.thedaily.com/ipad/v1/issue/current, and select "JSON Text."

2. Copy and paste the raw JSON into a text file.

3. This Python script takes the JSON file as input, and spits out a snippet of HTML suitable for blogging. I simply pasted the output from that script into Tumblr, made a thumbnail of the cover, and published.
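The script itself isn't reproduced here. A minimal sketch of that kind of script might look like the following, assuming the issue JSON holds a list of articles that each have a headline, a byline, and a public URL. The key names (`articles`, `headline`, `byline`, `url`) and the function name `issue_to_html` are hypothetical; the real file's layout may well differ.

```python
import json
from html import escape


def issue_to_html(issue):
    """Build an HTML list of linked headlines with bylines."""
    items = []
    for article in issue.get("articles", []):  # hypothetical key name
        url = article.get("url")
        if not url:
            continue  # skip entries with no public link
        headline = escape(article.get("headline", "Untitled"))
        byline = escape(article.get("byline", ""))
        line = f'<li><a href="{escape(url)}">{headline}</a>'
        if byline:
            line += f" ({byline})"
        items.append(line + "</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up sample data; with a real file you would instead do something like:
#     with open("issue.json") as f:
#         issue = json.load(f)
issue = {
    "articles": [
        {"headline": "Example & Co.", "byline": "A. Writer", "url": "http://example.com/1"},
        {"headline": "No link"},
    ]
}

print(issue_to_html(issue))
```

Escaping the headline, byline, and URL keeps stray ampersands or quotes in the source data from breaking the pasted HTML.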


The End

So, that's it! Hope that was helpful. If any fan of The Daily out there wants to take over publishing duties, I'll happily pass the Tumblr blog on to you.

15 Comments

Feb 23, 2011
10:54 AM  
Julien wrote:

Wouldn't it be much much simpler if the daily had feeds?


Feb 23, 2011
11:10 AM  
Andy Baio wrote:

They want everyone reading the articles on their iPads, so I wouldn't hold my breath for RSS.


Feb 23, 2011
1:10 PM  
Aaron wrote:

Wow, a 1.5 MB JSON file! Is that compressed, or uncompressed? Wouldn't it make more sense to just grab a ToC file, and then start streaming in articles in the background (with top/front page stuff coming first)?


Feb 23, 2011
2:56 PM  
Kosso wrote:

Great stuff! I was looking into doing something like this. Thanks for bringing Charles to my attention! Very useful indeed. I've been looking for a tool like this for quite some time! :)


Feb 23, 2011
4:30 PM  
KFR wrote:

thanks for temp service and doc


Feb 24, 2011
2:25 AM  
Rounak Jain wrote:

Got to know of two really nice tools. Thanks :)


Feb 24, 2011
5:06 AM  
Marcelo wrote:

In my comment to your original post about the Daily indexed, I said "Nice trick". Now, knowing the details of how you did it, I say again: Nice trick!
I also want to say thanks for calling my attention to Charles application.


Feb 24, 2011
7:11 AM  
Dan Christ wrote:

Great simple solution. Really liked to read how you solved this riddle, and then shared the results. I'll be sorry to see the index go.


Feb 24, 2011
11:15 AM  
Noah wrote:

May I send you money to continue your subscription (and this website)? I checked it every day and now I'm kind of bummed...


Feb 24, 2011
3:02 PM  
Andy Baio wrote:

Noah: Sorry, I'm out of the indexing game! Hopefully, someone reading this will step up to the plate.


Feb 25, 2011
1:59 PM  
Adam wrote:

Curious to see that the first few days of The Daily's web content was actually text-based (you could copy and paste it), but a few days into it they switched to just posting screenshots from the app instead.


Feb 25, 2011
2:36 PM  
Andy Baio wrote:

Yeah, that's part of why I wasn't interested in continuing the index. Originally, the index helped search engines index the content, but there's nothing much to index with screenshots alone.


Feb 25, 2011
4:49 PM  
pascal wrote:

And you've pointed out a common security issue: the Daily app should have been checking the identity of the server's certificate, but instead it just made sure there was any kind of encrypted connection. So if any of the certificates in the iPad's store is breached, the app is vulnerable (I don't have it, but I'm guessing unlike in a browser, it doesn't allow the user to check the server's identity?)


Mar 3, 2011
7:50 AM  
moose wrote:

Can someone make a Yahoo pipes feed for The Daily? Wouldn't that make more sense?


Mar 31, 2011
12:51 AM  
rash wrote:

Yes, feeds can be more easily indexed than pages. It's a good idea, moose.


 







Andy Baio lives here. Some rights reserved, for your pleasure.