Like the rest of the world, I've been completely obsessed with the presidential election and nonstop news coverage. My drug of choice? Gabe Rivera's Memeorandum, the political sister site of Techmeme, which constantly surfaces the most controversial stories being discussed by political bloggers.
While most political blogs are extremely partisan, their biases aren't immediately obvious to outsiders like me. I wanted to see, at a glance, how conservative or liberal the blogs were without clicking through to every article.
Working with del.icio.us founder Joshua Schachter, I used a recommendation algorithm to score every blog on Memeorandum based on its linking activity over the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets and colorize Memeorandum on the fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing stronger biases. Check out the screenshot below, and install the Greasemonkey script or standalone Firefox extension to try it yourself.
Note: The colors don't necessarily represent each blogger's personal views or biases. They reflect linking activity. The algorithm looks at the stories each blogger has linked to before, relative to all other bloggers, and groups them accordingly. People who link to things that only conservatives find interesting will be classified as bright red, even if they are personally moderate or liberal, and vice versa. The algorithm can't read minds, so don't be offended if you feel misrepresented. It's only looking at the data.
For example, while Nate Silver of FiveThirtyEight may be a Democrat, he has a tendency to link to stories conservative bloggers are discussing slightly more often than liberal bloggers, so he's shaded very slightly red. (Geeks can read on for more details about how this works.)
After it's installed, go to any page on Memeorandum and wait a second for the coloring to appear. I hope you like it!
How It Works (Nerds Only)
The first challenge was getting the data. I emailed Gabe Rivera, and he graciously offered a full dump of every blog listed on Memeorandum. This didn't include relationship data showing which blogs linked to which stories, so Joshua and I crawled the site instead. Using the historical archives, we took a snapshot of the site's homepage every six hours for the last three months, about 360 in total. With a Python script, Joshua scraped the links from the saved HTML to get the link data.
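For the curious, here's a minimal sketch of what that kind of scraper could look like. It is not Joshua's script. The markup in SAMPLE_SNAPSHOT and the class names "headline" and "discussion" are invented for illustration and are not Memeorandum's real HTML; the helper names are hypothetical too.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical markup (an assumption, not Memeorandum's real HTML): each story
# has one <a class="headline"> followed by <a class="discussion"> links to the
# blogs discussing it.
SAMPLE_SNAPSHOT = """
<div class="item">
  <a class="headline" href="http://example.com/story-1">Story one</a>
  <a class="discussion" href="http://blog-a.example/post">Blog A</a>
  <a class="discussion" href="http://blog-b.example/post">Blog B</a>
</div>
<div class="item">
  <a class="headline" href="http://example.com/story-2">Story two</a>
  <a class="discussion" href="http://blog-b.example/other">Blog B</a>
</div>
"""


class LinkPairParser(HTMLParser):
    """Collects (blog host, story URL) pairs from one saved snapshot."""

    def __init__(self):
        super().__init__()
        self.current_story = None
        self.pairs = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        if attributes.get("class") == "headline":
            # Remember which story the following discussion links belong to.
            self.current_story = href
        elif attributes.get("class") == "discussion" and self.current_story:
            self.pairs.add((urlparse(href).netloc, self.current_story))


def extract_pairs(html_text):
    parser = LinkPairParser()
    parser.feed(html_text)
    return parser.pairs


pairs = extract_pairs(SAMPLE_SNAPSHOT)
for blog, story in sorted(pairs):
    print(blog, "->", story)
```

Collecting the pairs in a set means a story that shows up in several consecutive snapshots only counts once per blog.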
Armed with a spreadsheet of over 50,000 blogger-to-article relationships, we needed to find correlations in the data. We used a technique called Singular Value Decomposition (SVD), which breaks a matrix of complex data down into its component parts. It's extremely flexible, used in applications as diverse as weather prediction, movie recommendations, genome modeling, clustering search results, and image compression.
Inspired by GovTrack's use of SVD to visualize the political spectrum for members of Congress, we attempted to do the same thing for political blogs.
Here's how Joshua describes the methodology:
I created an adjacency matrix, with discussion sites as the rows and the discussed articles as the columns. When a site discusses an article on Memeorandum, we fill in a 1 in that cell; everything else is left as zero.
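To make the adjacency matrix concrete, here is a small sketch in plain Python. The site and story names are made up, and build_adjacency_matrix is a hypothetical helper, not code from the project.

```python
def build_adjacency_matrix(pairs):
    """Rows are sites, columns are articles; a cell is 1 if the site discussed it."""
    sites = sorted({site for site, _ in pairs})
    articles = sorted({article for _, article in pairs})
    row_of = {site: i for i, site in enumerate(sites)}
    column_of = {article: j for j, article in enumerate(articles)}

    # Start with all zeros, then fill in a 1 for every observed link.
    matrix = [[0] * len(articles) for _ in sites]
    for site, article in pairs:
        matrix[row_of[site]][column_of[article]] = 1
    return sites, articles, matrix


# Made-up example data.
example_pairs = [
    ("blog-a.example", "story-1"),
    ("blog-a.example", "story-2"),
    ("blog-b.example", "story-2"),
    ("blog-c.example", "story-3"),
]

sites, articles, matrix = build_adjacency_matrix(example_pairs)
for site, row in zip(sites, matrix):
    print(site, row)
```

With tens of thousands of links, most cells stay zero, so a real run would likely want a sparse representation rather than nested lists.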
Every site becomes a very high dimensionality vector into link-space. This is very difficult to visualize. (Unless your monitor displays many dimensions. Mine only has two.) Since a bunch of sites tend to link to the same groups in the same way, we don't need all those dimensions. So, very roughly, what SVD lets us do is reproject the points in space into a new coordinate system, so that the points that are similar are near each other and we know which dimensions are most important. We can take just the most significant ones.
We could use two or three for a nifty visualization, but we wanted to show the bias as a spectrum, which is just a single dimension. In this case, the second most significant dimension (v2) ends up corresponding to linking similarity. The first dimension (v1) corresponds to how much linking they do in general.
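Here is a toy illustration of the v1 and v2 idea, under some loud assumptions: the four sites and seven stories are invented, and instead of a full SVD routine (the post doesn't say which software was used) this sketch approximates the two most significant site dimensions with power iteration on the matrix times its transpose. The function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def top_site_dimensions(matrix, k=2, iterations=200, seed=0):
    """Approximate the first k left singular vectors of `matrix`.

    Left singular vectors of A are eigenvectors of A * A^T, so we run power
    iteration on that product and project out the vectors already found.
    """
    n = len(matrix)
    gram = [[dot(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(n)])
        for _ in range(iterations):
            w = [dot(row, v) for row in gram]
            for u in found:
                overlap = dot(w, u)
                w = [x - overlap * y for x, y in zip(w, u)]
            v = normalize(w)
        found.append(v)
    return found


# Invented data: two sites share one set of stories, two share another,
# and everyone links to the last story.
site_names = ["left-a", "left-b", "right-a", "right-b"]
matrix = [
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]

v1, v2 = top_site_dimensions(matrix)
for name, a, b in zip(site_names, v1, v2):
    print(f"{name}: v1={a:+.3f} v2={b:+.3f}")
```

In this toy data, v1 has the same sign for every site (it tracks overall linking), while v2 splits the two camps onto opposite sides of zero. Which camp lands on the positive side is arbitrary, so the direction has to be checked by hand against a few sites whose leaning is known.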
Curiously, when running the exact same analysis on Techmeme, the second most significant dimension ends up being Business vs. Technology. (The conservatives/liberals of the geek world?)
Did you get all that? If you'd like to try to figure out what the other dimensions represent, take a look at columns v3-v5 on the full spreadsheet below and let us know if you come up with anything. (We didn't have much luck.)
Once we realized that the second dimension (v2) correlated highly with political leaning, we uploaded the data into Google Spreadsheets and created a new column with a normalized score, scaled to a range of -1 to 1. The spreadsheet, with all of the sources and their respective scores, is below. (Download the Excel document or CSV if you want to sort or filter the data.)
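One simple way to do that kind of scaling is sketched below. It assumes the normalization divides every v2 value by the largest absolute value, which keeps zero at zero; the post doesn't spell out the exact formula used in the spreadsheet, and the numbers here are made up.

```python
def normalize_scores(raw_scores):
    """Scale scores into [-1, 1] by dividing by the largest absolute value."""
    largest = max(abs(value) for value in raw_scores.values())
    if largest == 0:
        # Every site scored exactly zero; nothing to scale.
        return {name: 0.0 for name in raw_scores}
    return {name: value / largest for name, value in raw_scores.items()}


# Made-up raw v2 values for three hypothetical sites.
raw_v2 = {
    "blog-a.example": 0.12,
    "blog-b.example": -0.03,
    "blog-c.example": -0.24,
}

scores = normalize_scores(raw_v2)
for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
```

Dividing by the largest absolute value means the most extreme site lands at exactly -1 or 1, and a site with no lean stays at 0.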
After deriving the scores, writing the Greasemonkey script was straightforward. Google offers XML feeds for Spreadsheets, so I queried this public feed of our data using XMLHttpRequest, parsed it, and colored each source based on its score.
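The real script is JavaScript, but the parse-and-color logic can be sketched in a few lines of Python. Everything specific here is an assumption: the XML in SAMPLE_FEED is a simplified stand-in and not Google's actual feed format, the convention that negative scores are left-leaning is a guess, and the exact color ramp is invented.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the spreadsheet feed (NOT Google's real schema).
SAMPLE_FEED = """<feed>
  <entry><source>blog-a.example</source><score>-0.8</score></entry>
  <entry><source>blog-b.example</source><score>0.25</score></entry>
  <entry><source>blog-c.example</source><score>0</score></entry>
</feed>"""


def parse_scores(feed_xml):
    """Return a {source: score} dict from the feed text."""
    root = ET.fromstring(feed_xml)
    return {
        entry.findtext("source"): float(entry.findtext("score"))
        for entry in root.iter("entry")
    }


def score_to_color(score):
    """Map a score in [-1, 1] to a hex color.

    Assumed convention: negative is left-leaning (blue), positive is
    right-leaning (red). Stronger scores fade less toward white.
    """
    strength = min(abs(score), 1.0)
    faded = round(255 * (1 - strength))
    if score < 0:
        return "#{:02x}{:02x}ff".format(faded, faded)
    return "#ff{:02x}{:02x}".format(faded, faded)


colors = {name: score_to_color(s) for name, s in parse_scores(SAMPLE_FEED).items()}
for name, color in colors.items():
    print(name, color)
```

A score of exactly zero comes out white, so sources with no measurable lean are left uncolored.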
If you have any improvements to the code, please pass them on by emailing me or IMing me using my contact information at the top of the page.
I'd love to know what dedicated Memeorandum fans think of this. For me, it makes the site much easier to skim. At a glance, I can see what left-wing and right-wing bloggers each find interesting and, more importantly, when there's an article that's of genuine interest to both parties. It's also interesting to quickly see which bloggers cross party lines, willing to link to stories that don't favor their own candidates.
I hope you like it, and please contribute your changes to make it better!
Puffinware's SVD tutorial is one of the most concise, coherent explanations of SVD I could find for the layman. Ilya Grigorik applied SVD to build a recommendation system in Ruby, with great explanations and source code. Simon Funk explains how he used SVD to tie for third on the Netflix Prize leaderboard (for a short time).
For those interested in network analysis of the political blogosphere, this 2005 academic paper by Lada Adamic and Natalie Glance looked at bloggers during the 2004 election. It's two very separate worlds, as shown in this chart. Also, Microsoft Research's Blews and Political Streams projects visualize bias and emotion for stories surfacing on political blogs.
This was my first Greasemonkey script, and I found Mark Pilgrim's Dive into Greasemonkey invaluable. I highly recommend writing a couple scripts yourself; it's incredibly empowering to modify other people's websites.
A special thanks to Gabe Rivera for building Memeorandum and Techmeme and for supporting this little project.
October 10: J. Chris Anderson built a bookmarklet for use with non-Firefox browsers, or by anyone who just wants to test it out without installing an extension. This also has the benefit of working on sites beyond Memeorandum, like Google News. (Though, of course, it will only color sites that appear in our spreadsheet.)
October 11: Brendan O'Connor compared our unsupervised machine-derived rankings to human judgments of political bias on Skewz, and found there's a significant correlation. He released the code and full dataset on his entry.