April 17, 2009 CMU's ClueWeb09, 1 billion web page crawl available for researchers: massive 25 terabyte dataset (roughly 5 terabytes compressed) shipped on four 1.5 terabyte drives; get this up on AWS!