ProPublica's guide to scraping data

Using free tools to get structured data out of messy HTML, PDFs and Flash