This morning, I woke to the news that Archive Team is working to save Upcoming. This is the Internet equivalent of hearing that Marsellus Wallace is sending The Wolf.
For those unfamiliar, Archive Team is a band of rogue archivists and programmers working to rescue dead and dying websites from destruction. To put it mildly, they are very good at what they do.
Led by computer historian/documentary filmmaker Jason Scott, they’ve saved massive sites like GeoCities, Friendster, MobileMe, Fortune City and many others from deletion, and collaborate with the Internet Archive to inject their backups into the Wayback Machine for permanent preservation.
The importance of their work can’t be overstated. While companies like Yahoo work to destroy as much Internet history as possible, Archive Team is the only group actively trying to save it.
To assist their efforts, they’ve developed ArchiveTeam Warrior, a virtual appliance that makes it easy for anyone to help archive dying websites and upload the backups to their server.
Want to help? Install Warrior right now.
It’s dead simple to get up and running, and works on Windows, Mac, and Linux. And because it all runs in a virtual machine, it can’t possibly hurt your system. It will only use your bandwidth and disk space.
After it’s installed, you can choose the “Upcoming” project to start backing up Upcoming.org specifically, or pick “ArchiveTeam’s Choice” to let the team decide. Posterous and Formspring are also dying soon, so letting the team decide allows them to prioritize your work.
I made a little video showing how easy it is to start saving Internet history.
You can track the status of the Upcoming archiving effort in real time. It’s currently at around 6% of the complete site.
And again, thanks to all the dedicated volunteers of Archive Team for their effort.
Update (April 23): Three days later, the Upcoming archive is complete. Every event, venue, group, and user page is currently being compressed and uploaded in batches to the Internet Archive. Truly amazing.
My next step: to parse the HTML and extract structured data, distribute that database, and build something off it to make the community-contributed material accessible after Yahoo shuts it down.
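As a rough illustration of what that parsing step could look like, here’s a minimal Python sketch that pulls the page title and any meta tags out of one archived HTML page using only the standard library. The `EventPageParser` class, the `extract_record` helper, and the sample markup are all hypothetical; the real Upcoming pages have their own structure and would need their own extraction rules.

```python
from html.parser import HTMLParser


class EventPageParser(HTMLParser):
    """Collect the <title> text and <meta> name/content pairs from one page.

    Hypothetical helper: it assumes nothing about Upcoming's real markup.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            # Meta tags are keyed by either "name" or "property".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept.
        if self._in_title:
            self.title += data


def extract_record(html):
    """Return a small dict of structured data for one HTML document."""
    parser = EventPageParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}
```

Running something like this over every saved page would give one record per page, which could then be loaded into a database. Event-specific fields such as dates or venues would need rules written against the actual archived markup.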
Just an FYI for those that have spare boxes/instances lying around, you can just run the script separately.
Here’s how to get it running on Ubuntu (runs on :8001, I haven’t tested w/ virtualenv, don’t run this willy nilly on production boxes unless you know what you’re doing obvs)
but just yesterday you told us that upcoming was mostly spam since 2009….what needs saving?
Regarding the spam, it seems to me that it would contain interesting information. For example, what was the spam trying to sell (or otherwise) and how did that change over time? You can learn a lot about a culture by going through its garbage dumps.
Moreover, not *everything* on Upcoming is spam (especially in its early days). May as well get the whole shebang now, before it’s gone forever, then filter it later.
Here’s an example of some of the internal Yahoo dysfunctional bullshit.
This guy is so entitled, not realizing the love that people put into Upcoming.
http://pastebin.com/37e4eeg7
Similar to LHL’s notes, if you also want to run this on extra cloud instances of Heroku, you can do so for free. I hacked together some quick instructions and a script to help here:
https://github.com/mroth/upcoming-cloud-warrior/
This is particularly handy if you use a laptop and it’s not always online, or just want to maximize your contribution.
I’m currently running it on 3x dynos and it’s chugging along pretty fast.
@anonymous April 20, 8:05pm
Wow. I keep wanting to give Yahoo a sixth or seventh chance because of the good old days. But they keep making that hard to do, with crap like this.
Dang, VMWare won’t let me run the archive warrior machine after importing it. Something about no master IDE drive (I have SSD/Fusion in this iMac) and it just dies.
Thanks, I’m running the software now. I’m actually in the middle of moving a friend’s blog from Posterous and didn’t know the effort was under way. It’s a big relief to know that the old site may stay up, even if only to point people to the new one.
I buy my own email, recently signed up for Pinboard, and am kind of actively looking for other services I can pay for. We can buy & build our way out of a nightmare future of Facebook, Google, etc. pwning us all.
I used an @mailinator address the other day while posting a comment (and am doing so again) to retain my anonymity – is there a reason the comment wasn’t posted? Did it get caught up in the filter?
I usually remove comments that add little to the conversation, and especially substance-free comments that are entirely anonymous. Go post it on Hacker News or use your real name.
Andy, I think you’re a great guy and understand your frustrations – I even downloaded Warrior and am doing my best to preserve Upcoming, because I think on a personal level that it’s important to keep as many artifacts from our culture as possible. For professional reasons, I can’t include my name; however, I thought by including the quote from the Y! email thread, the conversation could be furthered because they’re legitimate points outside of the crappy delivery by the original author.
In addition to that, Yahoo is facing a huge lawsuit currently for releasing user data in 2006 to some researchers without properly scrubbing it. Of course they’re wary of providing any internal info regarding users and using company resources for a product that didn’t work is rightfully within the realm of legitimate management concern.
I’ve repasted a truncated version of the comment below because I think it’s an important topic for founders/users/readers to consider. Data retention isn’t something most people think about when selling, and if it is part of their ethos, they need to have it written into the contract.
“I guess I didn’t realize that the stockholders are under some obligation to pay for the servers and electricity to keep upcoming, geocities, etc. online FOREVER.
He could have included provisions about what happened if Y! shut down upcoming in his contract, but he apparently didn’t. Too bad!
I agree with one thing he says. If he couldn’t live with all of the possible consequences of selling to Yahoo, he shouldn’t have done it.”
Woohoo! Archive Team successfully finished archiving all the events that they generated IDs for: http://tracker.archiveteam.org/upcoming/