
For the last ten years, obsessive record collectors in Usenet have been working on the Whitburn Project — a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they've created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song's duration, beats-per-minute, songwriters, label, and week-by-week chart position. It's 25 megs of OCD, and it's awesome.
As far as I know, this is the first time the project and its data have ever been discussed outside of Usenet. Despite its illegality, they've created a wonderful resource and you can do some fun things with the data. For the next three days, I'm going to publish some analysis and insights gleaned from their work. Update: I published an entry about one-hit wonders and pop longevity.
History of the Whitburn Project
Named after Joel Whitburn and his authoritative Billboard books, the Whitburn Project began in 1998, when a group of 15 collectors pooled their resources to create an MP3 collection of every single in the top 40. They experimented with trading the files on P2P networks, but eventually landed in Usenet instead.
The Excel spreadsheets were created to help them verify their collections were complete, with new versions updated and re-uploaded to the newsgroups weekly. Later, other collectors found the spreadsheet and built tools on top of it, including a utility to rename files properly and locate missing songs.
Originally, most of the Whitburn Project was simple data entry and fact-checking, but as the project grew, it forked away from the Whitburn books. "This spreadsheet does not reflect the Whitburn information found in his books," wrote Bullfrog, one of the spreadsheet's maintainers. "Whitburn has changed the way he numbers the annual songs at least twice since this [spreadsheet] was created. We feel that he went off the deep end a little, so will not be following his new numbering scheme."
They've also added new fields culled from their own research. "Obviously with the addition of BPM, genre, and the like," wrote Bullfrog, "it has become its own entity and will continue to be from now on."
Over the last few months, I've tried multiple times to contact the maintainers of the spreadsheet and the excellent Whitburn newsgroup FAQ, but they haven't responded.
The Data
There are several Whitburn spreadsheets uploaded to multiple Usenet newsgroups sporadically, but the most useful is the "Billboard Pop ME (1890-2008)," which is posted in alt.binaries.sounds.whitburn.pop.
Note: This data is almost certainly a violation of Billboard's copyright, and probably infringes on Record Research's books too. The analysis I'm publishing here should fall under fair use, but redistributing the spreadsheet would not. If you're brave (or dumb) enough to locate and mirror a copy of the file, leave a comment. Update: An anonymous commenter posted the spreadsheet to Rapidshare/Megaupload.
Above is a sample of the top 10 songs from 2007, so you can see the format and fields of the collected data, along with the key explaining each column. (Scroll to the right to see all the fields.)
Song Lengths Over Time
I'll be focusing more on analysis tomorrow, but here's one of the first questions I asked when stumbling on this spreadsheet. Are pop songs longer or shorter now than in previous decades? A quick query reveals this chart of average playtimes per year.

Pop songs became shorter in the early 1960s, around the 2:30 mark, before rising yearly until peaking in 1992 at 4:16. Since then, pop songs have hovered around 4 minutes long.
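A rough sketch of how a per-year average like this could be computed in Python. This is not the actual query behind the chart: it assumes the spreadsheet has been reduced to (year, play time) pairs with the time written as `m:ss`, and the sample rows below are made up purely for illustration.

```python
from collections import defaultdict

def to_seconds(playtime):
    """Convert an 'm:ss' string such as '4:16' to seconds."""
    minutes, seconds = playtime.split(":")
    return int(minutes) * 60 + int(seconds)

def to_playtime(total_seconds):
    """Format a number of seconds as 'm:ss', rounded to the nearest second."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"

def average_playtime_by_year(rows):
    """Return {year: average play time in seconds} for (year, 'm:ss') rows."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of seconds, song count]
    for year, playtime in rows:
        totals[year][0] += to_seconds(playtime)
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}

# Hypothetical sample rows, for illustration only (not taken from the spreadsheet).
sample_rows = [
    (1962, "2:20"),
    (1962, "2:40"),
    (1992, "4:10"),
    (1992, "4:22"),
]

averages = average_playtime_by_year(sample_rows)
for year, seconds in averages.items():
    print(year, to_playtime(seconds))
```

Converting to seconds before averaging matters, since `m:ss` strings can't be averaged directly; the result is only formatted back to `m:ss` for display.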
The longest charting song of all time is Harry Chapin's live version of "A Better Place to Be," at an epic 9 minutes and 30 seconds. Runners-up include Guns N' Roses' "November Rain" (8:56), Don McLean's "American Pie" (8:36), and a new entrant, Death Cab for Cutie's "I Will Possess Your Heart" (8:35).
And the shortest? The Womenfolk's cover of Malvina Reynolds' "Little Boxes" from 1964 is only 1 minute and 3 seconds long. The shortest modern song to chart is Zac Efron's "What I've Been Looking For," the third-shortest charting song of all time at a brief 1:19.
How about the length of the perfect pop song? For this, we can look at the mode to find the most common song lengths by decade. For example, in the 1940s, there were 42 songs that were exactly 3:01, making it the perfect song length for that decade.
1950s, 2:30 (95 songs)
1960s, 2:30 (250 songs)
1970s, 3:30 (153 songs)
1980s, 3:59 (142 songs)
1990s, 4:00 (132 songs)
2000s, 3:50 (58 songs)
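For anyone who wants to reproduce a table like this, here is a minimal sketch of finding the mode per decade in Python. It is an illustration rather than the method used for the numbers above: it assumes rows of (year, play time) with the time as an `m:ss` string, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def most_common_length_by_decade(rows):
    """Return {decade: (playtime, count)} with the most frequent 'm:ss' per decade."""
    lengths = defaultdict(Counter)  # decade -> Counter of play time strings
    for year, playtime in rows:
        decade = year // 10 * 10  # e.g. 1964 -> 1960
        lengths[decade][playtime] += 1
    return {
        decade: counter.most_common(1)[0]
        for decade, counter in sorted(lengths.items())
    }

# Hypothetical sample rows, for illustration only (not taken from the spreadsheet).
sample_rows = [
    (1961, "2:30"),
    (1964, "2:30"),
    (1968, "2:45"),
    (1983, "3:59"),
    (1987, "3:59"),
    (1989, "4:05"),
]

modes = most_common_length_by_decade(sample_rows)
for decade, (playtime, count) in modes.items():
    print(f"{decade}s, {playtime} ({count} songs)")
```

Note that the mode is taken over exact `m:ss` values, so 3:59 and 4:00 count as different lengths. If two lengths tie within a decade, `most_common` returns whichever was seen first.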
I was surprised at how exact these numbers are. The capacity for 45 RPM records was about three minutes, setting the standard for pop singles well into the 1960s. By the late 1960s, those constraints were removed, and we start to see longer singles. But without artificial constraints, why did exactly four minutes become the de facto standard in the 1980s and 1990s? (Maybe Madonna knows.)
I'm tired. More analysis tomorrow, including a look at one-hit wonders and how quickly singles fall off the charts over time. Update: Here it is.

Waxy.org is the sandbox of Andy Baio, an
independent journalist and programmer living in
Portland, Oregon. I created 
5:10 AM
This is a labour of love, and a very impressive piece of work.
I think there is a useful comparison to be made here between freely available information of this sort and non-DRM-protected (or cracked) books, games, and audio files.
In decades and centuries to come, databases such as this one will become an invaluable resource for historians, whereas their proprietary equivalents will languish in Billboard's (or any similar company's) archives, if such companies even continue to exist.
Similarly, protected media is certainly advantageous from a short-term profit-making point of view, but what cultural relevance will it have for future generations once licence servers have been shut down and its contents have been rendered inaccessible by encryption? How can anything so ephemeral pass into the canon of human achievement?
Another point that crossed my mind yesterday was that as storage space increases and the file-size of audio remains relatively constant, we may well get to the stage where perfect copies of record companies' entire catalogues (of verifiable quality and authenticity thanks to lists such as these) are traded as easily as text files. (It is true that private torrent sites are getting us at least some of the way there). To me, that is both an exciting and daunting thought.
5:52 AM
Very interesting..
It'd be interesting to see the maximum length of time a track has been in the charts per decade.. I.e. - do we tire of our pop songs more quickly now than we did in the past?
Also, what was the YouTube link to? It's not available for those of us in the UK.
7:09 AM
It's a Madonna song called "4 Minutes," which, curiously enough, has very little of Madonna singing and much more from two random guys.
7:17 AM
I wonder why songs became as short as they did in the late 1950s, when they had been 30 seconds longer ten years prior. Because it was fashionable, and what other recording artists were doing at the time?
It's also interesting to see that songs started becoming longer from the start of the 1960s, when the likes of the Beatles and the Rolling Stones were coming onto the scene. Coincidence?
7:53 AM
Great find, Andy, thank you for posting about it. These guys are librarians, what they're doing is cataloging and archiving media. It's a valuable service. I'm sure you're correct about the copyright concerns of the project, but it's a shame that the discussion has to be framed that way.
7:54 AM
the underlying data may not be protected by copyright, so let's not pull the "this is illegal" trigger so quickly.
here's an interesting discussion of databases and copyright:
http://www.bitlaw.com/copyright/database.html
8:10 AM
I can't help but think that this effort would be better spent at Musicbrainz.org - Excel spreadsheets don't have APIs
10:41 AM
hey what happens when you click on my name
10:59 AM
Clare: Yeah, I've read that thoroughly. The spreadsheet is a compilation, which remains copyrighted. Also, it's more than just a database of facts. Billboard is tracking the retail/online sales and radio play themselves (i.e. Nielsen SoundScan) and using their own method to determine rankings, so it meets the "industrious collection" requirement, as well.
a birdy: I can't condone it, but I can verify that the link you posted is the same version of the spreadsheet I used for my analysis.
11:30 AM
a birdy : thank you - you are like a little birdy visiting my window and singing a sweet song.
andy : fantastic - have never heard of this list - the song length data is fascinating.
...and to the compilers of this list, I tip my hat to you all.
1:48 PM
this is amazing. thank you, and thanks to the newsgroup participants.
7:14 PM
That is so awesome!
I wonder how large the mp3 collection of the spreadsheet gets, tho, come to think of it, they prolly trade it in FLAC.
7:21 PM
John Lampard: I wonder why songs became as short as they did in the late 1950s, when they had been 30 seconds longer ten years prior. Because it was fashionable, and what other recording artists were doing at the time?
My uneducated guess: maximizing consumers' enjoyment of their newfangled rocking and rolling by speeding up tempos. Faster = more fun to dance to, at least to a point.
7:45 AM
I did receive an email from Andy in April. I have been so busy that to be honest, I totally forgot about it. You will be getting a response shortly. As for the copyright, you are probably right. However, if you subscribe to Billboard (I have for over 30 years now), you will find that any old data is no longer available. Questions posted to Fred Bronson from Chart Beat show that old data is no longer available. Even some of the charts you can get online at Billboard do not match what was in the actual magazine (due to updated or corrected data I think). Some of this data is not available anywhere else, like which charted singles were mono and which were stereo in the late 60s, early 70s (the main project I am now on).
As for Whitburn, don't even get me started there. To me he has never been anything more than another collector who happened to put out a book. The spreadsheet only uses his numbers because they are widely known. His numbers were used at the very beginning of the project, and it has grown from there. Believe me, if I could I would change them. No one else in the industry ranks records the same as he does. Also, the times he uses in his annuals are not always correct. He puts the time that is shown on the label. We are slowly verifying every 45 and putting actual play time. Pat Downey does this, but his data is incomplete and only covers top 40.
As Andy has shown, it is easier to extract data (the time differences are interesting) from one location than trying to gather it all up from many sources.
For those interested, the spreadsheet is updated weekly and posted on Usenet every 2 to 3 weeks.
3:22 PM
Which newsgroup?
---
For those interested, the spreadsheet is updated weekly and posted on Usenet every 2 to 3 weeks
5:15 PM
I found it in alt.binaries.sounds.whitburn.pop, but it's posted in the rest of the alt.binaries.sounds.whitburn.* groups. If your Usenet provider doesn't have those, it's also in alt.binaries.sounds.1950s.mp3, alt.binaries.sounds.1960s.mp3, alt.binaries.sounds.1970s.mp3, alt.binaries.sounds.1980s.mp3, and alt.binaries.sounds.1990s.mp3.
8:54 AM
Andy,
Is there any master lookup for song titles? A separate website not related to the Whitburn behemoth that's easy for a Mac user to use?
Thanks,
Harvey
8:58 AM
I'm on a Mac, and the Excel spreadsheet works fine for me. I'd love to provide a lookup service, but that'd likely be a violation of Billboard's copyright.
11:15 AM
Thanks, Andy...
9:43 AM
Andy, this is great stuff, thanks for bringing this to my attention and for your interesting and thoughtful analysis.
2:22 PM
Thanks Andy and anyone else involved with this.
10:04 PM
Wonder if anybody in this group may have some info on Bobby Lewis' " Tossin' and Turnin' ".
On one version, the line... " Baby, Baby, You did something to me " opens the song.
On the other ( Australian release ? ) that line is absent.
Both sound like the same 1961 version.
Would appreciate any insight into the history.
Thanks.
12:06 AM
"How about the length of the perfect pop song?...
But without artificial constraints, why did exactly four minutes become the de facto standard in the 1980s and 1990s?"
There is no such thing as the perfect pop song length per se. You can keep repeating a song you like endlessly without being bored for a long time... everyone does it or has done it at least once for sure. If there's a hidden magic number, at least it's not musically related.
Here is the big reason for the song length patterns:
Another artificial constraint not spoken of but used I bet, since the beginning:
Advertising space
Radio and TV stations don't survive on thin air, so they need to include advertising. Probably something along these lines happened:
"How long can people tolerate ads before changing to another station?" They went and worked out some statistics, I guess, and came up with 25% of air time for advertising. Somebody else may know better than me how they came up with this number.
Now, "that's 15 minutes per hour, what do we do with the other 45 minutes?"
Distract people from the fact that this is merely business by playing some songs
"how many songs do we need to play?, it does not matter as long as they are entertained"...
... wait a minute, that's not true, if you play one or two 8 minute songs one after another, are people going to wait till they are bored or will change station too?
"we need to play only short songs, we have a deal!"
Or not?
"Hey, somehow we can't structure correctly our programs to one hour blocks, why?"
Easy one: how many songs have the same length in minutes? Barely any. It is not in the nature of any form of art to be constrained in that way, period.
"Pfffffff... hey you record company, fix this!".
And the term "radio edit" was born. They fixed things by shortening songs, taking out some bars from the song... that solo nobody understands, the boring intro, the overextended chorus, the cinematic grand finale, etc., whatever the song does "not need".
And this reached the musicians: "either you fix the song or they will not play it, that's it."
I got this speech from a producer not too many years ago, it's true.
End result: musicians have to think of a song in terms of radio playability too; it's not just a matter of art anymore.
At this point somebody probably is asking why the length changes then if all was already worked out in favor of radio stations?
Have a look at the table again:
1950s, 2:30 (95 songs)
1960s, 2:30 (250 songs)
Up until now the constraint was the size of the 45 RPM vinyl records, it's true.
But it's also true that modern music as we know it was still in diapers and just starting to evolve. New, never-before-heard music and new genres were being created everywhere in the world, so there was no need for lengthy songs. Wherever you looked, there were a couple of good songs to add to the radio station rotation.
1970s, 3:30 (153 songs)
1980s, 3:59 (142 songs)
1990s, 4:00 (132 songs)
2000s, 3:50 (58 songs)
And then song lengths increase in a steady way, but still you can see the time length controlled pattern, no matter the decade.
The answer to this behavior is simple as well:
As the genres grew old, there were far fewer good new songs to pick from, so air time had to be filled with longer songs more and more each decade.
The decrease in talent or creativity is another big subject to discuss, and radio bears a lot of the blame for this too.
I know it's kind of hard to take these facts into consideration, but they certainly give another perspective on the project, and can modify some of the results.
Nevertheless, great project! ;)
10:40 AM
Great stuff! The dream (admittedly a silly one that will not really bring happiness, but that's OCD for you) would be to have instant access to lossless copies of every one of these songs (plus all variations and remixes of them) at any location at any time. Then add instant access to any music videos or live performances, plus the same thing for every album that ever charted ... and every song on EVERY other chart ... maybe of every country. I wonder how much hard drive space THAT would take up?
11:26 AM
It's really not that crazy. If you ballpark an average 40MB per song, getting lossless versions of all 37,000 songs would be 1.4 terabytes of data. That sounds like a lot now, but you can buy a terabyte hard drive for under $200 now. That'll only get cheaper.
And if the project moved from Usenet to BitTorrent, a single torrent (or set of torrents) could manage all the files and let people fill/replace their missing pieces.
11:37 AM
Oh yeah, and lyrics too, of course. And karaoke versions and everything else you could think of, all instantly available from anywhere.
11:47 AM
Sorry, Andy, my comment at 11:37am was made without having seen your comment at 11:26am. However, your comment does not acknowledge the extras in my 10:40am comment: INSTANT access to all of these songs PLUS: every song on every Top 100 album ever, every music video, every song on all the other Billboard charts (R&B, country, many more), every song on every other nation's major charts, every other version/variation of each song (there are often many!), etc. [Plus karaoke versions, lyrics, parodies, what have you.] Petabytes!
12:00 PM
Give it time.
1:10 PM
How many years do you think? 10? 20? Until all this media should be available instantly by portable device (cellphone or whatever technology comes next). And, of course, you'd want every episode of every TV show ever, plus every movie ever, etc. It would be cool to live to enjoy that, but of course it won't mean humans are one iota happier.
Anyway, thanks a bundle for the info! I'd first encountered this spreadsheet in late '05 but sort of forgot about it until now. I think I'll normalize the data into a set of database tables and play around with it (create queries and the like). I know it's not OK to go public with it, but I hope it's OK for personal use.
12:24 PM
Hi Andy,
I've just namechecked your tireless work at my website musicradar.com - on the back of a recent UK survey that finds 44% of 18-24 year olds skip a song after 30 seconds. That's iPod culture gone mad. In 5 years time, even the best artists may be just writing jingles just to get heard...
Great project, it's fascinating stuff.
10:34 AM
Incredible data. I have Whitburn's books for the '60s, '70s, and '80s, and do not regret having purchased them.
Putting the data in a spreadsheet makes it more researchable and easier to use. Needless to say, pulling out the books when I need a quick check of a fact, or what songs were charting when, tends to be a pain.
Thanks for the information!
12:50 PM
I just found this discussion. What a great piece of work, reliable chart information from before 1940 is really hard to find.
My site combines music charts from all over the world to do a similar thing for the world's music. I think that you guys would find it interesting.
11:38 AM
Is there anywhere I can find correct times of all 45 RPM records from Billboard's Hot 100 history?
6:58 AM
I was wondering if anyone is working on doing this complete Billboard collection of all original 45s rather than a mix of 45s and digital CD? Thanks
8:13 AM
Can someone please tell what the A-S-X tab on the spreadsheet represents? I can't seem to find an explanation anywhere. thanks!
1:44 PM
Jim wrote:
Can someone please tell what the A-S-X tab on the spreadsheet represents? I can't seem to find an explanation anywhere. thanks!
A = Air Play
S = Sales
X = Christmas
1:57 PM
O.T. wrote:
Is there anywhere I can find correct times of all 45 RPM records from Billboard's Hot 100 history?
As far as I know there is no complete official list. Since you can't believe the labels themselves or Whitburn's books, I guess the only way is to listen and time them yourself.
O.T. wrote:
I was wondering if anyone is working on doing this complete Billboard collection of all original 45s rather than a mix of 45s and digital CD? Thanks
Easy answer is yes, it is being tried. Hard part is trying to find all the original 45s, 78s, single cassettes, and single CDs. If you want the collection to be authentic then there must be a way of checking each entry. Anyone can say this is an original 45 rip; try proving it.