Waxy.org
Waxy.org is the sandbox of Andy Baio, an independent journalist and programmer living in Portland, Oregon. I created Upcoming.org and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

The Whitburn Project: 120 Years of Music Chart History

Posted May 15, 2008 (Updated May 17, 2008)

For the last ten years, obsessive record collectors on Usenet have been working on the Whitburn Project — a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they've created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song's duration, beats-per-minute, songwriters, label, and week-by-week chart position. It's 25 megs of OCD, and it's awesome.

As far as I know, this is the first time the project and its data have ever been discussed outside of Usenet. Despite its illegality, they've created a wonderful resource and you can do some fun things with the data. For the next three days, I'm going to publish some analysis and insights gleaned from their work. Update: I published an entry about one-hit wonders and pop longevity.

History of the Whitburn Project

Named after Joel Whitburn and his authoritative Billboard books, the Whitburn Project began in 1998, when a group of 15 collectors pooled their resources to create an MP3 collection of every single in the top 40. They experimented with trading the files on P2P networks, but eventually landed in Usenet instead.

The Excel spreadsheets were created to help them verify their collections were complete, with new versions updated and re-uploaded to the newsgroups weekly. Later, other collectors found the spreadsheet and built tools on top of it, including a utility to rename files properly and locate missing songs.

Originally, most of the Whitburn Project was simple data entry and fact-checking, but as the project grew, it forked away from the Whitburn books. "This spreadsheet does not reflect the Whitburn information found in his books," wrote Bullfrog, one of the spreadsheet's maintainers. "Whitburn has changed the way he numbers the annual songs at least twice since this [spreadsheet] was created. We feel that he went off the deep end a little, so will not be following his new numbering scheme."

They've also added new fields culled from their own research. "Obviously with the addition of BPM, genre, and the like," wrote Bullfrog, "it has become its own entity and will continue to be from now on."

Over the last few months, I've tried multiple times to contact the maintainers of the spreadsheet and the excellent Whitburn newsgroup FAQ, but they haven't responded.

The Data

There are several Whitburn spreadsheets uploaded to multiple Usenet newsgroups sporadically, but the most useful is the "Billboard Pop ME (1890-2008)," which is posted in alt.binaries.sounds.whitburn.pop.

Note: This data is almost certainly a violation of Billboard's copyright, and probably infringes on Record Research's books too. The analysis I'm publishing here should fall under fair use, but redistributing the spreadsheet would not. If you're brave (or dumb) enough to locate and mirror a copy of the file, leave a comment. Update: An anonymous commenter posted the spreadsheet to Rapidshare/Megaupload.

Above is a sample of the top 10 songs from 2007, so you can see the format and fields of the collected data, along with the key explaining each column. (Scroll to the right to see all the fields.)

Song Lengths Over Time

I'll be focusing more on analysis tomorrow, but here's one of the first questions I asked when stumbling on this spreadsheet. Are pop songs longer or shorter now than in previous decades? A quick query reveals this chart of average playtimes per year.

Pop songs became shorter in the early 1960s, dipping to around the 2:30 mark, before lengthening year over year until peaking in 1992 at 4:16. Since then, pop songs have hovered around 4 minutes long.
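
For anyone who wants to try a similar per-year average on their own data, here is a minimal sketch in Python. It is not the query used for the chart: the row layout (a year paired with an "M:SS" duration string) and the sample values are assumptions made up for illustration, and the real spreadsheet's columns may be organized differently.

```python
from collections import defaultdict

def parse_duration(text):
    """Convert an "M:SS" string such as "4:16" into total seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

def format_duration(total_seconds):
    """Convert a number of seconds back into an "M:SS" string."""
    minutes, seconds = divmod(int(round(total_seconds)), 60)
    return f"{minutes}:{seconds:02d}"

def average_length_by_year(rows):
    """Return {year: average length in seconds} for (year, "M:SS") pairs."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of seconds, song count]
    for year, duration in rows:
        totals[year][0] += parse_duration(duration)
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}

# Hypothetical sample rows, not taken from the spreadsheet.
sample_rows = [(1962, "2:28"), (1962, "2:34"), (1992, "4:10"), (1992, "4:22")]
averages = average_length_by_year(sample_rows)
for year, seconds in averages.items():
    print(year, format_duration(seconds))
```

Working in seconds and converting back to "M:SS" only for display avoids averaging the minute and second parts separately.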

The longest charting song of all time is Harry Chapin's live version of "A Better Place to Be," at an epic 9 minutes and 30 seconds. Runners-up include Guns n' Roses' "November Rain" (8:56), Don McLean's "American Pie" (8:36), and a new entrant, Death Cab for Cutie's "I Will Possess Your Heart" (8:35).

And the shortest? The Womenfolk's cover of Malvina Reynolds' "Little Boxes" from 1964 is only 1 minute and 3 seconds long. The shortest modern song to chart is Zac Efron's "What I've Been Looking For," the third-shortest charting song of all time at a brief 1:19.

How about the length of the perfect pop song? For this, we can look at the mode to find the most common song lengths by decade. For example, in the 1940s, there were 42 songs that were exactly 3:01, making it the perfect song length for that decade.

1950s, 2:30 (95 songs)
1960s, 2:30 (250 songs)
1970s, 3:30 (153 songs)
1980s, 3:59 (142 songs)
1990s, 4:00 (132 songs)
2000s, 3:50 (58 songs)
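
Finding the mode per decade, as in the list above, can be sketched in a few lines of Python. This is an illustration under assumptions, not the original analysis: the (year, "M:SS") row layout and the sample rows are hypothetical, and ties are broken by whichever length appears first in the data.

```python
from collections import Counter, defaultdict

def most_common_length_by_decade(rows):
    """Return {decade: (length, count)} for (year, "M:SS") pairs."""
    lengths = defaultdict(Counter)  # decade -> tally of each exact length
    for year, duration in rows:
        decade = year - year % 10  # e.g. 1964 -> 1960
        lengths[decade][duration] += 1
    # most_common(1) returns a one-item list holding the (length, count) pair.
    return {
        decade: counter.most_common(1)[0]
        for decade, counter in sorted(lengths.items())
    }

# Hypothetical sample rows, not taken from the spreadsheet.
sample_rows = [
    (1961, "2:30"), (1964, "2:30"), (1967, "2:45"),
    (1983, "3:59"), (1985, "3:59"), (1988, "4:05"),
]
modes = most_common_length_by_decade(sample_rows)
for decade, (length, count) in modes.items():
    print(f"{decade}s, {length} ({count} songs)")
```

Counting exact "M:SS" strings matches the idea of songs that are "exactly" a given length; grouping into wider buckets (say, five-second bins) would smooth out the results but give less exact modes.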

I was surprised at how exact these numbers are. The capacity for 45 RPM records was about three minutes, setting the standard for pop singles well into the 1960s. By the late 1960s, those constraints were removed, and we start to see longer singles. But without artificial constraints, why did exactly four minutes become the de facto standard in the 1980s and 1990s? (Maybe Madonna knows.)

I'm tired. More analysis tomorrow, including a look at one-hit wonders and how quickly singles fall off the charts over time. Update: Here it is.

37 Comments

May 15, 2008
5:10 AM  
G wrote:

This is a labour of love, and a very impressive piece of work.

I think there is a useful comparison to be made here between freely available information of this sort and non-DRM-protected (or cracked) books, games, and audio files.

In decades and centuries to come, databases such as this one will become an invaluable resource for historians, whereas their proprietary equivalents will languish in Billboard's (or any other Billboard's) archives, if such companies even continue to exist.

Similarly, protected media is certainly advantageous from a short-term profit-making point of view, but what cultural relevance will it have for future generations once licence servers have been shut down and its contents have been rendered inaccessible by encryption? How can anything so ephemeral pass into the canon of human achievement?

Another point that crossed my mind yesterday was that as storage space increases and the file-size of audio remains relatively constant, we may well get to the stage where perfect copies of record companies' entire catalogues (of verifiable quality and authenticity thanks to lists such as these) are traded as easily as text files. (It is true that private torrent sites are getting us at least some of the way there). To me, that is both an exciting and daunting thought.


May 15, 2008
5:52 AM  
Thomi wrote:

Very interesting..

It'd be interesting to see the maximum length of time a track has been in the charts per decade.. I.e. - do we tire of our pop songs more quickly now than we did in the past?

Also, what was the youtube link to? It's not available for those of us in the UK.


May 15, 2008
7:09 AM  
Fake Rake wrote:
Also, what was the youtube link to? It's not available for those of us in the UK.

It's a Madonna song called "4 Minutes," which, curiously enough, has very little of Madonna singing and much more from two random guys.


May 15, 2008
7:17 AM  
John Lampard wrote:

I wonder why songs became as short as they did in the late 1950s, when they had been 30 seconds longer ten years prior. Because it was fashionable, and what other recording artists were doing at the time?

It's also interesting to see songs started becoming longer from the start of the 1960's when the likes of the Beatles and the Rolling Stones were coming onto the scene. Coincidence?


May 15, 2008
7:53 AM  
Nelson Minar wrote:

Great find, Andy, thank you for posting about it. These guys are librarians, what they're doing is cataloging and archiving media. It's a valuable service. I'm sure you're correct about the copyright concerns of the project, but it's a shame that the discussion has to be framed that way.


May 15, 2008
7:54 AM  
Clare wrote:

the underlying data may not be protected by copyright, so let's not pull the "this is illegal" trigger so quickly.

here's an interesting discussion of databases and copyright:
http://www.bitlaw.com/copyright/database.html


May 15, 2008
8:10 AM  
Corey wrote:

I can't help but think that this effort would be better spent at Musicbrainz.org - Excel spreadsheets don't have APIs


May 15, 2008
10:41 AM  
a birdy wrote:

hey what happens when you click on my name


May 15, 2008
10:59 AM  
Andy Baio wrote:

Clare: Yeah, I've read that thoroughly. The spreadsheet is a compilation, which remains copyrighted. Also, it's more than just a database of facts. Billboard is tracking the retail/online sales and radio play themselves (i.e. Nielsen SoundScan) and using their own method to determine rankings, so it meets the "industrious collection" requirement, as well.

a birdy: I can't condone it, but I can verify that the link you posted is the same version of the spreadsheet I used for my analysis.


May 15, 2008
11:30 AM  
otis wrote:

a birdy : thank you - you are like a little birdy visiting my window and singing a sweet song.

andy : fantastic - have never heard of this list - the song length data is fascinating.

...and to the compilers of this list, I tip my hat to you all.


May 15, 2008
1:48 PM  
pfig wrote:

this is amazing. thank you, and thanks to the newsgroup participants.


May 15, 2008
7:14 PM  
Phill wrote:

That is so awesome!

I wonder how large the mp3 collection of the spreadsheet gets, tho, come to think of it, they prolly trade it in FLAC.


May 16, 2008
7:21 PM  
Robert Hutchinson wrote:

John Lampard: I wonder why songs became as short as they did in the late 1950s, when they had been 30 seconds longer ten years prior. Because it was fashionable, and what other recording artists were doing at the time?

My uneducated guess: maximizing consumers' enjoyment of their newfangled rocking and rolling by speeding up tempos. Faster = more fun to dance to, at least to a point.


May 17, 2008
7:45 AM  
Bullfrog wrote:

I did receive an email from Andy in April. I have been so busy that to be honest, I totally forgot about it. You will be getting a response shortly. As for the copyright, you are probably right. However, if you subscribe to Billboard (I have for over 30 years now), you will find that any old data is no longer available. Questions posted to Fred Bronson from Chart Beat show that old data is no longer available. Even some of the charts you can get online at Billboard do not match what was in the actual magazine (due to updated or corrected data I think). Some of this data is not available anywhere else, like which charted singles were mono and which were stereo in the late 60s, early 70s (the main project I am now on).

As for whitburn, don't even get me started there. To me he has never been anything more than another collector, that happened to put out a book. The spreadsheet only uses his numbers because they are widely known. His numbers were used at the very beginning of the project and has grown from there. Believe me, if I could I would change them. No one else in the industry ranks records the same as he does. Also, the times he uses in his annuals are not always correct. He puts the time that is shown on the label. We are slowly verifying every 45 and putting actual play time. Pat Downey does this, but his data is incomplete and only covers top 40.

As Andy has shown, it is easier to extract data (the time differences are interesting) from one location than trying to gather it all up from many sources.

For those interested, the spreadsheet is updated weekly and posted on Usenet every 2 to 3 weeks


May 17, 2008
3:22 PM  
Jorge Bus wrote:

Which newsgroup?

---

For those interested, the spreadsheet is updated weekly and posted on Usenet every 2 to 3 weeks


May 17, 2008
5:15 PM  
Andy Baio wrote:

I found it in alt.binaries.sounds.whitburn.pop, but it's posted in the rest of the alt.binaries.sounds.whitburn.* groups. If your Usenet provider doesn't have those, it's also in alt.binaries.sounds.1950s.mp3, alt.binaries.sounds.1960s.mp3, alt.binaries.sounds.1970s.mp3, alt.binaries.sounds.1980s.mp3, and alt.binaries.sounds.1990s.mp3.


May 19, 2008
8:54 AM  
Harvey Wharfield wrote:

Andy,

Is there any master lookup for song titles ? A separate
website not related to the Whitburn behemoth that's
easy for a Mac user to use ?

Thanks,

Harvey


May 19, 2008
8:58 AM  
Andy Baio wrote:

I'm on a Mac, and the Excel spreadsheet works fine for me. I'd love to provide a lookup service, but that'd likely be a violation of Billboard's copyright.


May 19, 2008
11:15 AM  
Harvey Wharfield wrote:

Thanks, Andy...


May 20, 2008
9:43 AM  
punxking wrote:

Andy, this is great stuff, thanks for bringing this to my attention and for your interesting and thoughtful analysis.


May 27, 2008
2:22 PM  
Primus Luta wrote:

Thanks Andy and anyone else involved with this.


May 29, 2008
10:04 PM  
Alan Duffy wrote:

Wonder if anybody in this group may have some info on Bobby Lewis' " Tossin' and Turnin' ".
On one version, the line... " Baby, Baby, You did something to me " opens the song.
On the other ( Australian release ? ) that line is absent.
Both sound like the same 1961 version.
Would appreciate any insight in to the history.
Thanks.


Jun 1, 2008
12:06 AM  
Freddy wrote:

"How about the length of the perfect pop song?...

But without artificial constraints, why did exactly four minutes become the de facto standard in the 1980s and 1990s?"

There is no such thing as the perfect pop song length per se, you can keep repeating a song you like endlessly without being bored for long time... everyone does it or has done it at least once for sure, if there's a hidden magic number, at least is not musically related.

Here is the big reason for the song length patterns:

Another artificial constraint not spoken of but used I bet, since the beginning:

Advertising space

Radio and TV stations don't survive from thin air so they need to include advertising, probably something along this lines happened:

"How much time people can tolerate ads before changing to another station? " they went and worked out some statistics I guess, and came up with 25% of air time to advertising, somebody else can know better than me about how they came with this number.

Now, "that's 15 minutes per hour, what do we do with the other 45 minutes?"

Distract people from the fact that this is merely business by playing some songs

"how many songs do we need to play?, it does not matter as long as they are entertained"...

... wait a minute, that's not true, if you play one or two 8 minute songs one after another, are people going to wait till they are bored or will change station too?

"we need to play only short songs, we have a deal!"

Or not?

"Hey, somehow we can't structure correctly our programs to one hour blocks, why?"

Easy one, how many songs have the same length in minutes? barely none, is not the nature of any form of art to be constrained in that way, period.

"Pfffffff... hey you record company, fix this!".

And "Radio edit" term was born, they fixed things by shortening songs, taking out some bars from the song... that solo nobody understands, the boring intro, the over extended chorus, the cinematic grand finale, etc. whatever the song does "not need".

And this reached the musicians: "either you fix the song or they will not play it, that's it."

I got this speech from a producer not too many years ago, it's true.

End result: musicians have to think of a song in terms of radio playability too, is not just a matter of art anymore.

At this point somebody probably is asking why the length changes then if all was already worked out in favor of radio stations?

Have a look at the table again:

1950s, 2:30 (95 songs)
1960s, 2:30 (250 songs)
Up until now the constraint was the size of the 45 RPM vinyl records, it's true.

But it's also true that modern music as we know it was just in diapers and starting to evolve, new music never heard was being created everywhere in the world and new genres too so there was no need for lengthy songs, wherever you looked at, there was a couple of good songs to add to the radio station rotation.

1970s, 3:30 (153 songs)
1980s, 3:59 (142 songs)
1990s, 4:00 (132 songs)
2000s, 3:50 (58 songs)

And then song lengths increase in a steady way, but still you can see the time length controlled pattern, no matter the decade.

The answer to this behavior is simple as well:

As the genres grew old, there was a lot less new and good song choices to pick up so air time has to be filled with lengthy songs more and more each decade.

The decrease in talent or creativity is another big subject to discuss about, and radio has a lot of guilt on this too.

I know is kind of hard to take this facts in consideration but certainly they give another perspective to the project, and can modify some of the results.

Nevertheless, great project! ;)


Jun 2, 2008
10:40 AM  
Ardee wrote:

Great stuff! The dream (admittedly a silly one that will not really bring happiness, but that's OCD for you) would be to have instant access to lossless copies of every one of these songs (plus all variations and remixes of them) at any location at any time. Then add instant access to any music videos or live performances, plus the same thing for every album that ever charted ... and every song on EVERY other chart ... maybe of every country. I wonder how much hard drive space THAT would take up?


Jun 2, 2008
11:26 AM  
Andy Baio wrote:

It's really not that crazy. If you ballpark an average 40MB per song, getting lossless versions of all 37,000 songs would be 1.4 terabytes of data. That sounds like a lot now, but you can buy a terabyte hard drive for under $200 now. That'll only get cheaper.

And if the project moved from Usenet to BitTorrent, a single torrent (or set of torrents) could manage all the files and let people fill/replace their missing pieces.


Jun 2, 2008
11:37 AM  
Ardee wrote:

Oh yeah, and lyrics too, of course. And karaoke versions and everything else you could think of, all instantly available from anywhere.


Jun 2, 2008
11:47 AM  
Ardee wrote:

Sorry, Andy, my comment at 11:37am was made without having seen your comment at 11:26am. However, your comment does not acknowledge the extras in my 10:40am comment: INSTANT access to all of these songs PLUS: every song on every Top 100 album ever, every music video, every song on all the other Billboard charts (R&B, country, many more), every song on every other nation's major charts, every other version/variation of each song (there are often many!), etc. [Plus karaoke versions, lyrics, parodies, what have you.] Petabytes!


Jun 2, 2008
12:00 PM  
Andy Baio wrote:

Give it time.


Jun 2, 2008
1:10 PM  
Ardee wrote:

How many years do you think? 10? 20? Until all this media should be available instantly by portable device (cellphone or whatever technology comes next). And, of course, you'd want every episode of every TV show ever, plus every movie ever, etc. It would be cool to live to enjoy that, but of course it won't mean humans are one iota happier.

Anyway, thanks a bundle for the info! I'd first encountered this spreadsheet in late '05 but sort of forgot about it until now. I think I'll normalize the data into a set of database tables and play around with it (create queries and the like). I know it's not OK to go public with it, but I hope it's OK for personal use.


Jun 11, 2008
12:24 PM  
Michael Leonard wrote:

Hi Andy,

I've just namechecked your tireless work at my website musicradar.com - on the back of a recent UK survey that finds 44% of 18-24 year olds skip a song after 30 seconds. That's iPod culture gone mad. In 5 years time, even the best artists may be just writing jingles just to get heard...

Great project, it's fascinating stuff.


Jun 14, 2008
10:34 AM  
Paul wrote:

Incredible data. I have whitburns books for the 60's 70's and 80's, and do not regret having purchased them.

Putting the data in a spread sheet makes it more researchable and easier to use. Needless to say, pulling out the books when I need a quick check of a fact, or what songs were charting when tends to be a pain.

Thanks for the information!


Jun 22, 2008
12:50 PM  
Steve wrote:

I just found this discussion. What a great piece of work, reliable chart information from before 1940 is really hard to find.

My site combines music charts from all over the world to do a similar thing for the world's music. I think that you guys would find it interesting.


Jul 6, 2008
11:38 AM  
O.T. wrote:

Is there anywhere i can find correct times of all 45rpm records from billboards hot 100 history?


Jul 7, 2008
6:58 AM  
O.T. wrote:

I was wondering if anyone is working on doing this complete billboard collection of all original 45's rather than a mix of 45's and digital cd? Thanks


Jul 10, 2008
8:13 AM  
Jim wrote:

Can someone please tell what the A-S-X tab on the spreadsheet represents? I can't seem to find an explanation anywhere. thanks!


Jul 14, 2008
1:44 PM  
sky wrote:

Jim wrote:

Can someone please tell what the A-S-X tab on the spreadsheet represents? I can't seem to find an explanation anywhere. thanks!

A = Air Play
S = Sales
X = Christmas


Jul 14, 2008
1:57 PM  
Anonymous wrote:


O.T. wrote:

Is there anywhere i can find correct times of all 45rpm records from billboards hot 100 history?

as far as i know there is no complete official list. Since you can't believe the labels themselves or Whitburns books, I guess the only way is to listen and time them yourself.


O.T. wrote:

I was wondering if anyone is working on doing this complete billboard collection of all original 45's rather than a mix of 45's and digital cd? Thanks

Easy answer is yes, it is being tried. Hard part is trying to find all the original 45s, 78s, single cassettes, and single CDs. If you want the collection to be authentic then there must be a way of checking each entry. anyone can say this is an original 45 rip, try proving it.


 



Andy Baio lives here. Some rights reserved, for your pleasure.