Waxy.org
Waxy.org is the sandbox of Andy Baio, a journalist/programmer living in Portland, Oregon. I'm the CTO of Kickstarter, created Upcoming.org, and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

Tracking Twitter's Message Growth

Posted Mar 15, 2007

By now, everyone knows that Twitter exploded at SXSW and everybody's seen the Alexa charts. But this is mostly a mobile app, so pageview traffic is only part of the story. How fast is Twitter really growing?

I decided to find out by using Twitter's founder Evan Williams himself, albeit indirectly. Since Ev's Twitter history goes from message #28 in March 2006 to #8,281,991 about three hours ago, it's a convenient snapshot of Twitter's growth since it began. Update: The data from November 2006 to present is faulty, since they apparently switched to non-sequential IDs. More information below.

I threw it all in Excel and charted the sequential IDs and dates for each of Ev's 1,226 messages. The moment SXSW started, Twitter's growth curve changed radically and hasn't slowed down. (The huge orange bar for March is only half the month.) But more interesting to me are two other dates: November 23, when Twitter's growth rate sped up drastically, and February 5, when the rate seriously slowed. Are there problems with my data? If so, I can't find them. If you have any sense of what triggered those changes, please comment and let me know.

To help with your search, and any other visualization, I've posted the full Excel spreadsheet with inline charts and tables. Enjoy!

Note! The last three are cumulative charts, not month-to-month growth numbers. That means there were not 8 million messages sent this month, but 8 million total since Twitter started.
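
For anyone redoing the numbers, here is a minimal Python sketch of the difference between a cumulative chart and month-to-month growth. The end-of-month IDs are invented for illustration and are not the spreadsheet's values, and the function name is hypothetical. It also assumes IDs are handed out one per message, which the update above says stopped being true in November 2006.

```python
def monthly_increase(cumulative):
    """Turn cumulative end-of-month IDs into per-month increases.

    `cumulative` is a list of (month_label, highest_id_seen_that_month)
    pairs in chronological order.
    """
    increases = []
    previous = 0
    for month, last_id in cumulative:
        # The growth for a month is the gap between this month's highest
        # ID and the previous month's highest ID.
        increases.append((month, last_id - previous))
        previous = last_id
    return increases


# Invented sample values (assumption), NOT real Twitter data.
cumulative = [("2006-03", 500), ("2006-04", 1_200), ("2006-05", 2_600)]
print(monthly_increase(cumulative))
# [('2006-03', 500), ('2006-04', 700), ('2006-05', 1400)]
```

The cumulative series only ever goes up, so it is the per-month differences that show whether growth is speeding up or slowing down.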

Twitter Growth

Update: Jason and I just discovered that the IDs since November 2006 have not been sequential, rendering these charts useless. The jumps in activity were largely artificial. Jason has more information.

27 Comments

Mar 15, 2007
4:49 PM  
/pd wrote:

nice.. the long tail in motion :)-

thanks for sharing !!


Mar 15, 2007
5:14 PM  
Myles wrote:

The slow down is probably due to Twitter hitting the performance wall. I know I had a lot of trouble posting to and receiving messages from Twitter during that period -- so I lowered my usage of it until performance improved.


Mar 15, 2007
5:17 PM  
Andy Baio wrote:

Hmm, it looks like November 23 was Thanksgiving. (Though I don't understand why that would lead to a change in the growth rate.)


Mar 15, 2007
5:54 PM  
Michael Specht wrote:

Maybe late November was when Twitter first started to appear outside of the really early adopters, for example Scoble signed up 20 Nov?


Mar 15, 2007
6:15 PM  
Jeb wrote:

I second the performance wall hypothesis. That's when I started implementing my "connect twitter to arbitrary desktop apps" project (mostly just to practice Proto/VBA stuff, I don't know why anyone would want every iTunes track to be a twitter message). Twitter was reaaaaaallly slow. I'd played with it a long time before then and it was snappy. I know in that couple of weeks when I was playing with it, I personally would have sent several times more messages if it wasn't lagging so bad.


Mar 15, 2007
6:26 PM  
springnet wrote:

I'm up to 100 followers after just 3 or 4 days. Started using it on day 1 of sxsw.


Mar 15, 2007
6:28 PM  
mat wrote:

Very nice Andy. I love it when you run stats like these


Mar 15, 2007
7:06 PM  
/michael. wrote:

A Many Eyes treatment of the same data. Not as pretty. I sourced the data back to here and hope you don't mind.

http://services.alphaworks.ibm.com/manyeyes/view/SJjqGFsOtha61PEcjt5KF2-


Mar 15, 2007
7:46 PM  
Jeffrey W. Baker wrote:

Isn't this better thought of as a growth rate, i.e. first derivative of what's posted here?

http://tastic.brillig.org/~jwb/twitter.png


Mar 15, 2007
8:55 PM  
Alan wrote:

Not posting to be the internet jackass, but I think your data collection methodology is flawed, and this might explain the abnormalities you're seeing.

It's a mistake to assume that each sequential ID represents a single post in the system (unless you're privy to back-end details, in which case I'll shut up).

It's probably safe to assume that the number is an auto-generated sequential ID of some kind, but we don't know for sure by what amount this value is bumped up every time there's a post made to the system. An ID of 8,281,991 doesn't mean 8,281,991 posts are in the system.

So, here's one plausible scenario that's probably not "the truth", but it illustrates my point. Twitter starts as a small application with a single MySQL database server. The table that stores the messages has an auto increment primary key that gets bumped up by one for each row inserted.

At some point, one database server isn't enough. Twitter is an application that's going to be sensitive to slave lag, so the team decides to go with MySQL 5's multiple master feature which lets you have more than one master server.

One of the problems with multiple master servers is what happens if both servers receive an insert at exactly the same time that generates the same auto_increment primary key. The master/master synching voodoo pukes and corruption happens.

To get around this, the primary keys use a modulus. In a two master setup, one server generates odd keys, the other generates even keys. In a three master setup, each id is incremented by 3 with server 1 starting at 1, server 2 starting at 2 and server 3 starting at 3. Since the servers will always generate different primary keys, corruption is avoided.

With this setup, unless the traffic is evenly distributed among the database servers, the id is no longer an accurate count of how many posts are in the system at any one time.

Now, let's throw in another wrinkle. If you're smart, even if you only have a two or three master setup you use a higher modulus to make replicating new masters into the system a breeze. So, an application might only have two database servers, but their primary keys are being incremented by 5 each time, which will let the application scale up to five master servers without having to worry about re-jiggering the modulus each time you add a new server.

So, in this setup, even with even traffic distribution between the database servers, your "id as a representation of how many posts are in the system" is completely fucked. The November 23rd traffic spike could be when a system like this was brought online. The February 5 drop off could be a re-jiggering of a modulus that was, in retrospect, too high.
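
The scenario above can be simulated in a few lines of Python. This is a toy model under stated assumptions, not a description of Twitter's actual setup: each master starts at its own offset (1, 2, 3, ...) and steps by a shared increment, which is the behavior MySQL exposes through its `auto_increment_offset` and `auto_increment_increment` settings. The function name and the traffic splits are made up for illustration.

```python
def highest_id_and_rows(inserts_per_server, increment):
    """Return (highest ID issued, real number of rows inserted).

    Server k (1-based) is assumed to issue IDs k, k + increment,
    k + 2 * increment, ... so no two servers ever collide, as long as
    the number of servers does not exceed `increment`.
    """
    highest = 0
    total_rows = 0
    for offset, count in enumerate(inserts_per_server, start=1):
        total_rows += count
        if count:
            last_id = offset + (count - 1) * increment
            highest = max(highest, last_id)
    return highest, total_rows


# One server, step 1: the highest ID equals the row count.
print(highest_id_and_rows([1000], increment=1))      # (1000, 1000)
# Two masters, step 2, uneven traffic: the ID overstates the count.
print(highest_id_and_rows([900, 100], increment=2))  # (1799, 1000)
# Two masters, step 5 (room to grow to five): the ID is ~2.5x the count.
print(highest_id_and_rows([500, 500], increment=5))  # (2497, 1000)
```

In all three cases exactly 1,000 rows exist, but the highest ID ranges from 1,000 to 2,497, so a change in the increment alone would show up as a change in the slope of an ID-versus-date chart.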

Other, less complicated, scenarios could involve a much higher number being pushed into an auto_increment field for some reason. Subsequent inserts would start at this new higher number.

I dig the graphs though (-:


Mar 15, 2007
9:18 PM  
Claus wrote:

Andy, the first surge is not on November 23. If you look closely, you'll see that use quadrupled on November 21 and multiplied by a factor of 7-8 the day after. November 21 was the day Twitter launched the "six word memoir" promotion.
Use remained flat for the rest of the year. The effect on March 10 is unmistakeable: 20 times the use of the day before.


Mar 15, 2007
9:41 PM  
tagami wrote:

I started on 11/18. I think the Thanksgiving phenom is because people are in motion during the holiday and are reaching out to their friends. That's what I did anyway...


Mar 15, 2007
10:53 PM  
Jeff wrote:

Hey, I joined 11/20. I'll take full credit.

Great graphs, thanks for posting them.


Mar 16, 2007
8:58 AM  
Kempton wrote:

Thanks for taking time to do the work to generate these graphs. And get the ball rolling to look inside Twitter a little bit.


Mar 16, 2007
10:02 AM  
Joshua schachter wrote:

Alan: They're almost certainly on a mysql backend with the default auto_increment stuff set. Which one should not do, for a variety of reasons:

http://joshua.schachter.org/2007/01/autoincrement.html

Joshua


Mar 16, 2007
12:13 PM  
rabble wrote:

First off, twitter does use autoincrement. They know they shouldn't, but they do. They also only have one backend database so far. So Andy's numbers look good from what I know of twitter's setup.

The question of why the jumps? Those dates correspond to when good IM support was added, APIs were released and desktop apps were built, and when the signup process got streamlined. Before November you needed a mobile phone to join. Once that was dropped, growth shot up.


Mar 16, 2007
2:13 PM  
Andy Baio wrote:

That sounds like a definitive answer to me. Thanks, Rabble.


Mar 16, 2007
2:33 PM  
Joshua Schachter wrote:

I assume they use something like PRIMARY KEY(msg_id) and KEY(user_id, twitter_dt).

Under InnoDB, if they had PRIMARY KEY(user_id, msg_id) and KEY(user_id, twitter_dt) it'd have to do far fewer disk seeks to do the join, because all of a user's messages would be in contiguous pages.
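
A rough Python illustration of the contiguous-pages point. This is a toy model, not a measurement of InnoDB: it assumes rows are stored in primary-key order and packed a fixed number per page, and the user count, message count, and page size are all invented.

```python
def pages_touched(rows, user, rows_per_page):
    """Count the distinct pages holding `user`'s rows.

    `rows` is a list of (user_id, msg_id) tuples in storage order, and
    row i is assumed to live on page i // rows_per_page.
    """
    return len({i // rows_per_page
                for i, (uid, _) in enumerate(rows)
                if uid == user})


# Invented workload: 100 users taking turns posting 10,000 messages.
rows = [(msg_id % 100, msg_id) for msg_id in range(10_000)]

by_msg_id = sorted(rows, key=lambda r: r[1])  # like PRIMARY KEY(msg_id)
by_user_then_msg = sorted(rows)               # like PRIMARY KEY(user_id, msg_id)

print(pages_touched(by_msg_id, user=7, rows_per_page=50))         # 100
print(pages_touched(by_user_then_msg, user=7, rows_per_page=50))  # 2
```

With the rows ordered by message ID, user 7's 100 messages are spread across 100 pages. Ordered by user first, the same messages sit on 2 adjacent pages.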


Mar 17, 2007
4:07 PM  
Michal Migurski wrote:

Alan, if what you say is accurate, then there should be a sudden *and permanent* jump in the slope of the line, due to a more collision-proof use of the primary key space. Rabble is the authority here, so I'm assuming that auto_increment is in use.

I'm especially interested in the sudden drop in growth over 2007, leading up to SXSW. Why is that? Will it fall back to that level once the Austin shine wears off?


Mar 17, 2007
6:53 PM  
Andy Baio wrote:

I've heard rumors that they were forced to block an entire country, which was abusing the SMS features and costing Twitter tons of money. That might explain the sudden dropoff.


Mar 18, 2007
7:11 PM  
Angulo wrote:

Thx a lot for the graphs. You do your research thoroughly.


Mar 19, 2007
9:24 AM  
Alan wrote:

Thanks for the twitter background info and the indexing lesson.


Apr 26, 2007
10:34 PM  
max wrote:

oh nice site dude,
i need some ideas like that!


Apr 29, 2007
1:09 AM  
Eli wrote:

It would be cool to see how this lines up with big stories about twitter, like John Edwards's twitter account announcement.


Jun 3, 2007
7:23 PM  
Tim wrote:

This is the first time I've heard of twitter. It's actually fairly addicting once you get in and start reading the posts.

Do you feel that the novelty will wear off after a few months (July / August)? Or, is this a mobile app that will help drive the development of more user generated content?

Thanks for the post...nice stats work. I'm curious to see how things go after the sxsw peak wears off. Long tail or not.


Jun 6, 2007
8:21 AM  
erik wrote:

"Human Giant using Twitter at the MTV Movie Awards"

Aziz also mentioned Twitter at the Sasquatch music festival where he hosted the main stage on the second day (sans the rest of the Human Giant people or MTV for that matter).


Jul 31, 2007
5:49 AM  
Create wrote:

Very informative data. Long tail at work indeed.


 


Andy Baio lives here. Some rights reserved, for your pleasure.