- January 22, 2009
(If you’re not familiar with a word cloud: the larger a word, the more often it was used. The colors and positions don’t mean anything; they’re just for fun. We stripped out the little words (a, the, with, …), leaving everything that appeared more than 10,000 times in the 50 million+ tweets we examined.)
Then I looked again at the filtered list and noticed something… just awesome.
Here are the forty most commonly used words, in their exact order of decreasing frequency:
It’s time, Twitter. Love/Christmas blog:
Home! Thanks, people…
Tomorrow: looking news, trying nice? Check.
Live free. Life. Awesome days!
Feel house ready.
I like your poem, Twitter.
Someone is sure to notice it, so a pedantic but serendipitous detail: my stopword list didn’t filter the “it’s” and “that’s” contractions, so they made it into the poem. Since I like the poem with them, they stay. Wordle, the program I used for the word cloud, did its own stopword filtering, so those aren’t in the word cloud image.
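For the curious, here is a rough sketch of the kind of counting described above. It is not the actual script we ran. The stopword list, the tokenizer regex, and the names `top_words`, `threshold`, and `limit` are all made up for illustration. It does show how a contraction like “it’s” slips past a stopword list that only contains “it”.

```python
import re
from collections import Counter

# Hypothetical stopword list (an assumption, not the list used for the post).
# It contains "it" and "that" but not the contractions "it's" and "that's".
STOPWORDS = {"a", "the", "with", "it", "that", "is", "to", "and"}

# Keep apostrophes inside a token so "it's" stays one word.
TOKEN_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def top_words(tweets, stopwords=STOPWORDS, threshold=10_000, limit=40):
    """Return up to `limit` (word, count) pairs, most frequent first,
    keeping only words that appear more than `threshold` times."""
    counts = Counter()
    for tweet in tweets:
        for token in TOKEN_RE.findall(tweet.lower()):
            token = token.replace("’", "'")  # normalize curly apostrophes
            if token not in stopwords:
                counts[token] += 1
    frequent = [(w, n) for w, n in counts.most_common() if n > threshold]
    return frequent[:limit]


# Tiny usage example with a toy threshold instead of 10,000.
sample = ["It's time, Twitter", "it’s the time", "That is it"]
print(top_words(sample, threshold=1))
# [("it's", 2), ('time', 2)]
```

The plain “it” in the last sample tweet gets filtered, but “it’s” is a different token as far as the stopword set is concerned, so it survives and gets counted.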
Also: many have inquired about a timescale for release of the friend graph data. Twitter has told us they will allow bulk data release, but they want to formulate terms of service for its use. Well, they can’t just say “Here’s data. Do good things with it. And don’t be an asshat.” They have to work up some document that says it pretty. So this will move at lawyer speed, not internet speed, and the moment we know something, you will hear it too.
To cheer you up, we’re about to shovel some really huge, interesting datasets into the Amazon Public Data Sets collection, so watch this space.