SxSW 2010 — Lecture Notes

Here are notes from Infochimps on interesting talks at SxSW!

2010-03-14 11:00 Why You Aren’t Done Yet – David Heinemeier Hansson

  • Meetings are TOXIC
    * Turn work day into work moments

#1 enemy of productivity is interruption


  • everything is ASAP
  • they’re trying to get their thing done by Friday

ASAP is not the only 4-letter word

  • NEED
    * Otherwise, what? is it really?
  • CAN’T
  • EASY
  • FAST

How to combat it

  • /SAY NO/: 40 hours is more than anyone needs in a creative industry
  • /GOOD ENOUGH IS FINE/. Maybe I’ll come back to finish it sometime
  • /YOU CAN ALWAYS DO LESS/ Most things can be passed.
    * At 37signals started imposing deadlines. every 2 weeks has to get done
    * a fixed list of things to ship that has to get done by some deadline: NO. Broken
    * if the list is changeable — if not everything you need to get done has to get done, you can
  • whatever we can fit in the box, gets done.
  • everything else gets dropped
  • you will always meet your deadline
  • … even still, we often can’t follow through
  • as soon as ‘need’ or ‘fast’ comes up, we go off plan

Restating the problem

  • only one way to get /exactly/ what you want
  • thousands of ways to get approx. what you want
  • restating is most effective way to clear out list

Give up

  • walked away from a fair # of projects at 37signals


  • All meetings are optional
  • work from home
  • How to deal with slackers
    * ask people to live up to expectations
    * everyone gets a credit card. Policy is ‘Spend it wisely’
    * not set in concrete — if people abused, then we stop. Try it for 2 weeks
    * fire people quickly ‘it’s not Italy, everything in US is at-will’
  • What are criteria for saying Yes
    * A: “I care about it”. Don’t care if there’s a big opportunity, etc. Only if we need it ourselves.
    * Personal annoyance is my key criterion
  • Subscription models: how to figure out pricing, discounts
    * no idea.
    * there’s a lot of science, we’ve read none of it.
    * “Would I pay for this?”
    * Don’t just make sure you’re profitable. Make sure you’re PREPOSTEROUSLY profitable
    * rule of thumb: from 1 tier to next, pay 2x as much but get 3x as much access
    * But: don’t want to be changing prices every day. Shift from $100 to $150: no biggie. Shift from $5 to $7: HELL NO YOU SUCK BLAH BLAH BLAH
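
A minimal sketch of the tier rule of thumb above (each tier costs 2x the previous one and gives 3x the access). The starting price, the starting access level, and the function name are made up for illustration; they are not 37signals' actual plans.

```python
def build_tiers(base_price, base_access, count, price_factor=2, access_factor=3):
    """Return a list of (price, access) pairs, one per tier.

    Each tier multiplies the previous price by price_factor and the
    previous access level by access_factor (2x and 3x by default).
    """
    tiers = []
    price, access = base_price, base_access
    for _ in range(count):
        tiers.append((price, access))
        price *= price_factor
        access *= access_factor
    return tiers


if __name__ == "__main__":
    # Hypothetical numbers: $12/month for 5 projects on the lowest tier.
    for price, access in build_tiers(base_price=12, base_access=5, count=4):
        print(f"${price}/month -> {access} projects (${price / access:.2f} per project)")
```

Because access grows faster than price, the per-unit cost falls with every step up, which is what makes the next tier look like a good deal.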

2010-03-14 9:30 Scaling Beyond LAMP – Twitter, Imgur, Facebook, Reddit

Christopher Slowe (Reddit)

  • Any query that doesn’t hit an index won’t finish
  • Moving to Cassandra

Kevin Weil (Twitter)

  • started with a monolithic rails application, moved to an SOA
  • can upgrade different parts at different times
  • let you work in parallel
  • Moving all the tweet storage over to Cassandra
  • 50M tweets/day, 70k external apps
  • Also memcached to the hilt

Serkan Piantino (Facebook)

  • 10s of 1000s of servers
  • using MySQL as a K-V store with GIGANTIC 30 or 40 TB of memcached
  • Transcoded all the PHP to C++
  • Serkan built news feed
  • key difference to twitter: only friends with 0-5000 people.
  • 100,000x a second
  • “megamodel”

=> *All of the big groups contribute code back to memcached*

Alan Schaaf (Imgur)

  • 5-character hash for photos never hits DB, it just uses mod_rewrite with a simple regex
  • was using Apache at first, but too heavy
  • moved to nginx, with Apache for PHP
  • HAProxy arbitrates: if image, nginx else Apache
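
A rough sketch of the routing idea in the Imgur notes above: an image URL is recognized by a regex alone and mapped straight to a file, so no database lookup is needed, and everything else goes to the PHP backend. The exact pattern, the extensions, the /var/images root, and the function name are assumptions for illustration; the real setup used mod_rewrite and HAProxy rules, not Python.

```python
import re

# Assumption: a photo URL is a 5-character alphanumeric hash plus an image extension.
IMAGE_PATH = re.compile(r"^/([A-Za-z0-9]{5})\.(jpg|png|gif)$")


def route(path, image_root="/var/images"):
    """Return (backend, target) for a request path.

    Image requests resolve to a file path purely from the URL, so the
    static server (nginx in the notes) can serve them without touching
    the DB. Anything else is passed through to Apache/PHP unchanged.
    """
    match = IMAGE_PATH.match(path)
    if match:
        image_hash, extension = match.groups()
        return ("nginx", f"{image_root}/{image_hash}.{extension}")
    return ("apache", path)


if __name__ == "__main__":
    print(route("/aB3x9.jpg"))
    print(route("/upload.php"))
```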

Q: what is reddit using for indexes?

  • roll our own using MemcacheDB

Why is Cassandra winning out over, say, MongoDB?

  • Twitter is very write heavy
  • we put that tweet in the mailbox of all the followers
  • Cassandra’s write-ahead log means a write is a sequential append, with no disk seek
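
The "mailbox" model above is fan-out on write: posting a tweet does one write per follower, so reading a timeline is a single cheap lookup. A toy in-memory sketch follows; the class name, method names, and the mailbox size cap are hypothetical, and this is not Twitter's code.

```python
from collections import defaultdict, deque


class Timelines:
    """Toy fan-out-on-write timeline store."""

    def __init__(self, mailbox_size=800):
        # followee -> set of followers
        self.followers = defaultdict(set)
        # user -> bounded mailbox, newest entries first
        self.mailboxes = defaultdict(lambda: deque(maxlen=mailbox_size))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, text):
        # One write per follower: this is why the workload is write heavy.
        for follower in self.followers[author]:
            self.mailboxes[follower].appendleft((author, text))

    def timeline(self, user):
        # Reads never scan other users' tweets; they just return the mailbox.
        return list(self.mailboxes[user])


if __name__ == "__main__":
    t = Timelines()
    t.follow("bob", "alice")
    t.follow("carol", "alice")
    t.post("alice", "hello from SxSW")
    print(t.timeline("bob"))
```

An author with N followers costs N writes per tweet, which is the trade-off that makes write throughput the deciding factor in the storage choice.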


  • Facebook: Ganglia. Monitor, monitor, monitor
  • monitor everything, graph it where you can see it, set alerts for out-of-band values
  • when you go thru a transitional phase you’ll see it and have to switch
  • being on the Digg front page brings 2-3x more traffic than being on, say, Reddit
  • crashed MySQL — moved to memcached
  • go with memcached early on
  • Twitter: does write-thru cache; assumes there will be 1x reads so doesn’t have to bounce in from DB
  • Scala-based distributed social graph to be open sourced by Twitter in the next couple months
  • Reddit: also Ganglia
  • “Once Ganglia comes up will have to be rebooted a prime number of times” huge laugh
  • Postgres config’ed for a 486 ;-) Read the docs
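
A minimal sketch of the write-through cache mentioned for Twitter above: every write goes to the database and the cache together, so the read that follows is served from cache and never has to go back to the DB. Plain dicts stand in for the database and memcached, and all names are hypothetical.

```python
class WriteThroughStore:
    """Toy write-through cache in front of a key-value 'database'."""

    def __init__(self):
        self.db = {}        # stand-in for the database
        self.cache = {}     # stand-in for memcached
        self.db_reads = 0   # counts how often a read falls through to the DB

    def write(self, key, value):
        # Write-through: update the DB and the cache in the same operation.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss (e.g. after eviction): fall back to the DB and refill.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


if __name__ == "__main__":
    store = WriteThroughStore()
    store.write("tweet:1", "hello")
    print(store.read("tweet:1"), store.db_reads)
```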


  • Facebook: Hadoop and Hive
  • “What’s the most status’ed word in Finland”
  • Twitter: Hadoop


  • Unanimity: “Search is hard”
  • Our worst problems have been DB
  • Visible pain on @Keyser’s face talking about migration to
  • proper way to do a migration: shut off all indexes and rebuild
  • unanimous: nginx over lighttpd / Apache
  • an aha moment w/ news feed: 1st version was only a few things, pushed to the user’s DB
  • built a query-based architecture that would do this on the fly – worked well until didn’t work at all
  • sometimes have all leafs on one rack and all aggregators on another: NOT GOOD, saturated the uplink
  • built rack-awareness
  • defined whole approach — don’t do new servers, do new racks.
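
One way to picture the rack-awareness point above: if every rack holds both aggregators and leaves, aggregator-to-leaf traffic can stay inside the rack instead of crossing the uplink. The sketch below assigns roles per rack. The one-aggregator-per-rack default and all names are assumptions for illustration, not a description of Facebook's actual placement logic.

```python
def assign_roles(racks, aggregators_per_rack=1):
    """Assign 'aggregator' or 'leaf' to every server, rack by rack.

    racks maps a rack name to its list of server names. The first
    aggregators_per_rack servers in each rack become aggregators and the
    rest become leaves, so no rack ends up holding only one role
    (as long as it has more servers than aggregators_per_rack).
    """
    roles = {}
    for servers in racks.values():
        for index, server in enumerate(servers):
            roles[server] = "aggregator" if index < aggregators_per_rack else "leaf"
    return roles


if __name__ == "__main__":
    racks = {"rack1": ["a", "b", "c"], "rack2": ["d", "e", "f"]}
    print(assign_roles(racks))
```

Adding capacity a whole rack at a time fits this scheme, since each new rack arrives with its own mix of roles.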


  • Twitter: some Capistrano, but now ‘murder’ (BitTorrent-based deploy)
  • deploy time from 12 mins to 37s
  • use cap for the low-level stuff, murder for big deploys
  • FB: once a week, “The Push”: all hands on, uses BitTorrent; special port rules

DB strategies

  • Reddit: adding indexes becomes almost impossible
  • build logic into app code
  • rearchitect so that ‘all the SQL cruft just falls off’
  • all that stuff you learned in DB class just throw that out the window
  • FB: can’t change schemas — still have tables with what bracket you were in for NCAA 2007 etc
  • Twitter is already eventually consistent
  • a biz req we’re already fine with
  • Reddit: two levels of cache — data cache and render cache
  • we don’t write out a link, write out the fields that don’t change
  • author name doesn’t change, #points does
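
A small sketch of the two-level cache described for Reddit above: the render cache keeps the part of a link's markup built from fields that do not change (title, author), while the data cache keeps the field that does (points), and the two are combined at read time. The class, the output format, and the field names are hypothetical.

```python
class LinkRenderer:
    """Toy data cache + render cache for a link listing."""

    def __init__(self):
        self.data_cache = {}    # link id -> current points (changes often)
        self.render_cache = {}  # link id -> (prefix, suffix) from stable fields
        self.renders = 0        # how many times the stable part was rebuilt

    def set_points(self, link_id, points):
        # A vote only touches the data cache; rendered markup stays valid.
        self.data_cache[link_id] = points

    def render(self, link_id, link):
        if link_id not in self.render_cache:
            self.renders += 1
            prefix = f'{link["title"]} by {link["author"]} ('
            self.render_cache[link_id] = (prefix, " points)")
        prefix, suffix = self.render_cache[link_id]
        # Splice the volatile field into the cached fragment.
        return f"{prefix}{self.data_cache.get(link_id, 0)}{suffix}"


if __name__ == "__main__":
    r = LinkRenderer()
    link = {"title": "Scaling Beyond LAMP", "author": "someone"}
    r.set_points(1, 10)
    print(r.render(1, link))
    r.set_points(1, 11)
    print(r.render(1, link), "renders:", r.renders)
```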

Why not Oracle?

  • Twitter: Because it’s not open-source. If you can’t peek under the hood, don’t want it.
  • Reddit: expensive