SxSW 2010 — Lecture Notes
- March 14, 2010
Here are notes from Infochimps on interesting talks at SxSW!
- 2009-03-14 11:00 Why You Aren’t Done Yet – David Heinemeier Hansson
- 2009-03-14 9:30 Scaling Beyond LAMP — Twitter, Imgur, Facebook, Reddit
2009-03-14 11:00 Why You Aren’t Done Yet – David Heinemeier Hansson
- Meetings are TOXIC
* Turn work day into work moments
#1 enemy of productivity is interruption
ASAP is POISON
- everything is ASAP
- they’re trying to get their thing done by Friday
not the only 4-letter word
- NEED
* Otherwise, what? Is it really?
- CAN’T
- EASY
- FAST
How to combat it
- /SAY NO/: 40 hours is more than anyone needs in a creative industry
- /GOOD ENOUGH IS FINE/. Maybe I’ll come back to finish it sometime
- /YOU CAN ALWAYS DO LESS/ Most things can be passed.
* At 37signals we started imposing deadlines: every 2 weeks something has to get done
* a fixed list of things that all have to ship by some deadline: NO. Broken
* if the list is changeable, so that not everything you need to get done has to get done, you can: whatever we can fit in the box gets done.
- everything else gets dropped
- you will always meet your deadline
- … even still, we often can’t follow through
- as soon as ‘need’ or ‘fast’ comes up, we’re off plan
Restating the problem
- only one way to get /exactly/ what you want
- thousands of ways to get approx. what you want
- restating is most effective way to clear out list
Give up
- walked away from a fair # of projects at 37signals
Questions
- All meetings are optional
- work from home
- How to deal with slackers
* ask people to live up to expectations
* everyone gets a credit card. Policy is ‘Spend it wisely’
* not set in concrete: if people abuse it, then we stop. Try it for 2 weeks
* fire people quickly ‘it’s not Italy, everything in US is at-will’
- What are criteria for saying Yes
* A: “I care about it”. Don’t care if there’s a big opportunity, etc. Only if we need it ourselves.
* Personal annoyance is my key criterion
- Subscription models: how to figure out pricing, discounts
* no idea.
* there’s a lot of science, we’ve read none of it.
* “Would I pay for this?”
* Don’t just make sure you’re profitable. Make sure you’re PREPOSTEROUSLY profitable
* rule of thumb: from 1 tier to next, pay 2x as much but get 3x as much access
* But: don’t want to be changing prices every day. Shift from $100 to $150: no biggie. Shift from $5 to $7: HELL NO YOU SUCK BLAH BLAH BLAH
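The 2x-price / 3x-access rule of thumb can be worked through with a few lines of Python. This is only a sketch: the starting price, the starting allowance, and the idea of counting "access" in projects are made-up numbers for illustration, not figures from the talk.

```python
# Hypothetical starting point: the talk gave only the 2x / 3x ratio.
def build_tiers(base_price, base_access, count):
    """Return (price, access) pairs; each tier costs 2x and offers 3x the previous one."""
    tiers = []
    price, access = base_price, base_access
    for _ in range(count):
        tiers.append((price, access))
        price, access = price * 2, access * 3
    return tiers


tiers = build_tiers(base_price=25, base_access=10, count=3)
for price, access in tiers:
    # The per-unit price falls at every step, which is the incentive to upgrade.
    print(f"${price}/month: {access} projects, ${price / access:.2f} per project")
```

Because access grows faster than price, the cost per unit drops by a third at each tier.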
2009-03-14 9:30 Scaling Beyond LAMP — Twitter, Imgur, Facebook, Reddit
Christopher Slowe (Reddit)
- Any query that doesn’t hit an index won’t finish
- Moving to cassandra
Kevin Weil (Twitter)
- started with a monolithic rails application, moved to an SOA
- can upgrade different parts at different times
- let you work in parallel
- Moving all the tweet storage over to cassandra
- 50M tweets/day, 70k external apps
- Also memcached to the hilt
Serkan Piantino (Facebook)
- 10s of 1000s of servers
- using MySQL as a K-V store with GIGANTIC 30 or 40 TB of memcached
- Transcoded all the PHP to C++
- Serkan built news feed
- key difference from Twitter: a user is only friends with 0-5000 people.
- 100,000x a second
- “megamodel”
=> *All of the big groups contribute code back to memcached*
Alan Schaaf (Imgur)
- 5-character hash for photos never hits DB, it just uses mod-rewrite with a simple regex
- was using Apache at first — but too heavy
- moved to nginx, with apache for php
- HAproxy arbitrates: if image, nginx else Apache
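The Imgur setup above (image URLs resolved by pattern alone, everything else handed to PHP) can be sketched in Python. The URL shape, the list of extensions, and the /var/images root are assumptions; the notes only say "5-character hash" and "simple regex", and the real routing lives in HAProxy and mod_rewrite rules, not application code.

```python
import re

# Assumed URL shape: /<5 alphanumeric chars>.<image extension>.
IMAGE_PATH = re.compile(r"^/([A-Za-z0-9]{5})\.(jpg|jpeg|png|gif)$")


def route(path):
    """Pick a backend as described: image requests to nginx, the rest to Apache."""
    return "nginx" if IMAGE_PATH.match(path) else "apache"


def image_file(path, root="/var/images"):
    """Map an image URL straight to a file on disk, with no database lookup."""
    match = IMAGE_PATH.match(path)
    if match is None:
        return None
    image_hash, ext = match.groups()
    return f"{root}/{image_hash}.{ext}"


print(route("/aB3xZ.jpg"), image_file("/aB3xZ.jpg"))
print(route("/upload.php"), image_file("/upload.php"))
```

The point of the design is that the hash in the URL is already the storage key, so serving an image costs a regex match and a file read.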
Q: what is reddit using for indexes?
- roll our own using memcacheDB
Why is Cassandra winning out over, say, MongoDB?
- Twitter is very write heavy
- we put that tweet in the mailbox of all the followers
- Cassandra’s write-ahead log means writes need no disk seeks
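The "mailbox" delivery described above is usually called fan-out on write: one tweet becomes one write per follower, which is why the workload is so write-heavy. A minimal in-memory sketch follows; the class name, method names, and the mailbox size cap are all hypothetical, and real storage would be Cassandra or similar, not Python dicts.

```python
from collections import defaultdict, deque


class Timelines:
    """Toy fan-out-on-write model: every post is copied into each follower's mailbox."""

    def __init__(self, mailbox_size=800):  # size cap is an illustrative assumption
        self.followers = defaultdict(set)
        self.mailboxes = defaultdict(lambda: deque(maxlen=mailbox_size))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        # One write per follower; returns how many writes this post caused.
        for follower in self.followers[author]:
            self.mailboxes[follower].appendleft((author, text))
        return len(self.followers[author])

    def timeline(self, user):
        # Reading is a single lookup because delivery already happened at write time.
        return list(self.mailboxes[user])


t = Timelines()
t.follow("bob", "alice")
t.follow("carol", "alice")
writes = t.post("alice", "hello from SxSW")
print(writes, t.timeline("bob"))
```

The trade is deliberate: reads become cheap lookups, and the cost moves to writes, which is where a store with cheap appends helps.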
Monitoring
- Facebook: Ganglia. Monitor, monitor, monitor
- monitor everything, graph it where you can see it, set alerts for out of band
- when you go thru a transitional phase you’ll see it and have to switch
- a site gets 2-3x more traffic from the Digg front page than from, say, Reddit
- crashed MySQL — moved to memcached
- go with memcached early on
- Twitter: does write-through caching; assumes each write will be read at least once, so reads don’t have to bounce in from the DB
- Scala-based distributed social graph to be open-sourced by Twitter in the next couple of months
- Reddit: also ganglia
- “Once ganglia comes up will have to be rebooted a prime number of times” huge laugh
- Postgres config’ed for a 486 ;-) Read the docs
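The write-through cache mentioned for Twitter in the list above can be sketched as follows. Plain dicts stand in for the database and for memcached, and the class and counter names are hypothetical; the only point is that a value lands in the cache at write time, so the first read never touches the DB.

```python
class WriteThroughStore:
    """Toy write-through cache: writes go to the DB and the cache together."""

    def __init__(self):
        self.db = {}       # stand-in for the database
        self.cache = {}    # stand-in for memcached
        self.db_reads = 0  # counts reads that fell through to the DB

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value  # populate the cache now, not on first read

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss (for example after an eviction): fall back to the DB.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = WriteThroughStore()
store.write("tweet:1", "hello")
print(store.read("tweet:1"), store.db_reads)
```

This only pays off if most written values really are read soon afterwards, which is the assumption the notes attribute to Twitter.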
Analytics
- Facebook: hadoop and hive
- “What’s the most status’ed word in Finland”
- Twitter: hadoop
Search
- Unanimity: “Search is hard”
- Our worst problems have been DB
- Visible pain on @Keyser’s face talking about migration to
- proper way to do a migration: shut off all indexes and rebuild
- unanimous: nginx over lighttpd / apache
- an aha moment w/ news feed: 1st version was only a few things, pushed to the user’s DB
- built a query-based architecture that would do this on the fly; worked well until it didn’t work at all
- sometimes have all leaves on one rack and all aggregators on another: NOT GOOD, saturated the uplink
- built rack-awareness
- defined our whole approach: don’t do new servers, do new racks.
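One way to picture the rack-awareness point above: mix aggregators and leaves inside every rack, so an aggregator can query leaves without crossing the rack uplink. The sketch below is a guess at the idea, not Facebook's actual scheme; the 1-to-3 ratio, the server names, and both function names are invented for illustration.

```python
from collections import defaultdict


def assign_roles(servers, leaves_per_aggregator=3):
    """servers is a list of (name, rack). Every rack gets its own aggregators and leaves."""
    by_rack = defaultdict(list)
    for name, rack in servers:
        by_rack[rack].append(name)
    roles = {}
    for names in by_rack.values():
        for i, name in enumerate(sorted(names)):
            # The first server in each group of (leaves_per_aggregator + 1) aggregates.
            is_aggregator = i % (leaves_per_aggregator + 1) == 0
            roles[name] = "aggregator" if is_aggregator else "leaf"
    return roles


def local_leaves(aggregator, servers, roles):
    """Leaves on the same rack as the aggregator: queries to them stay off the uplink."""
    rack_of = dict(servers)
    return sorted(
        name for name, rack in servers
        if roles[name] == "leaf" and rack == rack_of[aggregator]
    )


servers = [(f"r{r}-s{s}", f"rack{r}") for r in (1, 2) for s in range(4)]
roles = assign_roles(servers)
print(local_leaves("r1-s0", servers, roles))
```

Under this layout capacity grows by adding a whole rack, which arrives with its own aggregators and leaves already balanced.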
Deployment
- twitter: some capistrano, but now ‘murder’ (bittorrent based deploy)
- deploy time from 12 mins to 37s
- use cap for the low-level stuff, murder for big deploys
- FB: once a week, “The Push” — all hands on, uses bittorrent; special port rules
DB strategies
- Reddit: adding indexes becomes almost impossible
- build logic into app code
- rearchitect so that ‘all the SQL cruft just falls off’
- all that stuff you learned in DB class just throw that out the window
- FB: can’t change schemas — still have tables with what bracket you were in for NCAA 2007 etc
- Twitter is already eventually consistent
- a biz req we’re already fine with
- Reddit: two levels of cache — data cache and render cache
- we don’t write out a whole link; we write out the fields that don’t change
- author name doesn’t change, #points does
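Reddit's render cache, as described above, can be sketched with Python's string.Template: render the fields that never change once, cache that fragment, and fill in the point count on every request. The HTML shape, the field names, and the stats counter are assumptions for illustration; the data cache that would supply the current points is represented only by the link dict passed in.

```python
import html
from string import Template

LINK_TEMPLATE = Template('<a href="$url">$title</a> by $author ($points points)')

render_cache = {}       # link id -> Template with the static fields already filled in
stats = {"renders": 0}  # how many times the static part was actually rendered


def _static(value):
    # HTML-escape, then double any "$" so it survives the second Template pass.
    return html.escape(str(value)).replace("$", "$$")


def render_link(link):
    cached = render_cache.get(link["id"])
    if cached is None:
        stats["renders"] += 1
        # safe_substitute leaves the $points placeholder untouched.
        partial = LINK_TEMPLATE.safe_substitute(
            url=_static(link["url"]),
            title=_static(link["title"]),
            author=_static(link["author"]),
        )
        cached = render_cache[link["id"]] = Template(partial)
    # Only the volatile field is filled in per request.
    return cached.substitute(points=link["points"])


link = {"id": "abc", "url": "http://example.com", "title": "Cost: $5",
        "author": "alice", "points": 1}
print(render_link(link))
link["points"] = 42
print(render_link(link))
```

A vote changes only the number substituted at the end, so the cached fragment never has to be invalidated when points move.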
Why not Oracle?
- twitter: Because it’s not open-source. If you can’t peek under the hood, don’t want it.
- Reddit: expensive