January 17, 2013
I’ve been reading Flip’s book, Big Data for Chimps: A Guide to Massive Scale Data Processing, available for pre-order now from O’Reilly. While I’m no data engineer, I am able to follow along. After reading a bit, it comes as no surprise that Flip helped to found Infochimps with the philosophy of making the world’s knowledge accessible to anyone. The content is unexpected and engaging. Take, for example, the story of Chimpanzee and Elephant Start a Business, from The Stream Chapter:
Chimpanzee and Elephant Start a Business
As you know, chimpanzees love nothing more than sitting at typewriters processing and generating text. Elephants have a prodigious ability to store and recall information, and will carry huge amounts of cargo with great determination. The chimpanzees and the elephants realized there was a real business opportunity from combining their strengths, and so they formed the Chimpanzee and Elephant Data Shipping Corporation. They were soon hired by a publishing firm to translate the works of Shakespeare into every language. In the system they set up, each chimpanzee sits at a typewriter doing exactly one thing well: read a set of passages, and type out the corresponding text in a new language. Each elephant has a pile of books, which she breaks up into “blocks” (a consecutive bundle of pages, tied up with string).
Read the full chapter (available here: The Stream Chapter) to see how this example, pig latin, simple streamers, and running Hadoop jobs all fit together. You’ll also get two exercises and a Ruby helper section containing tips and tricks.
Amanda McGuckin Hager is the Director of Marketing at Infochimps and a high-tech marketing professional with over 17 years of experience driving demand through strategic marketing programs. Follow Amanda on Twitter.