- January 6, 2011
At Infochimps, we believe increasing the number of people familiar with handling and making sense of big data is good for the web community as a whole. That’s why we are happy to contribute our expertise to Data Day Austin, an event put together by Lynn Bender at GeekAustin and our friends at Riptano.
Data Day Austin includes both basic and advanced training in Hadoop as well as Cassandra. It takes place on Saturday, January 29, 2011 at the Norris Conference Center. The speaker list is as follows:
Introduction to Cassandra for Java Developers
Nate McCall – Software Developer, Riptano
I Know Where You Are: an introduction to working with location data.
Sandeep Parikh – Principal, Robotten Labs
Shaun Dubuque – Co-founder, Argia, Inc
Thinking of developing location-based apps? Sandeep and Shaun show you sources for location data and strategies for managing it.
Additional presentations and workshops to be announced shortly.
Hadoop Deep Dive includes:
It’s common to pay a few thousand dollars for a day of Hadoop training. We have Austin’s top Hadoop talent teaming up to give you a day of instruction as part of Data Day Austin. These are not mere presentations: we want you to leave Data Day Austin with a working knowledge of Hadoop.
Higher Order Languages for Hadoop I – Wukong
Flip Kromer – Founder and CTO, Infochimps
Wukong allows you to treat your dataset like:
* a stream of lines when it’s efficient to process by lines
* a stream of field arrays when it’s efficient to deal directly with fields
* a stream of lightweight objects when it’s efficient to deal with objects
No one knows more about Wukong than Flip Kromer.
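To make the three views of a dataset concrete, here is a minimal sketch in plain Python. It is not Wukong's API (Wukong is a Ruby library); the sample records, the `Visit` type, and the function names are all hypothetical and only illustrate the idea of reading the same data as lines, field arrays, or lightweight objects.

```python
from collections import namedtuple

# Hypothetical tab-separated records; the field names are illustrative only.
RAW = "alice\tAustin\t3\nbob\tDallas\t5\n"

# A lightweight object type for view 3 (an assumption, not a Wukong class).
Visit = namedtuple("Visit", ["user", "city", "count"])


def as_lines(text):
    """View 1: a stream of raw lines."""
    for line in text.splitlines():
        yield line


def as_fields(text):
    """View 2: a stream of field arrays, split on tabs."""
    for line in as_lines(text):
        yield line.split("\t")


def as_objects(text):
    """View 3: a stream of lightweight objects built from the fields."""
    for user, city, count in as_fields(text):
        yield Visit(user, city, int(count))


lines = list(as_lines(RAW))
fields = list(as_fields(RAW))
objects = list(as_objects(RAW))
total_visits = sum(v.count for v in objects)
```

Each view is built on the one before it, so a job can stop at whichever level is cheapest for the task: raw lines for filtering, fields for column work, objects when named attributes make the logic clearer.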
Higher Order Languages for Hadoop II – Pig
Jacob Perkins – Hadoop Engineer, Infochimps
Pig is a Hadoop extension that simplifies Hadoop programming by giving you a high-level data processing language while keeping Hadoop’s simple scalability and reliability.
Web Crawling and Data Gathering with Apache Nutch
Steve Watt – IBM Big Data Lead, IBM Software Strategy
The first phase of any analytics pipeline is finding and loading the data. Apache Nutch is a Hadoop-based web crawler that is well suited to pulling content down from the web and loading it into HDFS, where it is available for Hadoop analytics. This session will teach you how to install and configure Nutch, how to use it to crawl and gather targeted content from the web, and how to fine-tune your crawls through the Nutch API.
Hadoop Analytics for the Business Professional
(BigSheets demonstration with multiple analytic scenarios)
Instructor to be announced shortly
Additional workshops/presentations to be announced…
Be sure to register soon, while Early Bird pricing is still available. For comments, questions, or sponsorship opportunities, contact firstname.lastname@example.org