Monthly Archives August 2009

Infochimps SXSW Panels

There are two SXSW panels which have Infochimps as potential contributors.  They were mentioned in our last post about the SXSW panels we liked, but we didn’t talk about what we hoped to accomplish.

In Scraping the Social Web, founder Flip Kromer plans to talk about his experience gathering data from multiple social networks and aggregating that data into datasets.  APIs give you only a limited view of what’s going on within a network.  With a dataset of the whole network, you can explore the community structure and gain a deeper understanding of how each user and action fits within the whole.

No longer would we have to use pandemics to explore a network, where the spread of disease or ideas lets us see who is connected to whom.  By having the whole Twitter dataset to explore, we can see the effects of an idea endemic to a network, an idea that started within the network itself.

In Petabyte as Platform: Making Big Data Accessible Online, co-founder Dhruv and Pete Skomoroch of Data Wrangling choose an interesting dataset with which many interesting things can be done.  They will show you the right questions to ask when you get a big dataset you want to work with, helping you to solve the complexities that can arise.

Next, they will show you the various ways in which you can bridge the gap between raw data and refined insight.  Whether by visualization or application, they will again show you what you need to look for and how to get past the difficulties.

We are debating which dataset we’d like to use and could use your help to decide.  Our few ideas are outlined below; please tell us in the comments which you would prefer.

“Who is the greatest rap artist in the world?” – Using a database of rap lyrics from over 40,000 songs, we do some data mining to discover how often and where the phrase “grown ass man” has been repeated.  Using the same techniques, we show which phrases are most popular and who coined them, and visualize them as they travel from coast to coast.
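
For the curious, here is a rough Python sketch of the kind of phrase counting described above.  It is only an illustration, not the code we plan to use: the `phrase_counts` function and the toy `songs` list are made up for this post, and a real run would also need artist and release-date metadata before it could say who coined a phrase.

```python
from collections import Counter
import re


def phrase_counts(lyrics, n=3):
    """Count every n-word phrase across a list of lyric strings."""
    counts = Counter()
    for text in lyrics:
        # Lowercase and keep only letters and apostrophes as word characters.
        words = re.findall(r"[a-z']+", text.lower())
        # Slide a window of n words over the song and tally each phrase.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


# Hypothetical toy data standing in for the real lyrics database.
songs = [
    "I'm a grown ass man, dog",
    "A grown ass man should know",
    "Nothing to see here",
]

counts = phrase_counts(songs, n=3)
print(counts["grown ass man"])
```

Counting fixed-length word windows is about the simplest approach that could work; it treats every three-word run as a candidate phrase, so common filler would have to be filtered out before the popular phrases mean much.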

SXSW Data Panels

We are especially excited to announce and share that big data is coming to SXSW.  Here are the panels we like:

Pete Skomoroch of DataWrangling: Petabyte As Platform, Making Big Data Accessible Online – We have long been fans of Pete Skomoroch’s work.  This is your chance to hear from him about web applications built on massive datasets.

Our own mrflip: Scraping the Social Web – Flip has done extensive work building massive datasets from social media sites.  Hear him talk about the nuances involved and ask him about best practices.

Michael Driscoll of Dataspora: Cloud Crunching Big Data with HIVE/Hadoop and R and Become a Sexy Data Geek in One Week – Another friend of ours, Michael will talk about using the right tools to massage big datasets and produce results from them, and will profile what it takes to be a data geek.

Stu Hood of Rackspace: Using Hadoop to Manage a Ton of Data – Hadoop might be the most important tool to know for working with terabytes and terabytes of data.

Ian Davis of Talis: Set Your Data Free – Talis does great work.  Listen to Ian cover topics very relevant to Infochimps.org’s collection: data copyright and licensing.

Dave Bowker of Designing the News: Engaging Data Visualizations and Infographic Communication – Glad to see some data viz stuff at SXSW.

Casey Caplowe of GOOD: Interactive Infographics – More visualizations, GOOD stuff.

Leave a comment if you know of any other good ones.