- August 21, 2009
Infochimps is a potential contributor to two SXSW panels. We mentioned them in our last post about the SXSW panels we liked, but we didn’t talk about what we hope to accomplish.
In Scraping the Social Web, founder Flip Kromer plans to talk about his experience gathering data from multiple social networks and aggregating that data into datasets. APIs only give you a limited view of what’s going on within a network. With a dataset of the whole network, you can explore the community structure and gain a deeper understanding of how each user and action lives within the whole.
No longer would we have to use pandemics to explore a network, where the spread of disease or ideas lets us see who is connected to whom. By having the whole Twitter dataset to explore, we can see the effects of an idea endemic to a network, an idea that started within the network itself.
In Petabyte as Platform: Making Big Data Accessible Online, co-founder Dhruv and Pete Skomoroch of Data Wrangling choose an interesting dataset from which many interesting things can be done. They will show you the right questions to ask when you get a big dataset you want to work with, helping you work through the complexities that can arise.
Next, they will show you the various ways in which you can bridge the gap between raw data and refined insight. Whether by visualization or application, they will again show you what you need to look for and how to get past the difficulties.
We are debating which dataset we’d like to use and could use your help to decide. Our few ideas are outlined below. Please tell us in the comments which you would prefer.
“Who is the greatest rap artist in the world?” – Using a database of rap lyrics from over 40,000 songs, we do some data mining to discover how often and where the phrase “grown ass man” has been repeated. Using the same techniques, we show which phrases are most popular, who coined them, and visualize them as they travel from coast to coast.
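To give a rough feel for the kind of phrase counting this idea involves, here is a minimal Python sketch. It is not the panel's actual method: the `SONGS` records, their `artist`, `year`, and `lyrics` fields, and the helper names are all made up for illustration, and "who coined it" is approximated as the earliest song in the sample that contains the phrase.

```python
import re
from collections import Counter

# Hypothetical toy records. The real lyrics database and its schema
# are not described in this post, so these fields are assumptions.
SONGS = [
    {"artist": "Artist A", "year": 1994,
     "lyrics": "I'm a grown ass man, a grown ass man"},
    {"artist": "Artist B", "year": 1999,
     "lyrics": "Talk to me like a grown ass man"},
    {"artist": "Artist C", "year": 2003,
     "lyrics": "Nothing to see here"},
]


def tokenize(text):
    """Lowercase the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive words as a single string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_stats(songs, n=3):
    """Count each n-word phrase and record the earliest song using it."""
    counts = Counter()
    first_use = {}
    # Walk songs oldest first so the first sighting is the earliest one.
    for song in sorted(songs, key=lambda s: s["year"]):
        for phrase in ngrams(tokenize(song["lyrics"]), n):
            counts[phrase] += 1
            first_use.setdefault(phrase, (song["artist"], song["year"]))
    return counts, first_use


counts, first_use = phrase_stats(SONGS)
print(counts["grown ass man"])     # total occurrences across all songs
print(first_use["grown ass man"])  # earliest (artist, year) in the sample
```

Tracking how a phrase travels from coast to coast would additionally need a location for each artist, which this sketch leaves out.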