Monthly Archives June 2011

Data In Sight: Visualizations Built In Two Days

This past weekend, Jacob Perkins and I attended data in sight: making the transparent visual, a data visualization competition organized by Creative Commons, Swissnex San Francisco and the Kingdom of the Netherlands.

Held in the Adobe SF office and structured as a competitive hackathon, the aim for teams was to create a complete data visualization from scratch in two days.  Participants came from all over the world and included folks from established large companies, small start-ups, academia, non-profits, and several lone freelancers.

Friday evening, the contestants were briefed on the challenge.  Our very own Jacob delivered a stellar presentation of a carefully curated collection of useful datasets, complete with specific suggestions for how the data might best be used.  This layer of practical explanation really helped folks quickly understand and get excited about the beautiful possibilities of Infochimps datasets.

After the presentations, participants formed into 19 teams of 3-5 developers, designers and data experts.  The groups worked continuously until Sunday at lunchtime and in the end, 14 of the teams delivered a final presentation, and 8 of the 14 used Infochimps data.  (You can peruse those 8 visualizations here: Pathlist, Marvel Universe Social Graph, UFO Siter, Uber Shady, Parkalator, CuriouSnakes, Disaster Strikes: A World In Sight and Silenced.)

A group of 11 judges (including myself) evaluated the teams’ efforts and while most of the teams created some impressive results, we quickly agreed upon the ones we thought were the best.  There were five prize categories, and 4 out of the 5 winners used Infochimps data!

Parkalator is a multi-model parking cost optimization tool for San Francisco residents.  It helps drivers decide where to park to save money, or whether it’d be cheaper to take a cab.

New Data Sets: Alcohol, Free WiFi & How Long You’ve Got Left to Live

Our chimps have been busily scouring the data jungle, and thanks to users like AggData and vanceinteriors, we got over 1000 delicious new datasets in just the last two weeks!  Today, we’ll highlight a few of our favorites and answer some of your most burning questions.

How Long Do I Have Left to Live?
How long have I got, Doc? This free dataset, an interesting measure from the US Census, gives the average number of years an individual in the US has left to live, given their age, sex and race.

Where Can I Get Free WiFi?
Ever find yourself in a new city wondering where you can get free WiFi?  This dataset contains over 63,000 locations throughout the entire US with the latitude/longitude and business name.

What Bar in Austin Sells the Most Mixed Booze?
Curious what the hottest bars in town are (based on mixed drinks sales)?  The Texas Alcoholic Beverage Commission has got your answer!  This free dataset contains the trade name, address and reported tax on mixed drink sales for bars throughout Texas.  

Be the first to answer this question in our comments and we’ll send you a package of sweet Chimpy stuff (stainless steel water bottle, stickers and Startup: The Hackering!): What bar in Austin had the highest reported mixed drink sales in May 2011?

We’ve got over 14,700 more where that came from.  Visit our site today and search for the data you want.  Can’t find what you need?  Let us know on UserVoice!

Beautiful Data: Elements of Happiness

Laura Javier, a recent design graduate from Washington University in St. Louis, has created an absolutely stunning 96-page book visualizing over 72 years of adult development research from Harvard University.  It’s a lovely reminder of the humanity behind the bits.

The Harvard Study of Adult Development is the longest prospective study of mental and physical well-being ever conducted. For 72 years, researchers at Harvard have been following 824 individuals through war, career, marriage and divorce, parenthood and grandparenthood, and old age. In this book, I’ve taken 10 representative case studies and visualized their salient character traits, personal timeline, social supports, and physical health to draw conclusions about “the happy life.”

UFOLibs: The Truth is Out There

Have you ever found yourself looking to the sky and suddenly seen something… unfathomable?  Did it make you wonder if you were really alone… in the universe?  Well, look no further: Dick Hall, Business Development Manager here at Infochimps, has an app for that!

Built over a 3 Day Startup weekend, UFOLibs uses the Sinatra web framework and the Infochimps API for 60,000+ Documented UFO Sightings.  The resulting app allows you to enter your otherworldly experience, Mad-Libs style, and compare it against the recorded experiences of thousands of others.  Not only is the end result a delightful diversion, but it also shows that with our API and Sinatra, “even our Biz Dev guy can make a web application”. (Dick’s words!)

Try out the app for yourself or see the code on GitHub.  And remember… the truth is out there.

New Data Sets: Colleges, Hospitals, The Marvel Universe and Social Data

Look what we found!  No, it’s not just a picture of a baby monkey, though we did think it was apropos for our new weekly feature highlighting some of the best new data sets to join our ever-growing data marketplace.  Today, we bring you a mix of geo-data and social data, including a social graph of the Marvel Universe, constructed by three researchers from the Balearic Islands, which surprisingly is not unlike a real-life social network!

US Colleges and Universities
This is a database of all US colleges and universities, as of 2010. There are 9350 total colleges and universities listed with name, address, phone number and URL.

US Hospitals
This data set contains 49 fields on 4287 hospitals in the United States. While not all datapoints are available for every hospital, this robust data set contains info such as: location and contact information, heart attack mortality rate, gross patient revenue, number of staffed beds, approximate average patient length of stay and patient satisfaction along several metrics.

2000+ Flickr Images, 10,000+ YouTube Videos and 10,000+ Digg Users
These data sets, courtesy of Munmun De Choudhury, showcase large scrapes of social data that the post-doctoral fellow has used to perform image content analysis, examine the dynamics of threaded comments in rich media sharing, and study information diffusion and community evolution around topics.

Marvel Universe Social Graph
This fun Marvel Comics character collaboration graph treats the artificial world of the Marvel comic books as an example of a social collaboration network. The researchers compare the characteristics of this universe to real-world collaboration networks, such as the Hollywood network or the one created by scientists who work together in producing research papers, and find that the Marvel Universe is surprisingly closer to a real social graph than one might expect.
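For the curious, here is a minimal sketch of how a collaboration graph like this can be built. It assumes that two characters are linked whenever they appear in the same issue; that linking rule, the toy appearance lists and the name `collaboration_graph` are illustrative assumptions, not the data set's actual format.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: issue label -> characters appearing in that issue.
appearances = {
    "issue-1": ["Spider-Man", "Iron Man", "Captain America"],
    "issue-2": ["Spider-Man", "Daredevil"],
    "issue-3": ["Iron Man", "Captain America"],
}


def collaboration_graph(appearances):
    """Link every pair of characters that share at least one issue."""
    graph = defaultdict(set)
    for characters in appearances.values():
        # Each unordered pair of co-appearing characters becomes an edge.
        for a, b in combinations(sorted(set(characters)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


graph = collaboration_graph(appearances)

# Degree = number of distinct collaborators, one of the simplest
# characteristics to compare against real-world collaboration networks.
degree = {name: len(neighbors) for name, neighbors in graph.items()}
print(degree)
```

The same construction works for actors who share a film or scientists who share a paper, which is what makes the comparison between networks possible.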

We’ve got over 13,700 more where that came from.  Visit our site today and search for the data you want.  Can’t find what you need? Let us know on UserVoice!

Sloppy Joes, Slop-Sloppy Joes, anyone?

There is no such thing as a free lunch… except at Infochimps.

The idea behind making free lunch policy at Infochimps is that it helps people maximize productivity: nobody has to decide where and what to eat, then go out, get it, and return. Eating at the same time and place also helps us bond as teammates. We’re constantly running the cost/benefit on this practice, but at least for now the benefits seem to far outweigh the costs.

Infochimps is about increasing both user and programmer joy in whatever ways we can. We’re always streamlining our processes, tweaking, and fixing things along the way. It’s amazing how sometimes a small but nagging pain point can be fixed with some deft coding.

As we’ve grown larger, making lunch easy to get for everyone and making sure that everyone could have their tastes accommodated was becoming a problem. I created a small database with all the restaurants we order from on a regular basis so that I could find their contact info and menus more quickly, and that helped for a little while, but even with that tool, we had a “lunch coup” one day. (It was peacefully resolved with some Asian take-out, and no one was harmed in the process.)

Enter: The Lunchlady

[Image: the lunch lady from The Simpsons]

No, not that one! (more…)

Clustering Baseball Data with Weka

This is a guest blog post from Peter Hauck, who works as a data analyst at Google.  His experience includes employee compensation optimization and dynamic pricing of live event tickets. He is a graduate of Cornell University with a B.A. in Mathematics and Physics and an M. Eng in Applied Physics.

Greetings, sports fans and data nerds! Since 2004, Major League Baseball has published (x,y) “hit locations” of every at-bat, and for years sabermetric and actuarial analysts have turned this and other data into predictions of where individual sluggers will hit in the future. In hopes of optimally positioning players in the field, professional teams and sports commentators pay handsomely for this kind of forecasting.  The models I’ve seen employ data binning and statistical & probabilistic models to get these results.

In a twist, I used the GUI software Weka to apply k-means clustering to the 2006 (x,y) hit locations of Ichiro Suzuki, the single-season hits record holder, looking for patterns.  For readers not familiar, clustering is a computational method of splitting a dataset into neighborhoods of similar points.  Unlike most clustering work, using Weka avoids real programming; once the data was loaded into Weka, the computation required about ten mouse clicks.  I think of this as a semi-scientific, exploratory method that offers quick insight and often reliable conclusions.
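For readers who would rather see the idea in code than in mouse clicks, here is a minimal sketch of k-means in plain Python. It is not the Weka workflow described in this post, and the coordinates are made-up stand-ins rather than actual hit locations; the function names and parameters are illustrative.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points and re-centering centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so we have converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Synthetic (x, y) points forming two obvious neighborhoods.
hits = [(10, 12), (12, 15), (11, 10), (13, 13),
        (80, 85), (82, 90), (78, 88), (81, 83)]

centroids, clusters = kmeans(hits, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

The only real decision is the number of clusters, k, which the caller has to pick up front; the algorithm does the rest by repeating the assign and re-center steps until the centroids stop moving.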
(more…)

The Secret of the Cloud

[Image: the cloud]

Ah… it all makes sense now.