- October 6, 2011
“You can please some of the people some of the time, all of the people some of the time, some of the people all of the time, but you can never please all of the people all of the time.” – Anonymous (although often attributed to Abraham Lincoln)
As Minister of Office Joy, the OM (Office Manager) of Infochimps, I am well aware of the truth of this statement. However, my career goal is to prove it wrong (at least occasionally), and the chimps have given me a chance to attain this seemingly impossible dream.
My job is pretty simple: remove obstacles for my team, help provide them with the tools they need to shine, and increase team joy. The thought is that happy teammates are productive teammates and productive teammates are happy teammates. A few months ago, we started the Office Joy fund. Responsibility for this fund was put in my hands (insert evil laughter) and I thought, “There has to be a way to get everyone involved and please the majority of the team most of the time (and maybe even all of them at once every now and then).”
What are you doing on Friday, July 22nd at 1:20pm CT? You could be knocking back brewskis with our CEO, Nick Ducoff, at historic Wrigley Field. He’s got prime tickets to the Cubs vs. Astros game and wants to take one lucky & talented developer out to the ball game. He’ll treat you to all the beer and peanuts you can handle – oh yeah!
All you’ve got to do is hack together an application or data visualization using at least one of our data sets or APIs and submit it to us by Wednesday, July 20th at 5pm CT. We’ll pick our favorite and let you know if you’re our winner by Thursday, July 21st!
Don’t live in Chicago, but still want to play along? We’ll send every person who enters a handful of starter decks of Startup: The Hackering, Infochimps’ infamous SXSW card game and some sweet stickers.
Here are some of our favorite data sets and APIs we’d recommend starting off with:
For inspiration, check out what others have built using Infochimps data: App Gallery.
This past weekend, Jacob Perkins and I attended data in sight: making the transparent visual, a data visualization competition organized by Creative Commons, Swissnex San Francisco and the Kingdom of the Netherlands.
Held in the Adobe SF office and structured as a competitive hackathon, the aim for teams was to create a complete data visualization from scratch in two days. Participants came from all over the world and included folks from established large companies, small start-ups, academia, non-profits, and several lone freelancers.
Friday evening, the contestants were briefed on the challenge. Our very own Jacob delivered a stellar presentation of a carefully curated collection of useful datasets that included specific suggestions of how the data might best be used. This layer of practical explanation really helped folks quickly understand and get excited about the beautiful possibilities of Infochimps datasets.
After the presentations, participants formed into 19 teams of 3-5 developers, designers and data experts. The groups worked continuously until Sunday at lunchtime and in the end, 14 of the teams delivered a final presentation, and 8 of the 14 used Infochimps data. (You can peruse those 8 visualizations here: Pathlist, Marvel Universe Social Graph, UFO Siter, Uber Shady, Parkalator, CuriouSnakes, Disaster Strikes: A World In Sight and Silenced.)
A group of 11 judges (including myself) evaluated the teams’ efforts and while most of the teams created some impressive results, we quickly agreed upon the ones we thought were the best. There were five prize categories, and 4 out of the 5 winners used Infochimps data!
MOST ACTIONABLE: Parkalator
This is a multi-model parking cost optimization tool for San Francisco residents. It helps drivers decide where to park to save money or whether it’d be cheaper to take a cab.
The idea behind free lunch being policy at Infochimps is that it helps people maximize productivity, because each individual doesn’t have to think of where they want to eat, then what, and then go out, get it, and return; it also helps us bond as teammates to all eat at the same time and place. We’re constantly running the cost/benefit on this practice, but at least for now the benefits seem to far outweigh the costs.
Infochimps is about increasing both user and programmer joy in whatever ways we can. We’re always streamlining our processes, tweaking, and fixing things along the way. It’s amazing how sometimes a seemingly small but actually major pain point can be fixed with some deft coding.
As we’ve grown larger, making lunch easy to get for everyone and making sure that everyone could have their tastes accommodated was becoming a problem. I created a small database with all the restaurants we order from on a regular basis so that I could find their contact info and menus more quickly, and that helped for a little while, but even with that tool, we had a “lunch coup” one day. (It was peacefully resolved with some Asian take-out, and no one was harmed in the process.)
Enter: The Lunchlady
No, not that one!
It’s hard to say what will become of Data.gov and USAspending.gov. Researcher and scholar Vivek Wadhwa claims the sites have plenty of support from government officials, but do they have enough support from lawmakers to stay afloat? Reports claim that the budget for Data.gov and USAspending.gov will plummet from $35 million to $2 million.
If there’s one thing we like to do at Infochimps, it’s collecting interesting nuggets of information for you to use. So here are some useful posts on the matter. Please share them with your friends so we can ensure support for open government:
Over $30 billion was spent on unnecessary hospital admissions in 2006. Each of these unnecessary admissions took away one hospital bed from someone else who needed it more. Rather than waiting for politicians to settle their arguments about how to implement health care reform, health care provider Heritage Provider Network teamed up with data modeling and prediction competition network Kaggle to offer a very interesting solution.
Heritage Provider Network launched the Heritage Health Prize with one goal in mind: to develop a breakthrough algorithm that uses available patient data, including health records and claims data, to predict and prevent unnecessary hospitalizations. They’ve invited data scientists to help crack the problem, and the winner will receive $3 million.
$3 million sounds like a lot, but the winning algorithm could save Heritage Provider Network a considerable amount in superfluous claims and make our healthcare system much more efficient. How effective do you think data algorithms can be at distinguishing life-saving from unnecessary visits? What data and precautions could be crucial for this contest to be a success?
My friend Tahir Hemphill has built the Hip Hop Word Count, a searchable database of over 40,000 songs with lyrics and metadata – including dates and geolocation of the artists. Check out Tahir talking about the project:
He was picked up in ReadWriteWeb recently and he’s raised over $6,000 through his Kickstarter campaign, from the likes of Clay Shirky no less, to launch the service publicly. And he’s started to share his data on Infochimps: you can now download a pack of Jay-Z lyrics. You can find similar data on Infochimps by searching the music tag.
Show your support for another developer/artist that’s doing something cool with data, and contribute to his fundraising campaign. Tahir will be using the proceeds to release the data, and his tool, to the public.
Stay tuned next week for a release of data from the Million Song Dataset project, a massive dataset that catalogs the features of a million songs. It’s music data like this, along with data from the HHWC project, that helps create web services like Pandora and neat graphics about whether crunk was first used in the South, and that makes the dreams of us data hobbyists come true.
Data visualizations are like houses and neighborhoods, monuments even, built on the foundation that Infochimps is laying with our big data gathering and processing. We love it when people do really cool things with the information that we have on our site and just wanted to share a recent example with you. One of our users, Kennedy Elliott (@kennelliott), found subway trend data on our site and used it to make a really cool holiday greeting card that she sent to us. :)
Open data has thus far largely been associated with government data. Though government data is indeed valuable, the potential of the data that private organizations gather has been overlooked. These organizations usually don’t realize the potential that their data holds.
At the Data Cluster last month, our own Dhruv Bansal and Gil Elbaz of Factual led the Open Data Birds-of-a-feather session. Using insights from that discussion, and some of our own, we want to highlight some pros and cons of opening private data to help organizations determine whether it is the right move. First, the pros:
1. Profit generation – Almost all data will have some value to someone else, whether an organization realizes it or not. Putting up data for sale would help these organizations realize how valuable their data is and may even provide another revenue stream from this latent resource. For example, a firm with data on parking meter locations and occupancy rates can sell it to a firm building an iPhone app to help you reliably find parking in our nation’s downtowns.
2. Crowd-sourced curation – Gil commented that a lot can be gained from crowd-sourced curation. Firstly, the organization avoids the costs of curating the data itself. Secondly, the pool of brains working on the data can produce incredible products that were not immediately evident, especially when your data is mashed with others’. In this Factual table of Nationwide Restaurants, geo data is mashed with information and reviews of restaurants from sites such as Yelp, Yahoo!, Citysearch and Zagat to make an interactive search table.
3. Potential uses – There are many different uses for data that range from cool informational data visualizations to applications to mining for insights. The organization avoids the costs of having to set up infrastructure and gather manpower to translate the data into these products by opening their data for others to use.
Some examples of what has already been done with open government data can be found in a previous blog post, “Open data applications.”
4. Exposure – Organizations can gain exposure from opening their data, especially now while it’s still relatively uncommon, positioning themselves on the cutting edge of the data sphere. Additionally, transparency is demanded more these days, and this is one of the ways to achieve that. Best Buy has an open API of their product catalog called the Best Buy Remix. With this open API, they not only leave the development of apps to others, but they also gain exposure and generate business from apps that would, for example, allow users to search for products they want and get details on them (location, price, specs, etc.).
And the cons:
1. Historically difficult – The development of the market for alternative data is relatively new. Opening data used to be incredibly difficult, expensive and labor-intensive. Large amounts of data took a lot of time and were extremely hard, if not impossible, to process. However, things such as cloud computing and processing tools like Hadoop have helped address these problems, making the whole data process a lot easier.
2. Privacy concerns – These fall under two types. First, some companies might be concerned about certain data being accessed by their competitors. This problem can be avoided since companies can choose what data they open and keep more sensitive data secret. In the end, these organizations might find that crowd-sourcing their data yields interesting insights that further develop their product or service. Second, there are also concerns about users’ personal data. Efforts need to be made to ensure that users understand how their data is being used, that security is upheld, and that they know how to opt out if they choose to do so.
3. Data processing – Some organizations don’t have the capabilities to process the data for public consumption, but if they really do have valuable data, then a cost-benefit analysis might show that setting up the required infrastructure is worth it. If a company just doesn’t have the resources for this, as mentioned earlier, it can leave some of the data processing to the crowd.
4. Reservations about crowd-sourcing – Someone from Wolfram Alpha pointed out that companies may believe that expert curation is better than crowd-sourcing. What these companies fail to realize is that more and more people are fluent in data. Crowd-sourcing their many talents and ideas means that a lot more can be done with the data, including things that one expert alone may overlook.
Verdict? Open your data! The data market is growing and infrastructure is developing alongside. The traditional hindrances to opening data, such as the scarcity of people who can curate data, the difficulty of identifying buyers, and the impossibility of handling large amounts of data, are dissipating. Instead, a lot of potential lies in the data, from financial gains to the increase of brand recognition. With all this in mind, companies need to take a second look at their data and evaluate its worth.