Monthly Archives November 2011

Bringing Transparency to Political Discourse


I’ll be honest: you’d be hard-pressed to find me watching a political debate that was not Auto-Tuned or accompanied by commentary from Jon Stewart or Stephen Colbert.  This is not to say that I am not concerned with the political climate in our country; I just don’t think watching Rick Perry make a fool of himself on the topic of climate change does much to change my opinion of him or his presidential prospects.

Politilines by Periscopic, a great interactive visualization of words used in recent Republican political debates, lets you see what the candidates talked about each night.  The tool was created through a combination of manual and automated curation, using data from the American Presidency Project.

Admittedly, this is no perfect tool for understanding political candidates and their stances on topics, but it can uncover some interesting insights and questions (e.g., why was Michele Bachmann the only candidate to mention immigration during the 11/09/11 debate?).  The tool also serves as a great starting point for bringing more accountability and transparency to the actual content of these discussions.

[via FlowingData]

Transitioning to Lean at Infochimps

Two nights ago, my fellow chimps, Dhruv Bansal, Tim Gasper and I gave a presentation at the Austin Lean Startup Circle on the company’s recent transition to lean. We discussed our switch to a lean product strategy driven by must-have customer problems and the lean concepts and tools we have used to get there. It’s chock full of insights, struggles and great ideas for startups looking to adopt the Lean methodology.

For a version with full audio, check it out on Posterous.

Princeton or Prison: Which is More Expensive?

Each month, when over $400 is automatically deducted from my checking account, I can’t help but wonder… why did I choose to go to a private university that cost in excess of $20,000/year (with scholarships)?  And why did 17-year-old me think it was okay to sentence then-future me to 17 years of debt?

So, it’s not particularly surprising that when I came across this infographic on Fast Co. Design comparing the cost of Ivy League higher education and incarceration, it gave me pause. Perhaps one of the causes of such large individual student loan debt is the fact that, at the federal level, more money is spent on corrections than on higher education. In addition, a handful of states, including New Jersey, also spend more on incarceration than on universities.  So, to answer the question posed in the title of this blog post…

It’s prison.

That’s right, according to this thought-provoking infographic from, an online resource for students and professionals in public administration, the state of New Jersey spends more to lock away a prisoner in Trenton ($44,000) than it does to send someone to Princeton for a year ($37,000).  Let’s do a little math here…

New Jersey has an inmate population of 26,757.  Nationwide, on average, a quarter of prisoners are nonviolent offenders.  So, if New Jersey took those 6,690 nonviolent offenders and, instead of sending them to prison, sent them to Princeton, the state would save nearly $19 million.  Send them to the state school, Rutgers, and the savings balloon to almost $87 million.

Multiply these savings across all the nonviolent offenders in our country’s 2-million-strong prison population and you can save $15.5 billion, which would come in handy against our country’s frighteningly large student loan debt of $830 billion.
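
For anyone who wants to rerun the math, here is a minimal sketch that uses only the per-person figures quoted in this post ($44,000 for prison, $37,000 for Princeton, and a quarter of 26,757 inmates). The function and constant names are illustrative, and the totals depend entirely on which cost inputs you plug in, so treat it as a calculator rather than a definitive estimate.

```python
import math

# Figures quoted in the post; the names below are illustrative only.
PRISON_COST_PER_YEAR = 44_000      # New Jersey, per prisoner
PRINCETON_COST_PER_YEAR = 37_000   # per student
NJ_INMATES = 26_757
NONVIOLENT_SHARE = 0.25            # nationwide average, applied to New Jersey


def nonviolent_count(inmates: int, share: float = NONVIOLENT_SHARE) -> int:
    """Estimated number of nonviolent offenders, rounded up to a whole person."""
    return math.ceil(inmates * share)


def annual_savings(people: int, prison_cost: int, school_cost: int) -> int:
    """Yearly difference between incarcerating and enrolling `people`."""
    return people * (prison_cost - school_cost)


if __name__ == "__main__":
    people = nonviolent_count(NJ_INMATES)
    savings = annual_savings(people, PRISON_COST_PER_YEAR, PRINCETON_COST_PER_YEAR)
    print(f"{people:,} people, ${savings:,} per year")
```

Swapping in a different tuition figure or a different inmate count only requires changing the arguments to `annual_savings`.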



The World According to the Internet


There are many ways to define the boundaries of our world.  In an attempt to gain more insights into our digital lives and poke some fun at the internet, about a year ago, Randall Munroe of XKCD updated his well-known map of online communities. Over the past year, there also seems to have been an explosion of infographics around how our online communities define the boundaries of our tangible world.  It’s amazing to see how crisply our online interactions follow our existing geo-political boundaries (and interesting to see where this isn’t the case!).

The Planet according to… Facebook Friendships

Europe according to… Twitter Languages

India according to… Panoramio Photo Uploads (tourists)

The United States according to… Twitter and Flickr Users

Texas according to… Flickr Photos

Make your own map of the ______ according to _______ using our Geo API.  We’d love to see a map of the world according to geo-tagged Wikipedia articles or instances of towns with “Fort” in the name around the world… the possibilities are endless!  Also, if you’ve seen some awesome maps of the world according to the internet, we’d love to see them.

I Have a Headache: The Problem with Too Many Choices

I went home to New York City for a wedding a few months ago and introduced my boyfriend to one of the wonders of the city that never sleeps. The Duane Reade pharmacy at Times Square offers a two-story, 24-hour wonderland of over-the-counter choice with everything you could possibly need, whether it’s liquid bandages, children’s sunglasses or organic fair trade mini chocolate bars. It’s perfect for weary tourists in need of some forgotten toiletries or impulse purchases.

When we were checking out, I noticed a display of simple cardboard boxes with declarative phrases on the front such as “I have a headache” and “I can’t sleep”.  I found it curious that a company was positioning its brand so simply when there were shelves upon shelves dedicated to headache and sleep aids with dozens of different active ingredients, multitudes of packaging variations and several choices of medication delivery mechanism.

[Infographic: Help Remedies]

The chart above offers real data about the seemingly infinite options available to the average shopper at a drug store.  Each thin line represents an actual product available for treating headaches.  With hundreds of seemingly indistinguishable options, no wonder there exists a company looking to find simplicity in the chaos.

Now, this post is not meant to be an advertisement for the company that makes the one-size-fits-all over-the-counter drugs, but it does pose an interesting question: why do we allow ourselves to be overwhelmed with choice in consumer packaged goods when we eschew such complications in our digital lives?  From clean UX design to simple online forms to the austere product lines of top tech companies such as Apple, ease of use and user joy top the priority list.  I’m sure anyone who has ever labored over picking the correct one of Tylenol’s 13 different varieties of aches & pains medication for children while a 2-year-old yells at the top of her lungs can tell you that choice does not necessarily lead to ease or joy.

Perhaps in the attempt to create better product experiences for our customers and more happiness in our own lives, we should seek simplicity over choice.

Fantasy Football Picks: Finding Wisdom in the Crowd


On any given weekend when you’re hanging out with your friends watching a game, it’s inevitable that everyone will pipe up with their opinions and none are more vocal than those of us in the fantasy community.  But whose opinion can you trust and which one will lead you to a winning roster?  Professional football, much like baseball, is a heavily tracked sport with tons of stats on every player and every game.  So, sure, you can wade through a bunch of player stats or look to expert rankings, but wouldn’t it be more fun to pit your own football knowledge against the rest of the community?

Football Verdict is a new fantasy football advice platform where managers can ask weekly sit-or-start questions and get advice from the fantasy community. Football Verdict aims to provide personalized fantasy advice by giving you feedback on your specific dilemma. To encourage answers and engage users, we evaluate every answer submitted to the site and rank each user on our leaderboard based on their accuracy. Rather than focusing purely on player stats, we look to measures that give us a better chance at the right prediction, fueled by a strong, passionate community.
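
To make the accuracy-based ranking idea concrete, here is a minimal sketch of how a leaderboard of that kind could be computed. The input shape (pairs of user name and whether the answer turned out to be correct) and the tie-breaking rule are assumptions for illustration, not a description of how Football Verdict actually scores its users.

```python
from collections import defaultdict


def leaderboard(answers):
    """Rank users by their share of correct answers.

    `answers` is an iterable of (user, was_correct) pairs. This shape is a
    hypothetical stand-in, not Football Verdict's real data model.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for user, was_correct in answers:
        totals[user] += 1
        if was_correct:
            correct[user] += 1
    # Sort by accuracy, then by number of answers so busier users win ties,
    # then by name so the ordering is deterministic.
    return sorted(
        ((user, correct[user] / totals[user], totals[user]) for user in totals),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


if __name__ == "__main__":
    sample = [("ann", True), ("ann", True), ("bob", True), ("bob", False), ("cy", True)]
    for user, accuracy, count in leaderboard(sample):
        print(f"{user}: {accuracy:.0%} over {count} answers")
```

A real leaderboard would probably also want a minimum number of answers before a user is ranked, so that one lucky guess does not top the table.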

Reviewing our data, in Week 8, fantasy football players most often asked about Bernard Scott, Antonio Brown, Roy Helu, and Jackie Battle. In Week 8, many high-powered offenses had bye weeks and there were a lot of injuries. Thus, fantasy owners asked many questions about fringe players.

What a difference a week makes.

In Week 9, there’s a boldface name sitting at the top: Chris Johnson. Just over halfway through the season, and the fantasy player most often in question is the same player who went #2 in most drafts this year.

Week 9’s Most Frequently Questioned Players (with position, team and percentage of Verdict questions in which they appear)

1. Chris Johnson (RB, TEN, 14%)
Fantasy owners seek justification to bench their first round pick

2. Mike Williams (WR, TB, 12%)
Josh Freeman’s numbers have been down all season

3. Brandon Lloyd (WR, STL, 8%)
Lloyd shows early promise in St. Louis, but instability at quarterback leaves players clueless

4. Stevie Johnson (WR, BUF, 8%)
Stranded on Revis Island

5. DeMarco Murray (RB, DAL, 8%)
Does Felix Jones’ return spell a committee in Dallas?

With four teams on byes this week (Detroit, Minnesota, Carolina, Jacksonville), and fewer than 6 viable fantasy players on those rosters, Football Verdict is seeing a glut of questions at the skill positions this week:

Wide Receivers: 32%
Trending: Mike Williams, Brandon Lloyd, Stevie Johnson

WR/RB Flex: 24%

Quarterbacks: 18%
Trending: Matt Cassel, Josh Freeman

Running Backs: 16%
Trending: Chris Johnson, DeMarco Murray, Cedric Benson

If you’re struggling with a fantasy football decision, ask your question at Football Verdict and the community will help you decide. If you think you’re great at predicting results, answer questions and you’ll see your name in lights on the leaderboard. If you have any questions, please email

Dan Chaparian is the co-founder of Football Verdict.

The Past, Present and Future of Data

Yesterday, our CEO, Nick Ducoff, presented at Data Content, an Infocommerce conference. In this presentation geared towards fellow data publishers, Nick takes us through a history of information and shares his thoughts on the future and where Infochimps fits into the puzzle.  If you’d like to review a full transcript of his presentation, you can check it out after the jump. Enjoy!


Foursquare Venues, Wikipedia Articles, Census Data and More… All With Just an IP Address!


Greetings from deep in the Data Mine here at Infochimps. This week the team rolled out new features that combine one of our most popular APIs with our Geo API platform, unlocking the ability to geolocate based on an IP Address with any of our Geo APIs.

The idea is based on one of our more popular mashups, our MaxMind GeoLite IP to Census API, which blends IP geolocation functionality with Census data. This allows you to find out not just where an IP address maps to, but also some high-level information about that area – ideal for websites that do geotargeting and for people looking for a deeper understanding of their visitor audience. The data it draws on has become a bit dated, though (it uses the 2000 Census), and it covers a relatively narrow band of properties. Enter our Geo API platform, our platform for richer and more current data from a variety of sources.

A great advantage of our new Geo API platform is our ability to perform two-step queries internally, essentially converting a parameter into another parameter behind the scenes. It’s the key technology behind our ability to geolocate using an address: our geocoder first converts the address into latitude/longitude before making a secondary query against our data store to retrieve the response values.
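
The two-step pattern described above can be sketched in a few lines. This is only an illustration of the idea, not our internal implementation: `two_step_query`, `resolve` and `search` are hypothetical names, and the resolver and data store are passed in as plain functions so the sketch stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

LatLng = Tuple[float, float]


def two_step_query(
    locator: str,
    resolve: Callable[[str], LatLng],
    search: Callable[[float, float, float], List[Dict]],
    radius: float,
) -> List[Dict]:
    """Resolve `locator` to coordinates, then query the data store with them."""
    lat, lng = resolve(locator)        # step 1: address (or IP) -> lat/lng
    return search(lat, lng, radius)    # step 2: ordinary lat/lng lookup


if __name__ == "__main__":
    # Stand-in geocoder and data store with made-up values.
    fake_geocoder = {"123 Example St": (30.0, -97.0)}

    def fake_search(lat: float, lng: float, radius: float) -> List[Dict]:
        return [{"name": "Example Venue", "lat": lat, "lng": lng, "radius": radius}]

    print(two_step_query("123 Example St", fake_geocoder.__getitem__, fake_search, 1000.0))
```

Because the second step only ever sees a latitude/longitude, any resolver that produces coordinates (an address geocoder, an IP lookup) can be swapped in without touching the data store query.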

By using the same principle with IP geolocation instead of address geocoding, we have unlocked the ability for our users to query any of our Geo APIs with an IP address as the geolocator, returning data as if the request had used a latitude/longitude. So now you can use an updated IP to Census API and also a more detailed drilldown version. Furthermore, you can now go from IP to Foursquare Venue, Zillow Neighborhood, Wikipedia Article, and so on.

To use the new IP-Geolocation feature, just pass in the parameter g.ip_address with an IP address, along with a g.radius.  Check out this example query, which will help you locate banks and credit unions in our Foursquare database that are within 3 km (just under 2 miles) of the Infochimps office in Austin, TX. [YOUR API KEY HERE]

For client-side geo application developers we’ve also added another feature along with g.ip_address. With any of these APIs you can now pass “g.get_ip_address=true” instead, and our Geo API will determine the IP address of the machine calling our API and use that IP address as the geolocator. This new flag makes it easy to ask questions of our API like “tell me about venues near me” without ever having to know what your longitude is or how to interpret a quadkey.
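
Here is a minimal sketch of how a client might assemble a request using either parameter. Only `g.ip_address`, `g.radius` and `g.get_ip_address` come from this post; the base URL is a placeholder, the `apikey` parameter name is an assumption, and treating the radius as meters is also an assumption, so check the API documentation before relying on any of them.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Placeholder endpoint; substitute the real Geo API URL from the docs.
BASE_URL = "https://api.example.com/geo/search"


def build_geo_query(
    api_key: str,
    radius: int,
    ip_address: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a request URL using an explicit IP, or the caller's own IP."""
    params: Dict[str, str] = {"g.radius": str(radius)}  # units assumed to be meters
    if ip_address is not None:
        params["g.ip_address"] = ip_address
    else:
        # Ask the API to geolocate whichever machine makes the request.
        params["g.get_ip_address"] = "true"
    params["apikey"] = api_key  # parameter name is an assumption
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address, used here as a stand-in.
    print(build_geo_query("YOUR_API_KEY", 3000, ip_address="203.0.113.7"))
    print(build_geo_query("YOUR_API_KEY", 3000))
```

The second call is the "venues near me" case: no coordinates and no IP address are supplied by the client at all.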

All in the spirit of making Geo data more accessible and easy to use!

This Is Not My Beautiful House Music, How Did I Get Here? Uncovering How Music Travels.


Click image to open interactive version (via Thomson Holidays).

The history and evolution of different musical styles is a topic of hot debate.  Osman Khan, travel blog writer, attempts to stitch together a history of top-level dance genres and how different styles have travelled throughout time around the world.  Non-dance music genres which influenced dance music are also included, but their own influences are not shown.  He notes that the sources used to create the map include Bass Culture, Last Night a DJ Saved My Life, The All Music Guide to Electronica, and Wikipedia.

Looking over this infographic got me wondering… with an evolution of an artform that is difficult to track and sources that seem more based in anecdotes than hard facts, how can we really understand how music travels?  Well, perhaps we’re asking the wrong question. Instead of asking where dub started and where electro started, we could ask how dub and electro are different and pinpoint which individual songs began that transformation between music forms.  Find the meta-data on that track with artist, country of origin, year created and more and you’re well on your way to understanding how music actually evolves and travels.
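
As a rough illustration of that approach, the sketch below finds the earliest track carrying each genre tag in a list of track metadata. The field names (`title`, `year`, `genres`, `country`) and the sample records are made up for the example; a real data set will have its own schema and far messier genre labels.

```python
from typing import Dict, Iterable


def earliest_track_per_genre(tracks: Iterable[Dict]) -> Dict[str, Dict]:
    """Return, for each genre tag, the oldest track carrying that tag."""
    earliest: Dict[str, Dict] = {}
    for track in tracks:
        for genre in track["genres"]:
            best = earliest.get(genre)
            if best is None or track["year"] < best["year"]:
                earliest[genre] = track
    return earliest


if __name__ == "__main__":
    # Placeholder records, not real songs.
    sample = [
        {"title": "Track A", "year": 1975, "genres": ["dub"], "country": "JM"},
        {"title": "Track B", "year": 1982, "genres": ["dub", "electro"], "country": "US"},
        {"title": "Track C", "year": 1984, "genres": ["electro"], "country": "UK"},
    ]
    for genre, track in sorted(earliest_track_per_genre(sample).items()):
        print(f"{genre}: {track['title']} ({track['year']}, {track['country']})")
```

Tracks tagged with two genres at once, like the second placeholder record, are the interesting ones: they are candidates for the moment one style started turning into another.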

So, why don’t you try transforming some serious big data on music into a history of the craft?  Check out our Million Song data sets (organized by letter) or the 10,000 Song subset, which are freely available collections of audio features and metadata for a million contemporary popular music tracks.  While you won’t be able to assess a long history of musical evolution, it might satiate some deep curiosity to understand how we moved from Nirvana to Britney Spears to LMFAO as the chart-toppers of our times.

Joins the Bunch; Brings 30 Million+ News Headlines & Summaries from 2009-2011

Hello fellow data monkeys,

A few weeks ago, Infochimps and completed a collaboration to release nearly 30 million news headlines and summaries from 2009-2011 in a nicely-structured JSON dump. This is data that’s crawlers have collected over the last 2 years from over 500,000 web news sources. I am a cofounder of and was the lead engineer who worked on making the data dump happen.

We have been receiving some questions about this data, so I thought it’d be helpful to give some background via this guest blog post. It’s also great timing: the whole team has just returned from a trip to Austin that included a stop at the Infochimps world headquarters. Let’s not let this opportunity for big data collaboration slip away!

OK, so what’s
