The data landscape online, as we see it. Part 1

Nathan at FlowingData did a wonderful job last week culling 30 great resources for finding data from the world wide web. Yesterday another site, Factual, launched, making it great resource number 31. We are excited to see a growing number of companies spring up that increase everyone’s access to data. Solving the problems with data online is no small task, and it is not one fit for any single player. It’s a team effort, which we are proud to be a part of.

We thought we would take a minute today to talk about the problems as we see them, and how players within the online data market are choosing to tackle these problems.

The first problems are finding and sharing data. Most of these sources already solve this problem. Socrata and Factual let users upload data onto their sites, and each company’s datasets are easily searchable along with what’s on and Numbrary.

There are also other, more technical issues. Swivel, Socrata, Factual, Many Eyes – all of these websites allow users to play around with data live on the site. This creates two costly issues for the hosting company:

1. The data has to live in their platform and reconcile with the whole.
2. Many new datasets are on the order of gigabytes in size.

Datasets on Infochimps can be of any size, format, or shape, whereas datasets on those sites must be in a standard csv/tsv/xls format and are limited to a few hundred megabytes. In reality, statisticians often want data in SAS formats, and geographical data comes in GIS formats. As datasets grow larger, tools within a browser will be insufficient to work with and understand the data, and a person’s options for distributing that data will be limited as well.

Data, especially valuable data, is often proprietary. The owners of that data won’t release it unless there are clear licenses and terms of use. We are committed to hosting open data for free and maintaining our open data commons for everyone’s benefit. Where we differ from these other open data players is that we will also host licensed data, because open data, unfortunately, doesn’t include all of the data in the world. What we offer organizations is the ability to restrict downloads of their data to users who have agreed to a license or paid for access. As the data marketplace grows, we believe more and more buyers will realize the value proposition in looking for data on Infochimps. Our aim is to give an incentive to the long tail of businesses with data gathering dust on hard drives that could be useful to another person or organization.

Calling all Pollsters

Carl Bialik, of the WSJ Numbers Guy blog, highlighted the recent controversy in the opinion polling industry over Strategic Vision’s decision not to share its polling methodology or raw data. and FiveThirtyEight have also weighed in on the problem.

Our message to opinion polling firms is this: share or sell your data on

Public polls can be distributed for free on our site. If you’d like to charge for downloads of your data, set your own price. Your data will live in a place where the whole world can find it, bringing you a larger and broader audience.

