The data landscape (Part 2), and Microsoft

The data platform industry has a new entrant this week!  Yesterday Microsoft announced a data store of their own at their developer conference.  Called Dallas, their offering is another example of a data marketplace.  The market for selling data online in an open way is still young (how many platforms besides ours and Microsoft’s do you know?) and so it is validating to see another entrant in this space.  We know that Microsoft will encourage the developer community to explore what these new platforms make possible.

Like many other services, Dallas meters out data through an API, which is helpful to programmers with limited resources.  With Infochimps, however, developers get full datasets in bulk, which is better for many applications and essential for any kind of analytic work.
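To make the contrast concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: `fetch_page` is a stand-in for a metered API call (it is not a real Dallas or Infochimps endpoint), and the tiny `ROWS` table is invented sample data.  It only illustrates that a whole-dataset question costs many round trips through a paged API but a single pass over a bulk file.

```python
import csv
import io

# Invented sample data standing in for a provider's records (assumption).
ROWS = [{"city": f"city{i}", "population": str(1000 * i)} for i in range(1, 8)]


def fetch_page(offset, limit=3):
    """Hypothetical stand-in for one metered API call returning a page of rows."""
    return ROWS[offset:offset + limit]


def total_via_api(limit=3):
    """Sum a column by paging through the API; returns (total, calls_made)."""
    total, calls, offset = 0, 0, 0
    while True:
        page = fetch_page(offset, limit)
        calls += 1
        if not page:  # an empty page signals the end of the data
            break
        total += sum(int(row["population"]) for row in page)
        offset += limit
    return total, calls


def to_csv(rows):
    """Serialize the rows as CSV text, standing in for a bulk download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["city", "population"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def total_via_bulk(csv_text):
    """Sum the same column in one pass over a bulk CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["population"]) for row in reader)


if __name__ == "__main__":
    print(total_via_api())               # total plus the number of calls it took
    print(total_via_bulk(to_csv(ROWS)))  # same total from a single file
```

With a real metered service, each of those calls would also count against a quota or a bill, which is why analytic work that touches every row tends to favor the bulk file.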

Both our marketplaces have the same value proposition: open up your data and profit.  When trying to convince an organization to open up its data, APIs can be an easier sell.  Even though APIs are costly to build and run, organizations may prefer the control they give over what people can access, compared with our simple and cheap bulk solution.

It is still unclear what the size and format restrictions are on Dallas.  If they are like other services out there (Socrata, Factual), they need data that comes in a structured, rectangular format.  These constraints enable these services to display their data live online.  While Infochimps doesn’t have that feature (yet!), we can handle datasets at the terabyte scale as well as those that don’t fit the spreadsheet paradigm, such as social network graphs.

Dallas is also part of a platform that forces users to integrate with other Microsoft services.  Infochimps’ mission is simply to connect people with the data they’re looking for, and we let anyone download data without having to register for an account.

We are proud to be a part of a strong community that’s grown over the past year, and to continue our commitment to an open data commons.  On the commercial side, we are narrowing our focus to the right verticals after months of talking with this new market about what is possible.  That ultimately is what this is about: enabling something that couldn’t be done before, and connecting buyers to sellers and people to knowledge.


  1. Jason Cohen November 20, 2009 at 12:50 pm

    People do NOT want an API in general. They want Excel. CSV, TSV, that sort of thing.

    Even geeks like me can get data working for themselves faster with a CSV than with another API. Most people can barely use Excel, much less an API.

    You guys are MUCH better positioned than Microsoft. I know that doesn’t feel true because “it’s Microsoft.” But the lock-in and the API are both massive barriers to entry.

    An API is required for crunching data sets too large to download, like your Twitter dump. Fine! You’re right, I can’t eat 300GB of Twitter data, nor do I want to, and anyway I need a distributed system to get answers in reasonable time, so it’s best if you do all that for me.

    Awesome, but for 99.9% of data sets, they are USEFUL not LARGE.

    Csv or tab to rule them all.