Announcements

New Developer Contest: Twilio + Infochimps

It’s a special day today for many of our readers. In addition to celebrating America’s Independence Day, we want to celebrate data. I know that sounds crazy, but put Will Smith on mute for a minute and give us a chance.

Not too long ago, we didn’t have access to data. It existed, but it wasn’t really available. It was either shared on a one-off basis, or hidden somewhere in the deep corners of the Internet. Now, you can have access to everything from all the reported UFO sightings to all the free WiFi spots in the nation.

So for this special day, we’re running a special contest with our friends at Twilio. What interesting applications could you create with our data? Perhaps a phone-powered UFO Mad Libs game? Or an SMS rolodex of social media profiles?  Be creative and show us what you could do!

It wouldn’t be a special contest if there weren’t a special prize. The winner of this developer contest will receive:

  • $100 in Twilio credit
  • $100 in Infochimps credit
  • A collection of swag from Twilio and Infochimps
  • All four books from the pioneer of data visualization, Edward Tufte


Entries must be submitted by 11:59pm PT on Sunday July 10th. Entries can be submitted here.

How To Get Involved

Entries to the contest can be submitted here.

Twilio runs weekly developer contests.  Check out more of them here.

If you need any help or want to bounce ideas off Twilio developers, join them on their forums or drop a note to help@twilio.com.  Want to reach a chimp?  Try connecting with us over UserVoice or send us some bananas at help@infochimps.com.

Infochimps Launches Even More API Calls

Right now, big data is restricted to 1.) the companies that can afford Oracle, and 2.) the companies that can leverage Hadoop and Cassandra, HBase or other NoSQL alternatives. These tools are robust and will always be necessary. They also take considerable amounts of time and knowledge to deploy.

Our mission at Infochimps is to democratize the world’s access to data. The best way to do this is to host useful data in one place so that people can share it. By collectively offsetting the hosting costs, a lot of people can access useful information without the pains of scraping and hosting it.

We have launched more data API calls on our website and intend to launch hundreds more in the next few weeks. Our data API allows you to query databases like our Twitter conversations database, which is over half a terabyte in size. This is not something you can comfortably do with MySQL, and we are giving you access to it for free. Using your Infochimps API key, you can access this data within seconds. I don’t even have MySQL installed on my computer and our data team has given me the power to find and understand data that only “big data” companies have the resources to access. It is truly inspiring.

Here are just a few of the types of data you can query with no prior knowledge of non-relational databases:
* Twitter People Search: Tired of poking around on Twitter forever just to find cool people to follow? Think of a subject you like and query this data set for it. It helps you find like-minded people on Twitter.
* The 100 million word British National Corpus: a representative sample of spoken and written British English in the late 20th century. This is incredibly useful for linguistics and language processing.
* Qwerly: Query a person’s social media handle and find all of their corresponding social media presences online. It helps you get a stronger sense of who a person is.
* IP to Demographic: Be smarter about the people who visit your website. Find the demographics of your visitors based on their IP addresses.
* Wikipedia Articles Abstract Search: Look up a term in Wikipedia and get general descriptions that contain that word. This helps people or machines instantly understand something.

We are still in beta with our new API calls, but you can take a look at some of them yourself. We have bookmarked them with the tag “AwesomeAPIs”. They are here:
http://www.infochimps.com/awesomeapis

It doesn’t matter if you’ve ever written a MySQL query in your life. Find a data set you like and query it using our new API Explorer located on each data set that is accessible via our API. You’ll think it’s cool. We promise.

We are still in beta, so feel free to email me directly at michelle(at)infochimps.com should you have any questions or issues. Oh, and sign up for an API key. This will put you in the loop for when we launch more.

Infochimps Expands Advisor Presence Coast to Coast

It doesn’t matter if you are talking about Google, Facebook, Amazon, or the average startup working out of someone’s living room. The difference between the companies that remain merely great ideas and the ones that make a significant impact on society often comes down to the addition of wise advisors and investors.

That’s why we’re excited to announce the addition of MFI Capital, Anduin Ventures and ff Asset Management to our rock star roster of investors. Tom Meredith from MFI Capital and Joe Lonsdale from Anduin Ventures will be advising the company, and John Frankel from ff Asset Management will be joining our board of directors.

Tom Meredith is a pillar of the Austin business community, having served as CFO of Dell and Motorola, and currently serving on the board of directors of Bazaarvoice, Motorola and others. Tom has a long and successful track record of shepherding technology companies through global growth.

Joe Lonsdale co-founded Palantir, one of the most advanced data analytics companies in the world. Joe is an active angel investor and philanthropist and is currently CEO of Addepar. The addition of Anduin Ventures, along with investor Draper Associates, gives us a toehold in Silicon Valley.

John Frankel, a 20-year Goldman Sachs veteran based in NYC, currently serves on the board of directors of Infochimps, Klout and others. We’re thrilled to be joining his portfolio, which includes Hashable, Livefyre, Parse.ly and Klout. John has a strong grasp of what our mission is here at Infochimps. He states, “The move from analogue to digital is creating vast amounts of accessible data, albeit in confusing formats and often hard to find. If only there was a solution that structured the world’s data, democratized access to it, and made shopping for it as simple as using Amazon. Oh wait, just a minute, there is! That is why we are excited to be part of the Infochimps story.”

We couldn’t be more excited about where we are, where we’ve come from, and where we’re going. Tom, Joe and John, along with our lead investors DFJ Mercury and an incredible team, will help us continue to provide you with the most data in one place on the web!

Introducing the Infochimps Query API

Infochimps is pleased to announce the release of our Query API in public beta today. As part of our ongoing effort to democratize access to structured data, the Infochimps Query API offers several calls that allow you to analyze a prodigious amount of Twitter data dating back to 2006. Our current operational calls include the following:

Trstrank

Trstrank uses an algorithm similar to Google’s PageRank to generate a numerical rank that indicates the amount of influence a particular user has. This is a much more robust way to determine a Twitter user’s influence than by their number of followers alone.
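To give a feel for how a PageRank-style score differs from a raw follower count, here is a minimal Python sketch of textbook PageRank run over a tiny follower graph. It is not the Trstrank implementation; the function name `pagerank`, the damping factor, the iteration count, and the sample users are all assumptions made up for illustration.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Textbook PageRank over a follower graph.

    `follows` maps each user to the list of users they follow.
    Following someone passes a share of your own rank on to them.
    """
    # Collect every user that appears as a follower or as someone followed.
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    users = sorted(users)
    n = len(users)

    # Start with rank spread evenly across all users.
    rank = {user: 1.0 / n for user in users}

    for _ in range(iterations):
        # Every user receives a small baseline amount of rank.
        new_rank = {user: (1.0 - damping) / n for user in users}
        for user in users:
            targets = follows.get(user, [])
            if targets:
                # Split this user's rank among the people they follow.
                share = damping * rank[user] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A user who follows nobody spreads rank evenly to everyone.
                share = damping * rank[user] / n
                for target in users:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical sample graph: who follows whom.
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
    "erin": ["alice", "bob"],
}
ranks = pagerank(follows)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(user, round(score, 3))
```

In this toy graph, dave has only one follower, but that follower is carol, who is herself well followed, so dave still ends up with a high score. A plain follower count would miss that.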

Wordbag

Wordbag enables you to discover what a specific Twitter user finds interesting. After entering the handle of a specific Twitter user, Wordbag generates a list of words unique to that Twitter user.
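One simple way to think about "words unique to a user" is to compare how often a user says a word with how often everyone else says it. The Python sketch below does that with a smoothed frequency ratio. This is an illustrative stand-in, not how Wordbag itself is built; the function names, the scoring formula, and the sample tweets are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(user_tweets, background_tweets, top_n=5):
    """Rank a user's words by how much more often they use them
    than the background sample does (add-one smoothed ratio)."""
    user_counts = Counter(w for t in user_tweets for w in tokenize(t))
    background_counts = Counter(w for t in background_tweets for w in tokenize(t))
    user_total = sum(user_counts.values())
    background_total = sum(background_counts.values())
    vocab_size = len(set(user_counts) | set(background_counts))

    scores = {}
    for word, count in user_counts.items():
        # Add-one smoothing keeps words missing from the background
        # from producing a division by zero.
        user_rate = (count + 1) / (user_total + vocab_size)
        background_rate = (background_counts[word] + 1) / (background_total + vocab_size)
        scores[word] = user_rate / background_rate

    # Highest ratio first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Made-up sample data for illustration only.
user_tweets = ["hadoop cluster is up", "tuning the hadoop cluster today"]
background_tweets = [
    "coffee is great today",
    "the game is on today",
    "is the coffee ready",
]
print(distinctive_words(user_tweets, background_tweets, top_n=2))
```

Common words such as "is" and "the" score low because the background sample uses them just as often, while topic words the user repeats rise to the top.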

Influencer Metrics

Influencer Metrics measures the number of retweets, mentions, and @replies that a specific Twitter user has. Retweets and mentions can indicate the value the Twitter community gives to the tweets of a specific user. Coupling Trstrank with Influencer Metrics provides a particularly powerful way to gauge the influence of a Twitter user.

The potential applications of our API are limited only by the imagination. We hope market researchers, brazen self-promoters, statisticians, sociologists, cultural anthropologists, linguists, and all the curious Georges out there will find it as compelling as we do.

Looking to the future, our development team will be constantly polishing and updating the API. Follow @infochimps on Twitter for announcements. We received many requests during our private beta for more frequent refreshes of our data and fuller coverage.  Our next update will do just that. We have additional API calls percolating, including one that will allow you to discover close-knit interactions between Twitter users and see the level of interaction between them.

For features and pricing, including our totally free package, the Baboon, click here.

Partner with us

2009 was a great year for us.  We made lots of progress on the website (with a long way to go), but we were especially excited for all the great contacts we made with other developers and companies interested in data.  We strongly encourage all of our followers (you!) to get in touch with us to talk about your expertise and data needs.

We will create a page on the site soon which lists our network of data mechanics, data tools, and solution providers.  One of the issues with our site is that many of the datasets can’t be used by everybody – some are too large for Excel and average tools, and others require specialized skills in order to use.  Our TAKS and Twitter datasets are just some examples of datasets that can be really powerful for a lot of businesses only after an expert has had the chance to analyze them.

Here are a few of the great companies who have worked with us so far:

QVApps: QVApps is a marketplace for QlikView applications.  QlikView is business intelligence software that supports third-party applications.  Having trouble understanding your AWS reports?  QVApps has a free AWS report analyzer that we’ve found useful.

UPDATE: Check out QVApps’ slideshow on some of the data from our Twitter Census! See below for the embedded slideshow.

Data Applied: Data Applied’s application is truly magic.  Their software is putting the power of machine learning into the world’s hands.  Techniques and algorithms that people wrote PhD theses on 20 years ago are here at the click of a mouse.  Try them out with a free account.

DataMiningTools.net: A startup based in India, DataMiningTools.net is doing a wonderful job working to educate the masses on data mining tools and resources.  Find tutorials on clustering analysis, R, Matlab – you name it.  Check out videos on Data Applied and your very own Infochimps!

If you are a Data Mechanic, another data company, or just interested in being listed as a solutions provider, please get in touch with me at joe@infochimps.org.  Likewise, if you’re a Ruby/Rails developer, we’re hiring!

The data landscape online, as we see it. Part 1

Nathan at FlowingData did a wonderful job last week culling 30 great resources from the world wide web for finding data. Yesterday another site, Factual, launched, making great resource number 31. We are excited to see a growing number of companies spring up that in turn increase everyone’s access to data. Solving the problems with data online is no small task, and it is not one fit for any single player. It’s a team effort, which we are proud to be a part of.

We thought we would take a minute today to talk about the problems as we see them, and how players within the online data market are choosing to tackle these problems.

The first problems are finding and sharing data. Most of these sources already solve this problem. Socrata and Factual let users upload data onto their sites, and each company’s datasets are easily searchable along with what’s on Data.gov and Numbrary.

There are also other, more technical issues. Swivel, Socrata, Factual, Many Eyes – all of these websites allow users to play around with data live on the site. This opens up costly issues for the hosting company.

1. The data has to live in their platform and reconcile with the whole.
2. Many new datasets are on the order of gigabytes in size.

Whereas datasets on Infochimps can be of any size, format, or shape, their datasets must be in a standard csv/tsv/xls format and are limited to a few hundred megabytes. In reality, statisticians want data in .sas formats, and geographical data comes in .gis formats. Because of the larger size of today’s datasets, tools within a browser will be insufficient to work with and understand the data, and a person’s options for distributing that data are also limited.

Data, especially valuable data, is often proprietary. The owners of that data won’t release it unless there are clear licenses and terms of use. We differ from these other open data players in our commitment to host open data for free and maintain our open data commons for everyone’s benefit, but we will also host licensed data. Unfortunately, open data doesn’t include all of the data in the world. Instead, what we offer organizations is the ability to permit only users that have agreed to a license or paid for access to download their data. As the data marketplace grows, we believe more and more buyers will realize the value proposition in looking for data on Infochimps. Our aim is to give incentive to the long tail of businesses with data gathering dust on hard drives that could otherwise be useful to another person or organization.

Calling all Pollsters

Carl Bialik, from the WSJ Numbers Guy blog, highlighted the recent controversy in the opinion polling industry over Strategic Vision’s choice not to share their polling methodology or raw data.  Pollster.com and FiveThirtyEight have also weighed in on the problem.

Our message to opinion polling firms is this: share or sell your data on Infochimps.org.

Free, public polls can be distributed for free on our site.  If you’d like to charge for the download of your data, set your own price. Your data will live in a place where the whole world can find it, bringing you a larger and broader audience.

Get in touch at upload@infochimps.org.

New site is live

Thanks to everyone new that’s come by. We appreciate the coverage from www.gigaom.com and others. We thought we’d spend a moment to cover what we hope to accomplish from this launch.

With this launch anyone can edit or add datasets to the site. Very soon, uploading will work and we can host and distribute open licensed datasets for free. These are our steps towards building an open data commons.

Additionally, this new site offers a few datasets for sale. These datasets are not ours, but owned by others. We make a commission on the sale of these datasets. An example is the TAKS dataset, which contains all of the test score data for students in the state of Texas on standardized tests. This dataset cost one particular researcher $1400 to free from the government coffers, and the format it came in was awful. On Infochimps you can find the same dataset but in a cleaned-up format, and for a much lower price: $15.

We consider this marketplace offering an incentive to the world of data gatherers to put their data somewhere others can find it. By letting people charge for their data, we encourage data to come out of the woodwork that might otherwise remain behind closed doors.

We hope you enjoy playing around with the site. If you are excited to send data our way before we get upload working, please get in touch: upload@infochimps.org.

Infochimps receives a donation from SmartBear

Smart Bear Software is an Austin-based company whose founder, Jason Cohen, is one of our favorite people.  Jason grew Smart Bear from the ground up, and he has helped the Infochimps team in the past with practical advice.  Jason blogs about marketing and small business at http://blog.asmartbear.com/ and he is well worth reading.  

The Infochimps rely on agile methods for the building of Infochimps.org, a process which can benefit from a code review tool.  Smart Bear’s product, Code Collaborator, is a well-known online peer code review tool that simplifies and expedites code reviews, helping teams produce higher-quality, tested and done code more efficiently.

Smart Bear’s latest promotion offered 5 seats of one of their code review tools for $5.  As a part of this promotion, they selected a start-up company to receive the funds collected from the promotion.  Infochimps won!  Smart Bear has graciously donated $2220 to Infochimps to help our mission of increasing the world’s access to data.  We appreciate their acknowledgment of our work and we know we can put the funds to good use.

To see how we reacted to the news, check out the video below:

[youtube=http://www.youtube.com/watch?v=ZLtR8_qw_yM&hl=en&fs=1&]

What's New

Infochimps has been acknowledged as a finalist by the Capital Factory for 2009.

Infochimps is also a finalist in PepsiCo’s pitch competition.

Infochimps has a Facebook page! Become a fan.

Katherine at The New Civilization is aiding us in UX design for our Beta, to be launched at the end of May.  Eve Simon in Washington DC is helping us with the site design.  Our two big goals for the Beta are:

1) Improved browseability of the datasets, including a search bar and better surfing through tags, categories, and collections.

2) Uploading capability.  Users will be able to create accounts and upload datasets, as well as edit the descriptions of other data on the site.

Drop us a line anytime at info@infochimps.org