Announcing Dashpot, our Analytics & Operations Dashboard for the Infochimps Platform

Infochimps is happy to announce Dashpot, an easy-to-use analytics and operations dashboard that provides business metrics and visualization, cluster management capabilities, and system monitoring on top of the Infochimps Platform. Dashpot gives you real-time visibility into and control of your Big Data stack running with Infochimps, helping you go from input to insight faster with our best-in-class Big Data infrastructure and tools.

Here are some of Dashpot’s key features:

  • Business Metrics – Dashpot’s in-stream visualization provides business users with the ability to capture and visualize business metrics on the fly as data is being ingested into their Infochimps Platform. By enabling data to be decorated in-stream through our Flume-based Data Delivery Service, Infochimps enables quick introspection on how a data or business process is performing. Organizations can view spikes or drops in key system or business metrics in near real-time, enabling quicker response to changing business conditions, saving time and helping ensure higher quality and more valuable information in the organization’s ultimate datastore. Infochimps business metrics are designed to provide an intermediate data visualization capability in conjunction with an organization’s existing investments in traditional business intelligence solutions.
  • Cluster Management – Built on the power of Ironfan, Dashpot offers simple Big Data system automation and management with a quick glance view into the servers and clusters currently running. Operations users can easily spin them up and down with a simple button click as their processing needs change, creating significant, easy-to-attain cost savings in machine usage.
  • Systems Monitoring – Dashpot integrates with popular monitoring packages to give users at-a-glance views of Big Data system performance, availability, system integrity and more. Dashpot is designed to integrate easily with any monitoring product; Infochimps has implemented the popular open source product Zabbix as its initial reference monitoring solution, integrating Zabbix graphs on system performance and availability into the Infochimps Dashpot dashboard.

Implementing and operating Big Data architectures can be difficult, requiring significant investment of resources and time. By choosing to use the Infochimps Platform, enterprises needn’t worry about the time and hassle of building and maintaining their own infrastructure. When combined with our tools, such as Ironfan and DDS, Dashpot’s simple visualizations and management tools help organizations keep their Big Data system humming, with little operational overhead. Best of all, Dashpot’s in-stream visualizations help provide the insights businesses need to get the most value out of their Big Data infrastructure investment.

Interested in talking about how we can help simplify your Big Data stack?  Contact us today for more information!

Announcing the Infochimps Platform for Big Data


The Age of Big Data
Readers of this blog are no strangers to the problems that Gartner declares to be the hallmarks of our age of Big Data – volume, variety, and velocity. Nor would I consider Infochimps community members blind to the fact that there is a great deal of wealth contained in the world’s data, both internal and external to the organization.

What’s rarely admitted, however, is how difficult it can be to wrangle these data sets and operate the systems to process them. Running Hadoop and other distributed data architectures in the cloud is still a massive challenge, something typically managed by the data and operations elite. The demand for data science talent is growing and growing, setting salaries for these skilled individuals to ranges only the wealthiest enterprises can afford.

The Vision Behind Infochimps
When Infochimps was born, the co-founders set out with a mission that was deceptively simple – increase access to the world’s data. We understood that one of the first things that made this hard for people was actually finding the data, as search engines don’t really work for tables and spreadsheets. The Infochimps catalog was born, and from that the Infochimps Data Marketplace as a way to incentivize content providers to make their data more open and available.

The Data Marketplace has been wonderfully successful. Hundreds of thousands of visitors have downloaded data from our catalog of over 15,000 data sets sourced from over 200 suppliers, including Bundle, Foursquare, and Twitter. Thousands of application developers from the likes of Sheckys, Summify, and Crimson Hexagon have leveraged our data to make their apps richer and more compelling.

But we’ve always known that it’s not enough. Raw data is just the fuel. Without an engine to make it into something productive for the individual or organization, it’s doomed to not live up to its promise.

A Platform to Solve Our Own Problems
How do you get the world’s data to live in one place? This is no simple problem. Every day you’re dealing with the three major challenges quoted above. Some data sources update weekly, some by the minute, and others stream data to you at many GBs per hour. Data can come in a tabular format, a JSON string, or a giant blob of text. Not to mention the sheer volume of sources and data you’re faced with warehousing.

From the beginning, Infochimps has used Amazon Web Services (AWS), Hadoop, and a number of other Big Data technologies to source and aggregate the world’s data. Faced with the resource and personnel constraints of a typical startup, we began with a simple best-effort design approach, allowing our small team of data engineers to get away with moving massive cloud resources around with minimal effort. We developed Wukong to make it easy for our Ruby developers to run Hadoop jobs, and extended Chef into Ironfan (formerly known as Cluster Chef) to make the instantiation and management of our infrastructure so simple our engineers can “move cities with their minds.”

Google rocked the world when it released its MapReduce paper, inspiring what became Hadoop, and allowing the rest of the world to take advantage of the tools it developed for its own data gathering efforts. In a similar vein, it is our hope that the release of our own internal technologies as a Platform product may help the world’s organizations to gather and manage the world’s data for their own purposes.

Context – the Next Level
A recent New York Times article featured some of the analytics done by Target, where marketers had been able to figure out that a woman was pregnant based on her purchase patterns. This type of insight is remarkable and only marks the beginning of what’s to come as all our purchases, clicks, and check-ins are tracked and analyzed. Organizations will be able to take this only so far, however, if they restrict their imaginations to just their own data.

The next big leap for the world’s organizations will be how they use all of these new and developing information streams – from Google search traffic, tweets, 100 years of weather measurements, check-ins, and UFO sightings. In the financial world, researchers have demonstrated that Google search query data can predict inflation metrics, weeks before the official numbers come out. Ecommerce websites have long used data like our IP-Geolocation to personalize web experiences to increase conversions.

The Infochimps Data Marketplace has helped us all appreciate the breadth of data the world has to offer. Now, we can help those organizations that want to use this data to find insight, increase revenues, and cut costs.

Interested? Want to know more?
The Infochimps Platform is made up of a suite of technologies we’ve developed internally, plus a number of open source projects for which we’ve developed management tools and techniques. The Platform comes with the brains and experience of the brilliant Infochimps team so that you can maximize your return on a Big Data infrastructure investment.

For more information about the Platform, please use our contact form here.

We are excited to hear from you!

Winner of the Strata 2012 Conference Pass

Thanks to the random number generator, we’ve selected a winner amongst the folks who entered.  Congrats to #22 aka Nicolas Thiébaud.  And we swear… it’s not because he promised us French pastries, though we are excited for the rising Hadoop community in his home country!

We’ll see you at Strata!

Infochimps at Strata Conference 2012

We’re excited to have our CTO, Flip Kromer, presenting a talk at Strata Conference in Santa Clara later this month.  The discussion centers on disambiguation.  Now you might be wondering… what is disambiguation?  Simply put, disambiguation is the process of resolving conflicts to remove ambiguity.  We’ve discussed this topic a number of times on this blog, and Flip will be presenting on how this concept affects the way we ask questions and find answers about Big Data.

For more details on the talk, check out the Strata schedule.

Same awesome data, Sweet new website

As an early Christmas present to ourselves, we’ve introduced a sweet new website meant to help our site visitors more easily navigate to the data products and solutions they need.  In the new design, we highlight our top data products: Social, Geo and Data Marketplace (where you can still access over 15,000 downloadable data sets and APIs), as well as the data expertise we can bring to the table.

Take a peek around the site and let us know what you think.  We’ve got more updates and changes in store over the next few months and we’d love your direct feedback as we iterate towards awesomeness.

Once a Chimp, Always a Chimp

Having had the privilege of being involved with Infochimps since its founding in the summer of 2009, and having led the company for the last year as CEO, it is with mixed feelings that I announce that I will be reducing my day-to-day responsibilities with the company. In the interim, my co-founder Joe Kelly will be taking the reins. Having worked with Joe since the beginning, I know it will be a smooth handoff and the company will be in good hands as we expand. I will continue to be involved with Joe and the team as a Board Advisor.

In my time as CEO, we closed two rounds of financing, grew a tremendous user base, and built a best in class engineering team, including those that joined us through our acquisition of Keepstream. Our data catalog now boasts over 200 suppliers including Twitter and Foursquare, and with over 10,000 customers we’re well on our way toward our mission of democratizing access to data.

I’m excited to take what I’ve learned at Infochimps and all the friends I’ve made and apply it to something new and exciting. I look forward to what’s next, but am equally excited to continue to help the Infochimps team build the best data company in the world!

Transitioning to Lean at Infochimps

Two nights ago, my fellow chimps, Dhruv Bansal, Tim Gasper and I gave a presentation at the Austin Lean Startup Circle on the company’s recent transition to lean. We discussed our switch to a lean product strategy driven by must-have customer problems and the lean concepts and tools we have used to get there. It’s chock full of insights, struggles and great ideas for startups looking to adopt the Lean methodology.

For a version with full audio, check it out on Posterous.

Become a Chimp… We’re Hiring!

Do you love accessing cool data but hate scraping, cleaning and parsing it all day long? Apparently so do a lot of people! Come work for us and be a hero to developers everywhere who just want an easy place to access the data they want.  Check out our current open positions: Architect, Data Engineer, Data Scientist, Head of Marketing.

Here are just a few of the great things about working at Infochimps:

  • A world class team of friendly people eager to tackle hard problems
  • Ask around, we have one of the finest data science and scalable backend teams in the world
  • Convenient location in downtown Austin, a city ranked Kiplinger’s #1 city for the next decade and Forbes #1 best bargain city
  • Delish lunches brought in every day, free for employees
  • All the bananas you can eat
  • Competitive salary and options
  • Health insurance benefits, fully paid for employees

If you want to be part of our team, please send a resume and details about why you would be excited to work at Infochimps to

Look forward to hearing from you. Please feel free to let us know if you have any questions!

Meet Jim the Monkey + Other Website Updates

Meet Jim the Monkey, the friendly greeter on our newly redesigned sign up page. Coincidentally, he shares a name with our new Director of User Experience, Jim England, who has been busily improving key areas of our site. As you may recall, Jim (the human), formerly of Keepstream, joined just a few months ago and has already made huge headway in making our site more user friendly, easier to navigate and just a wee bit cuter with the addition of Jim (the monkey).

Whether you’re a new visitor or a long-time fan of Infochimps, we’d love to know what you think of the changes we have underway!

Leave us a comment, send us a tweet, or send us an email with your thoughts!

New Header

Our new header compresses the best elements of our old one into a sleeker, easier to navigate design.  The upper part in lighter grey now holds our search bar as well as our key navigational elements.  The lower part in darker grey helps users navigate to our most popular API offerings, as well as access their account.  Bonus – when you’re logged in, the dark grey bar becomes our account navigator with quick links to your profile, API dashboard (complete with usage charts) and account settings.


Bringing You a Bundle of Spending Trends Data

Thanks to an exciting new partnership with Bundle, we are proud to introduce a whole slew of new APIs about spending trends.  Bundle uses aggregated, anonymized spending data from 20 million Visa and Mastercard customers to derive useful information on restaurants, bars and shops.  With the APIs they’ve made available on Infochimps, you can get detailed information about merchants in New York and San Francisco, including location, loyalty score, average transaction amount, neighborhood, loyalty ranking, similar businesses and more.  (There are plans to expand to more cities in the future.)

What can you do with this powerful information?  How about adding some real-life sales comparison data on your local competitors to your company’s analytics dashboard?  Building an app that appeals to bars with loyal customers and average tabs of over $50, and want to do some customer prospecting?  Creating a travel guide based on real-life customer behavior and not subjective opinions?  The possibilities with this data are boundless; we’re excited to see what you come up with!

Full list of APIs

Merchant Search API
Measure merchant loyalty and average transaction amount for thousands of San Francisco and New York Metropolitan Area bars, shops, and restaurants. Build cool apps that show how people really spend their money and what places have the most loyal clientele. The API pulls from Bundle’s extensive merchant data.

Detailed Merchant Search API
For a given merchant, get back its location, loyalty score, average transaction amount, neighborhood, and loyalty ranking by zip, neighborhood, and city. The API pulls from Bundle’s extensive spending and loyalty data for the San Francisco and New York Metropolitan Areas. Find merchant slugs by using the Merchant Search API by Bundle.

Neighborhoods API
Use this free API to find a list of neighborhoods and their geo-coordinates in the San Francisco and New York City Metropolitan Areas. Combined with the Merchant Search API by Bundle, it lets you find out about the buying habits of people in specific San Francisco and New York City neighborhoods.

Metros API
Use this free API to get a list of available Metropolitan areas and their geo-coordinates. At the moment, you can find information on the San Francisco and New York City Metropolitan areas. In combination with the Merchant Search API by Bundle, this API allows you to gain insights on customer loyalty and spending habits.

Merchant Categories API
Use this free API to get a list of merchant subcategories for merchants in the New York City and San Francisco Metropolitan Areas. Subcategories fall under Food & Drink, Shopping, Travel & Leisure, Health & Family, House & Home, and Getting Around. Use this API in combination with the Merchant Search API by Bundle to gain insight on customer loyalty and spending.

Related Merchants API
The Related Merchants API recommends bars, restaurants, and stores based on a user’s favorite merchant in the San Francisco and New York Metropolitan Areas. The API pulls from Bundle’s extensive merchant similarity data. It is based on actual anonymous credit card transactions from Citi®.
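
To make the customer-prospecting idea concrete, here is a rough Python sketch of how you might filter merchant records once you have fetched them. It does not call any of the APIs above: the Merchant record shape, the field names, the 0-to-1 loyalty scale, and the sample data are all assumptions made up for illustration, since the actual response format is not shown in this post.

```python
from dataclasses import dataclass


@dataclass
class Merchant:
    # Hypothetical record shape, loosely based on the fields the
    # Detailed Merchant Search API is described as returning.
    name: str
    category: str
    neighborhood: str
    loyalty_score: float      # assumed to be on a 0-to-1 scale
    avg_transaction: float    # assumed to be in US dollars


def prospect(merchants, category, min_avg_transaction, min_loyalty):
    """Return merchants in a category above both thresholds, most loyal first."""
    matches = [
        m for m in merchants
        if m.category == category
        and m.avg_transaction > min_avg_transaction
        and m.loyalty_score >= min_loyalty
    ]
    return sorted(matches, key=lambda m: m.loyalty_score, reverse=True)


# Made-up sample data, for illustration only.
SAMPLE = [
    Merchant("Bar A", "bar", "Mission", 0.82, 61.50),
    Merchant("Bar B", "bar", "SoMa", 0.40, 75.00),
    Merchant("Cafe C", "restaurant", "Mission", 0.90, 18.25),
    Merchant("Bar D", "bar", "East Village", 0.91, 54.00),
]

if __name__ == "__main__":
    # Bars with loyal customers and average tabs of over $50.
    for m in prospect(SAMPLE, "bar", 50.0, 0.8):
        print(m.name, m.neighborhood, m.loyalty_score, m.avg_transaction)
```

In a real app you would swap the sample list for records built from the API responses and tune the thresholds to whatever scale the loyalty score actually uses.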