Monkey Business

Take a Tour of Our Big Data Platform

Sometimes, when we are trying to explain what Infochimps does, it can be tough to help folks understand the total package. To help with this, we put together a tour of the Infochimps Platform. Now you can discover how we work with your team to take data from the sources you rely on, make it useful, and deliver the insights you need to improve your business. Check it out!


How We Do It

Infochimps uses many cutting-edge tools (Chef, Amazon Web Services, Hadoop, HBase, ElasticSearch, Flume, MongoDB, PhantomJS, etc., ad nauseam), and we’ve written a number of custom tools to help corral these sometimes wild horses into a working team. Ironfan, our Chef specialization for big data in the cloud, coordinates the installation and configuration of the many necessary components. Wukong is our Ruby library for Hadoop, combining the flexibility of JRuby with the raw power of MapReduce. Wonderdog is our Hadoop interface to ElasticSearch, allowing us to deliver large amounts of data quickly into a stable and searchable NoSQL data store. Swineherd, the workflow engine for Hadoop jobs, helps tie all of this together into a coherent framework for running multi-stage data ingestions.

To crib a DevOps aphorism, however, it’s not the technology that makes Infochimps work: it’s the culture. Specifically, it’s a culture that keeps the challenges of all that novel technology manageable.


Announcing Support for OpenStack and the Rackspace Cloud

Infochimps is happy to announce that we now support the next-generation Rackspace Cloud, based on OpenStack. Through integration with the OpenStack API, the Infochimps Platform can now power big data applications based in the Rackspace Cloud. This expands the reach of the Infochimps Platform and makes running complex big data infrastructures quick and easy for a broader range of users.

Rackspace customers running the new OpenStack-based Rackspace Cloud Servers can use the Infochimps Platform to spin up Hadoop clusters for their big data applications with a single command, in as little as 20 minutes. With the power of Ironfan, Infochimps’ open source provisioning tool, and Dashpot, Infochimps’ visualization and operations dashboard, customers can monitor and manage their big data operations on an ongoing basis, or leave it to Infochimps to manage them on the Rackspace Cloud.

Check out this demo of Infochimps Platform running in the Rackspace Cloud:

Why OpenStack and Rackspace?
From the beginning, the Infochimps Platform has been built on a foundation of open source tools for managing data, aimed at simplifying the experience of working with complex technologies such as Hadoop or Cassandra. Within the Infochimps Platform, Wukong, Ironfan and Swineherd are major open source components of the stack. OpenStack supports our open source tradition with its strong open source ecosystem. It is used by and contributed to by not only Rackspace, but also organizations such as NASA, Canonical, Red Hat, Dell, HP, and AT&T, so its architecture serves a multitude of needs rather than bending to the whims of a single provider.

OpenStack also encourages standardization among Infrastructure as a Service providers, which ultimately benefits everyone in the market. Clients can make (and remake) decisions based on their businesses’ current day-to-day needs, without needing a crystal ball to predict which provider will be best for them in the long term. By sharing open and standard interfaces, cloud providers can compete on current quality and value, instead of fighting to lock in customers based on promises.

The modular design of OpenStack is part of what makes standards possible without blocking innovation. There is a set of core APIs that every provider will support, plus extensions for added capabilities that not every provider will want to allow. The contracts these APIs provide can be (and often are) fulfilled by different back-end providers, letting each provider make different architectural choices without requiring customers to completely retool to take advantage of them. All of this allows apples-to-apples comparison of provider architectures, without making orange sales impossible.
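
For the code-minded, here is a minimal sketch of that idea in Python. It is not the OpenStack API or anything from the Infochimps Platform: `ComputeBackend`, `ProviderA`, `ProviderB`, `launch_cluster`, and the `"snapshots"` extension are all hypothetical names made up for illustration. The point is only that callers written against a shared core contract work with any provider, while extensions stay optional.

```python
from abc import ABC, abstractmethod


class ComputeBackend(ABC):
    """Hypothetical stand-in for a core contract every provider supports."""

    @abstractmethod
    def create_server(self, name: str, flavor: str) -> dict:
        """Create a server and return a description of it."""

    def extensions(self) -> set:
        """Optional capabilities; the default is none."""
        return set()


class ProviderA(ComputeBackend):
    """Illustrative provider that implements only the core contract."""

    def create_server(self, name: str, flavor: str) -> dict:
        return {"name": name, "flavor": flavor, "provider": "provider-a"}


class ProviderB(ComputeBackend):
    """Illustrative provider that also advertises one extension."""

    def create_server(self, name: str, flavor: str) -> dict:
        return {"name": name, "flavor": flavor, "provider": "provider-b"}

    def extensions(self) -> set:
        return {"snapshots"}  # made-up extension name


def launch_cluster(backend: ComputeBackend, size: int, flavor: str = "standard") -> list:
    """Relies only on the core contract, so any backend can be passed in."""
    return [backend.create_server(f"node-{i}", flavor) for i in range(size)]


if __name__ == "__main__":
    for backend in (ProviderA(), ProviderB()):
        cluster = launch_cluster(backend, size=3)
        print(type(backend).__name__, len(cluster), sorted(backend.extensions()))
```

Swapping `ProviderA()` for `ProviderB()` changes nothing in `launch_cluster`; code that wants an extension can check `extensions()` first and fall back when it is missing.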

What does OpenStack mean for Infochimps?
The work we’ve done to support this announcement has given us a level of abstraction from the Amazon Web Services environment, so we can now deploy our platform in a cloud-agnostic way. Many of our customers have asked for implementations on their in-house cloud environments. Our OpenStack support allows those implementations to be airlifted in using a common set of APIs that sit on top of whatever infrastructure already exists, instead of one-off installations that require more custom development and introduce brittleness.

Interested in learning more about Infochimps, Rackspace, and OpenStack? Contact us today for more information!

Happy Holidays!

As most of the chimps have retreated back to the wilds of such foreign lands as Long Island, Cleveland, and Round Rock, things will be pretty quiet around here until Tuesday, December 27.  If you need us, we’ll still be around to pick up a bananaphone or answer emails.  Until we meet again, enjoy this Richard Feynman documentary…

Meet Jim the Monkey + Other Website Updates

Meet Jim the Monkey, the friendly greeter on our newly redesigned sign up page. Coincidentally, he shares a name with our new Director of User Experience, Jim England, who has been busily improving key areas of our site. As you may recall, Jim (the human), formerly of Keepstream, joined just a few months ago and has already made huge headway in making our site more user friendly, easier to navigate and just a wee bit cuter with the addition of Jim (the monkey).

Whether you’re a new visitor or a long-time fan of Infochimps, we’d love to know what you think of the changes we have underway!

Leave us a comment, send us a tweet, or send us an email with your thoughts!

New Header

Our new header compresses the best elements of our old one into a sleeker, easier to navigate design.  The upper part in lighter grey now holds our search bar as well as our key navigational elements.  The lower part in darker grey helps users navigate to our most popular API offerings, as well as access their account.  Bonus – when you’re logged in, the dark grey bar becomes our account navigator with quick links to your profile, API dashboard (complete with usage charts) and account settings.


Gigantopithecus & Other Huge (Data) Apes

Meet Gigantopithecus.

… or at least a lifelike artist’s rendering of the giant ape, now extinct for 100,000 years*. Based on the few fossils that have been unearthed, our best guess is that Gigantopithecus stood about 10 feet tall and weighed in at 1200 lbs.  Fossils are few and far between, and much about this creature remains unknown due to a lack of complete data.

What does this have to do with huge data sets?

Once upon a time, huge data sets were hard to find and even harder to download, clean up and analyze.  Mythical beasts, such as historical records of Twitter users & conversations or the mapping of the Human Genome, proved difficult to locate, let alone interact with in the wild.

We receive tons of emails from folks looking for large data sets.  Whether they’re looking to test a new algorithm or storage solution or simply want to try some new tools, requests for 100+ MB data sets are common, so we’ve pulled together a list of our biggest and best.

Check out our huge data sets, including one that’s 150 TB!

* Yes, this means that we likely cohabited the Earth with these guys… and still may according to some cryptozoologists. 

Open a banana like a Monkey does

Open a Banana like a Monkey – most human primates do it wrong!


To go with the open banana, here is some open banana data: