Monthly Archives: April 2012

52 Billion Chickens

As you enter your weekend, consider this: human beings are outnumbered by lots of creatures in this world, including ants, which Harvard biologist and ant expert Edward O. Wilson claims outnumber us one million to one. I’d personally suspect we are also greatly outnumbered by numerous other varieties of insects, arachnids, and, in Austin, grackles.

Somewhat unsurprisingly, we are also outnumbered by chickens.  In 2009, we killed 52 billion chickens for food (to say nothing of the ones we kept alive).  Kind of makes you thankful they aren’t fighting back.

Happy Friday!

[Infographic: Food for Thought, by NatGeo]


Finding Real Neighborhoods

The boundaries of a neighborhood can be a topic of hot contention. Look to a tourist guidebook, a real estate agent, and a local, and you’ll get four different answers about whether or not north of 14th Street still counts as “The Village” in NYC.  Livehoods, a project by the School of Computer Science at Carnegie Mellon University, takes a social spin on answering these questions and uncovers some truly insightful data on neighborhood boundaries, relationships, activity levels, character, and more.


Livehoods offer a new way to conceptualize the dynamics, structure, and character of a city by analyzing the social media its residents generate. By looking at people’s checkin patterns at places across the city, we create a mapping of the different dynamic areas that comprise it. Each Livehood tells a different story of the people and places that shape it.
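As a rough illustration of the idea in the quote above, and not the Livehoods team's actual method, here is a minimal Python sketch that scores how similar two venues are by the overlap in who checks in at them. The check-in records, user names, and venue names are invented for the example, and cosine similarity is simply one convenient measure assumed here.

```python
from collections import Counter
from math import sqrt


def venue_vectors(checkins):
    """Map each venue to a Counter of user -> number of check-ins there."""
    vectors = {}
    for user, venue in checkins:
        vectors.setdefault(venue, Counter())[user] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two user-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[user] for user, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical (user, venue) check-ins, made up for illustration only.
checkins = [
    ("ann", "coffee_shop"), ("ann", "bookstore"),
    ("bob", "coffee_shop"), ("bob", "bookstore"),
    ("cat", "stadium"),
]

vectors = venue_vectors(checkins)
for other in ("bookstore", "stadium"):
    score = cosine_similarity(vectors["coffee_shop"], vectors[other])
    print(f"coffee_shop vs {other}: {score:.2f}")
```

Venues that score close to 1 share most of their visitors and would be candidates for the same cluster. A real analysis would presumably also weigh how far apart the venues are geographically.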

One thing I found particularly fascinating, though not wholly unexpected, about the New York City map was the clustering of neighborhoods in New Jersey.  In NYC, with the relative proximity of… everything to everything, it’s not surprising to find that neighborhoods are small areas made up of tightly clustered businesses and homes.  In New Jersey, the “neighborhoods” span half a dozen suburban towns in the same county.

Interested in experimenting with some Foursquare data yourself?  Check out our Foursquare Places API!

Drought Tracking and Texas’ Extreme Weather

Living in Austin, TX, it was pretty obvious last year, with its record number of 100+ degree days without rain, thousands of square miles burned in wildfires, and billions lost in agriculture, that we were in the middle of a serious drought. The impact across the state and throughout much of the South since October 2010 is laid out in staggering detail in this simple flipbook-style map from NPR.

The potential solutions to the problem are outlined in the Water Plan. It will be interesting to see how the continuation of this drought will affect job growth, home prices, population, and more throughout the state in the coming years.

Various plans for dealing with future droughts and growing demand for water in Texas exist, but most comprehensive — and accepted — is the state Water Plan. It offers a frank assessment of the current landscape, saying Texas “does not and will not have enough water to meet the needs of its people, its businesses, and its agricultural enterprises.” It predicts that “if a drought affected the entire state like it did in the 1950s,” Texas could lose around $116 billion, over a million jobs, and the growing state’s population could actually shrink by 1.4 million people.

Announcing Support for OpenStack and the Rackspace Cloud

Infochimps is happy to announce that we now support the next-generation Rackspace Cloud, based on OpenStack. Through integration with the OpenStack API, the Infochimps Platform can now power big data applications based in the Rackspace Cloud, expanding the reach of the Infochimps Platform and making it quick and easy for a broader range of users to run complex big data infrastructure.

Rackspace customers running the new OpenStack-based Rackspace Cloud Servers can quickly and easily spin up Hadoop clusters to power their big data applications in as little as 20 minutes with a single command using the Infochimps Platform. With the power of Ironfan, Infochimps’ open source provisioning tool, and Dashpot, Infochimps’ visualization and operations dashboard, customers can easily monitor and manage their Big Data operations on an ongoing basis, or leave it to Infochimps to manage it on the Rackspace Cloud for them.

Check out this demo of Infochimps Platform running in the Rackspace Cloud:

Why OpenStack and Rackspace?

From the beginning, the Infochimps Platform has been built on a foundation of open source tools for managing data, aimed at simplifying the experience of working with complex technologies such as Hadoop or Cassandra. Within the Infochimps Platform, Wukong, Ironfan and Swineherd are major open sourced components of the stack. OpenStack fits our open source tradition with its strong open source ecosystem. It is used and contributed to not only by Rackspace but also by organizations such as NASA, Canonical, Red Hat, Dell, HP, and AT&T, so its architecture serves a multitude of needs, rather than bending to the whims of a single provider.

OpenStack also encourages standardization among Infrastructure as a Service providers, which ultimately benefits everyone in the market. Clients can make (and remake) decisions based on their businesses’ current day-to-day needs, without needing a crystal ball to predict which provider will be best for them in the long term. By sharing open and standard interfaces, cloud providers can compete on current quality and value, instead of fighting to lock in customers based on promises.

The modular design of OpenStack is part of what makes standards possible without blocking innovation. There is a set of core APIs that every provider will support, and extensions for added capabilities that not every provider will want to allow. The contracts these APIs provide can be (and often are) fulfilled by different back-end providers, letting each provider make different architectural choices without requiring customers to completely retool to take advantage of them. All of this allows apples-to-apples comparison of provider architectures, without making orange sales impossible.
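To make the "core contract plus optional extensions" idea concrete, here is a minimal Python sketch. It does not use the real OpenStack API: `ComputeProvider`, `ProviderA`, `ProviderB`, `snapshot_all`, and `launch_cluster` are all hypothetical names invented for illustration. The point is only that client code written against a shared contract runs unchanged on different back ends.

```python
from abc import ABC, abstractmethod
from itertools import count
from typing import List


class ComputeProvider(ABC):
    """Hypothetical 'core API' contract that every provider agrees to fulfill."""

    @abstractmethod
    def create_server(self, name: str) -> str:
        """Start a server and return its provider-assigned ID."""


class ProviderA(ComputeProvider):
    """Made-up provider that numbers its servers sequentially."""

    def __init__(self) -> None:
        self._ids = count(1)

    def create_server(self, name: str) -> str:
        return f"a-{next(self._ids)}"


class ProviderB(ComputeProvider):
    """Made-up provider with a different back end and one extension."""

    def __init__(self) -> None:
        self.servers: List[str] = []

    def create_server(self, name: str) -> str:
        self.servers.append(name)
        return f"b-{name}"

    def snapshot_all(self) -> int:
        """Extension beyond the core contract: report how many servers were snapshotted."""
        return len(self.servers)


def launch_cluster(provider: ComputeProvider, size: int) -> List[str]:
    """Launch `size` servers using only the shared core contract."""
    return [provider.create_server(f"node-{i}") for i in range(size)]


if __name__ == "__main__":
    # The same client code works against either provider.
    for provider in (ProviderA(), ProviderB()):
        print(type(provider).__name__, launch_cluster(provider, 3))
```

Because `launch_cluster` touches only the core method, switching providers needs no retooling, while a customer who wants `snapshot_all` can still opt in to the provider that offers it.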

What does OpenStack mean for Infochimps?

The work we’ve done to support this announcement has given us a level of abstraction from the Amazon Web Services environment, so we can now deploy our platform in a cloud-agnostic way. Many of our customers have asked for implementations on their in-house cloud environments – our OpenStack support allows those implementations to be airlifted in using a common set of APIs that sit on top of whatever infrastructure already exists, instead of one-off installations that require more custom development and introduce brittleness.

Interested in learning more about Infochimps, Rackspace, and OpenStack? Contact us today for more information!

Announcing Dashpot, our Analytics & Operations Dashboard for the Infochimps Platform

Infochimps is happy to announce Dashpot, an easy-to-use analytics and operations dashboard that provides business metrics and visualization, cluster management capabilities, and system monitoring on top of the Infochimps Platform. Dashpot gives you real-time visibility and control of your Big Data stack running with Infochimps, helping you go from input to insight faster, with our best-in-class Big Data infrastructure and tools.

Here are some of Dashpot’s key features:

  • Business Metrics – Dashpot’s in-stream visualization provides business users with the ability to capture and visualize business metrics on the fly as data is being ingested into their Infochimps Platform. By enabling data to be decorated in-stream through our Flume-based Data Delivery Service, Infochimps enables quick introspection on how a data or business process is performing. Organizations can view spikes or drops in key system or business metrics in near real-time, enabling quicker response to changing business conditions, saving time and helping ensure higher quality and more valuable information in the organization’s ultimate datastore. Infochimps business metrics are designed to provide an intermediate data visualization capability in conjunction with an organization’s existing investments in traditional business intelligence solutions.
  • Cluster Management – Built on the power of Ironfan, Dashpot offers simple Big Data system automation and management with a quick glance view into the servers and clusters currently running. Operations users can easily spin them up and down with a simple button click as their processing needs change, creating significant, easy-to-attain cost savings in machine usage.
  • Systems Monitoring – Dashpot integrates with popular monitoring packages to give users at-a-glance views of Big Data system performance, availability, system integrity and more. Dashpot is designed to integrate easily with any monitoring product; as its initial reference monitoring solution, Infochimps has implemented the popular open source product Zabbix, integrating Zabbix graphs on system performance and availability into the Infochimps Dashpot dashboard.

Implementing and operating Big Data architectures can be difficult, requiring significant investment of resources and time. By choosing to use the Infochimps Platform, enterprises needn’t worry about the time and hassle of building and maintaining their own infrastructure. When combined with our tools, such as Ironfan and DDS, Dashpot’s simple visualizations and management tools help organizations keep their Big Data system humming, with little operational overhead. Best of all, Dashpot’s in-stream visualizations help provide the insights businesses need to get the most value out of their Big Data infrastructure investment.

Interested in talking about how we can help simplify your Big Data stack?  Contact us today for more information!

Stress Awareness Month

Apparently, April is Stress Awareness Month. Personally, I’m always aware of my stress, but this infographic does offer some interesting stats on our stress and nice reminders of how to let it go.

[Infographic: A Short Brainy Tale for Stress Awareness Month, April 2012]

Social Activism and What It Means For Your Company

According to research conducted by Column Five, TBWA, and Take Part, social activism is on the rise among young adults (ages 20-28).  And we’re talking about more than just posting and commenting on Facebook here: folks in this age group are taking real action in ways companies and organizations should be aware of.  Their decisions around employment and shopping, and their sentiment toward a company, are largely influenced by whether the company supports the social causes they care about.

So what issues matter most to young adults?

[Infographic: the top 8 issues that matter most to young adults]

The above are simply the top 8 found in this study; however, it’s worth a deep dive for your industry and customer segments to understand how the issues you support can affect your customers’ support for you.



Big Data Love is Back!

Join us for our first Big Data Love Happy Hour of 2012. Now that we’ve completed the big announcement of our new Infochimps Platform and things have calmed down post-SXSW, we are looking forward to reconnecting with our local community.  So, if you’re in town this Thursday, swing by our office at 1214 W. 6th Street between 6:30pm and 9:30pm for some brews and nerdery.  We’re looking forward to the hang time!