Why Geeks Win
- May 16, 2012
We found this great little chart on Chart Porn today and thought it was an excellent representation of the foundations of our company. Yay, geeks!

When you think Big Data, the first words that come to mind are often Hadoop and NoSQL, but what do these technologies actually mean for your business? Different Big Data technologies have different use cases where they work best. For real-time Big Data challenges, a very different class of tools is often required.
In this free white paper, we’ll explore:

When it comes to predicting the future, your best resource (short of a soothsayer) is historical data. As data collection, storage and processing have become more sophisticated, the volume of data has exploded. A recent article in the McKinsey Quarterly states that in the US, across most business sectors, companies with more than 1,000 employees store, on average, over 235 terabytes of data, more than is contained in the entirety of the US Library of Congress.
What does this mean? It means that companies are sitting on a goldmine of insights for competitive advantage. The McKinsey Quarterly article mentions this example:
The top marketing executive at a sizable US retailer recently found herself perplexed by the sales reports she was getting. A major competitor was steadily gaining market share across a range of profitable segments. Despite a counterpunch that combined online promotions with merchandizing improvements, her company kept losing ground.
When the executive convened a group of senior leaders to dig into the competitor’s practices, they found that the challenge ran deeper than they had imagined. The competitor had made massive investments in its ability to collect, integrate, and analyze data from each store and every sales unit and had used this ability to run myriad real-world experiments. At the same time, it had linked this information to suppliers’ databases, making it possible to adjust prices in real time, to reorder hot-selling items automatically, and to shift items from store to store easily. By constantly testing, bundling, synthesizing, and making information instantly available across the organization—from the store floor to the CFO’s office—the rival company had become a different, far nimbler type of business.
The amount of data we produce is staggering and the underlying possibilities are incredible, but that doesn’t necessarily mean companies have the ability to extract true value from their data.
Looking to understand how Big Data can revolutionize how your organization does business? Sign up for a free Big Data consultation with some of our leading data scientists to get started today!

A recent article from The Atlantic explores how Big Data has revolutionized the dairy industry. In the past sixty years, through innovations in dairy science, milk production from an individual dairy cow has gone up from an average 5,000 pounds of milk in a lifetime to 21,000 pounds of milk. This astonishing increase has largely been fueled by data-driven predictions that allow dairy breeders to optimize their herds.
Dairy breeding is perfect for quantitative analysis. Pedigree records have been assiduously kept; relatively easy artificial insemination has helped centralize genetic information in a small number of key bulls since the 1960s; there are a relatively small and easily measurable number of traits (milk production, fat in the milk, protein in the milk, longevity, udder quality) that breeders want to optimize; each cow works for three or four years, which means that farmers invest thousands of dollars into each animal, so it’s worth it to get the best semen money can buy. The economics push breeders to use the genetics.
As you enter your weekend, consider this: human beings are outnumbered by lots of creatures in this world, including ants, which Harvard biologist and ant expert Edward O. Wilson claims outnumber us one million to one. I’d personally suspect we are also greatly outnumbered by numerous varieties of insects, arachnids, and, in Austin, grackles.
Somewhat unsurprisingly, we are also outnumbered by chickens. In 2009, we killed 52 billion chickens for food (to say nothing of the ones we kept alive). Kind of makes you thankful they aren’t fighting back.
Happy Friday!
The boundaries of a neighborhood can be a topic of hot contention. Look to a tourist guidebook, a real estate agent, and a local, and you’ll get four opinions about whether or not north of 14th Street still counts as “The Village” in NYC. Livehoods, a project by the School of Computer Science at Carnegie Mellon University, takes a social spin on answering these questions and uncovers some truly insightful data on neighborhood boundaries, relationships, activity levels, character, and more.
Livehoods offer a new way to conceptualize the dynamics, structure, and character of a city by analyzing the social media its residents generate. By looking at people’s checkin patterns at places across the city, we create a mapping of the different dynamic areas that comprise it. Each Livehood tells a different story of the people and places that shape it.
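To make the idea of "checkin patterns" a little more concrete, here is a minimal sketch in Python. It is not the Livehoods method, which this post does not describe in detail; it only illustrates one simple assumption, namely that two venues are "close" socially when the same people check in at both. The sample check-ins and the function names `venue_vectors` and `cosine` are made up for illustration.

```python
from collections import Counter
from math import sqrt


def venue_vectors(checkins):
    """Map each venue to a Counter of user -> number of check-ins there."""
    vectors = {}
    for user, venue in checkins:
        vectors.setdefault(venue, Counter())[user] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two user-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented sample data: (user, venue) pairs.
checkins = [
    ("ann", "cafe"), ("ann", "bar"),
    ("bob", "cafe"), ("bob", "bar"),
    ("cat", "gym"),
]

vectors = venue_vectors(checkins)
cafe_bar = cosine(vectors["cafe"], vectors["bar"])  # same visitors, high similarity
cafe_gym = cosine(vectors["cafe"], vectors["gym"])  # no shared visitors
```

Venues with high pairwise similarity could then be grouped by any clustering method, with each group being a candidate "neighborhood" in this simplified picture.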
One thing I found particularly fascinating, though not wholly unexpected, about the New York City map was the clustering of neighborhoods in New Jersey. In NYC, with the relative proximity of… everything to everything, it’s not surprising to find that neighborhoods are small areas composed of tightly clustered businesses and homes. In New Jersey, the “neighborhoods” span half a dozen suburban towns in the same county.
Interested in experimenting with some Foursquare data yourself? Check out our Foursquare Places API!
Living in Austin, TX, it was pretty obvious last year, with its record number of 100+ degree days without rain, thousands of square miles burned in wildfires, and billions lost in agriculture, that we were in the middle of a serious drought. The impact across the state and throughout much of the South since October 2010 is laid out starkly in this simple flipbook-style map from NPR.
The potential solutions to the problem are outlined in the Water Plan. It will be interesting to see how the continuation of this drought will affect job growth, home prices, population, and more throughout the state in the coming years.
Various plans for dealing with future droughts and growing demand for water in Texas exist, but the most comprehensive, and accepted, is the state Water Plan. It offers a frank assessment of the current landscape, saying Texas “does not and will not have enough water to meet the needs of its people, its businesses, and its agricultural enterprises.” It predicts that “if a drought affected the entire state like it did in the 1950s,” Texas could lose around $116 billion, over a million jobs, and the growing state’s population could actually shrink by 1.4 million people.
Infochimps is happy to announce that we now support the next generation Rackspace Cloud, based on OpenStack. Through integration with the OpenStack API the Infochimps Platform can now power big data applications based in the Rackspace Cloud, expanding the reach of the Infochimps Platform and making the running of complex big data infrastructures quick and easy for a broader range of users.
Rackspace customers running the new OpenStack-based Rackspace Cloud Servers can quickly and easily spin up Hadoop clusters to power their big data applications in as little as 20 minutes with a single command using the Infochimps Platform. With the power of Ironfan, Infochimps’ open source provisioning tool, and Dashpot, Infochimps’ visualization and operations dashboard, customers can easily monitor and manage their Big Data operations on an ongoing basis, or leave it to Infochimps to manage it on the Rackspace Cloud for them.
Check out this demo of Infochimps Platform running in the Rackspace Cloud:
Why OpenStack and Rackspace?
From the beginning, the Infochimps Platform has been built on a foundation of open source tools for managing data, aimed at simplifying the experience of working with complex technologies such as Hadoop or Cassandra. Within the Infochimps Platform, Wukong, Ironfan and Swineherd are major open sourced components of the stack. OpenStack supports our open source tradition with its strong open source ecosystem. It is used by and contributed to by not only Rackspace, but organizations such as NASA, Canonical, Red Hat, Dell, HP, and AT&T, so its architecture serves a multitude of needs, rather than bending to the whims of a single provider.
OpenStack also encourages standardization among Infrastructure as a Service providers, which ultimately benefits everyone in the market. Clients can make (and remake) decisions based on their businesses’ current day-to-day needs, without needing to employ a crystal ball to try to predict which provider will be best for them in the long term. By sharing open and standard interfaces, cloud providers can compete on current quality and value, instead of fighting to lock in customers based on promises.
The modular design of OpenStack is part of what makes standards possible without blocking innovation. There is a set of core APIs that every provider will support, and extensions for added capabilities that not every provider will want to allow. The contracts these APIs provide can be (and often are) fulfilled by different back-end providers, letting each provider make different architectural choices without requiring customers to completely retool to take advantage of them. All of this allows apples-to-apples comparison of provider architectures, without making orange sales impossible.
What does OpenStack mean for Infochimps?
The work we’ve done to support this announcement has enabled us to provide a level of abstraction from the Amazon Web Services environment, and we can deploy our platform in a cloud agnostic way. Many of our customers have asked for implementations on their in-house cloud environments – our OpenStack support allows those implementations to be airlifted in using a common set of APIs that sit on top of whatever infrastructure already exists, instead of one-off installations that require more custom development and introduce brittleness.
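As a rough illustration of what deploying "in a cloud agnostic way" can look like in code, here is a minimal Python sketch. It is not the Infochimps Platform or the OpenStack API; the class and function names (`CloudProvider`, `FakeOpenStack`, `FakeAWS`, `launch_cluster`) are hypothetical, and the providers only return strings instead of starting real servers.

```python
from abc import ABC, abstractmethod
from typing import List


class CloudProvider(ABC):
    """Hypothetical minimal contract that every back end must fulfil."""

    @abstractmethod
    def launch_server(self, name: str, flavor: str) -> str:
        """Start one server and return an identifier for it."""


class FakeOpenStack(CloudProvider):
    # Stand-in for an OpenStack-based cloud; no real API calls are made.
    def launch_server(self, name: str, flavor: str) -> str:
        return f"openstack:{flavor}:{name}"


class FakeAWS(CloudProvider):
    # Stand-in for Amazon Web Services; no real API calls are made.
    def launch_server(self, name: str, flavor: str) -> str:
        return f"aws:{flavor}:{name}"


def launch_cluster(provider: CloudProvider, cluster: str, size: int,
                   flavor: str = "standard") -> List[str]:
    """Provider-independent logic: name the nodes and launch each one."""
    return [provider.launch_server(f"{cluster}-{i}", flavor) for i in range(size)]


on_openstack = launch_cluster(FakeOpenStack(), "hadoop", 2)
on_aws = launch_cluster(FakeAWS(), "hadoop", 2)
```

The point of the design is that `launch_cluster` never changes when a new provider is added; only a new subclass of the shared contract is needed, which is the kind of reuse a common set of APIs makes possible.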
Interested in learning more about Infochimps, Rackspace, and OpenStack? Contact us today for more information!
Infochimps is happy to announce Dashpot, an easy-to-use analytics and operations dashboard that provides business metrics and visualization, cluster management capabilities, and system monitoring on top of the Infochimps Platform. Dashpot gives you real time visibility and control of your Big Data stack running with Infochimps, helping you go from input to insight faster, with our best-in-class Big Data infrastructure and tools.
Here are some of Dashpot’s key features:
Implementing and operating Big Data architectures can be difficult, requiring significant investment of resources and time. By choosing to use the Infochimps Platform, enterprises needn’t worry about the time and hassle of building and maintaining their own infrastructure. When combined with our tools, such as Ironfan and DDS, Dashpot’s simple visualizations and management tools help organizations keep their Big Data system humming, with little operational overhead. Best of all, Dashpot’s in-stream visualizations help provide the insights businesses need to get the most value out of their Big Data infrastructure investment.
Interested in talking about how we can help simplify your Big Data stack? Contact us today for more information!
Apparently, April is Stress Awareness Month. Personally, I’m always aware of my stress, but this infographic does offer some interesting stats on our stress and nice reminders of how to let it go.