Monthly Archives December 2012

Infochimps Participates in Dell World 2012, Focused on Big Data


Last week our Chief Science Officer, Dhruv Bansal, participated in a Dell World Chief Technology Officer Summit focused on Big Data. It was definitely a riveting conversation! The panel was moderated by Josh Neland, Technology Strategist for Dell, and included Brian Andersen, Technology Strategist from Teradata, Jim Thompson, Engineering Vice President and CTO with Unisys and Tim McQuillen, CIO & Founder of StrongMail Systems.

The panel discussed a number of Big Data topics with an audience of 40 CIOs – and Big Data is certainly among the hottest topics on CIOs’ minds. One exchange examined the value chain of Big Data projects, specifically the challenges of starting with raw data and then loading, transforming and curating it, with the end goal of uncovering the insights needed to drive the business forward. Examples of true ROI abound.

Jim Thompson offered the analogy of “empowering data”: concentrating its power to deliver precise business value and distinct insights. This will be critical for future success in the enterprise.

Dhruv Bansal stressed the importance of knowing the questions you are trying to answer through your Big Data project, noting that successful Big Data projects know these questions before starting out. This is the key link between a Big Data project and the business.

In concluding, the group agreed that while Big Data is still in its infancy, it has massive potential and “failure of the imagination” is the only limit to what can be achieved.


A Sneak Peek: Big Data for Chimps

  • Amanda McGuckin Hager

You may know leading data scientist Flip Kromer, Infochimps co-founder and CTO. If you don’t, you soon will. O’Reilly is publishing his book, “Big Data for Chimps, A Guide to Massive Scale Data Processing in Practice,” available for pre-order now. “Big Data for Chimps” takes an educational approach to big data that is unlike anything you may have read before. While beginners stand to gain quite a bit from reading the book, it also appeals to those experienced in modern programming techniques. Flip’s approach to technology builds the foundation throughout – using data at massive scale in very practical ways. That is, big data is about the data, and gaining value from it, not about the technologies.

“Big Data for Chimps” will help you:

  • Discover how to think at scale by understanding how data must flow through the cluster to effect transformations
  • Identify the Big Data Infrastructure tuning knobs that matter
  • Learn the Big Data rules-of-thumb
  • Apply Hadoop to interesting problems through detailed example programs
  • Gain advice and best practices for efficient software development

You will be captivated and engaged through Flip’s creative use of examples, analogies and stories. When you read the book, you’ll experience “Where is BBQ?,” “Pig Latin Translator,” “Patterns in UFO Sightings,” “Elephant and Chimpanzee Start a Business,” and more.

The chapter “Elephant and Chimpanzee Save Christmas” especially deserves your attention. Watch for it in January, as we’ll be releasing some chapters leading up to the publication.

Happy Holidays from all of us at Infochimps. As our gift to you, here’s a little sneak peek at what’s in store, called “The Hadoop Haiku”:

data flutters by
elephants make sturdy piles
insight shuffles forth



5 Questions Framing Data-Driven Decisions

While data-driven decisions are nothing new (remember the rise of “decision support systems” and “business intelligence”?), it does seem that enterprises have a new urgency these days: enterprises that make data-driven decisions are gaining benefits ranging from better customer insights and higher sales to more efficient operations and lower costs. What’s not to like about that?

Today, the “volume, velocity and variety” of data that enterprises have at their disposal is mind-bendingly greater than just a few years ago. And, enterprises are embracing the kind of real-time decision making that does not just run the business, it runs the business smarter. Whether driving better customer engagement (and sales) or enabling more efficient operations, big data has become an essential asset for the modern enterprise.

Earlier in my career I was an operations research analyst – kind of an early-day data scientist. There were 5 questions I always made sure to answer regarding any project I undertook. These questions frame an analytic process that underlies making effective data-driven decisions, and I think they are as applicable today as ever.

  1. Do I understand the decision to be made, especially the business factors that make this decision important?
  2. Do I have a model that captures the decision process? I.e., do I have an analytic framework, mathematical description, appropriate algorithms, etc., that describe the decision to an appropriate degree of detail? Part of this is picking the right algorithms.
  3. Do I have the data? This is pretty obvious: if data is going to drive a decision, you need to have the data. Even in today’s environment of an overabundance of data, it’s still important to make sure you have data appropriate for the model and the decision.
  4. Do I have the necessary computational infrastructure? This used to mean, can I run this on my PC or do I need to get time in the data center? Today it means, how can I get a cluster of Hadoop servers pumping data into a NoSQL database to drive my analytics? Today’s infrastructure is much harder to master.
  5. Am I producing results that are driving the decision? If so, great. If not, maybe I got something wrong in questions 1-4. Repeat 1-4 until satisfied.

Questions 1 and 5 are about the business. Since you know your business better than anyone, you’re pretty much on your own for these. Questions 2, 3 and 4 are about the data, analytics and computational infrastructure to get you the answers you need. There are plenty of companies that can help you here, in whole or in part. The important thing is to not get bogged down in the infrastructure. That’s where the Infochimps platform really shines. As quoted from a recent TechCrunch article, “Infochimps is one of a growing ecosystem of companies that are programming the knowledge of data scientists, statisticians and programmers into applications that businesspeople can use.”


Announcing Infochimps Enterprise Cloud


Big Data is confusing to most executives. It’s this nebulous concept of applying technologies from Yahoo!, Facebook, LinkedIn, and Twitter in such a way that the organization will truly become data-driven and, equally important, be able to do so quickly. Unfortunately, only a few companies are realizing its full potential.

That’s why Infochimps is announcing its Enterprise Cloud – a Big Data cloud service built specifically for Fortune 1000 enterprises that want to rapidly explore how big data technology can unlock revenue from their data. The Infochimps Enterprise Cloud addresses several challenges holding executives back from quickly gaining value from this disruptive technology.

Enterprises are only leveraging 15% of their data assets

Enterprises, on average, capture and analyze about 15% of their data assets. Typical data sources include transactional data (who bought what). However, a 360-degree view of the business requires a 360-degree view of the customer, as well as manufacturing, supply chain, finance, sales, marketing, engineering, etc. Only by capturing 100% of the enterprise’s operational data and then supplementing it with external data (for example, we’re talking to one pharmaceutical company about using claims data from 100+ health plans covering more than 70 million people) will you achieve maximum value from your data analytics. With the Infochimps Enterprise Cloud, you can not only combine 100% of your private data in a private cloud, but also supplement that data with another 100%+ of external data.

Time-To-Market constrained by infrastructure deployments

Deploying new disruptive big data technologies (Hadoop, NoSQL, in-stream processing), and creating value from them, still takes considerable time and human and financial resources. Typical Enterprise Data Warehouse (EDW) projects take 18-24 months to deploy. Simple changes to star-schema data models take a minimum of 6 months to be made available to internal development organizations. Hadoop projects, although less complicated than EDW projects, take about 12 months to deploy. With the Infochimps Enterprise Cloud, you can deploy value in 30 days.

Big Data talent hard to find

When I read articles about the gap between supply and demand for big data talent, I think to myself, “this is not a situation where analysts are collecting a sample of 10 companies and then generalizing it to the entire market.” It’s a real problem. If you are some “antiquated” Fortune 1000 company (you know who you are) looking to hire crazy smart engineers and data scientists from Facebook…well, sorry…you don’t have the corporate culture or the exciting environment that this talent enjoys. McKinsey forecasts that the gap between demand and supply of this talent is only going to get worse (60% gap by 2018). With the Infochimps Enterprise Cloud, you can leverage your existing talent, because it provides a simple but powerful abstraction between your application development team and the complex big data infrastructure.

One Big Data technology does not fit all

There are literally hundreds of DBMS / data store solutions today, each with different advantages depending on data type and use case. As a result, business users and application developers get lost in the nuances of data infrastructure and lose focus on the business needs. Don’t listen to a single data store vendor tell you that they can address all your business needs. You need several. With the Infochimps Enterprise Cloud, we force you to start with the business problem first, then we draw from a very comprehensive data services layer which addresses the needs of the business problem. Guess what? It’s not just Hadoop.

Infrastructure and data integration is the most challenging

Integrating existing data infrastructure with new big data infrastructure, and then adding external data sources on top, makes integration a completely new problem. This is not a matter of simply upgrading your ETL tools. With the Infochimps Enterprise Cloud, we help you understand the “new ETL” used by our web-scale friends.

Open source is cheap, but not easily commercialized

Silicon Valley alone has created over 250,000 open source projects. Disruption is obviously occurring within the open source community. However, enterprises are not in a position to properly deploy these projects, even with the many commercialization vendors. How does a company integrate several open-source solutions into one? With the Infochimps Enterprise Cloud, we support an end-to-end big data service, which combines many commercial open source projects to offer real-time stream processing, ad-hoc analytics, and batch analytics as one integrated data service.

Data security + data volume both dictate deployment options

Only non-sensitive, publicly available data sets (e.g. Twitter) are using elastic public cloud infrastructure. Compliance/governance issues still require that data-sensitive analytics occur “behind the firewall”. Also, if you are an established enterprise with large volumes of data, you are not going to “upload” to the cloud for your analytics. With the Infochimps Enterprise Cloud, we provide public, virtual private, private, or hybrid big data cloud services that address the needs of big businesses with big problems.

Today, I’m pleased to announce the Infochimps Enterprise Cloud, our big data cloud running on a network of big data-focused data centers and being deployed by leading big data system integrators.

These are exciting times, indeed. Read the full press release here >>.



Webcast Recap: Top Strategies for Successful Big Data Projects

25% of IT projects are canceled before completion. More than double that, 62% of IT projects are considered “failed” because although they weren’t canceled, they faced severe budget overruns, failure to deliver business value, and many other issues.

In a recent survey that Infochimps and SSWUG performed, we discovered that 44% of Big Data projects are canceled before completion. How many more are failing to meet project goals and objectives? 80%? 90%?

We uncovered the most common reasons Big Data projects fail:

Business Challenges:

  • Inaccurate scope
  • Non-cooperation between departments
  • Lack of talent / lack of expertise

Technical Challenges:

  • Technical or roll-out roadblocks
  • Gathering data from different sources
  • Understanding the tools, platforms, technologies, and vendors

Big Data projects can have such a transformative effect on business, from deeper business insights leading to new profit channels or products, to unifying and streamlining a fragmented, siloed enterprise data environment.

With all this in mind, based on our research and experience, we’re sharing our 7 Strategies for Successful Big Data Projects. These strategies have worked for Infochimps customers, and can work for any organization looking to successfully tackle their own Big Data project.


For more details on the 7 Strategies for Successful Big Data Projects, watch our webcast recording here. >>

Have a story about a Big Data project of your own? Share it with us, and help us continue developing and advancing this framework!

Source: CNET


Infochimps Plugged In To Gnip




Infochimps is pleased to announce our partnership with Gnip, one of the world’s largest and most trusted providers of social data, as part of their Plugged In To Gnip program.

Today, Gnip announced their new program that “enables Plugged In To Gnip partners to transparently showcase their access to the most reliable, comprehensive and sustainable social data in the world, creating the best possible experience for their customers”.  Read the full press release here >>

Why is Infochimps Plugged In To Gnip?
Getting a handle on the immense volume of social network data that Gnip provides often requires a sophisticated data infrastructure for processing and controlling the feeds. As a partner in providing solutions to customers who need to extract insight from this treasure trove of data, Infochimps can help by setting customers up with a best-in-class data platform for refining and working with Gnip’s feeds.

Gnip powers social analytics solutions for some of the world’s largest Business Intelligence and Social Media Monitoring firms. They are a certified Twitter partner and the exclusive provider of commercial access to public data from Tumblr, WordPress, StockTwits and Disqus.

To learn more about Plugged In To Gnip, visit


A Chimpy Movember

This November, the chimps participated in Movember, the moustache growing charity event held each November that raises awareness and funds for men’s health. There was an objective, rules, and winners – the makings of a friendly competition while building company culture for a worthy cause.

The Objective: To raise money for men’s health through growing facial hair, asking friends and family for donations, and by joining together with other chimps in camaraderie and some good-hearted revelry.

The Rules:
1. You do not have to begin the month with a clean shaven face.
2. You must maintain a moustache continuously from the 15th to the end of the month.
3. You must end the month with a moustache, but not necessarily the same moustache you started with.
4. A moustache is not a beard. For example: There is no joining of the moustache to the sideburns.
5. A moustache is not a goatee. There is no joining of the handlebars to the chin.
6. Other facial hair is permitted.
7. Category winners are determined on November 30 by consensus vote of the Mo Sistas.
8. Chimpiest Mo is determined on whatever criteria the Mo Sistas agree to on November 30.
9. Each Mo shall conduct themselves as true country gentlemen; each Mo Sista shall conduct themselves as true city ladies.

Category Winners:
– The Chimpiest Mo – Travis Dempsey
– The Lamest Mo (for the follically challenged) – Joe Kelly
– The Most Styled Mo – Flip Kromer
(Moustache Memorabilia was awarded to the winners.)


(From Left to Right: Mo Sistas, Winning Mos, Miami Vice Mos)

Go Infochimps! We are proud to support Movember, raising awareness and funds for men’s health.

Just because it’s December doesn’t mean you can’t keep supporting men’s health; you can do so all year round. See the official Movember merchandise page for everything from posters to shoes, and like Movember USA on Facebook.