Monthly Archives: November 2013

Overcoming the Data Scientist Shortage

Data scientists who can make business decisions are certainly not a dime a dozen. Today’s data professionals are tasked with driving bottom-line success for their companies by using business solutions to make actionable decisions based on customer and market insights. It takes more than a number cruncher to do that; it requires business acumen and an ability to make sense of massive volumes of data coming from various silos.

Now consider just how much data is at companies’ fingertips today. According to IBM, 90 percent of the world’s data was created in the last two years. That flood of new data gives multiple industries huge potential to dig through and extract insights. The only problem is that it has created heavy demand for data scientists, a role that universities haven’t traditionally built curricula around and that companies haven’t necessarily recruited for. Needless to say, the pool of candidates is small.

In the video interview below, Michael Koploy, who researches business intelligence software solutions at Software Advice, talks with icrunchdata Co-Founder Todd Nevins about the increasing demand for Big Data jobs. They cover which specializations in the Big Data field, from data science to market analytics, are most sought after, as well as how companies are circumventing the shortage of data science candidates to acquire top talent.



[Next Week’s Webinar] Faster Insights: A Framework for Agile Big Data

Getting to Insights Faster: A Framework for Agile Big Data

Thursday, November 21 @ 10 a.m. PT / 12 p.m. CT / 1 p.m. ET

The technology world is rapidly changing. It is no longer reasonable for companies to wait two years to see value from important data and insight initiatives. To compete successfully in today’s markets, insights must be available in real time. A new approach is needed: agile, iterative development that delivers successful insights in as little as 30 days.

Register for this live webcast and join Infochimps Director of Product Tim Gasper as he discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days, sometimes in just hours. Join as Tim explains how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with remarkable speed. During this webinar, we will discuss:

  • How agile Big Data application development differs from traditional development approaches
  • What our agile delivery framework looks like for planning Big Data projects and architecting customer solutions
  • What App Reference Designs are, and how they accelerate customer use cases
  • Real-life case studies of business problems that have benefited from our agile approach
  • A technology deep dive into a customer example




This webinar will be recorded, and emailed after the event to all who register.

Who Should Attend?
This webcast is ideal for Enterprise Executives, Line of Business Executives, Technology Executives, Enterprise Architects, IT Project Managers, Application Developers and IT Professionals with expected or current Big Data projects at any stage.


What’s Next After #StrataConf?

Did you know Infochimps Cloud delivers Big Data systems with unprecedented speed, simplicity, scale, and flexibility to enterprise companies? If you came to the Strata Hadoop Conference in New York last week, hopefully you stopped by our booth and walked away with that message, or a t-shirt at the very least.

If you didn’t make it out to #StrataConf, here are a few things you may have missed:

  • App Reference Designs Announcement: Check out the press release>>
  • Jim Kaskade’s Passionate Keynote: Check out the video>>
    • Watch as Jim highlights a new revolution of analytic applications with some touching examples in the healthcare industry with cancer research and medication therapy management.
  • We’re Hiring: Check out the new job openings>>
    • Join the troop! Our exciting startup environment within CSC’s Big Data and Analytics group is growing rapidly. We’re seeking top talent who love solving the world’s hardest Big Data problems, and we offer flexible hours and competitive benefits. Join our team of gentle geniuses in our belief that we can change the world through data-driven decisions.

Check out the newest press:


Nothing so Practical as a Good Theory

The most common error I have encountered among new data science practitioners is forgetting that the goal is not simply knowledge, but actionable insight. This isn’t limited to data scientists. Many analysts get carried away with the wrong metrics, tracking what is easy to measure rather than what is correct to measure. New data scientists get carried away with the latest statistical method or machine learning algorithm, because that’s much more fun than acknowledging that key data are missing.

To create actionable insight, we must start from the action, a choice. Data science is useless if it is not used to make decisions. When starting a project, I first ask how we will measure our progress towards our goals. As my colleague Morgan said last week, this often boils down to revenue, cost, and risk. An economist might bundle that up as time-discounted risk-adjusted future profits. My second task is identifying what decisions we will make in the process of accomplishing these goals.

The choices we make might be between different types of actions or might be between different intensities of an action: which advertising campaign, how much to spend, etc. These choices usually benefit from information. Some choices, such as selecting “red” or “black” at the roulette table, do not benefit from information. The outcome of most choices is partially dependent on information. Knowledge gives us power, but there is some randomness too. We might have hundreds of observations of every American’s response to our spokesperson’s call to action, but the predictive model we generate from that data might not help us after the spokesperson’s embarrassing incident at the golf course. The business case for data science is the estimation of how much information we can gain from our data and how much that information will improve the time-discounted, risk-adjusted benefit of our decisions.
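The idea of estimating how much information improves a decision can be made concrete with a classic expected-value-of-information calculation. The sketch below uses entirely hypothetical payoffs for two advertising campaigns under two market states; it is an illustration of the reasoning, not anything from the original post.

```python
# Hedged sketch: how much is information worth to a decision?
# All payoffs and probabilities are hypothetical, for illustration only.

# Profit of each campaign under each market state.
payoffs = {
    "campaign_a": {"upturn": 120.0, "downturn": 20.0},
    "campaign_b": {"upturn": 60.0, "downturn": 50.0},
}
prior = {"upturn": 0.5, "downturn": 0.5}  # beliefs before seeing any data

def expected_payoff(campaign, beliefs):
    return sum(p * payoffs[campaign][state] for state, p in beliefs.items())

# Decision without information: pick the campaign with the best prior expectation.
best_without = max(payoffs, key=lambda c: expected_payoff(c, prior))
value_without = expected_payoff(best_without, prior)

# Decision with perfect information: pick the best campaign in each state,
# then weight by how likely each state is.
value_with = sum(
    prior[state] * max(payoffs[c][state] for c in payoffs)
    for state in prior
)

evpi = value_with - value_without  # expected value of perfect information
print(best_without, value_without, value_with, evpi)
# campaign_a 70.0 85.0 15.0
```

If gathering and modeling the data costs more than that gap (15.0 in these made-up numbers), the project has no business case, no matter how elegant the model.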

The third task is picking what metrics to use. A management consultant might call this developing key performance indicators. A statistician might call this variable selection. A machine learning practitioner might call this feature engineering. We transform, combine, filter, and aggregate our data in clever and complex ways. Most critical is picking a good dependent variable, or explained variable. This is the metric we are predicting: the distillation of all our knowledge into a single number.

To pick a good dependent variable, a data scientist must consider the quality of the data available and what predictions they might support, but more importantly, the data scientist must consider the decision improved by our prediction. When choosing whether to eat outside for lunch, we prefer to know the temperature at noon rather than the average temperature for the day. More important would be the chance of rain. The exact temperature to the fraction of a degree is unnecessary. Best of all would be a direct estimate of lunchtime happiness for outside versus inside on a scale of, “Yes, go outside” or “No, stay inside.” Unfortunately, we often cannot pick the most directly representative variable, because it is too difficult to measure. Lunchtime surveys would be expensive to conduct and self-reported happiness might be unreliable. A good dependent variable balances predictive power with decision relevance.
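The lunch example can be sketched as code: rather than predicting raw forecast features (exact temperature, rain probability), we distill them into the one answer the decision needs. The function and its thresholds below are hypothetical, chosen only to illustrate the point.

```python
# Hedged illustration: turning raw forecast features into a decision-relevant
# dependent variable. The comfort band and rain threshold are made up.

def eat_outside(temp_at_noon_f: float, rain_probability: float) -> str:
    """Distill the forecast into the one answer the lunch decision needs."""
    comfortable = 55.0 <= temp_at_noon_f <= 85.0  # hypothetical comfort band
    likely_dry = rain_probability < 0.30          # hypothetical rain cutoff
    return "Yes, go outside" if comfortable and likely_dry else "No, stay inside"

print(eat_outside(72.0, 0.10))  # pleasant and dry -> "Yes, go outside"
print(eat_outside(72.0, 0.60))  # likely rain      -> "No, stay inside"
```

In practice the mapping from features to decision would itself be learned, but the target it predicts should be chosen this way: as close to the decision as measurement allows.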

After we have built a great predictive model, the last step is figuring out how to operationalize the knowledge we gained. This is where the data science stops and the traditional engineering, or Big Data engineering, starts. No matter how great our product recommendations are, they are useless if we do not share them with the customer in a timely manner. In large enterprises, operationalizing insights often requires complex coordination across teams and business units, a problem as hard as the data science itself. Keeping this operation in mind from the start of the project will ensure the data science has business value.

Michael Selik is a data scientist at Infochimps. Over his career, he has worked for major enterprises and venture-backed startups, delivering sophisticated analysis and technology project management services, from hyperlocal demographics inference to market share forecasting. With Infochimps, Michael helps organizations deploy fast, scalable data services. He received an MS in Economics, a BS in Computer Science, and a BS in International Affairs from the Georgia Institute of Technology; he likes bicycles and semicolons.

