Overcoming the Data Scientist Shortage

Data scientists who can make business decisions are certainly not a dime a dozen. Today’s data professionals are tasked with driving bottom-line success for their companies by turning customer and market insights into actionable business decisions. It takes more than a number cruncher to do that. It requires business acumen: an ability to make sense of massive volumes of data coming from various silos.

Now consider just how much data is at the fingertips of companies today. According to IBM, 90 percent of the world’s data was created in the last two years. With so much new data, there is huge potential for companies in many industries to dig through it and extract insights. The problem is that this has created heavy demand for data scientists, a role that universities haven’t traditionally built curricula around and that companies haven’t historically recruited for heavily. Needless to say, the pool of candidates is small.

In the video interview below, Michael Koploy, who researches business intelligence software solutions at Software Advice, talks with icrunchdata Co-Founder Todd Nevins about the increasing demand for Big Data jobs. They cover which specializations in the Big Data field, from data science to market analytics, are most sought-after, and how companies are working around the shortage of data science candidates to acquire top talent.

[Next Week's Webinar] Faster Insights: A Framework for Agile Big Data

Getting to Insights Faster: A Framework for Agile Big Data

Thursday, November 21 @ 10a PT/12p CT/1p ET

The technology world is rapidly changing. It is no longer reasonable for companies to wait two years to see value from important data and insight initiatives. To compete in today’s markets, insights must be available in real time. That calls for a new approach: agile, iterative development that can deliver successful insights in as little as 30 days.

Register for this live webcast and join Infochimps Director of Product Tim Gasper as he discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days, sometimes in just hours. Tim will also explain how Infochimps is now taking that same aggressive approach to deliver faster time to value, helping customers develop analytic applications with impeccable speed. During this webinar, we will discuss:

  • How agile Big Data application development differs from traditional development approaches
  • What our agile delivery framework looks like for planning Big Data projects and architecting customer solutions
  • What App Reference Designs are, and how they accelerate customer use cases
  • Real-life case studies of business problems that have benefited from our agile approach
  • A technology deep dive into a customer example

This webinar will be recorded, and the recording will be emailed to all registrants after the event.

Who Should Attend?
This webcast is ideal for Enterprise Executives, Line of Business Executives, Technology Executives, Enterprise Architects, IT Project Managers, Application Developers and IT Professionals with expected or current Big Data projects at any stage.

What’s Next After #StrataConf?

Did you know Infochimps Cloud delivers Big Data systems with unprecedented speed, simplicity, scale and flexibility to enterprise companies? If you came to the Strata + Hadoop World conference in New York last week, hopefully you stopped by our booth and walked away with this message, or at the very least a t-shirt.

If you didn’t make it out to #StrataConf, here are a few things you may have missed:

  • App Reference Designs Announcement: Check out the press release>>
  • Jim Kaskade’s Passionate Keynote: Check out the video>>
    • Watch as Jim highlights a new revolution of analytic applications with some touching examples in the healthcare industry with cancer research and medication therapy management.
  • We’re Hiring: Check out the new job openings>>
    • Join the troop! Our exciting startup environment, part of CSC’s Big Data and Analytics group, is growing rapidly. We’re seeking top talent who love solving the world’s hardest Big Data problems, and we offer flexible hours and competitive benefits. Join our team of gentle geniuses who believe we can change the world through data-driven decisions.

Check out the newest press:

Nothing so Practical as a Good Theory

The most common error I have encountered among new data science practitioners is forgetting that the goal is not simply knowledge, but actionable insight. This isn’t limited to data scientists. Many analysts get carried away with the wrong metrics, tracking what is easy to measure rather than what is correct to measure. New data scientists get carried away with the latest statistical method or machine learning algorithm, because that’s much more fun than acknowledging that key data are missing.

To create actionable insight, we must start from the action, a choice. Data science is useless if it is not used to make decisions. When starting a project, I first ask how we will measure our progress towards our goals. As my colleague Morgan said last week, this often boils down to revenue, cost, and risk. An economist might bundle that up as time-discounted, risk-adjusted future profits. My second task is identifying what decisions we will make in the process of accomplishing these goals.
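
One common textbook way to write that bundle down (our illustration, not necessarily the author's own formula) is as the present value of expected profits, with risk folded into the discount rate. The symbols below are assumptions introduced for this sketch: R_t and C_t are revenue and cost in period t, r_f is a risk-free discount rate, and ρ is a risk premium.

```latex
V \;=\; \sum_{t=0}^{T} \frac{\mathbb{E}\left[R_t - C_t\right]}{\left(1 + r_f + \rho\right)^{t}}
```

In this simple formulation, raising expected revenue or lowering expected cost increases the numerator, and reducing risk allows a smaller premium ρ, so each of the three goals raises V.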

The choices we make might be between different types of actions or might be between different intensities of an action: which advertising campaign, how much to spend, etc. These choices usually benefit from information. Some choices, such as selecting “red” or “black” at the roulette table, do not benefit from information. The outcome of most choices is partially dependent on information. Knowledge gives us power, but there is some randomness too. We might have hundreds of observations of every American’s response to our spokesperson’s call to action, but the predictive model we generate from that data might not help us after the spokesperson’s embarrassing incident at the golf course. The business case for data science is the estimation of how much information we can gain from our data and how much that information will improve the time-discounted, risk-adjusted benefit of our decisions.
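
One standard way to put a number on how much information could improve a decision is the expected value of perfect information (EVPI) from decision theory; that framing is our addition, not necessarily the author's method. The minimal Python sketch below assumes a hypothetical choice between two advertising campaigns, two possible market states, and made-up priors and payoffs. EVPI is the gap between the best expected payoff when you must decide blind and the expected payoff if you could learn the state first, so it is an upper bound on what any data about that state is worth for this one decision.

```python
# Hypothetical example: all states, probabilities and payoffs are invented for illustration.

# Prior probabilities of two possible market states.
priors = {"responsive": 0.4, "unresponsive": 0.6}

# Payoff (e.g. profit in dollars) of each campaign in each state.
payoffs = {
    "campaign_a": {"responsive": 100_000, "unresponsive": -20_000},
    "campaign_b": {"responsive": 30_000, "unresponsive": 10_000},
}


def expected_payoff(action: str) -> float:
    """Expected payoff of an action under the priors, with no extra information."""
    return sum(p * payoffs[action][state] for state, p in priors.items())


def best_action_without_information() -> str:
    """The action we would pick if we had to decide before learning the state."""
    return max(payoffs, key=expected_payoff)


def value_without_information() -> float:
    """Best expected payoff when deciding blind."""
    return expected_payoff(best_action_without_information())


def value_with_perfect_information() -> float:
    """Expected payoff if we could learn the state first, then pick the best action for it."""
    return sum(
        p * max(payoffs[action][state] for action in payoffs)
        for state, p in priors.items()
    )


def expected_value_of_perfect_information() -> float:
    """Upper bound on what any data or model about the state is worth for this decision."""
    return value_with_perfect_information() - value_without_information()
```

With these made-up numbers, campaign A is the better blind choice (expected payoff 28,000), perfect foresight would be worth 46,000, and the EVPI is therefore 18,000. Real data never gives perfect information, so a realistic project is worth less than that bound; the expected value of sample information is the usual refinement.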

The third task is picking what metrics to use. A management consultant might call this developing key performance indicators. A statistician might call this variable selection. A machine learning practitioner might call this feature engineering. We transform, combine, filter, and aggregate our data in clever and complex ways. Most critical is picking a good dependent variable, or explained variable. This is the metric you are predicting. This will be the distillation of all our knowledge to a single number.
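
To make "transform, combine, filter, and aggregate" concrete, here is a minimal Python sketch that turns hypothetical raw purchase records into per-customer features. The record layout, field names, and feature choices are all assumptions for illustration, not a prescribed pipeline.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Purchase(NamedTuple):
    """One hypothetical raw purchase record."""
    customer_id: str
    day: date
    amount_usd: float
    refunded: bool


def build_features(rows: list[Purchase], as_of: date) -> dict[str, dict[str, float]]:
    """Turn raw purchase rows into one feature dictionary per customer."""
    # Filter: drop refunded purchases, which do not reflect real spend.
    kept = [r for r in rows if not r.refunded]

    # Aggregate: group the remaining purchases by customer.
    grouped: dict[str, list[Purchase]] = defaultdict(list)
    for r in kept:
        grouped[r.customer_id].append(r)

    features = {}
    for customer, purchases in grouped.items():
        total = sum(p.amount_usd for p in purchases)
        last_day = max(p.day for p in purchases)
        features[customer] = {
            "total_spend": total,
            "n_purchases": len(purchases),
            # Combine: a ratio of two aggregates.
            "avg_order_value": total / len(purchases),
            # Transform: a raw date becomes "days since last purchase".
            "days_since_last": (as_of - last_day).days,
        }
    return features


transactions = [
    Purchase("c1", date(2013, 9, 1), 120.0, False),
    Purchase("c1", date(2013, 9, 15), 80.0, False),
    Purchase("c1", date(2013, 10, 2), 50.0, True),
    Purchase("c2", date(2013, 10, 5), 300.0, False),
]
features = build_features(transactions, as_of=date(2013, 10, 31))
```

For customer c1 the refunded October order is filtered out, leaving two purchases totalling 200.0, an average order value of 100.0, and 46 days since the last purchase as of 2013-10-31. Note that the filter changes the recency feature too, which is why the order of these steps matters.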

To pick a good dependent variable, a data scientist must consider the quality of the data available and what predictions they might support, but more importantly, the data scientist must consider the decision our prediction is meant to improve. When choosing whether to eat outside for lunch, we prefer to know the temperature at noon rather than the average temperature for the day. More important would be the chance of rain. The exact temperature to the fraction of a degree is unnecessary. Best of all would be a direct estimate of lunchtime happiness for outside versus inside on a scale of “Yes, go outside” or “No, stay inside.” Unfortunately, we often cannot pick the most directly representative variable, because it is too difficult to measure. Lunchtime surveys would be expensive to conduct and self-reported happiness might be unreliable. A good dependent variable balances predictive power with decision relevance.
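
A minimal Python sketch of why the chance of rain is a decision-relevant target: if we assume hypothetical happiness scores for each combination of choice and weather, a predicted rain probability maps directly to the “Yes, go outside” / “No, stay inside” answer through expected utility. The utility numbers are invented for illustration.

```python
# Hypothetical happiness scores on an arbitrary scale, one per (choice, weather) pair.
UTILITY = {
    ("outside", "dry"): 10.0,
    ("outside", "rain"): -5.0,
    ("inside", "dry"): 4.0,
    ("inside", "rain"): 4.0,
}


def expected_utility(choice: str, p_rain: float) -> float:
    """Probability-weighted happiness of a choice, given a predicted chance of rain."""
    return p_rain * UTILITY[(choice, "rain")] + (1 - p_rain) * UTILITY[(choice, "dry")]


def lunch_decision(p_rain: float) -> str:
    """Map a rain forecast straight to the decision the prediction exists to improve."""
    if expected_utility("outside", p_rain) > expected_utility("inside", p_rain):
        return "Yes, go outside"
    return "No, stay inside"


def break_even_rain_probability() -> float:
    """Chance of rain at which both choices are equally good under these utilities."""
    gain_if_dry = UTILITY[("outside", "dry")] - UTILITY[("inside", "dry")]
    loss_if_rain = UTILITY[("inside", "rain")] - UTILITY[("outside", "rain")]
    return gain_if_dry / (gain_if_dry + loss_if_rain)
```

With these numbers the break-even chance of rain is 0.4: below it, go outside; above it, stay in. In this toy model the temperature does not enter at all, so predicting it to a fraction of a degree would not change any decision.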

After we have built a great predictive model, the last step is figuring out how to operationalize the knowledge we gained. This is where the data science stops and the traditional engineering, or big data engineering, starts. No matter how great our product recommendations are, they are useless if we do not share those recommendations with the customer in a timely manner. In large enterprises, operationalizing insights often requires complex coordination across teams and business units, a problem as hard as the data science itself. Keeping this operational step in mind from the start of the project helps ensure the data science has business value.

Michael Selik is a data scientist at Infochimps. Over his career, he has worked for major enterprises and venture-backed startups, delivering sophisticated analysis and technology project management services ranging from hyperlocal demographic inference to market share forecasting. With Infochimps, Michael helps organizations deploy fast, scalable data services. He received an MS in Economics, a BS in Computer Science, and a BS in International Affairs from the Georgia Institute of Technology; he likes bicycles and semi-colons.

Image Source: blog.cmbinfo.com

Announcing Application Reference Designs

Today at the Strata NY + Hadoop World Conference, we announced a key new component of our business analytics offerings, one that empowers enterprises with agile development and rapid deployment of scalable Big Data applications.

Designed with expertise gained from working with our customers in ad tech, manufacturing, healthcare, and financial services, and on use cases involving social media and customer service, these pre-packaged frameworks for developing Big Data applications let businesses quickly execute targeted, agile analytics strategies tailored to the individual needs of each organization.

CxOs cannot afford to wait 24 months for their Big Data applications to launch before they start making mission-critical course corrections to their business. Our customers need to deliver value years ahead of their competition.

Today I’m pleased to announce the launch of a disruptive suite of Application Reference Designs, fueling a new era of analytic application development.

Read the Full Press Release Here >

Request a Demo >

Data Science and the Personal Optimization Problem

“What gets measured gets done” is a common refrain. And, to a large extent, that is how the business world works. As Data Scientists, we have an outsized influence on what gets measured (and, by extension, what gets done) in a business. This is especially true with the advent of predictive analytics. We have a lot of responsibility, and we need to use it wisely.

Data Scientists need to be proactive to ensure that what we model, predict, and measure provides quantifiable value for our organization. But how can we do this, realistically? After all, the numbers are the numbers; we are just drawing conclusions from them. Right? The truth is that two Data Scientists can develop models with the same tools against the same data, and one analysis can be significantly more valuable to the people paying the bills. It is our own personal optimization problem.

A salesperson usually has a number of accounts that bring in revenue. A typical consultant has one or more projects to bill hours to. However, if you are in R&D or on staff in a support role, how can you ensure that your data science is valuable to your organization?

As a Data Scientist, the best barometer for the business value of your work is how well it:

  1. Generates Revenue
  2. Reduces Cost
  3. Eliminates Risk

That sounds great, but how does a Data Scientist know that what they are working on is valuable? This can be especially hard to judge if you are working in a supporting role or in a shared service environment, such as a centralized data science team in a large organization. My colleagues and I have had long discussions on this subject, and there seems to be little consensus on how to do this effectively.

However, I have one sure-fire way to make sure that your data science is as valuable to your organization as you are.

Personal Optimization for Data Scientists

For every project that you work on, imagine that your part is going to be used as an entry on your resume in a section marked “Major Accomplishments” (there are lots of resume guides available that talk about how to do this).  Now, think about a hiring manager who is looking at your resume; not some bozo or corporate drone who is just there to fill bodies. Imagine a shark, someone who knows the industry inside and out and wants only to hire the best; someone who knows the data and the math and can sniff out a phony a mile away.

The hiring manager is going to grill you for detailed answers about your major accomplishments. They want to know what you know and how you learned it. They want to know what went well and what didn’t. They want to know if you can do the same (or better) work for them. They want to make sure that you know the theory and the application, and can deliver the goods in a timely manner. This is the definition of the bottom line.

Can you comfortably sit down in front of this person and talk about your major accomplishments?  Is your data science adding to your list of accomplishments?

Making Data Science Count

Data science has some really fantastic tools such as machine learning, data mining, statistics, and predictive modeling.  They are only going to get better in the future. However, we have to remember that these are just tools at our disposal.  Having skilled craftsmen using the best tools is key, but the most important thing we can do is to make sure that we are building the right things.

One of the things I like best about the Infochimps Cloud is that it takes care of all the infrastructure and architecture work in building a Big Data solution, and lets me focus on really figuring out how to make a valuable solution.  I don’t have to worry about building a Hadoop cluster for batch analytics, or stitching together Storm and Elasticsearch and Kibana to deliver real-time visualizations.  I also don’t have to worry about scaling things up if and when my data volume goes through the roof.

When I build with Infochimps, I know that my effort is being harnessed to build out major accomplishments; not to build sandboxes or dither with infrastructure issues. If you would like to learn more about Infochimps and the value of real-time data science, come by and see us at Strata in New York on October 28-30.  See you there!

Morgan Goeller is a Data Scientist at Infochimps, a CSC company. He is a longtime numbers guy with a B.S. in Mathematics and background in Hadoop, ETL, and Data Warehousing. Morgan lives in Austin, Texas with his wife, sons, and many cats and dogs.

Photo credit: kdnuggets.com

Can Big Data Save Them?

Early next week, from Oct. 28-30 in New York, NY, Infochimps will be going big at Strata + Hadoop World, alongside thousands of the best minds in data who are gathering to learn, connect, share knowledge, and explore.

Strata + Hadoop World is one of the largest gatherings of the Apache Hadoop community in the world, with emphasis on hands-on and business sessions on the Hadoop ecosystem. If you want to tap into the opportunities brought by Big Data, data science, and pervasive computing, you’ll want to be there.

Easily the biggest show of the year for us, we’re looking forward to:

  • Infochimps CEO Jim Kaskade keynoting Wed, Oct. 30 at 9:50am EDT in the Grand Ballroom about Cancer and Big Data:
    • Can Big Data Save Them? Data and analytics is a means to an end. Jim highlights a new revolution of analytic applications with some touching examples in the healthcare industry with cancer research and medication therapy management.
  • Giving you our famous Infochimps t-shirt at Booth #38. Meet a bunch of eager chimps ready to talk about Big Data. Key exhibiting team members include our VP of Sales Burke Kaltenberger, Director of Marketing Amanda McGuckin Hager, Director of Product Tim Gasper, Director of Sales Strategy and Operations Ryan Miller, Demand Gen Manager Caroline Lim, Sales Engineer Morgan Goeller, and our VP of Business Development.
  • Meetings with you! Set up a meeting with us by emailing ryan@infochimps.com.

More Complex in Asia: Mapping the Most Visited Website by Country

Spending so much time in the online world, I was drawn to this FlowingData article about the most visited website in each country. Mark Graham and Stefano De Sabbata from Information Geographies mapped the most visited site in each country based on Alexa data, with countries sized by Internet population.

Looking at the graphic that accompanies the post, my attention went not to the red and blue (the obvious Google and Facebook takeovers in the Americas and Europe), but to the massive, screaming green.

[Map: most visited website by country, with countries sized by Internet population]

Mark Graham and Stefano De Sabbata’s findings suggest “the situation is more complex in Asia, as local competitors have been able to resist the two large American empires. Baidu is well known as the most used search engine in China, which is currently home to the world’s largest Internet population at over half a billion users. At the same time, we see a puzzling fact that Baidu is also listed as the most visited website in South Korea (ahead of the popular South Korean search engine, Naver). We speculate that the raw data that we are using here are skewed. However, we may also be seeing the Baidu empire in the process of expanding beyond its traditional home territory. The remaining territories that have escaped being subsumed into the two big empires include Yahoo! Japan in Japan (in joint venture with SoftBank) and Yahoo! in Taiwan (after the acquisition of Wretch). The Al-Watan Voice newspaper is the most visited website in the Palestinian Territories, the e-mail service Mail.ru is the most visited in Kazakhstan, the social network VK the most visited in Belarus, and the search engine Yandex the most visited in Russia.”

Thank you FlowingData for providing interesting findings for us data nerds.


Democratizing Big Data: We Get It

We’ve been there.

Maybe you’re an enterprise with huge data sets, competing in a saturated market like telecommunications, healthcare or financial services.

Or maybe you’re a startup that has lots of data but not the manpower to handle the data.

Or maybe you’re a retailer moving from multichannel to omnichannel, but you’re struggling to synthesize data from disparate sources, like legacy point-of-sale systems and Foursquare check-in data.

Or maybe you’re something else entirely, for whom the promise of Big Data seems like a pipe dream because:

  • The infrastructure hurdle is a towering one:
    • How do you acquire, store and manage all that data?
    • How do you integrate tools like Hadoop, Storm, Kafka, NoSQL, and others in ways that produce transformational insights for your business?
    • How do you plan a technology stack with the elasticity to scale for use cases known and unknown?
  • You’re understaffed for the project you’re considering, and you can’t afford to poach experts from Big Data pedigree houses like Google, LinkedIn, Twitter, etc.
  • The clock is ticking. No one is giving you a year to go on a Big Data fishing expedition. You need quick initial results and even quicker iterations.

Ok, deep breaths. We get it. We’ve been there.

Infochimps’ growth from data marketplace to Platform as a Service (PaaS) makes us a highly evolved group of chimps: The kind who can take any kind of data, and do any kind of analytics with it, in any type of cloud. We’ve worked with every kind of database. We can produce batch, streaming and ad hoc analytics. And we can deploy from public, private and hybrid clouds.

And we do it quickly. While typical Big Data projects take over a year to yield results, we can have your first use case in production in 90 days, and complete subsequent projects in weeks.

Our approach to Big Data is built on Infochimps™ Cloud for Big Data: three essential cloud services that unleash the full analytic capabilities needed to solve any enterprise Big Data problem. Infochimps Cloud expedites and simplifies development and deployment of Big Data applications.

So if you’re late to the Big Data game or you’ve been beaten in it before, let’s talk. Infochimps can save your organization hardware and hiring costs, while accelerating results – enabling you to unlock insights that can transform your business.

‘Tis the Season…for Events

It’s that time of year again…Big Data events season. After hacking away at our events calendar, we decided to make an appearance at the following events:

Data Curiosity RoundTable

  • When: September 25, 2013
  • Where: Austin, Texas
  • What: This is an open forum with the goal of sharing and trading ideas as we all “noodle” through the byproduct of intensive Big Data harnessing
  • Who: Presented by Dell, and Infochimps Sales Engineer Morgan Goeller will be attending this event
  • Why: A roundtable discussion with free pizza

Splunk Worldwide Users’ Conference

  • When: September 30 – October 3, 2013
  • Where: Las Vegas, NV
  • What: The 4th Annual Splunk Worldwide Users’ Conference
  • Who: Presented by Splunk, and Infochimps CEO Jim Kaskade will contribute to the CxO Big Data panel on Tuesday, October 1
  • Why: Deepen your knowledge of Splunk, learn best practices, check out new solutions, see how others apply Splunk technology to real-world projects, and become more involved in the Splunk community

GigaOM Mobilize 2013

  • When: October 16 – 17, 2013
  • Where: San Francisco, CA
  • What: A 2-day conference to examine what opportunities are created by new mobile technologies and business models
  • Who: Presented by GigaOM
  • Why: This conference helps attendees make sense of where developments are headed and how they’ll affect everything from applications to infrastructure choices

The Big Data Conference

  • When: October 22 – 23, 2013
  • Where: Chicago, IL
  • What: A 2-day conference of comprehensive content for every IT, marketing or digital professional seeking to capitalize on the boom in data volume, variety and velocity
  • Who: Presented by UBM
  • Why: This conference fills a gap in the US market for businesses looking to capitalize on the Big Data opportunity and find efficient solutions to manage the ever-growing amount of structured and unstructured data

…and last but not least…our biggest conference of the year: Strata Conference + Hadoop World

  • When: October 28 – 30, 2013
  • Where: New York, NY
  • What: 3 days of inspiring keynotes and intensely practical, information-rich sessions exploring the latest advances, case studies, and best practices
  • Who: Co-presented by O’Reilly and Cloudera, and Infochimps CEO Jim Kaskade (http://www.twitter.com/jimkaskade) will be keynoting Wed, Oct. 30 at 9:50am EDT in the Grand Ballroom
  • Why: Strata + Hadoop World is where Big Data’s most influential decision makers, architects, developers, and analysts gather to shape the future of their businesses and technologies

If you’re going to any of the above events and want to set up a meeting with a chimp to talk about Big Data, we’d love to chat.
