Data Science and the Personal Optimization Problem

“What gets measured gets done” is a common refrain.  And, to a large extent, that is how the business world works.  As Data Scientists, we have an outsized influence on what gets measured (and by extension, what gets done) in a business.  This is especially true with the advent of predictive analytics.  We have a lot of responsibility, and we need to use it wisely.

Data Scientists need to be proactive to ensure that what we model, predict, and measure provides quantifiable value for our organization.  But how can we do this, realistically?  After all, the numbers are the numbers; we are just drawing conclusions from them.  Right?  The truth is that two Data Scientists can develop models with the same tools against the same data, and one analysis can still be significantly more valuable to the people paying the bills.  It is our own personal optimization problem.

A salesperson usually has a number of accounts that bring in revenue.  A typical consultant has one or more projects that they can bill hours to.  However, if you are in R&D or on staff in a support role, how can you ensure that your data science is valuable to your organization?

As a Data Scientist, the best barometer for the business value of your work is how well it:

  1. Generates Revenue
  2. Reduces Cost
  3. Eliminates Risk

That sounds great, but how does a Data Scientist know that what they are working on is valuable?  This can be especially hard to figure out if you are working in a supporting role or in a shared service environment, such as a centralized data science team in a large organization.  My colleagues and I have had long discussions on this subject, and it seems that there is little consensus on how to do this effectively.

However, I have one sure-fire way to make sure that your data science is as valuable to your organization as you are.

Personal Optimization for Data Scientists

For every project that you work on, imagine that your part is going to be used as an entry on your resume in a section marked “Major Accomplishments” (there are lots of resume guides available that talk about how to do this).  Now, think about a hiring manager who is looking at your resume: not some bozo or corporate drone who is just there to fill seats.  Imagine a shark, someone who knows the industry inside and out and wants only to hire the best, someone who knows the data and the math and can sniff out a phony a mile away.

The hiring manager is going to grill you for detailed answers about your major accomplishments.  They want to know what you know and how you learned it.  They want to know what went well and what didn’t.  They want to know if you can do the same (or better) work for them.  They want to make sure that you know the theory and the application, and can deliver the goods in a timely manner.  This is the definition of the bottom line.

Can you comfortably sit down in front of this person and talk about your major accomplishments?  Is your data science adding to your list of accomplishments?

Making Data Science Count

Data science has some really fantastic tools, such as machine learning, data mining, statistics, and predictive modeling.  They are only going to get better in the future.  However, we have to remember that these are just tools at our disposal.  Having skilled craftsmen using the best tools is key, but the most important thing we can do is to make sure that we are building the right things.

One of the things I like best about the Infochimps Cloud is that it takes care of all the infrastructure and architecture work in building a Big Data solution, and lets me focus on really figuring out how to make a valuable solution.  I don’t have to worry about building a Hadoop cluster for batch analytics, or stitching together Storm and Elasticsearch and Kibana to deliver real-time visualizations.  I also don’t have to worry about scaling things up if and when my data volume goes through the roof.

When I build with Infochimps, I know that my effort is being harnessed to build out major accomplishments, not to build sandboxes or dither with infrastructure issues.  If you would like to learn more about Infochimps and the value of real-time data science, come by and see us at Strata in New York on October 28-30.  See you there!

Morgan Goeller is a Data Scientist at Infochimps, a CSC company.  He is a longtime numbers guy with a B.S. in Mathematics and a background in Hadoop, ETL, and Data Warehousing.  Morgan lives in Austin, Texas with his wife, sons, and many cats and dogs.

