Does the Big Data Solution Exist?

What is a Big Data solution and what does it take to make a project successful? Perform your own experiment by posing this question to technology companies in the Big Data space. Then pose the same question to the pure service providers that are focused on Big Data. Finally, pose the same question to a few customers. Here is what I have found:

Technology providers will talk in terms of their specific contribution to the solution. Let’s think of the architectural stack from the bottom up. In the simplest terms, the Big Data solution is enabled by the infrastructure, the platform on which the analytics are performed, data software (which includes everything from data ingestion to statistical analysis), the visualization of the data, and the applications that depend on this solution. No one vendor has all of these parts, and it is their sum that makes up the enabling technologies of “Big Data.”

Service providers will talk in terms of the business need to understand what value there is in the data (e.g., use case discoveries, data science engagements, proof-of-value offerings, implementation assistance, and application development).

Customers interested in Big Data are looking to simplify things to get to the incremental and previously unattainable insights that are the promise of Big Data. That journey, however, is a very complex one and one that is not without risk. The customer answer depends on whom you ask. Ask IT and they may talk technology and the partners they prefer. Ask the application team or analytics team and your answers will straddle both the business value discussions and the technology needed to get to those answers. Lastly, the more progressive line-of-business decision makers aren’t interested in the complexities that make up a Big Data solution, but they are interested in the game-changing insight that will allow them to create new service offerings or help make the business more efficient as a result of the analytics being performed.

Is it now time to say that all of these answers combined are what make up a Big Data solution? Not quite. Compliance and security are considerations businesses must address. Add to this the deployment options, which include on-premise bare metal, on-premise private cloud, a private secured cloud, a hybrid approach with both data center and cloud resources available, and finally public options like Amazon, Google, AT&T, and others. On top of that, the talent needed to do all of this in-house isn’t readily available to customers of all sizes.

The war to win in the Big Data space is being waged and customers are in the middle of it. Continuing the analogy further, customers would like to sit the war out and have the Big Data solution provided to them, removing the confusion, complexity and concern.

Now ask yourself the question, “What is a Big Data solution and what does it take to make your project successful?” The answer is easier than you think. Ask yourself who has the technology expertise, the services capabilities, the customer proof points, flexibility in deployment, and the option to provide all of this as a managed service so that you pay for just what you use. Those who provide “The Big Data Solution” exist. You just need to ask the right questions and look in the right places for those answers.

Alan Geary, VP of business development at Infochimps, a CSC Big Data Business, has focused on business and channel development at software and technology companies that have grown through partnering. Alan has a unique combination of Big Data and Cloud experience by working over the last decade at both a Hadoop distribution company and VMware. Both companies doubled revenue year over year with the partnerships playing a significant role in the adoption of both Hadoop and virtualization respectively.

Strata Santa Clara Recap + What Comes Next

Another year, another great Strata! Team Infochimps is back in Austin, Texas, but hopefully you stopped by and grabbed a t-shirt at our booth before you left. We saw a few familiar faces and made a lot of new friends — but in case you weren’t at Strata, here’s what you may have missed…

We got a lot of questions about our workshop, where you can meet with our experienced data scientists to learn key Big Data concepts and best practices. We’re offering personalized recommendations for each workshopper and sharing the secrets of success we’ve seen from other businesses. If you missed us at the booth or haven’t heard about it yet, you can learn more and request a workshop here too.

Many of you asked about our second annual “CIOs & Big Data: What Your IT Team Wants You to Know” report. Last year’s report showed some interesting takeaways: 55 percent of Big Data projects aren’t completed, with 58 percent citing “inaccurate scope” as the reason for failure. We launched the questionnaire again this year and would love to hear your thoughts on what you want your CIO to know about Big Data. It only takes 10 minutes, and you can win cool prizes like an Amazon gift card.

Many of you were interested in the data sheets we had at the booth too. If you missed them, you can still check out our adoption lifecycle data sheet and our solution overview online.

On a fun note, we hosted a Big Data Mixer, co-sponsored by Silicon Valley Data Science and Pacific Crest Securities, and we had a great time picking the brains of other industry pioneers. More than 120 of the best and brightest in Big Data joined in for food, drinks, and good conversation. We took a lot of photos that night (see a few teaser photos below), and you can see the entire album (around 500 total) here.

[Photos from the Big Data Mixer]

Overall we had a blast at this year’s Strata Santa Clara! We hope you had a great time at Strata too, and we can’t wait to see you at the next one in New York. Until then, I’ll leave you with another awesome photo from our mixer featuring a couple of our chimps, Tim Gasper and Cameron Peek. Oh and if you can think of a good caption, I’m all ears.

[Photo from the mixer: Tim Gasper and Cameron Peek]

Live at Strata: Announcing a Workshop with our Big Data Experts

And we’re live at the 2014 O’Reilly Strata Conference! For the next three days, we’ll be joining the most brilliant minds in the Data and Analytics space to discuss the latest (and emerging) tools, technologies, trends and best practices. This year at Strata, Infochimps CEO Jim Kaskade will describe the state of Big Data from the perspective of our company’s work with some of the world’s top companies. He’ll provide a vision of what’s in store for the business landscape in 2014 and share some surprising trends in the world of data-driven decisions. Learn more about what Jim and the rest of our team are up to at the conference here.

February Strata season always gets us excited, but this year we’re thrilled to present a specialized workshop with our leading Big Data Experts. With individualized attention to your business, our experienced team will help you apply key Big Data concepts and teachings to your own business problems and opportunities. If you’re interested in getting personalized recommendations, you can request a workshop here or ask any of the chimps at booth #740 for more info.

On that note, we can’t wait to chat with our peers here in Santa Clara, so be sure to stop by and say hello to us at booth #740 (we’ll be handing out awesome t-shirts too—seriously, take a look). See you out on the floor!

Data Science: State of the Industry

O’Reilly has released their 2013 Data Science Salary Survey, and it’s a treasure trove of interesting information about the work of data science.

One of the most informative things I found was a breakdown of the data tools that were used most often by data scientists.

[Chart: data tools used most often by data scientists]

This confirms a lot of hunches about the state of the industry:

  • SQL is the mack daddy of data science. It is used literally twice as much as Hadoop.

  • Excel and R are the analysis tools of choice. Since both of these tools can do multiple things (analysis and visualization), it makes sense that these would be more popular than single-use tools.

  • Scripting is widespread and diverse. Python, R, JavaScript, and Ruby are the glue of data science, with an especially strong showing for Python.

The big surprise to me was the relative unpopularity of SAS/SPSS. I think this effect may be exaggerated by the nature of the survey population (it was limited to people attending the Strata conference). However, a 4x disparity between R and legacy vendors really highlights what I see as an accelerating trend toward open tools.

Another fascinating visualization was the breakdown of how different tools are used together by data scientists.

[Chart: how different tools are used together by data scientists]

In geek speak, this is a graph that describes the positive and negative correlations between tool usage. Visually, this separates into the traditional I/T world (in blue) and the new Hadoop world (in orange). “Visualization” might be a way to describe the red cluster, although Weka really breaks the mold.
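For readers who want to see what a "correlation between tool usage" means in practice, here is a minimal sketch. It is not the survey's method or data: the responses are made up purely for illustration, and the `pearson` helper and `usage` table are hypothetical names. Each tool gets a list of 0/1 answers (one per respondent), and a positive correlation means two tools tend to be used by the same people.

```python
from itertools import combinations
from math import sqrt

# Made-up responses for illustration only (NOT survey data).
# Each list has one entry per respondent: 1 = uses the tool, 0 = does not.
usage = {
    "SQL":    [1, 1, 1, 0, 1, 0],
    "Excel":  [1, 1, 0, 0, 1, 0],
    "Hadoop": [0, 0, 1, 1, 0, 1],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Correlation for every pair of tools; each pair would be one edge in the graph.
correlations = {
    (a, b): pearson(usage[a], usage[b]) for a, b in combinations(usage, 2)
}

for (a, b), r in correlations.items():
    kind = "positive" if r > 0 else "negative"
    print(f"{a} / {b}: {r:+.2f} ({kind})")
```

In this toy data, SQL and Excel come out positively correlated while Excel and Hadoop come out negatively correlated, which is the kind of split that would produce separate clusters in a graph like the one described above.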

What this tells me is that there is a definite geography to the work of data science. If traditional I/T is North America and Hadoop is South America, Tableau would be the Panama Canal, the conduit between the two continents. Also, this picture makes it easy to see why SQL is so popular. Like Starbucks, there’s at least one SQL-like tool in each of the clusters (Hive, MySQL, PostgreSQL, SQL, and SQL Server), with more on the way soon.

Looking at the big picture, this tells us three important things:

  1. Data science can come from anywhere. Innovation does not require the resources of the Fortune 500, nor the specialization of Silicon Valley. The work can leverage the strengths of either environment, and the best people can work anywhere.

  2. Virtually any company either already has or can inexpensively acquire the tools to do data science. If you can download RStudio and have a SQL database, you can start working like the pros.

  3. Data science isn’t thinking about real-time analytics, yet. Storm, Spark, and other tools are still cutting edge. Watch out for this in the 2014 survey.

Thanks, O’Reilly, for the insight into data science and data scientists!

Dhruv Bansal is the chief science officer and co-founder of Infochimps, a CSC Big Data Business. He holds a B.A. in math and physics from Columbia University in New York and attended graduate school for physics at The University of Texas at Austin. For more information, follow him on Twitter at @dhruvbansal.

Big Data and the Case for Optimism

Futurist Peter Diamandis gave an inspiring TED talk in 2012, making the case for optimism in our world: that we’ll harness technology and continue to invent, innovate and create ways to solve the challenges that loom over us. If you’re not familiar with the technological singularity (aka the singularity), it’s a theoretical moment in time when artificial intelligence will surpass the collective intelligence of the human species. Supposedly this will radically change human civilization, and “perhaps even human nature itself.”

To expand a little bit further on how the singularity might come about, take Moore’s Law into consideration. Moore’s Law is the observation that the number of transistors on integrated circuits doubles about every two years. In layman’s terms, technology is getting better and more powerful at a staggering exponential rate, which leads some people to believe there will be a period where progress in technology occurs almost instantly.
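To get a feel for what doubling every two years adds up to, here is a small sketch. It assumes an exact two-year doubling period (the observation itself only says "about"), and the function name `growth_factor` is just an illustrative choice.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger the transistor count is after `years`,
    assuming it doubles exactly once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

# Print the growth after 2, 10, and 20 years under that assumption.
for years in (2, 10, 20):
    print(f"after {years:>2} years: about {growth_factor(years):,.0f}x")
```

Under that assumption, ten years means five doublings (2^5 = 32 times as many transistors) and twenty years means ten doublings (2^10 = 1,024 times as many), which is why the curve feels like it is running away from us.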

How does all this factor into the future of Big Data and manufacturing? Technology is becoming cheaper and more accessible. Faster computers build faster computers, bringing in more data through the process. Basically, we’re not moving backwards. Manufacturers are realizing that making decisions on gut instincts simply won’t get the job done in the most efficient way possible. Because of technology, businesses can measure operations, interactions with customers, human resources, supply chain relationships, and more with far greater accuracy.

But what good is all this data without a way to harness it and use it to your advantage? Big Data is, by nature, just that: data too large to process through traditional methods. The fact remains that organizations that use Big Data to replace guesswork are those that become significantly more profitable than their competitors.

Stay tuned for part two of this post series, where we’ll go into a specific case study on how leveraging Big Data worked to a company’s advantage. (Sneak peek: How Big Data Transformed the Dairy Industry! Moo.)

Rhea Somaney is the community manager at Infochimps, a CSC Big Data Business, and the newest chimp to join the team. She has followed her passion for technology throughout her career, working previously for tech startups like BlackLocus and Main Street Hub. When she’s not working, you can probably find Rhea watching movies, exploring the Austin food scene, or trying to finish one of the many books on her To-Read list.

Movies + Charts = Nerdy Creativity

I love movies. I love charts. I have to say that FlowingData did it again – this is brilliant:

[Chart: AFI movie quotes in chart form]

In celebration of their 100-year anniversary, the American Film Institute selected the 100 most memorable quotes from American cinema. FlowingData took those quotes and created the 100 most memorable quotes in chart form.

See the chart in bigger detail here.

As always, thank you FlowingData for providing interesting posts for us data nerds.

Becoming a Believer in Artificial Intelligence

Big Think originally published this transcript of Eric Siegel’s own words. The article relates to his book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.

Why I Became a Believer in Artificial Intelligence

I’ve been asked periodically for a couple of decades whether I think artificial intelligence is possible.  And I taught the artificial intelligence course at Columbia University.  I’ve always been fascinated by the concept of intelligence.  It’s a subjective word.  I’ve always been very skeptical. And I am only now newly a believer.

Now this is subjective: my opinion is that IBM’s Watson computer is able to answer questions, and so, in my subjective view, that qualifies as intelligence.  I spent six years in graduate school working on two things.  One is machine learning and that’s the core to prediction – learning from data how to predict.  That’s also known as predictive modeling. And the other is natural language processing or computational linguistics.

Working with human language really ties into the way we think and what we’re capable of doing and that does turn out to be extremely hard for computers to do.  Now playing the TV quiz show Jeopardy means you’re answering questions – quiz show questions.  The questions on that game show are really complex grammatically.  And it turns out that in order to answer them Watson looks at huge amounts of text, for example, a snapshot of all the English speaking Wikipedia articles.  And it has to process text not only to look at the question it’s trying to answer but to retrieve the answers themselves.  Now at the core of this it turns out it’s using predictive modeling.  Now it’s not predicting the future but it’s predicting the answer to the question.

The core technology is the same.  In both cases it involves learning from examples.  In the case of Watson playing the TV show Jeopardy it takes hundreds of thousands of previous Jeopardy questions from the TV show having gone on for decades and learns from them.  And what it’s learning to do is predict whether this candidate answer to this question is likely to be the correct answer.  So it’s going to come up with a whole bunch of candidate answers, hundreds of candidate answers, for the one question at hand at any given point in time.  And then amongst all these candidate answers it’s going to score each one.  How likely is it to be the right answer?  And, of course, the one that gets the highest score as the highest vote of confidence – that’s ultimately the one answer it’s going to give.
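The score-and-pick step described above can be sketched in a few lines. This is a toy illustration and not how Watson is actually implemented: the feature names, weights, and candidate answers are all hypothetical, and in a real system the weights would be learned from past questions with known correct answers rather than written by hand.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Candidate:
    answer: str
    features: Dict[str, float]  # hypothetical evidence signals for this answer

# Hypothetical weights and bias, hand-picked for this illustration.
# A real system would learn them from examples of right and wrong answers.
WEIGHTS = {"keyword_overlap": 2.0, "type_match": 1.5, "source_agreement": 1.0}
BIAS = -2.5

def confidence(candidate: Candidate) -> float:
    """Score a candidate between 0 and 1 with a logistic model."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in candidate.features.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_answer(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest confidence score."""
    return max(candidates, key=confidence)

# Made-up candidate answers for a single question.
candidates = [
    Candidate("Chicago", {"keyword_overlap": 0.9, "type_match": 1.0, "source_agreement": 0.8}),
    Candidate("Toronto", {"keyword_overlap": 0.4, "type_match": 0.0, "source_agreement": 0.2}),
    Candidate("Illinois", {"keyword_overlap": 0.6, "type_match": 0.0, "source_agreement": 0.5}),
]

for c in sorted(candidates, key=confidence, reverse=True):
    print(f"{c.answer}: {confidence(c):.2f}")
print("Final answer:", best_answer(candidates).answer)
```

With these made-up numbers, "Chicago" gets the highest score and becomes the one answer given. The point is only the shape of the process: many candidates, one confidence score each, and the top score wins.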

Eric Siegel, Ph.D., founder of Predictive Analytics World and Text Analytics World, and Executive Editor of the Predictive Analytics Times, makes the how and why of predictive analytics understandable and captivating. In addition to being the author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Eric is a former Columbia University professor, and a renowned speaker, educator, and leader in the field. 

Strata Santa Clara – 25% Discount + Early Price Special

Last October, at Strata + Hadoop World, Infochimps announced Application Reference Designs and had a booth full of t-shirt-giving chimps. This February, at Strata Santa Clara, Infochimps will be back with the same enthusiastic team (and t-shirts, of course), eager to talk about Big Data.

The O’Reilly Strata Conference always delivers. They bring together the brightest minds in data science and Big Data: decision makers using data to drive business strategy, as well as practitioners who collect, analyze, and manipulate it. Join Infochimps as we mingle with over 150 of the leading data experts, network with our peers, and hear about the latest (and emerging) data tools, technologies, and best practices.

Infochimps will be exhibiting at booth #740, so be sure to stop by and grab our famous Infochimps t-shirt, chat with exhibiting team members, or set up a 1:1 meeting. We’d love to chat!

Not registered? Register today and save 25% with discount code: INCH1 on top of the early price special ending January 9th. Strata sells out every conference, so register today!

CSC 2014 Forecast: Big Data Trends (+ Bonus Predictions)

Big Data infrastructure was so last year. In 2013, many companies reported success in bringing together their legacy mainframe infrastructure with new Big Data infrastructure. In 2014, we’ll see companies shift their attention to putting that infrastructure investment to use.

See what Andy Walker, vice president and general manager of Big Data & Analytics at CSC, expects to see happening in enterprise Big Data in 2014. Watch the following video for 3 Big Data predictions for 2014:

For 3 bonus predictions, read the full article.

Jim Kaskade’s Big Data Top 10

What do you get when you combine Big Data technologies like Pig and Hive? A flying pig?

This opening question, asked by Infochimps CEO Jim Kaskade, sets the stage in his newest blog post, “Big Data Top Ten.” The following excerpt highlights his general prediction, which leads into his top 10 Big Data predictions for 2014.

Big Data Top Ten
— My general prediction is that Cloudera and Hortonworks are both aggressively moving to fulfilling a vision which looks a lot like Gartner’s “Logical Data Warehouse”….namely, “the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements.”
In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQLs, and Hadoop to create a design pattern which combined real-time, ad-hoc, and batch analytics. This concept of combining the best-in-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack will be replaced with a new (and open) one.

For Jim Kaskade’s Top 10 Predictions, read the full article here.
