Tell the White House What You Think About Their Use of Big Data

When you think of the US government and big data, it’s almost impossible not to think about the issue of privacy and the growing concerns surrounding it. In light of multiple controversies surrounding this topic including Edward Snowden, the NSA and general US spying, the government has (perhaps belatedly) opened conversations on the future of privacy — and you can join in.


Back in January the White House launched a 90-day comprehensive review on the use of big data and its impact on the future of privacy. Part of this review includes a short, public survey where you can voice your opinions on who you trust (or don’t) with your data. The survey asks a series of questions like how different types of data collection concern you (with options ranging from “not at all” to “very concerned”), and there’s a free-form section where you can share everything on your mind. Headed by President Obama’s counselor John Podesta, the review should offer insight and eventually lead to an action plan on how the government uses big data.

We know you have lots to say about this so go on, tell the prez what you think HERE. While you’re at it, let us know what you think about this initiative in the comments too — we’re curious!


Science Tells Us How to Have a Happy Relationship

The secret to a happy relationship has finally been cracked thanks to Happify and countless hours of scientific research. Now I know some of you have probably seen this infographic before (probably a couple months ago around Valentine’s Day), but I thought it was so good I had to share in case you missed it. Also who doesn’t want a few tips every now and then about how to dominate your romantic life?

The infographic includes some insightful pointers, but here’s a little tip of my own to help send you down the path to happiness with your partner: send it to your significant other for brownie points, because it’s going to start a dialogue. Not only that, but it’s going to start a positive dialogue (I’m 95% sure on that but don’t quote me), which just so happens to be the number one takeaway: have a good positive-to-negative interaction ratio. This is probably the most important message, and the rest is all about how to reach that ratio.

A good positive-to-negative interaction ratio comes in many shapes and sizes, but we’ll let the data speak for itself. Here’s how you and your partner can make it happen:

[Infographic from Happify on how to have a happy relationship]

25 Years of the World Wide Web

Anyone reading this blog post right now knows the significance of the World Wide Web. It’s an invention that has revolutionized our world and given rise to seemingly boundless creativity, innovation, collaboration and knowledge — but it hasn’t yet reached its full potential. Father of the Web, Tim Berners-Lee, named some of the key challenges we still face:

  • How do we connect the nearly two-thirds of the planet who can’t yet access the Web?
  • Who has the right to collect and use our personal data, for what purpose and under what rules?
  • How do we create a high-performance open architecture that will run on any device, rather than fall back into proprietary alternatives?

Berners-Lee and the Web Foundation are launching the net-neutrality Web We Want campaign to promote changes in public policy to make sure the web stays open, free and accessible. In his guest blog post for Google Berners-Lee writes, “On the 25th birthday of the web, I ask you to join in—to help us imagine and build the future standards for the web, and to press for every country to develop a digital bill of rights to advance a free and open web for everyone. Learn more at and speak up for the sort of web we really want with #web25.”


The Quantified Cow: The Internet of Things for the Dairy Industry

The future is here. We’ve all heard about the Internet of Things, another buzzword circulating in the tech community recently. Although technically in existence for more than two decades, the Internet of Things movement has gained momentum in the last few years, most notably stepping into a bigger spotlight with Google’s $3.2 billion purchase of Nest Labs, the home device company behind the best-selling Nest thermostat. By keeping track of manually entered temperature settings and surrounding environmental data like room humidity and lighting, Nest eventually collects enough data to learn the daily behavior and preferences of the residents in the home.

These ideas tie into the concept of the Quantified Self, the movement to incorporate technology into data acquisition on aspects of a person’s daily life. Things like daily food consumption, quality of surrounding air, blood oxygen levels, physical and mental performance, and even mood and arousal can be tracked, measured and analyzed—all in the name of improving daily functions and making better decisions (or maybe just nodding thoughtfully at the data instead).

Milking the Benefits in the Dairy Industry

So how do the Quantified Self and the Internet of Things fit with cows, pastures, farmers and milk? Three words: robotic milking machines. Dutch company Lely, self-proclaimed innovators in agriculture, created the Astronaut A4, a state-of-the-art “fully automated milk harvester.” Although the robotic milking machine will set you back about $200,000, the Lely Astronaut A4 collects a wide range of cow data to help dairy farmers make better decisions regarding milk production and herd management.


The A4 keeps track of each individual cow’s feeding and health history, preventing cows from sneaking back into the machine for more food if they return too soon after their last visit. The system tracks different variables on each cow as it’s being milked: its weight, milk production, time required for milking, amount of feed eaten, and how long the cow chews on its cud. If there’s a health issue with one of the herd, farmers can isolate the problem right away. The machine collects data on the milk itself too, checking the color, fat and protein content, temperature, somatic cell count and overall quality.
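To make the idea concrete, here’s a minimal Python sketch of what a per-visit record and the “no sneaking back for seconds” check might look like. This is purely an illustration and not Lely’s actual software: the `MilkingVisit` class, the `may_feed` function, the field names and units, and the four-hour threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the post gives no real threshold; 4 hours is a placeholder.
MIN_FEED_INTERVAL = timedelta(hours=4)


@dataclass
class MilkingVisit:
    """One visit by one cow; fields mirror the variables named in the post."""
    cow_id: str
    time: datetime
    weight_kg: float
    milk_litres: float
    milking_seconds: int
    feed_kg: float
    cud_chewing_minutes: float


def may_feed(last_visit: Optional[MilkingVisit], now: datetime) -> bool:
    """Allow feed if the cow has no earlier visit or enough time has passed."""
    if last_visit is None:
        return True
    return now - last_visit.time >= MIN_FEED_INTERVAL


# Hypothetical example values, not real herd data.
last = MilkingVisit("cow-17", datetime(2014, 3, 1, 6, 0), 650.0, 12.5, 420, 3.0, 45.0)
print(may_feed(last, datetime(2014, 3, 1, 7, 30)))   # 1.5 hours after the last visit
print(may_feed(last, datetime(2014, 3, 1, 11, 0)))   # 5 hours after the last visit
```

Keeping one record per visit, rather than one running total per cow, is what would let a farmer look back over a cow’s history when a health issue shows up.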

Equipped with access to more data, dairy farmers are able to gain greater insight into their industry and thus maximize outputs. All of this data has translated to better decision-making for the farmers, better quality control of milk production and generally happier cows, and who doesn’t want happy cows? Having a machine do the work allows farmers to focus their energy elsewhere too, freeing up time for really anything else. The trend is clear: as the technology continues to get better, I have a feeling we’ll be seeing a lot more quantifiable and actionable data. Call it the Quantified Self extended: the movement to incorporate technology into data acquisition on aspects of not only a person’s daily life, but a cow’s daily life as well.


Take Our CIO & Big Data Survey (and win $100 Amazon gift cards)

We learned a lot from our 2013 CIOs & Big Data report — for instance, that 96% of enterprises have Big Data in their top 10 priorities list, but 55% of Big Data projects aren’t completed. This year we’re interested in seeing if the stats have changed, and we want to hear from you.


Take this quick survey and tell us what you really want your execs to know when doing Big Data projects. It only takes 10 minutes, and you can win awesome prizes like $100 Amazon gift cards and a membership to, one of the largest community-driven sites focused on enterprise technology, IT education and professional growth.

Although the gift cards cap out at $100, we think letting your execs know how you truly feel is pretty priceless. Take the survey here.


Reinvention of Enterprise Analytics

Authors: Mark Lenke and Shawn Nelson

Breaking the Barriers of Time and Expense

There’s no doubt that companies have benefited tremendously from business intelligence (BI) applications. Enterprise business intelligence (EBI) has enabled companies to spot emerging trends, identify new markets, serve customers more effectively and improve operational efficiencies.

Recently though, EBI solutions have had a hard time adapting to the information explosion that companies have experienced. Attempting to stuff massive volumes of data into the structure required by traditional BI systems is inefficient, expensive and time consuming.


What’s more, systems have become more complex and difficult to use, limiting the types of insights that can be generated in a reasonable time frame. Big data solutions, which can efficiently handle large volumes of data, can also require people with a specific and hard-to-find skill set in order to get results. Typically, business process experts feel shut out from advances because new systems are too hard to use.

As a result, many companies have avoided implementing more advanced BI or next-generation big data solutions because there is a perception — real or imagined — that they take too much time and are too hard to use to justify the expense involved in implementation.

But avoiding the change means that companies are missing out on opportunities to gain new insights that can radically transform their business. The next generation of BI systems, commonly referred to as big data, offers a huge leap forward in capabilities and features. Big data and analytics can help companies ask sophisticated, forward-looking questions that make new connections between seemingly unrelated trends. Big data and analytics can power new types of applications that provide real-time feedback, putting insights directly into the hands of people who can use that information.

This paper examines advances in big data infrastructure and applications that can help companies overcome the challenges associated with bringing these new systems to life.


Read the full article “Reinvention of Enterprise Analytics” and gain access to the “Breaking the Barriers of Time and Expense” white paper on the CSC Big Data & Analytics blog.


Does the Big Data Solution Exist?

What is a Big Data solution and what does it take to make a project successful? Perform your own experiment by posing this question to technology companies in the Big Data space. Then pose the same question to the pure service providers that are focused on Big Data. Finally, pose the same question to a few customers. Here is what I have found:

Technology providers will talk in terms of their specific contribution to the solution. Let’s think of the architectural stack from the bottom up. In the simplest terms, the Big Data solution is enabled by the infrastructure, the platform for the analytics to be performed, data software (which includes everything from data ingestion to statistical analysis), the visualization of the data, and the applications that depend on this solution. It is the sum of these parts, which no single vendor has, that makes up the enabling technologies of “Big Data.”

Service providers will talk in terms of business needs to understand what value there is in the data (e.g., use case discoveries, the data science engagements, proof-of-value offerings, implementation assistance, and application development).

Customers interested in Big Data are looking to simplify things to get to the incremental and previously unattainable insights that are the promise of Big Data. That journey, however, is a very complex one and not without risk. The customer answer depends on who you ask. Ask IT and they may talk technology and the partners they prefer. Ask the application team or analytics team and your answers will straddle both the business value discussions and the technology needed to get to those answers. Lastly, the more progressive line-of-business decision makers aren’t interested in the complexities that make up a Big Data solution, but they are interested in the game-changing insight that will allow them to create new service offerings or help to make the business more efficient as a result of the analytics being performed.

Is it now time to say that all of these answers combined are what make up a Big Data solution? Not quite. Compliance and security are considerations businesses must address. Add to this the deployment options, which include on-premise bare metal, on-premise private cloud, a private secured cloud, a hybrid approach with both data center and cloud resources available, and finally public options like Amazon, Google, AT&T, and others. Not to mention, the talent needed to do this all in-house by customers of all sizes isn’t readily available.

The war to win in the Big Data space is being waged and customers are in the middle of it. Continuing the analogy further, customers would like to sit the war out and have the Big Data solution provided to them, removing the confusion, complexity and concern.

Now ask yourself the question, “What is a Big Data solution and what does it take to make your project successful?” Now the answer…it’s easier than you think. Ask yourself who has the technology expertise, services capabilities and customer proof points, provides flexibility in deployment, and has the option to provide all of this in a managed service so that you pay for just what you use. Those who provide “The Big Data Solution” exist. You just need to ask the right questions and look in the right places for those answers.

Alan Geary, VP of business development at Infochimps, a CSC Big Data Business, has focused on business and channel development at software and technology companies that have grown through partnering. Alan has a unique combination of Big Data and Cloud experience by working over the last decade at both a Hadoop distribution company and VMware. Both companies doubled revenue year over year with the partnerships playing a significant role in the adoption of both Hadoop and virtualization respectively.


Strata Santa Clara Recap + What Comes Next

Another year, another great Strata! Team Infochimps is back in Austin, Texas, but hopefully you stopped by and grabbed a t-shirt at our booth before you left. We saw a few familiar faces and made a lot of new friends — but in case you weren’t at Strata, here’s what you may have missed…

We got a lot of questions about our workshop, where you can meet with our experienced data scientists to learn key Big Data concepts and best practices. We’re offering personalized recommendations for each workshopper and sharing the secrets of success we’ve seen from other businesses. If you missed us at the booth or haven’t heard about it yet, you can learn more and request a workshop here too.

Many of you asked about our second annual “CIOs & Big Data: What Your IT Team Wants You to Know” report. Last year’s report showed some interesting takeaways, including that 55 percent of Big Data projects aren’t completed, with 58 percent citing “inaccurate scope” as the reason for failure. We launched the questionnaire again this year and would love to hear your thoughts on what you want your CIO to know about Big Data. It only takes 10 minutes, and you can win cool prizes like an Amazon gift card.

Many of you were interested in the data sheets we had at the booth too, but if you missed them you can still check out our adoption lifecycle data sheet and our solution overview online.

On a fun note, we hosted a Big Data Mixer, co-sponsored by Silicon Valley Data Science and Pacific Crest Securities, and we had a great time picking the brains of other industry pioneers. More than 120 of the best and brightest in Big Data joined in for food, drinks, and good conversation. We took a lot of photos that night (see a few teaser photos below), and you can see the entire album (around 500 total) here.

[Photos from the Big Data Mixer]

Overall we had a blast at this year’s Strata Santa Clara! We hope you had a great time at Strata too, and we can’t wait to see you at the next one in New York. Until then, I’ll leave you with another awesome photo from our mixer featuring a couple of our chimps, Tim Gasper and Cameron Peek. Oh and if you can think of a good caption, I’m all ears.

[Photo: Tim Gasper and Cameron Peek at the Big Data Mixer]


Live at Strata: Announcing a Workshop with our Big Data Experts

And we’re live at the 2014 O’Reilly Strata Conference! For the next three days, we’ll be joining the most brilliant minds in the Data and Analytics space to discuss the latest (and emerging) tools, technologies, trends and best practices. This year at Strata, Infochimps CEO Jim Kaskade will describe the state of Big Data from the perspective of our company’s work with some of the world’s top companies. He’ll provide a vision of what’s in store for the business landscape in 2014 and share some surprising trends in the world of data-driven decisions. Learn more about what Jim and the rest of our team are up to at the conference here.


February Strata season always gets us excited, but this year we’re thrilled to present a specialized workshop with our leading Big Data Experts. With individualized attention to your business, our experienced team will help you apply key Big Data concepts and teachings to your own business problems and opportunities. If you’re interested in getting personalized recommendations, you can request a workshop here or ask any of the chimps at booth #740 for more info.

On that note, we can’t wait to chat with our peers here in Santa Clara, so be sure to stop by and say hello to us at booth #740 (we’ll be handing out awesome t-shirts too—seriously, take a look). See you out on the floor!


Data Science: State of the Industry

O’Reilly has released their 2013 Data Science Salary Survey, and it’s a treasure trove of interesting information about the work of data science.

One of the most informative things I found was a breakdown of the data tools that were used most often by data scientists.

[Chart: the data tools used most often by data scientists, from O’Reilly’s 2013 Data Science Salary Survey]

This confirms a lot of hunches about the state of the industry:

  • SQL is the mack daddy of data science. It is used literally twice as much as Hadoop.

  • Excel and R are the analysis tools of choice. Since both of these tools can do multiple things (analysis and visualization), it makes sense that these would be more popular than single-use tools.

  • Scripting is widespread and diverse. Python, R, JavaScript, and Ruby are the glue of data science, with an especially strong showing for Python.

The big surprise to me was the relative unpopularity of SAS/SPSS. I think this effect may be exaggerated by the nature of the survey population (it was limited to people attending the Strata conference). However, a 4x disparity between R and legacy vendors really highlights what I see as an accelerating trend towards open tools.

Another fascinating visualization was the breakdown of how different tools are used together by data scientists.

[Chart: how different tools are used together by data scientists, from O’Reilly’s 2013 Data Science Salary Survey]

In geek speak, this is a graph that describes the positive and negative correlations between tool usage. Visually, this separates into the traditional I/T world (in blue) and the new Hadoop world (in orange). “Visualization” might be a way to describe the red cluster, although Weka really breaks the mold.

What this tells me is that there is a definite geography to the work of data science. If traditional I/T is North America and Hadoop is South America, Tableau would be the Panama Canal, the conduit between the two continents. Also, this picture makes it easy to see why SQL is so popular. Like Starbucks, there’s at least one SQL-like tool in each of the clusters (Hive, MySQL, PostgreSQL, SQL, and SQL Server), with more on the way soon.
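If you’re wondering how a tool-correlation graph like the one described above could be built, here’s a rough Python sketch. It is not O’Reilly’s method, which the report excerpt here doesn’t spell out; it simply computes the Pearson correlation between two yes/no “uses this tool” indicators (the phi coefficient) for every pair of tools. The `responses` list is invented toy data for illustration only, not survey results, and the `phi` helper and the 0.5 cutoff are likewise assumptions.

```python
from itertools import combinations
from math import sqrt

# Invented toy responses (NOT survey data): each set lists the tools
# one hypothetical respondent uses.
responses = [
    {"SQL", "Excel", "R"},
    {"SQL", "Excel"},
    {"Hadoop", "Hive", "Python"},
    {"Hadoop", "Hive"},
    {"SQL", "R", "Python"},
    {"Hadoop", "Python", "Hive"},
]


def phi(tool_a, tool_b, rows):
    """Pearson correlation between two binary 'uses this tool' indicators."""
    n = len(rows)
    uses_a = [tool_a in row for row in rows]
    uses_b = [tool_b in row for row in rows]
    both = sum(x and y for x, y in zip(uses_a, uses_b))
    count_a, count_b = sum(uses_a), sum(uses_b)
    denom = sqrt(count_a * (n - count_a) * count_b * (n - count_b))
    if denom == 0:
        # A tool used by everyone or no one has no defined correlation.
        return 0.0
    return (n * both - count_a * count_b) / denom


# Keep only the stronger links, which would become the edges of the graph.
tools = sorted(set().union(*responses))
for a, b in combinations(tools, 2):
    r = phi(a, b, responses)
    if abs(r) >= 0.5:
        print(f"{a:>6} - {b:<6} {r:+.2f}")
```

Positive values would be drawn as links between tools that tend to show up together, negative values as links between tools that rarely do, and clusters like the ones in the chart fall out of which tools are tightly linked.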

Looking at the big picture, this tells us three important things:

  1. Data science can come from anywhere. Innovation does not require the resources of the Fortune 500, nor the specialization of Silicon Valley. The work can leverage the strengths of either environment, and the best people can work anywhere.

  2. Virtually any company either already has or can inexpensively acquire the tools to do data science. If you can download RStudio and have a SQL database, you can start working like the pros.

  3. Data science isn’t thinking about real-time analytics, yet. Storm, Spark, and other tools are still cutting edge. Watch out for this in the 2014 survey.

Thanks O’Reilly, for the insight into data science and data scientists!

Dhruv Bansal is the chief science officer and co-founder of Infochimps, a CSC Big Data Business. He holds a B.A. in math and physics from Columbia University in New York and attended graduate school for physics at The University of Texas at Austin. For more information, email Dhruv at or follow him on Twitter at @dhruvbansal.
