Monthly Archives October 2013

Announcing Application Reference Designs

Today at the Strata NY + Hadoop World Conference, we announced a new key component to our business analytics offerings, which empowers enterprises with agile development and rapid deployment of scalable Big Data applications.

Designed with the expertise gained from working with customers in ad tech, manufacturing, healthcare, and financial services, and from use cases involving social media and customer service, these pre-packaged frameworks for developing Big Data applications enable businesses to quickly execute targeted, agile analytics strategies tailored to their individual needs.

CxOs cannot afford to wait 24 months for their Big Data application to launch before they start making mission-critical course corrections to their business. Our customers need to deliver value years ahead of their competition.

Today I’m pleased to announce the launch of a disruptive suite of Application Reference Designs, fueling a new era of analytic application development.


Read the Full Press Release Here >

Request a Demo >


Data Science and the Personal Optimization Problem

“What gets measured gets done” is a common refrain.  And, to a large extent, that is how the business world works.  As Data Scientists, we have an outsized influence on what gets measured (and by extension, what gets done) in a business.  This is especially true with the advent of predictive analytics.  We have a lot of responsibility, and we need to use it wisely.

Data Scientists need to be proactive to ensure that what we model, predict, and measure provides quantifiable value for our organization.  But how can we do this, realistically?  After all, the numbers are the numbers; we are just drawing conclusions from them.  Right?  The truth is that two Data Scientists can develop models with the same tools against the same data, and one analysis can be significantly more valuable than the other to the people paying the bills.  It is our own personal optimization problem.

A salesperson usually has a number of accounts that bring in revenue.  A typical consultant has one or more projects that they can bill hours to.  However, if you are in R&D or on staff in a support role, how can you ensure that your data science is valuable to your organization?

As a Data Scientist, the best barometer for the business value of your work is how well it:

  1. Generates Revenue
  2. Reduces Cost
  3. Eliminates Risk

That sounds great, but how does a Data Scientist know that what they are working on is valuable?  This can be especially hard to figure out if you work in a supporting role or in a shared service environment, such as a centralized data science team in a large organization.  My colleagues and I have had long discussions on this subject, and it seems that there is little consensus on how to do this effectively.

However, I have one sure-fire way to make sure that your data science is as valuable to your organization as you are.

Personal Optimization for Data Scientists

For every project that you work on, imagine that your part is going to be used as an entry on your resume in a section marked “Major Accomplishments” (there are lots of resume guides available that talk about how to do this).  Now, think about a hiring manager who is looking at your resume; not some bozo or corporate drone who is just there to fill bodies. Imagine a shark, someone who knows the industry inside and out and wants only to hire the best; someone who knows the data and the math and can sniff out a phony a mile away.

The hiring manager is going to grill you for detailed answers about your major accomplishments.  They want to know what you know and how you learned it.  They want to know what went well and what didn’t.  They want to know if you can do the same (or better) work for them.  They want to make sure that you know the theory and the application, and can deliver the goods in a timely manner.  This is the definition of the bottom line.

Can you comfortably sit down in front of this person and talk about your major accomplishments?  Is your data science adding to your list of accomplishments?

Making Data Science Count

Data science has some really fantastic tools such as machine learning, data mining, statistics, and predictive modeling.  They are only going to get better in the future. However, we have to remember that these are just tools at our disposal.  Having skilled craftsmen using the best tools is key, but the most important thing we can do is to make sure that we are building the right things.

One of the things I like best about the Infochimps Cloud is that it takes care of all the infrastructure and architecture work in building a Big Data solution, and lets me focus on really figuring out how to make a valuable solution.  I don’t have to worry about building a Hadoop cluster for batch analytics, or stitching together Storm and Elasticsearch and Kibana to deliver real-time visualizations.  I also don’t have to worry about scaling things up if and when my data volume goes through the roof.

When I build with Infochimps, I know that my effort is being harnessed to build out major accomplishments, not to build sandboxes or dither with infrastructure issues. If you would like to learn more about Infochimps and the value of real-time data science, come by and see us at Strata in New York on October 28-30.  See you there!

Morgan Goeller is a Data Scientist at Infochimps, a CSC company. He is a longtime numbers guy with a B.S. in Mathematics and background in Hadoop, ETL, and Data Warehousing. Morgan lives in Austin, Texas with his wife, sons, and many cats and dogs.


Can Big Data Save Them?

Early next week, Oct. 28-30 in New York, NY, Infochimps will be going big at Strata + Hadoop World, alongside thousands of the best minds in data gathering to learn, connect, share knowledge, and explore.

Strata + Hadoop World is one of the largest gatherings of the Apache Hadoop community in the world, with emphasis on hands-on and business sessions on the Hadoop ecosystem. If you want to tap into the opportunities brought by Big Data, data science, and pervasive computing, you’ll want to be there.

Easily the biggest show of the year for us, we’re looking forward to:

  • Infochimps CEO Jim Kaskade keynoting Wed, Oct. 30 at 9:50am EDT in the Grand Ballroom about Cancer and Big Data:
    • Can Big Data Save Them? Data and analytics are a means to an end. Jim highlights a new revolution of analytic applications with some touching examples from the healthcare industry in cancer research and medication therapy management.
  • Giving you our famous Infochimps t-shirt at Booth #38. Meet a bunch of eager chimps ready to talk about Big Data. Key exhibiting team members include our VP of Sales Burke Kaltenberger, Director of Marketing Amanda McGuckin Hager, Director of Product Tim Gasper, Director of Sales Strategy and Operations Ryan Miller, Demand Gen Manager Caroline Lim, Sales Engineer Morgan Goeller, and our VP of Business Development.
  • Meetings with you! Set up a meeting with us by emailing


More Complex in Asia: Mapping the Most Visited Website by Country

As someone engulfed in the online world, I was drawn to this FlowingData article about the most visited website in each country. Mark Graham and Stefano De Sabbata from Information Geographies mapped the most visited site per country based on Alexa data. Countries are sized by Internet population.

In the graphic accompanying the post, it wasn’t the red and blue that drew my attention (the obvious Google and Facebook takeovers in the Americas and Europe), but the massive, screaming green.


Mark Graham and Stefano De Sabbata’s findings suggest “the situation is more complex in Asia, as local competitors have been able to resist the two large American empires. Baidu is well known as the most used search engine in China, which is currently home to the world’s largest Internet population at over half a billion users. At the same time, we see a puzzling fact that Baidu is also listed as the most visited website in South Korea (ahead of the popular South Korean search engine, Naver). We speculate that the raw data that we are using here are skewed. However, we may also be seeing the Baidu empire in the process of expanding beyond its traditional home territory. The remaining territories that have escaped being subsumed into the two big empires include Yahoo! Japan in Japan (in joint venture with SoftBank) and Yahoo! in Taiwan (after the acquisition of Wretch). The Al-Watan Voice newspaper is the most visited website in the Palestinian Territories, the e-mail service is the most visited in Kazakhstan, the social network VK the most visited in Belarus, and the search engine Yandex the most visited in Russia.”

Thank you FlowingData for providing interesting findings for us data nerds.


Democratizing Big Data: We Get It

We’ve been there.

Maybe you’re an enterprise with huge data sets, competing in a saturated market like telecommunications, healthcare or financial services.

Or maybe you’re a startup that has lots of data but not the manpower to handle the data.

Or maybe you’re a retailer moving from multichannel to omnichannel, but you’re struggling to synthesize data from disparate sources, like legacy point-of-sale systems and Foursquare check-in data.

Or maybe you’re something else entirely, for whom the promise of Big Data seems like a pipe dream because:

  • The infrastructure hurdle is a towering one:
    • How do you acquire, store and manage all that data?
    • How do you integrate tools like Hadoop, Storm, Kafka, NoSQL, and others in ways that produce transformational insights for your business?
    • How do you plan a technology stack with the elasticity to scale for use cases known and unknown?
  • You’re understaffed for the project you’re considering, and you can’t afford to poach experts from Big Data pedigree houses like Google, LinkedIn, Twitter, etc.
  • The clock is ticking. No one is giving you a year to go on a Big Data fishing expedition. You need quick initial results and even quicker iterations.

OK, deep breaths. We get it. We’ve been there.

Infochimps’ growth from data marketplace to Platform as a Service (PaaS) makes us a highly evolved group of chimps: the kind who can take any kind of data, and do any kind of analytics with it, in any type of cloud. We’ve worked with every kind of database. We can produce batch, streaming and ad hoc analytics. And we can deploy from public, private and hybrid clouds.

And we do it quickly. While typical Big Data projects take over a year to yield results, we can have your first use case in production in 90 days, and complete subsequent projects in weeks.

Our approach to Big Data is built on Infochimps™ Cloud for Big Data: three essential cloud services that unleash the full analytic capabilities needed to solve any enterprise Big Data problem. Infochimps Cloud expedites and simplifies development and deployment of Big Data applications.

So if you’re late to the Big Data game or you’ve been beaten in it before, let’s talk. Infochimps can save your organization hardware and hiring costs, while accelerating results – enabling you to unlock insights that can transform your business.
