Monthly Archives April 2013

[New Whitepaper] Real-Time Data Aggregation

Fast response times generate cost savings and greater revenue. Enterprise data architectures are incomplete unless they can ingest, analyze, and react to data in real time as it is generated. Previously inaccessible or too complex, scalable and affordable real-time solutions are now available to any enterprise.


Read Infochimps’ newest whitepaper to learn about Infochimps Cloud::Streams, a proprietary stream processing framework based on four years of experience with sourcing and analyzing both bulk and in-motion data sources. It offers a linearly scalable and fault-tolerant stream processing engine that leverages a number of well-proven web-scale solutions built by Twitter and LinkedIn engineers, with an emphasis on enterprise-class scalability, robustness, and ease of use.

In this whitepaper, you’ll learn:

  • Definitions & History – batch processing, stream processing
  • Comparison of Stream vs. Batch for Selected Use Cases – includes industry use case: aviation
  • Why Cloud::Streams is the leading stream processing framework



Infochimps Recognized in Inaugural Big Data 100 List

Infochimps is proud to be named among UBM Tech Channel’s CRN 2013 Big Data 100 list, developed by the CRN editorial team to include “vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses manage Big Data.” The list consists of three categories: business analytics, data management, and infrastructure and services.

Infochimps was named within the Big Data infrastructure and services category – identified as 1 out of 25 “IT vendors who can do it all, from data storage hardware and software, to management tools, to business analytics.” We are proud to be recognized alongside other innovative companies such as Amazon Web Services, Oracle, and Rackspace.

Thank you, CRN, for understanding the struggle with the increasing volume, speed, and variety of information being generated today, and for identifying Infochimps Enterprise Cloud as a solution to help companies address their Big Data needs.


Image Source: CRN

[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics

Thurs, May 9 @ 11am PT, 1pm CT, 2pm ET

Measure Twice, Build Once: Hadoop and other Big Data technologies are not solutions to business problems in and of themselves, but they do have the capability of supporting your business goals and impacting your top and bottom lines.  This webinar walks you through essential steps of identifying your business goal and then building the right infrastructure to support it. We will provide use cases of the types of data that should be collected and the real-time, predictive or insightful analytic applications needed to ensure success.

Register for this live webcast and listen to Infochimps CSO and Co-Founder, Dhruv Bansal, and Think Big Analytics Principal Architect, Douglas Moore, share successful use cases and recommendations for building real-time predictive analytics in your enterprise.

Who should attend? This webcast is ideal for CIOs, CMOs, CEOs, Project Managers, Analysts, and IT professionals with expected or current Big Data projects at any stage.

Register Today >>


Cloud. It’s More Than Just Price

It’s not about price: GigaOM recently posted an article that discusses shifting motivations for adopting cloud. Sure, adopting cloud will in some cases mean a smaller total cost of ownership (TCO), as well as a variable operating expenditure (OpEx) instead of one big upfront capital investment (CapEx). Although cloud vendors focus on cost, customers note that time to value is the top motivation.


Barb Darrow of GigaOM notes:

What’s interesting to me is that this debate is evolving much like the discussion around Software as a Service (SaaS) did a decade or so ago. Initially, when [Salesforce] was coming into its own, most of the sales pitch was around price. Salesforce was so much cheaper than Siebel Systems.

Of course, when Microsoft started rolling out its own cloud-based CRM, that price-based argument dissipated. […] Then Salesforce’s benefits became that it freed companies from the tedium and expense of on-site server and software upgrades. You could focus on business and leave the IT heavy lifting to your provider.

Customers want to build out applications or see a return on investment as fast as possible regardless of the project; cloud enables faster iteration and agility. There is no need to worry about operational headaches, particularly around complex systems like streaming data pipelines or Hadoop clusters. This is a primary reason why Infochimps’ customers choose our managed cloud services approach to Big Data.

An even more concrete analysis is performed by Virtual Geek, with some key quotes:

[…] it’s not about being “cheaper than IT”, it’s about:

  • Being more agile than traditional IT.
  • Being more elastic economically than traditional IT.
  • Being more price transparent than traditional IT.
  • Being more “frictionless” than traditional IT.

[…] The place for traditional IT?   IMO – Internal IT are shifting to be more of “IT services brokers”, and less about “operators”.

[…] This isn’t about technology, and the COST is not the benefit of the IaaS model of AWS EC2, it’s the OPERATING MODEL that is the benefit.

Business units are demanding more insights and delivery on projects that IT has never had to tackle before, such as:

  • Managing terabytes and sometimes petabytes of data
  • Capturing and analyzing social media, ad impressions, website clickstreams, stock prices, and other fast moving data
  • Producing predictive insights, machine learning, statistical modeling, and interactive visualizations and dashboards

IT organizations are discovering that these complex projects don’t have to become the bane of existence and frustrate them for the next several years. These initiatives can be de-risked by embracing “cloud” to iterate more quickly – build faster, fail faster, learn faster, win faster. Cloud empowers the IT team to focus on proving out projects, not just on herding the fundamental systems.

Tim Gasper is the Director of Product for Infochimps. He was previously co-founder and CMO at Keepstream, a social media curation and analytics company. He graduated from Case Western Reserve University with dual degrees in Economics and Management and is originally from Cleveland, Ohio.


Image Source: GigaOM – Everest Group – Cloud Connect 2012 Enterprise Cloud Adoption Survey

Selling Your Big Data Initiative to Your C-Suite

Join GigaOm and Infochimps as we discuss Selling Your Big Data Initiative to Your C-Suite during this free webcast.

Thurs, April 18 @ 10am PT, 12pm CT, 1pm ET

With vital business information spread across your company, at different locations, in various databases and data warehouses and now potentially in Big Data platforms like Hadoop, there is no easy way to manage all of this infrastructure let alone get value out of the data that resides in these systems. But there is an alternative to the build-it-yourself Big Data trend of the last 12 months. Managed Big Data services are emerging that are easy to consume and provide a much quicker path to value than rolling out your own. Join Infochimps and GigaOM Research as we discuss “selling your big data initiative to your c-suite.”

Register Today >>


  • Jo Maitland, GigaOm Research Director, Cloud
  • Ron Bodkin, GigaOm Analyst & Founder of ThinkBig Analytics
  • Dan Olds, GigaOm Analyst & Founder of Gabriel Consulting Group
  • Jim Kaskade, CEO of Infochimps

Topics of discussion include:

  • Drivers for doing Big Data as a service
    • Shortage of expertise; complex and immature Open Source Software (OSS) products; faster time to value
  • Understanding the ecosystem of Big Data cloud services and Big Data managed services
    • Infochimps, AWS EMR, SnapLogic, Oversight Systems
  • When does the cloud model make sense for Big Data?
    • Understanding when “Total Cost of Ownership” or “Total Business Value” becomes the key factor
  • Risks, challenges
    • Security of intellectual property, transport of data into cloud

Register Today >>

3 Tiers: What Infochimps and Netflix Have in Common

A recent article on Gigaom, “3 shades of latency: How Netflix built a data architecture around timeliness”, shines some light on how the best-in-class architecture for Big Data has 3 different levels, separated by the dimension of “timeliness”.

“Netflix knows that processing and serving up lots of data — some to customers, some for use on the backend — doesn’t have to happen either right away or never. It’s more like a gray area, and Netflix detailed the uses for three shades of gray — online, offline and nearline processing.”

Just as Netflix defined their “three shades of gray”, Infochimps defined the three shades through our three cloud services: Cloud::Streams (real-time processing / online), Cloud::Queries (near real-time processing / nearline), and Cloud::Hadoop (batch processing / offline). By satisfying all aspects along the time dimension, companies unlock the ability to handle virtually any use case. Collect data in real time, or import it in batch. Process data and generate insights as it flows, or do it in large-scale historical jobs. Choose your Big Data analysis adventure by mixing and matching approaches.

The article highlights how this approach “is fairly common among web companies that understand that different applications can tolerate different latencies”. LinkedIn and Facebook are mentioned as sharing the same general theory, and working with Infochimps gives you the benefits of a similar architecture: the superior “3 tier approach” to Big Data.
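As a rough illustration of the "timeliness" idea, the sketch below picks a tier from the latency a workload can tolerate. The cutoff values and the `choose_tier` helper are assumptions made up for this example; neither the Gigaom article nor Infochimps defines specific thresholds, and this is not an Infochimps API.

```python
from enum import Enum


class Tier(Enum):
    ONLINE = "Cloud::Streams (real-time processing)"
    NEARLINE = "Cloud::Queries (near real-time processing)"
    OFFLINE = "Cloud::Hadoop (batch processing)"


# Hypothetical cutoffs, chosen only for illustration.
ONLINE_MAX_SECONDS = 1.0
NEARLINE_MAX_SECONDS = 60.0


def choose_tier(tolerable_latency_seconds: float) -> Tier:
    """Return the tier whose latency band contains the given tolerance."""
    if tolerable_latency_seconds < 0:
        raise ValueError("latency tolerance must be non-negative")
    if tolerable_latency_seconds <= ONLINE_MAX_SECONDS:
        return Tier.ONLINE
    if tolerable_latency_seconds <= NEARLINE_MAX_SECONDS:
        return Tier.NEARLINE
    return Tier.OFFLINE


if __name__ == "__main__":
    # Example workloads with made-up latency tolerances, in seconds.
    for name, latency in [("fraud alert", 0.2),
                          ("dashboard query", 15.0),
                          ("monthly report", 86400.0)]:
        print(f"{name}: {choose_tier(latency).value}")
```

In practice a single application often mixes tiers, for example streaming events in real time while recomputing historical aggregates in batch, so a rule like this would be a starting point rather than a fixed mapping.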


Is Big Data the Tail Wagging the Data Economy Dog?

Segmenting the overall IT market horizontally typically results in five sub-markets: semiconductors, hardware, software, telecommunications, and professional services. But an anomaly buried in the usual segmentation has existed for several decades, glossed over because it was such a slender slice of IT. That hidden slice has widened considerably post-2000, however, and the time has come to give those IT suppliers (for want of a better term we will call them “data providers”) their fair due: recognition of their own market space, which I refer to as the “Data Economy.”

Even though many data providers are not-for-profit, if one aggregates the revenues of all the data providers the “Data Economy” market now runs in excess of $100 billion in annual revenues.  By comparison, ESG estimates the software and core services revenues associated with the BI-Analytics platform market at around $20 billion.  Even if you add all the adjunct products and services required for big data, such as servers, storage, networking and professional services, it probably still slightly trails the Data Economy in terms of market size.  And ESG believes the Data Economy is growing even faster than big data.  Who are these data providers?  Let’s barely scratch the surface of some of the Data Economy players.

You are probably familiar with some of the world’s largest data providers like the multi-billion dollar Acxiom and Lexis Nexis.  Unless you pay close attention to the securities arm of the financial services industry you may not have heard of Interactive Data, a nearly $1b firm, and similarly if you are quite interested in channel data you might have heard of Zyme.  Not all data providers focus on a particular industry or role, for example DataLab USA offers data spanning insurance, credit, healthcare and real estate.  If you have ever been wondering about how best to classify industries to optimize search you might try WAND, and of course the U.S. Department of Labor’s Bureau of Labor Statistics will help you with understanding job taxonomies and data thereof.  And if you really want to span the globe in terms of data you might want to start at which acts as a portal for governmentally-sourced data from 39 states, 41 other countries, and a host of other governmental organizations, as part of the movement to “democratize data.”  The Open Archives Initiative is another data democratization example.  While not-for-profits are important participants in the Data Economy, the United States Postal Service offers data products for a price to help try to offset its rather notorious non-profitability.

Most data providers don’t simply provide data. Commercial providers in particular, like Lexis Nexis, offer a variety of products for understanding the data they offer, in terms of data attributes, how to best ingest and use the data, and even tools to perform data analysis a la big data. Almost all data providers offer information about the metadata, or at least how to interpret the metadata, for the data they distribute. Data providers generally gather, aggregate, qualify, refine and distribute (preferably with value-added ease) data. Data has been referred to as “the new oil,” and while I might extend the metaphor to all kinds of mining and agricultural activities as well, the basic idea that data increasingly acts as the caloric source for a growing number of modern pursuits, whether business, governmental, or consumer, is the fundamental driver behind the Data Economy.

If you are a business professional, or a data analyst, or a CIO, why should you care about the Data Economy?  First, your competitors may have already jumped ahead of you by tapping into the Data Economy.  For example, if you are highly dependent on channel partners, but have little visibility as to their performance in terms of reselling your products other than some simple monthly reports and word-of-mouth, you may be over or under-investing in various channels.  Your competitor, however, working with the aforementioned Zyme, might have a far clearer grasp of what is and isn’t working in the channel, and is making workflow and investment decisions accordingly.  In that example if you are not plugged into the Data Economy, what you don’t know may indeed be hurting you.

As a data analyst, you, with help from your IT department, may have done a fantastic job culling all the internal data available for business intelligence and analytical purposes.  However, that internal data may lack context, or perhaps could be further enriched with 3rd party data.  Some big data BI-Analytics platform vendors, like Alteryx, make it really easy to tap into data providers by offering relevant built-in data services from those providers.  To the data analyst using such a feature the 3rd party data looks just like an internal data source – except you may have to pay for the data.  Regardless, however, the data analyst or scientist who regularly scans and potentially uses external data for business intelligence and analytical model development is using a best practice.  Data analysts who only look to internal data sources are potentially overlooking major insight opportunities.
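To make the enrichment idea concrete, here is a minimal sketch of joining internal records with third-party attributes on a shared key. The `enrich` helper, the field names, and all values are hypothetical placeholders invented for this example; they do not come from Alteryx or any particular data provider.

```python
def enrich(internal_rows, external_by_key, key):
    """Left-join internal rows with third-party attributes on `key`.

    Rows with no match in the external data are kept unchanged, and
    internal values win if both sources define the same field.
    """
    enriched = []
    for row in internal_rows:
        extra = external_by_key.get(row[key], {})
        enriched.append({**extra, **row})
    return enriched


if __name__ == "__main__":
    # Made-up internal sales records.
    internal = [
        {"customer_id": 1, "zip": "78701", "revenue": 1200},
        {"customer_id": 2, "zip": "44101", "revenue": 800},
        {"customer_id": 3, "zip": "00000", "revenue": 50},
    ]
    # Made-up third-party attributes keyed by ZIP code.
    external = {
        "78701": {"region_segment": "urban"},
        "44101": {"region_segment": "urban"},
    }
    for row in enrich(internal, external, "zip"):
        print(row)
```

The left join keeps every internal record even when the external source has no match, which mirrors the point above: third-party data adds context to internal data rather than replacing it.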

CIOs should think of the Data Economy as another external resource, like public cloud or professional services, which may be brought to bear to help the IT department deliver the best possible information technology for the business.  If the CIO has a CDO, Chief Data Officer, the CDO should certainly track potential external sources for the data needs of the business.  If the business has a CDO, Chief Digital Officer, that CDO should likewise be tapping into, as applicable, 3rd party data, and perhaps should consider using the company’s internal data as a revenue-generating asset; perhaps your company could be a data provider in the Data Economy too.

The big data movement has largely been technology based. But much of the core technology for big data, like Hadoop, was originally developed by Web 2.0 companies, like Yahoo and Google, for business purposes. BI-Analytics platforms offer the tools to gain deeper insights into business performance, market opportunities, research and development, and customer understanding. But they are merely the tools, like an automobile. The fuel they need to run on is data, and increasingly that data will come from outside of the firewall. For IT departments, being your organization’s steward of technology is no longer enough. IT and its data professional partners in the lines of business also increasingly carry the responsibility for ensuring that the company has ALL the right data, from inside and outside, to help guide the business from daily tactical operations through strategic decision-making.

Evan Quinn is Senior Principal Analyst at Enterprise Strategy Group (ESG), an integrated IT research, analysis, and strategy firm that is world-renowned for providing actionable insight and intelligence to the global IT community.

