Monthly Archives: November 2012

IE. Invites: Hadoop Innovation Summit

Start preparing for your trip to San Diego for the largest gathering of Fortune 500 business executives leading Hadoop initiatives.

Hadoop Innovation Summit: February 20-21, 2013, San Diego
Unlocking the Value of Big Data

The Hadoop Innovation Summit brings together business leaders and innovators for an event acclaimed for its interactive format, combining keynote presentations, interactive breakout sessions, and open discussion. There will also be plenty of hands-on demonstrations, letting you learn as much as possible from industry leaders and vendors alike.

Hadoop Innovation will help your business understand how to unlock the value of Big Data, with the realization that no data is too big. With a vast amount of data now available, modern businesses face the challenge of making use of that data and unlocking its true value.

Register by December 23, 2012 and save up to $700 on standard pass prices.

Register Online Today >>


[Infographic] Taming Big Data from Wikibon

Opening with a Big Data market forecast and ending with a shout-out for all industries to embrace Big Data as the definitive source of competitive advantage, the following infographic from Wikibon personifies Big Data as a beast (data volumes are growing exponentially) that can be tamed (thanks to new approaches for processing, storing, and analyzing). It includes real-world Big Data use cases, which I appreciated. I was most amazed by how “decoding the human genome used to take ten years, but can now be done in 7 days.”

The quote from Kevin Weil, the Director of Product for Revenue at Twitter, brings the benefit of valuable Big Data insights home: “It’s no longer hard to find the answer to a given question; the hard part is finding the right question, and as questions evolve, we gain better insight into our ecosystem and our business.”

Scroll down, geek out on the infographic, and if you want more, check out an oldie-but-goodie article: 6 Illuminating Big Data Infographics

[Infographic: Taming Big Data, from Wikibon]

Did you notice the chimp within the Big Data forecast?

Thank you Wikibon for posting this!


Successful Planning: Business Analytics Innovation Summit

Our partners at *IE. would like to introduce you to the exclusive Business Analytics Innovation Summit, January 30 – 31 in Las Vegas.

This summit brings together industry leaders and innovators for an event acclaimed for its insight into business intelligence and analytics.

Effective business analytics is central to business success. In the modern business environment, technological developments and the advances of globalization have created unparalleled opportunities for businesses to expand their markets.

Register Now to Start Successful Planning Through Advanced Analytics


Breaking Hadoop out to the Larger Market

There are a lot of people out there with a Terabyte problem but who lack a Petabyte problem, yet they are forced to make do with a stack developed to address the Petabyte problems of Facebook, Yahoo, and JP Morgan. Hadoop out of the box is oriented toward achieving 100% utilization of fixed-size clusters by 12-, 50-, or 100+-person analytics teams. In contrast, the bulk of even forward-thinking enterprises are at the level of having just handed two PhD statisticians a copy of the elephant book, a mis-provisioned cluster, and a slap on the back with a directive to “go find us som’a that insight!”

There are a few observations we’ve made about these other customers and their differentiated needs that I wanted to share, and point to how we seek to address these with our own product.

Our first major observation is that while Hadoop might headline the bill, streaming data delivery is the opening act that moves the most merchandise.  Most of our customers on initial contact mention Hadoop by name — yet universally the first-delivered and most necessary component has been streaming data delivery into a scalable database and/or Hadoop.

In fact, we’ve had clients who excitedly purchased and set up a Hadoop cluster and had plenty of data they’d like to analyze, but no data in their Hadoop cluster. It may seem obvious once pointed out, but you need a way to feed data into your cluster. Enter modern open source tools such as Flume and Storm. Indeed, Flume was originally created to feed hungry Hadoop clusters with streaming log data.
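To give a flavor of what this plumbing looks like, here is a minimal sketch of a Flume NG agent configuration that tails a web server log and delivers it into HDFS. The agent name, file paths, and HDFS location are all hypothetical; consult the Flume documentation for your version’s exact property names.

```properties
# Hypothetical Flume NG agent: stream a web log into HDFS
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-out

agent1.sources.weblog.type     = exec
agent1.sources.weblog.command  = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type     = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-out.type          = hdfs
agent1.sinks.hdfs-out.hdfs.path     = hdfs://namenode/flume/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.fileType = DataStream
agent1.sinks.hdfs-out.channel       = mem
```

A dozen lines of configuration like this is often the real first milestone of a “Hadoop project”: until something like it exists, the cluster sits empty.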

What people are now realizing, though, is just how powerful streaming data delivery tools like these are: you can realize a surprising amount of analytical power (and even visibility into your data) while the data is still in flight. These realizations have driven the accelerated adoption of many of these open source streaming technologies, like Esper, Flume, and Storm. I’ve been using Hadoop since ’08, and the adoption of Storm is outpacing even Hadoop’s ascent.
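To make “analytics while the data is still in flight” concrete, here is a small illustrative Python sketch; it is not Storm or Esper code, just the kind of rolling per-key counter such systems let you maintain before anything ever lands in Hadoop:

```python
from collections import Counter, deque

class RollingCounter:
    """Count events per key over the last `window` events, updated in flight."""
    def __init__(self, window=1000):
        self.window = window
        self.counts = Counter()
        self.recent = deque()

    def observe(self, key):
        # Update counts as each event streams past
        self.counts[key] += 1
        self.recent.append(key)
        if len(self.recent) > self.window:
            old = self.recent.popleft()   # expire the oldest event
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=3):
        return self.counts.most_common(n)

# Simulated stream: page views arriving one at a time
rc = RollingCounter(window=5)
for url in ["/home", "/buy", "/home", "/about", "/home", "/buy"]:
    rc.observe(url)
print(rc.top(2))
```

The point is that answers like “what are the hottest pages right now?” are available continuously, with no batch job required.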

Another important feature set we evangelize and see validated is what an underlying cloud infrastructure enables for the enterprise.  Cloud-enabled elasticity makes exploratory analytics transformatively more powerful, as companies can scale their infrastructure up and down as needed.

Contrasted with the Petabyte-companies, who focus on 100% cluster utilization, the target metric for a development cluster fit for the Terabyte-company is high downtime: the ability to go from 10 to 100 machines, back down to 10, then rest at 0 machines over the course of a job. Hadoop out of the box doesn’t meet this target, which made it one of the most interesting engineering challenges we’ve solved.

So where else does the cloud fit in the Hadoop use case? Being able to safely grow, shrink, and stop/restart Hadoop isn’t just a slider in the UX; it’s a fundamental change in developer mindset and capabilities. For example, when we were a 6-person team with an AWS bill that rivaled our payroll, we would run the parse stages of jobs on high-CPU instances, then slam the cluster shut mid-workflow and bring it back up on high-memory instances for the graph-heavy stages. As our platform matured, we moved to giving each developer their own cluster; too often Chimp A needed 30 machines for 2 hours, while Chimp B needed 6 machines all day. Most companies would have to compromise with a 30-machine cluster running all day; we’ve been able to reject that approach.
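The economics of that example are easy to check. Using the figures above (Chimp A: 30 machines for 2 hours; Chimp B: 6 machines all day), a quick back-of-the-envelope comparison in Python:

```python
# Per-developer elastic clusters vs. one shared always-on cluster,
# measured in machine-hours per day (figures from the example above).
chimp_a = 30 * 2                      # 30 machines for 2 hours
chimp_b = 6 * 24                      # 6 machines all day
elastic_total = chimp_a + chimp_b     # 204 machine-hours

shared = 30 * 24                      # a 30-machine cluster running all day
savings = 1 - elastic_total / shared  # fraction of machine-hours avoided

print(elastic_total, shared, round(savings, 2))  # 204 720 0.72
```

Roughly 72% of the machine-hours (and, to a first approximation, the bill) disappear just by letting each cluster fit its job.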

Tuning a Hadoop job to your cluster is fiendishly difficult and time-consuming, while tuning your cluster to the job is comparatively straightforward. Data Scientists at the Terabyte-company shouldn’t be pinned down by the difficulties of working with technologies that weren’t designed for them. By enabling Hadoop in an elastic context (public or private cloud, internal or outsourced), Infochimps and others working on these challenges are a big part of breaking it out to the larger market.


Live Webcast: Top Strategies for Successful Big Data Projects

Title: Top Strategies for Successful Big Data Projects
Date: Thursday, November 29, 2012
Time: 10a Pacific/12p Central/1p Eastern


44% of Big Data projects don’t get fully deployed, and very few achieve their intended business objectives. Common barriers to success include securing executive buy-in, deciding whether to build or leverage a third-party solution, determining scope, and establishing realistic goals. Here are some tips to ensure your Big Data projects not only get off the ground and reach completion, but also quickly and positively impact your business’ bottom line.

Register for this live webinar and listen to Big Data expert and Infochimps Product Manager Tim Gasper share insights and explain how to effectively execute your Big Data project and avoid the most common pitfalls. In addition, you will learn:

  • Common roadblocks to Big Data project momentum
  • Elements of a clear project process and plan
  • Popular Big Data project objectives
  • Top 3 priorities for Big Data solutions

Join the webcast here. Looking forward to seeing you Thursday, November 29, 2012, at 10a PT / 12p CT / 1p ET!


Announcing Ironfan v4: Multicloud Capabilities + Community Support

Ironfan, the groundwork of the Infochimps Platform, is a systems provisioning, deployment, and updating tool built from a combination of proprietary technologies and open-source technologies like Chef and Fog.

After several proofs-of-concept and forks, hampered by the lack of underlying abstractions, we are happy to announce true multicloud capabilities for Ironfan. These capabilities bring the current version to a feature set largely on par with core Ironfan v3 (which was EC2-only). The current version is also ready for new providers, and VMware is working on catching its fork of Ironfan, Serengeti, up to the latest code. This latest version has been undergoing heavy development and testing, including increasing third-party contributions, and we have an increased internal focus on expanding and hardening both the cookbooks and the knife plugin.
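For the curious, an Ironfan cluster is described in a small Ruby DSL file. A minimal sketch, with hypothetical cluster and facet names (the exact DSL details vary by version, so treat this as illustrative rather than canonical):

```ruby
# Hypothetical Ironfan cluster definition (details vary by version)
Ironfan.cluster 'demo' do
  cloud(:ec2) do
    region 'us-east-1'
    flavor 'm1.small'        # default instance size for the cluster
  end

  facet :master do
    instances 1
  end

  facet :worker do
    instances 3
    cloud(:ec2) do
      flavor 'm1.large'      # heavier nodes for the workers
    end
  end
end
```

The multicloud work described above is what lets a definition like this target providers other than EC2 by swapping the `cloud(...)` block, rather than forking the whole toolchain.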

Interested in our growing community? Please join our new mailing list, managed by Nick Marden from GetSatisfaction. We’d love to hear your feedback!
