Monthly Archives: August 2012

Infochimps Culture: New CEO, Bocce, Opa!

Yesterday we announced some exciting news. We welcomed Jim Kaskade as our new CEO.

To welcome him to the team, we did it the Infochimps way: Bocce.

What’s Bocce? Bocce is a ball sport, popular across Europe, that is traditionally played on a ground court between two teams. One player throws a small target ball (the jack) from one end of the court into a zone at the far end. The two teams then alternate turns bowling their four balls each, trying to land them closest to the jack.


That’s right, we’re cultured as well.

So where can you play Bocce in Austin? At Opa! Coffee and Wine Bar. Who knew?

After some competitive games of Bocce, the team relaxed with some good food and conversation.

See the whole team at Polvos! Jim (bottom left) is already in the chimpy spirit, showing off his Infochimps shirt.

Work hard, play hard: An Infochimps philosophy.

If you or anyone you know is interested in the Infochimps philosophy, we’re hiring! Send any Designers, Engineers, or Architects who would like to work with a world-class team of friendly geniuses our way.

 


The Data Era – Moving from 1.0 to 2.0

“This post is from Jim Kaskade, the newly appointed CEO of Infochimps. When we first met Jim, we were impressed by him from multiple points of view. First, his initial questions were about our culture, something we pride ourselves on cultivating; we would only want to work with an executive who shared that concern. Second, his understanding of the market and of technological solutions matched, and in some areas exceeded, our own. Third, Jim brings true leadership and CEO experience to the table, having led a number of startups after a career at Teradata. We are truly excited to have Jim aboard and look forward to working together for many years!”

-Flip Kromer, Dhruv Bansal, and Joseph Kelly, co-founders of Infochimps

Do you think they truly understood just how fast the data infrastructure marketplace was going to change?

That is the question that comes to mind when I think about Donald Feinberg and Mark Beyer at Gartner who, last year, wrote about how the data warehouse market is undergoing a transformation. Did they, or anyone for that matter, understand the significant change underway in the data center? I describe it as Big Data 1.0 versus Big Data 2.0.

Big Data 1.0

[Figure 1: the Enterprise Data Warehouse framework]

I was recently talking to friends at one of our largest banks about their Big Data projects under way. In less than one year, their Hadoop cluster has already far exceeded their Teradata enterprise data warehouse in size.

Is that a surprise? Not really. When you think about it, even a traditionally large data warehouse measures in terabytes, not petabytes (well, unless you are eBay).

With the current “Enterprise Data Warehouse” (EDW) framework (shown here) we will always see the high-value structured data in the well-hardened, highly available and secure EDW RDBMS (aka Teradata).

In fact, Gartner defines a large EDW starting at 20TB. This is why I’ve held back from making comments like, “Teradata should be renamed to Yottadata.” After all, it is my “alma mater” after having spent 10 years learning Big Data 1.0 there. I highly respect the Teradata technology and more importantly the people.

Big Data 2.0

So with over two zettabytes of information being generated in 2012 alone, we can expect more “Big Data” systems to be stood up, new breakthroughs in large dataset analytics, and many more data-centric applications being developed for businesses.

[Figure 2: the Big Data 2.0 framework]

However, many of these new systems will be driven by “Big Data 2.0” technology. The enterprise data warehouse framework itself doesn’t change much, but many new players, mostly open source, have entered the scene.

Examples include:

  • Talend for ETL
  • Cloudera, Hortonworks, MapR for Hadoop
  • SymmetricDS for replication
  • HBase, Cassandra, Redis, Riak, Elasticsearch, etc. for NoSQL / NewSQL data stores
  • R, Mahout, Weka, etc. for machine learning / analytics
  • Tableau, Jaspersoft, Pentaho, Datameer, Karmasphere, etc. for BI

These are just a few of the many new and disruptive technologies, each contributing to the evolution of the enterprise’s data infrastructure.

I haven’t yet mentioned one of the more controversial claims made in the adjacent graphic: Teradata is becoming a source alongside the new pool of unstructured data, with both the new and the old data being aggregated into the “Big Data Warehouse”.

We may also see much of Hadoop’s ETL output feeding back into the EDW. But I suspect this will become less significant compared to the new analytics architecture, with Hadoop plus NoSQL/NewSQL data stores at the core of the framework, especially as that architecture becomes more hardened and enterprise class.

Infochimps’ Big Data Warehouse Framework

[Figure 3: Infochimps’ Big Data Warehouse Framework]

This leads us to why Infochimps is so well positioned to make a significant impact within the marketplace.

By leveraging four years of experience and technology development in cloud-based big data infrastructure, the company is now offering a suite of products that contribute to each part of the Big Data Warehouse Framework for enterprise customers.

DDS: With Infochimps’ Data Delivery Services (DDS), our customers’ application developers do not have to rely on sophisticated ETL tools. Instead, they can manipulate data streams of any volume or velocity through DDS using a simple, developer-friendly language referred to as Wukong. Wukong turns application developers into data scientists.

Ingress and egress can be handled directly by the application developer, uniquely bridging the gap between them and their data.

Wukong: Wukong is much more than a data-centric domain-specific language (DSL). With standardized connectors to analytics from R, Mahout, Weka, and others, not only is data manipulation made easy; integrating sophisticated analytics with the most complicated data sources is made easy as well.
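Wukong itself is a Ruby DSL, so the sketch below is only a conceptual analogue in Python: a Hadoop-streaming-style word count in which a mapper emits key/value pairs and a reducer folds them together. The function names and structure are illustrative, not Wukong’s actual API.

```python
# A minimal Hadoop-streaming-style word count: the mapper emits
# (word, 1) pairs and the reducer sums counts per word. This mirrors
# the map/sort/reduce flow that a stream-processing DSL wraps up.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Tokenize each input line and emit (word, 1) pairs."""
    for line in lines:
        for word in line.strip().lower().split():
            yield (word, 1)

def reducer(pairs):
    """Sum counts per word; the input must be sorted by key first."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

def run(lines):
    """Chain mapper and reducer, as the streaming framework would."""
    return dict(reducer(mapper(lines)))
```

In a real streaming job the sort between map and reduce is done by the framework across the cluster; here `sorted()` stands in for it.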

Hadoop & NoSQL/NewSQL Data Stores: At the center of the framework is not only an elastic, cloud-based Hadoop stack, but a selection of NoSQL/NewSQL data stores as well. This uniquely positions Infochimps to address both decision-support-like workloads, which are complex and batch in nature, and OLTP or more real-time workloads. The complexities of standing up, configuring, scaling, and managing these data stores are all automated.
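One way to picture serving both workload classes from a single framework is to merge a batch view (the Hadoop side) with live increments (the NoSQL side). The sketch below is a conceptual illustration only; the dictionaries and function names are hypothetical stand-ins, not Infochimps’ implementation.

```python
# Conceptual sketch of answering queries from batch plus real-time
# data. In-memory dicts stand in for the batch results store and for
# a low-latency data store such as Redis or HBase.
batch_view = {}      # stands in for results precomputed on Hadoop
realtime_view = {}   # stands in for a low-latency NoSQL store

def load_batch(results):
    """Replace the batch view with a fresh offline computation."""
    batch_view.clear()
    batch_view.update(results)
    realtime_view.clear()  # live increments are now folded into batch

def record_event(key, count=1):
    """Apply a live event to the real-time view immediately."""
    realtime_view[key] = realtime_view.get(key, 0) + count

def query(key):
    """Merge the stale batch view with the fresh real-time deltas."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)
```

The design point is that neither side alone suffices: the batch side handles the complex, heavy computation, while the real-time side keeps answers current between batch runs.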

Dashpot: Many of the business intelligence tools offered today leave the application developer out, because most are extremely powerful tools built for specialized groups of business users and analysts. Infochimps has taken a slightly different approach, staying focused on the application developer. Dashpot is a reporting and analytics dashboard built for the developer, enabling quick iteration and insight into the data before production and before the deployment of more sophisticated BI tools.

Ironfan and Homebase: As the underpinning of the Infochimps solution, Ironfan and Homebase are the two components that abstract away hardware and software deployment, configuration, and management. Ironfan deploys the entire system into production. Homebase lets application developers build their end-to-end data flows and applications locally on their laptops or desktops before deploying to QA, staging, and/or production.

All in all, Infochimps has taken a very innovative approach to enabling application developers with Big Data 2.0 technologies in a way that is not only comprehensive, but fast, simple, extensible, and safe.

Our vision for Infochimps leverages the power of Big Data, Cloud Computing, Open Source, and Platform as a Service – all extremely disruptive technology forces. We’re excited to be helping our customers address their mission critical questions, with high impact answers. And I personally look forward to executing on our vision to provide the simplest yet most powerful cloud-based and completely managed big data service for our enterprise customers.


SXSW Panels: Vote Today for Infochimps


One. Week. Left.

You’re familiar with South by Southwest (SXSW) held in Austin each March, yes?

SXSW panels are up for voting and we’d love your support. Read the panel submissions below and be sure to vote!

Voting ends August 31st.

A huge thank you to those of you who supported us on Twitter:

  • @hashonomy_gus: SXSW PanelPicker hashonomy.com/ybgd/ #bigdata #sxsw (via @infochimps)
  • @dteten: RT @ffvc RT @infochimps: SXSW 2013 PanelPicker: Vote Today for @josephkelly: The Tao Te Chimp: A Principle Driven Approach
  • @eldonnn: SXSW 2013 panelpicker under way. wanna go! waah! panelpicker.sxsw.com/vote/5500 this panel by @infochimps



SXSW Image courtesy of SXSW

(Video) Infochimps + OpenStack

Back in April, we announced support for OpenStack and the Rackspace Cloud.

Our friends from Rackspace came to the office and made the following video where Nathanial Eliot and Dhruv Bansal discuss Infochimps’ experience with OpenStack and the Rackspace open cloud. Infochimps + OpenStack = terabytes of Big Data love.

Video Description from Rackspace: “Infochimps knows what it’s like to deal in big data; it’s the company’s specialty as a big data platform for the cloud. And one of its top requirements is flexibility. As Infochimps Operations Engineer Nathanial Eliot put it: no two jobs are alike – one may need thousands of servers, one may need two. Flexibility and scalability are two of the reasons Infochimps is excited about Rackspace Cloud Servers powered by OpenStack.”

Some amazing quotes from Dhruv Bansal:

  • “…as time goes on, inevitably open source always develops the power that we need; it’s easier to build an ecosystem and community over open tools.”
  • “…for us, as a vendor who has to operate on multiple clouds, we love OpenStack.”
  • “Rackspace’s fanatical support is truly fanatical …”

Video courtesy of Rackspace

$100m vs. $600m: Open-Source Big Data vs. Proprietary Databases


I recently read this ZDNet article, which I thought was an awesome comparison between open-source Big Data vendors such as Infochimps and Cloudera, and proprietary database vendors like Oracle.

The case study: the cost of operating the YouTube stack. Given YouTube’s technical requirements, what would it cost to operate YouTube’s infrastructure on Oracle instead of using open-source tools?

“In a nutshell, the Oracle Exadata capital expenses for hardware and software total $589.4 million compared to an open source and commodity hardware cost of $104.2 million.”

The following chart shows the breakdown:

[Chart: cost breakdown, Oracle Exadata vs. open source and commodity hardware]

Open-source software is free. That’s the huge difference. Even if you added another several million dollars for open-source Big Data software support, you’d still come in at less than a quarter of the proprietary database cost.
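The article’s arithmetic is easy to sanity-check. Even after padding the open-source figure with a hypothetical $30M support budget (the article names no specific number), the total stays under a quarter of the Exadata price:

```python
# Cost comparison from the ZDNet YouTube case study, in millions USD.
exadata_cost = 589.4        # Oracle Exadata hardware + software
open_source_cost = 104.2    # open source + commodity hardware
assumed_support = 30.0      # hypothetical added support contracts

padded = open_source_cost + assumed_support
ratio = padded / exadata_cost
# Even padded, open source costs well under a quarter of Exadata.
```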

As companies get more comfortable trusting open-source tools, the economic value is undeniable. Add the benefits of direct access to the code, a huge open-source community, and an unlimited supply of examples and documentation, and open source fast becomes a no-brainer. It’s only a matter of time before many proprietary strongholds such as BI, BPM, and data warehousing are supplanted by open-source Big Data platforms and applications.

What will continue to push open-source adoption and proliferation? (1) Making these technologies easier to use, or at least making it easier to gain expertise in them; (2) vendors implementing them quickly and efficiently to keep costs down; and (3) high-quality open-source vendor support services that offer peace of mind.


ZDNet Article courtesy of Larry Dignan
Money and Laptop image courtesy of BigStock

Upcoming Webinar on High-Speed Retail Analytics

Get excited about next week’s webinar on Thursday, August 23rd at 11am CDT. Infochimps‘ co-founder and Chief Science Officer, Dhruv Bansal, and BlackLocus’ VP of Customer Development, Amos Schwatzfarb, will present a quick webinar on how retailers are increasing their bottom lines with BlackLocus, the leading SaaS eCommerce competitive analytics company, and with Infochimps, the #1 Big Data platform-as-a-service.

Join us on August 23, 2012 at 9:00am PDT / 11:00am CDT / 12:00pm EDT.

Shopping Cart and Laptop image courtesy of BigStock

Eating Towns and Drinking Towns

[Image: Trulia restaurant density heatmap]

In another well-done data analysis from Trulia, the real estate technology company uses US Census data to map out the country’s bars and restaurants. Perhaps unsurprisingly, San Francisco reigns supreme in the restaurant contest, with one restaurant for every 243 households in the city. Trulia compares this data to the median price per square foot of for-sale homes, and in that chart it quickly becomes clear that, in general, higher income provides a greater ability to patronize (and support) a bustling restaurant culture.

Top Metros for Eating Out
  #    U.S. Metro             Restaurants per 10,000 households    Median price per sqft of for-sale homes
  1    San Francisco, CA      39.3                                 $459
  2    Fairfield County, CT   27.6                                 $222
  3    Long Island, NY        26.5                                 $217
  4    New York, NY-NJ        25.3                                 $275
  5    Seattle, WA            24.9                                 $150
  6    San Jose, CA           24.8                                 $319
  7    Orange County, CA      24.8                                 $260
  8    Providence, RI-MA      24.3                                 $146
  9    Boston, MA             24.2                                 $219
  10   Portland, OR-WA        24.0                                 $129

Note: among the 100 largest metros.

Can you guess which US city has the greatest number of bars per capita? I’ll give you a hint: you can get drive-thru margaritas, and the city is nicknamed “The Big Easy”. Yup, good ol’ New Orleans ranks #1, with one bar for every 1,173 households. Interestingly, its median price per square foot for for-sale homes is significantly lower than San Francisco’s (which ranks #8 on the bar list). It looks like sustaining a thriving bar scene does not have the same income requirements as restaurants do.
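Trulia’s two ways of quoting density, “one venue per N households” in the prose and “venues per 10,000 households” in the tables, are just reciprocals of one another. A quick conversion shows the prose figures lining up with the table rows, allowing for rounding and for city-versus-metro boundaries:

```python
def per_10000(households_per_venue):
    """Convert 'one venue per N households' to venues per 10,000 households."""
    return 10000 / households_per_venue

# San Francisco: one restaurant per 243 households (~40 per 10,000)
sf_restaurants = per_10000(243)
# New Orleans: one bar per 1,173 households (~8.5 per 10,000)
nola_bars = per_10000(1173)
```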

Top Metros for Drinking
  #    U.S. Metro           Bars per 10,000 households    Median price per sqft of for-sale homes
  1    New Orleans, LA      8.6                           $99
  2    Milwaukee, WI        8.5                           $109
  3    Omaha, NE-IA         8.3                           $79
  4    Pittsburgh, PA       7.9                           $91
  5    Toledo, OH           7.2                           $71
  6    Syracuse, NY         7.0                           $86
  7    Buffalo, NY          6.8                           $91
  8    San Francisco, CA    6.0                           $459
  9    Las Vegas, NV        6.0                           $69
  10   Honolulu, HI         5.9                           $390

Note: among the 100 largest metros.

[Image: Trulia bar density heatmap]

I’d love to see these maps overlaid for a compare-and-contrast of the various metro areas featured in this analysis. Interestingly, it looks like the middle of the country has a considerably higher density of bars (relative to the rest of the country) than it does restaurants.

The Impact of a Nationwide Drought


According to a recent report from the Wall Street Journal, more than half of the United States is dry. Insufficient rainfall and soaring temperatures have left much of the country ravaged by severe crop damage. The latest US Drought Monitor indicates that 20% of the country is facing extreme or exceptional drought conditions, up seven percentage points from just one week ago. Perhaps it is time that the country as a whole take a hard look at solutions, such as Tom Mason’s Water Plan.

The Value of an Olympic Medal


Olympic medals may be mostly facade (a gold medal has only 1.34% gold content?), but they can come with big cash prizes. The US Olympic Committee will dole out upwards of $25,000 to a gold medalist. Countries such as Italy and Russia pay $182,000 and $135,000, respectively, to their top performers. Surprisingly, the UK, this year’s host, does not provide any monetary compensation to its athletes for bringing home the gold.

Big Data for Retail is a Hot Product


Check out this guest post in Forbes from SAP’s VP of Product Marketing. With recent customer wins, including retail technology company BlackLocus, we are very familiar with the growing trend of retailers looking to Big Data to solve a variety of business challenges, including identifying lost sales, improving transport logistics, and better anticipating customer needs.