Announcements

Next Gen Real-time Streaming with Storm-Kafka Integration

At Infochimps, we are committed to embracing cutting edge technology, while ensuring that the latest Big Data innovations are enterprise-ready. Today, we are proud to deliver on that promise once again by announcing the integration of Storm and Kafka into the Cloud::Streams component of the Infochimps Cloud.

Cloud::Streams provides solutions for challenges involving:

  • Large-scale data collection - clickstream web data, social media and online monitoring, financial market data, machine-to-machine data, sensors, business transactions, listening to or polling application APIs and databases, etc.
  • Real-time stream processing - real-time alerting, tagging and filtering, real-time applications, fast analytical processing like fraud detection or sentiment analysis, data cleansing and transformation, real-time queries, distribution to multiple clients, etc.
  • Analytics system ETL - providing normalized/de-normalized data using customer-defined business logic for various analytics data stores and file systems including Hadoop HDFS, HBase, Elasticsearch, Cassandra, MongoDB, PostgreSQL, MySQL, etc.

Storm and Kafka

In a recent guest post on TechCrunch, I explained why you should care about Storm and Kafka:

“With Storm and Kafka, you can conduct stream processing at linear scale, assured that every message gets processed in real-time, reliably. In tandem, Storm and Kafka can handle data velocities of tens of thousands of messages every second.”

Ultimately, Storm and Kafka form the best enterprise-grade real-time ETL and streaming analytics solution on the market today. Our goal is to put the same technology that Twitter uses to process over 400 million tweets per day in your hands. Other companies that have adopted Storm in production include Groupon, Alibaba, The Weather Channel, and FullContact, among many others.
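To put the quoted volume in perspective, here is a quick back-of-the-envelope conversion from a daily total to an average per-second rate. It assumes traffic is spread evenly over the day, which real traffic is not, so peak rates would be higher than this average. The function name is ours, for illustration only.

```python
# Convert a daily message volume into an average per-second rate.
# Assumption: messages are spread evenly over the day (real traffic is burstier).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_per_second(messages_per_day: int) -> float:
    """Return the average number of messages per second for a daily volume."""
    return messages_per_day / SECONDS_PER_DAY


tweets_per_day = 400_000_000  # figure quoted in the post
rate = average_per_second(tweets_per_day)
print(f"{rate:,.0f} messages/second on average")
```

On average that works out to a few thousand messages per second, which sits comfortably within the "tens of thousands of messages every second" range quoted above.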

Nathan Marz, Storm creator and senior Twitter engineer, comments on Storm’s rapid growth:

“Storm has gained an enormous amount of traction in the past year due to its simplicity, robustness, and high performance. Storm’s tight integration with the queuing and database technologies that companies already use have made it easy to adopt for their stream computing needs.”

Storm solves a broad set of use cases, including “processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more.”

Apache Kafka, which was developed by LinkedIn to power its activity streams, provides an additional reliability guarantee, robust message queueing, and distributed publish-subscribe capabilities.
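As a rough illustration of the publish-subscribe idea, here is a toy in-memory model of an append-only log with per-group read offsets. It is a sketch of the concept only: the class and method names (`ToyLog`, `publish`, `poll`, `commit`) are hypothetical, it is not Kafka's API, and it has none of Kafka's durability or distribution.

```python
from collections import defaultdict


class ToyLog:
    """Toy in-memory, append-only log per topic. Illustrative only, NOT Kafka's API."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> ordered list of messages
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for a group without advancing its offset."""
        start = self._offsets[(group, topic)]
        return self._topics[topic][start:start + max_messages]

    def commit(self, group, topic, count):
        """Mark `count` messages as processed by this group."""
        self._offsets[(group, topic)] += count


log = ToyLog()
for event in ["click:1", "click:2", "click:3"]:
    log.publish("clicks", event)

# One subscriber group processes two messages and commits them.
batch = log.poll("alerts", "clicks", max_messages=2)
log.commit("alerts", "clicks", len(batch))
remaining = log.poll("alerts", "clicks")

# A second group reads the same topic independently, from the beginning.
archive_view = log.poll("archive", "clicks")
```

Because each group tracks its own offset, several consumers can read the same stream at their own pace, and a message that was polled but never committed is simply returned again on the next poll.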

Cloud::Streams

Cloud::Streams is fault-tolerant and linearly scalable, and performs enterprise data collection, transport, and complex in-stream processing. In much the same way that Hadoop provides batch ETL and large-scale batch analytical processing, Cloud::Streams provides real-time ETL and large-scale real-time analytical processing. That makes it the perfect complement to Hadoop (or, in some cases, what you need instead of Hadoop).

Cloud::Streams adds important enterprise-class enhancements to Storm and Kafka, including:

  • Integration Connectors to your existing tech environment for collecting required data from a huge variety of data sources in a way that is robust yet as non-invasive as possible
  • Optimizations for highly scalable, reliable data import and distributed ETL (extract, transform, load), fulfilling data transport needs
  • Developer Toolkit for rapid development of decorators, which perform the real-time stream processing
  • Guaranteed delivery framework and data failover snapshots to send processed data to analytics systems, databases, file systems, and applications with extreme reliability
  • Rapid solution development and deployment, along with our expert Big Data methodology and best practices

Infochimps has extensive experience implementing Cloud::Streams, both for clients and for our internal data flows, including large-scale clickstream web data flows, massive Twitter scrapes, the Foursquare firehose, customer purchase data, product pricing data, and more.

Obviously, data failover and optimizations are key to enterprise readiness. Above and beyond that though, Cloud::Streams is a joy to work with because of its flexible Integration Connectors and the Developer Toolkit. No matter where your data is, you can access and ingest it with a variety of input methods. No matter what kind of work you need to perform (parse, transform, augment, split, fork, merge, analyze/process, …) you can quickly develop that processor unit, test it, and deploy it as a Cloud::Streams decorator.
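The idea of small processor units chained into a flow can be sketched with plain Python generators. This is not the Cloud::Streams Developer Toolkit API, which is not shown in this post; the function names (`parse`, `keep_purchases`, `add_total`, `run_flow`) and the record fields are hypothetical and chosen only for illustration.

```python
import json


def parse(lines):
    """Parse JSON lines into dicts, silently dropping malformed input."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def keep_purchases(records):
    """Filter step: pass through only purchase events."""
    for record in records:
        if record.get("type") == "purchase":
            yield record


def add_total(records):
    """Augment step: attach a computed order total to each record."""
    for record in records:
        yield {**record, "total": record["price"] * record["quantity"]}


def run_flow(source, *steps):
    """Chain the steps lazily over the source and collect the results."""
    stream = source
    for step in steps:
        stream = step(stream)
    return list(stream)


raw_events = [
    '{"type": "purchase", "price": 5, "quantity": 3}',
    '{"type": "page_view", "url": "/home"}',
    'not valid json',
    '{"type": "purchase", "price": 2, "quantity": 10}',
]
results = run_flow(raw_events, parse, keep_purchases, add_total)
```

Each step can be tested on its own with a short list of sample records, which is the property that makes this style of processor quick to develop and deploy.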

One of our most recent customers was able to build an entire production application flow for large-scale social media data analysis using the Infochimps Cloud development framework in just 30 days with only 3 developers. That is unheard of from an enterprise timeline perspective, and an amazing case of business ROI. Big Data is too important to spend months and months developing. Your business needs results now, and the Infochimps Cloud leverages the talent you have today for fast project success.

How much is it worth to you to launch your own revenue-generating applications for your customers? Or for your internal stakeholders as part of a Big Data business intelligence initiative? How much value would launching 12 months sooner provide your organization? These are the questions whose answers we are trying to make obvious.

Steve Blackmon, Director of Data Sciences at W2O Group, explains why they are working with Infochimps and Cloud::Streams:

“Storm and Kafka are excellent platforms for scalable real-time data processing. We are very pleased that Infochimps has embraced Storm and Kafka for Cloud::Streams. This new offering gives us the opportunity to supplement our listening and analytics products with Infochimps’ data sources, to integrate capabilities seamlessly with our partners who also use Storm, and to retain Infochimps’ unique technical team to support and optimize our data pipelines.”

More Information

Check out the full press release here, including quotes from CEO Jim Kaskade and co-founder and CTO Flip Kromer.

You can access additional resources from the Cloud::Streams web page or our general resources directory.

Lastly, check out our previous product announcements! In February, we launched the Infochimps Platform. In April we launched Dashpot as well as our support of OpenStack. In August, we announced the Platform’s newest release.


The Data Era – Moving from 1.0 to 2.0

“This post is from Jim Kaskade, the newly appointed CEO of Infochimps. When we first met Jim, we were really impressed by him from multiple points of view. First, his initial questions about us were about our culture, something we pride ourselves on cultivating, and we would only want to work with an executive who shared the same concern. Second, his understanding of the market and technological solutions matched, and in some areas exceeded, our own. Third, Jim brings true leadership and CEO experience to the table, having served as an executive and led a number of startups after a career at Teradata. We are truly excited to have Jim aboard and look forward to working together for many years!”

-Flip Kromer, Dhruv Bansal, and Joseph Kelly, co-founders of Infochimps

Do you think they truly understood just how fast the data infrastructure marketplace was going to change?

That is the question that comes to mind when I think about Donald Feinberg and Mark Beyer at Gartner who, last year, wrote about how the data warehouse market is undergoing a transformation. Did they, or anyone for that matter, understand the significant change underway in the data center? I describe it as Big Data 1.0 versus Big Data 2.0.

Big Data 1.0

I was recently talking to friends at one of our largest banks about their Big Data projects under way. In less than one year, their Hadoop cluster has already far exceeded their Teradata enterprise data warehouse in size.

Is that a surprise? Not really. When you think about it, a traditionally large data warehouse is always in the terabytes, not petabytes (well, unless you are eBay).

With the current “Enterprise Data Warehouse” (EDW) framework (shown here) we will always see the high-value structured data in the well-hardened, highly available and secure EDW RDBMS (aka Teradata).

In fact, Gartner defines a large EDW as starting at 20TB. This is why I’ve held back from making comments like, “Teradata should be renamed to Yottadata.” After all, it is my “alma mater”: I spent 10 years learning Big Data 1.0 there. I highly respect the Teradata technology and, more importantly, the people.

Big Data 2.0

So with over two zettabytes of information being generated in 2012 alone, we can expect more “Big Data” systems to be stood up, new breakthroughs in large dataset analytics, and many more data-centric applications being developed for businesses.

However, many of the “new systems” will be driven by “Big Data 2.0” technology. The enterprise data warehouse framework itself doesn’t change much, but many new players, mostly open source, have entered the scene.

Examples include:

  • Talend for ETL
  • Cloudera, Hortonworks, MapR for Hadoop
  • SymmetricDS for replication
  • HBase, Cassandra, Redis, Riak, Elasticsearch, etc. for NoSQL / NewSQL data stores
  • R, Mahout, Weka, etc. for machine learning / analytics
  • Tableau, Jaspersoft, Pentaho, Datameer, Karmasphere, etc. for BI

There are so many new and disruptive technologies, each contributing to the evolution of the enterprise’s data infrastructure.

I haven’t yet mentioned one of the more controversial statements made in the adjacent graphic: Teradata is becoming a source alongside the new pool of unstructured data. Both the new and the old data are being aggregated into the “Big Data Warehouse”.

We may also see much of what Hadoop does in ETL feeding back into the EDW. But I suspect that this will become less significant compared to the new analytics architecture with Hadoop + NoSQL/NewSQL data stores at the core of the framework, especially as this new architecture becomes more hardened and enterprise class.

Infochimps’ Big Data Warehouse Framework

This leads us to why Infochimps is so well positioned to make a significant impact within the marketplace.

By leveraging four years of experience and technology development in cloud-based big data infrastructure, the company is now offering a suite of products that contribute to each part of the Big Data Warehouse Framework for enterprise customers.

DDS: With Infochimps’ Data Delivery Services (DDS), our customers’ application developers do not need to rely on sophisticated ETL tools. Rather, they can manipulate data streams of any volume or velocity using DDS through a simple, developer-friendly language called Wukong. Wukong turns application developers into data scientists.

Ingress and egress can be handled directly by the application developer, uniquely bridging the gap between them and their data.

Wukong: Wukong is much more than a data-centric domain specific language (DSL). With standardized connectors to analytics from R, Mahout, Weka, and others, not only is data manipulation made easy, but so is the integration of sophisticated analytics with the most complicated data sources.

Hadoop & NoSQL/NewSQL Data Stores: At the center of the framework is not only an elastic, cloud-based Hadoop stack, but also a selection of NoSQL/NewSQL data stores. This uniquely positions Infochimps to address both decision support-like workloads, which are complex and batch in nature, and OLTP or more real-time workloads. The complexities of standing up, configuring, scaling, and managing these data stores are all automated.

Dashpot: The application developer is typically left out by many of the business intelligence tools offered today. This is because most tools are extremely powerful and built for special groups of business users / analysts. Infochimps has taken a slightly different approach, staying focused on the application developer. Dashpot is a reporting and analytics dashboard built for the developer, enabling quick iteration and insights into the data prior to production and prior to the deployment of more sophisticated BI tools.

Ironfan and Homebase: As the underpinning of the Infochimps solution, Ironfan and Homebase are the two tools that essentially abstract away any and all hardware and software deployment, configuration, and management. Ironfan is used to deploy the entire system into production. Homebase is used by application developers to create their end-to-end data flows and applications locally on their laptops or desktops before they are deployed into QA, staging, and/or production.

All in all, Infochimps has taken a very innovative approach to enabling application developers with Big Data 2.0 technologies in a way that is not only comprehensive, but fast, simple, extensible, and safe.

Our vision for Infochimps leverages the power of Big Data, Cloud Computing, Open Source, and Platform as a Service – all extremely disruptive technology forces. We’re excited to be helping our customers address their mission critical questions, with high impact answers. And I personally look forward to executing on our vision to provide the simplest yet most powerful cloud-based and completely managed big data service for our enterprise customers.

Information. Insight. Instantly: Check Out The Latest Version Of Our Big Data Platform!

A 1978 ad slogan from Scrubbing Bubbles stated, “We work hard so you don’t have to,” essentially promising customers that the product would take care of the “dirty work” and let them reap the benefit of the clean, finished product.

The same holds true today here at Infochimps. Our mission is to do the heavy lifting and seamlessly handle your big data implementations, removing the requirement for expensive integration or specialists and allowing you to focus on generating insights from data, not managing Big Data infrastructure. We provide you the insights you need to make data-driven decisions, speed your application development, and, ultimately, improve your operational efficiencies and time to market.

We are proud to announce today the latest version of our Big Data Platform, a managed, fully optimized and hosted service for deploying Big Data environments and apps in the cloud, on-demand.

Key new features include:

New Data Delivery Service

  • Based on the open source Apache Flume project
  • Integrates with your existing data environment and data sources with a combination of out-of-the-box and custom connectors
  • High scalability and optimization of distributed ETL (Extract, Transform, Load) processes, able to handle many terabytes of data per day
  • Both distributed real-time analysis and distributed Hadoop batch analysis

Real-time, Data Streaming Framework

  • You can use familiar programming languages such as Ruby to vastly simplify performing real-time analytics and stream processing, all within the Infochimps Data Delivery Service
  • Extends Infochimps’ Wukong open source project, which lets developers use Ruby micro-scripts to perform Hadoop batch processing

We’ve bundled together everything you need to install the platform, making it faster than ever to get a big data project off the ground — a configured solution can be deployed in just a few hours.

The Infochimps Platform is capable of executing on hundreds of data sources and many terabytes of data throughput, delivering scalability to any type and quantity of database, file system or analytic system.

Check out our other new features in today’s press release.

Also, read GigaOm’s take on our news here.

Homepage Redesign – Check It Out!

We gave the homepage of infochimps.com a little facelift this week to better showcase all the awesome things we are working on now.  After hearing feedback from many prospective and current customers, the revamp includes clearer articulation of the benefits and process of working with the Infochimps Platform, features our Platform tour, and spotlights our free demo.

Take a gander and let us know what you think!

Hadoop in the Cloud – Infochimps and VMware

Infochimps is proud to be a part of a new effort launched today by VMware to enable big data applications running on Hadoop to be deployed more easily on top of virtual and cloud-based IT environments. The Serengeti project, released today under the Apache 2.0 license, is built upon a number of open source technologies including our own Ironfan tool and supports all major Hadoop distributions including Cloudera, Greenplum, Hortonworks, and MapR.

Ironfan is the foundation of the Infochimps Platform and the basis of our customers’ Big Data deployments. It makes provisioning and configuring Big Data infrastructure simple – you can easily spin up clusters when you need them and kill them when you don’t, so our customers can spend their time, money, and engineering focus on finding insights, not configuring and deploying machines. Ironfan is quickly becoming the number one deployment tool for Hadoop platforms in the cloud, and this endorsement by VMware and inclusion in Serengeti is further evidence of the popularity of the tool.

What does Serengeti mean for Infochimps users?
From the beginning, the Infochimps Platform has been built on a foundation of open source tools for managing data that simplify the experience of working with complex technologies such as Hadoop. Within the Infochimps Platform, Ironfan, as well as other tools like Wukong and Swineherd, are major open sourced components of the stack. And with our enterprise tools including Data Delivery Service and Dashpot, customers can deploy complete Big Data environments and be assured of highly reliable delivery of data to their Hadoop environments.

The Serengeti project supports our open source tradition with its strong open source foundation and support by all of the major Hadoop distributions. Within the Serengeti project, Ironfan enables users to quickly and easily configure and deploy Hadoop clusters on top of VMware vSphere® in minutes with a single command. Now, users running VMware’s virtual and cloud infrastructure can more easily take advantage of the power of Hadoop as well as other Big Data technologies like the Infochimps Data Delivery Service, Dashpot, and Infochimps big data expertise to manage, process, and analyze massive amounts of unstructured, semi-structured, or structured data at scale and in the cloud.

We’re excited to be included in Serengeti and look forward to working with VMware customers and partners as they further their use of Big Data technologies.

Interested in learning more about Infochimps, VMware, and Serengeti? Contact us today for more information!

Announcing Support for OpenStack and the Rackspace Cloud

Infochimps is happy to announce that we now support the next generation Rackspace Cloud, based on OpenStack. Through integration with the OpenStack API, the Infochimps Platform can now power big data applications based in the Rackspace Cloud, expanding the reach of the Infochimps Platform and making it quick and easy for a broader range of users to run complex big data infrastructures.

Rackspace customers running the new OpenStack-based Rackspace Cloud Servers can quickly and easily spin up Hadoop clusters to power their big data applications in as little as 20 minutes with a single command using the Infochimps Platform. With the power of Ironfan, Infochimps’ open source provisioning tool, and Dashpot, Infochimps’ visualization and operations dashboard, customers can easily monitor and manage their Big Data operations on an ongoing basis, or leave it to Infochimps to manage it on the Rackspace Cloud for them.

Check out this demo of Infochimps Platform running in the Rackspace Cloud:

Why OpenStack and Rackspace?
From the beginning, the Infochimps Platform has been built on a foundation of open source tools for managing data, aimed at simplifying the experience of working with complex technologies such as Hadoop or Cassandra. Within the Infochimps Platform, Wukong, Ironfan and Swineherd are major open sourced components of the stack. OpenStack supports our open source tradition with its strong open source ecosystem. It is used and contributed to not only by Rackspace, but also by organizations such as NASA, Canonical, Red Hat, Dell, HP, and AT&T, so its architecture serves a multitude of needs, rather than bending to the whims of a single provider.

OpenStack also encourages standardization among Infrastructure as a Service providers, which ultimately benefits everyone in the market. Clients can make (and remake) decisions based on their businesses’ current day-to-day needs, without needing to employ a crystal ball to try to predict which provider will be best for them in the long term. By sharing open and standard interfaces, cloud providers can compete on current quality and value, instead of fighting to lock in customers based on promises.

The modular design of OpenStack is part of what makes standards possible without blocking innovation. There is a set of core APIs that every provider will support, plus extensions for added capabilities that not every provider will want to allow. The contracts these APIs provide can be (and often are) fulfilled by different back-end providers, letting each provider make different architectural choices without requiring customers to completely retool to take advantage of them. All of this allows apples-to-apples comparison of provider architectures, without making orange sales impossible.
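The "core contract plus optional extensions" pattern can be sketched as a small abstract interface with interchangeable back ends. To be clear, this is not the OpenStack API: the class names (`ComputeAPI`, `ProviderA`, `ProviderB`), the method names, and the "snapshots" extension are all hypothetical and exist only to illustrate the design.

```python
from abc import ABC, abstractmethod


class ComputeAPI(ABC):
    """Hypothetical core contract that every provider must fulfil."""

    @abstractmethod
    def create_server(self, name: str) -> str:
        """Start a server and return its ID."""

    def extensions(self) -> frozenset:
        """Optional capabilities; providers override this to advertise extras."""
        return frozenset()


class ProviderA(ComputeAPI):
    """Back end that implements only the core contract."""

    def __init__(self):
        self._count = 0

    def create_server(self, name):
        self._count += 1
        return f"a-{self._count}-{name}"


class ProviderB(ComputeAPI):
    """Different back end that also offers a hypothetical snapshot extension."""

    def __init__(self):
        self._count = 0

    def create_server(self, name):
        self._count += 1
        return f"b-{self._count}-{name}"

    def extensions(self):
        return frozenset({"snapshots"})

    def snapshot(self, server_id):
        return f"snap-of-{server_id}"


def launch_cluster(provider: ComputeAPI, size: int) -> list:
    """Client code depends only on the core contract, so any provider works."""
    return [provider.create_server(f"node-{i}") for i in range(size)]


cluster_a = launch_cluster(ProviderA(), 2)
cluster_b = launch_cluster(ProviderB(), 2)
```

The client function never needs to change when the provider does, while code that wants an extra capability can first check `extensions()` before relying on it.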

What does OpenStack mean for Infochimps?
The work we’ve done to support this announcement has enabled us to provide a level of abstraction from the Amazon Web Services environment, and we can deploy our platform in a cloud agnostic way. Many of our customers have asked for implementations on their in-house cloud environments – our OpenStack support allows those implementations to be airlifted in using a common set of APIs that sit on top of whatever infrastructure already exists, instead of one-off installations that require more custom development and introduce brittleness.

Interested in learning more about Infochimps, Rackspace, and OpenStack? Contact us today for more information!

Announcing Dashpot, our Analytics & Operations Dashboard for the Infochimps Platform

Infochimps is happy to announce Dashpot, an easy-to-use analytics and operations dashboard that provides business metrics and visualization, cluster management capabilities, and system monitoring on top of the Infochimps Platform. Dashpot gives you real-time visibility and control of your Big Data stack running with Infochimps, helping you go from input to insight faster, with our best-in-class Big Data infrastructure and tools.

Here are some of Dashpot’s key features:

  • Business Metrics – Dashpot’s in-stream visualization provides business users with the ability to capture and visualize business metrics on the fly as data is being ingested into their Infochimps Platform. By enabling data to be decorated in-stream through our Flume-based Data Delivery Service, Infochimps enables quick introspection on how a data or business process is performing. Organizations can view spikes or drops in key system or business metrics in near real-time, enabling quicker response to changing business conditions, saving time and helping ensure higher quality and more valuable information in the organization’s ultimate datastore. Infochimps business metrics are designed to provide an intermediate data visualization capability in conjunction with an organization’s existing investments in traditional business intelligence solutions.
  • Cluster Management – Built on the power of Ironfan, Dashpot offers simple Big Data system automation and management with a quick glance view into the servers and clusters currently running. Operations users can easily spin them up and down with a simple button click as their processing needs change, creating significant, easy-to-attain cost savings in machine usage.
  • Systems Monitoring – Dashpot provides integration with popular monitoring packages to provide users with at-a-glance views on Big Data system performance, availability, system integrity and more. Dashpot is designed to integrate easily with any monitoring product; as its initial reference monitoring solution, Infochimps has implemented the popular open source product Zabbix, integrating Zabbix graphs on system performance and availability in the Dashpot dashboard.
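As a simple illustration of the in-stream business metrics idea, the sketch below counts events per fixed time window as they are read, so that a spike or drop shows up as a change in the per-window count. It is not Dashpot or Data Delivery Service code; the function name and the `(timestamp, payload)` event shape are assumptions made for the example.

```python
from collections import Counter


def count_per_window(events, window_seconds=60):
    """Count events in fixed (tumbling) time windows.

    `events` is an iterable of (timestamp_in_seconds, payload) pairs.
    Returns a dict mapping each window's start time to its event count.
    """
    counts = Counter()
    for timestamp, _payload in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


sample_events = [(0, "a"), (30, "b"), (61, "c"), (119, "d"), (120, "e")]
per_minute = count_per_window(sample_events)
```

A dashboard could then plot these per-window counts and flag any window whose count differs sharply from its neighbors.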

Implementing and operating Big Data architectures can be difficult, requiring significant investment of resources and time. By choosing to use the Infochimps Platform, enterprises needn’t worry about the time and hassle of building and maintaining their own infrastructure. When combined with our tools, such as Ironfan and DDS, Dashpot’s simple visualizations and management tools help organizations keep their Big Data system humming, with little operational overhead. Best of all, Dashpot’s in-stream visualizations help provide the insights businesses need to get the most value out of their Big Data infrastructure investment.

Interested in talking about how we can help simplify your Big Data stack?  Contact us today for more information!

Announcing the Infochimps Platform for Big Data

The Age of Big Data
Readers of this blog are no strangers to the problems that Gartner declares to be the hallmarks of our age of Big Data: volume, variety, and velocity. Nor are Infochimps community members in the dark about the tremendous wealth contained in the world’s data, both internal and external to the organization.

What’s rarely admitted, however, is how difficult it can be to wrangle these data sets and operate the systems to process them. Running Hadoop and other distributed data architectures in the cloud is still a massive challenge, something typically managed by the data and operations elite. The demand for data science talent keeps growing, pushing salaries for these skilled individuals into ranges only the wealthiest enterprises can afford.

The Vision Behind Infochimps
When Infochimps was born, the co-founders set out with a mission that was deceptively simple – increase access to the world’s data. We understood that one of the first things that made this hard for people was actually finding the data, as search engines don’t really work for tables and spreadsheets. The Infochimps catalog was born, and from that the Infochimps Data Marketplace as a way to incentivize content providers to make their data more open and available.

The Data Marketplace has been wonderfully successful. Hundreds of thousands of visitors have downloaded data from our catalog of over 15,000 data sets sourced from over 200 suppliers, including Bundle, Foursquare, and Twitter. Thousands of application developers from the likes of Sheckys, Summify, and Crimson Hexagon have leveraged our data to make their apps richer and more compelling.

But we’ve always known that it’s not enough. Raw data is just the fuel. Without an engine to make it into something productive for the individual or organization, it’s doomed to not live up to its promise.

A Platform to Solve Our Own Problems
How do you get the world’s data to live in one place? This is no simple problem. Every day you’re dealing with the three major challenges quoted above. Some data sources update weekly, some by the minute, and others stream data to you at many GBs per hour. Data can come in a tabular format, a JSON string, or a giant blob of text. Not to mention the sheer volume of sources and data you’re faced with warehousing.
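The variety problem, where the same kind of information arrives as a table, a JSON string, or free text, can be illustrated with a small normalizer that turns each input into a list of records. This is only a sketch using Python's standard library, not Infochimps code; the function name, the format labels, and the sample data are hypothetical.

```python
import csv
import io
import json


def normalize(raw: str, fmt: str) -> list:
    """Turn a raw payload into a list of dict records, whatever its format."""
    if fmt == "csv":
        # Tabular data: the first row is treated as the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(raw))]
    if fmt == "json":
        parsed = json.loads(raw)
        # Accept either a single object or a list of objects.
        return parsed if isinstance(parsed, list) else [parsed]
    if fmt == "text":
        # Unstructured text is wrapped so downstream steps see a uniform shape.
        return [{"text": raw}]
    raise ValueError(f"unsupported format: {fmt}")


from_table = normalize("city,temp\nAustin,38\n", "csv")
from_json = normalize('{"city": "Austin", "temp": 38}', "json")
from_text = normalize("It was hot in Austin today.", "text")
```

Note that the tabular path yields string values (`"38"`) while the JSON path yields a number (`38`), a small example of why type normalization is usually a separate step after format parsing.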

From the beginning, Infochimps has used Amazon Web Services (AWS), Hadoop, and a number of other Big Data technologies to source and aggregate the world’s data. Faced with the resource and personnel constraints of a typical startup, we began with a simple best-effort design approach, allowing our small team of data engineers to get away with moving massive cloud resources around with minimal effort. We developed Wukong to make it easy for our Ruby developers to run Hadoop jobs, and extended Chef into Ironfan (formerly known as Cluster Chef) to make the instantiation and management of our infrastructure so simple our engineers can “move cities with their minds.”

Google rocked the world when it released its MapReduce paper, inspiring what became Hadoop, and allowing the rest of the world to take advantage of the tools it developed for its own data gathering efforts. In a similar vein, it is our hope that the release of our own internal technologies as a Platform product may help the world’s organizations to gather and manage the world’s data for their own purposes.

Context – the Next Level
A recent New York Times article featured some of the analytics done by Target, where marketers had been able to figure out that a woman was pregnant based on her purchase patterns. This type of insight is remarkable and only marks the beginning of what’s to come as all our purchases, clicks, and check-ins are tracked and analyzed. Organizations will be able to take this only so far, however, if they restrict their imaginations to just their own data.

The next big leap for the world’s organizations will be how they use all of these new and developing information streams, from Google search traffic and tweets to 100 years of weather measurements, check-ins, and UFO sightings. In the financial world, researchers have demonstrated that Google search query data can predict inflation metrics weeks before the official numbers come out. Ecommerce websites have long used data like our IP-Geolocation to personalize web experiences to increase conversions.

The Infochimps Data Marketplace has helped us all appreciate the breadth of data the world has to offer. Now, we can help those organizations that want to use this data to find insight, increase revenues, and cut costs.

Interested? Want to know more?
The Infochimps Platform is made up of a suite of technologies we’ve developed internally, plus a number of open source software packages for which we’ve developed management tools and techniques. The Platform comes with the brains and experience of the brilliant Infochimps team so that you can maximize your return on a Big Data infrastructure investment.

For more information about the Platform, please use our contact form here.

We are excited to hear from you!

Winner of the Strata 2012 Conference Pass

Thanks to the random number generator, we’ve selected a winner amongst the folks who entered. Congrats to #22, aka Nicolas Thiébaud. And we swear… it’s not because he promised us French pastries, though we are excited for the rising Hadoop community in his home country!

We’ll see you at Strata!

Infochimps at Strata Conference 2012

We’re excited to have our CTO, Flip Kromer, presenting a talk at Strata Conference in Santa Clara later this month. The discussion centers around disambiguation. Now you might be wondering… what is disambiguation? Simply put, disambiguation is the process of resolving conflicts to remove ambiguity. We’ve discussed this topic a number of times on this blog, and Flip will be presenting on how this concept affects the way we ask questions and find answers about Big Data.

For more details on the talk, check out the Strata schedule.