Monthly Archives September 2013

‘Tis the Season…for Events

It’s that time of year again…Big Data events season. After hacking away at our events calendar, we decided to make an appearance at the following events:

Data Curiosity RoundTable

  • When: September 25, 2013
  • Where: Austin, Texas
  • What: This is an open forum with the goal of sharing and trading ideas as we all “noodle” through the byproduct of intensive Big Data harnessing
  • Who: Presented by Dell; Infochimps Sales Engineer Morgan Goeller will be attending
  • Why: A roundtable discussion with free pizza

Splunk Worldwide Users’ Conference

  • When: September 30 – October 3, 2013
  • Where: Las Vegas, NV
  • What: The 4th Annual Splunk Worldwide Users’ Conference
  • Who: Presented by Splunk; Infochimps CEO Jim Kaskade will contribute to the CxO Big Data panel on Tuesday, October 1
  • Why: Deepen your knowledge of Splunk, learn best practices, check out new solutions, see how others apply Splunk technology to real-world projects, and become more involved in the Splunk community

GigaOM Mobilize 2013

  • When: October 16 – 17, 2013
  • Where: San Francisco, CA
  • What: A 2-day conference to examine what opportunities are created by new mobile technologies and business models
  • Who: Presented by GigaOM
  • Why: This conference helps attendees make sense of where developments are headed and how they’ll affect everything from applications to infrastructure choices

The Big Data Conference

  • When: October 22 – 23, 2013
  • Where: Chicago, IL
  • What: A 2-day conference of comprehensive content for every IT, marketing or digital professional seeking to capitalize on the boom in data volume, variety and velocity
  • Who: Presented by UBM
  • Why: This conference fills a gap in the US market for businesses looking to capitalize on the Big Data opportunity and look for efficient solutions to manage the ever growing amount of structured and unstructured data

…and last but not least…our biggest conference of the year: Strata Conference + Hadoop World

  • When: October 28 – 30, 2013
  • Where: New York, NY
  • What: 3 days of inspiring keynotes and intensely practical, information-rich sessions exploring the latest advances, case studies, and best practices
  • Who: Co-presented by O’Reilly and Cloudera; Infochimps CEO Jim Kaskade will be keynoting Wed, Oct. 30 at 9:50am EDT in the Grand Ballroom
  • Why: Strata + Hadoop World is where Big Data’s most influential decision makers, architects, developers, and analysts gather to shape the future of their businesses and technologies

If you’re going to any of the above events and want to set up a meeting with a chimp to talk about Big Data, we’d love to chat.

Jim Keynotes Strata Conference + Hadoop World

Everyone’s talking about Big Data, but who’s actually doing it right?

From Oct. 28-30 in New York, NY, Infochimps will be going big at Strata + Hadoop World, alongside thousands of the best minds in data, gathering to learn, connect, share knowledge, and explore.

Strata + Hadoop World is easily the biggest show of the year for us, and we are excited to announce that Infochimps CEO Jim Kaskade will be keynoting Wed, Oct. 30 at 9:50am EDT in the Grand Ballroom.

Jim’s not the only face from Infochimps going to the show; we’ll be exhibiting with a packed booth (Booth #38) full of eager chimps ready to talk about Big Data. Key exhibiting team members include our VP of Sales Burke Kaltenberger, Director of Marketing Amanda McGuckin Hager, Director of Product Tim Gasper, Director of Sales Strategy and Operations Ryan Miller, and Demand Gen Manager Caroline Lim. If you’re going to Strata + Hadoop World and would like to set up a meeting with a chimp, we’d love to chat.

Not registered? Register today, save 20% with discount code: INCHP, and be sure to stop by Booth #38 to chat with us about Big Data.

Part 2: The Lucky Break Scoreboard

Last week, Infochimps CTO Flip Kromer shared his truth about the failures that led to the successful acquisition by CSC in his blog post, Part 1: The Truth – We Failed, We Made Mistakes. Flip continues his blog series with Part 2, his love letter – the real Infochimps story.


7 years ago, having switched majors from Computer Science in college to Physics in grad school, and failing twice to successfully execute a plan of research in Physics, I decided to switch to Education – my favorite part of grad school was teaching. A year before, my ever-patient advisor, physics professor Mike Marder, had started a wildly successful alternative program for a public-school teaching certification. It replaced a full general education curriculum with frequent in-classroom experience and focused education classes  — and it let me reuse the scientific coursework I already had way too much of.

A year later, I was near the end of the program and preparing my teaching portfolio, which led me to spend a lot of time thinking about what I wanted my students to learn, and why. For many of them, my course would be their last formal chance to acquire the skill of quantitatively understanding their universe. As I started to write (less bluntly), I had no interest in burdening them with three different forms of the quadratic equation, or pretending that as a practicing physicist I’d ever used the formula for the perimeter of a trapezoid.

What they should be learning was the ability to make use of a complex information stream, understand sophisticated information displays, and extract straightforward insight using tools such as … … ‽‽

I paused, struck, mid-sentence. Those tools do not exist. Not for a high school student, not for a domain expert in another field, and only after years of study, for me. That’s what I was supposed to be working on: democratizing the ability to see, explore and organize rich information streams.

So as a lapsed computer scientist and failed physicist, I decided to abandon education as well and start yet another new thing, one that was none of those and all of those together.

Challenge Accepted

I asked Mike Marder if I could come back to his research group and work on tools to visualize data; we could figure out along the way how to tie it into a research plan. I had some savings (thanks largely to my Grandmother, who was just your typical successful 1940s woman entrepreneur), so I wouldn’t cost him any money. Mike reasoned that although I didn’t know how to solve my own problems, I was frequently useful in helping others solve theirs, and who knows, I seemed really fired up about this new idea, whatever it was. So all in all it was an easy decision to hide me away in a shared office and let me get to work.

Building the visualization tool required demonstration data sets to prove the concept, and there are few better than the ocean of numbers around Major League Baseball.

In addition to the Retrosheet project (the history of every major-league baseball game back to the 1890s), Major League Baseball itself was publishing one of the most remarkable data sets I knew of. For the past seven years, it gives every single game, every single at-bat, every single play, down to the actual trajectory of every single pitch. I first started playing with the Retrosheet data, and found some scattered errors: things like a game-time wind speed of 60mph.

(Lucky break scoreboard: most patient graduate advisor ever; financial safety and family support.)

Weekend Project Gone Awry

Well, the NOAA has weather data. Lots of weather data. The hour-by-hour global weather going back 50 years and more, hundreds of atmospheric measurements for every country in the world, free for the asking. And the Keyhole (now Google Earth) community published map files giving the geolocation of every current and historical baseball stadium.

So if you’re following, we have:

  • A full characterization of every game event
  • … including the time of the game and the stadium it was played in,
  • … and so using the stadium map files, the event’s latitude and longitude
  • … and using that lat/long, all the nearby weather stations
  • … and using the game date and time, the atmospheric conditions governing that event
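The chain of lookups in that list can be sketched in a few lines of Python. This is only an illustration of the idea, not the code Flip actually ran: the record shapes, the `Stadium` and `Station` types, and every function name here are assumptions made up for the example, and "nearby" is simplified to "the single nearest station".

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Stadium:          # hypothetical record built from the stadium map files
    name: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Station:          # hypothetical record for one weather station
    station_id: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_station(stadium, stations):
    """Pick the weather station closest to the stadium."""
    return min(stations,
               key=lambda s: haversine_km(stadium.lat, stadium.lon, s.lat, s.lon))


def truncate_to_hour(t: datetime) -> datetime:
    """Round a game-event timestamp down to the hour of the weather reading."""
    return t.replace(minute=0, second=0, microsecond=0)


def conditions_for_event(event, stadiums, stations, hourly_weather):
    """Join one game event to the weather reading that governed it.

    event:          dict with 'stadium' (name) and 'time' (datetime)
    stadiums:       dict mapping stadium name -> Stadium
    hourly_weather: dict mapping (station_id, hour) -> reading
    Returns None when no reading exists for that station and hour.
    """
    stadium = stadiums[event["stadium"]]
    station = nearest_station(stadium, stations)
    return hourly_weather.get((station.station_id, truncate_to_hour(event["time"])))
```

Even in this toy form, most of the lines deal with agreeing on keys and units (stadium names, hours, coordinates) rather than with any analysis.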

I connected the data sets looking to correct and fill in the weather data, and found out I had accidentally wired up a wind tunnel. There’s no laboratory with the budget to have every major league pitcher throw thousands of pitches for later research purposes: none, except the data set I described.

What’s screwy (and here’s where every practicing data scientist groans and shakes their head) is that the hard part wasn’t performing the analysis. The hard parts were a) making that data useful, and b) connecting the data sets, making them use the same concepts and measurement scales.

But all that work (the mundane, generic work anybody would have to do) just sat there on my hard disk. If I created a useful program, or improved an existing public project, I knew right where to go: open-source collaboration hubs like SourceForge or GitHub. But no such thing existed for data. I had to spend weeks transforming the MLB game data into a form that you could load into a database. If we could avoid that repetition of labor, we would solve the problem of every practicing data scientist.

On Christmas Day 2007, I bought a book on how to build websites using the “Ruby on Rails” framework, and figured I’d knock something useful out in, y’know, a week or so. By sometime that Spring, I had something useful: a few interesting data sets and a website to generically host and describe any further data sets. The initial version of the site was read-only, because I didn’t know how to do join models or form inputs in Ruby on Rails, but I could add new data sets directly to the database. And just like that, Infochimps was born.

I cold-emailed blogger Andy Baio, who linked to “Infochimps, an insane collection of open datasets”. For a guy working alone in an ivory tower, the resulting response was overwhelming.

One of the individuals who emailed to encourage us was Jeff Hammerbacher, founder of the data team at Facebook. When we chatted on the phone, he told me about a new data analysis tool that Facebook was using, called Hadoop. I looked into it, but couldn’t see how I would ever need to use it. Still, it was really exciting that big names in data were taking interest.

On a trip to San Francisco a few weeks later, I went to a meetup at Freebase. @skud, their community manager, recognized that Infochimps was the perfect raw-data complement to Freebase. She asked me to come back the next month and give a meetup talk. Kurt Bollacker, head of their data team (and future teammate and profoundly valuable mentor), asked me to come back the next day and give an internal lunch lecture. I stayed up all night using Google Docs on my uncle’s PowerPoint-less computer, and gave some hot mess of a presentation to their internal group. Kirrily didn’t uninvite me, so it wasn’t too bad.

It was clear that the lack of a collaboration hub was a problem many people were feeling.

So as a lapsed computer scientist, failed physicist, and no-show educator, I decided to abandon working on a visualization tool and make a collaboration hub instead. Yup.

(Lucky break scoreboard: most patient graduate advisor ever; financial safety and family support; incipient critical mass of public data sets; new breakthroughs in the world; big names taking interest in the project and deciding to market it.)


One of the new faces on Mike’s research team when I returned was Dhruv Bansal, who was working on a problem bridging Mike’s two interests: physics and education. They used a freedom-of-information request to acquire a fascinating data set: the anonymized test scores for every student, on every question, for the yearly exam taken by every schoolchild in Texas.

They used the physics equations for fluid flow to model the year-on-year change in student test scores, highlighting patterns that demanded immediate action within the education community.

As you can guess again, the costliest part of that project was not performing the analytics; or applying the Fokker-Planck equation for fluid-flow; or working the paper through peer review. No, the costliest part of the project was the 3-month process of acquiring the data and cleaning it for use. For the random researcher who discovered and requested the data, Dhruv would spend a few hours burning the data to a DVD and physically mailing a copy. For reasons I still don’t understand, while researchers in Sociology, Psychology, and other “soft” sciences immediately latched on to the usefulness of Infochimps from the very start, Physicists and Computer Scientists almost never understood what we were doing or why it might be valuable. Dhruv and Mike’s split focus meant they got it immediately.

This is probably the most unlikely lucky break, and most crucial development, of this adventure: sitting a few offices away from where I worked was one of the most talented programmers I’ve ever worked with, possessed of a mountainous drive to change the world, the laconic cool to keep me level, and a furious anger at the same exact problem I was working to solve.

(Lucky break scoreboard: most patient graduate advisor ever; financial safety and family support; incipient critical mass of public data sets; new breakthroughs in the world; big names taking interest in the project and deciding to market it; sharing the same advisor as Dhruv.)

Twitter Dreams

At around this time Twitter was blowing up in popularity, though still a tool largely used by nerds to tell each other about what they had for lunch. We couldn’t explain, any more than most, the appeal of Twitter as a social service.

But to two physicists with a background in the theory of random network graphs, Twitter as a data set was more than a social service; it was a scientific breakthrough. It implemented a revolutionary new measurement device, giving us an unprecedented ability to quantify relationships among people and conversations within communities. Just as the microscope changed biology, and the X-ray transformed medicine, we knew that seeing into a new realm placed us on the cusp of a new understanding of the human condition. Making this data available for analysis and collaboration was the best way to provide value and draw attention to the Infochimps site. We emailed Alex Payne, engineering lead at Twitter, for permission to pull in that data and share it with others. He gave me a ready thumbs-up: better that scientists download the data from us, than that they pound it out of his servers.

We wrote a program to ‘crawl’ the user graph: download a user, list their followers, download those users, list their followers, repeat. That was the easy part. Sure, each hundred followers had hundreds of followers themselves, but we could make thousands of requests per hour, millions of requests per week.
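The crawl loop described above is a breadth-first traversal of the follower graph. Here is a minimal Python sketch of that loop, not the original program: `fetch_followers` is a hypothetical callable that you would supply yourself (in the story it would wrap a request to Twitter), and `max_users` is an assumed safety cap added for the example.

```python
from collections import deque


def crawl_follower_graph(seed_user, fetch_followers, max_users=1000):
    """Breadth-first crawl: download a user, list their followers, repeat.

    fetch_followers is a caller-supplied (hypothetical) function that takes
    a user and returns an iterable of that user's followers.
    Returns a dict mapping each downloaded user to their follower list.
    """
    graph = {}
    queue = deque([seed_user])
    queued = {seed_user}            # users already scheduled, to avoid repeats

    while queue and len(graph) < max_users:
        user = queue.popleft()
        followers = list(fetch_followers(user))
        graph[user] = followers
        for follower in followers:
            if follower not in queued:
                queued.add(follower)
                queue.append(follower)
    return graph
```

The `queued` set matters because follower graphs are full of cycles; without it the crawler would keep re-downloading the same accounts. Rate limiting and retries are left out of this sketch.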

The hard part came over the next few weeks as we realized that none of our tools were remotely capable of managing, let alone analyzing, the scale of data we so easily pulled in. As quickly as we could learn MySQL, the data set outgrew it. Sure, Dhruv and I could request supercomputer time for research, but supercomputers weren’t actually a good match: they’d be more like a rocketship when what we needed was a fleet of dump trucks. We realized what we needed was Hadoop, the tool Jeff Hammerbacher had mentioned to me a few months earlier.

But where could we set up Hadoop? The physics department’s computers were scattered all over and largely locked down. But I also had an account on the UT Math department’s computers. Their sysadmin, Patrick Goetz, was singularly passionate about enabling researchers with the tools they needed to make breakthroughs. He took the much more courageous (and time-consuming for him) route of allowing expert users to install new software across departmental machines.

What’s more, the Math department had just installed a 70-machine educational lab. During the day, it was filled with frustrated freshmen fighting Matlab and math majors making their integrals converge. From evening to 6am, however, they were just sitting there… running… inviting someone to put them to good use.

So that’s what we did; put them to good use. We set up Hadoop on each of the machines, modifying their configuration for the comparatively wussy undergrad-lab hardware, and set about using this samizdat supercluster on the Twitter user graph.

(Lucky break scoreboard: most patient graduate advisor ever; financial safety and family support; incipient critical mass of public data sets; new breakthroughs in the world; big names taking interest in the project and deciding to market it; sharing the same advisor as Dhruv; the explosion of social media data; the invention of Hadoop.)

Data Community

All through 2006-2009, people walking different paths — social media, bioinformatics, web log analysis, graphic design, physics, open government, computational linguistics — were arriving in this wide-open space, forming communities around open data and Big Data.

On Twitter, we were finally seeing what all the people in our favorite data set knew: a novel communication medium that enabled frictionless exchange of ideas and visible community. I’ll call out people like @medriscoll (CEO of Metamarkets), @peteskomoroch (Principal Data Scientist at LinkedIn), @mndoci (Product Manager of Amazon EC2), @hackingdata (Founder of Cloudera, now professor at Mt Sinai School of Medicine), @dpatil (everything), @neilkod and @datajunkie (Facebook data team), @wattsteve (Head of Big Data at Red Hat), among dozens more. It didn’t matter if someone was a random academic, a bored database engineer, a consultant escaping one field into this new one, a big name building the core technology. When you saw a person you respected talking to a person with a good idea, you hit “follow”, and you learned. And when you heard that someone in the Big Data space wasn’t on Twitter, you harangued them until they joined. (Hi, Tom!)

Meanwhile, Aaron Swartz had started the Get.theinfo Google Group. This most minor of his contributions had a larger impact than most know, and was typical of why he’s so missed. He recognized a problem (no conversation space for open-data enthusiasts), built just enough infrastructure to solve it (a Google Group and a website), then galvanized the community to take over (gifting enthusiastic members with the white elephant of moderator permissions), and offered guidance to make it grow.

The relationships we built and communities we joined became critical catalysts for our growth.

Twitter Reality

We spent the next several months building out the site during the day and running analysis on the growing hundreds of gigabytes by night (does that seem quaintly small now?). Right before Christmas break, we did a set of runs producing data that people in the community would find useful. Hours before hopping on the plane to visit my family, I finished compressing and uploading them, wrote up a minimal readme file, and posted a note to the Get.Theinfo mailing list. I knew the folks there wouldn’t mind the rough cut version, so I figured I’d mention it quietly there, but wait to do a proper release after break; after all, there was no internet where I’d be staying.

Well, two predictable things happened: 1) a huge response, far more than expected, flowing up the chain to large tech blogs and twitter-ers, and 2) a polite but forceful email from Ev Williams (Twitter’s CEO) asking us to take the data files down while they figured out a data terms-of-service. We reluctantly removed the data.

Sure, the experience was a partial success. It brought great publicity, and of course you probably caught the foreshadowing of how important Hadoop was about to become for us. But we failed at the important goal, sharing this immensely valuable data we invested months to release.

Minister of Simplicity

Now to introduce Joe Kelly into the story. Our research center decided to hire someone to build our new website, and one of the respondents to our Craigslist ad was Joe, a former UT business school student who had been working with his roommate to get their general contracting firm off the ground. He didn’t really know how to design websites, but he absolutely loved reading about the science our center was doing, so he applied.

His interview was amazing. He had the design sense of a paper bag compared to the other candidates, but every one of us left the room saying, “wow, that guy was awesome, the kind of person you just want to work with on a project”. Only Dhruv was smart enough to take the face-slappingly obvious next step: replying 1-to-1 to a later email from Joe to say, “well, hey, we also have this other project going on; we don’t really need your help on the website, but there’s a lot of work to do”. Within days, Joe had set up a bank account and PO box, organized the papers to make us an official partnership, and generally turned this ramshackle project into an infant company. It was an easy decision for Dhruv and me to make him a co-founder.

An easy decision until a few days later, when I read some cautionary article about how the #1 mistake companies make is choosing co-founders hastily. Well, hell. We just made this guy we randomly met a couple weeks ago a co-founder, handing him a huge chunk of the company. I didn’t know if we just made a huge mistake or not.

So the next day, we were hanging out at the Posse East bar (our “office” for the first several months of the company), and Joe introduced us to the idea of an Elevator Pitch. “If we’re going to be at the South by Southwest (SXSW) Conference, we need to be able to explain Infochimps”. I replied with some kind of rambling high-concept noodle. Dhruv rang in with his version — more scientific, more charm and cool, but no more useful than mine.

Joe replied, “No. What Infochimps is, is this: ‘A website to find or share any data set in the world’”.

I rocked back in my chair and knew Dhruv and I made one of the best decisions of our lives. His version said everything essential, and nothing more. In one week, he understood what we were doing better than we did after a year. Joe’s role emerged as our “Minister of Simplicity”. He removed all complications, handled all necessary details, smoothed all lines of communications, making it possible for our team to Just Hack. Everything essential, and nothing more.

Capital Factory

With the decision to move forward as a company, not an academic project, we applied to the starting class of Capital Factory (Austin’s startup accelerator). It was an amazing experience, and we went hard at it: we hit all the meetings, spent hours working on our pitch, tried to make contact with every mentor, and made an epic application video. (One of Dhruv’s housemates was a professional filmmaker. Friends in high places.)

We got great feedback and obvious interest from the mentors, and were chosen as finalists. We were confident that we had the right combination of team and big idea to merit acceptance.

They rejected us.

After the acquisition, Bryan Menell, one of the Capital Factory founders, posted a graciously bold blog post explaining what happened. As we later heard from several mentors, they each individually loved our company. Once in the same room though, they found that none of them loved the same company. This mentor loved Infochimps, a company that would monetize social media data. This other one loved Infochimps, a set of brilliant scientists who could help businesses understand their data. Some of them just knew we worked our asses off and were incredibly passionate about whatever the hell it is we were doing but couldn’t explain. A few of the mentors loved Infochimps because we were building something so cool and potentially huge that surely some business value would later emerge. Whichever idea a mentor did like, they generally didn’t like the others.

I can’t overstate how difficult it was to explain what we were doing back then. After two years, we can now crisply state what we had in mind: “A platform connecting every public and commercially available database in the world. We will capture value by bringing existing commercial data to new markets, and creating new data sets from their connections.” It’s easy(er) now, partly because of the time we spent to crystallize an explanation of the idea. Even more so, people now have had years of direct experience and background buzz preparing them to hear the idea. For example, the concept that “sports data” or “twitter data” might have commercial value was barely defensible then, but is increasingly obvious now.

Above all that though, the Capital Factory mentors were right: we were all those ideas, and all of those ideas were (as we’d find out) mostly terrible. And working on the combination of all of them was a beyond-terrible idea. On that point, Capital Factory was right to reject us.

We worked hard, had the perfect opportunity, and failed.

For good reasons and bad, we failed to get in. Or, well, we mostly failed to get in. Some of the mentors liked what they heard enough to stay in touch: meeting for beers and advice, making introductions, and being generous with their time and contacts in many other ways. The Austin startup scene was about to explode, led by Joshua Baer, Jason Cohen, Damon Clinkscales, Alex Jones and others. The energy that the Capital Factory mentors and these other leaders put into mentoring startups like ours ricocheted and multiplied within the community, in the kind of “liquid network” that Steven Johnson writes about. Although the companies within the first CapFac class benefited the most, it was like every startup in Austin was admitted.

The Truth

On the one hand, we had a bunch of fans in blog land, some website code, and a good team. On the other, we had no idea how to make money and a finite runway. Our most notable validation as a project was a failed effort to share data, and our most notable validation as a business was an honorable mention ribbon.

Are you seeing it?

We were experiencing success after success after success.

Every time we failed, a smaller opportunity opened: one that was sharper; one that was more real; one that brought us closer to the right leverage point for changing the world.

These opportunities were smaller, but the energy behind them was the same. We were following what inspired people — to use data sets from Infochimps, to post a data set, to join our pied-piper team, to tweet about us, to make an intro, to have coffee and teach us something. All our ideas were useless crap, except in one essential way: to gather and inspire the people who would help us uncover a few ideas that were good, and execute on them.

(Lucky break scoreboard: most patient graduate advisor ever; financial safety and family support; incipient critical mass of public data sets; new breakthroughs in the world; big names taking interest in the project and deciding to market it; sharing the same advisor as Dhruv; the explosion of social media data; the invention of Hadoop; the completely random intersection with Joe; starting Infochimps just as the Austin startup scene exploded.)

The 3rd part of this blog series will highlight the journey from “project that inspired people” to “business that solved a real problem” — powered by individuals who made sizable investments of time, energy, money and kindness to produce repeated successes from repeated failures, and by the early customers of Infochimps who believed in us.

As we go, that “lucky break scoreboard” will get more and more improbable, enough to make that word “lucky” ludicrously inapplicable.

Philip (Flip) Kromer is co-founder and CTO of Infochimps where he built scalable architecture that allows app programmers and statisticians to quickly and confidently manipulate data streams at arbitrary scale. He holds a B.S. in Physics and Computer Science from Cornell University and attended graduate school in Physics at the University of Texas at Austin. He authored the O’Reilly book on data science in practice, and has spoken at South by Southwest, Hadoop World, Strata, and CloudCon. Email Flip at or follow him on Twitter at @mrflip.

Reinvent Your Business for Big Data

Infomart Reinvents its Business for Big Data with Infochimps 

Over the past ten years, the media business has been turned on its head. The general shift from print to digital (and increasingly free) sources has challenged the traditional revenue model. To make matters even more complicated, the advent of social media has added a multitude of layers of interaction to digital content, making the task of determining how target audiences are responding to brand initiatives incredibly complex.

Learn how Infomart, Canada’s leading media consultancy of 25 years, reinvented their business by transforming a legacy app with Infochimps Cloud for Big Data.

Infochimps SXSW Panels: Voting Closes Tomorrow

Calling all supporters, calling all supporters, it’s that time of year again.

SXSW Panel Voting! Voting ends tomorrow, Friday, September 6, 2013 (11:59pm CST) – Please read the panel submissions below and vote for your Chimps.

Growing an Open-Source Project: Code to Community 

  • Speaker: Infochimps CTO Flip Kromer
  • Description: How do you grow an open source project from “It’s public and has a LICENSE file” to “Caught fire; people we’ve never met commit more code than we do”?
  • We’ll explore:
    • How do you promote awareness and word-of-mouth, and foster the early community?
    • How do you navigate and balance the twin goals of production stability and community-driven features?
    • How do you ensure code quality without discouraging involvement?
    • Of the values gained from open source – free velocity, hiring, credibility, reputation, and so forth – how much tangible value are you deriving and when does that return start exceeding investment?

Managing Effective Documentation Effectively

  • Speaker: Infochimps Customer Support Engineer Rachel McCuistion
  • Description: Maintaining accurate, up-to-date, and effective documentation requires time, devoted content producer(s), and expertise. The essence of a company’s documentation should not hinder but accelerate the company’s focus and productivity. We’ll discuss the importance of creating effective documentation, how to maintain a healthy lifecycle for internal and external documentation, the common pitfalls that can lead to less effective documentation, answer the most common and difficult questions, and finally introduce an effective workflow for maintaining accurate and helpful documentation including the best tools that have been proven to increase efficiency and minimize downtime.

Inbound Marketing for the Lean Startup

  • Speakers:
  • Description: Lean methodology has provided a great framework for validating your business assumptions and model. By leveraging an Inbound Marketing model with Lean, you can benchmark against your hypotheses while also growing your business in real, measurable ways. Proven lean startup veterans will teach you how to set up an inbound marketing engine and use the engine to test, validate, and grow your business using Lean tools and approaches. With over a decade of experience, we will share best practices, lessons learned, and pitfalls to look out for. This workshop will be four hours long.

Thank you for all your support and we hope to talk Big Data with you at SXSW.

Part 1: The Truth – We Failed, We Made Mistakes

As I’m sure most of you have heard, Infochimps was recently acquired by CSC, giving us the resources and mandate to build the Big Data platform of the future. This is a perfect landing for the company and our vision, and we couldn’t be more excited.

The great acquisition stories I’m familiar with have a few commonalities: the companies share a mission and vision; the acquired team works to focus their product and integrate it with the parent company’s offering; and the parent company gives them the resources to succeed without changing what enabled the acquired team to excel.

A perfect example of this was when Apple bought Siri. At that time, Siri was a cute little iPhone app, built on amazing technology and with a highly respected engineering team behind it. Married to Apple’s powerhouse strengths and global network, the result has transformed the way people interact with machines and is a centerpiece advantage of Apple’s product. Our goal is nothing less than a similar story within CSC.

CSC is a global corporation that provides information technology (IT) services and professional services. They employ 95,000 people globally, who create a $16B revenue stream serving governments and large enterprises. Our challenge, and we embrace it, is to provide a significant positive return even against that massive background.

We think that we can do so (as do many analysts) because the acquisition marries the signal strengths of Infochimps and CSC:

We live in the future:

  • Proven Big Data expertise and perspective on the technical landscape
  • An indelible culture and a crazy-awesome team
  • Solid open-source citizenship, as contributors to the projects we build on and stewards of well-adopted projects we’ve written

CSC lives at enterprise-scale:

  • 50+ years of expertise in big enterprise and security
  • Passion for building customer solutions and support
  • The resources a $16B revenue stream provides

CSC’s strengths address our biggest weaknesses, letting us focus on what we do best. There are no changes to the team, the culture, our Austin location, our open-source contributions, our development approach, our irreverence, our hiring standards, or our mission to make the world smarter. We’ll continue to operate independently, continue buying lunch for the office every day, and continue open-sourcing the majority of code we write.

So this is a huge win for our team, our customers, our investors, and CSC. I could finish the post right here, and all anyone would remember is that we persevered and reached this milestone through tenacious hard work and great ideas.

Well, here’s the truth: The actual history of our company is one of failure after failure, costly mistakes, and multiple near-death experiences. The only reason we’ve “succeeded” is through a preposterous series of lucky breaks and kind acts. Trying to list all the people behind that hard work and those lucky breaks would be foolish. There are too many, and I’ll just offend some by omission. But if you’re reading this, you’re probably one of them; so thank you.

Now for the real story; the story you probably haven’t heard. It’s a story that shows how many people, making sizable investments of time, energy, money, and kindness, it takes to turn failures into successes, and how a small favor can change the world. It’s a thank you note to those who have helped us and a love letter to other startups figuring it out as they go. It’s a reminder that this is just another chapter of Infochimps’ book, and we’re nowhere near the resolution.

Thanks and love from the co-founders and whole Infochimps team,
Flip Kromer
Infochimps Co-Founder and CTO

*Update* Flip continued this blog series with Part 2: The Lucky Break Scoreboard, where he explains “with every failure, a smaller opportunity opened: one that was sharper; one that was more real; one that brought us closer to the right leverage point for changing the world.” Read Part 2 >>

Philip (Flip) Kromer is co-founder and CTO of Infochimps, where he built scalable architecture that allows app programmers and statisticians to quickly and confidently manipulate data streams at arbitrary scale. He holds a B.S. in Physics and Computer Science from Cornell University and attended graduate school in Physics at the University of Texas at Austin. He authored the O’Reilly book on data science in practice, and has spoken at South by Southwest, Hadoop World, Strata, and CloudCon. Follow Flip on Twitter at @mrflip.
