- January 28, 2009
Hold on to your pith helmets: the Infochimps are releasing an Amazon Machine Image designed for data processing, analysis, and visualization.
Amazon’s Elastic Compute Cloud (EC2) lets users launch a virtual computer from a shared image (an “Amazon Machine Image”, or AMI). The new machine comes with a pre-installed operating system, software packages, and up to 1 TB of data already loaded on disk, ready to work with.
MachetEC2 is an effort by a group of Infochimps to create an AMI for data processing, analysis, and visualization. If you create an instance of MachetEC2, you’ll have an environment with data tools ready to go. You can load in your own data, grab one of our datasets, or pull data from one of Amazon’s Public Data Sets. No matter what, you’ll be hacking in minutes.
We’re taking suggestions for the software the community would most like to have installed on the image. So far, we’ve thought of including some subset of:
- Ruby, Python, Erlang, R
- MySQL, PostgreSQL
- AllegroGraph, CouchDB
- Hadoop, Hive, Pig
- Cytoscape, Gruff
- Processing, Prefuse/Flare, Modest Maps
- NLTK, SciPy
What other software would you like to see? Any operating system preferences? Know of any similar AMIs? (Please suggest only free and open software!)
When we feel that the AMI is getting too bloated, we’ll split it up: MachetEC2-ML (machine learning), MachetEC2-viz, MachetEC2-lang, MachetEC2-bio, &c.
(Also check out a similar discussion on the forums at FlowingData. We’ll reply to comments both here and there.)