Roll Your Own Supercomputing Cluster
Amazon has activated a beta version of a new “Elastic MapReduce” service. The naming is a little obscure, but it looks like it’s essentially the Hadoop distributed computing framework running on their existing “Elastic Compute Cloud”, a distributed virtual computing environment that runs in turn on top of their Simple Storage Service, which is a distributed server network. In other words, Elastic MapReduce is a huge, amorphous operating system designed for large computation tasks like web indexing or heavy science, running on top of a huge, amorphous processor and hard-drive farm.
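For anyone who hasn’t met MapReduce before, the programming model itself is pretty humble: a mapper turns input records into key/value pairs, a reducer folds together all the values that share a key, and the framework worries about spreading that work over however many machines you’ve rented. Here’s a rough sketch of the canonical word-count job as a pair of Hadoop streaming scripts in Python; the file names and details are my own illustration, not anything out of Amazon’s documentation.

```python
#!/usr/bin/env python
# mapper.py -- emit a (word, 1) pair for every word that arrives on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word.lower(), 1))
```

```python
#!/usr/bin/env python
# reducer.py -- streaming delivers input sorted by key, so all the counts
# for a given word arrive together; just watch for the word to change.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))
```

The same two scripts work identically on one laptop or a thousand rented nodes, which is more or less the whole appeal.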
Calling these things “elastic” is Amazon’s way of pointing out that you can buy any amount of operating system/processor/storage you care to pay for, as little or as much, with the only notional limit being the entirety of Amazon’s many large data center buildings. No fuss, no muss: give them some money and you can run your computing application for a while. Give them more and you can run it faster, longer, bigger. All you need is an internet connection to interface with it all. They won’t bother you with details like what state of the union your processes are running in, or on what hardware, or how many people are employed servicing it, swapping out burnt-out drives, and sweeping up the Diet Pepsi cans left by the programmers. It’s commodity supercomputing.
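The “interface with it all” part really is just an API call. Here’s a rough sketch of what renting a cluster looks like with the present-day boto3 client (not what was available when this beta launched); the bucket names, instance types, release label, and node count are all illustrative guesses, and the bill scales with that InstanceCount knob.

```python
# Rough sketch: launch an Elastic MapReduce job flow that runs the
# streaming word-count scripts above. All names/paths are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="word-count-example",
    ReleaseLabel="emr-6.15.0",          # assumed EMR release
    LogUri="s3://my-bucket/emr-logs/",  # hypothetical bucket
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 10,            # pay more, get more machines
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "streaming word count",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", "s3://my-bucket/mapper.py,s3://my-bucket/reducer.py",
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", "s3://my-bucket/input/",
                "-output", "s3://my-bucket/output/",
            ],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```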
I wonder whether, had I been aware of this option, it would have changed how I structured my thesis research. Maybe I could have got another little research grant and bought myself $1000 worth of supercomputing (I don’t actually know how much supercomputing that would be). What could I have done with it? Then again, most of the software I ran for that project is Windows-only and presumably locked out of Hadoop clusters. ArcGIS, Imagine, FRAGSTATS, Patch Analyst, Terrain Analysis System, GeoDA: all Windows and Windows only. Landscape ecology really needs to get off the Windows; it doesn’t scale.