Cloud Society: the compute cloud in collaborative research
October 24, 2011

That cloud computing is of use in academic and corporate research spheres is in little doubt.
Research funding doesn’t scale very well – in academia, one of the main functions of professors and department heads is to identify opportunities, build the necessary relationships and then bid for pockets of finance. Meanwhile, even for the largest companies, the days of unfettered research into whatever takes a scientist or engineer’s fancy are long since over. Is cloud computing signalling a new dawn in research circles, however?…
Much research has a software element: from aerodynamic experiments on Formula 1 rear wings to protein folding and exhaustive antibody comparisons, there is often no substitute for dedicating a few dozen servers to the job. Such tasks sometimes fall into the domain of High-Performance Computing – that is, highly tuned and customised kit that can run them at maximum efficiency. At other times, however, simply having access to hardware resources is enough – as long as the price is right.
The infrastructure-as-a-service model popularised by Amazon’s Elastic Compute Cloud (EC2) has been of enormous help to research establishments of all flavours, for a number of reasons.
For a researcher, the idea of asking for twenty correctly configured servers would have been a problem in itself: no budget, no dice. Even if the money were available, however, the kit would have to be correctly specified, sometimes without full knowledge of whether it would be enough. Consider the trade-off between the number and size of processors, coupled with the quantity of RAM: it would be all too easy to discover, in hindsight, that a smaller number of more powerful boxes would have been more appropriate.
Then come the logistical challenges. Lead times are a perennial problem: even if (and this is a big ‘if’) central procurement is running a tight ship, the job of speccing the kit, gaining authorisation and ticking the necessary contractual boxes can take weeks. Only then is a purchase order raised and passed to a supplier, who can take several more weeks to fulfil it. It is not unknown for new versions of hardware, chipsets and so on to be released in the meantime, sending the whole exercise back to the drawing board.
Any alternative to this expensive, drawn-out yet unavoidable process would be attractive. The fact that a number of virtual servers can be allocated, configured and booted up in a matter of minutes can still excite, even though the model, and indeed the service, has existed for a few years. Even better, if the specification proves to be wrong, the whole lot can be taken down and replaced by another set of servers – one can only imagine the political and bureaucratic ramifications of doing the same in the physical world.
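To make the contrast with physical procurement concrete, here is a minimal sketch of what such a request might look like against Amazon's EC2 API, using the Python boto library of the day. The region, machine image, key pair and instance type are placeholder assumptions rather than details from any of the projects mentioned here.

```python
# A minimal sketch, not taken from the article, of "twenty servers in minutes"
# via the EC2 API, using the boto library current in 2011. All identifiers
# below (region, AMI, key pair, instance type) are placeholder assumptions.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# One API call asks for twenty identically configured virtual servers.
reservation = conn.run_instances(
    "ami-12345678",            # placeholder machine image
    min_count=20,
    max_count=20,
    instance_type="m1.large",  # swap for more RAM or more cores as needed
    key_name="research-key",   # placeholder SSH key pair
)

instance_ids = [instance.id for instance in reservation.instances]
print("Requested instances:", instance_ids)

# ...run the experiment and collect the results...

# If the specification turns out to be wrong, the whole lot can be torn
# down just as quickly and replaced with a differently shaped cluster.
conn.terminate_instances(instance_ids=instance_ids)
```

The point is less the specific library than the shape of the interaction: requesting, resizing and discarding a cluster are all single calls, with no purchase orders in sight.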
The absolute cherry on top is the relative cost. As the CTO of one pharmaceutical company put it to me: "And here’s the punchline: the whole lot – running the process, getting the answers and moving on – cost $87. Eighty-seven dollars." He shook his head as though he still couldn’t believe it.
While such experiences are still recent enough to cause even the most dour of professors to break into a smile, evidence suggests they are just the beginning. In many such instances, cloud resources are enabling algorithmic processing where cost might otherwise have been prohibitive, but researchers are still taking a reasonably traditional view of server-based computing and mapping it onto the cloud through virtualisation.
The opportunity – following the "things we haven’t thought of yet" maxim – is to take things further than a simple virtual like-for-like for individual projects. A virtuous relationship is evolving between the use of cloud resources and the increasingly collaborative nature of research, spawning shared tools and facilities such as the Galaxy project and Arch2POCM.
While such moves are to be welcomed, some are concerned that they ignore one particular source of innovation – the maverick idealist who is concerned with the really big challenges, rather than just trying to solve one facet. Last week, Cycle Computing’s Jason Stowe announced the Big Science Challenge 2011, offering 8 hours of CPU time on a 30,000-core cluster to whoever could come up with a big question. "No idea is too big or crazy," says the release; "We want the runts, the misfits, the crazy ideas that are normally too big or too expensive to ask, but might, just might, help humanity."
The final entry date is 7 November 2011. Whatever the result, the challenge already illustrates that the answers we reach depend on the questions we ask, which in turn are shaped by the tools we have available. Times are changing, however: the computing and collaboration models now available give researchers an unprecedented opportunity to ask previously unaskable questions, and truly deliver on the potential for innovation.


