VMware aims for Hadoop on VMs with ‘Serengeti’ project
VMware is launching a new open-source project, called “Serengeti,” that aims to let the Hadoop data-processing platform run on the virtualization leader’s vSphere hypervisor. The company apparently smells a lucrative opportunity in growing enterprise interest in Hadoop, and is not about to miss out on it. Serengeti is just one of several moves VMware has made lately to make big data and virtualization software play nice together.
The company explained the thinking behind Serengeti in a press release:
By decoupling Apache Hadoop nodes from the underlying physical infrastructure, VMware can bring the benefits of cloud infrastructure – rapid deployment, high-availability, optimal resource utilization, elasticity, and secure multi-tenancy – to Hadoop…
That sounds great, and most Hadoop distributions currently fall short on all of those features, but there are some significant limitations to running Hadoop on virtual resources (this tutorial from Apache’s Hadoop Wiki lays out the pros and cons as they currently stand)…

