HPCC Systems tunes big data platform for AWS

November 30, 2011, by David
Grazed from GigaOM.  Author: Derrick Harris.

HPCC Systems, the division of LexisNexis that’s pushing a big-data processing-and-delivery platform, has tuned its software to run on Amazon’s cloud computing platform. Interested developers can now experiment with the open source software without having to wrangle physical servers for that purpose, which brings HPCC one step closer to establishing itself as a viable alternative to the uber-popular Hadoop framework.

When I last spoke with HPCC Systems CTO Armando Escalante in September, he explained that although he thinks his company will have little trouble attracting risk-averse large enterprise and government customers, it will be tougher to establish a developer ecosystem similar to what Hadoop has built. As good as HPCC might be — and at least some analysts are starting to sing its praises — having a vibrant community goes a long way…

Hadoop has no shortage of startups, large vendors and individual developers committed to it already. That gives potential users the confidence that not only will Hadoop products be supported for a long time, but that the code will continue to improve and interoperate across a variety of different vendors’ data products, Hadoop-based or not.

Microsoft killing its Dryad data-processing platform to focus on Hadoop opened a door for HPCC Systems, but also served to block its entry into the room. Now there are really only two unstructured-data processing platforms of note, but having Microsoft on the Hadoop bandwagon is yet another sign that Hadoop is for real.

Making HPCC run on AWS, or any cloud, is a good start for HPCC Systems, as it gives developers a low-risk way to get started on the platform. It will be even more appealing when HPCC’s software is supported by AWS’s Elastic MapReduce service, which the company says is the next step. Assuming one has data there to work with, the cloud is a great place to get started with big data tools, because they generally require server clusters that, in the short term, are cheaper to rent than to buy.

Technically, HPCC has only tuned a portion of its platform — the Thor Data Refinery Cluster — to run on Amazon Web Services, but that’s the part that matters most. Thor does the data-processing for HPCC, which makes it the apples-to-apples comparison with Hadoop. The platform also consists of the Roxie Data Query Cluster, a data-warehouse and query layer that’s akin to the higher-level Hive and HBase projects that have been developed for Hadoop.

HPCC Systems is quick to point out, however, that its entire platform uses a single language, Enterprise Control Language, whereas Hadoop itself uses MapReduce and projects such as HBase and Hive each have their own languages.