Dedupe and the Cloud: Is There a Problem Here?

June 14, 2011 | By David
Object Storage
Grazed from IT Business Edge.  Author: Arthur Cole.

One of the most effective tools to emerge over the past 10 years for managing growing data loads is deduplication. Figures vary, but it is not hard to imagine redundant data rates at some enterprises hitting the 50 percent mark, particularly in email systems.

But just because dedupe is good technology, does it follow that it is also good policy? Increasingly, flags are being waved that dedupe can produce a number of unintended consequences, particularly of the regulatory and legal variety.

Right now, the biggest challenge for dedupe is the cloud. Once you start to blur the distinction between what the client enterprise provides for itself and what the host offers as a service, you start to cross some very fine lines between secure and vulnerable.

For instance, we have cloud providers like Zetta, which recently began offering new dedupe capabilities on its Zetta Data Protect for Linux platform. The service is delivered through the system's client-side agent, known as ZettaMirror, which provides transport deduplication that the company says cuts redundancy by 99 percent and vastly improves cloud-based backup and replication performance.
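
To make the idea concrete, here is a minimal sketch of how client-side transport dedupe generally works: the agent hashes each chunk locally and ships only the chunks the backup target does not already hold. The fixed-size chunking and the server_has/upload_chunk hooks are hypothetical placeholders for illustration, not Zetta's actual protocol.

```python
# Sketch of client-side ("transport") dedupe, assuming a simple
# content-hash protocol. server_has() and upload_chunk() are hypothetical
# callables supplied by the caller, not a real vendor API.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size chunks, for simplicity

def chunk_hashes(path):
    """Yield (offset, sha256 hex digest, chunk bytes) for each chunk of a file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield offset, hashlib.sha256(chunk).hexdigest(), chunk
            offset += len(chunk)

def backup(path, server_has, upload_chunk):
    """Upload only the chunks the backup target does not already hold."""
    sent = skipped = 0
    for offset, digest, chunk in chunk_hashes(path):
        if server_has(digest):          # chunk already stored remotely
            skipped += 1
        else:
            upload_chunk(digest, chunk)  # ship the new chunk over the wire
            sent += 1
    return sent, skipped
```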

As long as dedupe services are limited to individual customers' data, the cloud shouldn't be any more or less secure than traditional enterprise infrastructure. Cross that line, and you could have the feds on your tail, as file-sharing service Dropbox found out last month. The company was the target of an FTC complaint alleging that it was deduping files and then linking to existing copies residing in other customers' data sets. The company has disputed the claims, but it now faces a regulatory hassle nonetheless.
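
The privacy worry is easier to see with a toy example. In the sketch below the dedupe index is shared across all customers, which is an assumption made purely for illustration and not a description of Dropbox's actual system; the point is that a shared index makes the upload path itself reveal whether anyone else already stores a given file.

```python
# Illustrative sketch of cross-customer dedupe and the information it leaks.
import hashlib

global_index = {}    # sha256 -> blob, shared across ALL customers
customer_files = {}  # customer_id -> list of digests (pointers, not copies)

def upload(customer_id, data: bytes) -> bool:
    """Store data for one customer; return True on a dedupe hit."""
    digest = hashlib.sha256(data).hexdigest()
    hit = digest in global_index
    if not hit:
        global_index[digest] = data  # the first copy anywhere is the only copy stored
    customer_files.setdefault(customer_id, []).append(digest)  # everyone else keeps a pointer
    return hit

# A curious customer can "test" whether some other tenant already holds a
# specific file simply by uploading it and watching for the dedupe hit.
print(upload("alice", b"quarterly-report.pdf contents"))  # False: new blob
print(upload("bob",   b"quarterly-report.pdf contents"))  # True: someone had it
```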

If the feds are going to start nosing around in other people's dedupe services, they might want to keep tabs on their own use of the technology. According to Carahsoft Technology, which provides dedupe and other IT services to government agencies, more than 90 percent of government IT officials have ranked the technology high on the list of priorities for the coming year, with more than 60 percent already in the planning stage. The survey indicated that the government data load in high-volume environments is expanding at close to 30 percent a year. Couple that with Washington's stated goal of converting much of its existing IT infrastructure to the cloud, and the potential exists for some government agencies to engage in the very same activity that Washington scolds private industry for.

Despite any perceived risks, however, dedupe will very likely be too crucial to ignore in the near term. ExaGrid, for example, has pushed the capacity of its EX1300E backup appliance to 13 TB, a 30 percent improvement over existing models. Ten units in a grid architecture, then, provide a whopping 130 TB while cutting power and cooling requirements in half.
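
As a rough sanity check of those figures, here is a back-of-the-envelope sketch; it is not ExaGrid's own sizing math, and it assumes the 50 percent redundancy rate cited earlier translates to roughly a 2:1 dedupe ratio.

```python
# Back-of-the-envelope capacity math using the numbers quoted above.
appliances = 10
tb_per_appliance = 13
raw_capacity_tb = appliances * tb_per_appliance     # 130 TB of physical capacity

redundant_fraction = 0.5                             # "50 percent" redundant data
dedupe_ratio = 1 / (1 - redundant_fraction)          # ~2:1
effective_logical_tb = raw_capacity_tb * dedupe_ratio  # ~260 TB of logical data

print(raw_capacity_tb, effective_logical_tb)         # 130 260.0
```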

Greater capacity with an equal or smaller footprint: That’s been the guiding force behind nearly all enterprise technology development over the past five years. If dedupe does start to raise eyebrows in legal and regulatory circles, let’s hope the problem can be corrected so we don’t lose one of the most valuable tools in the drive for data center efficiency.