Murphy’s Law of Disaster Recovery Strikes Retailer
August 16, 2010
In July, American Eagle Outfitters — which has a market capitalization of $2.52 billion — had the sort of apocalyptic outage that many companies fear. Its e-commerce site was down for eight days. Both a primary and a secondary storage system failed, backups wouldn’t restore properly, and a disaster recovery site wasn’t provisioned properly.
These are the sorts of nightmare scenarios that drive many companies to turn to outsourcers, data center specialists that have the resources to protect against this sort of worst-case scenario.
Ironically, that is just what American Eagle had done, as my friend and former colleague Evan Schuman reported on Storefront Backtalk. The e-commerce site was hosted in an IBM Corp. (NYSE: IBM) data center, based on software from Oracle Corp. (Nasdaq: ORCL).
Keeping a database like the one that powers American Eagle’s e-commerce and mobile commerce sites running, with over 400 gigabytes of data, is a non-trivial task, to put it lightly. In a Webinar I recently did with Peter Eicher, product marketing manager for backup software provider Syncsort Inc., he said that companies globally spend $5.9 billion on data protection, and yet they still don’t feel confident in their backup systems.
That’s for good reason: Older backup systems can’t keep up with the growth in data that has to be backed up. Traditional tape backups can’t finish a full nightly backup off-hours. Even government agencies that have a legislative and regulatory imperative to have “continuity of operations” find their systems and plans fall short of what’s really needed with the huge growth in data they depend on for operations.
Even more reason, then, to call in the pros to help. But apparently IBM’s hosting teams felt that having a single secondary system ready was enough protection — I mean, how often do two storage systems go down at the same time, right? And they were confident enough in the resiliency of their data center to, as a source told Evan Schuman, allow the recovery site to “fall off the priority list in the past few months,” without completing the recovery site installation of Oracle Data Guard software.
This kind of oversight is typical with roll-your-own data center operations. Operational requirements get budget before data protection all the time, because backup systems don’t inherently have income dollars associated with them — until you have an eight-day outage that costs you millions in sales.
Evan and I talked about the outage. And we agreed that American Eagle had done everything right, strategically. And, hopefully, it had a good set of quality-of-service clauses in its hosting agreement with IBM, so it can recoup some of its business losses.
The one thing that American Eagle didn’t do, perhaps, was ensure that IBM followed through on implementing and testing disaster recovery plans. If you’re outsourcing any part of your IT operations, you should make sure that you include some sort of mandatory disaster recovery testing as part of your service-level agreement, to ensure you get a chance to see how well your provider can respond to a catastrophic failure.
Sure, the odds are a million to one against the sort of failure that American Eagle had. But when you roll the dice a million times a day, those odds don’t seem quite as long.
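To put rough numbers on that closing point, here is a minimal sketch. It assumes each "roll" is an independent event with a one-in-a-million chance of failure, which is an idealization for illustration and not a measured failure rate for American Eagle, IBM, or any real system. The function name `prob_at_least_one` is made up for this example.

```python
import math


def prob_at_least_one(p: float, trials: int) -> float:
    """Chance that an event with per-trial probability p happens at least
    once in `trials` independent trials: 1 - (1 - p) ** trials."""
    # log1p/expm1 keep the result accurate when p is tiny and trials is huge.
    return -math.expm1(trials * math.log1p(-p))


# Assumed figures: one-in-a-million odds, a million rolls per day.
p = 1e-6
rolls_per_day = 1_000_000

for days in (1, 7, 30):
    chance = prob_at_least_one(p, rolls_per_day * days)
    print(f"{days:>2} day(s): {chance:.1%} chance of at least one failure")
```

Under those assumptions, a single day of a million rolls already carries roughly a 63 percent chance of at least one "million to one" failure, and over a week it is close to a certainty.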