Infrastructure Key to Google’s No-Downtime Guarantee

January 15, 2011 By David
Grazed from GigaOM.  Author: Derrick Harris.

Google blogged this morning about a new no-planned-downtime policy for Google Apps, a promise it’s able to make because of its globally distributed infrastructure, estimated at more than 1 million servers. Unlike many SaaS infrastructures, and certainly many on-premise application environments, Google’s expansive infrastructure gives it multiple options for migrating workloads during planned downtime on a given set of servers or a specific data center.

Google was inspired to make the change after a year in which its flagship application, Gmail, experienced overall availability of 99.984 percent. As blog author Matthew Glotzbach points out, that translates to an average of 7 minutes of downtime per month, which is far better than most on-premise email systems, including Microsoft Exchange. However, the post doesn’t include comparisons to competing hosted email options, such as Microsoft BPOS or IBM LotusLive. One potentially big competitor, Microsoft Office 365, is still in beta, so an accurate uptime comparison can’t be made.
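For readers who want to check the arithmetic behind that 7-minute figure, here is a minimal sketch of converting an availability percentage into downtime per month. The function name and the 30-day month are assumptions made for illustration; they do not come from Google's post, and a different month length shifts the result slightly.

```python
def downtime_minutes_per_month(availability_percent, days_in_month=30.0):
    """Convert an availability percentage into minutes of downtime per month.

    days_in_month=30.0 is an illustrative assumption, not a figure from the post.
    """
    minutes_in_month = days_in_month * 24 * 60
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return minutes_in_month * unavailable_fraction


# 99.984 is the Gmail availability figure quoted in the article.
for pct in (99.984, 99.9):
    minutes = downtime_minutes_per_month(pct)
    print(f"{pct}% availability -> about {minutes:.1f} minutes of downtime per month")
```

Under the 30-day assumption, 99.984 percent works out to just under 7 minutes a month, consistent with the figure Glotzbach cites. A 99.9 percent service level, by the same arithmetic, allows roughly 43 minutes.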

Google hasn’t been too forthcoming about its processes for migrating workloads from place to place, but a 2009 interview with SVP of Operations Urs Hölzle does shed some light on how the company uses its global footprint to route around both server-level and data-center-level issues. If the company is able to handle unforeseen outages fairly smoothly, it stands to reason that it can route around planned downtime without issue.

It will be interesting to see if Microsoft — Google’s primary rival in the cloud services space — matches Google’s promise of no planned downtime. Microsoft, too, has a large server footprint distributed across the world. It’s arguable that Microsoft already has the better SLA anyhow, as Microsoft is promising a “financially backed” 99.9 percent SLA for Office 365, whereas Google compensates for below-SLA service levels with free days tacked onto the end of the service term.