Best Options for Protecting SQL Server in the AWS Elastic Compute Cloud
July 15, 2019
Written by Dave Bermingham, Technical Evangelist at SIOS Technology
Amazon Web Services leads the industry with over half of the Windows Server instances deployed in the public cloud, according to IDC. As the migration from purely private to public and hybrid cloud arrangements continues, confidence in the cloud has increased substantially, so much so that system and database administrators are now migrating mission-critical SQL Server database applications to AWS on both Windows Server and Linux. But challenges remain in providing adequate protection for the data and assuring rapid recovery from failures.
This article provides some information administrators will need to make prudent choices for protecting SQL Server databases running in the AWS cloud, beginning with an understanding of the various high availability (HA) and disaster recovery (DR) options available.
HA/DR Options Available within and for AWS
Understanding the options requires recognizing that the AWS Elastic Compute Cloud (EC2) offers no provisions for HA and/or DR at the database and application levels. Period. There are, however, services and capabilities that have important roles to play in providing HA and DR protections for SQL Server.
The fundamental AWS building block for high availability is the Availability Zone (AZ). AWS makes multiple AZs available in every Region, and these are interconnected via low-latency, high-throughput networks to enable synchronous data replication. Being able to replicate the database synchronously assures that the standby instance is always “hot” and ready to take over immediately should a failure occur in the active instance.
When using multiple Availability Zones, the AWS Service Level Agreement guarantees an uptime of four nines, or 99.99 percent. But the SLA takes a rather narrow view of what constitutes uptime. Explicitly excluded is any downtime caused by “factors outside of our reasonable control” (e.g. natural disasters), “actions or inactions of you or any third party” (i.e. human error), and “third party equipment, software or other technology” (e.g. SQL Server). In effect, AWS only guarantees “dial tone” or, more specifically, that at least one EC2 instance will have external connectivity. In other words, any failures in a database or in any applications accessing the data are not counted, or even detected for that matter.
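To put the four-nines figure in perspective, the short sketch below converts an availability percentage into the downtime it still permits. It is only illustrative arithmetic: the function name, the 365-day year and the 30-day month are assumptions made for this example and are not taken from the AWS SLA, and the result says nothing about the exclusions described above.

```python
# Illustrative arithmetic only: names and period lengths are assumptions
# chosen for this example, not terms defined by the AWS SLA.
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 (leap years ignored)
MINUTES_PER_MONTH = 30 * 24 * 60    # 43,200 (assumes a 30-day month)


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) still permitted within a period
    at the given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    yearly = downtime_budget_minutes(99.99)
    monthly = downtime_budget_minutes(99.99, MINUTES_PER_MONTH)
    print(f"99.99% allows about {yearly:.2f} minutes of downtime per year")
    print(f"99.99% allows about {monthly:.2f} minutes of downtime per 30-day month")
```

Under these assumptions, four nines leaves a budget of roughly 52.6 minutes per year, or a little over four minutes per month, and that budget covers only the kinds of outage the SLA actually counts.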
So while it is advantageous to leverage AWS Availability Zones, additional provisions are needed to ensure adequate protection for SQL Server, which offers two of its own options: Always On Failover Cluster Instances (FCIs) and Always On Availability Groups. FCIs afford two major advantages: they are included in the Standard Edition, and they protect the entire SQL Server instance, including system databases. A major disadvantage is the need for cluster-aware shared storage, which is not available in the AWS cloud.
Always On Availability Groups replaced database mirroring in SQL Server 2012, and this feature is also included in SQL Server 2017 for Linux. This is SQL Server’s more robust HA/DR offering, capable of delivering rapid failovers with no data loss. But this option lacks protection for the entire SQL Server instance, and requires licensing the more expensive Enterprise Edition for Windows Server, making it cost-prohibitive for many applications.
A notable disadvantage of application-specific options like Always On Availability Groups is the need for administrators to implement other HA and/or DR solutions for all non-SQL Server applications. Having multiple solutions inevitably increases complexity and costs, leading many administrators to choose application-agnostic third-party failover clustering solutions for Windows Server and Linux that are purpose-built for HA and DR. These solutions are implemented entirely in software, which allows failover clusters to be created in the cloud without any need for shared storage, and they provide automatic failover to assure high availability at the database and application levels.
The diagram shows a popular AWS configuration that provides both HA and DR protections in a Virtual Private Cloud (VPC) that spreads three SQL Server instances across multiple Availability Zones and Regions. For the two-node HA cluster in Region A, data replication is synchronous and failovers can occur automatically. The third instance in Region B uses asynchronous data replication and a manual recovery process to protect against widespread disasters. Note how this configuration also overcomes yet another limitation, this one in the Standard Edition of SQL Server: a maximum of two FCI nodes in a failover cluster.
This common configuration of a SANless failover cluster consists of a two-node HA cluster spanning two Availability Zones, with DR protection provided by a third instance deployed in a separate Region.
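The three-node layout described above can be written down as a small data model. The sketch below is not the configuration format of any clustering product or AWS API; the class names, node names, and Region and Availability Zone labels are all hypothetical. It simply encodes the rule stated in the text: replication within a Region is synchronous with automatic failover, and replication across Regions is asynchronous with manual recovery.

```python
# Hypothetical model of the topology described in the article.
# All class names, node names, and Region/AZ labels are illustrative.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    zone: str


@dataclass(frozen=True)
class ReplicationLink:
    source: Node
    target: Node

    @property
    def mode(self) -> str:
        # Same Region: low-latency links between AZs allow synchronous replication.
        same_region = self.source.region == self.target.region
        return "synchronous" if same_region else "asynchronous"

    @property
    def failover(self) -> str:
        # Cross-Region recovery is a manual process in the described setup.
        same_region = self.source.region == self.target.region
        return "automatic" if same_region else "manual"


def find_problems(links: List[ReplicationLink]) -> List[str]:
    """Flag links whose two nodes share a Region and an Availability Zone,
    since such a pair would not survive the loss of that zone."""
    problems = []
    for link in links:
        a, b = link.source, link.target
        if a.region == b.region and a.zone == b.zone:
            problems.append(f"{a.name} and {b.name} share zone {a.zone}")
    return problems


primary = Node("sql-1", region="region-a", zone="az-1")
ha_standby = Node("sql-2", region="region-a", zone="az-2")
dr_standby = Node("sql-3", region="region-b", zone="az-1")

links = [ReplicationLink(primary, ha_standby), ReplicationLink(primary, dr_standby)]

if __name__ == "__main__":
    for link in links:
        print(f"{link.source.name} -> {link.target.name}: "
              f"{link.mode} replication, {link.failover} failover")
    print("problems:", find_problems(links))
```

Deriving the replication mode from the placement of the nodes, rather than storing it as a separate field, keeps the model from describing a synchronous link across Regions, which the configuration above does not use.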
It is also possible to have two- and three-node SANless failover clusters in hybrid cloud configurations for HA and/or DR purposes. One such three-node configuration is a two-node HA cluster located in an enterprise datacenter with asynchronous data replication to the AWS cloud for DR protection—or vice versa.
Confidence in the AWS Cloud
With 66 Availability Zones spread across 21 Regions (as of this writing), the secure AWS Global Infrastructure affords enormous opportunity to provide carrier-class protection for SQL Server databases by configuring SANless failover clusters with multiple, geographically dispersed redundancies. With a purpose-built solution, however, such carrier-class high availability need not come at a carrier-class cost. Because failover clustering software makes effective and efficient use of EC2’s compute, storage and network resources, these solutions help make HA and DR protections more affordable for more applications than ever before.
About the Author
David Bermingham is Technical Evangelist at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past 8 years: 6 years as a Cluster MVP and 2 years as a Cloud and Datacenter Management MVP. David holds numerous technical certifications and has more than thirty years of IT experience, including in finance, healthcare and education.