Cloud Computing: Multi-Tenant Data Strategies for Windows Azure – Part 1

February 13, 2012 Off By David
Object Storage
Grazed from Business Cloud9.  Author: Steve Morgan.

Cloud platforms are inherently multi-tenanted. When you sign up for the services of a Cloud provider, you’re a tenant of theirs. If you’re ready to embrace Cloud Computing, being a tenant in a multi-tenant world is a key feature that you need to accept. 

The degree of isolation demanded by your customers may be driven by cost, legislation, security requirements or degrees of paranoia. As a SaaS vendor, it is important to understand these concerns and to have a well thought out solution for the particular sector that your application targets.
 
Considerations
 
Choosing the ideal tenancy model for your application involves a considerable number of factors to be taken into account. Figure 1 is a visual aid that I use to help explain the process…

 
Figure 1 – Tenancy Considerations
 
As you can see, there’s quite a lot to consider. Strategy, volume and value are high-level characteristics that I’ve not got so much to say about specifically. If you have many, low value (in terms of potential revenue) tenants who are happy to share everything, a multi-tenant approach is a good starting point. If you have just a few, high-value tenants who insist on sharing nothing, multi-tenancy is unlikely to lead to the ideal solution. 
 
In this article, I shall concentrate on one particular aspect: data. Most online applications are all about data and most of our customers are especially concerned about how their data is handled. I intend to talk about identity, business logic, presentation and networks in future articles.
 
Data
Windows Azure offers a range of data storage capabilities, including Windows Azure Storage for structured or unstructured, non-relational data and SQL Azure for relational data. If you’re considering moving an existing application to Windows Azure, I’d hazard a guess that you’re less likely to be hosting your data in anything like Windows Azure Storage. Chances are, it’s a relational database and if you’re going to have an easy time of it, hopefully it’s Microsoft’s SQL Server. I’m going to consider data isolation in SQL Azure to host relational data within a multi-tenant environment.
 
Row Isolation
In the ‘shared everything’ model, the highest level of consolidation (and thus where most value can typically be extracted from the platform) can be achieved by storing all your tenants’ data in a single database. Each tenant is allocated a unique ID (I have a whole separate topic on how to identify the tenant) and a tenant’s data is keyed on that ID. The defining characteristic here is that data for all tenants is interspersed. This is the aspect that is most likely to cause concern for customers who are worried about privacy.
 
Another consequence of data for multiple tenants being stored in a single database is that everybody’s data gets backed-up and restored together. That’s not necessarily so bad for the backup itself (unless one of your customers wants an off-site copy), but it does make it much more difficult to restore the data for an individual customer. 
 
You should ensure that any service level agreements recognise that backups in this case are specifically for the purposes of service recovery. If a customer asks you to roll back their data, there’s likely to be significant effort and cost involved.
 
Schema versioning, going hand-in-hand with code versioning, can also be more complicated using this model. Any schema changes are going to affect all of your customers and that can make it difficult to progress enhancements through your application.
 
On the flip-side, row isolation is a straightforward pattern for your developers to understand and implement; the implications of data being scoped to a tenant are made explicit by making the tenant ID a mandatory column for every entity.
 
You may have to consider that the maximum size of a SQL Azure database is constrained. Currently, a single SQL Azure database can be up to 150GB. If it looks as though your database is going to exceed this maximum size, you’ll need to split it. Migrating data for a subset of your tenants could be complex.

Schema Isolation
Schema level isolation involves using a single database to store data for each of your tenants, but storing them in separate tables. Tables (and views, stored-procedures, etc.) are scoped to a tenant-specific schema which is determined at runtime from the tenant context of the user. 
 
One advantage of this approach is that it is implemented at the database level and by-and-large is transparent to the application. The application is responsible for using a tenant-specific database connection string and care must be taken to ensure that any references to specific schema are removed (which may involve hand editing your Entity model, if you’re using Entity Framework, for example). Database access will then take place within the context of the default schema defined for the tenant. 
 
It is necessary to ensure that your application code opens and closes database connections quickly, as it will be switching tenant contexts (and thus, connection strings) between requests. However, that’s a standard requirement for Windows Azure applications, but one worth restating; SQL Azure will throttle your application if you leave connections open.
 
However, there are limitations with this pattern. Firstly, there are practical scaling limitations with this approach; the total number of tables allowed in a single SQL Azure database is 149. If your application requires 10 tables for each tenant, you’re only going to be able to support 14 customers with a single database instance.
 
Secondly, there is significant scope for really awkward bugs to creep in. If code runs under a user context with db_owner rights, rather than a tenant- and schema-specific one, it’s relatively easy for data corruption to occur.
 
The same backup and restore issues that afflict the row isolation pattern apply equally to applications that use schema isolation. Restoring data for the entire application is easy; restoring data for a single customer is complex and time-consuming.
 
You’re limited to the same database size constraints as the row isolation pattern; currently 150GB. 
 
If you have a relatively small number of tenants, this can be a good choice of pattern. But it won’t scale well unless you combine it with one or more of the other patterns.
In the second part of this blog Iwe’ll explore separate databases, separate servers and how, by using a hybrid pattern, you can provide a tiered service offering for clients that are willing to share a multi-tenant system.