On-Premise Data and Cloud Computing Integration Patterns
November 21, 2011. Grazed from Sys Con Media. Author: Srinivasan Sundara Rajan.
We are seeing increased commitment to security from cloud providers, as more and more of them certify their cloud offerings against standard security regulations such as:
- ISO/IEC 27001:2005 standard
- SAS 70 Standard
- Safe Harbor Certification
- HIPAA Compliance
This should give cloud consumers increased confidence in utilizing the various SaaS and BPaaS offerings in the cloud.
However, a few customers are also interested in utilizing the massively scalable COMPUTE (i.e., processing) power of the cloud while keeping their STORAGE (i.e., data) in the data center (i.e., on-premise).
This article analyzes data integration patterns that let enterprises keep their data inside the data center while utilizing the processing power of the cloud. Again, this is just another option: enterprises should be confident in the security initiatives of the major vendors and consider moving nonstrategic applications fully into the cloud to get its full benefits.
Cloud Cache Pattern
Certain providers supply read-only data against which clients run many repeated queries and analytics. Some examples of such services are listed below:
- The National Change of Address (NCOA) database provided by the United States Postal Service. When a business, family, or individual files a change of address with USPS, that information is added to the vast NCOA database.
- Weather data feeds, such as climatic conditions, forecasts, etc.
While this list can be extended depending on industry needs, the underlying data pattern is that cloud resources act as a read-only cache for the gold-source data that lives in on-premise databases. The analytics perform much faster because the data exists in memory, and the cache can be refreshed from the on-premise databases.
Amazon Web Services recently launched a service called Amazon ElastiCache, which provides an in-memory cache for applications hosted on the Amazon cloud. Multiple configurations are available, including High-Memory cache nodes for storing large amounts of data.
There is also the popular memcached framework, which provides a simple network-based key-value store. Memcached is most commonly used as a cache for the results of recent database queries, which significantly reduces the load on the database and the need to scale up the back-end system. Memcached is typically organized as a write-through cache: updates are written both to memcached and to the back-end database, and query logic first checks memcached to see if the result is already available.
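The write-through organization described above can be sketched in a few lines. This is a toy illustration using plain dictionaries to stand in for memcached and the back-end database; the class and method names are illustrative, not part of any memcached client API.

```python
class WriteThroughCache:
    def __init__(self):
        self.cache = {}     # stands in for memcached
        self.database = {}  # stands in for the back-end database

    def write(self, key, value):
        # Write-through: updates go to both the cache and the database.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key):
        # Query logic checks the cache first; on a miss, fall back to
        # the database and populate the cache for subsequent reads.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

store = WriteThroughCache()
store.write("zip:10001", "New York, NY")
print(store.read("zip:10001"))  # served from the cache: New York, NY
```

The same read path works unchanged if an entry is evicted from the cache: the miss falls through to the database and repopulates the cache.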
The Windows Azure AppFabric Caching service provides the necessary building blocks to simplify these challenges without having to learn about deploying and managing another tier in your application architecture. In a nutshell, the caching service is the elastic memory that your application needs to increase its performance and throughput by offloading the pressure from the data tier and the distributed state so that your application is able to easily scale out the compute tier.
While this pattern is very much viable, we may want to analyze the APIs and other third-party options for connecting to the on-premise databases, loading them into the cache, and keeping the cache validated. The following diagram provides a contextual view of a cloud cache working against an on-premise database.
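One common way to keep a cache "validated" against the on-premise gold source is a time-to-live: each entry carries a fetch timestamp and is re-fetched once it expires. The sketch below assumes a hypothetical `fetch_from_gold_source` function standing in for whatever connector reaches the on-premise database (e.g., an NCOA lookup); the names and the 300-second TTL are illustrative.

```python
import time

TTL_SECONDS = 300  # refresh window; tune to how stale reads may be

def fetch_from_gold_source(key, now):
    # Placeholder for an on-premise lookup against the gold source.
    return {"key": key, "fetched_at": now}

_cache = {}

def cached_lookup(key, now=None):
    # Serve from the cache while the entry is fresh; otherwise
    # re-fetch from the on-premise database and re-stamp it.
    now = time.time() if now is None else now
    entry = _cache.get(key)
    if entry is None or now - entry["fetched_at"] > TTL_SECONDS:
        entry = fetch_from_gold_source(key, now)
        _cache[key] = entry
    return entry

entry = cached_lookup("zip:10001")
```

A TTL trades some staleness for far fewer round trips to the data center; event-driven invalidation is the alternative when the source can push change notifications.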

Cloud Intermediary Pattern
In this pattern the cloud platform acts as an intermediary for data transfer between on-premise databases, or between on-premise databases and third parties. This is ideally implemented through the PaaS (Platform as a Service) service model, especially in business-to-business scenarios.
For example, Windows Azure AppFabric Service Bus provides secure messaging and connectivity capabilities that enable building distributed and loosely coupled applications in the cloud, as well as hybrid applications spanning on-premises and the cloud. It supports various communication and messaging protocols and patterns, and saves the developer from worrying about delivery assurance, reliable messaging, and scale. The following diagram, courtesy of Microsoft, provides a high-level view of how the AppFabric Service Bus works.
Refer to the vendor documentation at http://www.microsoft.com/windowsazure/features/servicebus/ for further information.
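The essence of the intermediary pattern can be sketched with a toy broker: two on-premise parties exchange messages through a cloud relay that holds nothing beyond the in-flight message. This is a minimal stand-in for the idea, not the AppFabric Service Bus API; the class and topic names are invented for illustration.

```python
from queue import Queue

class CloudBroker:
    """Toy cloud relay: topics map to in-flight message queues."""
    def __init__(self):
        self.topics = {}

    def publish(self, topic, message):
        # An on-premise sender pushes a message into the named topic.
        self.topics.setdefault(topic, Queue()).put(message)

    def consume(self, topic):
        # A receiver pulls the next message, or None if nothing waits.
        q = self.topics.get(topic)
        return q.get_nowait() if q and not q.empty() else None

broker = CloudBroker()
# On-premise system A pushes a record through the broker...
broker.publish("orders", {"order_id": 42, "status": "shipped"})
# ...and on-premise system B (or a business partner) pulls it.
print(broker.consume("orders"))  # {'order_id': 42, 'status': 'shipped'}
```

The real services add what this sketch omits: authentication, delivery assurance, and durable queues that survive broker restarts.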

Informatica Cloud also provides a SaaS platform to support the Cloud Intermediary pattern. It has the following characteristics:
- Informatica Cloud Services metadata repository is hosted and managed by Informatica in a designated third-party, secure data center that is independently audited to be SAS-70 Type II compliant.
- Data is not stored on Informatica’s or its data center’s servers
- Data does not leave your system or cross your firewall until your users order it to move bi-directionally between approved SaaS providers
The following diagram, courtesy of the vendor, provides a view of how Informatica achieves the Cloud Intermediary pattern. This is in fact quite a robust solution, with integration between several popular applications such as SAP, Oracle E-Business Suite, Microsoft Dynamics, Salesforce, and others.
Refer to the vendor documentation, http://www.informaticacloud.com/products/architecture-and-security.html, for further information.

Cloud MPP (Massively Parallel Processing) Pattern
One of the most highly rated benefits of the cloud is its ability to bring high-performance computing to any consumer without the need to buy a supercomputer, such as a Cray or other high-cost machine. In that context, this pattern enables enterprises to use the cloud as a temporary store: crunch massive amounts of data, arrive at the results, delete the source and intermediary files, and use the analytics results for further purposes.
Apache Hadoop is a scalable, fault-tolerant system for data storage and processing. Hadoop is economical and reliable, which makes it perfect to run data-intensive applications on commodity hardware.
Hadoop has many use cases suited to large enterprises that need to process petabytes of data and derive meaningful information from them. Areas such as fraud detection, security analysis, ad targeting, and analysis and forecasting are popular Hadoop use cases.
With Cloudera Enterprise, you can leverage your existing team’s experience and Cloudera’s expertise to operationalize your Hadoop system with ease. Cloudera Enterprise is a subscription service, comprised of Cloudera Support and a portfolio of software.
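The map/reduce style of processing that Hadoop scales out across a cluster can be illustrated in miniature: map raw records to key/value pairs, reduce them to aggregates, then discard the raw input and keep only the result, as the pattern above suggests. The category names below are invented sample data, not from any real workload.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, 1) pairs, as a Hadoop mapper would.
    for record in records:
        yield record["category"], 1

def reduce_phase(pairs):
    # Sum the counts per key, as a Hadoop reducer would.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

raw_events = [
    {"category": "fraud"}, {"category": "ad_click"},
    {"category": "fraud"}, {"category": "fraud"},
]
result = reduce_phase(map_phase(raw_events))
del raw_events  # source data is dropped once the analytics are done
print(result)   # {'fraud': 3, 'ad_click': 1}
```

Hadoop's contribution is running exactly this shape of computation in parallel over petabytes, with shuffling, fault tolerance, and data locality handled by the framework.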
Cloud Replication Pattern
This is a variant in which enterprise databases are replicated in near real time to the cloud. The data is still owned by the enterprise servers, but the cloud holds a backup copy that cloud applications can consume.
Microsoft SQL Azure Data Sync Community Technology Preview (CTP) is a cloud-based data synchronization service. It provides one-way and bi-directional data sync, allowing data to be easily shared between SQL Azure and on-premises SQL Server databases, as well as between multiple SQL Azure databases in the same or different data centers. It enables multiple synchronization scenarios spanning both cloud and on-premises databases.
The Informatica Cloud Data Loader Service performs a similar function, loading an on-premise database into a cloud database.
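A common building block behind such sync services is incremental replication: only rows changed since the last sync, tracked by a version (or timestamp) column, are copied to the replica. The sketch below uses two in-memory SQLite databases as stand-ins for the on-premise SQL Server and the cloud copy; real services use their own change-tracking and sync APIs, and the table and column names here are invented.

```python
import sqlite3

on_prem = sqlite3.connect(":memory:")  # stands in for on-premises SQL Server
cloud = sqlite3.connect(":memory:")    # stands in for the cloud replica
for db in (on_prem, cloud):
    db.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")

on_prem.execute("INSERT INTO customers VALUES (1, 'Acme', 1)")
on_prem.execute("INSERT INTO customers VALUES (2, 'Globex', 2)")

def sync(last_version):
    # Pull only rows changed since the previous sync and upsert them
    # into the replica; return the new high-water mark.
    rows = on_prem.execute(
        "SELECT id, name, version FROM customers WHERE version > ?",
        (last_version,)).fetchall()
    for row in rows:
        cloud.execute("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", row)
    return max((r[2] for r in rows), default=last_version)

watermark = sync(0)
print(cloud.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

Subsequent calls with the returned watermark move only the delta, which is what makes near-real-time replication over a WAN practical.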
Summary
As mentioned at the beginning of the article, cloud storage and the proximity and location of the data center continue to be major concerns for enterprises. However, almost all cloud providers are addressing this concern with more and more security compliance in their offerings.
Still, the options above can be considered by enterprises that prefer not to rely fully on cloud-based storage: continue to host the data in the data center, and utilize the power of the cloud for processing the data rather than storing it.


