Cloud Computing: Essential Network Considerations when Building a Private Cloud

October 25, 2011, by David

Grazed from Sys-Con Media. Author: Jim Morin.

Today’s typical broadband virtual private network (VPN) connections to cloud applications will prove insufficient for tomorrow’s cloud infrastructure services.

The reason is that infrastructure workloads demand more from the network than software services…

While broadband network services fit the user-to-machine cloud model for Software as a Service (SaaS) applications, the network needs to be upgraded in three key areas for machine-to-machine, Infrastructure as a Service (IaaS) workloads:

  • Capacity and scalability
  • Security and encryption
  • Bandwidth on-demand

Let’s take a look at why your network will need to incorporate each of these emerging requirements for IaaS.

Capacity and Scalability
The first requirement is most obvious, as the workload size under infrastructure services is orders of magnitude larger than the amount of network traffic generated by software services. Cloud workloads start with virtual machines (VM) and storage mobility.

As business-critical server applications like email, Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) move to the cloud, they are typically deployed as VMs rather than on dedicated physical servers. Today, organizations can take advantage of the advanced processing features of an average server to house as many as 15 VMs per physical server – each with its own operating system and application.

This logical partitioning can increase a server’s efficiency from the standard 15-30 percent range to upwards of 90 percent. Once the server is virtualized, workload balancing to alleviate hot spots and avoid application performance degradation can be done electronically, by moving VMs over the network to alternate servers. Ideally, this workload balancing happens while the application is "live," for uninterrupted availability and the elimination of complex server restarts.
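The consolidation arithmetic above can be sketched quickly; the 15-VMs-per-server figure comes from the text, while the 150-workload example is purely illustrative:

```python
def servers_needed(num_workloads: int, vms_per_server: int = 15) -> int:
    """Physical servers required when each server hosts up to vms_per_server VMs."""
    return -(-num_workloads // vms_per_server)  # ceiling division

# 150 application workloads on dedicated hardware would need 150 servers;
# virtualized at 15 VMs per server, only 10 physical servers are required.
before = 150
after = servers_needed(150)
print(before, after)  # 150 10
```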

In the cloud, a virtualized server is called a VM "instance." Each VM instance is contracted from the cloud provider with a certain amount of CPU, memory and storage resources, and can range considerably in size. Amazon Web Services (AWS) instances vary from Small (1.7 GB of memory, 160 GB of storage) to Quad Extra Large (68 GB of memory, 1,690 GB of storage). These numbers could soon go higher, as VMware recently announced support for monster-sized VMs with up to 1 TByte of memory.

In addition to server instances, many cloud firms now provide cloud-based storage services, ranging from corporate services like Amazon’s Simple Storage Service (S3) to consumer-oriented, easy-to-use cloud storage from Dropbox. Let’s not forget Apple’s new iCloud service, which promises five GBytes of free storage for not only music and photos, but also books, videos and even business-oriented information like applications, documents, contacts, calendars and email. Clearly, storage has proven to be an early "killer app" for the cloud, and it’s a market that Taneja Group estimates at $4B today, growing to $14B by 2014.

The need to offer a network with larger capacity that can easily scale becomes apparent as the industry moves beyond using cloud storage services for modest bandwidth-intensive applications to more demanding enterprise-class needs.

Consumer Class Cloud Use Cases:

  • Business files
  • Music
  • Photo
  • Video

Enterprise Class Cloud Use Cases:

  • Disaster Recovery
  • VM Workload migration
  • Storage virtualization
  • Virtualized data centers

The need to offer a secure, reliable, high-performance connection to the cloud becomes much more critical to enterprise success. The reason for this is simple – enterprise cloud customers only have so much time in the day to move their mission-critical data, and therefore require the right connection and the ability to tune that connection based on their specific needs.

As the cloud business evolves from Software Services (cloud-based applications that transfer small amounts of data to and from cloud storage) to Infrastructure Services with more mission-critical, larger file-size requirements, the standard Internet connection will no longer suffice. Instead, we need a different network architecture. IaaS applications like storage, and new use cases like VM mobility, require technology with greater bandwidth capacity and scalability to get their workflows accomplished in a reasonable amount of time.

Today’s cloud IaaS users are coping poorly with existing network restrictions, which may leave them sending their information via truck instead of electronically. And truck transfers introduce security concerns as well as obviously long latency.

Let’s see why these typical VM and storage workloads impact the network.

The chart above maps VM and storage workload sizes against different bandwidth deployments, to show the time in days to accomplish the migration.

The 0.52 TByte case at the bottom reflects a "small" instance of VMs and storage (10 GB of memory and 2 GB of storage) and a use case of moving 10 instances. The 25 TByte use case at the top of the chart scales up to 500 larger VM instances.

As the figure shows, even small jobs, such as an occasional VM move to change server vendor platforms, cannot be accomplished within a day on most corporate networks. These relatively small infrastructure jobs – moving VMs and associated storage totaling 0.52 TBytes – would take multiple eight-hour days at typical Internet speeds, or more than one workday on a typical corporate 40 Mbps network. These workload times are "best case," as retransmissions and network delays from the packet loss and latency often seen on shared Internet links would greatly extend VM and storage transfer times.

Unplanned VM moves, such as emergency workload balancing when a critical application hits a server capacity threshold, may require immediate, large doses of bandwidth to resolve the crisis in a timely manner. We also have predictable peak workload times, such as a holiday season, when applications may be moved to the cloud to take advantage of a highly scalable server environment. The model shows that typical job sizes for these workloads, 1.25 TBytes and 10.5 TBytes, require roughly 1 Gbps links to complete within a day.

Finally, the bulk workload use case – moving critical applications live during a data center change – could involve many terabytes of data with a relatively short time frame for completion. Larger jobs like a 25 TByte bulk VM migration would take multiple days even over a 1 Gbps network connection, further illustrating the need for more scalability and capacity in the cloud network.
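The transfer times in these examples follow from simple bandwidth arithmetic. A minimal best-case sketch (workload sizes and link speeds from the text; it ignores the retransmissions, protocol overhead and latency that would stretch real transfers):

```python
def transfer_hours(tbytes: float, link_mbps: float) -> float:
    """Best-case hours to move a workload of tbytes over a link of link_mbps."""
    bits = tbytes * 1e12 * 8            # terabytes -> bits
    seconds = bits / (link_mbps * 1e6)  # link speed in bits per second
    return seconds / 3600

# 0.52 TB over a 40 Mbps corporate link: ~29 hours, more than one workday
print(round(transfer_hours(0.52, 40), 1))    # 28.9

# 10.5 TB over 1 Gbps: ~23 hours, just inside a day
print(round(transfer_hours(10.5, 1000), 1))  # 23.3

# 25 TB bulk migration over 1 Gbps: ~56 hours, multiple days
print(round(transfer_hours(25, 1000), 1))    # 55.6
```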

Next, we’ll see why network connections to fulfill the promise of cloud-based enterprise-class infrastructure services will also need to be secure and on-demand.

Security and Encryption
In addition to more flexible bandwidth, cloud services need to address a wide array of security concerns, from storage security for data at rest to network service security for data in flight. Enterprises considering cloud deployments have many other concerns related to security such as data recovery, reliability, physical location, network access, performance and network latency.

Public IP networks tend to offer few guarantees for service-level uptime, quality of service and latency. For example, Amazon assumes 80 percent network utilization for data transfers in its Import/Export calculations, which we can attribute to the typical congestion, retransmission and latency characteristics of shared network connectivity. These "best effort" networks force enterprises to compromise and settle for less-than-ideal levels of packet loss and network latency that greatly affect the performance of infrastructure applications. In addition, enterprise users of public IP networks for critical infrastructure processes may be at risk of a denial-of-service attack, which could have severe business availability implications.

With modern, carrier-grade Ethernet and Packet Optical networking architectures, enterprises can comfortably drive as much as 95 percent network utilization for increased throughput, along with better access performance, scalability, availability and lower network latency. A predictable and secure network is essential for enterprise mission-critical infrastructure networking applications.

Many organizations also face regulatory compliance and intellectual property protection requirements for their data networking. For example, network-level encryption services are increasingly important in health care, government, financial services and other industries dependent on their ability to protect their sensitive data.

Encryption services address data protection requirements by making the data in flight unintelligible in case the connection is compromised. Today’s encryption services offer line-speed encryption in a compact size, and feature the added benefit of providing complete end-to-end management of encrypted services where key management is separated from network management. This separation is a critical element in allowing service providers to offer encryption services that still enable enterprises to control their own encryption keys.

Encryption of data in flight between the organization and a cloud provider ensures secure transfer while maintaining network performance, latency and bandwidth levels.
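Carrier-grade, line-speed encryption happens in dedicated network hardware, but the data-in-flight goal can be illustrated in software. A minimal sketch using Python's standard ssl module (purely illustrative of enforcing encrypted transport; not how carrier encryption services are configured):

```python
import ssl

# Build a client-side TLS context that enforces certificate validation and
# a modern protocol floor -- a software analogue of the "data in flight"
# protection the article attributes to carrier encryption services.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)  # True: peer cert is required
print(context.check_hostname)                    # True: hostname is verified
```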

Bandwidth On-Demand
While network services need to be scalable and secure, they also need to be affordable.

We’ve discussed the need for network scalability and capacity for infrastructure services in the first section. Under local area network (LAN) conditions, VM migrations are usually not a problem. When moving across metro or long distances, however, we need dynamic network scalability to provide the throughput and other characteristics necessary for transferring large VM and storage workloads. Deploying higher-capacity bandwidth circuits is possible, but the industry-standard 3- or 5-year contracts for bandwidth capacity are not economically viable for the variable workload demands, like VM migrations, typical of cloud infrastructure services.

We need to do some math to see why the connection speeds used for cloud-based user-to-machine traffic need to be at fundamentally different levels when applied to machine-to-machine traffic for server and storage services.

At Amazon Web Services, the company provides a simple chart to determine how long it will take to transfer data to the Amazon cloud, taking into account the volume of data to be sent and the available bandwidth, assuming standard Internet connections from T1 (1.544 Mbps) through 1 GbE. When the transfer time exceeds a recommended threshold, Amazon suggests physically shipping data on storage devices via its AWS Import/Export service.

According to Amazon’s chart, it would take 82 days to transfer 1 TByte of information over a T1 network service, which means that anything above 100 GBytes should be physically shipped rather than electronically transferred. (To put this in perspective, 100 GBytes is about the size of a 2004-era laptop PC disk, so that’s not a lot of information by today’s standards.) A T1 service simply does not provide enough bandwidth for many workload transfers.

On the other end of the scale, Amazon estimates that sending 1 TByte over a 1 GbE network would take less than one day (similar to the calculations shown in the chart discussed in the first section). For transfers exceeding 60 TBytes over a 1 GbE network, Amazon again recommends its Import/Export physical transport service. Even with a 1 GbE network, there are still serious limitations to cloud data transfer. (Keep in mind that multiple-day electronic data transfers dramatically increase the probability of something going wrong – which would extend the job even longer.)
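Amazon's 82-day figure can be reproduced with simple arithmetic. A sketch in Python, assuming 1 TByte is counted as 2^40 bytes and derating the T1's 1.544 Mbps by the 80 percent utilization Amazon assumes in its Import/Export calculations:

```python
def transfer_days(tib: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days to transfer tib tebibytes over link_mbps, derated by utilization."""
    bits = tib * 2**40 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86400

# 1 TByte over a T1 at 80 percent utilization: ~82 days, matching Amazon's chart
print(round(transfer_days(1, 1.544)))    # 82

# The same terabyte over 1 GbE completes in a small fraction of a day
print(round(transfer_days(1, 1000), 2))  # 0.13
```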

Providing "on-demand" bandwidth to accomplish this workload makes it more affordable for cloud use cases like workload mobility, availability and collaboration. For example, a cloud service backbone could scale to a 10 Gbps network and enable more than 30 TBytes to be transferred in a day, easily addressing the bulk VM migration use case, and then scale down once the migration is over.

Amazon’s new Direct Connect service is a response to this need and could be a forerunner to more cloud service providers moving to new cloud networking architectures that respond to the growing amount – and importance – of the information in the cloud. Direct Connect provides a direct 1 or 10 Gbps connection to an Amazon cloud data center billed on an hourly basis. For Amazon cloud users, this new network service could provide the scalability and extra capacity to move large workloads back and forth from the cloud while paying only for time used on the network.

Dynamic networking can also be implemented with intelligent edge devices that can change an application’s connection and allocation to existing bandwidth. A steady state configuration may have equal bandwidth allocation to each connected application. When a bandwidth-hungry workload is needed over a connection, such as a VM migration, the edge device can dynamically reallocate bandwidth connection assignments so the VM migration gets the bandwidth it needs to accomplish the job in a timely manner.
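The edge-device behavior described above can be modeled as a toy allocator (the class, the 80/20 burst split and the application names are all illustrative assumptions, not a real device API):

```python
class EdgeAllocator:
    """Toy model of an edge device that splits a fixed link among applications."""

    def __init__(self, link_mbps: float, apps: list[str]):
        self.link_mbps = link_mbps
        self.apps = apps
        self.priority = None  # no burst in progress: steady state

    def allocation(self) -> dict[str, float]:
        if self.priority is None:
            share = self.link_mbps / len(self.apps)  # equal steady-state shares
            return {app: share for app in self.apps}
        # During a burst (e.g. a VM migration), give the priority application
        # 80% of the link and split the remaining 20% among the others.
        rest = [a for a in self.apps if a != self.priority]
        return {self.priority: self.link_mbps * 0.8,
                **{a: self.link_mbps * 0.2 / len(rest) for a in rest}}

    def start_burst(self, app: str):
        self.priority = app

    def end_burst(self):
        self.priority = None

edge = EdgeAllocator(1000, ["email", "crm", "vm_migration", "backup"])
print(edge.allocation()["vm_migration"])  # 250.0 Mbps in steady state
edge.start_burst("vm_migration")
print(edge.allocation()["vm_migration"])  # 800.0 Mbps during the migration
edge.end_burst()
```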

Carrier networks have the potential to dramatically increase performance by adding incremental new bandwidth end-to-end, charging for the premium bandwidth only when used. Then, after the workload task is accomplished, the premium bandwidth could be automatically reduced to the former steady state level.

Many service providers are looking to these new designs that can accommodate the ebb and flow of IT workload between enterprise and cloud data centers.

Summary
Cloud IaaS services offer IT management many options that increase their agility and decrease the time to deploy new solutions. Today’s private enterprise networks are already prepared to address cloud application access, but as noted above, this all changes in a cloud infrastructure services model.

Virtualization of servers breaks the physical boundaries of workload balancing. The desire for policy-driven and automated workload balancing between private and cloud data centers requires a more scalable, secure and on-demand backbone network. Now, a more flexible, secure and dynamic network can extend the virtual data center, breaking down the data center walls by connecting enterprise data centers and cloud resources.

This new enterprise IT architecture – the IT architecture of the future – will feature virtualized data center capacity enabled with a carrier class, on-demand network backbone designed for cloud infrastructure services.