Containers Explained

March 2, 2015
Article written by Leo Reiter
 
It’s hard to spend any time in cloud computing today without hearing or reading about containers. Let’s understand what they are, and why they are disrupting the landscape so much…
 
Containers, the “Elevator Pitch”

Containers take individual applications and their dependencies (libraries, configuration files, etc.) and package them up for easy deployment on any virtual or physical infrastructure. Unlike virtual machines, they do not capture full operating system “images” (complete with device drivers, boot loaders, etc.). This makes them much lighter weight, faster to launch, and easier to move, without giving up many of the benefits of virtualization. Simply stated, containers are much more agile and scalable for today’s most demanding applications.

Like all proper cloud workloads, containers are designed to be “stateless” – they connect to whatever data they need to process once deployed. This promotes agility and fault tolerance, unlike traditional virtual machines, which must be managed much like ordinary servers.
 
As we measure cloud workloads such as analytics algorithms in thousands of nodes (or more) rather than dozens, virtual machines carry far too much overhead to scale effectively. Containers are lighter weight and much more portable than virtual machines.
 
Brief History of Containers

Most modern container technologies, certainly those running on Linux (the predominant operating system for cloud workloads), owe their existence to the concept of a “chroot jail”. This technology first appeared in Unix operating systems in the late 1970s and early 1980s. Since Linux is a modern Unix-like operating system, the concept carried forward to today’s environments.
 
A chroot jail effectively runs a process with a new “root” directory. The “root” directory of a Unix-like system is the top level of the disk or block device where the operating system is installed. It contains all the programs, libraries, and configuration files needed to run any application on the system. With the chroot construct, it’s possible to point to a specific folder as the new “root” directory when running a specific process. That process in turn has no visibility above that folder, hence the term “jail”.

This mechanism was initially popular for security purposes. A simple example was giving a user access to upload files to a server: you did not want that user to have visibility into the entire file system, just his or her folder(s), so the file upload service (FTP, SFTP, etc.) would often switch to a chroot jail once a particular user authenticated. Because Unix-like operating systems already run with “protected memory”, there was no need for further protection between processes or users. One user on the system cannot disrupt another user’s work unless that user is the system administrator, and even then, the memory used by particular applications is never visible to other users or applications unless explicitly shared. Combined with chroot jails, protected memory and user account isolation add up to a very strong security environment for running applications on Unix-like operating systems.
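To make the mechanism concrete, here is a minimal sketch (in Go) of dropping a process into a chroot jail and starting a shell inside it. The code and the /srv/jail path are illustrative assumptions, not part of any particular product: it presumes a directory at /srv/jail that already contains a statically linked shell at bin/sh, and it must run as root.

    // chrootjail.go – drop the current process into a chroot jail and start a shell.
    // Hypothetical example: /srv/jail must already hold a static /bin/sh; run as root.
    package main

    import (
        "log"
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        // Make /srv/jail the new root directory for this process and its children.
        if err := syscall.Chroot("/srv/jail"); err != nil {
            log.Fatalf("chroot failed: %v", err)
        }
        // Change into the new root so relative path lookups cannot escape it.
        if err := os.Chdir("/"); err != nil {
            log.Fatalf("chdir failed: %v", err)
        }
        // Anything launched from here sees /srv/jail as "/" and nothing above it.
        cmd := exec.Command("/bin/sh")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatalf("shell exited: %v", err)
        }
    }

The same effect is available from the command line via the standard chroot(8) utility; the point is simply that the jailed shell cannot see anything above the chosen folder.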
 
With all this said, virtual machines provide much higher levels of isolation and resource control than chroot jails running on Unix-like systems. When running workloads in multi-tenant environments, that level of security is crucial.
 
In 2001, a company called SWSoft released a technology for Linux called Virtuozzo, which later became the open source OpenVZ technology. This software performs what is known as “operating system virtualization”, as opposed to the machine virtualization common with hypervisors such as VMware ESX and KVM. Many service providers went on to host “virtual private servers” based on Virtuozzo, and later OpenVZ, as an effective way to provide secure Linux environments running under a larger shared host Linux operating system kernel. Unfortunately, for both technical and political reasons, OpenVZ did not gain “upstream” acceptance and therefore is not part of standard Linux distributions.
 
Today, the technology of choice is called Linux Containers, or LXC for short. You’ve probably heard of Docker as well, which is based on LXC (more on Docker in a moment). LXC leverages kernel features called control groups (cgroups) and namespaces, which allow machine-type constraints and isolation to be applied to individual processes. In short, it’s possible to isolate networks, control CPU usage, and limit memory and other resources much like a hypervisor does for entire virtual machines. The difference is that these mechanisms work at the granularity of a process, so there’s no need to deploy a hypervisor and all its associated overhead (both above and within the virtual machine environment). Combined with the good old chroot jail, they allow for operating system virtualization without the need to install third-party software. And because this is upstream kernel functionality, it’s bundled with any modern Linux distribution from your favorite vendor (Red Hat, Canonical, etc.).

With Linux Containers, it’s possible to take just a single application plus the minimum set of dependencies it needs and run it as a completely isolated environment. Unlike a virtual machine, there’s no need to “boot” it or provide drivers, since it simply leverages the host kernel. You can even control which physical devices get passed into the container – again, without the need for special drivers. As long as the application runs on some flavor of Linux, it can run in a Linux Container. Given that almost all major cloud computing and web services stacks are Linux-based, this constraint is rarely any sort of limitation.
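As a rough illustration of the cgroup mechanism (not of LXC’s own tooling), the sketch below creates a memory cgroup, caps it at 256 MB, and moves the current process into it. It assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory – typical for Linux distributions of this era – and that it runs as root; the “demo” group name is purely hypothetical.

    // cgroup_limit.go – cap the current process (and its children) at 256 MB of memory
    // by creating a cgroup and joining it. Assumes the cgroup v1 memory controller
    // is mounted at /sys/fs/cgroup/memory and that this runs as root.
    package main

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
    )

    func main() {
        cg := "/sys/fs/cgroup/memory/demo" // hypothetical cgroup name
        if err := os.MkdirAll(cg, 0755); err != nil {
            log.Fatalf("create cgroup: %v", err)
        }
        // Every process placed in this cgroup shares the 256 MB memory cap.
        limit := []byte("268435456") // 256 MB in bytes
        if err := os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"), limit, 0644); err != nil {
            log.Fatalf("set memory limit: %v", err)
        }
        // Join the cgroup; child processes inherit the constraint automatically.
        pid := []byte(fmt.Sprintf("%d", os.Getpid()))
        if err := os.WriteFile(filepath.Join(cg, "tasks"), pid, 0644); err != nil {
            log.Fatalf("join cgroup: %v", err)
        }
        log.Println("this process is now limited to 256 MB of memory")
    }

Container runtimes such as LXC and Docker do essentially this on your behalf – plus network, PID, and filesystem isolation via namespaces – which is why no separate hypervisor layer is required.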
 
The Next Step for Containers

Here’s where Docker comes back into the picture. As with any other software, a containerized application needs to be packaged, moved around, and deployed before you can launch it. Docker likens its capabilities to the intermodal shipping container, which is a way to move goods around using various forms of transportation, such as ships, trains, and trucks. As long as whatever you’re moving “to” is some form of Linux, physical or virtualized, Docker can help get your Linux Container-based application there. This is an important element of deploying software – entire industries used to specialize in manufacturing boxes of software and shipping them to retailers so that you (the end user) could procure and install applications. In modern cloud computing, technologies such as Docker fill this role instead. Software is still useless if it can’t be deployed!
 
But just as Docker is evolving higher “up the stack”, into orchestration, a number of new technologies and services are already here to help manage containers at true global cloud scale.
 
Putting Containers in Workflows

Most large scale applications combine many components into unified workflows. For example, if you want to perform prescriptive analytics on various data sources, you’d rather not concern yourself with the mundane details of deploying tools such as Hadoop on hundreds or even thousands of nodes before you can even think about doing anything productive. Workflow orchestration involves solving a higher level problem and letting the application stacks figure out how to deploy themselves to get it done. This way, a data scientist can interact with technologies without depending on support from a team of IT architects, for example. So it’s not enough to define a way to deploy containers; now we have to orchestrate very large scale deployments based on what end users want to do at any given time – on demand, no less.
 
Nimbix’s JARVICE technology has been at the forefront of workflow-oriented containerized application orchestration since 2013. It’s a cloud computing platform that organizes components into end-to-end workflows with just a few mouse clicks, delivering results once the work completes. Since it presents workflows in a Software-as-a-Service model, end users have no need to care about underlying infrastructure, container packaging technology, or deployment details. JARVICE runs on the Nimbix public cloud, on third-party public clouds, or on private clouds (either hosted or on customer premises). It can also run on bare metal or on top of virtualized infrastructure. Because the applications are containerized, hypervisors are not needed and workflows can run with much higher performance. In fact, Nimbix even offers High Performance Computing staples such as supercomputing-class NVIDIA Tesla GPUs and InfiniBand interconnects, which are not compatible with ordinary virtual infrastructure.
 
Containers help JARVICE users and developers by greatly improving agility and reducing overhead – this is precisely why they are disruptive to traditional cloud infrastructure, no matter what the use case.
 
Many other companies and technologies also focus on DevOps automation for containers, allowing large scale orchestration of distributed applications in the cloud. It’s no longer interesting just to be in container technology – what matters is what you do with those containers. In other words, the industry is maturing into solutions, and that’s good for all of us.
 
Where do Containers Go from Here?

Will containers completely replace virtual infrastructure? Of course not. In many cases, the hypervisor is the new hardware platform. Countless billions of dollars are invested worldwide in virtual infrastructure, both in private data centers and in the cloud. Plus, there are some real reasons to use virtual infrastructure, such as running non-Linux applications. Given that containers are easily deployed inside virtual machines, these two technologies will coexist in harmony for many years to come.
 
However, because containers give you the security and isolation of virtual machines but on physical infrastructure, new possibilities emerge. For example, one big problem with virtualized infrastructure is that it’s complex to enable access to specialized coprocessors and accelerators, such as GPUs, FPGAs, DSPs, and high-speed fabrics like InfiniBand. Sure, some devices are capable of “pass-through” into hypervisors, but these are a small subset of what’s out there. In fact, one pillar of hypervisor-based virtualization is to isolate software from hardware, providing a common abstraction layer, so despite the existence of some pass-through functionality, it’s really at odds with the spirit of the technology. Coprocessors and accelerators are making big strides in new areas of big data analytics such as machine learning, where complex algorithms demand immense amounts of compute power. It’s not enough to use CPU cores alone anymore to get the scale we need. Containers make accessing these types of coprocessors easy, since no special drivers are needed – they’re already there in the host Linux kernel.
 
Another area where containers can make a big impact is non-x86 platforms, which generally can’t run traditional hypervisors efficiently. For example, low-power processors such as those we find in smartphones and tablets could provide very “green” computation at scale for specific types of problems. Because most of these processors already run Linux, porting your containerized application to them is much easier and more efficient than trying to build hypervisor-based virtual infrastructure stacks on them.

No matter what happens from here on out, one thing is clear: containers are here, and they are disrupting the way we think about running and deploying applications in the cloud. 

##

About the Author

Leo Reiter is a cloud computing pioneer who has been designing, developing, and evangelizing large scale, on demand systems and technologies since the mid-1990s. He co-founded Virtual Bridges and helped introduce VDI and desktop cloud (DaaS) to the market. Currently, Leo serves as Chief Technology Officer of Nimbix, Inc., a global provider of High Performance Computing applications and platforms on demand.

Leo is on a long-term mission to help more people from all walks of life derive more value from advanced technology, particularly in the cloud.

In his spare time, Leo enjoys reading, cooking, and exercising.

Follow Leo Reiter’s CloudCow Column – Demystifying the Cloud

 
Twitter: @VirtualLeo
LinkedIn: https://www.linkedin.com/in/leoreiter