A High-Level Guide to Containerization

The concept of small, lightweight execution environments, known as containers, has been around in computer science for more than a decade. In that time, the technology has grown rapidly in popularity, and software such as Docker and Kubernetes has arrived on the scene to support the “container revolution.” Still, even as adoption grows, containerization remains a mystery to many. In this guide, we’ll walk you through containers at a high level. Armed with this knowledge, you can approach the actual implementation of containers with more confidence and clarity.

Containers: An Overview

Within an operating system (OS), let’s say Linux, users have the ability to run processes. By default, these processes share the operating system’s view of its resources: they see the same process ID numbers, the same network interfaces, the same filesystem, and so on. Operating systems are designed to allow many processes to run concurrently. That’s all well and good, but what if you’d like to isolate certain processes? In the most basic sense, a container creates an environment where an isolated process can run. Within that isolated space, you can have your own process space and your own network interface, install packages, run services, and tinker with routing.

That Sounds Like a VM. What’s the Difference?

While working with containers can feel like working in a separate VM, it’s important to know the difference. Containers virtualize the operating system, so multiple workloads can run on a single OS instance; VMs virtualize the hardware, so multiple OS instances can run on a single physical machine. A container uses the host’s kernel, so it can’t run a different OS kernel of its own and can’t load its own kernel modules. Once applications are containerized, they can be deployed on any infrastructure that can run containers (virtual machines, public cloud infrastructure, bare metal, etc.).

Virtual Machines vs. Containerization. Image Source: Docker Blog.

What Does That Look Like?

If you were to open a shell inside that container, you’d find a process with its own namespaces. As a result, you’d only see the processes that run in that specific container. In addition to namespaces, containers use control groups (cgroups) to limit the resources a process can consume, such as how much CPU time or memory it may use. A separate kernel feature, seccomp, can further restrict which system calls the process is allowed to make.
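On Linux, you can check which namespaces a process belongs to by reading the symbolic links under /proc/<pid>/ns. Each link target has the form type:[inode], for example pid:[4026531836], and two processes share a namespace exactly when those inode numbers match. The minimal Python sketch below assumes a Linux host; the helper names parse_ns_link, namespace_id, and same_namespace are hypothetical, and the inode value is only an example.

```python
from __future__ import annotations

import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]":
# the namespace type, then the inode number that identifies the namespace.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid: str | int, kind: str) -> int:
    """Return the inode number of a process's namespace of the given kind (Linux only)."""
    target = os.readlink(f"/proc/{pid}/ns/{kind}")
    _, inode = parse_ns_link(target)
    return inode


def same_namespace(pid_a: str | int, pid_b: str | int, kind: str = "pid") -> bool:
    """Two processes share a namespace exactly when the inode numbers match."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)
```

On the host, comparing your own shell ("self") with the host PID of a containerized process should return False for the pid and net namespaces. Reading another user’s /proc/<pid>/ns entries usually requires root.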

Diving Deeper: Types of Cgroups and Namespaces

Namespaces:

PID: The PID namespace gives processes a set of process IDs (PIDs) that is independent of other namespaces, so the first process in a new PID namespace sees itself as PID 1. Think of it like a process having children: they have their own names and take up their own space, but they are still related to you, the original process. PID namespaces are nested, meaning that when a new process is created, it has a PID in each namespace from its current namespace up to the initial PID namespace.
Network: Network namespaces virtualize the network stack. Each namespace gets its own network interfaces, routing table, and ports, so you can run programs on any port you want without conflicting with what is already running on the host or in other containers.
Mount: Mount namespaces control mount points. When a new mount namespace is created, the mounts from the current mount namespace are copied into it. Because the new namespace is a copy that can be altered independently, you can mount and unmount file systems without affecting the host system.
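The nesting of PID namespaces is visible from userspace. On Linux 4.1 and later, the NSpid line in /proc/<pid>/status lists the process’s PID in every PID namespace it belongs to, starting with the namespace of whoever mounted that /proc (usually the host) and ending with the innermost one. A containerized process that sees itself as PID 1 might therefore show NSpid: 12345 1 from the host, where 12345 is a made-up value. The parser below is a small sketch; parse_nspid and read_nspid are hypothetical helper names.

```python
from __future__ import annotations


def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs listed on the NSpid line of a /proc/<pid>/status file.

    The leftmost value is the PID as seen from the PID namespace of whoever
    mounted this procfs (usually the host); each following value is the PID
    in the next, more deeply nested namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line found (kernel older than 4.1?)")


def read_nspid(pid: str | int = "self") -> list[int]:
    """Read and parse the NSpid line for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_nspid(status_file.read())
```

A process that is not in a nested PID namespace has a single entry, while a process inside one container has two.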

Cgroups:

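Resource limitation: Limits and accounts for the resources a group of processes uses (CPU, memory, disk I/O, number of processes, etc.). A memory limit, for example, caps how much RAM the container’s processes can use.
Seccomp-bpf: Strictly speaking not a cgroup, but a separate kernel feature that containers use alongside cgroups. It filters which system calls your processes can make.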
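On a host using cgroup v2, a CPU limit is written to the group’s cpu.max file as a quota and a period in microseconds (the group may use up to quota microseconds of CPU time in every period), and a memory limit is written to memory.max as a byte count; in both files the word max means no limit. Docker documents --cpus=1.5 as equivalent to a 100000 microsecond period with a 150000 microsecond quota. Writing to a real cgroup directory under /sys/fs/cgroup needs root, so this hedged sketch only builds and parses the file contents; the function names are hypothetical.

```python
from __future__ import annotations

DEFAULT_PERIOD_US = 100_000  # cgroup v2 default CPU period: 100 ms


def cpu_max_for(cpus: float | None, period_us: int = DEFAULT_PERIOD_US) -> str:
    """Build the contents of a cgroup v2 cpu.max file for a CPU limit.

    cpu.max holds "<quota> <period>" in microseconds: the group may use at
    most <quota> of CPU time in every <period>. "max" means no limit.
    """
    if cpus is None:
        return f"max {period_us}"
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)
    return f"{quota_us} {period_us}"


def parse_cpu_max(contents: str) -> float | None:
    """Turn cpu.max contents back into a CPU count (None means unlimited)."""
    quota, period = contents.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def memory_max_for(limit_bytes: int | None) -> str:
    """Contents for a cgroup v2 memory.max file: a byte count, or "max"."""
    return "max" if limit_bytes is None else str(limit_bytes)
```

For example, cpu_max_for(1.5) gives "150000 100000", the same quota and period Docker describes for --cpus=1.5.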

What Else Can ‘Container’ Mean?

Other than an isolated process running in its own sandbox, what else do people mean when they refer to containers? There’s also such a thing as a “container image.” An image is an inert, immutable file that’s essentially a snapshot of a container. You use the build command (docker build) to create an image, and the run command (docker run) to create a container from it. I’m a writer and I like metaphors, so here you go: if an image is the recipe, the container is the cake. With the recipe, you can make as many cakes as you’d like. Maybe you made a cake but you aren’t craving sweets right now? Put the cake in the freezer; we call this a stopped container.
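To make the recipe-and-cake relationship concrete, here is a toy Python model, not Docker’s actual implementation: an Image is frozen (immutable), while each Container created from it keeps its own writable changes and a status. The run function is a hypothetical stand-in for docker run, and the file paths are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

_container_ids = count(1)


@dataclass(frozen=True)
class Image:
    """The recipe: an immutable snapshot (frozen, so it cannot be edited)."""
    name: str
    files: tuple[tuple[str, str], ...]  # (path, contents) pairs baked into the image


@dataclass
class Container:
    """The cake: a runnable instance with its own writable state."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_container_ids))
    changes: dict[str, str] = field(default_factory=dict)  # writes made at run time
    status: str = "running"

    def write(self, path: str, contents: str) -> None:
        # Writes land in the container's own state; the image is never touched.
        self.changes[path] = contents

    def read(self, path: str) -> str:
        # The container's own changes shadow whatever the image provides.
        if path in self.changes:
            return self.changes[path]
        return dict(self.image.files)[path]

    def stop(self) -> None:
        # A stopped container keeps its state and can be started again later.
        self.status = "stopped"

    def start(self) -> None:
        self.status = "running"


def run(image: Image) -> Container:
    """Hypothetical stand-in for "docker run": make a new container from an image."""
    return Container(image=image)
```

Two containers made from the same image share the same Image object, but a write in one container is invisible to the other, and stopping a container keeps its changes, much like a cake in the freezer.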

Images are arranged in a hierarchy of layers. You start with one base layer, add something on top of it, and get a new image. Add something on top of that, and you have a third layer. Because images built on the same base share those lower layers, a shared layer only needs to be stored once, and many containers can run from the same image at the same time. This hierarchy also means that images are inherently connected: a vulnerability in a base layer is inherited by every image built on top of it, and patching it means rebuilding those images on the updated base.
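Layer sharing works because image layers are content-addressed: each layer is identified by a SHA-256 digest of its contents, so identical layers get identical IDs and are stored once. The toy store below assumes each layer is a plain byte string (real images hash compressed tar archives and list the layer digests in a manifest); LayerStore and build_image are hypothetical names.

```python
from __future__ import annotations

import hashlib


class LayerStore:
    """Toy content-addressed store: each layer is kept once, keyed by its digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        # Identical content always hashes to the same digest, so a layer that
        # is already present is not stored a second time.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)
        return digest


def build_image(store: LayerStore, parent: list[str], new_layer: bytes) -> list[str]:
    """An image is an ordered list of layer digests: its parent's layers plus one more."""
    return parent + [store.add(new_layer)]
```

Building a web image and a worker image on the same base adds only one new blob each. Rebuilding on a patched base produces a different digest, so existing images keep pointing at the old layer until they are rebuilt.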

What is Docker?

Containers sound cool, right? If you were to build a set of tools around these basic Linux kernel features that isolates containers in a lightweight and usable manner, we’d be talking about Docker. Docker alleviates a problem developers face when using containers: it’s difficult to ship code to the server. This is because the traditional software stack is made up of dozens of software components, and there are thousands of places where each of those components might need to run.

Metaphor alert! Think of a world in which shipping companies don’t exist. You create a product and, because there is no company to help you ship it, you need to work out every effective way to ship products yourself (boats, planes, buses, etc.). If you have products of various sizes and fragility, you might have to research several options and build the infrastructure for each. What a nightmare. It’s the same with shipping isolated code to your server. Docker is the shipping company that has the infrastructure in place to erase that difficulty. Now, shipping your code is scalable, repeatable, and less expensive.

Let’s imagine your tech stack. At the bottom, you have your infrastructure: a physical server, or a virtual machine from a cloud provider such as AWS. On top of the infrastructure, you have the host operating system. Docker is then installed on top of the OS, allowing you to spin up containers.

What is Kubernetes?

There’s a common misconception that when working with containers, you have to choose between Docker and Kubernetes. In fact, Kubernetes lets you keep using your existing Docker container images and workloads while tackling some of the complex issues that come with scaling. Running containers with Docker alone gets complicated when you need to adjust the current workflow or introduce a new microservice, because changes like these require scaling each independent application strategically to avoid slowdowns in load time. The fundamental premise behind Kubernetes is “desired state management”: you describe the state you want your system to be in, and Kubernetes continuously works to make the actual state match it, which allows distributed containerized systems to run resiliently.
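Desired state management is usually implemented as a reconciliation loop: compare the state you asked for with the state you observe, take whatever actions close the gap, and repeat. The Python sketch below is a toy illustration of that pattern for a replica count, not Kubernetes code; reconcile, apply_actions, Action, and the web-N pod names are hypothetical. Real controllers watch the Kubernetes API for changes rather than being called with a list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Action:
    kind: str  # "create" or "delete"
    pod: str


def reconcile(desired_replicas: int, running_pods: list[str], name: str = "web") -> list[Action]:
    """Compare desired state with observed state and return the steps that close the gap."""
    actions: list[Action] = []
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        # Too few copies running: create new pods with fresh, unused names.
        existing = set(running_pods)
        i = 0
        while missing > 0:
            candidate = f"{name}-{i}"
            if candidate not in existing:
                actions.append(Action("create", candidate))
                existing.add(candidate)
                missing -= 1
            i += 1
    else:
        # Too many copies (or exactly right): delete the surplus, if any.
        for pod in running_pods[desired_replicas:]:
            actions.append(Action("delete", pod))
    return actions


def apply_actions(running_pods: list[str], actions: list[Action]) -> list[str]:
    """Pretend to carry out the actions and return the new observed state."""
    pods = list(running_pods)
    for action in actions:
        if action.kind == "create":
            pods.append(action.pod)
        else:
            pods.remove(action.pod)
    return pods
```

If a pod disappears (say, because its node failed), the next pass notices the shortfall and creates a replacement; once actual and desired state agree, reconcile returns an empty list.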

Architecturally, Kubernetes is made up of several components. For the sake of simplicity, I’ll only discuss a few major players. The “master” component is the Kubernetes Cluster Services component. Through its own API, Cluster Services is fed a specific configuration, which it is expected to interpret and run on the infrastructure. That infrastructure consists of worker nodes, which act as hosts for individual containers. Each worker node runs a component called a kubelet, which communicates with Cluster Services and makes sure that the right containers are running in the correct pod (a group of one or more containers).

Kubernetes architecture. Image Source: Kubernetes in Three Diagrams.

The configuration is fed to the master component via a deployment .yaml file. Within the .yaml file is a pod configuration (the pod template) that lists one or more container images. The .yaml file also sets the number of replicas: the Deployment creates a ReplicaSet, whose controller keeps that many copies of the pod running. The deployment file is fed into the Cluster Services API, and the Cluster Services component figures out how to schedule the right pod onto the right worker node, at the right time. If a worker node becomes unavailable, it is up to the master component to reschedule its pods onto other nodes.
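For reference, the skeleton of an apps/v1 Deployment looks like the structure below, built here as a Python dictionary so its shape is easy to inspect. replicas is the count the ReplicaSet keeps running, and selector.matchLabels must match the labels in the pod template. The helper deployment_manifest is hypothetical, and the name web, the nginx:1.25 image, and port 80 are placeholder choices; since kubectl apply -f also accepts JSON files, printing the dictionary as JSON gives a manifest you could apply.

```python
from __future__ import annotations

import json


def deployment_manifest(name: str, image: str, replicas: int, port: int = 80) -> dict:
    """Build a minimal apps/v1 Deployment as a plain dictionary."""
    labels = {"app": name}  # the selector below must match these pod labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # how many copies of the pod to keep running
            "selector": {"matchLabels": dict(labels)},
            "template": {  # the pod configuration
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Save this output as web.json and apply it with: kubectl apply -f web.json
    print(json.dumps(deployment_manifest("web", "nginx:1.25", replicas=3), indent=2))
```

Keeping the labels in one place is a deliberate choice: a Deployment whose selector does not match its template labels is rejected by the API server.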

Learn More About Containers, Docker, and Kubernetes

This is a basic guide for understanding the fundamental concepts behind containers, Docker, and Kubernetes. For in-depth information, check out the Docker and Kubernetes documentation. Curious about getting started with Docker in WordPress? Check out this informative article or learn about using a service container to improve your WordPress code.