Both containers and virtual machines provide ways of separating applications running on a host from the operating system itself. Understanding FCOS, which is the operating system used by OKD, will help you see how the host systems protect containers and hosts from each other.
Containers simplify the act of deploying many applications to run on the same host, using the same kernel and container runtime to spin up each container. The applications can be owned by many users and, because they are kept separate, can run different, and even incompatible, versions of those applications at the same time without issue.
In Linux, containers are just a special type of process, so securing containers is similar in many ways to securing any other running process. An environment for running containers starts with an operating system that can secure the host kernel from containers and other processes running on the host, as well as secure containers from each other.
Because OKD 4.11 runs on FCOS hosts, with the option of using Fedora as worker nodes, the following concepts apply by default to any deployed OKD cluster. These Fedora security features are at the core of what makes running containers in OKD more secure:
Linux namespaces enable creating an abstraction of a particular global system resource to make it appear as a separate instance to processes within a namespace. Consequently, several containers can use the same computing resource simultaneously without creating a conflict. Container namespaces that are separate from the host by default include mount table, process table, network interface, user, control group, UTS, and IPC namespaces. Those containers that need direct access to host namespaces need to have elevated permissions to request that access.
SELinux provides an additional layer of security to keep containers isolated from each other and from the host. SELinux allows administrators to enforce mandatory access controls (MAC) for every user, application, process, and file.
Disabling SELinux on FCOS is not supported.
CGroups (control groups) limit, account for, and isolate the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. CGroups are used to ensure that containers on the same host are not impacted by each other.
Secure computing mode (seccomp) profiles can be associated with a container to restrict available system calls. See page 94 of the OpenShift Security Guide for details about seccomp.
Deploying containers using FCOS reduces the attack surface by minimizing the host environment and tuning it for containers. The CRI-O container engine further reduces that attack surface by implementing only those features required by Kubernetes and OKD to run and manage containers, as opposed to other container engines that implement desktop-oriented standalone features.
FCOS is a version of Fedora that is specially configured to work as control plane (master) and worker nodes on OKD clusters. So FCOS is tuned to efficiently run container workloads, along with Kubernetes and OKD services.
To further protect FCOS systems in OKD clusters, most containers, except those managing or monitoring the host system itself, should run as a non-root user. Dropping the privilege level or creating containers with the least amount of privileges possible is recommended best practice for protecting your own OKD clusters.
Requirements for a cluster with user-provisioned infrastructure
Traditional virtualization provides another way to keep application environments separate on the same physical host. However, virtual machines work in a different way than containers. Virtualization relies on a hypervisor spinning up guest virtual machines (VMs), each of which has its own operating system (OS), represented by a running kernel, as well as the running application and its dependencies.
With VMs, the hypervisor isolates the guests from each other and from the host kernel. Fewer individuals and processes have access to the hypervisor, reducing the attack surface on the physical server. That said, security must still be monitored: one guest VM might be able to use hypervisor bugs to gain access to another VM or the host kernel. And, when the OS needs to be patched, it must be patched on all guest VMs using that OS.
Containers can be run inside guest VMs, and there might be use cases where this is desirable. For example, you might be deploying a traditional application in a container, perhaps to lift-and-shift an application to the cloud.
Container separation on a single host, however, provides a more lightweight, flexible, and easier-to-scale deployment solution. This deployment model is particularly appropriate for cloud-native applications. Containers are generally much smaller than VMs and consume less memory and CPU.
When you deploy OKD, you have the choice of an installer-provisioned infrastructure (there are several available platforms) or your own user-provisioned infrastructure. Some low-level security-related configuration, such as adding kernel modules required at first boot, might benefit from a user-provisioned infrastructure. Likewise, user-provisioned infrastructure is appropriate for disconnected OKD deployments.
Keep in mind that, when it comes to making security enhancements and other configuration changes to OKD, the goals should include:
Keeping the underlying nodes as generic as possible. You want to be able to easily throw away and spin up similar nodes quickly and in prescriptive ways.
Managing modifications to nodes through OKD as much as possible, rather than making direct, one-off changes to the nodes.
In pursuit of those goals, most node changes should be done during installation through Ignition or later using MachineConfigs that are applied to sets of nodes by the Machine Config Operator. Examples of security-related configuration changes you can do in this way include:
Adding kernel arguments
Adding kernel modules
Enabling support for FIPS cryptography
Configuring disk encryption
Configuring the chrony time service
Besides the Machine Config Operator, there are several other Operators available to configure OKD infrastructure that are managed by the Cluster Version Operator (CVO). The CVO is able to automate many aspects of OKD cluster updates.