Network-Bound Disk Encryption (NBDE) allows you to encrypt root volumes of hard drives on physical and virtual machines without having to manually enter a password when restarting machines.
To understand the merits of Network-Bound Disk Encryption (NBDE) for securing data at rest on edge servers, compare key escrow and TPM disk encryption without Clevis to NBDE on systems running Fedora.
The following table presents some tradeoffs to consider around the threat model and the complexity of each encryption solution.
| Scenario | Key escrow | TPM disk encryption (without Clevis) | NBDE | 
|---|---|---|---|
| Protects against single-disk theft | X | X | X | 
| Protects against entire-server theft | X | X | |
| Systems can reboot independently from the network | X | ||
| No periodic rekeying | X | ||
| Key is never transmitted over a network | X | X | |
| Supported by OpenShift | X | X | 
Key escrow is the traditional system for storing cryptographic keys. The key server on the network stores the encryption key for a node with an encrypted boot disk and returns it when queried. The complexities around key management, transport encryption, and authentication do not make this a reasonable choice for boot disk encryption.
Although available in Fedora, key escrow-based disk encryption setup and management is a manual process and not suited to OKD automation operations, including automated addition of nodes, and currently not supported by OKD.
Trusted Platform Module (TPM) disk encryption is best suited for data centers or installations in remote protected locations. Full disk encryption utilities such as dm-crypt and BitLocker encrypt disks with a TPM bind key, and then store the TPM bind key in the TPM, which is attached to the motherboard of the node. The main benefit of this method is that there is no external dependency, and the node is able to decrypt its own disks at boot time without any external interaction.
TPM disk encryption protects against decryption of data if the disk is stolen from the node and analyzed externally. However, for insecure locations this may not be sufficient. For example, if an attacker steals the entire node, the attacker can intercept the data when powering on the node, because the node decrypts its own disks. This applies to nodes with physical TPM2 chips as well as virtual machines with Virtual Trusted Platform Module (VTPM) access.
Network-Bound Disk Encryption (NBDE) effectively ties the encryption key to an external server or set of servers in a secure and anonymous way across the network. This is not a key escrow, in that the nodes do not store the encryption key or transfer it over the network, but otherwise behaves in a similar fashion.
Clevis and Tang are generic client and server components that provide network-bound encryption. Fedora CoreOS (FCOS) uses these components in conjunction with Linux Unified Key Setup-on-disk-format (LUKS) to encrypt and decrypt root and non-root storage volumes to accomplish Network-Bound Disk Encryption.
When a node starts, it attempts to contact a predefined set of Tang servers by performing a cryptographic handshake. If it can reach the required number of Tang servers, the node can construct its disk decryption key and unlock the disks to continue booting. If the node cannot access a Tang server due to a network outage or server unavailability, the node cannot boot and continues retrying indefinitely until the Tang servers become available again. Because the key is effectively tied to the node’s presence in a network, an attacker attempting to gain access to the data at rest would need to obtain both the disks on the node, and network access to the Tang server as well.
The following figure illustrates the deployment model for NBDE.
The following figure illustrates NBDE behavior during a reboot.
Shamir’s secret sharing (sss) is a cryptographic algorithm to securely divide up, distribute, and re-assemble keys. Using this algorithm, OKD can support more complicated mixtures of key protection.
When you configure a cluster node to use multiple Tang servers, OKD uses sss to set up a decryption policy that will succeed if at least one of the specified servers is available. You can create layers for additional security. For example, you can define a policy where OKD requires both the TPM and one of the given list of Tang servers to decrypt the disk.
The following components and technologies implement Network-Bound Disk Encryption (NBDE).
Tang is a server for binding data to network presence. It makes a node containing the data available when the node is bound to a certain secure network. Tang is stateless and does not require Transport Layer Security (TLS) or authentication. Unlike escrow-based solutions, where the key server stores all encryption keys and has knowledge of every encryption key, Tang never interacts with any node keys, so it never gains any identifying information from the node.
Clevis is a pluggable framework for automated decryption that provides automated unlocking of Linux Unified Key Setup-on-disk-format (LUKS) volumes. The Clevis package runs on the node and provides the client side of the feature.
A Clevis pin is a plugin into the Clevis framework. There are three pin types:
Binds the disk encryption to the TPM2.
Binds the disk encryption to a Tang server to enable NBDE.
Allows more complex combinations of other pins. It allows more nuanced policies such as the following:
Must be able to reach one of these three Tang servers
Must be able to reach three of these five Tang servers
Must be able to reach the TPM2 AND at least one of these three Tang servers
When planning your Tang server environment, consider the physical and network locations of the Tang servers.
The geographic location of the Tang servers is relatively unimportant, as long as they are suitably secured from unauthorized access or theft and offer the required availability and accessibility to run a critical service.
Nodes with Clevis clients do not require local Tang servers as long as the Tang servers are available at all times. Disaster recovery requires both redundant power and redundant network connectivity to Tang servers regardless of their location.
Any node with network access to the Tang servers can decrypt their own disk partitions, or any other disks encrypted by the same Tang servers.
Select network locations for the Tang servers that ensure the presence or absence of network connectivity from a given host allows for permission to decrypt. For example, firewall protections might be in place to prohibit access from any type of guest or public network, or any network jack located in an unsecured area of the building.
Additionally, maintain network segregation between production and development networks. This assists in defining appropriate network locations and adds an additional layer of security.
Do not deploy Tang servers on the same resource, for example, the same rolebindings.rbac.authorization.k8s.io cluster, that they are responsible for unlocking. However, a cluster of Tang servers and other security resources can be a useful configuration to enable support of multiple additional clusters and cluster resources.
The requirements around availability, network, and physical location drive the decision of how many Tang servers to use, rather than any concern over server capacity.
Tang servers do not maintain the state of data encrypted using Tang resources. Tang servers are either fully independent or share only their key material, which enables them to scale well.
There are two ways Tang servers handle key material:
Multiple Tang servers share key material:
You must load balance Tang servers sharing keys behind the same URL. The configuration can be as simple as round-robin DNS, or you can use physical load balancers.
You can scale from a single Tang server to multiple Tang servers. Scaling Tang servers does not require rekeying or client reconfiguration on the node when the Tang servers share key material and the same URL.
Client node setup and key rotation only requires one Tang server.
Multiple Tang servers generate their own key material:
You can configure multiple Tang servers at installation time.
You can scale an individual Tang server behind a load balancer.
All Tang servers must be available during client node setup or key rotation.
When a client node boots using the default configuration, the Clevis client contacts all Tang servers. Only n Tang servers must be online to proceed with decryption. The default value for n is 1.
Red Hat does not support postinstallation configuration that changes the behavior of the Tang servers.
Centralized logging of Tang traffic is advantageous because it might allow you to detect such things as unexpected decryption requests. For example:
A node requesting decryption of a passphrase that does not correspond to its boot sequence
A node requesting decryption outside of a known maintenance activity, such as cycling keys