The File Integrity Operator is an OKD Operator that continually runs file integrity checks on the cluster nodes. It deploys a DaemonSet that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, and it provides a status object with a log of the files that have been modified since the initial run of the DaemonSet pods.

Currently, only Fedora CoreOS (FCOS) nodes are supported.

Understanding the FileIntegrity custom resource

An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.

Each FileIntegrity CR is backed by a DaemonSet running AIDE on the nodes matching the FileIntegrity CR specification.

The following example FileIntegrity CR enables scans on only the worker nodes, but otherwise uses the defaults.

Example FileIntegrity CR
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  config: {}

Checking the FileIntegrity custom resource status

The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.

  1. To query the FileIntegrity CR status, run:

    $ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"

    The command prints the current phase, for example Active.

FileIntegrity custom resource phases

  • Pending - The phase after the custom resource (CR) is created.

  • Active - The phase when the backing daemon set is up and running.

  • Initializing - The phase when the AIDE database is being reinitialized.
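
The phases above lend themselves to a simple readiness check. The following is a minimal sketch, not part of the Operator: it polls a phase lookup until the CR reports Active. The helper names wait_for_active and phase_from_oc, the timeout, and the polling interval are assumptions made for illustration; the oc query mirrors the command shown earlier, with the namespace passed explicitly.

```python
import subprocess
import time
from typing import Callable


def wait_for_active(get_phase: Callable[[], str],
                    timeout: float = 300.0,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_phase() until it returns "Active" or the timeout elapses.

    Returns True if the Active phase was seen, False on timeout.
    The sleep function is injectable so the loop can be tested quickly.
    """
    waited = 0.0
    while True:
        if get_phase().strip() == "Active":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def phase_from_oc(name: str = "worker-fileintegrity",
                  namespace: str = "openshift-file-integrity") -> str:
    """Hypothetical helper: read .status.phase through the oc CLI."""
    completed = subprocess.run(
        ["oc", "get", f"fileintegrities/{name}",
         "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

On a workstation that is logged in to the cluster, wait_for_active(phase_from_oc) would block until the CR leaves the Pending or Initializing phase, or until the assumed five-minute timeout passes.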

Understanding the FileIntegrityNodeStatuses object

The scan results of the FileIntegrity CR are reported in separate objects of kind FileIntegrityNodeStatus. To list them, run:

$ oc get fileintegritynodestatuses
Example output
NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

A FileIntegrityNodeStatus object might not be created until the second run of the scanner finishes. The interval between scanner runs is configurable.

There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.

$ oc get fileintegritynodestatuses -ojsonpath='{.items[*].results}' | jq

FileIntegrityNodeStatus status types

These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus:

  • Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.

  • Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.

  • Error - The AIDE scanner encountered an internal error.
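
To see at a glance which nodes are currently in which state, the conditions can be summarized per node. The sketch below is an illustration under assumptions: it expects the JSON list produced by `oc get fileintegritynodestatuses -o json`, relies only on the nodeName, results, condition, and lastProbeTime fields described in this section, and uses the hypothetical helper name latest_conditions.

```python
import json
from typing import Dict


def latest_conditions(status_list_json: str) -> Dict[str, str]:
    """Map each nodeName to the condition of its most recent result.

    lastProbeTime values are UTC timestamps in the form
    2020-09-15T12:45:57Z, so plain string comparison orders them
    chronologically.
    """
    latest: Dict[str, str] = {}
    for item in json.loads(status_list_json).get("items", []):
        results = item.get("results") or []
        if not results:
            # No scan has been recorded for this node yet.
            continue
        newest = max(results, key=lambda r: r["lastProbeTime"])
        latest[item["nodeName"]] = newest["condition"]
    return latest
```

Any node whose value is Failed or Error is a candidate for closer inspection.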

FileIntegrityNodeStatus success status example

Example output of a condition with a success status
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]

In this case, all three scans succeeded and so far there are no other conditions.

FileIntegrityNodeStatus failure status example

To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:

$ oc debug node/ip-10-0-130-192.ec2.internal
Example output
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP:
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...

After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.

$ oc get fileintegritynodestatuses/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

Alternatively, to query all node status objects without specifying an object name, run:

$ oc get fileintegritynodestatuses -ojsonpath='{.items[*].results}' | jq
Example output
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
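
Because the earlier Succeeded condition is retained, the time window in which the change happened can be derived from a results array. The following is a rough sketch, assuming the results array of a single node is available as a JSON string; failure_window is a hypothetical helper name, not an Operator API.

```python
import json
from typing import Optional, Tuple


def failure_window(results_json: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (last Succeeded time, first Failed time) for one node.

    Returns None when no Failed condition is present. The first element
    is None when the check failed before any Succeeded condition was
    recorded.
    """
    # Sort by timestamp; the UTC format shown above sorts correctly as text.
    results = sorted(json.loads(results_json), key=lambda r: r["lastProbeTime"])
    last_ok: Optional[str] = None
    for result in results:
        if result["condition"] == "Succeeded":
            last_ok = result["lastProbeTime"]
        elif result["condition"] == "Failed":
            return (last_ok, result["lastProbeTime"])
    return None
```

Applied to the example output above, the modification falls between the Succeeded probe at 12:54:14Z and the Failed probe at 12:57:20Z.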

The Failed condition points to a config map that gives more details about what exactly failed and why:

$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Example output
Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Annotations: 0


AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1

Changed files:

changed: /hostroot/etc/resolv.conf

Detailed information about changes:

File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>
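
When many files change, reading the log by eye becomes tedious. The following sketch extracts the changed paths from AIDE log text like the output above. It assumes the log is already available as a string, that changed entries use the `changed: <path>` form shown in the example, and that `/hostroot` is the prefix under which the scanner sees the node file system; changed_paths is a hypothetical helper name.

```python
from typing import List


def changed_paths(aide_log: str, mount_prefix: str = "/hostroot") -> List[str]:
    """Collect paths from 'changed: <path>' lines in an AIDE log.

    The mount prefix (assumed to be /hostroot, as in the example log)
    is stripped so that the paths match what is seen on the node.
    """
    marker = "changed: "
    paths: List[str] = []
    for raw_line in aide_log.splitlines():
        line = raw_line.strip()
        if not line.startswith(marker):
            # Skips headers such as "Changed files:" and all other lines.
            continue
        path = line[len(marker):].strip()
        if path.startswith(mount_prefix + "/"):
            path = path[len(mount_prefix):]
        paths.append(path)
    return paths
```

For the example log, this yields the single path /etc/resolv.conf, which matches the file that was modified from the debug pod.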