
Linux user namespaces allow administrators to isolate the container user and group identifiers (UIDs and GIDs) so that a container can have a different set of permissions in the user namespace than on the host system where it is running. This allows containers to run processes with full privileges inside the user namespace, but the processes can be unprivileged for operations on the host machine.

By default, a container runs in the host user namespace. Running a container in the host user namespace can be useful when the container needs a feature that is available only in that namespace. However, running pods in the host user namespace introduces security concerns, such as the possibility of container breakouts, in which a process inside a container breaks out onto the host, where it can access or modify files on the host or in other containers.

Running containers in individual user namespaces can mitigate container breakouts and several other vulnerabilities that a compromised container can pose to other pods and the node itself.
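
The separation between IDs inside a user namespace and IDs on the host can be sketched with the kernel's uid_map format, in which each entry lists the first ID inside the namespace, the first ID outside it, and the number of IDs covered. The helper name map_uid_to_host and the sample mapping below are hypothetical and chosen only for illustration; the mapping that a node assigns to a pod will differ.

```shell
#!/bin/sh
# Translate a UID seen inside a user namespace to the UID the host sees.
#   $1 = UID inside the namespace
#   $2 = uid_map content, one "<inside> <outside> <length>" entry per line
# Prints the host UID, or nothing when no entry covers the UID.
map_uid_to_host() {
  printf '%s\n' "$2" | while read -r inside outside length; do
    [ -n "$length" ] || continue   # skip blank or malformed lines
    if [ "$1" -ge "$inside" ] && [ "$1" -lt $((inside + length)) ]; then
      echo $((outside + $1 - inside))
    fi
  done
}

# Hypothetical mapping: 65536 IDs starting at host ID 200000.
SAMPLE_MAP="0 200000 65536"

map_uid_to_host 0 "$SAMPLE_MAP"      # root inside the namespace
map_uid_to_host 1000 "$SAMPLE_MAP"   # an ordinary user inside the namespace
```

With the sample mapping, UID 0 inside the namespace translates to 200000 on the host and UID 1000 translates to 201000, so a process that is root inside the namespace is an unprivileged user on the host. Inside a running container, the real mapping can be read from /proc/self/uid_map.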

Configuring Linux user namespace support

You can configure Linux user namespaces by setting the hostUsers parameter to false in the pod spec, along with a few other settings, as shown in the following procedure.

Running workloads in user namespaces makes it safe to configure RunAsAny for security context constraint (SCC) fields such as fsGroup, runAsGroup, runAsUser, and supplementalGroups, because the UID or GID that these fields express inside the container is different from the UID or GID outside of the container.

For extra security, you can use the restricted-v3 or nested-container SCC, both of which are specifically designed for workloads in Linux user namespaces. The userNamespaceLevel: RequirePodLevel field in the SCC requires that workloads run in user namespaces. For more information about SCCs, see "Managing security context constraints".

To require a specific SCC for a workload, you can add an SCC to a specific user or group by using the oc adm policy add-scc-to-user or oc adm policy add-scc-to-group command. For more information, see the "OpenShift CLI administrator command reference".
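
As a minimal sketch of the two commands named above, the following script grants an SCC to a service account and to a group. The service account name userns-sa, the group name userns-devs, and the run wrapper are hypothetical and exist only for this illustration; the -z option selects a service account. By default the script only prints the commands, so that they can be reviewed before any cluster policy is changed.

```shell
#!/bin/sh
# Hypothetical names, used only for illustration.
SCC="restricted-v3"           # SCC named in the text above
SERVICE_ACCOUNT="userns-sa"   # hypothetical service account
GROUP="userns-devs"           # hypothetical group
NAMESPACE="userns"            # example namespace used in this document

# Print each command unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant the SCC to a service account in one namespace.
grant_scc_to_service_account() {
  run oc adm policy add-scc-to-user "$SCC" -z "$SERVICE_ACCOUNT" -n "$NAMESPACE"
}

# Grant the SCC to every member of a group.
grant_scc_to_group() {
  run oc adm policy add-scc-to-group "$SCC" "$GROUP"
}

grant_scc_to_service_account
grant_scc_to_group
```

Running the script with DRY_RUN=0 executes the commands instead of printing them, which requires a logged-in oc session with sufficient privileges.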

You can also optionally use the procMount parameter in a pod specification to configure the /proc file system in pods as unmasked. Setting /proc to unmasked, which is generally considered safe, bypasses the default masking behavior of the container runtime and should be used only with an SCC that sets hostUsers to false.

Procedure
  1. Edit the default user ID (UID) and group ID (GID) range of the OKD namespace where your pod is deployed by running the following command:

    $ oc edit ns/<namespace_name>
    Example namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        openshift.io/description: ""
        openshift.io/display-name: ""
        openshift.io/requester: system:admin
        openshift.io/sa.scc.mcs: s0:c27,c24
        openshift.io/sa.scc.supplemental-groups: 1000/10000 (1)
        openshift.io/sa.scc.uid-range: 1000/10000 (2)
    # ...
      name: userns
    # ...
    1 Specifies the default GID to require in the pod spec. The range for a Linux user namespace must be 65535 or lower. The default is 1000000000/10000.
    2 Specifies the default UID to require in the pod spec. The range for a Linux user namespace must be 65535 or lower. The default is 1000000000/10000.

    The range 1000/10000 means 10,000 values starting with ID 1000, so it specifies the range of IDs from 1000 to 10,999.

  2. Enable the use of Linux user namespaces by creating a workload configured to run with an appropriate SCC and the hostUsers parameter set to false.

    1. Create a YAML file similar to the following:

      Example pod specification
      apiVersion: v1
      kind: Pod
      metadata:
        namespace: userns
        name: userns-pod
        labels:
          app: name
        annotations:
          openshift.io/required-scc: "restricted-v3" (1)
      # ...
      spec:
        hostUsers: false (2)
        containers:
        - name: userns-container
          image: registry.access.redhat.com/ubi9
          command: ["sleep", "1000"]
          securityContext:
            capabilities: (3)
              drop: ["ALL"]
            allowPrivilegeEscalation: false
            runAsNonRoot: true (4)
            procMount: Unmasked (5)
            runAsUser: 1000 (6)
            runAsGroup: 1000 (7)
      # ...
      1 Specifies the SCC to use with this workload.
      2 Specifies whether the pod is to be run in a user namespace. If false, the pod runs in a new user namespace that is created for the pod. If true, the pod runs in the host user namespace. The default is true.
      3 Capabilities permit privileged actions without giving full root access. Setting capabilities inside of a user namespace is safer than setting them outside, because the scope of the capabilities is limited to the user namespace, and is generally considered to be safe. For this reason, capabilities inside of a user namespace are allowed at the baseline level in pod security admission. However, granting capabilities such as CAP_SYS_ADMIN to an untrusted workload could increase the kernel surface area that a containerized process can access and in which it could find exploits.
      4 Specifies that processes inside the container run with a user that has any UID other than 0.
      5 Optional: Specifies the type of proc mount to use for the containers. The Unmasked value ensures that a container’s /proc file system is mounted as read/write by the container process. The default is Default.
      6 Specifies the user ID for processes that run inside of the container. This must fall in the range that you set in the namespace object.
      7 Specifies the group ID for processes that run inside of the container. This must fall in the range that you set in the namespace object.
    2. Create the object by running the following command:

      $ oc create -f <file_name>.yaml
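
The range notation from step 1 and the requirement that runAsUser and runAsGroup fall inside that range can be checked with a few lines of shell arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the 65535 limit is read here as applying to the last ID in the range.

```shell
#!/bin/sh
# Print the first and last ID of a "<start>/<size>" annotation value.
range_bounds() {
  start=${1%%/*}
  size=${1##*/}
  echo "$start $((start + size - 1))"
}

# Succeed when ID $1 lies inside the "<start>/<size>" range $2.
id_in_range() {
  id=$1
  bounds=$(range_bounds "$2")
  first=${bounds% *}
  last=${bounds#* }
  [ "$id" -ge "$first" ] && [ "$id" -le "$last" ]
}

# Succeed when the last ID of the range is 65535 or lower
# (assumed interpretation of the limit stated in step 1).
range_fits_userns() {
  bounds=$(range_bounds "$1")
  [ "${bounds#* }" -le 65535 ]
}

range_bounds "1000/10000"
if id_in_range 1000 "1000/10000"; then
  echo "runAsUser 1000 is inside the range"
fi
if ! range_fits_userns "1000000000/10000"; then
  echo "the default range is too high for a user namespace"
fi
```

For the example annotation 1000/10000, range_bounds prints 1000 10999, which matches the range described in step 1.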
Verification
  1. Check the user and group IDs being used by the container in the pod you created. The pod is inside the Linux user namespace.

    1. Start a shell session with the container in your pod:

      $ oc rsh -c <container_name> pod/<pod_name>
      Example command
      $ oc rsh -c userns-container pod/userns-pod
    2. Display the user and group IDs being used inside the container:

      sh-5.1$ id
      Example output
      uid=1000(1000) gid=1000(1000) groups=1000(1000) (1)
      
      1 The UID and GID for the container should be the same as those you set in the pod specification.
    3. Display the user ID being used in the container user namespace:

      sh-5.1$ lsns -t user
      Example output
              NS TYPE  NPROCS PID USER COMMAND
      4026532447 user       3   1 1000 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000 (1)
      
      1 The UID for the process should be the same as you set in the pod spec.
  2. Check the UID being used by the node. The node is outside of the Linux user namespace. This user ID should be different from the UID being used in the container.

    1. Start a debug session for that node:

      $ oc debug node/<node_name>
      Example command
      $ oc debug node/ci-ln-z5vppzb-72292-8zp2b-worker-c-q8sh9
    2. Set /host as the root directory within the debug shell:

      sh-5.1# chroot /host
    3. Display the UID being used by the node:

      sh-5.1# lsns -t user
      Example output
              NS TYPE  NPROCS   PID USER       COMMAND
      4026531837 user     233     1 root       /usr/lib/systemd/systemd --switched-root --system --deserialize 28
      4026532447 user       1  4767 2908816384 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000 (1)
      
      1 The UID should be different from what you set in the pod specification.
    4. Exit the debug session by using the following commands:

      sh-5.1#  exit
      sh-5.1#  exit
  3. Check that the /proc file system is mounted into the container as unmasked, as indicated by the read/write permission (rw) in the output of the following command:

    $ oc exec <pod_name> -- mount | grep /proc
    Example output
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
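
The last check can be scripted by looking for the rw option in the mount line for /proc. The function name proc_mount_is_rw is hypothetical, and the sample line is the example output shown above; in practice the line would come from the oc exec command in step 3.

```shell
#!/bin/sh
# Succeed when a line of mount output lists "rw" among its options.
#   $1 = one line such as "proc on /proc type proc (rw,nosuid,...)"
proc_mount_is_rw() {
  opts=${1##*\(}     # drop everything up to the last "("
  opts=${opts%\)*}   # drop the closing ")"
  case ",$opts," in
    *,rw,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample line copied from the example output above.
line='proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)'

if proc_mount_is_rw "$line"; then
  echo "/proc is mounted read/write"
else
  echo "/proc is not mounted read/write"
fi
```

The check only inspects the option list of the line it is given, so it says nothing about other mounts that might exist below /proc.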