Using GlusterFS - Configuring Persistent Storage | Installation and Configuration

Overview
Support Requirements
- Supported Operating Systems
- Environment Requirements
Provisioning
Gluster Volume Security

Overview

You can configure your OKD cluster to use Red Hat Gluster Storage as persistent storage for containerized applications. Two deployment solutions are available: a containerized storage cluster or a dedicated storage cluster. This topic focuses mainly on the persistent volume plug-in solution using a dedicated Red Hat Gluster Storage cluster.

Containerized Red Hat Gluster Storage

Starting with the Red Hat Gluster Storage 3.1 update 3 release, you can deploy containerized Red Hat Gluster Storage directly on OKD. Containerized Red Hat Gluster Storage converged with OKD addresses the use case where containerized applications require both shared file storage and the flexibility of a converged infrastructure with compute and storage instances being scheduled and run from the same set of hardware.

Figure 1. Architecture - Red Hat Gluster Storage Container Converged with OpenShift

Step-by-step instructions for this containerized solution are provided separately in the following Red Hat Gluster Storage documentation:

Container-Native Storage for OpenShift Container Platform

Container Native Storage Recommendations

OKD offers container native storage (CNS), which makes it easier for OKD users to fulfill their storage needs. With the CNS solution, users and administrators can run storage and application pods together on the same infrastructure, sharing the same resources.

See Container-Native Storage for OpenShift Container Platform for configuring CNS as part of an OKD cluster.

Creation Time of Volumes with Container Native Storage

Building environment storage can influence the time it takes for an application to start. For example, if the application pod requires a persistent volume claim (PVC), then extra time might be needed for the CNS volume to be created and bound to the corresponding PVC. This affects the time it takes for an application pod to start.

Creation time of CNS volumes scales linearly up to 100 volumes. In the latest tests, each volume took approximately 6 seconds to be created, allocated, and bound to a pod.
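
As a rough illustration of the linear scaling described above, the following shell sketch estimates the total creation time for a batch of volumes. The function name is hypothetical, and the default of 6 seconds per volume is only the approximate figure reported for these tests; it may not hold on other hardware.

```shell
# Rough estimate only: assumes about 6 seconds per volume (the figure
# reported above) and linear scaling, which was tested up to 100 volumes.
estimate_cns_create_seconds() {
  volumes=$1
  seconds_per_volume=${2:-6}
  if [ "$volumes" -gt 100 ]; then
    echo "linear scaling was only tested up to 100 volumes" >&2
    return 1
  fi
  echo $((volumes * seconds_per_volume))
}

estimate_cns_create_seconds 50   # prints 300
```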

All tests were performed on one trusted storage pool (TSP), using hardware configuration for CNS per the Container-Native Storage for OpenShift Container Platform guidelines.

Dynamic storage provisioning and storage classes were also configured and used when provisioning the PVC.

Deletion Time of Volumes with Container Native Storage

Deleting a PVC that is used by an application pod triggers the deletion of the CNS volume that was used by the PVC.

PVCs will disappear immediately from the oc get pvc output. However, the time to delete and recycle CNS volumes depends on the number of CNS volumes. In the latest tests, the deletion time of CNS volumes proved to scale linearly up to 100 volumes.

Deletion time does not affect application users. It mainly helps CNS storage administrators estimate approximately how long it takes for CNS volumes to be removed from a CNS cluster.

Recommended Memory Requirements for Container Native Storage

The recommended memory requirements are 32 GB per OKD node hosting CNS pods.

Follow the planning guidelines when planning hardware for a CNS storage environment to ensure that you have enough memory.

Dedicated Storage Cluster

If you have a dedicated Red Hat Gluster Storage cluster available in your environment, you can configure OKD’s Gluster volume plug-in. The dedicated storage cluster delivers persistent Red Hat Gluster Storage file storage for containerized applications over the network. The applications access storage served out from the storage clusters through common storage protocols.

Figure 2. Architecture - Dedicated Red Hat Gluster Storage Cluster Using the OKD Volume Plug-in

You can also use Heketi to dynamically provision volumes in a dedicated Red Hat Gluster Storage cluster. See Managing Volumes Using Heketi in the Red Hat Gluster Storage 3.3 Administration Guide for more information.

This solution is a conventional deployment where containerized compute applications run on an OKD cluster. The remaining sections in this topic provide the step-by-step instructions for the dedicated Red Hat Gluster Storage solution.

This topic presumes some familiarity with OKD and GlusterFS:

See the Persistent Storage topic for details on the OKD PV framework in general.
See the Red Hat Gluster Storage 3.3 Administration Guide for more on GlusterFS.

High availability of storage in the infrastructure is left to the underlying storage provider.

Support Requirements

The following requirements must be met to create a supported integration of Red Hat Gluster Storage and OKD.

Supported Operating Systems

The following table lists the supported versions of OKD with Red Hat Gluster Storage Server.

Red Hat Gluster Storage	OKD
3.1.3	3.1 or later

Environment Requirements

The environment requirements for OKD and Red Hat Gluster Storage are described in this section.

Red Hat Gluster Storage

All installations of Red Hat Gluster Storage must have valid subscriptions to Red Hat Network channels and Subscription Management repositories.
Red Hat Gluster Storage installations must adhere to the requirements laid out in the Red Hat Gluster Storage 3.3 Installation Guide.
Red Hat Gluster Storage installations must be completely up to date with the latest patches and upgrades. Refer to the Red Hat Gluster Storage 3.3 Installation Guide to upgrade to the latest version.
The versions of OKD and Red Hat Gluster Storage integrated must be compatible, according to the information in Supported Operating Systems.
A fully-qualified domain name (FQDN) must be set for each hypervisor and Red Hat Gluster Storage server node. Ensure that correct DNS records exist, and that the FQDN is resolvable via both forward and reverse DNS lookup.
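
The forward and reverse lookup requirement can be spot-checked with a short shell sketch like the one below. The function name and the example host name are hypothetical. The sketch relies on `getent hosts`, which also consults local sources such as /etc/hosts, so it is only an approximation of a pure DNS check, and it compares names exactly.

```shell
# check_dns FQDN: succeed only if the name resolves to an address and
# that address resolves back to the same name.
check_dns() {
  fqdn=$1
  ip=$(getent hosts "$fqdn" | awk 'NR==1 {print $1}')
  [ -n "$ip" ] || { echo "forward lookup failed for $fqdn" >&2; return 1; }
  back=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
  [ "$back" = "$fqdn" ] || { echo "reverse lookup for $ip returned '$back'" >&2; return 1; }
  echo "$fqdn <-> $ip OK"
}

# Example (placeholder host name):
#   check_dns gluster1.example.com
```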

Red Hat OKD

All installations of OKD must have valid subscriptions to Red Hat Network channels and Subscription Management repositories.
OKD installations must adhere to the requirements laid out in the Installation and Configuration documentation.
The OKD cluster must be up and running.
A user with cluster-admin permissions must be created.
All OKD nodes on RHEL systems must have the glusterfs-fuse RPM installed, which should match the version of Red Hat Gluster Storage server running in the containers. For more information on installing glusterfs-fuse, see Native Client in the Red Hat Gluster Storage 3.3 Administration Guide.

Provisioning

To provision GlusterFS volumes using the dedicated storage cluster solution, the following are required:

An existing storage device in your underlying infrastructure.
A distinct list of servers (IP addresses) in the Gluster cluster, to be defined as endpoints.
A service, to persist the endpoints (optional).
An existing Gluster volume to be referenced in the persistent volume object.
glusterfs-fuse installed on each schedulable OKD node in your cluster:
```
$ yum install glusterfs-fuse
```

Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the GlusterFS-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.

Creating Gluster Endpoints

An endpoints definition describes the GlusterFS cluster as an Endpoints object and includes the IP addresses of your Gluster servers. The port value can be any numeric value within the accepted range of ports. Optionally, you can create a service that persists the endpoints.

Define the following service:
Gluster Service Definition
```
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster (1)
spec:
  ports:
  - port: 1
```
1 This name must be defined in the endpoints definition. If using a service, then the endpoints name must match the service name.
Save the service definition to a file, for example gluster-service.yaml, then create the service:
```
$ oc create -f gluster-service.yaml
```

Verify that the service was created:

$ oc get services
NAME                       CLUSTER_IP       EXTERNAL_IP   PORT(S)    SELECTOR        AGE
glusterfs-cluster          172.30.205.34    <none>        1/TCP      <none>          44s

Define the Gluster endpoints:

Gluster Endpoints Definition

apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster (1)
subsets:
  - addresses:
      - ip: 192.168.122.221 (2)
    ports:
      - port: 1
  - addresses:
      - ip: 192.168.122.222 (2)
    ports:
      - port: 1 (3)

1	This name must match the service name from step 1.
2	The `ip` values must be the actual IP addresses of a Gluster server, not fully-qualified host names.
3	The port number is ignored.

Save the endpoints definition to a file, for example gluster-endpoints.yaml, then create the endpoints:
```
$ oc create -f gluster-endpoints.yaml
endpoints "glusterfs-cluster" created
```

Verify that the endpoints were created:

$ oc get endpoints
NAME                ENDPOINTS                             AGE
docker-registry     10.1.0.3:5000                         4h
glusterfs-cluster   192.168.122.221:1,192.168.122.222:1   11s
kubernetes          172.16.35.3:8443                      4d
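
For clusters with more than a couple of Gluster servers, the endpoints definition can be generated rather than written by hand. The following is a minimal sketch, assuming one subset per server as in the definition above; the function name is hypothetical, and port 1 is kept as the placeholder value because the port number is ignored.

```shell
# render_gluster_endpoints NAME IP...: print an Endpoints object with one
# subset per Gluster server IP address.
render_gluster_endpoints() {
  name=$1; shift
  printf 'apiVersion: v1\nkind: Endpoints\nmetadata:\n  name: %s\nsubsets:\n' "$name"
  for ip in "$@"; do
    printf '  - addresses:\n      - ip: %s\n    ports:\n      - port: 1\n' "$ip"
  done
}

render_gluster_endpoints glusterfs-cluster 192.168.122.221 192.168.122.222
```

The output can be saved to a file such as gluster-endpoints.yaml, or piped to `oc create -f -`.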

Creating the Persistent Volume

GlusterFS does not support the 'Recycle' reclaim policy.

Next, define the PV in an object definition before creating it in OKD:

Persistent Volume Object Definition Using GlusterFS

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-default-volume (1)
spec:
  capacity:
    storage: 2Gi (2)
  accessModes: (3)
    - ReadWriteMany
  glusterfs: (4)
    endpoints: glusterfs-cluster (5)
    path: myVol1 (6)
    readOnly: false
  persistentVolumeReclaimPolicy: Retain (7)

1	The name of the volume. This is how it is identified via persistent volume claims or from pods.
2	The amount of storage allocated to this volume.
3	`accessModes` are used as labels to match a PV and a PVC. They currently do not define any form of access control.
4	The volume type being used, in this case the glusterfs plug-in.
5	The endpoints name that defines the Gluster cluster created in Creating Gluster Endpoints.
6	The Gluster volume that will be accessed, as shown in the `gluster volume status` command.
7	The volume reclaim policy `Retain` indicates that the volume will be preserved after the pods accessing it terminate. For GlusterFS, the accepted values are `Retain` and `Delete`.

Endpoints are namespaced. Each project accessing the Gluster volume needs its own endpoints.

Save the definition to a file, for example gluster-pv.yaml, and create the persistent volume:
```
$ oc create -f gluster-pv.yaml
```

Verify that the persistent volume was created:

$ oc get pv
NAME                     LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
gluster-default-volume   <none>    2147483648   RWX           Available                       2s
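
Because endpoints are namespaced, the same endpoints definition has to be created in every project that accesses the Gluster volume. The loop below is one way to sketch this; the helper name and the project names are hypothetical, and the `OC` variable is an assumption added here so the commands can be previewed before they are run.

```shell
# Hypothetical helper: create the same endpoints object in several projects.
# Set OC to another command (for example "echo") to preview what would run.
create_endpoints_in_projects() {
  file=$1; shift
  for project in "$@"; do
    ${OC:-oc} create -f "$file" -n "$project" || return 1
  done
}

# Example (project names are placeholders):
#   create_endpoints_in_projects gluster-endpoints.yaml project-a project-b
```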

Creating the Persistent Volume Claim

Developers request GlusterFS storage by referencing either a PVC or the Gluster volume plug-in directly in the volumes section of a pod spec. A PVC exists only in the user’s project and can only be referenced by pods within that project. Any attempt to access a PV from a different project causes the pod to fail.

Create a PVC that will bind to the new PV:
PVC Object Definition
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim
spec:
  accessModes:
  - ReadWriteMany (1)
  resources:
     requests:
       storage: 1Gi (2)
```
1 accessModes do not enforce security, but rather act as labels to match a PV to a PVC.

2 This claim will look for PVs offering 1Gi or greater capacity.
Save the definition to a file, for example gluster-claim.yaml, and create the PVC:
```
$ oc create -f gluster-claim.yaml
```
PVs and PVCs make sharing a volume across a project simpler. The Gluster-specific information contained in the PV definition can also be defined directly in a pod specification.
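
To show how a pod consumes the claim, the sketch below prints a pod definition that references the PVC in its volumes section. The pod name, container name, image, volume name, and mount path are all placeholders chosen for illustration; only the claim name comes from the PVC definition above.

```shell
# render_gluster_pod [CLAIM]: print a pod definition that mounts the PVC.
render_gluster_pod() {
  claim=${1:-gluster-claim}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gluster-pod                 # hypothetical pod name
spec:
  containers:
  - name: app                       # hypothetical container name
    image: registry.example.com/app:latest   # placeholder image
    volumeMounts:
    - name: gluster-data
      mountPath: /mnt/gluster       # hypothetical mount point
  volumes:
  - name: gluster-data
    persistentVolumeClaim:
      claimName: ${claim}
EOF
}

render_gluster_pod gluster-claim
```

The pod must be created in the same project as the PVC, for example by piping the output to `oc create -f -`.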

Gluster Volume Security

This section covers Gluster volume security, including matching permissions and SELinux considerations. Understanding the basics of POSIX permissions, process UIDs, supplemental groups, and SELinux is presumed.

See the full Volume Security topic before implementing Gluster volumes.

As an example, assume that the target Gluster volume, HadoopVol, is mounted under /mnt/glusterfs/, with the following POSIX permissions and SELinux labels:

$ ls -lZ /mnt/glusterfs/
drwxrwx---. yarn hadoop system_u:object_r:fusefs_t:s0    HadoopVol

$ id yarn
uid=592(yarn) gid=590(hadoop) groups=590(hadoop)

To access the HadoopVol volume, containers must match the SELinux label and either run with a UID of 592 or have 590 in their supplemental groups. The OKD GlusterFS plug-in mounts the volume in the container with the same POSIX ownership and permissions found on the target Gluster mount, so the owner is 592 and the group ID is 590. By default, however, the container does not run with an effective UID of 592 or a GID of 590, even though that is the desired behavior. Instead, a container’s UID and supplemental groups are determined by Security Context Constraints (SCCs) and the project defaults.
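
The POSIX side of this check can be mirrored in a few lines of shell. This is a simplified sketch for the HadoopVol example only: it hard-codes the owner (592), the group (590), and the `rwxrwx---` mode shown above, and it ignores SELinux and any special handling of root. The function name and the sample UID are hypothetical.

```shell
# has_posix_access UID GID...: succeed if the UID owns HadoopVol (592) or
# any of the listed group IDs is 590. "Other" has no permissions.
has_posix_access() {
  uid=$1; shift
  [ "$uid" -eq 592 ] && return 0
  for gid in "$@"; do
    [ "$gid" -eq 590 ] && return 0
  done
  return 1
}

# A container with an arbitrary UID gains access through group 590.
has_posix_access 1000130000 590 && echo "access granted"
# Without UID 592 or group 590, access is denied.
has_posix_access 1000130000 0 || echo "access denied"
```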

Group IDs

Configure Gluster volume access by using supplemental groups, assuming it is not an option to change permissions on the Gluster mount. Supplemental groups in OKD are used for shared storage, such as GlusterFS. In contrast, block storage, such as Ceph RBD or iSCSI, uses the fsGroup SCC strategy and the fsGroup value in the pod’s securityContext.

Use supplemental group IDs instead of user IDs to gain access to persistent storage. Supplemental groups are covered further in the full Volume Security topic.

The group ID on the target Gluster mount example above is 590. Therefore, a pod can define that group ID using supplementalGroups under the pod-level securityContext definition. For example:

spec:
  containers:
    - name:
    ...
  securityContext: (1)
    supplementalGroups: [590] (2)

1	`securityContext` must be defined at the pod level, not under a specific container.
2	An array of GIDs defined at the pod level.

Assuming there are no custom SCCs that satisfy the pod’s requirements, the pod matches the restricted SCC. This SCC has the supplementalGroups strategy set to RunAsAny, meaning that any supplied group IDs are accepted without range checking.

As a result, the above pod will pass admission and can be launched. However, if group ID range checking is desired, use a custom SCC, as described in pod security and custom SCCs. A custom SCC can be created to define minimum and maximum group IDs, enforce group ID range checking, and allow a group ID of 590.
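
A custom SCC with group ID range checking might look like the output of the sketch below. The SCC name and the 500 to 600 range are hypothetical, and the remaining strategies are assumed to mirror the restricted SCC; in practice it is safer to start from a copy of the restricted SCC and change only the `supplementalGroups` stanza, since other fields are not shown here.

```shell
# render_group_range_scc NAME MIN MAX: print an SCC sketch whose
# supplementalGroups strategy enforces the group ID range MIN..MAX.
render_group_range_scc() {
  cat <<EOF
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: $1
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: MustRunAs
  ranges:
  - min: $2
    max: $3
EOF
}

render_group_range_scc gluster-scc 500 600   # 590 falls inside this range
```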

User IDs

User IDs can be defined in the container image or in the pod definition. The full Volume Security topic covers controlling storage access based on user IDs, and should be read prior to setting up GlusterFS persistent storage.

Use supplemental group IDs instead of user IDs to gain access to persistent storage.

In the target Gluster mount example above, the container needs a UID set to 592, so the following can be added to the pod definition:

spec:
  containers: (1)
  - name:
  ...
    securityContext:
      runAsUser: 592 (2)

1	Pods contain a `securityContext` specific to each container and a pod-level `securityContext`, which applies to all containers defined in the pod.
2	The UID defined on the Gluster mount.

With the default project and the restricted SCC, a pod’s requested user ID of 592 will not be allowed, and the pod will fail. This is because:

The pod requests 592 as its user ID.
All SCCs available to the pod are examined to see which SCC will allow a user ID of 592.
Because all available SCCs use MustRunAsRange for their runAsUser strategy, UID range checking is required.
592 is not included in the SCC or project’s user ID range.

Do not modify the predefined SCCs. Instead, create a custom SCC so that minimum and maximum user IDs are defined, UID range checking is still enforced, and the UID of 592 will be allowed.
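
The range check behind this failure, and the stanza a custom SCC would need, can be sketched as follows. The function names and the 500 to 600 range are hypothetical; only the `runAsUser` stanza is printed, so the rest of the custom SCC would still have to be defined.

```shell
# uid_in_range UID MIN MAX: succeed if MIN <= UID <= MAX.
uid_in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# render_uid_range_stanza MIN MAX: print the runAsUser stanza for a custom SCC.
render_uid_range_stanza() {
  printf 'runAsUser:\n  type: MustRunAsRange\n  uidRangeMin: %s\n  uidRangeMax: %s\n' "$1" "$2"
}

# Only emit the stanza if the range actually admits the Gluster mount's UID.
if uid_in_range 592 500 600; then
  render_uid_range_stanza 500 600
fi
```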

SELinux

See the full Volume Security topic for information on controlling storage access in conjunction with using SELinux.

By default, SELinux does not allow writing from a pod to a remote Gluster server.

To enable writing to GlusterFS volumes with SELinux enforcing on each node, run:

$ sudo setsebool -P virt_sandbox_use_fusefs on

The virt_sandbox_use_fusefs boolean is defined by the docker-selinux package. If you get an error saying it is not defined, ensure that this package is installed.

The -P option makes the boolean setting persistent across reboots.
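
To confirm the setting on a node, the current value can be read back with `getsebool`. The wrapper below is a small sketch with a hypothetical function name; it assumes the usual `name --> on` output format and returns 2 if the boolean cannot be queried at all.

```shell
# fusefs_writes_enabled: succeed if virt_sandbox_use_fusefs is "on".
fusefs_writes_enabled() {
  state=$(getsebool virt_sandbox_use_fusefs 2>/dev/null) || return 2
  # Keep only the text after the last space, expected to be "on" or "off".
  [ "${state##* }" = "on" ]
}

# Example:
#   fusefs_writes_enabled && echo "pods can write to GlusterFS volumes"
```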
