Because OKD does not scope a persistent volume (PV) to a single project, it can be shared across the cluster and claimed by any project using a persistent volume claim (PVC). This can lead to a number of issues that require troubleshooting.
A persistent volume claim (PVC) can get stuck in a Pending state for a number of reasons. For example:
Insufficient computing resources
Network problems
Mismatched storage class or node selector
No available volumes
The node with the persistent volume (PV) is in a Not Ready state
Identify the cause by using the oc describe command to review details about the stuck PVC.
Retrieve the list of PVCs by running the following command:
$ oc get pvc
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
lvms-test   Pending                                      lvms-vg1       11s
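On a cluster with many claims, it can help to list only the Pending ones. The following is a minimal sketch that filters the table shown above with awk. It assumes the default oc get pvc layout (NAME in the first column, STATUS in the second); pending_pvcs is a hypothetical shell function, not an oc feature.

```shell
# Hypothetical helper (not part of oc): print the names of Pending PVCs.
# Assumes the default `oc get pvc` layout on standard input:
# NAME is field 1 and STATUS is field 2. The header row is skipped.
pending_pvcs() {
  awk 'NR > 1 && $2 == "Pending" { print $1 }'
}
```

Example usage:
$ oc get pvc | pending_pvcs
With oc get pvc -A, a NAMESPACE column is added first, so STATUS moves to the third field and the sketch would need adjusting.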
Inspect the events associated with a PVC stuck in the Pending state by running the following command:
$ oc describe pvc <pvc_name> (1)
1 | Replace <pvc_name> with the name of the PVC. For example, lvms-test. |
Type     Reason              Age               From                         Message
----     ------              ----              ----                         -------
Warning  ProvisioningFailed  4s (x2 over 17s)  persistentvolume-controller  storageclass.storage.k8s.io "lvms-vg1" not found
If you encounter a storage class "not found" error, check the LVMCluster resource and ensure that all the logical volume manager storage (LVMS) pods are running. You can create an LVMCluster resource if it does not exist.
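The storage class "not found" condition can be detected directly from the oc describe output. A minimal sketch, assuming the event text has the form shown above; missing_storageclass is a hypothetical function name.

```shell
# Hypothetical helper: print the storage class named in a
# 'storageclass.storage.k8s.io "<name>" not found' event, read from
# `oc describe pvc` output on standard input. Prints nothing if absent.
missing_storageclass() {
  sed -n 's/.*storageclass\.storage\.k8s\.io "\([^"]*\)" not found.*/\1/p' | sort -u
}
```

Example usage:
$ oc describe pvc lvms-test | missing_storageclass
$ oc get storageclass lvms-vg1
LVMS names the storage class for each device class lvms-<device_class_name>, so a missing lvms-vg1 usually points to a missing or not yet ready LVMCluster with a device class named vg1.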
Verify the presence of the LVMCluster resource by running the following command:
$ oc get lvmcluster -n openshift-storage
NAME AGE
my-lvmcluster 65m
If the cluster doesn’t have an LVMCluster resource, create one by running the following command:
$ oc create -n openshift-storage -f <custom_resource> (1)
1 | Replace <custom_resource> with a custom resource URL or file tailored to your requirements. |
Example LVMCluster CR:
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
spec:
  storage:
    deviceClasses:
    - name: vg1
      default: true
      thinPoolConfig:
        name: thin-pool-1
        sizePercent: 90
        overprovisionRatio: 10
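The verification and creation steps above can be combined so that the CR is created only when none exists. This is a minimal sketch; ensure_lvmcluster is a hypothetical function name, and its argument stands in for the <custom_resource> placeholder.

```shell
# Hypothetical helper: create an LVMCluster CR only if none exists yet in
# openshift-storage. $1 stands in for the <custom_resource> file or URL.
ensure_lvmcluster() {
  # "-o name" prints one "<kind>.<group>/<name>" line per CR, or nothing.
  existing=$(oc get lvmcluster -n openshift-storage -o name 2>/dev/null)
  if [ -n "$existing" ]; then
    echo "LVMCluster already present: $existing"
  else
    oc create -n openshift-storage -f "$1"
  fi
}
```

Example usage, with a hypothetical file name:
$ ensure_lvmcluster my-lvmcluster.yaml
Because errors from oc get are discarded, a failed lookup (for example, when you are not logged in) also leads to an oc create attempt, which then reports its own error.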
Check that all the pods from LVMS are in the Running state in the openshift-storage namespace by running the following command:
$ oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
lvms-operator-7b9fb858cb-6nsml 3/3 Running 0 70m
topolvm-controller-5dd9cf78b5-7wwr2 5/5 Running 0 66m
topolvm-node-dr26h 4/4 Running 0 66m
vg-manager-r6zdv 1/1 Running 0 66m
The expected output is one running instance of lvms-operator and topolvm-controller, and one running instance of topolvm-node and vg-manager for each node.
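To compare the pod counts against this expectation, you can count the pods of each component. A minimal sketch, assuming the default oc get pods output and that pod names start with the component name, as in the example; count_pods is a hypothetical function name.

```shell
# Hypothetical helper: count the pods whose NAME starts with the prefix in $1,
# reading `oc get pods -n openshift-storage` output from standard input.
count_pods() {
  awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { n++ } END { print n + 0 }'
}
```

Example usage:
$ oc get pods -n openshift-storage | count_pods topolvm-node-
$ oc get nodes --no-headers | wc -l
If LVMS is limited to a subset of nodes, compare the per-node components against the size of that subset rather than the total node count.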
If topolvm-node is stuck in the Init state, LVMS could not locate an available disk to use. To retrieve the information needed for troubleshooting, review the logs of the vg-manager pod by running the following command:
$ oc logs -l app.kubernetes.io/component=vg-manager -n openshift-storage
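A pod that is stuck in initialization shows a STATUS such as Init:0/1 in oc get pods output. The following sketch lists topolvm-node pods in that state, assuming the default column layout (STATUS is the third field); init_topolvm_nodes is a hypothetical function name.

```shell
# Hypothetical helper: print topolvm-node pods whose STATUS starts with "Init"
# (for example, Init:0/1), reading `oc get pods` output from standard input.
init_topolvm_nodes() {
  awk 'NR > 1 && $1 ~ /^topolvm-node-/ && $3 ~ /^Init/ { print $1 }'
}
```

Example usage:
$ oc get pods -n openshift-storage | init_topolvm_nodes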
Sometimes a persistent volume claim (PVC) is stuck in a Pending state because a particular node in the cluster has failed. To identify the failed node, examine the restart count of the topolvm-node pods. An increased restart count indicates potential problems with the underlying node, which might require further investigation and troubleshooting.
Examine the restart count of the topolvm-node pod instances by running the following command:
$ oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
lvms-operator-7b9fb858cb-6nsml 3/3 Running 0 70m
topolvm-controller-5dd9cf78b5-7wwr2 5/5 Running 0 66m
topolvm-node-dr26h 4/4 Running 0 66m
topolvm-node-54as8 4/4 Running 0 66m
topolvm-node-78fft 4/4 Running 17 (8s ago) 66m
vg-manager-r6zdv 1/1 Running 0 66m
vg-manager-990ut 1/1 Running 0 66m
vg-manager-an118 1/1 Running 0 66m
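To spot the affected pod in a longer list, you can filter for topolvm-node pods with a non-zero restart count. A minimal sketch, assuming the default oc get pods layout shown above; restarted_topolvm_nodes is a hypothetical function name.

```shell
# Hypothetical helper: print "<pod> <restarts>" for each topolvm-node pod with
# a non-zero RESTARTS value. A suffix such as "(8s ago)" is split by awk into
# separate fields, so field 4 holds the restart count itself.
restarted_topolvm_nodes() {
  awk 'NR > 1 && $1 ~ /^topolvm-node-/ && $4 + 0 > 0 { print $1, $4 }'
}
```

Example usage:
$ oc get pods -n openshift-storage | restarted_topolvm_nodes
The NODE column of oc get pod <pod_name> -n openshift-storage -o wide then identifies the node that hosts the restarting pod.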
After you resolve any issues with the node, you might need to perform the forced cleanup procedure if the PVC is still stuck in a Pending state.
If you see a failure message while inspecting the events associated with the persistent volume claim (PVC), there might be a problem with the underlying volume or disk. Disk and volume provisioning issues often result in a generic error first, such as Failed to provision volume with StorageClass <storage_class_name>. A second, more specific error message usually follows.
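Because the generic message usually comes first, it can help to hide it and keep only the more specific warnings. A minimal sketch, assuming the generic message contains the text "failed to provision volume with StorageClass" (compared case-insensitively); pvc_specific_warnings is a hypothetical function name.

```shell
# Hypothetical helper: print Warning events from `oc describe pvc` output on
# standard input, skipping the generic "failed to provision volume with
# StorageClass" message so that the more specific follow-up error stands out.
pvc_specific_warnings() {
  awk '$1 == "Warning" && tolower($0) !~ /failed to provision volume with storageclass/ {
    sub(/^[ \t]+/, "")   # drop the indentation that oc describe adds
    print
  }'
}
```

Example usage:
$ oc describe pvc <pvc_name> | pvc_specific_warnings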
Inspect the events associated with a PVC by running the following command:
$ oc describe pvc <pvc_name> (1)
1 | Replace <pvc_name> with the name of the PVC. |
Here are some examples of disk or volume failure error messages and their causes:
Establish a direct connection to the host where the problem is occurring, for example by starting a debug session on the node with oc debug node/<node_name>.
Resolve the disk issue.
After you have resolved the issue with the disk, you might need to perform the forced cleanup procedure if failure messages persist or recur.
If disk- or node-related problems persist after you complete the troubleshooting procedures, it might be necessary to perform a forced cleanup procedure. A forced cleanup removes the remaining LVMS custom resources, clearing persistent issues so that LVMS can function properly.
Before you start a forced cleanup, make sure that:
All of the persistent volume claims (PVCs) created using the logical volume manager storage (LVMS) driver have been removed.
The pods using those PVCs have been stopped.
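You can check the first prerequisite by listing PVCs whose storage class looks like an LVMS one. A minimal sketch, assuming the LVMS storage classes keep the default lvms- prefix (as in lvms-vg1); pvcs_on_lvms is a hypothetical function name, and the oc command in the comment uses the standard custom-columns output format.

```shell
# Hypothetical helper: print namespace/name of every PVC whose storage class
# starts with "lvms-". Expects three columns on standard input, as produced by:
#   oc get pvc -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,SC:.spec.storageClassName
pvcs_on_lvms() {
  awk '$3 ~ /^lvms-/ { print $1 "/" $2 }'
}
```

Example usage:
$ oc get pvc -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,SC:.spec.storageClassName | pvcs_on_lvms
Empty output means that no remaining PVC references an lvms- storage class.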
Switch to the openshift-storage namespace by running the following command:
$ oc project openshift-storage
Ensure there is no LogicalVolume custom resource (CR) remaining by running the following command:
$ oc get logicalvolume
No resources found
If there are any LogicalVolume CRs remaining, remove their finalizers by running the following command:
$ oc patch logicalvolume <name> -p '{"metadata":{"finalizers":[]}}' --type=merge (1)
1 | Replace <name> with the name of the CR. |
After removing their finalizers, delete the CRs by running the following command:
$ oc delete logicalvolume <name> (1)
1 | Replace <name> with the name of the CR. |
Make sure there are no LVMVolumeGroup CRs left by running the following command:
$ oc get lvmvolumegroup
No resources found
If there are any LVMVolumeGroup CRs left, remove their finalizers by running the following command:
$ oc patch lvmvolumegroup <name> -p '{"metadata":{"finalizers":[]}}' --type=merge (1)
1 | Replace <name> with the name of the CR. |
After removing their finalizers, delete the CRs by running the following command:
$ oc delete lvmvolumegroup <name> (1)
1 | Replace <name> with the name of the CR. |
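When several LogicalVolume or LVMVolumeGroup CRs remain, the patch and delete steps above can be applied in a loop. A minimal sketch that runs against the current project (openshift-storage after the oc project step); force_remove is a hypothetical function name.

```shell
# Hypothetical helper: clear the finalizers of, then delete, every remaining CR
# of the kind given in $1 ("logicalvolume" or "lvmvolumegroup").
force_remove() {
  # "-o name" yields "<kind>.<group>/<name>", which oc patch and oc delete accept.
  for obj in $(oc get "$1" -o name); do
    # An empty finalizers list lets the pending deletion complete.
    oc patch "$obj" -p '{"metadata":{"finalizers":[]}}' --type=merge
    oc delete "$obj"
  done
}
```

Example usage:
$ force_remove logicalvolume
$ force_remove lvmvolumegroup
The sketch clears finalizers before deleting each CR, in the same order as the manual steps.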
Remove any LVMVolumeGroupNodeStatus CRs by running the following command:
$ oc delete lvmvolumegroupnodestatus --all
Remove the LVMCluster CR by running the following command:
$ oc delete lvmcluster --all
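To confirm that the cleanup is complete, you can check that none of the LVMS CR kinds used above still has objects. A minimal sketch against the current project; verify_lvms_cleanup is a hypothetical function name. It prints any remaining objects and returns a non-zero status if something is left.

```shell
# Hypothetical check: list LVMS CRs that still exist after the forced cleanup.
# Returns 0 when nothing is left, 1 otherwise.
verify_lvms_cleanup() {
  rc=0
  for kind in logicalvolume lvmvolumegroup lvmvolumegroupnodestatus lvmcluster; do
    # Suppress the "No resources found" notice that oc writes to stderr.
    left=$(oc get "$kind" -o name 2>/dev/null)
    if [ -n "$left" ]; then
      echo "$left"
      rc=1
    fi
  done
  return "$rc"
}
```

Example usage:
$ verify_lvms_cleanup && echo "LVMS CRs removed"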