Description of problem:

PVs are stuck in the Released state and cannot be reused, causing Pod deployments to fail.

$ omg get pv
NAME               CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS    CLAIM                                      STORAGECLASS    REASON  AGE
local-pv-11f7597f  10Gi      RWO           Delete          Released  cran1/vdupvc                               localfs-lvm-sc          1d
local-pv-1fcd7929  10Gi      RWO           Delete          Bound     cran1/um-pvc                               localfs-lvm-sc          1h51m
local-pv-28f59b45  10Gi      RWO           Delete          Bound     cran1/fm-storage                           localfs-lvm-sc          1h51m
local-pv-2eb404fd  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2rt                          localfs-lvm-sc          1h51m
local-pv-3a7eb9ab  10Gi      RWO           Delete          Bound     cran1/logservice-syslog                    localfs-lvm-sc          1h51m
local-pv-3f52128   120Gi     RWO           Delete          Bound     openshift-image-registry/registry-storage  localfs-lvm-sc          74d
local-pv-49c193d3  10Gi      RWO           Delete          Bound     cran1/hc-pvc                               localfs-lvm-sc          1h51m
local-pv-60fd2dd1  10Gi      RWO           Delete          Bound     cran1/vdupvc-l1hi                          localfs-lvm-sc          1h51m
local-pv-619d4f00  10Gi      RWO           Delete          Bound     cran1/vaultserver-storage-pvc              localfs-lvm-sc          1h51m
local-pv-647573e0  10Gi      RWO           Delete          Released  cran1/certman-tlsconfig-pvc                localfs-lvm-sc          1d
local-pv-6d650a8a  10Gi      RWO           Delete          Bound     cran1/sec-storage-pvc                      localfs-lvm-sc          1h51m
local-pv-7753355f  10Gi      RWO           Delete          Bound     cran1/vdupvc                               localfs-lvm-sc          1h51m
local-pv-82ef5464  10Gi      RWO           Delete          Released  cran1/apigw-pvc                            localfs-lvm-sc          1d
local-pv-a87e2928  10Gi      RWO           Delete          Released  cran1/providercontroller-pvc               localfs-lvm-sc          1d
local-pv-b59c9425  10Gi      RWO           Delete          Bound     cran1/logservice-auditlog                  localfs-lvm-sc          1h51m
local-pv-b90b3bf6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamcm-oamasm-shared-plan      localfs-lvm-sc          1h51m
local-pv-bd9fee95  10Gi      RWO           Delete          Bound     cran1/config                               localfs-lvm-sc          1h51m
local-pv-c023b0e7  10Gi      RWO           Delete          Bound     cran1/security-storage-oamasm-pv-claim     localfs-lvm-sc          1h51m
local-pv-c0e9b3dc  10Gi      RWO           Delete          Released  cran1/cnum-pvc                             localfs-lvm-sc          1d
local-pv-c55625e6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamasm-asmdiag                localfs-lvm-sc          1h51m
local-pv-cb81bdac  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2hi                          localfs-lvm-sc          1h51m
local-pv-d9308846  10Gi      RWO           Delete          Released  cran1/ne3sagent-pvc                        localfs-lvm-sc          1d
local-pv-ef710632  10Gi      RWO           Delete          Bound     cran1/dhcplease-storage                    localfs-lvm-sc          1h51m
local-pv-f60a26bb  10Gi      RWO           Delete          Bound     cran1/result                               localfs-lvm-sc          1h51m

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3h25m  default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind.

Version-Release number of selected component (if applicable):
local-storage-operator.4.10.0-202204090935

How reproducible:
Quite often in the customer's environment

Steps to Reproduce:
1. Delete and redeploy Pods
2.
3.

Actual results:

Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
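For reference, the stuck volumes can be filtered out of the `get pv` output with a short pipeline. This is only a sketch that assumes the column layout shown above (STATUS is the fifth column, CLAIM the sixth); the helper name `released_pvs` is mine, not part of any tool:

```shell
# Filter `oc get pv` (or `omg get pv`) output down to PVs stuck in
# Released, printing the PV name and the claim that once bound it.
# Assumes the column order NAME CAPACITY ACCESS-MODES RECLAIM-POLICY
# STATUS CLAIM ..., i.e. STATUS is awk field 5 and CLAIM is field 6.
released_pvs() {
  awk '$5 == "Released" {print $1, $6}'
}

# Demonstration on two rows copied from the report above:
printf '%s\n' \
  'local-pv-11f7597f 10Gi RWO Delete Released cran1/vdupvc localfs-lvm-sc 1d' \
  'local-pv-1fcd7929 10Gi RWO Delete Bound cran1/um-pvc localfs-lvm-sc 1h51m' |
  released_pvs
# prints: local-pv-11f7597f cran1/vdupvc
```

On a live cluster this would be fed from `oc get pv --no-headers | released_pvs`.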
The sosreport is in https://access.redhat.com/support/cases/#/case/03241839/discussion?attachmentId=a096R00002nx6qAQAQ
The sosreport was not collected exactly when the issue occurred, but shortly after it. In the diskmaker-manager log I see, taking local-pv-d9308846 as an example:

2022-06-13T05:23:37.162712161+00:00 stderr F I0613 05:23:37.162674  594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - "/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17 is apparently in use by the system; will not make a filesystem here!"

$ grep local-pv-d9308846 var/log/pods/openshift-local-storage_diskmaker-manager-ctdvj_9470d37e-e5ef-46a6-b88c-002ee7cd915d/diskmaker-manager/0.log | grep "apparently in use" | wc -l
185

Does this mean that some other pod still mounts the PV, or that the underlying device is busy or held open by another process? The customer created the LVM volumes themselves, and the volumeset uses LVM as the backend devices.

Best Regards,
Chen
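To see how the repeated cleanup failures distribute across PVs (rather than grepping one PV at a time as above), the diskmaker-manager log can be tallied with a small helper. This is a sketch; the helper name `count_in_use_failures` and the demo log lines are mine, modeled on the entry quoted above:

```shell
# Tally "apparently in use" cleanup failures per PV in a diskmaker log,
# most frequent first. Pass the pod log path (e.g. the 0.log quoted in
# the report) as the first argument.
count_in_use_failures() {
  grep 'apparently in use' "$1" |
    grep -o 'local-pv-[0-9a-f]*' |
    sort | uniq -c | sort -rn
}

# Demonstration on two fabricated lines shaped like the log entry above:
log=$(mktemp)
cat > "$log" <<'EOF'
I0613 05:23:37.162674 594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - "/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17 is apparently in use by the system; will not make a filesystem here!"
I0613 05:24:37.100000 594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - "... is apparently in use by the system; will not make a filesystem here!"
EOF
count_in_use_failures "$log"   # one line: count 2 for local-pv-d9308846
rm -f "$log"
```

A PV that dominates this tally (like local-pv-d9308846 with 185 hits) is one whose backing device the deleter has never managed to wipe.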