Bug 2096179

Summary: PVs are stuck in Released and cannot be reused, causing pod deployment to fail
Product: OpenShift Container Platform Reporter: Chen <cchen>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED DUPLICATE Docs Contact:
Severity: medium    
Priority: unspecified CC: jsafrane, pehunt
Version: 4.10   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-06-21 14:13:15 UTC Type: Bug

Description Chen 2022-06-13 08:03:44 UTC
Description of problem:

PVs are stuck in the Released state and cannot be reused, which causes pod deployment to fail.

$ omg get pv
NAME               CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS    CLAIM                                      STORAGECLASS    REASON  AGE
local-pv-11f7597f  10Gi      RWO           Delete          Released  cran1/vdupvc                               localfs-lvm-sc          1d
local-pv-1fcd7929  10Gi      RWO           Delete          Bound     cran1/um-pvc                               localfs-lvm-sc          1h51m
local-pv-28f59b45  10Gi      RWO           Delete          Bound     cran1/fm-storage                           localfs-lvm-sc          1h51m
local-pv-2eb404fd  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2rt                          localfs-lvm-sc          1h51m
local-pv-3a7eb9ab  10Gi      RWO           Delete          Bound     cran1/logservice-syslog                    localfs-lvm-sc          1h51m
local-pv-3f52128   120Gi     RWO           Delete          Bound     openshift-image-registry/registry-storage  localfs-lvm-sc          74d
local-pv-49c193d3  10Gi      RWO           Delete          Bound     cran1/hc-pvc                               localfs-lvm-sc          1h51m
local-pv-60fd2dd1  10Gi      RWO           Delete          Bound     cran1/vdupvc-l1hi                          localfs-lvm-sc          1h51m
local-pv-619d4f00  10Gi      RWO           Delete          Bound     cran1/vaultserver-storage-pvc              localfs-lvm-sc          1h51m
local-pv-647573e0  10Gi      RWO           Delete          Released  cran1/certman-tlsconfig-pvc                localfs-lvm-sc          1d
local-pv-6d650a8a  10Gi      RWO           Delete          Bound     cran1/sec-storage-pvc                      localfs-lvm-sc          1h51m
local-pv-7753355f  10Gi      RWO           Delete          Bound     cran1/vdupvc                               localfs-lvm-sc          1h51m
local-pv-82ef5464  10Gi      RWO           Delete          Released  cran1/apigw-pvc                            localfs-lvm-sc          1d
local-pv-a87e2928  10Gi      RWO           Delete          Released  cran1/providercontroller-pvc               localfs-lvm-sc          1d
local-pv-b59c9425  10Gi      RWO           Delete          Bound     cran1/logservice-auditlog                  localfs-lvm-sc          1h51m
local-pv-b90b3bf6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamcm-oamasm-shared-plan      localfs-lvm-sc          1h51m
local-pv-bd9fee95  10Gi      RWO           Delete          Bound     cran1/config                               localfs-lvm-sc          1h51m
local-pv-c023b0e7  10Gi      RWO           Delete          Bound     cran1/security-storage-oamasm-pv-claim     localfs-lvm-sc          1h51m
local-pv-c0e9b3dc  10Gi      RWO           Delete          Released  cran1/cnum-pvc                             localfs-lvm-sc          1d
local-pv-c55625e6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamasm-asmdiag                localfs-lvm-sc          1h51m
local-pv-cb81bdac  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2hi                          localfs-lvm-sc          1h51m
local-pv-d9308846  10Gi      RWO           Delete          Released  cran1/ne3sagent-pvc                        localfs-lvm-sc          1d
local-pv-ef710632  10Gi      RWO           Delete          Bound     cran1/dhcplease-storage                    localfs-lvm-sc          1h51m
local-pv-f60a26bb  10Gi      RWO           Delete          Bound     cran1/result                               localfs-lvm-sc          1h51m
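
The Released entries can be picked out of a listing like this programmatically. The following is a minimal Python sketch, assuming the plain-text column layout shown above (STATUS as the fifth whitespace-separated field, CLAIM as the sixth). The function name `released_pvs` and the shortened `SAMPLE` text are illustrative only and are not part of `omg` or any other tool.

```python
from typing import List, Tuple


def released_pvs(table_text: str) -> List[Tuple[str, str]]:
    """Return (pv_name, claim) pairs for rows whose STATUS is Released.

    Assumes the column order NAME, CAPACITY, ACCESS MODES, RECLAIM POLICY,
    STATUS, CLAIM, ... with single-token values in each data row.
    """
    released = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, truncated rows, and the header row.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, status, claim = fields[0], fields[4], fields[5]
        if status == "Released":
            released.append((name, claim))
    return released


# Shortened copy of the listing above, used only for illustration.
SAMPLE = """\
NAME               CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS    CLAIM                STORAGECLASS    REASON  AGE
local-pv-11f7597f  10Gi      RWO           Delete          Released  cran1/vdupvc         localfs-lvm-sc          1d
local-pv-1fcd7929  10Gi      RWO           Delete          Bound     cran1/um-pvc         localfs-lvm-sc          1h51m
local-pv-d9308846  10Gi      RWO           Delete          Released  cran1/ne3sagent-pvc  localfs-lvm-sc          1d
"""

if __name__ == "__main__":
    for name, claim in released_pvs(SAMPLE):
        print(name, claim)
```

Applied to the full listing, this would return the six PVs shown as Released, each still carrying the name of its previous claim.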

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3h25m  default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind.

Version-Release number of selected component (if applicable):

local-storage-operator.4.10.0-202204090935

How reproducible:

Quite often in the customer's environment

Steps to Reproduce:
1. Delete and redeploy the pods

Comment 3 Chen 2022-06-13 12:41:30 UTC
The sosreport was collected after the issue occurred, not at the moment it occurred.

In the diskmaker-manager log I see the following, taking local-pv-d9308846 as an example:

2022-06-13T05:23:37.162712161+00:00 stderr F I0613 05:23:37.162674  594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - "/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17 is apparently in use by the system; will not make a filesystem here!"

$ grep local-pv-d9308846 var/log/pods/openshift-local-storage_diskmaker-manager-ctdvj_9470d37e-e5ef-46a6-b88c-002ee7cd915d/diskmaker-manager/0.log | grep "apparently in use" | wc -l
185
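
To see whether the other Released PVs show the same failure, the same count can be taken for every PV in one pass. This is a rough Python sketch, assuming the log line format shown above (`Cleanup pv "<name>"` followed by the "apparently in use" text). The helper name `count_in_use_failures` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches cleanup failures of the form seen in the diskmaker-manager log.
CLEANUP_RE = re.compile(r'Cleanup pv "([^"]+)".*apparently in use')


def count_in_use_failures(lines: Iterable[str]) -> Counter:
    """Count 'apparently in use' cleanup failures per PV name."""
    counts: Counter = Counter()
    for line in lines:
        match = CLEANUP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The log line quoted above, used only for illustration.
SAMPLE_LINE = (
    '2022-06-13T05:23:37.162712161+00:00 stderr F I0613 05:23:37.162674  '
    '594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - '
    '"/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17 is '
    'apparently in use by the system; will not make a filesystem here!"'
)

if __name__ == "__main__":
    for pv_name, count in count_in_use_failures([SAMPLE_LINE]).most_common():
        print(pv_name, count)
```

An open file object for the `0.log` file can be passed in directly, since iterating over it yields one line at a time.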

Does this mean that some other pod still has the PV mounted, or that the underlying device is busy or held open by another process? The customer created LVM volumes, and the volumeset uses them as its backend devices.
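
To check whether the device behind a PV is still mounted, the mount table on the node could be compared against the device path. The Python sketch below assumes the text comes from `/proc/mounts` (device in the first field, mount point in the second) and resolves symlinks on both sides so that a `/mnt/local-storage/...` link and a `/dev/mapper/...` name compare equal when they point at the same device. The function name `mounts_of_device` is hypothetical. It only covers mounts visible in the mount namespace where the table was read, and it does not detect a process that holds the device open without mounting it.

```python
import os
from typing import List


def mounts_of_device(device_path: str, mounts_text: str) -> List[str]:
    """Return the mount points whose source resolves to device_path."""
    target = os.path.realpath(device_path)
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mount_point = fields[0], fields[1]
        # Only path-like sources can refer to a block device; skip
        # pseudo filesystems such as proc, sysfs, or tmpfs.
        if source.startswith("/") and os.path.realpath(source) == target:
            mount_points.append(mount_point)
    return mount_points


if __name__ == "__main__":
    # Device path taken from the log message above.
    device = "/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17"
    with open("/proc/mounts") as mounts_file:
        print(mounts_of_device(device, mounts_file.read()))
```

An empty result would suggest the device is not mounted in that namespace, which would leave the "held open by another process" case to be checked separately.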

Best Regards,
Chen