Bug 2096179 - PVs are stuck in Released and cannot be reused, causing Pod deployment to fail
Summary: PVs are stuck in Released and cannot be reused, causing Pod deployment to fail
Keywords:
Status: CLOSED DUPLICATE of bug 2094865
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-13 08:03 UTC by Chen
Modified: 2022-06-25 00:36 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-21 14:13:15 UTC
Target Upstream Version:
Embargoed:



Description Chen 2022-06-13 08:03:44 UTC
Description of problem:

PVs are stuck in Released and cannot be reused, causing Pod deployment to fail.

$ omg get pv
NAME               CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS    CLAIM                                      STORAGECLASS    REASON  AGE
local-pv-11f7597f  10Gi      RWO           Delete          Released  cran1/vdupvc                               localfs-lvm-sc          1d
local-pv-1fcd7929  10Gi      RWO           Delete          Bound     cran1/um-pvc                               localfs-lvm-sc          1h51m
local-pv-28f59b45  10Gi      RWO           Delete          Bound     cran1/fm-storage                           localfs-lvm-sc          1h51m
local-pv-2eb404fd  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2rt                          localfs-lvm-sc          1h51m
local-pv-3a7eb9ab  10Gi      RWO           Delete          Bound     cran1/logservice-syslog                    localfs-lvm-sc          1h51m
local-pv-3f52128   120Gi     RWO           Delete          Bound     openshift-image-registry/registry-storage  localfs-lvm-sc          74d
local-pv-49c193d3  10Gi      RWO           Delete          Bound     cran1/hc-pvc                               localfs-lvm-sc          1h51m
local-pv-60fd2dd1  10Gi      RWO           Delete          Bound     cran1/vdupvc-l1hi                          localfs-lvm-sc          1h51m
local-pv-619d4f00  10Gi      RWO           Delete          Bound     cran1/vaultserver-storage-pvc              localfs-lvm-sc          1h51m
local-pv-647573e0  10Gi      RWO           Delete          Released  cran1/certman-tlsconfig-pvc                localfs-lvm-sc          1d
local-pv-6d650a8a  10Gi      RWO           Delete          Bound     cran1/sec-storage-pvc                      localfs-lvm-sc          1h51m
local-pv-7753355f  10Gi      RWO           Delete          Bound     cran1/vdupvc                               localfs-lvm-sc          1h51m
local-pv-82ef5464  10Gi      RWO           Delete          Released  cran1/apigw-pvc                            localfs-lvm-sc          1d
local-pv-a87e2928  10Gi      RWO           Delete          Released  cran1/providercontroller-pvc               localfs-lvm-sc          1d
local-pv-b59c9425  10Gi      RWO           Delete          Bound     cran1/logservice-auditlog                  localfs-lvm-sc          1h51m
local-pv-b90b3bf6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamcm-oamasm-shared-plan      localfs-lvm-sc          1h51m
local-pv-bd9fee95  10Gi      RWO           Delete          Bound     cran1/config                               localfs-lvm-sc          1h51m
local-pv-c023b0e7  10Gi      RWO           Delete          Bound     cran1/security-storage-oamasm-pv-claim     localfs-lvm-sc          1h51m
local-pv-c0e9b3dc  10Gi      RWO           Delete          Released  cran1/cnum-pvc                             localfs-lvm-sc          1d
local-pv-c55625e6  10Gi      RWO           Delete          Bound     cran1/vdupvc-oamasm-asmdiag                localfs-lvm-sc          1h51m
local-pv-cb81bdac  10Gi      RWO           Delete          Bound     cran1/vdupvc-l2hi                          localfs-lvm-sc          1h51m
local-pv-d9308846  10Gi      RWO           Delete          Released  cran1/ne3sagent-pvc                        localfs-lvm-sc          1d
local-pv-ef710632  10Gi      RWO           Delete          Bound     cran1/dhcplease-storage                    localfs-lvm-sc          1h51m
local-pv-f60a26bb  10Gi      RWO           Delete          Bound     cran1/result                               localfs-lvm-sc          1h51m

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3h25m  default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind.
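
Not part of the original report, but one way to list the stuck PVs and the stale claim each one still references on a live cluster (assumes jq is available where oc runs):

$ oc get pv -o json | jq -r '.items[] | select(.status.phase=="Released") | [.metadata.name, .spec.claimRef.namespace + "/" + .spec.claimRef.name] | @tsv'

As a generic manual workaround (this bypasses the local-storage deleter's cleanup, so only use it if leftover data on the volume is acceptable), clearing the stale claimRef returns a Released PV to Available:

$ oc patch pv local-pv-d9308846 --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'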

Version-Release number of selected component (if applicable):

local-storage-operator.4.10.0-202204090935

How reproducible:

Quite often in the customer's environment

Steps to Reproduce:
1. Delete and redeploy Pods (a sketch of this step follows the list below)
2.
3.
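
A minimal sketch of the reproduction step above; the names are illustrative and not from the original report (a deployment "vdu" in namespace cran1 whose Pods and PVCs carry the label app=vdu, with its manifest in vdu-deployment.yaml):

$ oc -n cran1 delete pod,pvc -l app=vdu      # deleting the PVC sends its bound local PV to Released
$ oc -n cran1 apply -f vdu-deployment.yaml   # redeploy; the new PVC should bind a fresh Available local PV
$ oc get pv | grep Released                  # PVs that never leave Released reproduce the issue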

Actual results:


Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 3 Chen 2022-06-13 12:41:30 UTC
The sosreport was not collected exactly when the issue occurred, but some time after it.

In the diskmaker-manager log I see the following, taking local-pv-d9308846 as an example:

2022-06-13T05:23:37.162712161+00:00 stderr F I0613 05:23:37.162674  594872 deleter.go:323] Cleanup pv "local-pv-d9308846": StderrBuf - "/mnt/local-storage/localfs-lvm-sc/dm-name-ocp--ls--vg01-pv_vol17 is apparently in use by the system; will not make a filesystem here!"

$ grep local-pv-d9308846 var/log/pods/openshift-local-storage_diskmaker-manager-ctdvj_9470d37e-e5ef-46a6-b88c-002ee7cd915d/diskmaker-manager/0.log | grep "apparently in use" | wc -l
185
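
Not in the original data, but the same log can show every PV the deleter keeps retrying, reusing the log path from the grep above:

$ grep "apparently in use" var/log/pods/openshift-local-storage_diskmaker-manager-ctdvj_9470d37e-e5ef-46a6-b88c-002ee7cd915d/diskmaker-manager/0.log | grep -o 'Cleanup pv "[^"]*"' | sort | uniq -c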

Does this mean that some other pod still has the PV mounted, or that the underlying device is busy or held open by another process? The customer created the LVM logical volumes themselves, and the LocalVolumeSet uses those LVM devices as its backing devices.
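
One way to check this on the node (commands not from the report; <node-name> is a placeholder, and the /dev/mapper path is inferred from the dm-name symlink in the deleter log above):

$ oc debug node/<node-name> -- chroot /host lsblk /dev/mapper/ocp--ls--vg01-pv_vol17            # is the LV still mounted anywhere?
$ oc debug node/<node-name> -- chroot /host dmsetup info -c /dev/mapper/ocp--ls--vg01-pv_vol17  # an "Open" count > 0 means something holds the device
$ oc debug node/<node-name> -- chroot /host lsof /dev/mapper/ocp--ls--vg01-pv_vol17             # which process has the device open, if any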

Best Regards,
Chen

