Bug 2082379

Summary: windows 19 PVC is stuck in Terminating state after test cleanup
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Debarati Basu-Nag <dbasunag>
Component: ceph
Assignee: Scott Ostapovicz <sostapov>
Status: CLOSED CURRENTRELEASE
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.10
CC: akalenyu, alitke, bniver, cnv-qe-bugs, madam, muagarwa, ndevos, ocs-bugs, odf-bz-bot, sostapov, tnielsen
Target Milestone: ---
Flags: tnielsen: needinfo? (dbasunag)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-13 09:20:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1972013    
Bug Blocks:    

Comment 1 Debarati Basu-Nag 2022-05-06 16:29:07 UTC
No workaround exists: force-deleting the pod does clean up the terminating PVC, but it also prevents us from creating new PVCs on the same cluster.
======================================
message: '0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.'
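
For reference, the cleanup/inspection steps look roughly like the following; the namespace and resource names are placeholders, not the exact ones from the test run:

  # List PVCs and check which finalizers are holding the terminating one
  oc get pvc -n <test-namespace>
  oc get pvc <stuck-pvc> -n <test-namespace> -o jsonpath='{.metadata.finalizers}'

  # Force-deleting the consuming pod clears the terminating PVC,
  # but afterwards newly created PVCs stay Pending/unbound
  oc delete pod <consumer-pod> -n <test-namespace> --force --grace-period=0
  oc get events -n <test-namespace> --sort-by=.lastTimestamp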

Comment 2 Alex Kalenyuk 2022-05-08 10:11:09 UTC
Are there any interesting alerts in the UI? maybe something about OCS reaching its threshold?

Comment 3 Debarati Basu-Nag 2022-05-09 13:36:30 UTC
@akalenyu you are right, I do see UI alerts indicating OCS is reaching its threshold. So I would guess this is a side effect of OCS getting into a bad state due to lack of storage?
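
In case it helps, the same fullness state can be checked from the CLI; this assumes the rook-ceph-tools toolbox deployment is running in openshift-storage:

  # Ceph health should show full/nearfull warnings if capacity is the issue
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph status
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph df
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd df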

Comment 4 Alex Kalenyuk 2022-05-09 15:50:12 UTC
Yes.
I am worried about OCS not recovering, though (i.e. successfully deleting the PVC), since this appeared to be fixed in https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html/4.9_release_notes/bug_fixes
(Deletion of data is allowed when the storage cluster is full).

@alitke Should we keep this open on ODF/OCS?

Comment 5 Adam Litke 2022-05-12 17:05:32 UTC
Yes.  I agree that this situation should be investigated by the ODF team.

Comment 7 Yaniv Kaul 2022-05-17 09:53:09 UTC
Any updates?

Comment 8 Scott Ostapovicz 2022-05-17 15:22:04 UTC
@tnielsen any suggestions?

Comment 9 Travis Nielsen 2022-05-17 17:33:04 UTC
Could you get the ODF must-gather? The CSI logs should show more details about why the PV cannot be released. The CSI team will need to take a look at this, thanks.
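
For reference, something along these lines should collect it (the must-gather image tag is an assumption here and should match the installed ODF version):

  oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=odf-must-gather

  # CSI provisioner/plugin logs can also be pulled directly; pod names are placeholders
  oc -n openshift-storage logs <csi-rbdplugin-provisioner-pod> -c csi-rbdplugin
  oc -n openshift-storage logs <csi-rbdplugin-pod> -c csi-rbdplugin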

Comment 10 Niels de Vos 2022-05-23 08:10:43 UTC
It would indeed be good to verify that this is not the same as bug 1978769 (from the ocs-4.9 release notes). However, according to that bug, it will only be completely fixed in ODF-4.11.

It also might be that the VM import process was terminated and there was still outstanding I/O that needed to be flushed. If there is outstanding I/O, unmapping the RBD image is blocked until that I/O has been written to the Ceph cluster, and if there is not sufficient space available on the cluster, the writes will hang until space becomes available.
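
If toolbox access is available, this theory can be checked roughly as follows (pool and image names are placeholders); run these inside the rook-ceph-tools pod:

  # Does the RBD image still have a watcher, i.e. is it still mapped somewhere?
  rbd status <pool>/<image-name>
  # Is the cluster at or past its full ratio, which would stall the pending writes?
  ceph df
  ceph osd dump | grep -i full_ratio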

Moving back to Ceph, as the cluster-full scenario should be addressed by the updates included in rook-ceph and ceph-csi.

Comment 11 Niels de Vos 2022-05-23 08:14:56 UTC
Sorry, I linked the wrong bug in the previous update. Bug 1972013 is the one that follows up on bz#1978769 and is pending release in ODF-4.11.

Comment 12 Mudit Agarwal 2022-06-13 09:20:49 UTC
Closing as CURRENTRELEASE; please reopen if it is reproducible with 4.11.