Bug 1845666

Summary: Application pods using an RBD/RWO PVC are stuck waiting for the PV lock to be released when the node they were running on goes down
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: svolkov
Component: csi-driver
Assignee: Rakshith <rar>
Status: CLOSED MIGRATED
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: medium
Version: 4.3
CC: mbukatov, mrajanna, muagarwa, ndevos, ocs-bugs, odf-bz-bot, rar, tdesala, ypadia
Target Milestone: ---
Keywords: AutomationBackLog, FutureFeature
Target Release: ---
Hardware: All
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-04-05 09:54:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Bug Depends On: 1795372    
Bug Blocks: 1948728    

Description svolkov 2020-06-09 19:12:00 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
The scenario is simple.
An application pod is using our PVC (RBD/RWO). If the node the application pod is running on goes down (shutdown), it takes time until k8s understands the node is down (I think 20s), and then, while the pod is being moved to another location (node), the replacement application pod hangs waiting for k8s to release the lock on the PVC. IIRC k8s will never release the lock and requires manual user intervention.
This of course basically breaks any SLA the customer might have for the application (think of a PostgreSQL pod on a failed node moving to another node and never completing the startup process, since it can't acquire the lock on the PVC the previous pod used).
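
For illustration, this is roughly how the stuck state can be observed from the CLI (a hedged sketch; the pod, namespace and PV names are placeholders):

# Watch the replacement pod stay stuck (typically in ContainerCreating) on the new node
$ oc get pods -n <app-namespace> -o wide -w

# The VolumeAttachment that binds the PV to the dead node usually remains in place
$ oc get volumeattachment | grep <pv-name>

# Pod events normally show that the volume is still considered attached to the old node
$ oc describe pod <new-pod-name> -n <app-namespace>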

This BZ exists so that we can track the issue from the application perspective.
bz 1795372 relates to the solution (I guess).

Version of all relevant components (if applicable):
any OCS 4 version

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
The expected behavior is that the application pod migrates to the new node and the PVC moves with it, thus not blocking any I/O for the application, at least not for an indefinite period of time.

Is there any workaround available to the best of your knowledge?
Force-kill the old application pod.
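
For reference, a minimal sketch of that manual intervention (pod/namespace/attachment names are placeholders; force deletion should only be done once the node is confirmed to be down):

# Force-delete the pod that was running on the failed node so the volume can be released
$ oc delete pod <old-pod-name> -n <app-namespace> --grace-period=0 --force

# If the attachment is still stuck, the corresponding VolumeAttachment object may also have to be removed
$ oc get volumeattachment
$ oc delete volumeattachment <attachment-name>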

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create a pod that needs a PVC from OCS of type block/RBD (a minimal sketch is included after this list).
2. Kill or shut down the *node* the pod is currently running on.
3. Monitor the creation of the replacement pod.
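
A minimal sketch of step 1, assuming the default OCS RBD storage class name ocs-storagecluster-ceph-rbd and using a Deployment so the pod gets rescheduled (all names here are placeholders):

$ oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-test-app
  template:
    metadata:
      labels:
        app: rbd-test-app
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: rbd-test-pvc
EOF

# Shut down the node running the pod, then watch where the replacement pod lands
$ oc get pods -o wide -w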


Actual results:


Expected results:


Additional info:
From a competitive perspective, Portworx has a solution for this (Annette checked it): when you fail a node, the application pod migrates to a new node and gets the PVC attached to it within a matter of seconds.

Comment 2 Travis Nielsen 2020-06-09 19:24:29 UTC
This may be a dup of 1795372, but I'd recommend we leave it open to track the application impact of this issue and raise visibility.
More discussion is here: https://github.com/ceph/ceph-csi/issues/578#issuecomment-583501921

Comment 3 Yaniv Kaul 2020-06-10 12:15:00 UTC
This has nothing to do with OCS though - it happens with other providers as well?

Comment 4 svolkov 2020-06-10 15:00:50 UTC
As I wrote, this doesn't happen with Portworx; they figured out a way around this problem.
I've also checked with someone I know at MayaData, and they also have a way around this, making the lock disappear in a matter of seconds.

Regardless, waiting on k8s to solve this (which might be never) is (IMHO) the wrong approach, as OCS (and OCP) customers are facing this problem today. It will delay the move of their stateful applications to k8s, or customers will choose application-level HA (meaning an application-specific solution like Crunchy, Percona or Mongo) and bypass the need to buy OCS.

Comment 7 Michael Adam 2020-06-26 13:03:26 UTC
Making this depend on the "solution bug" #1795372.
That one is targeted to 4.6 ==> moving this one to 4.6 as well

Comment 8 Humble Chirammal 2020-06-29 11:11:11 UTC
(In reply to Michael Adam from comment #7)
> Making this depend on the "solution bug" #1795372.
> That one is targeted to 4.6 ==> moving this one to 4.6 as well

Sure, we have to reconsider the possible workarounds/solutions though. Until then, let's keep it on the OCS 4.6 target.

Comment 9 Mudit Agarwal 2020-07-12 13:09:56 UTC
*** Bug 1841611 has been marked as a duplicate of this bug. ***

Comment 10 Mudit Agarwal 2020-09-22 11:33:33 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1795372 is targeted for 4.7

Comment 11 Niels de Vos 2021-01-07 15:41:29 UTC
Similar to bug 1795372 (shouldn't this really be closed as a duplicate?), moving out of ocs-4.7.

Comment 12 Humble Chirammal 2021-05-05 06:48:51 UTC
A few of the CSI spec changes happening/being proposed could help us, I believe:
 https://github.com/container-storage-interface/spec/pull/477

Comment 13 Mudit Agarwal 2021-05-25 11:33:18 UTC
Depends on https://bugzilla.redhat.com/show_bug.cgi?id=1795372

Comment 21 Humble Chirammal 2023-01-23 12:27:03 UTC
"Non graceful node shutdown feature" is beta in v1.26 version of kubernetes upstream, so mostly OCP 4.13 will be enabling this feature. The detection of node (not ready) state or triggering the failover is still a process outside of this mentioned feature though.