Bug 1845666
Summary: | Application pods using RBD/RWO PVC are stuck waiting for PV lock release when the node it was running on is going down | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | svolkov |
Component: | csi-driver | Assignee: | Rakshith <rar> |
Status: | CLOSED MIGRATED | QA Contact: | Elad <ebenahar> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.3 | CC: | mbukatov, mrajanna, muagarwa, ndevos, ocs-bugs, odf-bz-bot, rar, tdesala, ypadia |
Target Milestone: | --- | Keywords: | AutomationBackLog, FutureFeature |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-04-05 09:54:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1795372 | ||
Bug Blocks: | 1948728 |
Description
svolkov
2020-06-09 19:12:00 UTC
This may be a dup of 1795372, but I'd recommend we leave it open to track the application impact of this issue and raise visibility. More discussion is here: https://github.com/ceph/ceph-csi/issues/578#issuecomment-583501921

This has nothing to do with OCS though - it happens with other providers as well?

As I wrote, this doesn't happen in Portworx. They figured out a way around this problem. I've also checked with someone I know in MayaData, and they also have a way around this that makes the lock disappear in a matter of seconds. Regardless, waiting on k8s to solve this (which might be never) is (IMHO) the wrong approach, as OCS (and OCP) customers are facing this problem today. It will delay the move of their stateful applications to k8s, or the customer will choose the application method for HA (meaning an application-specific solution like Crunchy, Percona or Mongo) and bypass the need to buy OCS.

Making this depend on the "solution bug" #1795372. That one is targeted to 4.6 ==> moving this one to 4.6 as well.

(In reply to Michael Adam from comment #7)
> Making this depend on the "solution bug" #1795372.
> That one is targeted to 4.6 ==> moving this one to 4.6 as well

Sure. We have to reconsider the possible workarounds/solutions though. Till then, let's keep it on the OCS 4.6 target.

*** Bug 1841611 has been marked as a duplicate of this bug. ***

https://bugzilla.redhat.com/show_bug.cgi?id=1795372 is targeted for 4.7.

Similar to bug 1795372 (should this really not be closed as a duplicate?), moving out of ocs-4.7.

A few of the CSI spec changes happening/proposed could help us, I believe: https://github.com/container-storage-interface/spec/pull/477

The "Non graceful node shutdown" feature is beta in upstream Kubernetes v1.26, so OCP 4.13 will most likely be enabling this feature. Detecting the node's (not ready) state and triggering the failover is still a process outside of this feature, though.
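
For illustration only: in upstream Kubernetes, the non-graceful node shutdown feature is driven by a `node.kubernetes.io/out-of-service` taint that an administrator or some external automation applies to the failed node. The sketch below only builds the corresponding `kubectl taint` command line; it does not detect node failure, which matches the point above that detection stays outside the feature. The helper name `out_of_service_taint_cmd` and the node name `worker-1` are hypothetical, and the taint value `nodeshutdown` follows the upstream documentation example rather than anything stated in this bug.

```python
from typing import List

# Taint key used by the upstream non-graceful node shutdown feature.
OUT_OF_SERVICE_TAINT_KEY = "node.kubernetes.io/out-of-service"


def out_of_service_taint_cmd(node_name: str, remove: bool = False) -> List[str]:
    """Build the kubectl command that adds (or removes) the out-of-service taint.

    Hypothetical helper: it only assembles the argument list. Deciding that a
    node is really down, and running the command, is left to the operator.
    """
    taint = f"{OUT_OF_SERVICE_TAINT_KEY}=nodeshutdown:NoExecute"
    if remove:
        # A trailing "-" tells kubectl to remove the taint instead of adding it.
        taint += "-"
    return ["kubectl", "taint", "nodes", node_name, taint]


if __name__ == "__main__":
    # "worker-1" is a placeholder node name.
    print(" ".join(out_of_service_taint_cmd("worker-1")))
    print(" ".join(out_of_service_taint_cmd("worker-1", remove=True)))
```

The taint should only be applied once the node is confirmed to be shut down or powered off, and removed again after the pods have moved and the node has recovered.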