Description of problem (please be as detailed as possible and provide log snippets):

The NooBaa DB pod is stuck in the Init state with a Multi-Attach error for its volume after being rescheduled to a new worker node on IBM Power. While verifying the epic "Fast recovery for NooBaa core and DB pods in case of node failure" (https://issues.redhat.com/browse/RHSTOR-3355), the worker node where the noobaa-db-pg-0 pod is scheduled was shut down; after being rescheduled to another healthy worker node, the pod is stuck in the Init state with the error message below.

  Warning  FailedAttachVolume  115s  attachdetach-controller  Multi-Attach error for volume "pvc-b95e5212-39b0-40f8-9a6f-c642632ca966" Volume is already exclusively attached to one node and can't be attached to another

Note: the noobaa-core pod is successfully rescheduled to another worker node and is working fine.

Version of all relevant components (if applicable):

[root@nara1-nba-odf-c1f3-sao01-bastion-0 ~]# oc get clusterversion
NAME      VERSION                                      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-ppc64le-2023-05-02-182828   True        False         27h     Cluster version is 4.13.0-0.nightly-ppc64le-2023-05-02-182828

[root@nara1-nba-odf-c1f3-sao01-bastion-0 ~]# oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels:       full_version=4.13.0-207
    f:full_version:
[root@nara1-nba-odf-c1f3-sao01-bastion-0 ~]#

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Not able to verify the epic https://issues.redhat.com/browse/RHSTOR-3355.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Shut down the worker node where the noobaa-db pod is running.
2. The pod is rescheduled to another worker node but gets stuck in the Init state.

Actual results:
On node failure, the noobaa-db-pg-0 pod is rescheduled to another worker node but gets stuck in the Init state.

Expected results:
On node failure, the noobaa-db-pg-0 pod should be successfully rescheduled to another worker node and run.

Additional info:
We need to know how to reschedule the noobaa-db-pg-0 pod in order to verify the epic https://issues.redhat.com/browse/RHSTOR-3355.
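For diagnosis, the stale attachment can be confirmed from the cluster's VolumeAttachment objects. A minimal sketch (the PVC name is taken from the event above; adjust names for your cluster):

  # List attachments; the NODE column for the PV backing the stuck PVC
  # should still show the worker node that was shut down.
  oc get volumeattachment
  oc get volumeattachment | grep pvc-b95e5212-39b0-40f8-9a6f-c642632ca966

  # The same Multi-Attach warning is visible in the pod's events.
  oc describe pod noobaa-db-pg-0 -n openshift-storage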
This is a known bug (refer to https://bugzilla.redhat.com/show_bug.cgi?id=1795372#c21), and https://issues.redhat.com/browse/RHSTOR-2500 is aimed at reducing the time taken for the remount. The https://issues.redhat.com/browse/RHSTOR-3355 epic which you are trying to validate is about validating `Fast recovery for NooBaa core and DB pods in case of node failure` in 4.13 compared to 4.12; it does not mean there will be no delay in the re-spin at all. Please read the comments in the epic https://issues.redhat.com/browse/RHSTOR-3355 and the story https://issues.redhat.com/browse/RHSTOR-3972, and contact the QE who validated it on the AWS setup for more information.

*** This bug has been marked as a duplicate of bug 1795372 ***
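Additional note for anyone re-validating the epic: much of the delay comes from the Kubernetes attach/detach controller, which waits roughly six minutes (its maxWaitForUnmountDuration) before force-detaching a volume from a node that has stopped reporting in; only after that can the replacement node attach the volume. A rough way to time the recovery window while reproducing (a sketch; the namespace and pod name are the ones from this report, and each watch should run in its own terminal):

  # Watch the pod from node shutdown until it reaches Running again.
  oc get pod noobaa-db-pg-0 -n openshift-storage -w

  # Watch the pod's events; the FailedAttachVolume warnings should stop
  # once the stale attachment is force-detached.
  oc get events -n openshift-storage \
      --field-selector involvedObject.name=noobaa-db-pg-0 -w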