Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-10-18-004928
advanced-cluster-management.v2.9.0-188
ODF 4.14.0-156
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Submariner image: brew.registry.redhat.io/rh-osbs/iib:599799
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Latency 50ms RTT

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On an RDR setup, deploy multiple rbd- and cephfs-based workloads of both Subscription and ApplicationSet type and run IOs for a few days. In this case, a cephfs workload was deployed on C2 while all other workloads were on C1.
2. Perform hub recovery by bringing the active hub down.
3. Restore the backup on the passive hub; ensure the managed clusters are successfully imported, the DRPolicy gets validated, the DRPCs get created, the managed clusters are healthy, and sync of all the workloads is working fine.
4. Failover the cephfs workload running on C2 to C1 with all nodes of C2 up and running. Then let IOs continue for some more time (a few hours) for all workloads.
5. Now bring the master nodes of the primary cluster down and wait for the cluster status to change to Unknown on the RHACM console.
6. Now perform failover of all workloads running on C1 to the secondary managed cluster C2.

Actual results:
Failover doesn't complete.

Two workloads, busybox-appset-cephfs-placement-drpc and busybox-workloads-1-placement-1-drpc, were failed over to cluster amagrawa-2nd. Failover completed for the cephfs workload and is waiting for cleanup as the older primary remains down; however, failover is broken for the rbd-based workload.

Before failover was triggered, busybox-workloads-1-placement-1-drpc was in Deployed state, whereas busybox-appset-cephfs-placement-drpc had previously been failed over to cluster amagrawa-1st, was in FailedOver state, and had started data sync. Data sync was working fine for both these workloads and resources were healthy.

From C2 (cluster amagrawa-2nd)-

amagrawa:c2$ busybox-1
Already on project "busybox-workloads-1" on server "https://api.amagrawa-2nd.qe.rh-ocs.com:6443".
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1 Bound pvc-de5bd6db-99ed-4765-b28a-7d3b3dee5a04 94Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-1-20231025075057 Bound pvc-fbb56b63-48c5-47eb-b822-ebe1498fb23d 94Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-10 Bound pvc-04b51cfb-c49a-4aff-8e0e-df2a5aef6d3f 87Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-10-20231025075050 Bound pvc-43fba93e-aca6-484c-b4b7-3eeda8949b8f 87Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-11 Bound pvc-58fafdad-ca6f-482e-9e88-6d81bab777f5 33Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-11-20231025075114 Bound pvc-1e189f90-08cd-46dd-b643-0875ae947809 33Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-12 Bound pvc-e04ed345-1a93-4657-aec1-8091063a1314 147Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-12-20231025075116 Bound pvc-58386339-696e-4216-bd19-4212c6a74228 147Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-13 Bound pvc-457644b7-899a-4d63-8b86-7545922db822 77Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-13-20231025075101 Bound pvc-4eeeb8c0-b599-4cc7-9b0b-3c31cdc914b9 77Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-14 Bound pvc-13956812-ff44-483e-b534-83326292eaa7 70Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-14-20231025075101 Bound pvc-a304ab44-8544-47c2-9d3c-7ed4cf2ec886 70Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-15 Bound pvc-2bfe46cc-c2eb-431b-abff-4a716f3e7e61 131Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-15-20231025075115 Bound pvc-91856426-5501-4ee5-ab3b-5a4e908bce3b 131Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-16 Bound pvc-3db7e992-630e-4b70-93e4-3fbc19b9fcb5 127Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-16-20231025075100 Bound pvc-5e58af89-93f5-4130-926e-0e7bdf7d8fd2 127Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-17 Bound pvc-9317c6a0-134f-47d8-b008-444a8cf383d9 58Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-17-20231025075052 Bound pvc-d366e7c8-0ee5-4886-9231-30d43a6fc044 58Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-18 Bound pvc-18b039a7-b7c3-4b15-ac3e-854384d2645e 123Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-18-20231025075103 Bound pvc-91c17f4b-8494-4e0a-a2b6-9a6952037b64 123Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-19 Bound pvc-5d7f6252-4192-438e-80e6-fb0652192fe3 61Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-19-20231025075103 Bound pvc-6e75287b-d1c6-4363-9397-c16e4606e197 61Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-2 Bound pvc-3d26f5ff-575f-4aa4-90bb-1e2a6bc173ca 44Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-2-20231025075048 Bound pvc-ea494ad9-0ea9-4e74-9b8f-5e2306f7e24b 44Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-20 Bound pvc-a7d92d8d-03f3-43c7-9b7d-5bf9267a8924 33Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-20-20231025075100 Bound pvc-17a66af5-29d9-425f-b7d2-e0721c2b6574 33Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-3 Bound pvc-706ed444-9317-47f0-9fc1-552f0441307e 76Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-3-20231025075100 Bound pvc-e3f7b630-3f31-49ef-aa80-ab192d3e9138 76Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-4 Bound pvc-014b7389-79ce-4a09-944d-30dc8eed1abf 144Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-4-20231025075115 Bound pvc-c9b7c15d-dab8-4051-a128-c27c4aaee646 144Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-5 Bound pvc-31e6e2cc-ee8f-4d99-bf8c-7f4182d84581 107Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-5-20231025075052 Bound pvc-a798b1ec-15f3-4307-b584-71b5168080b8 107Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-6 Bound pvc-7564bb79-1482-4c39-951c-2b0be326edfb 123Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-6-20231025075050 Bound pvc-603b1980-a113-4d30-840c-502a64ca6dcb 123Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-7 Bound pvc-ae183bc7-661a-4d68-8ae2-d73704c8acd2 90Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-7-20231025075053 Bound pvc-577519db-edee-4631-921c-f55aecc01974 90Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-8 Bound pvc-6de6feaa-ed4d-4358-a4db-8fa39e8f0335 91Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-8-20231025075103 Bound pvc-e33b7b37-7b9a-431c-8bd6-04ce388281ba 91Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem
persistentvolumeclaim/busybox-pvc-9 Bound pvc-8200c8c6-530a-4b46-aafb-3d78941779ec 111Gi RWO ocs-storagecluster-ceph-rbd 38h Filesystem
persistentvolumeclaim/busybox-pvc-9-20231025075050 Bound pvc-1d1dbea3-ee73-4b8e-9d49-15a63a9563ef 111Gi ROX ocs-storagecluster-ceph-rbd 113m Filesystem

NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/busybox-workloads-1-placement-1-drpc primary Secondary

Pods were not created after failover even after a few hours, and ROX PVCs were created for this workload even though it is backed by rbd, which shouldn't have happened (the older primary cluster remains down).
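For reference, the unexpected ROX PVCs and the stuck VRG state can be cross-checked on the failover cluster with plain oc queries. This is only a sketch: the namespace is the one from the output above, and the VolumeReplication kind name assumes the usual csi-addons CRD installed by ODF.

# Show just name, access modes and storage class, so ROX volumes on an rbd-backed workload stand out
$ oc get pvc -n busybox-workloads-1 -o custom-columns=NAME:.metadata.name,ACCESSMODES:.spec.accessModes,STORAGECLASS:.spec.storageClassName

# Show the VRG desired/current state and whether per-PVC VolumeReplication CRs exist at all
$ oc get volumereplicationgroup,volumereplication -n busybox-workloads-1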
Must-gather logs are kept here- http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/25oct23/

From passive hub-

amagrawa:~$ drpc
NAMESPACE NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
busybox-workloads-1 busybox-workloads-1-placement-1-drpc 40h amagrawa-1st amagrawa-2nd Failover FailingOver WaitingForResourceRestore 2023-10-25T08:06:56Z False
busybox-workloads-3 busybox-sub-cephfs-placement-1-drpc 40h amagrawa-2nd Relocate Relocated EnsuringVolSyncSetup 2023-10-25T07:11:30Z 4m41.188234434s True
openshift-gitops busybox-appset-cephfs-placement-drpc 40h amagrawa-2nd amagrawa-2nd Failover FailedOver Cleaning Up 2023-10-25T08:07:34Z False
openshift-gitops busybox-workloads-2-placement-drpc 40h amagrawa-1st amagrawa-2nd Failover FailedOver WaitForReadiness 2023-10-24T18:42:16Z False

Expected results:
Failover should complete, workloads should be created on the failover cluster, both the VRG DESIREDSTATE and CURRENTSTATE should be marked as primary, and ROX volumes shouldn't be created for rbd-based workloads.

Additional info:
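The drpc listings in this report come from a local shell alias; an equivalent hub-side query would be roughly the following (a sketch; the exact columns shown depend on the installed ACM/Ramen version):

$ oc get drpc -o wide -A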
This is not always reproducible and we have a workaround, as mentioned by Benamar in https://bugzilla.redhat.com/show_bug.cgi?id=2246186#c3
IMO, we should move both these BZs to 4.14.z as this is a corner case and it might require some code restructuring in MCO.
(In reply to Mudit Agarwal from comment #5)
> This is not always reproducible and we have a workaround as mentioned by
> Benamar in https://bugzilla.redhat.com/show_bug.cgi?id=2246186#c3
> IMO, we should move both these BZs to 4.14.z as this is a corner case and it
> might require some code restructuring in MCO

Actually, no. The workaround didn't work as expected, and Benamar knows this.

On reproducibility, I am sure it is reproducible: the workloads were in Deployed state before the active hub went down, and this is just a normal failover scenario that is blocked by this BZ, so it is certainly a hub-recovery blocker BZ.
This issue was hit again with-
OCP 4.14.0-0.nightly-2023-10-30-170011
advanced-cluster-management.v2.9.0-188
ODF 4.14.0-157
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Submariner brew.registry.redhat.io/rh-osbs/iib:607438

Steps:
1. On a hub recovery RDR setup, ensure backups are being created on the active and passive hub clusters. Failover and relocate different workloads so that each is running on the intended primary managed cluster after the failover and relocate operations complete. Ensure the latest backups are taken and no action on any of the workloads (cephfs or rbd, appset or subscription type) is in progress.
2. Collect drpc status. Bring the primary managed cluster down, and then bring the active hub down.
3. Ensure the secondary managed cluster is properly imported on the passive hub and the DRPolicy gets validated.
4. Check the drpc status from the passive hub and compare it with the output taken from the active hub while it was up.

We notice that post hub recovery, a sanity check is run for all the workloads that were failed over or relocated: the same action that was performed from the active hub is performed again on those workloads, which marks PEER READY as false for them.

From active hub-

NAMESPACE NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
busybox-workloads-2 subscription-cephfs-placement-1-drpc 9h amagrawa-31o-prim amagrawa-passivee Relocate Relocated Completed 2023-11-01T17:54:21Z 30.282249722s True
busybox-workloads-5 subscription-rbd1-placement-1-drpc 9h amagrawa-31o-prim amagrawa-31o-prim Failover FailedOver Completed 2023-11-01T13:57:37Z 47m3.364814169s True
busybox-workloads-6 subscription-rbd2-placement-1-drpc 9h amagrawa-31o-prim amagrawa-passivee Relocate Relocated Completed 2023-11-01T14:16:28Z 3h17m50.318760845s True
openshift-gitops appset-cephfs-placement-drpc 9h amagrawa-31o-prim amagrawa-passivee Failover FailedOver Completed 2023-11-01T13:20:45Z 5m59.4021061s True
openshift-gitops appset-rbd1-placement-drpc 9h amagrawa-31o-prim amagrawa-31o-prim Failover FailedOver Completed 2023-11-01T14:15:30Z 41m2.588884417s True
openshift-gitops appset-rbd2-placement-drpc 9h amagrawa-passivee Deployed Completed True

From passive hub-

amagrawa:~$ drpc
NAMESPACE NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
busybox-workloads-2 subscription-cephfs-placement-1-drpc 57m amagrawa-31o-prim amagrawa-passivee Relocate Relocating 2023-11-01T18:59:35Z False
busybox-workloads-5 subscription-rbd1-placement-1-drpc 57m amagrawa-31o-prim amagrawa-31o-prim Failover FailingOver WaitForStorageMaintenanceActivation 2023-11-01T18:59:36Z False
busybox-workloads-6 subscription-rbd2-placement-1-drpc 57m amagrawa-31o-prim amagrawa-passivee Relocate True
openshift-gitops appset-cephfs-placement-drpc 57m amagrawa-31o-prim amagrawa-passivee Failover FailedOver EnsuringVolSyncSetup True
openshift-gitops appset-rbd1-placement-drpc 57m amagrawa-31o-prim amagrawa-31o-prim Failover FailingOver FailingOverToCluster 2023-11-01T18:59:36Z False
openshift-gitops appset-rbd2-placement-drpc 57m amagrawa-passivee Deployed Completed True

Since PEER READY is now marked as false due to the sanity check, subscription-cephfs-placement-1-drpc, subscription-rbd1-placement-1-drpc and appset-rbd1-placement-drpc cannot be failed over in this example.
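The PEER READY flip can also be confirmed from the DRPC conditions directly, which is easier to diff before and after hub recovery than the full table. A sketch, assuming the condition type exposed by Ramen is named PeerReady:

$ oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PeerReady")].status}{"\n"}{end}'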
This sanity check is needed as per the Kubernetes recommended guidelines, and we should not back up the current state of the workloads, as confirmed by @bmekhiss, so the issue will always persist. As of now, the only option is to trigger a failover by editing the drpc yaml (which would be addressed by BZ2247537).

So all these apps were failed over via CLI to the secondary managed cluster, which was available, but the failover didn't succeed for rbd-backed workloads because the volumereplicationclass was not backed up/got deleted. @bmekhiss tried a workaround that created the volumereplicationclass on the available secondary managed cluster. This helped the failover proceed and created the workload pods, but not the VRs for the rbd-backed workloads, so the VRG CURRENTSTATE couldn't be marked as Primary. We need VRs to be created for rbd-backed workloads, so the workaround didn't work as expected. (A sketch of these CLI steps follows the outputs below.)

From passive hub after triggering failover from CLI-

amagrawa:~$ drpc
NAMESPACE NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
busybox-workloads-2 subscription-cephfs-placement-1-drpc 3h21m amagrawa-31o-prim amagrawa-passivee Failover FailingOver WaitingForResourceRestore 2023-11-01T18:59:35Z False
busybox-workloads-5 subscription-rbd1-placement-1-drpc 3h21m amagrawa-31o-prim amagrawa-passivee Failover FailedOver WaitForReadiness 2023-11-01T18:59:36Z True
busybox-workloads-6 subscription-rbd2-placement-1-drpc 3h21m amagrawa-31o-prim amagrawa-passivee Failover FailedOver WaitForReadiness 2023-11-01T20:12:09Z True
openshift-gitops appset-cephfs-placement-drpc 3h21m amagrawa-31o-prim amagrawa-passivee Failover FailedOver EnsuringVolSyncSetup True
openshift-gitops appset-rbd1-placement-drpc 3h21m amagrawa-31o-prim amagrawa-passivee Failover FailedOver WaitForReadiness 2023-11-01T18:59:36Z True
openshift-gitops appset-rbd2-placement-drpc 3h21m amagrawa-passivee Deployed Completed True

From the available secondary managed cluster to which failover was triggered-

amagrawa:~$ busybox-5
Now using project "busybox-workloads-5" on server "https://api.amagrawa-passivee.qe.rh-ocs.com:6443".
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/busybox-pvc-21 Bound pvc-81ff5583-61e1-45fd-a739-0ad850f9d803 43Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem
persistentvolumeclaim/busybox-pvc-22 Bound pvc-b14f6c3b-f1ed-42dd-b658-abaaf3e77a3d 43Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem
persistentvolumeclaim/busybox-pvc-23 Bound pvc-345815af-9b83-4e27-b8fa-6946f638e3c6 52Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem
persistentvolumeclaim/busybox-pvc-24 Bound pvc-3345a8f9-4552-4f2e-80ad-670088e3334a 20Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem
persistentvolumeclaim/busybox-pvc-25 Bound pvc-7088a4bf-5607-4b71-b578-7682ecd6fe24 45Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem

NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/subscription-rbd1-placement-1-drpc primary

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/busybox-21-7d6dfb858-qdkqn 1/1 Running 0 70m 10.129.3.25 compute-2 <none> <none>
pod/busybox-22-6cf5dcc584-b9lwx 1/1 Running 0 70m 10.129.3.26 compute-2 <none> <none>
pod/busybox-23-5bf89b9cc8-g62tl 1/1 Running 0 70m 10.131.0.97 compute-0 <none> <none>
pod/busybox-24-6d5bc476dd-sx9xt 1/1 Running 0 70m 10.129.3.28 compute-2 <none> <none>
pod/busybox-25-84d6dd6dc4-jqth2 1/1 Running 0 70m 10.131.0.98 compute-0 <none> <none>

amagrawa:~$ busybox-6
Now using project "busybox-workloads-6" on server "https://api.amagrawa-passivee.qe.rh-ocs.com:6443".

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/mysql-pv-claim Bound pvc-6ea645c2-b6f8-44d2-9526-9911282aa487 24Gi RWO ocs-storagecluster-ceph-rbd 70m Filesystem

NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/subscription-rbd2-placement-1-drpc primary

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/data-viewer-1-build 0/1 Completed 0 70m 10.129.3.24 compute-2 <none> <none>
pod/data-viewer-775bb7cb4d-zvgt5 1/1 Running 0 69m 10.129.3.29 compute-2 <none> <none>
pod/io-writer-mysql-68475c9785-bxvpp 1/1 Running 0 70m 10.129.3.22 compute-2 <none> <none>
pod/io-writer-mysql-68475c9785-q74zw 1/1 Running 0 70m 10.131.0.96 compute-0 <none> <none>
pod/io-writer-mysql-68475c9785-qgdh7 1/1 Running 0 70m 10.129.3.23 compute-2 <none> <none>
pod/io-writer-mysql-68475c9785-qkhck 1/1 Running 0 70m 10.131.0.95 compute-0 <none> <none>
pod/io-writer-mysql-68475c9785-ttmzv 1/1 Running 0 70m 10.128.3.88 compute-1 <none> <none>
pod/mysql-7c88dd4dff-gsvcr 1/1 Running 1 (69m ago) 70m 10.129.3.27 compute-2 <none> <none>

amagrawa:~$ busybox-3
Now using project "busybox-workloads-3" on server "https://api.amagrawa-passivee.qe.rh-ocs.com:6443".
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/dd-io-pvc-1 Bound pvc-4cb8fad8-cd23-4e25-a6df-e8f00e2583a1 117Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-2 Bound pvc-eef9d77b-d0bf-4b0b-9b67-cf1df477fdfc 143Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-3 Bound pvc-ed60b47a-1724-4685-bf72-2925535114df 134Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-4 Bound pvc-e56afbd0-65d3-4c67-b64d-24a5c301a65d 106Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-5 Bound pvc-4e9e86a1-75d3-463a-ba9e-79abe33512aa 115Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-6 Bound pvc-e541b7b9-36e4-4572-87aa-4276e7267b3e 129Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem
persistentvolumeclaim/dd-io-pvc-7 Bound pvc-075a6bca-0c69-47c9-8e37-9a79a8f10f29 149Gi RWO ocs-storagecluster-ceph-rbd 69m Filesystem

NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/appset-rbd1-placement-drpc primary

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/dd-io-1-854f867867-rcfd5 1/1 Running 0 69m 10.129.3.45 compute-2 <none> <none>
pod/dd-io-2-56679fb667-7bjb7 1/1 Running 0 69m 10.129.3.44 compute-2 <none> <none>
pod/dd-io-3-5757659b99-2th5r 1/1 Running 0 69m 10.131.0.100 compute-0 <none> <none>
pod/dd-io-4-75bd89888c-x9rrv 1/1 Running 0 69m 10.129.3.47 compute-2 <none> <none>
pod/dd-io-5-86c65fd579-8c6m7 1/1 Running 0 69m 10.129.3.46 compute-2 <none> <none>
pod/dd-io-6-fd8994467-rcrkt 1/1 Running 0 69m 10.131.0.102 compute-0 <none> <none>
pod/dd-io-7-685b4f6699-l7lb8 1/1 Running 0 69m 10.131.0.101 compute-0 <none> <none>

Benamar, could you please check why VRs were not created for any of these workloads?

Logs collected before applying the workaround to create the volumereplicationclass- http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/02nov23-1/
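For completeness, the "failover via CLI" step and the volumereplicationclass workaround referenced above were along the following lines. This is a hedged sketch only: the DRPC name and namespace are taken from the outputs above, while the VolumeReplicationClass name, secret references and the 5m scheduling interval are illustrative and should in practice be copied from a surviving peer cluster (or a backup) rather than hand-written.

# Trigger the failover by editing the DRPC spec directly on the hub
$ oc patch drpc subscription-rbd1-placement-1-drpc -n busybox-workloads-5 \
    --type merge -p '{"spec":{"action":"Failover","failoverCluster":"amagrawa-passivee"}}'

# Recreate the missing VolumeReplicationClass on the surviving managed cluster
$ cat <<EOF | oc apply -f -
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: rbd-volumereplicationclass-sample        # illustrative name
spec:
  provisioner: openshift-storage.rbd.csi.ceph.com
  parameters:
    schedulingInterval: "5m"                      # must match the DRPolicy sync interval
    replication.storage.openshift.io/replication-secret-name: rook-csi-rbd-provisioner
    replication.storage.openshift.io/replication-secret-namespace: openshift-storage
EOF

Even with the class recreated, the missing piece reported above is that the VolumeReplication CRs for the rbd PVCs were never created, which is why the VRG CURRENTSTATE above is not reported as Primary.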
Logs are kept here (collected a few hours after triggering failover from CLI)- http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/02nov23-2/
Moving Hub Recovery issues to 4.14.z based on offline discussion
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days