Description of problem (please be as detailed as possible and provide log snippets):

The active hub was located at a neutral site.

Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-11-06-203803
advanced-cluster-management.v2.9.0-204
ACM 2.9.0-DOWNSTREAM-2023-11-03-14-27-40
Submariner brew.registry.redhat.io/rh-osbs/iib:615928
ODF 4.14.0-161
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Latency 50ms RTT

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On an RDR setup configured for hub recovery, deploy multiple workloads of both appset and subscription types backed by rbd and cephfs. Fail over (with nodes up) some of them to C2 and then back to C1. Relocate some of them to C2 and back to C1.
2. Leave a few workloads on C1 in the Deployed state (both types). Also deploy a few rbd and cephfs workloads on C2 and leave them in the Deployed state.
3. After each failover/relocate completes, ensure that the progression reports Completed for all workloads and that a new backup capturing their final state is taken on both the active and passive hub clusters.
4. Run all pre-checks, such as sync status, volumereplicationclass, ceph health, mirror status, lastGroupSyncTime, managedclusters -o wide status, alerts, odf pods, etc.
5. Collect drpc -o wide output from the active hub and then bring the active hub down.
6. Restore the backup on the passive hub and ensure both managed clusters are successfully imported.
7. Wait for the DRPolicy to be validated.
8. Check drpc -o wide on the passive hub and compare it with the output taken from the active hub (see the command sketch below the drpc outputs).

Actual results:

Out of all the workloads, one of the appset-based cephfs workloads, which was in the FailedOver state on the active hub, changed its progression from Completed to Cleaning Up on the passive hub.

From active hub-

openshift-gitops   appset-cephfs1-placement-drpc   3h3m   amagrawa-m1-7nov   amagrawa-m1-7nov   Failover   FailedOver   Completed   2023-11-07T18:38:48Z   2m52.538302007s   True

From passive hub-

openshift-gitops   appset-cephfs1-placement-drpc   21m    amagrawa-m1-7nov   amagrawa-m1-7nov   Failover   FailedOver   Cleaning Up

It is running in NS busybox-workloads-3.
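For reference, the comparison in steps 5 and 8 can be captured with commands along these lines (a minimal sketch; the kubeconfig paths and file names are placeholders, not taken from this setup):

# Step 5: on the active hub, before bringing it down
oc --kubeconfig ~/active-hub.kubeconfig get drpc -o wide -A > drpc-active-hub.txt

# Step 8: on the passive hub, after restore and DRPolicy validation
oc --kubeconfig ~/passive-hub.kubeconfig get drpc -o wide -A > drpc-passive-hub.txt

# Apart from AGE and the timing columns, the two outputs are expected to match
diff drpc-active-hub.txt drpc-passive-hub.txt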
C1 (amagrawa-m1-7nov)-

amagrawa:~$ busybox-3
Already on project "busybox-workloads-3" on server "https://api.amagrawa-m1-7nov.qe.rh-ocs.com:6443".

NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE    VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1    Bound    pvc-ee53ddcf-fbd7-495f-aa13-c23a31e61203   94Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-10   Bound    pvc-3ee14172-ee31-4c9f-9610-c0a858fdd427   87Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-11   Bound    pvc-8f155598-856c-4f4a-bd15-e3c0e4500d98   33Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-12   Bound    pvc-76d04364-6344-4cc6-b98e-5aad949421d1   147Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-13   Bound    pvc-915b7392-2852-4a1c-89f6-8a1b1a54f752   77Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-14   Bound    pvc-6e84ce0b-52f8-4d5d-ba64-b344783a9e70   70Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-15   Bound    pvc-c258d8ec-8bd4-4fa0-8234-d82186bd4cad   131Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-16   Bound    pvc-76f25ddc-d28d-400d-a9b3-72f0d6e4bc25   127Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-17   Bound    pvc-1b689b65-d21f-4f15-b826-a0f00fe5f05e   58Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-18   Bound    pvc-0f3b765f-2981-413d-95ef-6a956e9c5bb3   123Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-19   Bound    pvc-18bb8470-e1ab-4135-9586-fade1d01acfa   61Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-2    Bound    pvc-94a731b0-7460-453c-bb7e-3967f1d2f745   44Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-20   Bound    pvc-6acd5f0a-8c29-441b-8172-04b7b69497ed   33Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-3    Bound    pvc-ad829d43-edd6-40c0-9938-fc7c1ada3c00   76Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-4    Bound    pvc-ffcd0520-3a0b-4040-ba64-31b06279619e   144Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-5    Bound    pvc-abc844ef-4d0b-4a73-bb13-abdc5df2e172   107Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-6    Bound    pvc-81022044-f1cc-4dd0-a400-8f028b50970a   123Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-7    Bound    pvc-c7a1e894-6340-4afc-a912-18f967d27999   90Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-8    Bound    pvc-361b194d-2606-4871-b12a-3557d6e02da9   91Gi       RWX            ocs-storagecluster-cephfs   4h6m   Filesystem
persistentvolumeclaim/busybox-pvc-9    Bound    pvc-1b533598-db26-429e-a183-2cb9b7239edd   111Gi      RWX            ocs-storagecluster-cephfs   4h6m   Filesystem

NAME                                                                        DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/appset-cephfs1-placement-drpc   secondary      Secondary

NAME                          READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod/busybox-1-7f7bf8c5d9-skq4d    1/1   Running   0          81m   10.131.3.17    compute-5   <none>           <none>
pod/busybox-10-7b7bddddf8-pnfkj   1/1   Running   0          81m   10.131.1.93    compute-1   <none>           <none>
pod/busybox-11-6c4cf4bfb-k2rv2    1/1   Running   0          81m   10.131.3.15    compute-5   <none>           <none>
pod/busybox-12-7968f7d4bb-wtszw   1/1   Running   0          81m   10.129.2.206   compute-2   <none>           <none>
pod/busybox-13-674b97b564-ddwfn   1/1   Running   0          81m   10.129.2.205   compute-2   <none>           <none>
pod/busybox-14-f59899658-hhnh7    1/1   Running   0          81m   10.131.3.16    compute-5   <none>           <none>
pod/busybox-15-867dd79cbd-dr4t2   1/1   Running   0          81m   10.131.1.96    compute-1   <none>           <none>
pod/busybox-16-866d576d54-x4pxw   1/1   Running   0          81m   10.131.1.95    compute-1   <none>           <none>
pod/busybox-17-8d7df8b76-b5dkr    1/1   Running   0          81m   10.128.4.214   compute-3   <none>           <none>
pod/busybox-18-75cdf6f4c4-9t2rd   1/1   Running   0          81m   10.130.2.246   compute-4   <none>           <none>
pod/busybox-19-6bcbc84d68-2fmdf   1/1   Running   0          81m   10.129.2.204   compute-2   <none>           <none>
pod/busybox-2-5cffb67686-cv7bz    1/1   Running   0          81m   10.130.2.244   compute-4   <none>           <none>
pod/busybox-20-fdbd78dbd-bfv4b    1/1   Running   0          81m   10.128.4.212   compute-3   <none>           <none>
pod/busybox-3-7ffc7c8fbb-5krv6    1/1   Running   0          81m   10.131.1.94    compute-1   <none>           <none>
pod/busybox-4-66688c494b-zdwsz    1/1   Running   0          81m   10.129.2.203   compute-2   <none>           <none>
pod/busybox-5-56978ff94-lr4sb     1/1   Running   0          81m   10.131.3.14    compute-5   <none>           <none>
pod/busybox-6-57544b458b-5zntb    1/1   Running   0          81m   10.128.4.213   compute-3   <none>           <none>
pod/busybox-7-77ff998b8b-mj66t    1/1   Running   0          81m   10.130.2.245   compute-4   <none>           <none>
pod/busybox-8-6d5cdc5678-9ljxx    1/1   Running   0          81m   10.129.2.202   compute-2   <none>           <none>
pod/busybox-9-79c789995d-zs66w    1/1   Running   0          81m   10.129.2.207   compute-2   <none>           <none>

C2 (amagrawa-m2-7nov)-

amagrawa:~$ busybox-3
Already on project "busybox-workloads-3" on server "https://api.amagrawa-m2-7nov.qe.rh-ocs.com:6443".

NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE    VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1    Bound    pvc-1675b311-82f1-4d64-907c-71e63bae43d6   94Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-10   Bound    pvc-263924c4-89a6-49ca-a128-2fc44294e738   87Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-11   Bound    pvc-7fecdbf9-cd7e-47d0-8085-dca4b0391969   33Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-12   Bound    pvc-0c7f9f9b-5796-4b0f-9a11-7d885105b856   147Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-13   Bound    pvc-bd4ee9eb-07be-4ee1-a767-cd16284ef9a0   77Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-14   Bound    pvc-7e52d0be-7110-4535-8fba-3354d1f201ea   70Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-15   Bound    pvc-e9eb54ab-2736-4e71-8f39-d7ee3fc0d5b3   131Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-16   Bound    pvc-d6242a28-a9c5-4b69-8b97-82844b925b11   127Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-17   Bound    pvc-d9eebab7-45d2-4fea-bce4-944b5ea62286   58Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-18   Bound    pvc-02ff5b23-65f6-4858-a5f4-89eefe996228   123Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-19   Bound    pvc-cdcbb948-1cbe-4450-a316-7f58049c0847   61Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-2    Bound    pvc-9a2e8fe2-b9fa-4302-875d-892f8e40f005   44Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-20   Bound    pvc-4cb1565c-ae78-4149-bff9-635a9cb5e7b3   33Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-3    Bound    pvc-80b59ec4-7538-44cb-ba0d-23cef425d3f6   76Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-4    Bound    pvc-da4b7af5-f8e4-4570-bd8f-b846eba8ca37   144Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-5    Bound    pvc-9cc23ce8-9972-4af0-8567-edf5c23380b4   107Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-6    Bound    pvc-48e1e666-a0fb-4e05-afec-284341518040   123Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-7    Bound    pvc-7ba675ed-620d-40e8-b647-4ced9705c03f   90Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-8    Bound    pvc-aa7c324b-34dd-4ce7-a98b-027333166369   91Gi       RWX            ocs-storagecluster-cephfs   4h3m   Filesystem
persistentvolumeclaim/busybox-pvc-9    Bound    pvc-71d43337-a656-4747-b7e3-2dfbd6a99286   111Gi      RWX            ocs-storagecluster-cephfs   4h3m   Filesystem

NAME                                                                        DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/appset-cephfs1-placement-drpc   secondary      Secondary

NAME                                         READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
pod/volsync-rsync-tls-dst-busybox-pvc-1-nd7fg    1/1   Running   0          4m55s   10.128.2.131   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-10-j8p5j   1/1   Running   0          5m4s    10.131.1.11    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-11-kw5n7   1/1   Running   0          4m45s   10.128.2.136   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-12-dm4bh   1/1   Running   0          5m4s    10.131.1.10    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-13-69fxm   1/1   Running   0          4m54s   10.131.1.12    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-14-6cr48   1/1   Running   0          5m1s    10.128.2.130   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-15-bwhp7   1/1   Running   0          5m10s   10.131.1.9     compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-16-pxmc8   1/1   Running   0          4m52s   10.128.2.134   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-17-7gq8d   1/1   Running   0          4m43s   10.131.1.14    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-18-6w6rv   1/1   Running   0          4m43s   10.128.2.137   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-19-hnjv4   1/1   Running   0          5m16s   10.128.2.127   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-2-gx2qs    1/1   Running   0          5m1s    10.128.2.129   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-20-76tz4   1/1   Running   0          4m46s   10.128.2.135   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-3-7wlwx    1/1   Running   0          4m40s   10.131.1.15    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-4-2fq79    1/1   Running   0          4m54s   10.128.2.132   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-5-m8b9z    1/1   Running   0          5m10s   10.128.2.128   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-6-5m5dr    1/1   Running   0          5m13s   10.131.1.8     compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-7-wzqjk    1/1   Running   0          4m53s   10.128.2.133   compute-1   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-8-rx4sz    1/1   Running   0          4m33s   10.131.1.16    compute-2   <none>           <none>
pod/volsync-rsync-tls-dst-busybox-pvc-9-8hmsz    1/1   Running   0          4m52s   10.131.1.13    compute-2   <none>           <none>

The VRG should be primary on C1; however, the VRG on both sides is marked as secondary. Looking at the pods, src pods are being created on C1 and dst pods on C2, which is fine. Because the workload stays in the Cleaning Up state, no further failover/relocate can be performed, and data sync stops for this workload.

Expected results:
The workload should be reconciled to the right state post hub recovery.

Additional info:
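For triage, the state mismatch described above can be checked with commands like the following (a sketch; the c1/c2 context names are placeholders for the managed-cluster kubeconfig contexts):

# On each managed cluster: both currently report secondary/Secondary, while C1 is expected to be primary
oc --context c1 get volumereplicationgroup -n busybox-workloads-3
oc --context c2 get volumereplicationgroup -n busybox-workloads-3

# On the passive hub: the PROGRESSION column stays at Cleaning Up instead of Completed
oc get drpc appset-cephfs1-placement-drpc -n openshift-gitops -o wide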
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383