Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable):
ODF 4.14.0-132.stable
OCP 4.14.0-0.nightly-2023-09-02-132842
ACM 2.9.0-DOWNSTREAM-2023-08-24-09-30-12
subctl version: v0.16.0
ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On an RDR setup, deploy CephFS-based DR-protected workloads on both the primary and secondary clusters. Do **not** perform any failover/relocate operations on the workloads.
2. Run IOs for a week or so and keep monitoring the pod/PVC status on the primary and secondary managed clusters, lastGroupSyncTime on the hub, etc.

Actual results:
Source pods remain stuck on the primary cluster and sync stops for CephFS workloads.

amagrawa:c2$ busybox-5
Now using project "busybox-workloads-5" on server "https://api.amagrawa-c2.qe.rh-ocs.com:6443".

NAME                                            STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS                    AGE    VOLUMEMODE
persistentvolumeclaim/dd-io-pvc-1               Bound    pvc-63d64bd6-1524-487e-83be-76773c05a906    117Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-2               Bound    pvc-935999f5-ab46-404f-ac81-d713cdcd9d4a    143Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-3               Bound    pvc-628fd825-c9e7-4959-9ded-c8107efee004    134Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-4               Bound    pvc-8f458cd3-5a71-4357-8c1c-eb59af04b68f    106Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-5               Bound    pvc-b52b6e64-4be6-4865-ae74-5524fc398f97    115Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-6               Bound    pvc-e3575099-8433-4f28-9360-e3d7865c23b2    129Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/dd-io-pvc-7               Bound    pvc-77563d35-cb7a-46fd-83cc-f929c52dcdd3    149Gi      RWO            ocs-storagecluster-cephfs       5d3h   Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-1-src   Bound    pvc-601f29dd-6cf3-4c0a-865d-d82398f9e324    117Gi      ROX            ocs-storagecluster-cephfs-vrg   15h    Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-2-src   Bound    pvc-4cfc6648-9b1a-422f-a2db-c2b2ed96146d    143Gi      ROX            ocs-storagecluster-cephfs-vrg   4s     Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-3-src   Bound    pvc-1c736f2e-a7e8-4175-90c9-9dbeb3660952    134Gi      ROX            ocs-storagecluster-cephfs-vrg   3s     Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-4-src   Bound    pvc-a99c1bea-46bb-473b-9525-a294ec075663    106Gi      ROX            ocs-storagecluster-cephfs-vrg   15h    Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-5-src   Bound    pvc-14da1d92-99b4-4e28-a1c8-12ae885684ed    115Gi      ROX            ocs-storagecluster-cephfs-vrg   15h    Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-6-src   Bound    pvc-6ccc5e3c-2815-43a3-a266-14287dfc2d39    129Gi      ROX            ocs-storagecluster-cephfs-vrg   15h    Filesystem
persistentvolumeclaim/volsync-dd-io-pvc-7-src   Bound    pvc-4bbb501e-8543-4292-a53a-d97febfa032c    149Gi      ROX            ocs-storagecluster-cephfs-vrg   15h    Filesystem

NAME                                                                                DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/busybox-workloads-5-placement-1-drpc    primary        Primary

NAME                                          READY   STATUS              RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
pod/dd-io-1-5dbcfccf76-6bnwn                  1/1     Running             0          5d3h    10.131.0.208   compute-0   <none>           <none>
pod/dd-io-2-684fc84b64-jfxzh                  1/1     Running             0          5d3h    10.131.0.209   compute-0   <none>           <none>
pod/dd-io-3-68bf99586d-kznw8                  1/1     Running             0          5d3h    10.129.3.15    compute-1   <none>           <none>
pod/dd-io-4-757c8d8b7b-s5ld2                  1/1     Running             0          5d3h    10.131.0.207   compute-0   <none>           <none>
pod/dd-io-5-74768ccf84-bqk45                  1/1     Running             0          5d3h    10.128.2.136   compute-2   <none>           <none>
pod/dd-io-6-68d5769c76-5wczl                  1/1     Running             0          5d3h    10.129.3.16    compute-1   <none>           <none>
pod/dd-io-7-67d87688b4-ffwmr                  1/1     Running             0          5d3h    10.131.0.206   compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-1-xznbk   0/1     ContainerCreating   0          78s     <none>         compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-2-fkb48   0/1     ContainerCreating   0          5s      <none>         compute-2   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-3-nzj5k   0/1     ContainerCreating   0          4s      <none>         compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-4-2zlkv   1/1     Running             0          28s     10.128.2.224   compute-2   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-4-5fbtg   0/1     Error               0          3m48s   10.128.2.202   compute-2   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-5-rm7l4   1/1     Running             0          85s     10.131.1.79    compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-6-vt4dl   0/1     Error               0          3m17s   10.131.1.64    compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-6-wjh8l   0/1     ContainerCreating   0          4s      <none>         compute-0   <none>           <none>
pod/volsync-rsync-tls-src-dd-io-pvc-7-2dspj   1/1     Running             0          83s     10.131.1.82    compute-0   <none>           <none>

Expected results:
Source pods should reach the Running state on the primary cluster and shouldn't remain stuck. Sync shouldn't stop for CephFS workloads on either of the managed clusters where the workloads are running.

Additional info:
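For step 2 of the reproducer, a minimal monitoring sketch (the busybox-workloads-5 namespace and resource names are taken from this report and will differ per setup; the DRPC status field lastGroupSyncTime is the one referenced above):

# On each managed cluster: workload pods, PVCs and the VRG state
$ oc get pods,pvc,vrg -n busybox-workloads-5

# On the hub: check that lastGroupSyncTime keeps advancing on the DRPC
$ oc get drpc -A -o custom-columns=NAME:.metadata.name,LASTGROUPSYNC:.status.lastGroupSyncTime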
The ACM team has addressed this issue; the fixes are included in submariner-operator-bundle-container-v0.16.0-23 (and later). -> ON_QA
A newer version is available; please use submariner-operator-bundle-container-v0.16.0-25 (and later).
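A quick way to confirm which Submariner operator build is actually deployed on a managed cluster (a sketch; the submariner-operator namespace is the usual default and may differ in your environment):

# Installed Submariner operator CSV in the operator namespace
$ oc get csv -n submariner-operator

# Versions of the deployed Submariner components, as reported by subctl
$ subctl show versions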
VERIFICATION COMMENTS
=====================

Steps to Reproduce:
1. On an RDR setup, deploy CephFS-based DR-protected workloads on both the primary and secondary clusters. Do **not** perform any failover/relocate operations on the workloads.
2. Run IOs for a week or so and keep monitoring the pod/PVC status on the primary and secondary managed clusters, lastGroupSyncTime on the hub, etc.

Actual issue: Source pods remain stuck on the primary cluster and sync stops for CephFS workloads.

With the fix, the above behavior was not observed.

Output on C1
------------
$ oc get replicationsources.volsync.backube
NAME          SOURCE        LAST SYNC              DURATION          NEXT SYNC
dd-io-pvc-1   dd-io-pvc-1   2023-10-30T04:41:28Z   1m28.064760303s   2023-10-30T04:50:00Z
dd-io-pvc-2   dd-io-pvc-2   2023-10-30T04:41:27Z   1m27.962788093s   2023-10-30T04:50:00Z
dd-io-pvc-3   dd-io-pvc-3   2023-10-30T04:41:21Z   1m21.964984125s   2023-10-30T04:50:00Z
dd-io-pvc-4   dd-io-pvc-4   2023-10-30T04:41:24Z   1m24.495460567s   2023-10-30T04:50:00Z
dd-io-pvc-5   dd-io-pvc-5   2023-10-30T04:41:22Z   1m22.898981791s   2023-10-30T04:50:00Z
dd-io-pvc-6   dd-io-pvc-6   2023-10-30T04:41:29Z   1m29.050621317s   2023-10-30T04:50:00Z
dd-io-pvc-7   dd-io-pvc-7   2023-10-30T04:41:30Z   1m30.858916464s   2023-10-30T04:50:00Z

$ pods
NAME                       READY   STATUS    RESTARTS   AGE
dd-io-1-5dbcfccf76-q4twv   1/1     Running   3          4d20h
dd-io-2-684fc84b64-f4ztj   1/1     Running   2          2d17h
dd-io-3-68bf99586d-7czc4   1/1     Running   3          4d20h
dd-io-4-757c8d8b7b-2xgm2   1/1     Running   3          4d20h
dd-io-5-74768ccf84-s9gqr   1/1     Running   3          4d20h
dd-io-6-68d5769c76-qkfvm   1/1     Running   3          4d20h
dd-io-7-67d87688b4-kpnnm   1/1     Running   2          2d17h

$ pvc
NAME          STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS                AGE
dd-io-pvc-1   Bound    pvc-e479369e-8ea3-416c-8f51-1bee7c26b471    117Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-2   Bound    pvc-e17ca78e-e2bf-466e-be8e-d44ba14cc14d    143Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-3   Bound    pvc-f00f84ed-c662-4b91-a781-8cf27912e54f    134Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-4   Bound    pvc-9fc0cce1-18c7-4612-84c4-8cf4b839cc49    106Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-5   Bound    pvc-e7bf1a2d-3d77-4b76-92c5-1bce1854072e    115Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-6   Bound    pvc-9f1c932d-69d0-4f89-a938-717f7beaf516    129Gi      RWO            ocs-storagecluster-cephfs   4d20h
dd-io-pvc-7   Bound    pvc-b07966c2-83bf-489d-9c32-0656f3ea8622    149Gi      RWO            ocs-storagecluster-cephfs   4d20h

$ oc get vrg
NAME                                 DESIREDSTATE   CURRENTSTATE
busybox-1-cephfs-c1-placement-drpc   primary        Primary

On C2
-----
$ oc get replicationdestinations.volsync.backube
NAME          LAST SYNC              DURATION           NEXT SYNC
dd-io-pvc-1   2023-10-30T04:41:32Z   9m43.583270288s
dd-io-pvc-2   2023-10-30T04:41:36Z   9m53.64786527s
dd-io-pvc-3   2023-10-30T04:41:23Z   9m38.1503665s
dd-io-pvc-4   2023-10-30T04:41:29Z   10m14.653941103s
dd-io-pvc-5   2023-10-30T04:41:23Z   9m40.087622682s
dd-io-pvc-6   2023-10-30T04:41:35Z   10m3.789656323s
dd-io-pvc-7   2023-10-30T04:41:39Z   10m29.950678547s

$ pods
NAME                                      READY   STATUS    RESTARTS   AGE
volsync-rsync-tls-dst-dd-io-pvc-1-q6nhw   1/1     Running   0          8m3s
volsync-rsync-tls-dst-dd-io-pvc-2-z8vk2   1/1     Running   0          7m58s
volsync-rsync-tls-dst-dd-io-pvc-3-ssmn6   1/1     Running   0          8m12s
volsync-rsync-tls-dst-dd-io-pvc-4-m5qnn   1/1     Running   0          8m6s
volsync-rsync-tls-dst-dd-io-pvc-5-mk64p   1/1     Running   0          8m12s
volsync-rsync-tls-dst-dd-io-pvc-6-7n4jh   1/1     Running   0          7m59s
volsync-rsync-tls-dst-dd-io-pvc-7-798gl   1/1     Running   0          7m56s

Verified on
-----------
ODF - 4.14.0-150
OCP - 4.14.0-0.nightly-2023-10-17-113123
MCO - 4.14.0-150
Submariner - 0.16.0 (brew.registry.redhat.io/rh-osbs/iib:599799)
ACM - 2.9.0 (2.9.0-DOWNSTREAM-2023-10-03-20-08-35)

Must gather
-----------
C1 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-2239776/c1/
C2 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-2239776/c2/
HUB - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/bz-v/bz-2239776/hub/
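To confirm during verification that syncs keep progressing, the LAST SYNC timestamps shown above can be polled and compared across iterations (a sketch assuming VolSync's status.lastSyncTime field, which backs the LAST SYNC column):

# On C1: last completed sync per ReplicationSource (should keep advancing)
$ oc get replicationsources.volsync.backube -o custom-columns=NAME:.metadata.name,LASTSYNC:.status.lastSyncTime

# On C2: the same check for the ReplicationDestinations
$ oc get replicationdestinations.volsync.backube -o custom-columns=NAME:.metadata.name,LASTSYNC:.status.lastSyncTime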
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832