Description of problem (please be as detailed as possible and provide log snippets):

[RDR][CEPHFS] sync for some pvc hangs

Version of all relevant components (if applicable):

OCP version:- 4.14.0-0.nightly-2023-09-15-055234
ODF version:- 4.14.0-135
CEPH version:- ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)
ACM version:- 2.9.0-109
SUBMARINER version:- devel
VOLSYNC version:- volsync-product.v0.7.4
VOLSYNC method:- destinationCopyMethod: LocalDirect

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy RDR cluster
2. Deploy cephfs workload
3. Keep workload running for some days

Actual results:

### from primary cluster
volsync-rsync-tls-src-dd-io-pvc-1-7n8wt         1/1     Running             0          4h12m
volsync-rsync-tls-src-dd-io-pvc-3-7jl95         1/1     Running             0          4h12m
volsync-rsync-tls-src-dd-io-pvc-6-gtbvm         1/1     Running             0          4h12m

### from Secondary pods
NAME                                            READY   STATUS              RESTARTS   AGE
volsync-rsync-tls-dst-dd-io-pvc-1-local-bmqjc   1/1     Running             0          4h13m
volsync-rsync-tls-dst-dd-io-pvc-1-sbfwc         1/1     Running             0          4h14m
volsync-rsync-tls-dst-dd-io-pvc-2-cqhh8         1/1     Running             0          51s
volsync-rsync-tls-dst-dd-io-pvc-2-local-lg9b7   1/1     Running             0          22s
volsync-rsync-tls-dst-dd-io-pvc-3-local-85vxf   1/1     Running             0          4h13m
volsync-rsync-tls-dst-dd-io-pvc-3-tpbjt         1/1     Running             0          4h14m
volsync-rsync-tls-dst-dd-io-pvc-4-local-nftrq   1/1     Running             0          13s
volsync-rsync-tls-dst-dd-io-pvc-4-z26x2         1/1     Running             0          39s
volsync-rsync-tls-dst-dd-io-pvc-5-cxzbl         1/1     Running             0          33s
volsync-rsync-tls-dst-dd-io-pvc-5-local-fm8lv   1/1     Running             0          9m10s
volsync-rsync-tls-dst-dd-io-pvc-6-bhxdj         1/1     Running             0          4h14m
volsync-rsync-tls-dst-dd-io-pvc-6-local-p5gp4   1/1     Running             0          4h11m
volsync-rsync-tls-dst-dd-io-pvc-7-local-r7spb   1/1     Running             0          8m46s
volsync-rsync-tls-dst-dd-io-pvc-7-qsd4g         0/1     ContainerCreating   0          14s
volsync-rsync-tls-src-dd-io-pvc-5-local-hp676   1/1     Running             0          32s
volsync-rsync-tls-src-dd-io-pvc-7-local-l8vdz   0/1     ContainerCreating   0          13s

Expected results:

sync should not hang

Additional info:
Must-gather :- http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/volsync_issue/sep13/
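In the listings above, the mover pods for pvc-1, pvc-3 and pvc-6 have been Running for over four hours while the others are seconds or minutes old. A minimal sketch of how such long-running movers could be picked out of a saved pod listing follows. The one-hour threshold and the helper names are assumptions for illustration, not part of VolSync or ODF, and the parser assumes the plain five-column NAME READY STATUS RESTARTS AGE layout shown in this report.

```python
import re

# Hypothetical threshold: a mover pod older than this is treated as possibly hung.
STUCK_AFTER_SECONDS = 60 * 60

_AGE_PART = re.compile(r"(\d+)([dhms])")
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert an AGE string such as '4h12m' or '51s' to seconds."""
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _AGE_PART.findall(age))


def long_running_movers(listing, threshold=STUCK_AFTER_SECONDS):
    """Return names of Running volsync pods whose AGE exceeds the threshold."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect exactly: NAME READY STATUS RESTARTS AGE; skip headers and other lines.
        if len(fields) != 5 or not fields[0].startswith("volsync-"):
            continue
        name, _ready, status, _restarts, age = fields
        if status == "Running" and age_to_seconds(age) > threshold:
            flagged.append(name)
    return flagged


# Three rows copied from the listings in this report.
SAMPLE = """\
volsync-rsync-tls-src-dd-io-pvc-1-7n8wt 1/1 Running 0 4h12m
volsync-rsync-tls-dst-dd-io-pvc-2-cqhh8 1/1 Running 0 51s
volsync-rsync-tls-dst-dd-io-pvc-7-qsd4g 0/1 ContainerCreating 0 14s
"""

if __name__ == "__main__":
    print(long_running_movers(SAMPLE))
```

Run against the three sample rows, only the pvc-1 source pod is flagged. A long-running mover is only a hint that a sync may be stuck; a large initial sync could legitimately take hours, so the threshold would need tuning per workload.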
This seems to be related to Submariner issues; the Submariner team is investigating.
Talur, PTAL
BZ2246185 is a temporary fix for the 4.14 release; we still need to RCA this bug and understand why it is happening. It can be targeted for 4.15 and then backported to 4.14.z if the fix is from ODF (or tracked with the Submariner team if needed).
No update since October. Is this still a blocker?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.