Bug 2136864

Summary: [ACM Tracker] [DR][CEPHFS] volsync-rsync-src pods are in Error state as they are unable to connect to volsync-rsync-dst
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Pratik Surve <prsurve>
Component: odf-dr    Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen QA Contact: Pratik Surve <prsurve>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: bmekhiss, kbg, kramdoss, kseeger, madam, muagarwa, nsoffer, nyechiel, ocs-bugs, odf-bz-bot, sgaddam, srangana, yboaron
Version: 4.12   
Target Milestone: ---   
Target Release: ODF 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, DR-protected CephFS volumes failed to sync their data across clusters because of an incorrect MTU size configuration on intra-cluster Submariner-based setups. With this update, the MTU size is configured correctly, so the `VolSync` job created at every schedule interval can sync the delta change between the source and the destination.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-31 00:19:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pratik Surve 2022-10-21 16:09:39 UTC
Description of problem (please be detailed as possible and provide log
snippets):

[DR][CEPHFS] volsync-rsync-src pods are in Error state as they are unable to connect to volsync-rsync-dst 

Version of all relevant components (if applicable):
OCP version:- 4.12.0-0.nightly-2022-10-18-192348
ODF version:- 4.12.0-79
CEPH version:- ceph version 16.2.10-50.el8cp (f311fa3856a155d4cd9b658e25a78def0ae7a7c3) pacific (stable)
ACM version:- 2.6.1
SUBMARINER version:- v0.13.0
VOLSYNC version:- volsync-product.v0.5.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Can this issue reproducible?


Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy RDR cluster
2. Run CephFS DR workload
3. Check volsync-rsync-src pod logs
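
Step 3 can be scripted. A minimal sketch is below; the namespace and pod-name prefix are assumptions taken from the log output later in this report and may differ for other workloads:

```shell
#!/bin/sh
# Sketch: dump recent logs from the VolSync rsync source pods on the primary
# cluster. Namespace and pod prefix are assumptions based on this report.
ns="busybox-workloads-1"
prefix="volsync-rsync-src"
if command -v oc >/dev/null 2>&1; then
  for pod in $(oc -n "$ns" get pods -o name | grep "$prefix"); do
    echo "--- $pod ---"
    oc -n "$ns" logs "$pod" --tail=30
  done
else
  echo "oc not found; run this from a cluster-connected shell"
fi
```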


Actual results:
VolSync rsync container version: ACM-0.5.0-df22d29
Syncing data to volsync-rsync-dst-busybox-pvc-1.busybox-workloads-1.svc.clusterset.local:22 ...
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 2 seconds. Retry 1/5.
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 4 seconds. Retry 2/5.
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 8 seconds. Retry 3/5.
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 16 seconds. Retry 4/5.
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 32 seconds. Retry 5/5.
Rsync completed in 665s
Synchronization failed. rsync returned: 255
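
The retry cadence in the log above (2, 4, 8, 16, 32 seconds) is a standard doubling backoff; a minimal sketch of that schedule, useful when estimating how long a failing sync occupies a schedule interval:

```shell
#!/bin/sh
# Reproduce the doubling backoff schedule seen in the rsync log above.
delay=2
total=0
for retry in 1 2 3 4 5; do
  echo "Retry ${retry}/5 after ${delay}s"
  total=$((total + delay))
  delay=$((delay * 2))
done
# Five retries wait 2+4+8+16+32 = 62 seconds in total, before the ssh/rsync
# connection attempts themselves time out.
echo "total backoff: ${total}s"
```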


Expected results:


Additional info:

Comment 11 Benamar Mekhissi 2022-10-27 18:31:37 UTC
@prsurve; at this point, this looks like a submariner issue and we believe it is fixed by this PR: https://github.com/submariner-io/submariner/pull/2087
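
For background on why an MTU misconfiguration breaks rsync-over-ssh like this: large data packets exceeding the effective overlay MTU get silently dropped, so the connection stalls and is closed. The arithmetic below is a rough sketch; the 50-byte VXLAN figure is an assumption and the real overhead depends on the Submariner cable driver and encapsulation in use:

```shell
#!/bin/sh
# Rough overlay-MTU arithmetic. Overhead figures are assumptions; actual
# values depend on the Submariner cable driver and encapsulation.
underlay_mtu=1500
vxlan_overhead=50                      # VXLAN encapsulation (assumed)
overlay_mtu=$((underlay_mtu - vxlan_overhead))
mss=$((overlay_mtu - 40))              # minus IPv4 (20) + TCP (20) headers
echo "overlay MTU=${overlay_mtu}, TCP MSS=${mss}"
# To probe the effective path MTU toward the destination service, one could
# run (from inside the source pod; requires a cluster):
#   ping -c1 -M do -s $((overlay_mtu - 28)) <dst-ip>
```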

Comment 12 Mudit Agarwal 2022-10-29 03:56:35 UTC
Can this be tested with the latest builds with the fix?

Comment 17 Shyamsundar 2022-11-08 12:14:09 UTC
*** Bug 2132566 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2023-01-31 00:19:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551

Comment 34 Red Hat Bugzilla 2023-12-08 04:31:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days