Bug 2246185 - [Tracker Volsync] [RDR] Request to enable TCP keepalive timeout and lower its value in order to detect broken connection within 15mins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Benamar Mekhissi
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 2239587
 
Reported: 2023-10-25 17:45 UTC by Aman Agrawal
Modified: 2023-11-08 18:56 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:55:29 UTC
Embargoed:


Links
Github: backube/volsync pull 967 (Merged) - [release-0.8] Enable TCP-level keepalive for rsync-tls mover (last updated 2023-10-31 14:51:47 UTC)
Red Hat Product Errata: RHSA-2023:6832 (last updated 2023-11-08 18:56:19 UTC)

Description Aman Agrawal 2023-10-25 17:45:38 UTC
Description of problem (please be as detailed as possible and provide log
snippets): As a temporary fix for the issue in https://bugzilla.redhat.com/show_bug.cgi?id=2239587#c19, we should change the default stunnel TIMEOUTidle from 12 hours to 30 minutes. Currently the hung PVC's connection only gets reset after 12 hours, which halts data sync for that CephFS workload for the entire 12-hour interval, which is far too long. The actual issue should be root-caused and fixed as part of the original BZ 2239587.
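
For context, stunnel's TIMEOUTidle option defaults to 43200 seconds (12 hours), which is where the 12-hour reset comes from. A minimal, illustrative stunnel service section (the service name and ports are placeholders, not the exact configuration the VolSync rsync-tls mover generates) that lowers it to 30 minutes would look like:

  ; illustrative only; not the stunnel.conf the VolSync mover actually writes
  [rsync]
  accept = 8000
  connect = 873
  ; reset idle connections after 30 minutes instead of the 43200-second (12-hour) default
  TIMEOUTidle = 1800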


Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-10-18-004928
advanced-cluster-management.v2.9.0-188 
ODF 4.14.0-156
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Submariner   image: brew.registry.redhat.io/rh-osbs/iib:599799
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Latency 50ms RTT


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 2 Aman Agrawal 2023-10-25 17:48:18 UTC
Proposing this as an ODF 4.14 GA blocker given the importance of the temporary fix and its impact on the VolSync solution for CephFS-backed workloads.

Comment 3 Mudit Agarwal 2023-10-26 13:48:31 UTC
Is this fix in ramen or volsync? Can we increase this timeout manually (as a workaround) and wait for the fix in 4.14.1?

Comment 4 Aman Agrawal 2023-10-26 16:19:02 UTC
(In reply to Mudit Agarwal from comment #3)
> Is this fix in ramen or volsync? Can we increase this timeout manually (as a
> workaround) and wait for the fix in 4.14.1?

Benamar can help us answer this, but we need this fix in 4.14 for CephFS GA.

Comment 5 Benamar Mekhissi 2023-10-30 13:18:09 UTC
We have asked the VolSync team to include the timeout in their final 0.8 release.

Comment 8 Benamar Mekhissi 2023-10-31 14:51:48 UTC
PR here: https://github.com/backube/volsync/pull/967
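
The linked PR enables TCP-level keepalive for the rsync-tls mover. As a rough illustration of the mechanism only (not the actual change in that PR), a Go dialer can turn on keepalive so that a peer that silently disappears is detected well within the 15-minute window this BZ asks for; the address below is a hypothetical rsync-tls endpoint:

package main

import (
	"log"
	"net"
	"time"
)

// dialWithKeepalive opens a TCP connection with keepalive probes enabled.
// On Linux, Go sets both TCP_KEEPIDLE and TCP_KEEPINTVL to the KeepAlive
// period, so with the kernel default of 9 unanswered probes a dead peer is
// detected after roughly 60s + 9*60s = 10 minutes.
func dialWithKeepalive(addr string) (net.Conn, error) {
	d := net.Dialer{
		KeepAlive: 60 * time.Second,
	}
	return d.Dial("tcp", addr)
}

func main() {
	// Hypothetical destination service/port for illustration only.
	conn, err := dialWithKeepalive("volsync-rsync-tls-dst.example.svc:8000")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("connected to %s with TCP keepalive enabled", conn.RemoteAddr())
}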

Comment 12 errata-xmlrpc 2023-11-08 18:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

