Bug 2246185 - [Tracker Volsync] [RDR] Request to enable TCP keepalive timeout and lower its value in order to detect broken connection within 15mins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Benamar Mekhissi
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 2239587
 
Reported: 2023-10-25 17:45 UTC by Aman Agrawal
Modified: 2023-11-08 18:56 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:55:29 UTC
Embargoed:


Links
Github: backube/volsync pull 967 (Merged) - [release-0.8] Enable TCP-level keepalive for rsync-tls mover (last updated 2023-10-31 14:51:47 UTC)
Red Hat Product Errata: RHSA-2023:6832 (last updated 2023-11-08 18:56:19 UTC)

Description Aman Agrawal 2023-10-25 17:45:38 UTC
Description of problem (please be as detailed as possible and provide log
snippets): As a temporary fix for the issue in https://bugzilla.redhat.com/show_bug.cgi?id=2239587#c19, we should change the default stunnel TIMEOUTidle from 12 hours to 30 minutes. Currently the hung PVC's connection only gets reset after 12 hours, which halts data sync for that CephFS workload for the entire 12-hour interval, which is far too long. The actual issue should be root-caused and fixed as part of the original BZ 2239587.
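
For context, stunnel's TIMEOUTidle option defaults to 43200 seconds (12 hours), which is where the 12-hour reset comes from. A minimal, illustrative stunnel service section (the service name and ports are placeholders, not the exact configuration the VolSync rsync-tls mover generates) that lowers it to 30 minutes would look like:

  ; illustrative only; not the stunnel.conf the VolSync mover actually writes
  [rsync]
  accept = 8000
  connect = 873
  ; reset idle connections after 30 minutes instead of the 43200-second (12-hour) default
  TIMEOUTidle = 1800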


Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-10-18-004928
advanced-cluster-management.v2.9.0-188 
ODF 4.14.0-156
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Submariner   image: brew.registry.redhat.io/rh-osbs/iib:599799
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Latency 50ms RTT


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 2 Aman Agrawal 2023-10-25 17:48:18 UTC
Proposing this as an ODF 4.14 GA blocker given the importance of the temporary fix and its impact on the VolSync solution for CephFS-backed workloads.

Comment 3 Mudit Agarwal 2023-10-26 13:48:31 UTC
Is this fix in ramen or volsync? Can we increase this timeout manually (as a workaround) and wait for the fix in 4.14.1?

Comment 4 Aman Agrawal 2023-10-26 16:19:02 UTC
(In reply to Mudit Agarwal from comment #3)
> Is this fix in ramen or volsync? Can we increase this timeout manually (as a
> workaround) and wait for the fix in 4.14.1?

Benamar can help us answer this, but we need this fix in 4.14 for CephFS GA.

Comment 5 Benamar Mekhissi 2023-10-30 13:18:09 UTC
We have asked the VolSync team to include the timeout in their final 0.8 release.

Comment 8 Benamar Mekhissi 2023-10-31 14:51:48 UTC
PR here: https://github.com/backube/volsync/pull/967
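
The linked PR enables TCP-level keepalive for the rsync-tls mover. As a rough illustration of the mechanism only (not the actual change in that PR), a Go dialer can turn on keepalive so that a peer that silently disappears is detected well within the 15-minute window this BZ asks for; the address below is a hypothetical rsync-tls endpoint:

package main

import (
	"log"
	"net"
	"time"
)

// dialWithKeepalive opens a TCP connection with keepalive probes enabled.
// On Linux, Go sets both TCP_KEEPIDLE and TCP_KEEPINTVL to the KeepAlive
// period, so with the kernel default of 9 unanswered probes a dead peer is
// detected after roughly 60s + 9*60s = 10 minutes.
func dialWithKeepalive(addr string) (net.Conn, error) {
	d := net.Dialer{
		KeepAlive: 60 * time.Second,
	}
	return d.Dial("tcp", addr)
}

func main() {
	// Hypothetical destination service/port for illustration only.
	conn, err := dialWithKeepalive("volsync-rsync-tls-dst.example.svc:8000")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("connected to %s with TCP keepalive enabled", conn.RemoteAddr())
}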

Comment 12 errata-xmlrpc 2023-11-08 18:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

