Bug 1628605

Summary: One client hangs when another client loses communication with bricks during intensive write I/O
Product: [Community] GlusterFS
Reporter: Xavi Hernandez <jahernan>
Component: rpc
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-25 16:30:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Xavi Hernandez 2018-09-13 14:14:44 UTC
Description of problem:

Bricks don't detect abrupt client disconnections in a reasonable time. If the dead client was holding locks when it disappeared, other clients that access the locked file are blocked for a very long time.


Comment 1 Worker Ant 2018-09-13 14:16:57 UTC
REVIEW: https://review.gluster.org/21170 (socket: set 42 as default tpc-user-timeout) posted (#2) for review on master by Xavi Hernandez

Comment 2 Worker Ant 2018-09-14 05:38:08 UTC
COMMIT: https://review.gluster.org/21170 committed in master by "Raghavendra G" <rgowdapp> with a commit message- socket: set 42 as default tpc-user-timeout

The 'tcp-user-timeout' option is defined in the 'socket' module, but it's
configured in 'protocol/server' and 'protocol/client', which are the
parents of the 'socket' module.

However, current options management logic only takes into consideration
default values specified in the 'socket' module itself, ignoring values
defined in the owner xlator.

This patch simply sets the default value of tcp-user-timeout in the
'socket' module so that server and client use the expected value.

Change-Id: Ib8ad7c4ac6aac725b01a78f8c3d10cf4063d2ee6
fixes: bz#1628605
Signed-off-by: Xavi Hernandez <xhernandez>
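
For readers unfamiliar with the underlying socket option, here is a minimal Python sketch of what a 42-second TCP user timeout looks like at the socket level. It is not GlusterFS code. It assumes that the 'tcp-user-timeout' option is expressed in seconds and maps onto the Linux TCP_USER_TIMEOUT socket option, which takes milliseconds; the helper name set_tcp_user_timeout is hypothetical.

```python
import socket

# Assumption: 42 seconds, the default named in the commit title above.
TCP_USER_TIMEOUT_SECONDS = 42


def set_tcp_user_timeout(sock, seconds):
    """Hypothetical helper: ask the kernel to abort the connection when
    transmitted data stays unacknowledged for longer than `seconds`.

    Returns the value the kernel reports back (in milliseconds), or None
    when the option is unavailable on this platform.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)  # Linux-only constant
    if opt is None:
        return None
    try:
        # The kernel expects the timeout in milliseconds.
        sock.setsockopt(socket.IPPROTO_TCP, opt, seconds * 1000)
        return sock.getsockopt(socket.IPPROTO_TCP, opt)
    except OSError:
        return None  # the running kernel or sandbox rejected the option


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied_ms = set_tcp_user_timeout(sock, TCP_USER_TIMEOUT_SECONDS)
sock.close()
print("TCP_USER_TIMEOUT in ms:", applied_ms)
```

With a timeout like this applied on the brick side, a peer that vanishes without closing the connection should be noticed once pending data has gone unacknowledged for the configured time, rather than after the much longer default TCP retransmission limit.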

Comment 3 Shyamsundar 2019-03-25 16:30:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/