Bug 1408158

Summary: IO is paused for a minimum of one and a half minutes when one of the cluster nodes hosting the EC volume goes down.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Byreddy <bsrirama>
Component: rpc
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED ERRATA
QA Contact: Sri Vignesh Selvan <sselvan>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, aspandey, mchangir, nchilaka, rgowdapp, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:29:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1408354
Bug Blocks: 1503134

Description Byreddy 2016-12-22 10:37:33 UTC
Description of problem:
=======================
IO is paused for one and a half minutes when one of the cluster nodes hosting the EC volume goes down.

Check the steps to reproduce the issue.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-9.el6rhs.x86_64


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Create a 6-node RHGS cluster (n1,...,n6)
2. Create a 2 x (4+2) dispersed (EC) volume and FUSE mount it
3. Start IO on the FUSE mount point (untar the Linux kernel)
4. Reboot/halt one of the cluster nodes (node6 in my case) and observe the IO status on the mount point
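
For reference, a minimal command sketch of the steps above; hostnames (n1..n6), brick paths, the volume name, and the kernel tarball path are placeholders rather than the exact setup used here:

# 2 x (4+2) dispersed-distributed volume across 6 nodes, 2 bricks per node
gluster volume create ecvol disperse-data 4 redundancy 2 \
    n{1..6}:/bricks/brick1/ecvol n{1..6}:/bricks/brick2/ecvol
gluster volume start ecvol

# FUSE mount on a client and start the IO load
mount -t glusterfs n1:/ecvol /mnt/ecvol
cd /mnt/ecvol && tar xf /root/linux.tar.xz

# while the untar is running, take one node down and watch IO on the mount
ssh n6 reboot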

Actual results:
===============
IO is paused for one and a half minutes when one of the cluster nodes hosting the EC volume goes down.


Expected results:
=================
IO should not pause; otherwise applications consuming the mount point can hang, crash, or suffer IO starvation.

Additional info:
================
I am not seeing this issue when the bricks hosting the volume are killed instead of rebooting the node.
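
For comparison, the brick-kill case (which does not show the pause) can be exercised roughly as follows; the volume name is a placeholder:

gluster volume status ecvol      # note the PIDs of the brick processes on the target node
kill -TERM <brick-pid>           # run on that node; or pkill glusterfsd to kill all of its bricks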

Comment 5 Ashish Pandey 2016-12-23 12:15:45 UTC
This issue can be seen in replica volume too.

I followed the same steps, and both the AFR and EC volumes paused for 45 seconds.

I think this is an rpc issue: rpc takes some time to detect the network failure caused by the node going down. However, if we kill the bricks instead, it works fine.

Raghavendra, can we move this to the rpc component?

Comment 7 Raghavendra G 2017-02-23 04:45:31 UTC
A possible RCA can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1408354#c26; however, it still needs to be confirmed.

Comment 9 Raghavendra G 2017-09-06 10:42:25 UTC
This is a duplicate of bz 1408354. A backport of https://review.gluster.org/16731 is needed to fix the issue.

Comment 14 Sri Vignesh Selvan 2018-07-20 10:32:16 UTC
Build version :
---------------
glusterfs-3.12.2-13.el7rhgs.x86_64

I still see a 40-100 second pause in I/O when the node is halted/rebooted.
There is also some pause in I/O even after the brick comes back up (heals are either in progress or pending).


Due to the above, I may have to FAILQA the bug.

Comment 22 Sri Vignesh Selvan 2018-08-21 11:28:44 UTC
Tested with the same volume configuration and settings as mentioned in c#2.

It took about 60-100 seconds for IOs to resume after a node reboot.

As suggested by Dev, to validate the fix I reran the same test after changing the tunables mentioned below.

Test#1:
1. transport.tcp-user-timeout set to a small value such as 10.
2. transport.socket.keepalive-time set to a small value such as 5.

When a node is rebooted, IOs now pause for 20-30 seconds.

Test#2:
gluster v set vol client.tcp-user-timeout 5
gluster v set vol server.tcp-user-timeout 3

Now, when a node is rebooted, IOs pause for less than 10 seconds, which is considerably shorter than before.

The lower the values, the less time it takes for IOs to come back from the paused state.
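
For reference, a sketch of how these tunables are applied per volume; the volume name is a placeholder, and mapping the Test#1 translator options onto the client.* volume-set names is my assumption rather than something stated in this bug:

# Test#1 (assumed volume-set equivalents of the translator options listed above)
gluster volume set ecvol client.tcp-user-timeout 10
gluster volume set ecvol client.keepalive-time 5

# Test#2 (as run above)
gluster volume set ecvol client.tcp-user-timeout 5
gluster volume set ecvol server.tcp-user-timeout 3

# confirm what actually took effect
gluster volume get ecvol client.tcp-user-timeout
gluster volume get ecvol server.tcp-user-timeout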

Now, Milind, can you confirm whether we should move this bug to verified based on the above findings and the questions below:
1) What is the impact of these tunables if a customer sets them for all volumes? Is there a chance that something else breaks or other problems appear?
2) If we don't foresee any problems with these tunables, and given that they are not the defaults, shouldn't we make them the defaults? A customer can't be expected to change them before every node reboot just to avoid this problem.

Based on the answers to the above two questions, we can decide on moving this bug to verified.

Comment 23 Milind Changire 2018-08-21 11:46:09 UTC
(In reply to Sri Vignesh Selvan from comment #22)
> Now, Milind, can you confirm whether we should move this bug to verified
> based on the above findings and the questions below:
> 1) What is the impact of these tunables if a customer sets them for all
> volumes? Is there a chance that something else breaks or other problems
> appear?

Aggressive (lower) settings for keepalive-time and tcp-user-timeout should only be used where the network is highly reliable; otherwise customers may face spurious disconnects and loss of service.
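
If aggressive values do start causing spurious disconnects, they can be inspected and reverted per volume; a sketch with a placeholder volume name:

gluster volume get ecvol all | grep -E 'tcp-user-timeout|keepalive'   # inspect current values
gluster volume reset ecvol client.tcp-user-timeout                    # revert an option to its default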

> 2) If we don't foresee any problems with these tunables, and given that they
> are not the defaults, shouldn't we make them the defaults? A customer can't
> be expected to change them before every node reboot just to avoid this
> problem.

Any default value is relative to the network reliability and the workload running on the cluster. This needs discussion and should probably be incorporated into the gluster workload profiles that we maintain for some workloads.

Comment 24 Sri Vignesh Selvan 2018-08-23 13:17:17 UTC
Based on comment #23, moving this bug to the verified state.

Comment 25 errata-xmlrpc 2018-09-04 06:29:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607