Bug 1739335

Summary: Multiple disconnect events being propagated for the same child
Product: [Community] GlusterFS
Component: rpc
Version: 6
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Hardware: Unspecified
OS: Linux
Reporter: Ravishankar N <ravishankar>
Assignee: bugs <bugs>
CC: amgad.saleh, bugs, ravishankar, rgowdapp
Clone Of: 1716979
Last Closed: 2019-08-13 05:56:46 UTC
Bug Depends On: 1703423, 1716979, 1739336
Bug Blocks: 1739334

Description Ravishankar N 2019-08-09 04:44:16 UTC
+++ This bug was initially created as a clone of Bug #1716979 +++

+++ This bug was initially created as a clone of Bug #1703423 +++

Description of problem:
The issue was reported upstream by a user via https://github.com/gluster/glusterfs/issues/648

If I kill a brick in a replica 3 volume, AFR keeps receiving a child_down event repeatedly for the same child. This appears to be a regression, because it does not happen in rhgs-3.4.0: there I get exactly one GF_EVENT_CHILD_DOWN per disconnect.

Version-Release number of selected component (if applicable):
rhgs-3.5 branch (source install)

How reproducible:
Always.

Steps to Reproduce:
1. Create a replica 3 volume and start it.
2. Set a breakpoint on __afr_handle_child_down_event() in the glustershd process.
3. Kill any one brick.

Actual results:
The breakpoint is hit repeatedly, roughly once every 3 seconds.

Expected results:
Only one event per disconnect.
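The expected result is edge-triggered notification. A change in a child's state should be reported once, no matter how many reconnect attempts fail afterwards. The Python sketch below models only that expected behaviour. `ChildEvent`, `ChildState` and `simulate()` are hypothetical names, not GlusterFS code. This is also not the fix that was merged (review 23180), which, per its title, limits GF_EVENT_CHILD_PING propagation to connections to a brick.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ChildEvent(Enum):
    # Hypothetical stand-ins for GF_EVENT_CHILD_UP / GF_EVENT_CHILD_DOWN.
    CHILD_UP = "up"
    CHILD_DOWN = "down"


@dataclass
class ChildState:
    """Hypothetical per-child record of the last state propagated to the parent."""
    last: Optional[ChildEvent] = None

    def should_propagate(self, event: ChildEvent) -> bool:
        """Forward an event only if it changes the child's state."""
        if event == self.last:
            return False  # duplicate of what the parent (e.g. AFR) already knows
        self.last = event
        return True


def simulate(reconnect_failures: int) -> Tuple[int, List[ChildEvent]]:
    """Bring a child up, then raise one disconnect per failed reconnect attempt
    (the report saw one roughly every 3 seconds). Returns the number of raw
    events and the list of events that actually reach the parent."""
    child = ChildState()
    raw = [ChildEvent.CHILD_UP] + [ChildEvent.CHILD_DOWN] * reconnect_failures
    forwarded = [ev for ev in raw if child.should_propagate(ev)]
    return len(raw), forwarded


if __name__ == "__main__":
    raw_count, forwarded = simulate(reconnect_failures=5)
    print(raw_count, [ev.name for ev in forwarded])
    # prints: 6 ['CHILD_UP', 'CHILD_DOWN']
```

The filter records the last state it propagated for each child, so the same rule would also cover GF_EVENT_CHILD_UP. A brick that comes back would yield one CHILD_UP, however many connect notifications precede it.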

Additional info:
I haven't checked whether the same happens for GF_EVENT_CHILD_UP. I think this is a regression that needs to be fixed. If this is not a bug, please feel free to close it, stating why.

Comment 1 Worker Ant 2019-08-09 04:47:01 UTC
REVIEW: https://review.gluster.org/23180 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) posted (#1) for review on release-6 by Ravishankar N

Comment 2 Worker Ant 2019-08-13 05:56:46 UTC
REVIEW: https://review.gluster.org/23180 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) merged (#1) on release-6 by Ravishankar N