1739336 – Multiple disconnect events being propagated for the same child

Summary: Multiple disconnect events being propagated for the same child

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	rpc
Sub Component:
Version:	5
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1703423 1716979
Blocks:	1739334 1739335 1759832

Reported:	2019-08-09 05:02 UTC by Ravishankar N
Modified:	2019-10-09 08:50 UTC
CC List:	4 users
Fixed In Version:
Clone Of:	1716979
Environment:
Last Closed:	2019-08-12 07:19:58 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	23181	0	None	Merged	protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick	2019-08-12 07:19:57 UTC

Description Ravishankar N 2019-08-09 05:02:48 UTC

+++ This bug was initially created as a clone of Bug #1716979 +++

+++ This bug was initially created as a clone of Bug #1703423 +++

Description of problem:
The issue was reported upstream by a user via https://github.com/gluster/glusterfs/issues/648

I'm seeing that if I kill a brick in a replica 3 system, AFR repeatedly receives the child_down event for the same child. This appears to be a regression in behaviour, as it does not occur in rhgs-3.4.0: in 3.4.0, I get exactly one GF_EVENT_CHILD_DOWN per disconnect.

Version-Release number of selected component (if applicable):
rhgs-3.5 branch (source install)

How reproducible:
Always.

Steps to Reproduce:
1. Create a replica 3 volume and start it.
2. Put a breakpoint in __afr_handle_child_down_event() in the glustershd process.
3. Kill any one brick.

Actual results:
The breakpoint keeps getting hit repeatedly, roughly once every 3 seconds.

Expected results:
Only one event per disconnect.
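The expected result (one event per disconnect) amounts to edge-triggered handling: a consumer acts only when a child's state actually changes, so a repeated CHILD_DOWN for a child that is already down is a no-op. The C sketch below illustrates that idea in isolation. All names in it (`child_tracker_t`, `handle_child_event`, `EV_CHILD_UP`, `EV_CHILD_DOWN`) are hypothetical stand-ins, not GlusterFS APIs, and it is not how the merged fix works; per its title, that patch instead changed which ping events protocol/client propagates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes; the real GF_EVENT_* values live in the
 * glusterfs headers and are not reproduced here. */
typedef enum { EV_CHILD_UP, EV_CHILD_DOWN } child_event_t;

#define MAX_CHILDREN 3 /* replica 3, as in the reproducer */

/* Hypothetical per-child state a replication layer might keep. */
typedef struct {
    bool up[MAX_CHILDREN];                  /* last known state per child */
    int down_events_handled[MAX_CHILDREN];  /* down transitions acted on */
} child_tracker_t;

/* Returns true only when the event changes the child's state.
 * Duplicate events (same state as already recorded) are ignored, so
 * a single disconnect is handled exactly once. */
static bool handle_child_event(child_tracker_t *t, int child, child_event_t ev)
{
    assert(child >= 0 && child < MAX_CHILDREN);
    bool new_up = (ev == EV_CHILD_UP);
    if (t->up[child] == new_up)
        return false; /* no transition: drop the duplicate */
    t->up[child] = new_up;
    if (!new_up)
        t->down_events_handled[child]++;
    return true;
}
```

Feeding three consecutive `EV_CHILD_DOWN` events for the same child into a tracker that starts with all children up results in only the first one being acted on; a later `EV_CHILD_UP` followed by another `EV_CHILD_DOWN` is again handled once.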

Additional info:
I haven't checked whether the same happens for GF_EVENT_CHILD_UP as well. I think this is a regression that needs to be fixed. If this is not a bug, please feel free to close it, stating why.

Comment 1 Worker Ant 2019-08-09 05:05:17 UTC

REVIEW: https://review.gluster.org/23181 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) posted (#1) for review on release-5 by Ravishankar N

Comment 2 Worker Ant 2019-08-12 07:19:58 UTC

REVIEW: https://review.gluster.org/23181 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) merged (#1) on release-5 by Ravishankar N

Note You need to log in before you can comment on or make changes to this bug.