Bug 1385561 - [Eventing]: BRICK_CONNECTED and BRICK_DISCONNECTED events seen at every heartbeat when a brick-is-killed/volume-stopped
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Atin Mukherjee
QA Contact: Sweta Anandpara
Depends On: 1387544
Blocks: 1351528
Reported: 2016-10-17 06:16 EDT by Sweta Anandpara
Modified: 2017-03-23 02:10 EDT
CC List: 4 users

Fixed In Version: glusterfs-3.8.4-4
Last Closed: 2017-03-23 02:10:59 EDT
Type: Bug


External Trackers:
Red Hat Product Errata RHSA-2017:0486 | Priority: normal | Status: SHIPPED_LIVE | Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update | Last Updated: 2017-03-23 05:18:45 EDT

Description Sweta Anandpara 2016-10-17 06:16:51 EDT
Description of problem:
========================
In a 4-node cluster with eventing enabled, if a brick goes down or a volume is stopped, multiple BRICK_CONNECTED and BRICK_DISCONNECTED events are seen for bricks belonging to one of the affected nodes. These events keep getting generated indefinitely, until the brick is brought back up or the volume is deleted.

Firstly, we should not be seeing BRICK_CONNECTED messages at all if the brick is disconnected. Secondly, we get continuous traffic of these events at every heartbeat; I suspect that is by design, but is there a better alternative? Thirdly, I do not understand why I am getting these messages only for the bricks belonging to one of the nodes, and not the others.
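
For context on what "eventing enabled" involves, below is a rough sketch of how such a setup is typically prepared before running the steps in this report. The glustereventsd service name, the gluster-eventsapi commands and the listener URL are assumptions based on the standard glusterfs-events packaging, not details taken from this bug.

#!/usr/bin/env python3
# Sketch only: enable eventing and register a webhook on a node.
# Assumptions: glusterfs-events is installed, glustereventsd is the
# eventing daemon, and the listener URL below is hypothetical.
import subprocess

LISTENER_URL = "http://10.70.1.1:9000/listen"   # hypothetical webhook endpoint

for cmd in (
    "systemctl start glustereventsd",                    # start the eventing daemon
    "gluster-eventsapi webhook-add %s" % LISTENER_URL,   # register the webhook
    "gluster-eventsapi status",                          # confirm nodes/webhooks are in sync
):
    print("+ %s" % cmd)
    subprocess.check_call(cmd, shell=True)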


Version-Release number of selected component (if applicable):
===========================================================
3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a 4-node cluster with eventing enabled. Create a disperse volume.
2. Stop one of the bricks with 'kill -15 <brick_pid>', or stop the volume with 'gluster volume stop <volname>'.
3. Monitor the events seen (a scripted version of these steps is sketched below).
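
A scripted version of the steps above, as a rough sketch only: the volume name 'disp' is taken from the events in the Additional info section, while the --mode=script flag and the assumption that a webhook listener is already registered (e.g. via 'gluster-eventsapi webhook-add') are mine, not from the report.

#!/usr/bin/env python3
# Rough driver for the reproduction steps above (a sketch, not the original
# reporter's tooling). Assumes the gluster CLI on the node, a disperse
# volume named 'disp', and a webhook listener already registered.
import subprocess
import sys

VOLNAME = "disp"                                         # volume name from the events below
BRICK_PID = sys.argv[1] if len(sys.argv) > 1 else None   # brick PID from 'gluster volume status'

def sh(cmd):
    # Echo and run a shell command so the reproduction is easy to follow.
    print("+ %s" % cmd)
    subprocess.check_call(cmd, shell=True)

if BRICK_PID:
    sh("kill -15 %s" % BRICK_PID)                           # step 2a: stop a single brick process
else:
    sh("gluster volume stop %s --mode=script" % VOLNAME)    # step 2b: or stop the whole volume

# Step 3: watch the webhook listener. Only BRICK_DISCONNECTED is expected,
# but the bug shows repeated BRICK_CONNECTED/BRICK_DISCONNECTED pairs.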

Actual results:
==============
Multiple BRICK_CONNECTED and BRICK_DISCONNECTED events seen at every heartbeat


Expected results:
================
Only BRICK_DISCONNECTED events should be seen. Also, the interval at which the event should be resent is to be discussed.


Additional info:
===============


{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick0/disp'}, u'event': u'BRICK_CONNECTED', u'ts': 1476697365, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:16] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick0/disp'}, u'event': u'BRICK_DISCONNECTED', u'ts': 1476697365, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:16] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick1/disp'}, u'event': u'BRICK_CONNECTED', u'ts': 1476697368, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:19] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick1/disp'}, u'event': u'BRICK_DISCONNECTED', u'ts': 1476697368, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:19] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick0/disp'}, u'event': u'BRICK_DISCONNECTED', u'ts': 1476697368, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:19] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick0/disp'}, u'event': u'BRICK_CONNECTED', u'ts': 1476697368, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:19] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick1/disp'}, u'event': u'BRICK_CONNECTED', u'ts': 1476697371, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:22] "POST /listen HTTP/1.1" 200 -
{u'message': {u'peer': u'10.70.46.240', u'volume': u'disp', u'brick': u'/bricks/brick1/disp'}, u'event': u'BRICK_DISCONNECTED', u'ts': 1476697371, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
10.70.46.240 - - [12/Oct/2016 11:27:22] "POST /listen HTTP/1.1" 200 -
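
For reference, the access-log lines interleaved with the event dicts above are what a simple HTTP webhook listener prints. A minimal sketch of such a listener follows; the port, the /listen path handling and the use of http.server are assumptions about the test setup, not details from the report.

#!/usr/bin/env python3
# Minimal webhook listener sketch that would produce output like the log
# above: each gluster event arrives as a JSON body on POST /listen.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ListenHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length).decode())
        print(event)                 # e.g. {'event': 'BRICK_DISCONNECTED', ...}
        self.send_response(200)      # matches the '200 -' access-log lines above
        self.end_headers()

if __name__ == "__main__":
    # Assumed port; register it with e.g. 'gluster-eventsapi webhook-add http://<host>:9000/listen'.
    HTTPServer(("", 9000), ListenHandler).serve_forever()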
Comment 3 Sweta Anandpara 2016-10-18 04:49:39 EDT
I was able to see this in my setup multiple times until yesterday. Unfortunately, that volume has since been deleted, and I am not able to reproduce this on a _new_ volume with the steps mentioned in the description.

Reducing the severity of this BZ for now. I will go ahead with my testing and update if I hit this again. If I am not able to reproduce this after a substantial amount of testing, this BZ can be closed.
Comment 4 Atin Mukherjee 2016-10-21 13:52:17 EDT
Alright, so I believe you hit a race similar to the one in BZ 1387544.
Comment 5 Atin Mukherjee 2016-10-21 13:54:43 EDT
Upstream patch details are available at BZ 1387544; moving this to POST state.
Comment 8 Atin Mukherjee 2016-11-08 00:33:08 EST
upstream mainline : http://review.gluster.org/#/c/15699
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/89352

The upstream 3.9 patch (http://review.gluster.org/#/c/15722/) is also posted; however, given that the merge window is blocked with the 3.9 release around the corner, at worst it will be merged for 3.9.1.
Comment 10 Sweta Anandpara 2016-11-22 04:26:07 EST
I have not seen this again in my events testing over the past few weeks. The status remains as mentioned in comment 3, and I have not seen any unnecessary events other than multiple CLIENT_CONNECT and CLIENT_DISCONNECT events.

Moving this BZ to verified in 3.2
Comment 12 errata-xmlrpc 2017-03-23 02:10:59 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
