+++ This bug was initially created as a clone of Bug #1388862 +++ +++ This bug was initially created as a clone of Bug #1384316 +++ Description of problem: ======================= Have a 4 node cluster, with eventing enabled. Login to N3's console/terminal, and create a distribute volume of 2 bricks residing on N1 and N2, and another distribute replicate volume 1*2, again residing on N1 and N2. Execute bitrot related commands and monitor the events that are seen. The bitrot commands when triggered from N1, N2, N3 successfully generate an event, however any command that is executed on N4 results in no events. How reproducible: ================ Seeing it across 2 volumes in the present setup Steps to Reproduce: =================== 1. Have a 4 node cluster,enable eventing. 2. Login to N3, create 'dist' with B1 of N1 and B2 of N2. Create another volume 'distrep' 1*2 , with B2 of N1 and B2 of N2. 3. Enable bitrot and play around the scrub options from the console of N1/N2/N3. 4. Login to N4, and execute the same commands in step3 on either of the volumes 'dist' or 'distrep' Actual results: ============== Events seen as expected in step3. NO events seen in step4 Expected results: ================= Events should be seen irrespective of the peer from which the command is executed. Additional info: ================ ---- N2 ---- [root@dhcp35-100 ~]# gluster v bitrot distrep scrub-throttle normal volume bitrot: success [root@dhcp35-100 ~]# gluster v bitrot dist scrub-frequency weekly volume bitrot: success [root@dhcp35-100 ~]# {u'message': {u'name': u'distrep', u'value': u'normal'}, u'event': u'BITROT_SCRUB_THROTTLE', u'ts': 1476336724, u'nodeid': u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'} {u'message': {u'name': u'dist', u'value': u'weekly'}, u'event': u'BITROT_SCRUB_FREQ', u'ts': 1476336747, u'nodeid': u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'} ====================================================================================================================== ----- N1 ----- [root@dhcp35-115 ~]# gluster v bitrot dist scrub pause volume bitrot: success [root@dhcp35-115 ~]# {u'message': {u'name': u'dist', u'value': u'pause'}, u'event': u'BITROT_SCRUB_OPTION', u'ts': 1476336842, u'nodeid': u'6ac165c0-317f-42ad-8262-953995171dbb'} ====================================================================================================================== ----- N3 ----- [root@dhcp35-101 ~]# gluster v bitrot dist scrub resume volume bitrot: success [root@dhcp35-101 ~]# {u'message': {u'name': u'dist', u'value': u'resume'}, u'event': u'BITROT_SCRUB_OPTION', u'ts': 1476336858, u'nodeid': u'a3bd23b9-f70a-47f5-9c95-7a271f5f1e18'} ====================================================================================================================== ---- N4 ---- [root@dhcp35-104 ~]# gluster v bitrot dist scrub pause volume bitrot: success [root@dhcp35-104 ~]# gluster v bitrot distrep scrub pause volume bitrot: success [root@dhcp35-104 ~]# [root@dhcp35-104 ~]# gluster v bitrot dist scrub resume volume bitrot: success [root@dhcp35-104 ~]# [root@dhcp35-104 ~]# gluster v bitrot dist scrub status Volume name : dist State of scrub: Active (Idle) Scrub impact: aggressive Scrub frequency: weekly Bitrot error log location: /var/log/glusterfs/bitd.log Scrubber error log location: /var/log/glusterfs/scrub.log ========================================================= Node: dhcp35-115.lab.eng.blr.redhat.com Number of Scrubbed files: 0 Number of Skipped files: 0 Last completed scrub time: 2016-10-13 05:32:47 Duration of last scrub (D:M:H:M:S): 0:0:0:0 Error count: 0 ========================================================= Node: 10.70.35.100 Number of Scrubbed files: 0 Number of Skipped files: 0 Last completed scrub time: 2016-10-13 05:32:45 Duration of last scrub (D:M:H:M:S): 0:0:0:0 Error count: 0 ========================================================= [root@dhcp35-104 ~]# <<<<<<<<< No events seen >>>>>>>> [root@dhcp35-104 ~]# gluster-eventsapi status Webhooks: http://10.70.35.109:9000/listen +-----------------------------------+-------------+-----------------------+ | NODE | NODE STATUS | GLUSTEREVENTSD STATUS | +-----------------------------------+-------------+-----------------------+ | dhcp35-115.lab.eng.blr.redhat.com | UP | UP | | dhcp35-101.lab.eng.blr.redhat.com | UP | UP | | 10.70.35.100 | UP | UP | | localhost | UP | UP | +-----------------------------------+-------------+-----------------------+ [root@dhcp35-104 ~]# gluster peer tsatus unrecognized word: tsatus (position 1) [root@dhcp35-104 ~]# gluster peer status Number of Peers: 3 Hostname: dhcp35-115.lab.eng.blr.redhat.com Uuid: 6ac165c0-317f-42ad-8262-953995171dbb State: Peer in Cluster (Connected) Hostname: dhcp35-101.lab.eng.blr.redhat.com Uuid: a3bd23b9-f70a-47f5-9c95-7a271f5f1e18 State: Peer in Cluster (Connected) Hostname: 10.70.35.100 Uuid: fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5 State: Peer in Cluster (Connected) [root@dhcp35-104 ~]# [root@dhcp35-104 ~]# [root@dhcp35-104 ~]# --- Additional comment from Sweta Anandpara on 2016-10-13 02:55:44 EDT --- Added debuginfo package, and Atin figured out that the event IS actually being sent. Did a glustereventsd reload on the affected node N4, and started receiving events. node-reload is one of the program called when we do a webhook-add, which would in turn do a glustereventsd reload. For some reason when I did a webhook add in this setup, glustereventsd reload would have failed. Just a hypothesis as of now. Will create a new webhook and add it in this same setup. Will observe the success/failure/errors seen while doing so, and will update. Until then anyone seeing similar issue can do a work around of 'service glustereventsd reload' on the impacted node, and the cluster and its events should work as expected. --- Additional comment from Sweta Anandpara on 2016-10-13 03:38:34 EDT --- Deleted the said webhook, and tried to add the same webhook again to the cluster. That did show up an exception where it failed to run 'gluster system:: execute eventsapi.py node-reload' It fails in the same node N4 everytime, and I am unable to figure out the reason why. It works on all the other nodes of the cluster. [root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-del http://10.70.35.109:9000/listen Traceback (most recent call last): File "/usr/sbin/gluster-eventsapi", line 459, in <module> runcli() File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 212, in runcli cls.run(args) File "/usr/sbin/gluster-eventsapi", line 274, in run sync_to_peers() File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers out = execute_in_peers("node-reload") File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 125, in execute_in_peers raise GlusterCmdException((rc, out, err, " ".join(cmd))) gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on 10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system:: execute eventsapi.py node-reload') [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# gluster-eventsapi status Webhooks: None +-----------------------------------+-------------+-----------------------+ | NODE | NODE STATUS | GLUSTEREVENTSD STATUS | +-----------------------------------+-------------+-----------------------+ | 10.70.35.100 | UP | UP | | 10.70.35.104 | UP | UP | | dhcp35-115.lab.eng.blr.redhat.com | UP | UP | | localhost | UP | UP | +-----------------------------------+-------------+-----------------------+ [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-test http://10.70.35.109:9000/listen +-----------------------------------+-------------+----------------+ | NODE | NODE STATUS | WEBHOOK STATUS | +-----------------------------------+-------------+----------------+ | 10.70.35.100 | UP | OK | | 10.70.35.104 | UP | OK | | dhcp35-115.lab.eng.blr.redhat.com | UP | OK | | localhost | UP | OK | +-----------------------------------+-------------+----------------+ [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-add http://10.70.35.109:9000/listen Traceback (most recent call last): File "/usr/sbin/gluster-eventsapi", line 459, in <module> runcli() File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 212, in runcli cls.run(args) File "/usr/sbin/gluster-eventsapi", line 232, in run sync_to_peers() File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers out = execute_in_peers("node-reload") File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 125, in execute_in_peers raise GlusterCmdException((rc, out, err, " ".join(cmd))) gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on 10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system:: execute eventsapi.py node-reload') [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# gluster-eventsapi status Webhooks: http://10.70.35.109:9000/listen +-----------------------------------+-------------+-----------------------+ | NODE | NODE STATUS | GLUSTEREVENTSD STATUS | +-----------------------------------+-------------+-----------------------+ | 10.70.35.100 | UP | UP | | 10.70.35.104 | UP | UP | | dhcp35-115.lab.eng.blr.redhat.com | UP | UP | | localhost | UP | UP | +-----------------------------------+-------------+-----------------------+ [root@dhcp35-101 yum.repos.d]# [root@dhcp35-101 yum.repos.d]# --- Additional comment from Aravinda VK on 2016-10-17 08:36:14 EDT --- This is similar to BZ 1379963. `glustereventsd` on one node is not reloaded and it doesn't know the information about new Webhook added. --- Additional comment from Worker Ant on 2016-10-26 06:29:03 EDT --- REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data when modified) posted (#1) for review on master by Aravinda VK (avishwan) --- Additional comment from Worker Ant on 2016-11-17 01:28:21 EST --- REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data when modified) posted (#2) for review on master by Aravinda VK (avishwan) --- Additional comment from Worker Ant on 2016-11-17 06:10:31 EST --- COMMIT: http://review.gluster.org/15731 committed in master by Aravinda VK (avishwan) ------ commit b7ebffbda9ba784ccfae6d1a90766d5310cdaa15 Author: Aravinda VK <avishwan> Date: Wed Oct 26 15:51:17 2016 +0530 eventsapi: Auto reload Webhooks data when modified glustereventsd depends on reload signal to reload the Webhooks configurations. But if reload signal missed, no events will be sent to newly added Webhook. Added auto reload based on webhooks file mtime. Before pushing events to Webhooks, reloads webhooks configurations if previously recorded mtime is different than current mtime. BUG: 1388862 Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158 Signed-off-by: Aravinda VK <avishwan> Reviewed-on: http://review.gluster.org/15731 Reviewed-by: Prashanth Pai <ppai> Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org>
REVIEW: http://review.gluster.org/15963 (eventsapi: Auto reload Webhooks data when modified) posted (#1) for review on release-3.9 by Aravinda VK (avishwan)
COMMIT: http://review.gluster.org/15963 committed in release-3.9 by Aravinda VK (avishwan) ------ commit 8485cec47b9f6a60d3ca2c8804e7f37f7b94c64b Author: Aravinda VK <avishwan> Date: Wed Oct 26 15:51:17 2016 +0530 eventsapi: Auto reload Webhooks data when modified glustereventsd depends on reload signal to reload the Webhooks configurations. But if reload signal missed, no events will be sent to newly added Webhook. Added auto reload based on webhooks file mtime. Before pushing events to Webhooks, reloads webhooks configurations if previously recorded mtime is different than current mtime. > Reviewed-on: http://review.gluster.org/15731 > Reviewed-by: Prashanth Pai <ppai> > Smoke: Gluster Build System <jenkins.org> > NetBSD-regression: NetBSD Build System <jenkins.org> > CentOS-regression: Gluster Build System <jenkins.org> BUG: 1399482 Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158 Signed-off-by: Aravinda VK <avishwan> (cherry picked from commit b7ebffbda9ba784ccfae6d1a90766d5310cdaa15) Reviewed-on: http://review.gluster.org/15963 Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Prashanth Pai <ppai>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report. glusterfs-3.9.1 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html [2] https://www.gluster.org/pipermail/gluster-users/