+++ This bug was initially created as a clone of Bug #1357754 +++

Description of problem:
If a webhook takes too long to consume events, it can delay the pushing of subsequent events. Add a configurable timeout option for webhook calls.

# Example: set the timeout in seconds
gluster-eventsapi config-set webhook-timeout 30

--- Additional comment from Aravinda VK on 2016-11-16 04:50:13 EST ---

Also observed that if one webhook is slow, future events get delayed. Webhooks should be called asynchronously to avoid such delays.
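The requested webhook-timeout would, in effect, put an upper bound on each HTTP POST made to a registered webhook. A minimal sketch of what such a bounded call could look like, assuming the Python requests library; the function name push_event, the config value, and the JSON body layout are illustrative, not the actual glustereventsd code:

# Sketch only: shows how a configurable webhook timeout could bound each
# HTTP POST so a slow consumer cannot stall event delivery indefinitely.
import json
import requests

WEBHOOK_TIMEOUT = 30  # seconds, e.g. from "gluster-eventsapi config-set webhook-timeout 30"

def push_event(url, event):
    try:
        # requests raises requests.exceptions.Timeout if the webhook is
        # slower than WEBHOOK_TIMEOUT.
        resp = requests.post(url,
                             data=json.dumps(event),
                             headers={"Content-Type": "application/json"},
                             timeout=WEBHOOK_TIMEOUT)
        return resp.status_code
    except requests.exceptions.RequestException as err:
        # Log and continue instead of blocking the event pipeline.
        print("webhook %s failed: %s" % (url, err))
        return None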
Upstream patch sent to the master branch: http://review.gluster.org/15966
A separate thread is maintained for each webhook, so a slow webhook does not affect the other webhooks. With this change, a configurable timeout option is no longer needed.
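A minimal sketch of the thread-per-webhook idea described above, using Python's standard threading and queue modules; the names (webhook_worker, register_webhook, publish) and the use of requests are assumptions for illustration, not the actual glustereventsd implementation:

# Sketch: each webhook gets its own queue and worker thread, so a slow
# webhook only delays its own deliveries, never the other webhooks.
import queue
import threading
import requests

webhook_queues = {}  # webhook URL -> queue of pending events

def webhook_worker(url, q):
    while True:
        event = q.get()          # block until an event is queued for this URL
        try:
            requests.post(url, json=event)
        except requests.exceptions.RequestException:
            pass                 # a failing webhook never blocks the others
        q.task_done()

def register_webhook(url):
    q = queue.Queue()
    webhook_queues[url] = q
    t = threading.Thread(target=webhook_worker, args=(url, q), daemon=True)
    t.start()

def publish(event):
    # Fan the event out: enqueue it for every webhook without waiting on any of them.
    for q in webhook_queues.values():
        q.put(event)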
upstream mainline : http://review.gluster.org/15966
upstream 3.9      : http://review.gluster.org/#/c/16021
downstream        : https://code.engineering.redhat.com/gerrit/92047
Tested and verified this on the build 3.8.4-8. Registered 2 webhooks in my 4-node cluster setup and configured a delay of 5 seconds in one of the webhooks (a minimal sketch of such a slow listener follows the console output below). Performed multiple operations (volume start/stop, georep start/stop, georep config set, bitrot enable/disable, quota enable/disable), which in turn generated the corresponding events. The delay was seen only in the webhook in which the 5-second sleep was configured; the other webhook always displayed the events as and when they were generated. Moving this BZ to verified in 3.2.

[root@dhcp47-60 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-8.el7rhgs.x86_64
glusterfs-cli-3.8.4-8.el7rhgs.x86_64
glusterfs-api-3.8.4-8.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-8.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-8.el7rhgs.x86_64
glusterfs-server-3.8.4-8.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.8.4-8.el7rhgs.x86_64
glusterfs-events-3.8.4-8.el7rhgs.x86_64
glusterfs-libs-3.8.4-8.el7rhgs.x86_64
python-gluster-3.8.4-8.el7rhgs.noarch
[root@dhcp47-60 ~]#
[root@dhcp47-60 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.47.61
Uuid: f4b259db-7add-4d01-bb5e-3c7f9c077bb4
State: Peer in Cluster (Connected)

Hostname: 10.70.47.26
Uuid: 95c24075-02aa-49c1-a1e4-c7e0775e7128
State: Peer in Cluster (Connected)

Hostname: 10.70.47.27
Uuid: 8d1aaf3a-059e-41c2-871b-6c7f5c0dd90b
State: Peer in Cluster (Connected)
[root@dhcp47-60 ~]#
[root@dhcp47-60 ~]# gluster-eventsapi status
Webhooks:
http://10.70.46.245:9000/listen
http://10.70.46.246:9000/listen

+-------------+-------------+-----------------------+
| NODE        | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------+-------------+-----------------------+
| 10.70.47.61 | UP          | OK                    |
| 10.70.47.26 | UP          | OK                    |
| 10.70.47.27 | UP          | OK                    |
| localhost   | UP          | OK                    |
+-------------+-------------+-----------------------+
[root@dhcp47-60 ~]#
[root@dhcp47-60 ~]# gluster v list
gluster_shared_storage
ozone
[root@dhcp47-60 ~]#
[root@dhcp47-60 ~]#
[root@dhcp47-60 ~]# gluster v info

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 78323062-1c40-4153-8c65-5235450ca620
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.61:/var/lib/glusterd/ss_brick
Brick2: 10.70.47.26:/var/lib/glusterd/ss_brick
Brick3: dhcp47-60.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: 2a014bec-4feb-45f8-b2c3-4741a64b2e45
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.60:/bricks/brick0/ozone0
Brick2: 10.70.47.61:/bricks/brick0/ozone1
Brick3: 10.70.47.26:/bricks/brick0/ozone2
Brick4: 10.70.47.27:/bricks/brick0/ozone3
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.scrub-throttle: aggressive
features.scrub-freq: hourly
features.scrub: Inactive
features.bitrot: off
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable

[root@dhcp47-60 ~]# gluster v geo-rep status

MASTER NODE    MASTER VOL    MASTER BRICK             SLAVE USER    SLAVE                        SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.47.60    ozone         /bricks/brick0/ozone0    root          ssh://10.70.46.239::slave    10.70.46.242    Active     Changelog Crawl    2016-12-17 22:03:43
10.70.47.26    ozone         /bricks/brick0/ozone2    root          ssh://10.70.46.239::slave    10.70.46.239    Passive    N/A                N/A
10.70.47.61    ozone         /bricks/brick0/ozone1    root          ssh://10.70.46.239::slave    10.70.46.240    Passive    N/A                N/A
10.70.47.27    ozone         /bricks/brick0/ozone3    root          ssh://10.70.46.239::slave    10.70.46.218    Active     Changelog Crawl    2016-12-17 22:03:37
[root@dhcp47-60 ~]#
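For reference, a minimal sketch of the kind of deliberately slow listener used on one of the webhook endpoints in this verification, using Python's standard http.server module. The 5-second sleep and port 9000 mirror the setup described above; everything else (class name, logging) is illustrative and not the actual test script:

# Sketch of a slow webhook listener (POSTs to :9000/listen, delayed by 5 s).
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowListener(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        time.sleep(5)                      # simulate a slow webhook consumer
        print("received event:", body.decode("utf-8", "replace"))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Events pushed to the other (fast) webhook should still arrive promptly,
    # since each webhook is served by its own thread in glustereventsd.
    HTTPServer(("0.0.0.0", 9000), SlowListener).serve_forever()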
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html