Bug 1395613

Summary: Delayed Events if any one Webhook is slow
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Aravinda VK <avishwan>
Component: eventsapi
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: Sweta Anandpara <sanandpa>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: rhgs-3.2
CC: amukherj, rhinduja
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1357754
Environment:
Last Closed: 2017-03-23 06:19:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1357754, 1401261
Bug Blocks: 1351528

Description Aravinda VK 2016-11-16 09:53:29 UTC
+++ This bug was initially created as a clone of Bug #1357754 +++

Description of problem:
If a webhook takes too long to consume events, it can delay the pushing of subsequent events. Add a configurable timeout option for webhook calls.

# Example: set the timeout in seconds
gluster-eventsapi config-set webhook-timeout 30
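
On the publisher side, a minimal sketch of what a timeout-bounded webhook call could look like is shown below, assuming the event publisher posts JSON with python-requests. The function name push_event and its parameters are illustrative, not the actual glustereventsd code.

# Illustrative sketch only; not the actual glustereventsd implementation.
import json
import requests

def push_event(webhook_url, event, timeout=30):
    """POST an event to a webhook, bounded by a configurable timeout (seconds)."""
    try:
        resp = requests.post(
            webhook_url,
            data=json.dumps(event),
            headers={"Content-Type": "application/json"},
            timeout=timeout,  # raises requests.Timeout if the webhook is too slow
        )
        return resp.status_code == 200
    except (requests.Timeout, requests.ConnectionError):
        # A slow or unreachable webhook no longer blocks event pushing indefinitely.
        return False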

--- Additional comment from Aravinda VK on 2016-11-16 04:50:13 EST ---

Also observed that if one webhook is slow, future events get delayed. Webhooks should be called asynchronously to avoid such delays.

Comment 4 Aravinda VK 2016-11-29 11:15:09 UTC
Upstream patch sent to the master branch: http://review.gluster.org/15966

Comment 5 Aravinda VK 2016-11-29 11:16:31 UTC
A separate thread is now maintained for each webhook, so a slow webhook does not affect the other webhooks. With this change, a configurable timeout option is no longer needed.
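
As a rough illustration of that design (the names below are hypothetical and not taken from the actual patch), each registered webhook could get its own queue and worker thread, so a slow consumer only delays its own queue:

# Illustrative sketch of the per-webhook thread design; not the actual patch.
import threading
import queue
import requests

class WebhookWorker(threading.Thread):
    """One worker per webhook: a slow webhook only delays its own queue."""

    def __init__(self, url):
        super().__init__(daemon=True)
        self.url = url
        self.events = queue.Queue()

    def run(self):
        while True:
            event = self.events.get()
            try:
                requests.post(self.url, json=event, timeout=30)
            except requests.RequestException:
                pass  # a real implementation would log and/or retry here
            finally:
                self.events.task_done()

# One thread per registered webhook; publishing never blocks on a slow consumer.
workers = {url: WebhookWorker(url) for url in ["http://10.70.46.245:9000/listen",
                                               "http://10.70.46.246:9000/listen"]}
for w in workers.values():
    w.start()

def publish(event):
    for w in workers.values():
        w.events.put(event)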

Comment 6 Atin Mukherjee 2016-12-04 04:48:36 UTC
upstream mainline : http://review.gluster.org/15966
upstream 3.9 : http://review.gluster.org/#/c/16021
downstream : https://code.engineering.redhat.com/gerrit/92047

Comment 8 Sweta Anandpara 2016-12-17 17:03:47 UTC
Tested and verified this on the build 3.8.4-8

Registered 2 webhooks in my 4-node cluster setup, with a delay of 5 seconds configured in one of them. Performed multiple operations (volume start/stop, georep start/stop, georep config set, bitrot enable/disable, quota enable/disable), which in turn generated the corresponding events.

The delay was seen only in the webhook in which the 5-second sleep was configured. The other webhook always displayed the events as and when they were generated.
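
For reference, the slow webhook in a test like this can be simulated with a small HTTP listener that sleeps before acknowledging each event. The sketch below is illustrative only and is not the actual QA harness.

# Illustrative slow-webhook listener for testing; not the actual QA harness.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY_SECONDS = 5  # set to 0 for the fast webhook

class Listener(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        time.sleep(DELAY_SECONDS)          # simulate a slow consumer
        print("Received event:", body.decode())
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9000), Listener).serve_forever()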

Moving this BZ to verified in 3.2.

[root@dhcp47-60 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-8.el7rhgs.x86_64
glusterfs-cli-3.8.4-8.el7rhgs.x86_64
glusterfs-api-3.8.4-8.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-8.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-8.el7rhgs.x86_64
glusterfs-server-3.8.4-8.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.8.4-8.el7rhgs.x86_64
glusterfs-events-3.8.4-8.el7rhgs.x86_64
glusterfs-libs-3.8.4-8.el7rhgs.x86_64
python-gluster-3.8.4-8.el7rhgs.noarch
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.47.61
Uuid: f4b259db-7add-4d01-bb5e-3c7f9c077bb4
State: Peer in Cluster (Connected)

Hostname: 10.70.47.26
Uuid: 95c24075-02aa-49c1-a1e4-c7e0775e7128
State: Peer in Cluster (Connected)

Hostname: 10.70.47.27
Uuid: 8d1aaf3a-059e-41c2-871b-6c7f5c0dd90b
State: Peer in Cluster (Connected)
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# gluster-eventsapi  status
Webhooks: 
http://10.70.46.245:9000/listen
http://10.70.46.246:9000/listen

+-------------+-------------+-----------------------+
|     NODE    | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------+-------------+-----------------------+
| 10.70.47.61 |          UP |                    OK |
| 10.70.47.26 |          UP |                    OK |
| 10.70.47.27 |          UP |                    OK |
|  localhost  |          UP |                    OK |
+-------------+-------------+-----------------------+
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# gluster v list
gluster_shared_storage
ozone
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# 
[root@dhcp47-60 ~]# gluster v info
 
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 78323062-1c40-4153-8c65-5235450ca620
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.61:/var/lib/glusterd/ss_brick
Brick2: 10.70.47.26:/var/lib/glusterd/ss_brick
Brick3: dhcp47-60.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: 2a014bec-4feb-45f8-b2c3-4741a64b2e45
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.60:/bricks/brick0/ozone0
Brick2: 10.70.47.61:/bricks/brick0/ozone1
Brick3: 10.70.47.26:/bricks/brick0/ozone2
Brick4: 10.70.47.27:/bricks/brick0/ozone3
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.scrub-throttle: aggressive
features.scrub-freq: hourly
features.scrub: Inactive
features.bitrot: off
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable
[root@dhcp47-60 ~]# gluster v geo-rep status
 
MASTER NODE    MASTER VOL    MASTER BRICK             SLAVE USER    SLAVE                        SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.47.60    ozone         /bricks/brick0/ozone0    root          ssh://10.70.46.239::slave    10.70.46.242    Active     Changelog Crawl    2016-12-17 22:03:43          
10.70.47.26    ozone         /bricks/brick0/ozone2    root          ssh://10.70.46.239::slave    10.70.46.239    Passive    N/A                N/A                          
10.70.47.61    ozone         /bricks/brick0/ozone1    root          ssh://10.70.46.239::slave    10.70.46.240    Passive    N/A                N/A                          
10.70.47.27    ozone         /bricks/brick0/ozone3    root          ssh://10.70.46.239::slave    10.70.46.218    Active     Changelog Crawl    2016-12-17 22:03:37          
[root@dhcp47-60 ~]#

Comment 10 errata-xmlrpc 2017-03-23 06:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html