Bug 1576794

Summary: Gluster native event webhook fails sometimes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rohan Kanade <rkanade>
Component: web-admin-tendrl-gluster-integration
Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA
QA Contact: Filip Balák <fbalak>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: fbalak, mbukatov, nthomas, rhs-bugs, sankarshan
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tendrl-ui-1.6.3-2.el7rhgs tendrl-ansible-1.6.3-4.el7rhgs tendrl-notifier-1.6.3-3.el7rhgs tendrl-commons-1.6.3-5.el7rhgs tendrl-api-1.6.3-3.el7rhgs tendrl-monitoring-integration-1.6.3-3.el7rhgs tendrl-node-agent-1.6.3-5.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 07:05:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1503137

Description Rohan Kanade 2018-05-10 12:08:28 UTC
Description of problem:

Gluster sends frequent events to tendrl's registered webhook (callback) [0]. It is observed that the tendrl webhook sometimes fails to accept the HTTP POST requests sent by glustereventsd [1].

[1] https://pastebin.com/xQrwSDFt


Version-Release number of selected component (if applicable):


How reproducible:
70%

Steps to Reproduce:
1. Generate a large number of gluster native events for an RHGSWA-managed gluster cluster


Actual results:
The tendrl-gluster-integration events webhook fails to handle all gluster native events.

Expected results:
All gluster native events should be processed by tendrl-gluster-integration.



Additional info:
Use cherrypy to handle the tendrl-gluster-integration events webhook.
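
As a rough illustration of that suggestion, here is a minimal sketch of a webhook served by CherryPy's thread-pooled HTTP server. It is not the actual tendrl-gluster-integration code. The names `EVENTS`, `accept_event`, `serve` and `Webhook`, the queue-based hand-off, and the thread count are all assumptions made for the example; only the port (8697) and the `/listen` path come from this report.

```python
import queue

# Hypothetical in-process buffer; a separate worker would drain it.
EVENTS = queue.Queue()


def accept_event(payload, events=EVENTS):
    """Queue one decoded event; return False if it has no "event" key."""
    if not isinstance(payload, dict) or "event" not in payload:
        return False
    events.put(payload)
    return True


def serve(host="0.0.0.0", port=8697, threads=10):
    """Serve POST /listen with CherryPy (assumed to be installed)."""
    import cherrypy  # third-party; imported here so accept_event needs no dependency

    class Webhook:
        @cherrypy.expose
        @cherrypy.tools.json_in()  # decodes the JSON body into cherrypy.request.json
        def listen(self):
            if not accept_event(cherrypy.request.json):
                raise cherrypy.HTTPError(400, "missing 'event' field")
            return "OK"

    cherrypy.config.update({
        "server.socket_host": host,
        "server.socket_port": port,
        "server.thread_pool": threads,  # number of worker threads
    })
    cherrypy.quickstart(Webhook())
```

Calling `serve()` blocks and starts the server. The handler only validates and queues the event, so each request returns quickly and the slower processing can happen elsewhere.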

Comment 2 Nishanth Thomas 2018-05-10 13:25:21 UTC
Package version where we saw this was tendrl-gluster-integration-1.6.3-2.el7rhgs

Comment 5 Filip Balák 2018-06-27 13:02:28 UTC
I tried to reproduce it by calling peer probe and peer detach in a loop on several nodes at the same time, but I was unable to reproduce it with tendrl-gluster-integration-1.6.3-2.el7rhgs.
`for x in {1..1000}; do gluster peer detach <node>; gluster peer probe <node>; done&`

Rohan, can you please provide better reproducer steps to generate a large number of gluster native events?

Comment 6 Rohan Kanade 2018-06-28 12:50:07 UTC
I don't have any more info about this; please close this if not required.

Comment 8 Martin Bukatovic 2018-06-29 16:24:13 UTC
Comment 6 doesn't answer the needinfo request.

Comment 9 Rohan Kanade 2018-07-10 12:25:35 UTC
Apologies, I missed one detail.

To reproduce this issue:
1) Send a very large number of HTTP POST requests to "http://$storage-node:8697/listen"

2) Check the tendrl-monitoring-integration error logs, check the HTTP responses for error codes (500, 404, etc.), or check whether any request has been dropped and not processed
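
The two steps above can be sketched as a small Python script that sends many concurrent POSTs and tallies the outcomes. This is only an illustration using the standard library. The function names, the `"dropped"` label, and the default request and worker counts are assumptions, and `<storage-node>` in the usage note is a placeholder for a real host.

```python
import http.client
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def post_event(url, payload, timeout=5.0):
    """POST one JSON event; return the HTTP status, or "dropped" if no response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # error responses such as 404 or 500
    except (OSError, http.client.HTTPException):
        return "dropped"  # refused, reset, timed out, or malformed reply


def flood(url, payload, total=1000, workers=10, send=post_event):
    """Send `total` POSTs from `workers` threads and count each outcome."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(lambda _: send(url, payload), range(total)))
```

For example, `flood("http://<storage-node>:8697/listen", {"event": "CLIENT_DISCONNECT"})` returns a counter keyed by status code. Any key other than a 2xx status points at a rejected or dropped request.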

Comment 11 Filip Balák 2018-07-26 09:05:59 UTC
I tested this with the old version:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-ansible-1.5.4-7.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-commons-1.5.4-9.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
tendrl-node-agent-1.5.4-16.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-6.el7rhgs.noarch
and with the new version:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

In both cases the server was able to process all requests that I sent to it. I also did load testing with the ApacheBench tool (ab):

$ ab -c 10 -n 50000 -p d.json -T application/json http://<gluster-node>:8697/listen
$ cat d.json
{"event": "CLIENT_DISCONNECT", "message": {"brick_path": "/gluster/brick1/brick1", "client_identifier": "172.28.128.204:49132", "client_uid": "tendrl-node-1-1340-2018/05/02-07:01:16:694187-glustervol-client-0-0-0", "server_identifier": "172.28.128.204:49152"}, "nodeid": "3f7532a7-cd02-4536-9371-c97a00a2fa3e", "ts": 1525244478}


Results of this load testing suggest that, after the switch to cherrypy, the server handles requests noticeably faster: 50000 requests completed in 74.399 seconds instead of 101.149 seconds, with no failed requests in either run. --> VERIFIED

Old version
-----------
Time taken for tests:   101.149 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Time per request:       20.230 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          80.13 [Kbytes/sec] received
                        244.26 kb/s sent
                        324.40 kb/s total

New version
-----------
Time taken for tests:   74.399 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Requests per second:    672.05 [#/sec] (mean)
Time per request:       14.880 [ms] (mean)
Time per request:       1.488 [ms] (mean, across all concurrent requests)
Transfer rate:          95.82 [Kbytes/sec] received
                        332.09 kb/s sent
                        427.91 kb/s total
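
The old-version output above has no "Requests per second" line, but the figure follows from the reported totals. The short calculation below is only arithmetic on the numbers in this comment, with helper and variable names chosen for the example. It reproduces the 672.05 requests per second reported for the new version and gives roughly 494 for the old one.

```python
def requests_per_second(complete, seconds):
    """Mean throughput, as ab derives it: complete requests / total time."""
    return complete / seconds


old = requests_per_second(50000, 101.149)  # old version run
new = requests_per_second(50000, 74.399)   # new version run
gain = (new / old - 1) * 100               # relative throughput gain in percent

print(f"old: {old:.2f} req/s")
print(f"new: {new:.2f} req/s")
print(f"throughput gain: {gain:.1f}%")
```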

Comment 14 errata-xmlrpc 2018-09-04 07:05:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616