Bug 1576794 - Gluster native event webhook fails sometimes
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-gluster-integration
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Shubhendu Tripathi
QA Contact: Filip Balák
Docs Contact:
Depends On:
Blocks: 1503137
Reported: 2018-05-10 08:08 EDT by Rohan Kanade
Modified: 2018-09-04 03:06 EDT
CC: 5 users

See Also:
Fixed In Version: tendrl-ui-1.6.3-2.el7rhgs tendrl-ansible-1.6.3-4.el7rhgs tendrl-notifier-1.6.3-3.el7rhgs tendrl-commons-1.6.3-5.el7rhgs tendrl-api-1.6.3-3.el7rhgs tendrl-monitoring-integration-1.6.3-3.el7rhgs tendrl-node-agent-1.6.3-5.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 03:05:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Tracker                                          Last Updated
Github /Tendrl/gluster-integration/issues/629    2018-05-10 08:08 EDT
Red Hat Product Errata RHSA-2018:2616            2018-09-04 03:06 EDT

Description Rohan Kanade 2018-05-10 08:08:28 EDT
Description of problem:

Gluster sends frequent events to tendrl's registered webhook (callback) [0]; it has been observed that the tendrl webhook sometimes fails to accept the HTTP POST requests sent by glustereventsd [1].

[1] https://pastebin.com/xQrwSDFt


Version-Release number of selected component (if applicable):


How reproducible:
70%

Steps to Reproduce:
1. Generate large number of gluster native events for RHGSWA managed gluster cluster


Actual results:
tendrl-gluster-integration events webhook fails to handle all gluster native events

Expected results:
All gluster native events should be processed by tendrl-gluster-integration.



Additional info:
Use CherryPy to handle the tendrl-gluster-integration events webhook (a sketch of this approach follows below).
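
For illustration, a minimal sketch of what a CherryPy-based listener for this endpoint could look like. This is an assumption for illustration only, not the actual tendrl-gluster-integration code; the /listen path and port 8697 are taken from the reproducer in comment 9, and the class and handler names are hypothetical:

import cherrypy

class GlusterEventListener(object):
    @cherrypy.expose
    @cherrypy.tools.json_in()
    def listen(self):
        # CherryPy parses the POST body as JSON and exposes it here
        event = cherrypy.request.json
        # In a real service, hand the event to a queue/worker rather than
        # doing slow processing inside the request handler.
        cherrypy.log("received gluster event: %s" % event.get("event"))
        return "OK"

if __name__ == "__main__":
    cherrypy.config.update({
        "server.socket_host": "0.0.0.0",
        "server.socket_port": 8697,  # port glustereventsd POSTs to
        "server.thread_pool": 30,    # larger worker pool to absorb event bursts
    })
    cherrypy.quickstart(GlusterEventListener())

CherryPy's threaded worker pool is what makes the difference here: requests queue up on the socket and are dispatched to pool threads instead of being rejected when a burst of events arrives.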
Comment 2 Nishanth Thomas 2018-05-10 09:25:21 EDT
Package version where we saw this was tendrl-gluster-integration-1.6.3-2.el7rhgs
Comment 5 Filip Balák 2018-06-27 09:02:28 EDT
I tried to reproduce it by calling peer probe and peer detach in a loop on several nodes at the same time, but I was unable to reproduce it with tendrl-gluster-integration-1.6.3-2.el7rhgs.
`for x in {1..1000}; do gluster peer detach <node>; gluster peer probe <node>; done&`

Rohan, can you please provide better reproducer steps to generate a large number of gluster native events?
Comment 6 Rohan Kanade 2018-06-28 08:50:07 EDT
I don't have any more info about this; please close it if not required.
Comment 8 Martin Bukatovic 2018-06-29 12:24:13 EDT
Comment 6 doesn't answer the needinfo request.
Comment 9 Rohan Kanade 2018-07-10 08:25:35 EDT
Apologies, I missed one detail.

To reproduce this issue:
1) Send a very large number of HTTP POST requests to "http://$storage-node:8697/listen"

2) Check the tendrl-monitoring-integration error logs, check the HTTP responses for error codes (500, 404, etc.), or check whether any request was dropped and not processed. (A rough load-generator sketch follows.)
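
For reference, a load-generator sketch along these lines. This is hypothetical, assuming Python 3 with the third-party requests library; the URL placeholder and the 50000-request count mirror the reproducer above and the ab test in comment 11, and the payload is a trimmed version of the sample event shown there:

import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://storage-node:8697/listen"  # substitute your storage node
# Trimmed-down version of the sample CLIENT_DISCONNECT event payload
PAYLOAD = {"event": "CLIENT_DISCONNECT", "message": {}, "nodeid": "x", "ts": 0}

def post_event(_):
    try:
        return requests.post(URL, json=PAYLOAD, timeout=5).status_code
    except requests.RequestException:
        return None  # connection refused/reset counts as a dropped request

with ThreadPoolExecutor(max_workers=10) as pool:
    codes = list(pool.map(post_event, range(50000)))

# A request "failed" if it was dropped or returned 404/5xx
failed = [c for c in codes if c is None or c == 404 or c >= 500]
print("sent=%d failed=%d" % (len(codes), len(failed)))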
Comment 11 Filip Balák 2018-07-26 05:05:59 EDT
I tested this with the old version:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-ansible-1.5.4-7.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-commons-1.5.4-9.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
tendrl-node-agent-1.5.4-16.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-6.el7rhgs.noarch
and with the new version:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

In both cases the server was able to process all requests that I sent to it. I have also done load testing with the ApacheBench tool (ab):

$ ab -c 10 -n 50000 -p d.json -T application/json http://<gluster-node>:8697/listen
$ cat d.json
{"event": "CLIENT_DISCONNECT", "message": {"brick_path": "/gluster/brick1/brick1", "client_identifier": "172.28.128.204:49132", "client_uid": "tendrl-node-1-1340-2018/05/02-07:01:16:694187-glustervol-client-0-0-0", "server_identifier": "172.28.128.204:49152"}, "nodeid": "3f7532a7-cd02-4536-9371-c97a00a2fa3e", "ts": 1525244478}


Results of this load testing suggest that after the switch to CherryPy the number of requests that the server can handle is significantly greater. --> VERIFIED

Old version
-----------
Time taken for tests:   101.149 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Time per request:       20.230 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          80.13 [Kbytes/sec] received
                        244.26 kb/s sent
                        324.40 kb/s total

New version
-----------
Time taken for tests:   74.399 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Requests per second:    672.05 [#/sec] (mean)
Time per request:       14.880 [ms] (mean)
Time per request:       1.488 [ms] (mean, across all concurrent requests)
Transfer rate:          95.82 [Kbytes/sec] received
                        332.09 kb/s sent
                        427.91 kb/s total
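
(The old run's output above does not include a "Requests per second" line, but it can be derived from the figures shown: 50000 requests / 101.149 s ≈ 494 requests/sec, versus 672.05 requests/sec for the new version, i.e. roughly 36% higher throughput.)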
Comment 14 errata-xmlrpc 2018-09-04 03:05:38 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
