Bug 1576794

Summary:	Gluster native event webhook fails sometimes
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Rohan Kanade <rkanade>
Component:	web-admin-tendrl-gluster-integration	Assignee:	Shubhendu Tripathi <shtripat>
Status:	CLOSED ERRATA	QA Contact:	Filip Balák <fbalak>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.4	CC:	fbalak, mbukatov, nthomas, rhs-bugs, sankarshan
Target Milestone:	---
Target Release:	RHGS 3.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	tendrl-ui-1.6.3-2.el7rhgs tendrl-ansible-1.6.3-4.el7rhgs tendrl-notifier-1.6.3-3.el7rhgs tendrl-commons-1.6.3-5.el7rhgs tendrl-api-1.6.3-3.el7rhgs tendrl-monitoring-integration-1.6.3-3.el7rhgs tendrl-node-agent-1.6.3-5.el7rhgs	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-09-04 07:05:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1503137

Description Rohan Kanade 2018-05-10 12:08:28 UTC

Description of problem:

Gluster sends frequent events to tendrl's registered webhook (callback) [0]. It is observed that the tendrl webhook sometimes fails to accept the HTTP POST requests sent by glustereventsd [1]

[1] https://pastebin.com/xQrwSDFt


Version-Release number of selected component (if applicable):


How reproducible:
70%

Steps to Reproduce:
1. Generate a large number of gluster native events for an RHGSWA-managed gluster cluster


Actual results:
The tendrl-gluster-integration events webhook does not process every gluster native event.

Expected results:
All gluster native events should be processed by tendrl-gluster-integration.



Additional info:
Use cherrypy to handle tendrl-gluster-integration events webhook

Comment 2 Nishanth Thomas 2018-05-10 13:25:21 UTC

Package version where we saw this was tendrl-gluster-integration-1.6.3-2.el7rhgs

Comment 5 Filip Balák 2018-06-27 13:02:28 UTC

I tried to reproduce it by calling peer probe and peer detach in a loop on several nodes at the same time, but I was unable to reproduce it with tendrl-gluster-integration-1.6.3-2.el7rhgs.
`for x in {1..1000}; do gluster peer detach <node>; gluster peer probe <node>; done&`

Rohan, can you please provide better reproducer steps to generate a large number of gluster native events?

Comment 6 Rohan Kanade 2018-06-28 12:50:07 UTC

I don't have any more info about this. Please close this if not required.

Comment 8 Martin Bukatovic 2018-06-29 16:24:13 UTC

Comment 6 doesn't answer the need info request.

Comment 9 Rohan Kanade 2018-07-10 12:25:35 UTC

Apologies, I missed one detail.

To reproduce this issue:
1) Send a very large number of HTTP POST requests to "http://$storage-node:8697/listen"

2) Check the tendrl-monitoring-integration error logs, check the HTTP responses for error codes (500, 404, etc.), or check whether any request has been dropped and not processed
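
The two steps above could be scripted roughly as follows. This is only a sketch, not the reproducer that was actually used: the function names (post_event, flood), the host name and the sample payload are placeholders, and only the port 8697 and the /listen path come from this report. It sends the POST requests concurrently and tallies the HTTP status codes, counting any request that gets no HTTP response at all as "dropped".

```python
import http.client
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def post_event(url, body, timeout=5.0):
    """POST one JSON body; return the HTTP status code, or "dropped"."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (OSError, http.client.HTTPException):
        # No usable HTTP response: refused, reset, timed out, or malformed.
        return "dropped"


def flood(url, event, total=1000, concurrency=10):
    """Send `total` copies of `event` and count the outcomes per status."""
    body = json.dumps(event).encode("utf-8")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = pool.map(lambda _: post_event(url, body), range(total))
        return Counter(outcomes)


if __name__ == "__main__":
    # Placeholder host and payload; substitute a real storage node and event.
    sample_event = {"event": "EXAMPLE_EVENT", "message": {}, "ts": 0}
    counts = flood("http://storage-node.example.com:8697/listen", sample_event)
    for outcome, count in sorted(counts.items(), key=str):
        print(outcome, count)
```

Any count reported under something other than a 2xx status would correspond to the failures described in step 2.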

Comment 11 Filip Balák 2018-07-26 09:05:59 UTC

I tested this with old version:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-ansible-1.5.4-7.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-commons-1.5.4-9.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
tendrl-node-agent-1.5.4-16.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-6.el7rhgs.noarch
and with new version:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

In both cases the server was able to process all requests that I sent to it. I also ran a load test with the ApacheBench tool (ab):

$ ab -c 10 -n 50000 -p d.json -T application/json http://<gluster-node>:8697/listen
$ cat d.json
{"event": "CLIENT_DISCONNECT", "message": {"brick_path": "/gluster/brick1/brick1", "client_identifier": "172.28.128.204:49132", "client_uid": "tendrl-node-1-1340-2018/05/02-07:01:16:694187-glustervol-client-0-0-0", "server_identifier": "172.28.128.204:49152"}, "nodeid": "3f7532a7-cd02-4536-9371-c97a00a2fa3e", "ts": 1525244478}


The load-test results suggest that, after the switch to cherrypy, the server handles significantly more requests per second: the same 50000 requests completed in 74.399 seconds instead of 101.149 seconds. --> VERIFIED

Old version
-----------
Time taken for tests:   101.149 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Time per request:       20.230 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          80.13 [Kbytes/sec] received
                        244.26 kb/s sent
                        324.40 kb/s total

New version
-----------
Time taken for tests:   74.399 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Requests per second:    672.05 [#/sec] (mean)
Time per request:       14.880 [ms] (mean)
Time per request:       1.488 [ms] (mean, across all concurrent requests)
Transfer rate:          95.82 [Kbytes/sec] received
                        332.09 kb/s sent
                        427.91 kb/s total
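
The old-version summary above has no "Requests per second" line, but the value can be derived from the reported totals. A small sketch of that arithmetic, using only the numbers listed in this comment (the function and variable names are mine, and the formulas are the usual relations between ab's summary fields as I understand them):

```python
COMPLETED = 50000   # "Complete requests" in both runs
CONCURRENCY = 10    # the -c value passed to ab

# "Time taken for tests" in seconds, as reported above.
RUNS = {"old": 101.149, "new": 74.399}


def requests_per_second(completed, seconds):
    """Mean throughput over the whole run."""
    return completed / seconds


def mean_time_per_request_ms(completed, seconds, concurrency):
    """Mean latency of one request as seen by a single concurrent client."""
    return concurrency * seconds * 1000.0 / completed


if __name__ == "__main__":
    for label, seconds in RUNS.items():
        print("%s: %.2f req/s, %.3f ms per request" % (
            label,
            requests_per_second(COMPLETED, seconds),
            mean_time_per_request_ms(COMPLETED, seconds, CONCURRENCY),
        ))
    print("speedup: %.2fx" % (RUNS["old"] / RUNS["new"]))
```

With the reported totals this works out to roughly 494 requests per second for the old version versus 672 for the new one, about a 1.36x improvement at concurrency 10.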

Comment 14 errata-xmlrpc 2018-09-04 07:05:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616