Description of problem:
Gluster sends frequent events to Tendrl's registered webhook (callback) [0]. It has been observed that the Tendrl webhook fails to accept the HTTP POST requests sent by glustereventsd [1].

[1] https://pastebin.com/xQrwSDFt

Version-Release number of selected component (if applicable):

How reproducible:
70%

Steps to Reproduce:
1. Generate a large number of Gluster native events for a RHGSWA-managed Gluster cluster.

Actual results:
The tendrl-gluster-integration events webhook fails to handle all Gluster native events.

Expected results:
All Gluster native events should be processed by tendrl-gluster-integration.

Additional info:
Use CherryPy to handle the tendrl-gluster-integration events webhook.
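For reference, a minimal sketch of what a CherryPy-based listener on port 8697 could look like. This is an illustration only, not the actual tendrl-gluster-integration code; the Listener class name and the inline handling are hypothetical.

import cherrypy

class Listener(object):
    @cherrypy.expose
    @cherrypy.tools.json_in()
    def listen(self):
        # CherryPy parses the POSTed JSON body into cherrypy.request.json.
        event = cherrypy.request.json
        # Hand the event off for asynchronous processing here instead of
        # doing the work inline, so the handler returns quickly.
        return "OK"

if __name__ == "__main__":
    cherrypy.config.update({
        "server.socket_host": "0.0.0.0",
        "server.socket_port": 8697,
    })
    cherrypy.quickstart(Listener())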
The package version where we saw this was tendrl-gluster-integration-1.6.3-2.el7rhgs.
I tried to reproduce this by calling peer probe and peer detach in a loop on several nodes at the same time, but I was unable to reproduce it with tendrl-gluster-integration-1.6.3-2.el7rhgs:

`for x in {1..1000}; do gluster peer detach <node>; gluster peer probe <node>; done &`

Rohan, can you please provide better reproducer steps to generate a large number of gluster native events?
I don't have any more info about this; please close this if not required.
Comment 6 doesn't answer the needinfo request.
Apologies, I missed one detail. To reproduce this issue:

1) Send a very large number of HTTP POST requests to "http://$storage-node:8697/listen".
2) Check the tendrl-monitoring-integration error logs, check the HTTP responses for error codes such as 500 or 404, or check whether any request has been dropped and not processed.
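A minimal sketch of step 1, assuming the Python requests library; the host placeholder, payload fields, and request count are illustrative only.

import requests

# Placeholder endpoint; substitute the storage node's hostname or IP.
URL = "http://$storage-node:8697/listen"

# Illustrative payload; any gluster native event body would do.
payload = {"event": "CLIENT_DISCONNECT", "message": {}, "nodeid": "0", "ts": 0}

failed = 0
for _ in range(50000):
    resp = requests.post(URL, json=payload, timeout=5)
    if resp.status_code != 200:
        failed += 1

print("failed requests:", failed)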
I tested this with the old version:

tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-ansible-1.5.4-7.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-commons-1.5.4-9.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
tendrl-node-agent-1.5.4-16.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-6.el7rhgs.noarch

and with the new version:

tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

In both cases the server was able to process all requests that I sent to it, but I also did load testing with the ApacheBench tool (ab):

$ ab -c 10 -n 50000 -p d.json -T application/json http://<gluster-node>:8697/listen

$ cat d.json
{"event": "CLIENT_DISCONNECT", "message": {"brick_path": "/gluster/brick1/brick1", "client_identifier": "172.28.128.204:49132", "client_uid": "tendrl-node-1-1340-2018/05/02-07:01:16:694187-glustervol-client-0-0-0", "server_identifier": "172.28.128.204:49152"}, "nodeid": "3f7532a7-cd02-4536-9371-c97a00a2fa3e", "ts": 1525244478}

The results of this load testing suggest that after the switch to CherryPy the number of requests the server can handle is significantly greater. --> VERIFIED

Old version
-----------
Time taken for tests:   101.149 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Time per request:       20.230 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          80.13 [Kbytes/sec] received
                        244.26 kb/s sent
                        324.40 kb/s total

New version
-----------
Time taken for tests:   74.399 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Requests per second:    672.05 [#/sec] (mean)
Time per request:       14.880 [ms] (mean)
Time per request:       1.488 [ms] (mean, across all concurrent requests)
Transfer rate:          95.82 [Kbytes/sec] received
                        332.09 kb/s sent
                        427.91 kb/s total
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616