Description of problem
======================

It seems that tendrl fails to configure the gluster events API daemon, so tendrl-notifier (or any other tendrl component) won't receive any events. This means that tendrl can neither process gluster native events for its internal purposes nor resend some of them as alerts.

Version-Release
===============

tendrl-notifier-1.5.4-3.el7rhgs.noarch

[root@usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch

[root@usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-4.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHGS WA using tendrl-ansible
2. Configure alerting to send events via both smtp and snmp
3. Import gluster trusted storage pool with a volume
4. Check status of gluster event daemons across trusted storage pool
5. Try reproducer for BZ 1516968 to create a situation in which some gluster native event is sent

When QE playbooks for alerting test setup are used:

* https://github.com/usmqe/usmqe-setup/blob/master/test_setup.smtp.yml
* https://github.com/usmqe/usmqe-setup/blob/master/test_setup.snmp.yml

Actual results
==============

Gluster event daemons don't list any tendrl component as a listener:

```
# gluster-eventsapi status
Webhooks:
http://0.0.0.0:8697/listen
+-------------------------------+-------------+-----------------------+
| NODE                          | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------------------------+-------------+-----------------------+
| mbukatov-usm1-gl1.example.com | UP          | OK                    |
| mbukatov-usm1-gl2.example.com | UP          | OK                    |
| mbukatov-usm1-gl3.example.com | UP          | OK                    |
| mbukatov-usm1-gl4.example.com | UP          | OK                    |
| mbukatov-usm1-gl6.example.com | UP          | OK                    |
| localhost                     | UP          | OK                    |
+-------------------------------+-------------+-----------------------+
```

As a result, when gluster tries to send an event (e.g. POSIX_HEALTH_CHECK_FAILED as described in BZ 1516968), we can see an error about a push failure in the events.log file:

```
# grep POSIX_HEALTH_CHECK_FAILED /var/log/glusterfs/events.log
[2017-11-23 11:09:01,087] WARNING [utils - 198:publish_to_webhook] - Event push failed to URL: http://0.0.0.0:8697/listen, Event: {"event": "POSIX_HEALTH_CHECK_FAILED", "message": {"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", "error": "No such file or directory", "op": "open", "path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, "nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}, Status Code: 500
```

Expected results
================

Some tendrl component is configured as a listener for gluster native events, so that when gluster tries to send an event to it, the delivery succeeds.

Additional info
===============

Based on discussion with Nishanth under BZ 1516968.
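For anyone triaging similar reports, the push failure quoted from events.log can be pulled apart mechanically. The following is only a sketch: the regular expression is derived from the single log line quoted in this description, so the exact format is an assumption and may differ in other glusterfs versions. The names `PUSH_FAILED`, `SAMPLE_LINE` and `parse_push_failure` are hypothetical helpers, not part of gluster or tendrl.

```python
import json
import re

# Pattern inferred from the one log line quoted in this report (assumption).
PUSH_FAILED = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] WARNING \[utils - \d+:publish_to_webhook\] - "
    r"Event push failed to URL: (?P<url>\S+), "
    r"Event: (?P<event>\{.*\}), Status Code: (?P<status>\d+)$"
)

# The log line from the description, split only for readability.
SAMPLE_LINE = (
    '[2017-11-23 11:09:01,087] WARNING [utils - 198:publish_to_webhook] - '
    'Event push failed to URL: http://0.0.0.0:8697/listen, '
    'Event: {"event": "POSIX_HEALTH_CHECK_FAILED", "message": '
    '{"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", '
    '"error": "No such file or directory", "op": "open", '
    '"path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, '
    '"nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}, '
    'Status Code: 500'
)


def parse_push_failure(line):
    """Return the webhook URL, decoded event and HTTP status, or None."""
    match = PUSH_FAILED.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "url": match.group("url"),
        "event": json.loads(match.group("event")),
        "status": int(match.group("status")),
    }


if __name__ == "__main__":
    failure = parse_push_failure(SAMPLE_LINE)
    print(failure["url"], failure["event"]["event"], failure["status"])
```

Separating the webhook URL from the status code makes it easier to tell a missing listener apart from a listener that answered with an error.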
(In reply to sankarshan from comment #3)
> So, tendrl-notifier is not expected to receive alerts. It is the processing
> part of the alerting stack. Is this being filed on the wrong component?

This is possible. Feel free to use your inner knowledge of the tendrl stack and assign this BZ to a more appropriate component, such as node-agent or gluster-integration.

This BZ seems to prevent tendrl from receiving native gluster events, so test cases related to (re)sending such events as tendrl alerts are failing.
@mbukatov, regarding what you mentioned in the bug description: there is no issue with the way tendrl subscribed to gluster. The output (pasted in the description) is as expected. It is not required to subscribe to each event separately; once subscribed, all events will be sent to the webhook.

Issues with specific events not being reported (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1516968) need to be raised and tracked separately.

Hence the issue described in this bug is `NOT A BUG`, and it gets a nack from development.
If you agree with the above, I would like to close the bug
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
Swetha, is it possible for tendrl to receive gluster events when the gluster events daemon is configured like this?

```
# gluster-eventsapi status
Webhooks:
http://0.0.0.0:8697/listen
+-------------------------------+-------------+-----------------------+
| NODE                          | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------------------------+-------------+-----------------------+
| mbukatov-usm1-gl1.example.com | UP          | OK                    |
| mbukatov-usm1-gl2.example.com | UP          | OK                    |
| mbukatov-usm1-gl3.example.com | UP          | OK                    |
| mbukatov-usm1-gl4.example.com | UP          | OK                    |
| mbukatov-usm1-gl6.example.com | UP          | OK                    |
| localhost                     | UP          | OK                    |
+-------------------------------+-------------+-----------------------+
```

I don't think so. But since this opinion is based on my limited understanding of the feature and on the fact that I haven't been able to receive any tendrl alert for any gluster native event so far, I'm asking you to evaluate.

(In reply to Nishanth Thomas from comment #10)
> If you agree with the above, I would like to close the bug

I have my doubts about this, but if Swetha verifies that the setup looks ok and gluster is able to receive events, I'm ok to close it. That said, I would wait with closing until I'm able to verify that I can receive an alert for any gluster native event.
Based on my previous comment, reopening the BZ.
@arvinda, could you please answer Martin's query?
On every Gluster node, the Tendrl gluster integration service runs on port 8697 and is registered with gluster-eventsapi as a webhook (using the `gluster-eventsapi webhook-add` command). It will receive all generated Gluster events.

The command `gluster-eventsapi status` shows the status of the `glustereventsd` process on each Gluster node, and also the registered webhooks.

When glustereventsd detects an event, it pushes that event to the registered webhooks using a REST call. If the response from a webhook is not HTTP status code 200, it logs an error. In the above issue the HTTP error code is 500, which means an internal server error in the listener (the Tendrl gluster integration service).

To understand this better: the call made by gluster eventsapi is similar to the curl command below.

curl -XPOST -H "Content-Type: application/json" http://0.0.0.0:8697/listen -d '{"event": "POSIX_HEALTH_CHECK_FAILED", "message": {"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", "error": "No such file or directory", "op": "open", "path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, "nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}'

The above command returns HTTP status code 500 instead of 200.

These webhook errors do not prevent glustereventsd from sending other messages, since it logs the error and continues. A fix is required in the Tendrl gluster integration service to handle this event.
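To make the webhook contract concrete, here is a minimal sketch of a listener that accepts the same POST and answers 200 for any well-formed event, including ones it has no special handling for. This is an illustration only, not the Tendrl gluster integration code: `ListenHandler`, `post_event` and `demo` are hypothetical names, and the demo binds to an ephemeral port on 127.0.0.1 rather than port 8697.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event payload copied from the curl command above.
EVENT = {
    "event": "POSIX_HEALTH_CHECK_FAILED",
    "message": {
        "brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1",
        "error": "No such file or directory",
        "op": "open",
        "path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check",
    },
    "nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d",
    "ts": 1511453341,
}


class ListenHandler(BaseHTTPRequestHandler):
    """Hypothetical webhook listener; not the Tendrl implementation."""

    def do_POST(self):
        if self.path != "/listen":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            name = json.loads(self.rfile.read(length))["event"]
        except (ValueError, KeyError, TypeError):
            self._reply(400)  # malformed body: a client error, not a 500
            return
        # Record the event name; unknown events are acknowledged as well.
        self.server.received.append(name)
        self._reply(200)

    def _reply(self, code):
        self.send_response(code)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def post_event(url, event):
    """POST one event as JSON, like the curl command, and return the status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status


def demo():
    server = HTTPServer(("127.0.0.1", 0), ListenHandler)
    server.received = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/listen" % server.server_address[1]
        status = post_event(url, EVENT)
    finally:
        server.shutdown()
        server.server_close()
    return status, server.received


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the design choice in `do_POST`: an event the listener does not specifically handle is still acknowledged, so glustereventsd would not log a push failure for it.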
The issue is specific to the handling of POSIX_HEALTH_CHECK_FAILED, and it is fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1516968. If this explanation is satisfactory, please move the bug to the appropriate state.