Bug 1517468

Summary: RHGS WA component should be listening to all native events originating from RHGS
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Bukatovic <mbukatov>
Component: web-admin-tendrl-gluster-integration
Assignee: Nishanth Thomas <nthomas>
Status: CLOSED NOTABUG
QA Contact: sds-qe-bugs
Severity: medium
Priority: unspecified
Version: rhgs-3.3
CC: abhaumik, avishwan, mbukatov, rhinduja, rhs-bugs, sankarshan, shtripat
Target Milestone: ---
Keywords: Reopened, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2017-11-30 08:45:54 UTC
Type: Bug
Bug Blocks: 1516968

Description Martin Bukatovic 2017-11-25 17:49:47 UTC
Description of problem
======================

It seems that tendrl fails to configure the gluster events API daemon, so
tendrl-notifier (or any other tendrl component) won't receive any events.

This means that tendrl can neither process gluster native events for its
internal purposes nor resend some of them as alerts.

Version-Release
===============

tendrl-notifier-1.5.4-3.el7rhgs.noarch

[root@usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch

[root@usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-4.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHGS WA using tendrl-ansible
2. Configure alerting to send events via both smtp and snmp
3. Import gluster trusted storage pool with a volume
4. Check status of gluster event daemons across trusted storage
   pool
5. Try the reproducer for BZ 1516968 to create a situation in which some
   gluster native event is sent

When qe playbooks for alerting test setup are used:

* https://github.com/usmqe/usmqe-setup/blob/master/test_setup.smtp.yml
* https://github.com/usmqe/usmqe-setup/blob/master/test_setup.snmp.yml

Actual results
==============

The gluster event daemons don't list any tendrl component as a listener:

```
# gluster-eventsapi status
Webhooks: 
http://0.0.0.0:8697/listen

+-------------------------------+-------------+-----------------------+
|     NODE                      | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------------------------+-------------+-----------------------+
| mbukatov-usm1-gl1.example.com |          UP |                    OK |
| mbukatov-usm1-gl2.example.com |          UP |                    OK |
| mbukatov-usm1-gl3.example.com |          UP |                    OK |
| mbukatov-usm1-gl4.example.com |          UP |                    OK |
| mbukatov-usm1-gl6.example.com |          UP |                    OK |
| localhost                     |          UP |                    OK |
+-------------------------------+-------------+-----------------------+
```

So when gluster tries to send an event (e.g. POSIX_HEALTH_CHECK_FAILED,
as described in BZ 1516968), we can see an error about a push failure in the
events.log file:

```
# grep POSIX_HEALTH_CHECK_FAILED /var/log/glusterfs/events.log                                                              
[2017-11-23 11:09:01,087] WARNING [utils - 198:publish_to_webhook] - Event push failed to URL: http://0.0.0.0:8697/listen, Event: {"event": "POSIX_HEALTH_CHECK_FAILED", "message": {"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", "error": "No such file or directory", "op": "open", "path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, "nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}, Status Code: 500
```
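
For triage, a failure line like the one above can be taken apart to recover the
target URL, the event payload and the status code. The following is a minimal
Python sketch; the regular expression is inferred from the single sample line in
this report rather than from a documented log format, and `parse_push_failure`
is a hypothetical helper name.

```python
import json
import re

# Layout inferred from the one sample line in this report (an assumption,
# not a documented log format).
PUSH_FAILED = re.compile(
    r"Event push failed to URL: (?P<url>\S+), "
    r"Event: (?P<event>\{.*\}), "
    r"Status Code: (?P<status>\d+)"
)


def parse_push_failure(line):
    """Return (url, event_dict, status_code), or None if the line does not match."""
    match = PUSH_FAILED.search(line)
    if match is None:
        return None
    event = json.loads(match.group("event"))
    return match.group("url"), event, int(match.group("status"))


# The warning line from the report, split across string literals for readability.
sample = (
    '[2017-11-23 11:09:01,087] WARNING [utils - 198:publish_to_webhook] - '
    'Event push failed to URL: http://0.0.0.0:8697/listen, '
    'Event: {"event": "POSIX_HEALTH_CHECK_FAILED", "message": '
    '{"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", '
    '"error": "No such file or directory", "op": "open", '
    '"path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, '
    '"nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}, '
    'Status Code: 500'
)

url, event, status = parse_push_failure(sample)
print(url, event["event"], status)
```

A status code of 500 here shows that the webhook was reached and answered with
a server error, which is a different failure from a missing subscription.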

Expected results
================

Some tendrl component is configured as a listener for gluster native
events.

So when gluster tries to send an event to it, the delivery succeeds.


Additional info
===============

Based on discussion with Nishanth under BZ 1516968.

Comment 4 Martin Bukatovic 2017-11-27 07:15:38 UTC
(In reply to sankarshan from comment #3)
> So, tendrl-notifier is not expected to receive alerts. It is the processing
> part of the alerting stack. Is this being filed on the wrong component?

This is possible. Feel free to use your inside knowledge of the tendrl stack
and assign this BZ to a more appropriate component, such as node-agent
or gluster-integration.

This BZ seems to prevent tendrl from receiving native gluster events, so
test cases related to (re)sending such events as tendrl alerts
are failing.

Comment 9 Nishanth Thomas 2017-11-28 07:43:52 UTC
@mbukatov, as you mentioned in the bug description, there is no issue with the way tendrl subscribes to gluster. The output (pasted in the description) is as expected. It is not required to subscribe to each event separately; once subscribed, all events will be sent to the webhook.

Issues with specific events not being reported (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1516968) need to be raised and tracked separately.

Hence the issue described in this bug is `NOT A BUG`, and it gets a nack from development.

Comment 10 Nishanth Thomas 2017-11-28 07:45:02 UTC
If you agree with the above, I would like to close the bug

Comment 11 RHEL Program Management 2017-11-28 07:52:43 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 12 Martin Bukatovic 2017-11-28 07:58:09 UTC
Swetha, is it possible for tendrl to receive gluster events when the gluster
event daemons are configured like this?

```
# gluster-eventsapi status
Webhooks: 
http://0.0.0.0:8697/listen

+-------------------------------+-------------+-----------------------+
|     NODE                      | NODE STATUS | GLUSTEREVENTSD STATUS |
+-------------------------------+-------------+-----------------------+
| mbukatov-usm1-gl1.example.com |          UP |                    OK |
| mbukatov-usm1-gl2.example.com |          UP |                    OK |
| mbukatov-usm1-gl3.example.com |          UP |                    OK |
| mbukatov-usm1-gl4.example.com |          UP |                    OK |
| mbukatov-usm1-gl6.example.com |          UP |                    OK |
| localhost                     |          UP |                    OK |
+-------------------------------+-------------+-----------------------+
```

I don't think so. But since this opinion is based on my limited understanding
of the feature and the fact that I haven't been able to receive any tendrl alert
for any gluster native event so far, I'm asking you to evaluate.


(In reply to Nishanth Thomas from comment #10)
> If you agree with the above, I would like to close the bug

I have my doubts about this, but if Swetha verifies that the setup looks ok
and gluster is able to receive events, I'm ok to close it. That said, I would
hold off on closing until I can verify that I receive an alert
for some gluster native event.

Comment 13 Martin Bukatovic 2017-11-28 07:58:45 UTC
Based on my previous comment, reopening the BZ.

Comment 14 Nishanth Thomas 2017-11-28 09:42:10 UTC
@arvinda, could you please answer Martin's query?

Comment 15 Aravinda VK 2017-11-28 09:59:11 UTC
On every Gluster node, the Tendrl gluster integration service runs on port 8697 and is registered with gluster-eventsapi as a webhook (using the `gluster-eventsapi webhook-add` command). It will receive all generated Gluster events.

The command `gluster-eventsapi status` shows the status of the `glustereventsd` process on each Gluster node and also the registered webhooks.

When glustereventsd detects an event, it pushes that event to the registered webhooks using a REST call. If the response from a webhook is not HTTP status code 200, it logs an error. In the above issue the HTTP status code is 500, which means an internal server error in the listener (the Tendrl gluster integration service).

To understand this better: gluster eventsapi makes a call similar to the curl command below

    curl -XPOST -H "Content-Type: application/json" http://0.0.0.0:8697/listen -d '{"event": "POSIX_HEALTH_CHECK_FAILED", "message": {"brick": "usm1-gl1.example.com:/mnt/brick_gama_disperse_1/1", "error": "No such file or directory", "op": "open", "path": "/mnt/brick_gama_disperse_1/1/.glusterfs/health_check"}, "nodeid": "45d388d5-3979-4ee2-bdf6-e2a0fbf4ac7d", "ts": 1511453341}'

The above command returns HTTP status code 500 instead of 200.

These webhook errors do not stop glustereventsd from sending other messages, since it logs the error and continues.
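
The push behaviour described above (POST the event as JSON, treat anything other than 200 as a failure, log it, and move on to the next webhook) can be sketched roughly as follows. This illustrates the described behaviour and is not the glustereventsd source; `post_json` and `publish` are hypothetical names, and the injectable `post` argument exists only so the loop can be exercised without a network.

```python
import json
import logging
import urllib.error
import urllib.request


def post_json(url, payload, timeout=5):
    """POST payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx answers (such as the 500 in this report).
        return err.code


def publish(event, webhooks, post=post_json):
    """Push one event to every webhook; log failures and keep going."""
    failed = []
    for url in webhooks:
        try:
            status = post(url, event)
        except OSError:
            # Connection refused, timeout, DNS failure and similar.
            status = None
        if status != 200:
            logging.warning(
                "Event push failed to URL: %s, Event: %s, Status Code: %s",
                url, json.dumps(event), status,
            )
            failed.append(url)
    return failed
```

Calling `publish(event, ["http://0.0.0.0:8697/listen"])` against a listener that answers 500 would log one warning and return that URL in the list of failures, without raising.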

A fix is required in the Tendrl Gluster integration service to handle this event.
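
One way a listener can avoid answering 500 for an event it has no dedicated logic for is to dispatch on the event name and acknowledge everything else. The sketch below uses Python's standard `http.server` purely for illustration; it is not how the Tendrl gluster integration service is implemented, and `HANDLERS`, `dispatch` and `Listener` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler


def on_posix_health_check_failed(message):
    """Hypothetical callback: summarize the failed brick health check."""
    return "brick %s failed health check: %s" % (message["brick"], message["error"])


# Hypothetical mapping from event name to callback.
HANDLERS = {"POSIX_HEALTH_CHECK_FAILED": on_posix_health_check_failed}


def dispatch(payload, handlers=HANDLERS):
    """Return (http_status, detail) for one decoded event payload."""
    handler = handlers.get(payload.get("event"))
    if handler is None:
        # Unknown event types are acknowledged instead of failing the push.
        return 200, "ignored"
    return 200, handler(payload.get("message", {}))


class Listener(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            status, detail = dispatch(payload)
        except (ValueError, KeyError, AttributeError) as err:
            # Malformed input is the sender's problem: answer 400, not 500.
            status, detail = 400, "bad event: %s" % err
        body = detail.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


# To serve (blocks forever), using the port from this report:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8697), Listener).serve_forever()
```

Keeping the decision logic in `dispatch` separates it from the HTTP plumbing, so it can be checked with plain dictionaries such as the event payload quoted in the curl command above.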

Comment 16 Nishanth Thomas 2017-11-28 10:16:49 UTC
The issue is specific to handling POSIX_HEALTH_CHECK_FAILED and it is fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1516968.

If this explanation is satisfactory, please move the bug to the appropriate state.