1616215 – All alerts Service: glustershd is disconnected in cluster are cleared when service starts on one node

Bug 1616215 - All alerts Service: glustershd is disconnected in cluster are cleared when service starts on one node

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	web-admin-tendrl-notifier
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	gowtham
QA Contact:	Filip Balák
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1503137

Reported:	2018-08-15 09:50 UTC by Filip Balák
Modified:	2018-09-04 07:09 UTC
CC List:	5 users
Fixed In Version:	tendrl-gluster-integration-1.6.3-10.el7rhgs
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-04 07:09:07 UTC
Embargoed:
Dependent Products:

Attachments
Events page with cleared alerts and terminal indicating that glustershd runs only on first node (275.56 KB, image/png) 2018-08-15 09:50 UTC, Filip Balák

Links
System	ID	Priority	Status	Summary	Last Updated
Github	Tendrl gluster-integration issues 694	None	None	None	2018-08-16 14:50:04 UTC
Red Hat Bugzilla	1611601	unspecified	CLOSED	Alert Service: glustershd is disconnected in cluster is not cleared	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1616208	unspecified	CLOSED	glustershd alerts should mention affected node	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHSA-2018:2616	None	None	None	2018-09-04 07:09:16 UTC

Internal Links: 1611601 1616208

Description Filip Balák 2018-08-15 09:50:55 UTC

Created attachment 1476107
Events page with cleared alerts and terminal indicating that glustershd runs only on first node

Description of problem:
If the glustershd service is stopped on multiple machines and then started on one of them, the alert `Service: glustershd is disconnected in cluster <cluster>` is cleared and the UI reports no alerts for the other nodes.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-10.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Import a cluster with a distributed replicated volume.
2. Connect to more than one node of the volume and get the PID of the glustershd process:
$ cat /var/run/gluster/glustershd/glustershd.pid
<glustershd-pid>
3. Run kill <glustershd-pid> on all connected machines.
4. Wait for the alert in the UI.
5. Restart the glusterd service on one of the nodes where glustershd was killed. This should start glustershd. Don't restart the service on the other nodes.
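
To confirm on each node whether glustershd is actually running before and after step 5, a small check like the following can help. This is a minimal sketch, not part of Tendrl: the helper names `read_pid` and `is_running` are hypothetical, and only the pid file path comes from step 2. It assumes a Linux host and treats a missing or unreadable pid file as "not running".

```python
import os

# Path taken from step 2 of the reproduction steps.
PID_FILE = "/var/run/gluster/glustershd/glustershd.pid"


def read_pid(path=PID_FILE):
    """Return the PID stored in the pid file, or None if it is missing or invalid."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks without
    delivering any signal to the process.
    """
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    pid = read_pid()
    print("glustershd pid:", pid, "running:", is_running(pid))
```

After step 5, only the node where glusterd was restarted is expected to report the process as running. A stale pid file can point at a PID that was reused by an unrelated process, so this check is only a quick indication.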

Actual results:
`Service: glustershd is disconnected in cluster <cluster>` is cleared, but glustershd is still not running on some nodes.

Expected results:
Alerts should remain, indicating that the glustershd service is still disconnected on the other nodes and that the problems with glustershd are not resolved.

Additional info:

Comment 2 gowtham 2018-08-16 14:51:31 UTC

PR is under review: https://github.com/Tendrl/gluster-integration/pull/695

Comment 3 gowtham 2018-08-17 10:14:21 UTC

When raising SVC_CONNECTED and SVC_DISCONNECTED alerts, we were not assigning the peer host_name to identify each node's alert uniquely. As a result, the alerts from all nodes overwrote the same alert again and again. That is why we saw only one SVC-related alert in the UI, and why it was cleared as soon as glustershd was back to normal on any one node.

I have added the peer host_name when raising an alert, so each node now raises its own SVC-related alert, and clearing affects only that node's alert.
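
The effect of the missing host_name can be illustrated with a toy model. This is only a sketch of the idea described above, not the actual gluster-integration code: `AlertStore`, `alert_key`, and `simulate` are hypothetical names, and the real alert identity in Tendrl may contain other fields.

```python
def alert_key(cluster_id, service, host_name=None):
    """Build the identity under which an alert is stored (hypothetical)."""
    if host_name is None:
        # Behaviour before the fix: every node maps to the same key.
        return (cluster_id, service)
    # Behaviour after the fix: the peer host name makes the key per-node.
    return (cluster_id, service, host_name)


class AlertStore:
    """Toy in-memory store of active alerts, keyed by alert identity."""

    def __init__(self, per_node):
        self.per_node = per_node
        self.active = {}

    def _key(self, cluster_id, service, host_name):
        return alert_key(cluster_id, service, host_name if self.per_node else None)

    def raise_disconnected(self, cluster_id, service, host_name):
        key = self._key(cluster_id, service, host_name)
        # With a shared key, this overwrites the alert of any other node.
        self.active[key] = "SVC_DISCONNECTED on %s" % host_name

    def raise_connected(self, cluster_id, service, host_name):
        key = self._key(cluster_id, service, host_name)
        # Clearing removes whatever alert is stored under this key.
        self.active.pop(key, None)


def simulate(per_node):
    """Stop glustershd on three nodes, then bring it back on node1 only."""
    store = AlertStore(per_node)
    for host in ("node1", "node2", "node3"):
        store.raise_disconnected("cluster-a", "glustershd", host)
    store.raise_connected("cluster-a", "glustershd", "node1")
    return sorted(store.active.values())


if __name__ == "__main__":
    print("without host_name:", simulate(per_node=False))
    print("with host_name:   ", simulate(per_node=True))
```

Without the host name in the key, the recovery of node1 clears the single shared alert and nothing remains for node2 and node3. With it, the alerts for node2 and node3 stay active until those nodes recover.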

Comment 4 Martin Bukatovic 2018-08-20 11:13:07 UTC

QE team will test this bug as noted in the description.

Comment 5 Nishanth Thomas 2018-08-20 11:17:46 UTC

https://github.com/Tendrl/gluster-integration/pull/695

Comment 6 Martin Bukatovic 2018-08-21 08:39:47 UTC

All acks provided, attaching to the tracker.

Comment 8 Filip Balák 2018-08-21 12:55:56 UTC

Service alerts seem to be cleared correctly. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-10.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-11.el7rhgs.noarch

Comment 10 errata-xmlrpc 2018-09-04 07:09:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
