1580385 – Node is DOWN alert not cleared properly

Summary: Node is DOWN alert not cleared properly

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	web-admin-tendrl-notifier
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	gowtham
QA Contact:	Filip Balák
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1503137

Reported:	2018-05-21 11:44 UTC by Filip Balák
Modified:	2018-09-04 07:07 UTC
CC List:	4 users
Fixed In Version:	tendrl-commons-1.6.3-8.el7rhgs tendrl-node-agent-1.6.3-8.el7rhgs
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-04 07:06:51 UTC
Embargoed:
Dependent Products:

Attachments
Hosts page with alerts (128.84 KB, image/png) 2018-05-21 11:44 UTC, Filip Balák
Node down alert not cleared when node is up (71.72 KB, image/png) 2018-06-19 10:18 UTC, gowtham

Links
System	ID	Priority	Status	Summary	Last Updated
Github	Tendrl commons issues 979	None	closed	Tendrl Dashboard showing wrong info and alert details.	2020-05-26 05:31:33 UTC
Github	Tendrl commons issues 995	None	closed	Node up alert requires integration_id from tendrl_context	2020-05-26 05:31:33 UTC
Github	Tendrl node-agent issues 820	None	open	Tendrl Dashboard showing wrong info and alert details.	2020-05-26 05:31:33 UTC
Red Hat Bugzilla	1600910	unspecified	CLOSED	Peer is Disconnected alert not cleared properly	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHSA-2018:2616	None	None	None	2018-09-04 07:07:42 UTC

Internal Links: 1600910

Description Filip Balák 2018-05-21 11:44:51 UTC

Created attachment 1439582
Hosts page with alerts

Description of problem:
When one of the Gluster nodes is shut down and then started again after a while, one alert remains:
`Node <node-id> is DOWN`
All other alerts are cleared correctly.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-5.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-3.el7rhgs.noarch
tendrl-node-agent-1.6.3-5.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-2.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install WA.
2. Import cluster with volume.
3. Shut down one node.
4. Wait for Tendrl to raise alerts.
5. Start the node.
6. Check alerts in UI.

Actual results:
One alert remains:
`Node <node-id> is DOWN`

Expected results:
There should be no alerts once the node has started correctly.
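
The expected result can be expressed as a small check. This is only a minimal sketch: the alert dictionaries, the `stale_down_alerts` helper, and the `nodes_up` set are hypothetical and do not reflect Tendrl's actual alert schema or API, neither of which is shown in this report. It assumes alert messages follow the `Node <node-id> is DOWN` form quoted above.

```python
import re

# Matches the alert text quoted in this report: "Node <node-id> is DOWN".
DOWN_PATTERN = re.compile(r"^Node (?P<node_id>\S+) is DOWN$")


def stale_down_alerts(alerts, nodes_up):
    """Return the DOWN alerts whose node is currently up.

    `alerts` is a list of dicts with a "message" key (hypothetical shape);
    `nodes_up` is a set of node IDs known to be running.
    """
    stale = []
    for alert in alerts:
        match = DOWN_PATTERN.match(alert["message"])
        if match and match.group("node_id") in nodes_up:
            stale.append(alert)
    return stale


# Illustrative data only: node-1 was restarted, node-2 is still down.
alerts = [
    {"message": "Node node-1 is DOWN"},
    {"message": "Node node-2 is DOWN"},
]
nodes_up = {"node-1"}
```

With the behaviour described in this bug, `stale_down_alerts(alerts, nodes_up)` would return the alert for `node-1`; with the expected behaviour, it would return an empty list once the node is back up.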

Additional info:

Comment 4 Filip Balák 2018-06-07 12:49:01 UTC

Looks ok. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-6.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-6.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-3.el7rhgs.noarch

Comment 5 gowtham 2018-06-19 10:14:24 UTC

I saw a similar issue in the latest build. Having found the root cause, I believe it is not specific to the latest build and should occur in all older builds as well. On my low-configuration machine I can reproduce it only very rarely, but on high-configuration machines I can reproduce it every time. I have fixed the issue, and the PR is under review: https://github.com/Tendrl/commons/pull/996

So, as per the discussion with Martin, I am moving this issue to the ASSIGNED state.

Comment 6 gowtham 2018-06-19 10:16:01 UTC

I also saw this problem in the latest build:

tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch

Comment 7 gowtham 2018-06-19 10:18:41 UTC

Created attachment 1452872
Node down alert not cleared when node is up

Comment 8 Filip Balák 2018-07-13 10:53:00 UTC

This looks OK, but a similar bug (BZ 1600910) was filed while testing node start/stop scenarios. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-8.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
tendrl-node-agent-1.6.3-8.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-6.el7rhgs.noarch

Comment 10 errata-xmlrpc 2018-09-04 07:06:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
