Created attachment 1358621 [details]
stale alert

Description of problem:
The 'Service: glustershd is disconnected in cluster' warning alert is not cleared by its clearing event. It stays in the Alerts drawer forever; no matching clearing event removes it.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch

How reproducible:
30%

Steps to Reproduce:
1. Restart the glusterd service while 'another transaction is in progress' is reported for a volume. In my case this state was caused by a stale lock.

Actual results:
In the first scenario RHGSWA generates two events with the same timestamp: the 'Service: glustershd is disconnected in cluster' warning and its clearing counterpart. For the first one an alert is created (mail and SNMP trap are sent if configured). The second event is ignored and no clear alert is generated.

Expected results:
The clearing event should be processed even when it arrives at almost the same time as the warning.

Additional info:
Any clearing event that arrives at almost the same time as the original alert does not clear it. I was also able to keep 'Status of peer: <hostname> in cluster <cluster_ID> changed from Connected to Disconnected' in the Alerts drawer, because the 'Disconnected to Connected' event arrives at almost the same time.

Used scenario (it has a higher reproducibility):
1. Switch off one gluster node
2. Load some data onto the gluster volume
3. Start the node
4. Restart the glusterd service on some other node (several times if needed)

However, for this particular alert another stop-start of the glusterd service clears the warning.
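This is not Tendrl's actual code, only a minimal sketch (hypothetical `Event` and `AlertStore` names, invented fields) of how a clearing event can end up ignored when it carries the same timestamp as the warning it should clear: if processing requires each event to be strictly newer than the last one seen for the resource, the clearing event is silently dropped and the warning stays active.

```python
# Hypothetical illustration only -- not Tendrl code. Shows how a clearing
# event that shares its warning's timestamp can be dropped if events are
# de-duplicated by a "must be strictly newer" check.

from dataclasses import dataclass

@dataclass
class Event:
    resource: str      # e.g. "glustershd"
    severity: str      # "WARNING" or "INFO" (clearing)
    timestamp: float   # seconds since epoch

class AlertStore:
    def __init__(self):
        self.active = {}     # resource -> open warning Event
        self.last_seen = {}  # resource -> timestamp of last processed event

    def process(self, event: Event):
        # Buggy guard: events that are not strictly newer are ignored,
        # so a clearing event with the warning's timestamp is dropped.
        if event.timestamp <= self.last_seen.get(event.resource, -1.0):
            return
        self.last_seen[event.resource] = event.timestamp
        if event.severity == "WARNING":
            self.active[event.resource] = event
        else:
            self.active.pop(event.resource, None)

store = AlertStore()
store.process(Event("glustershd", "WARNING", 1510000000.0))
store.process(Event("glustershd", "INFO", 1510000000.0))  # same timestamp
print(store.active)  # warning still active -> stale alert in the drawer
```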
The probability of this occurring under normal conditions is very low, and I couldn't reproduce it in my setup. A workaround is also mentioned in case it does occur. I don't think it's a blocker for the current release. Moving this out.
The service clearing alert is not matched with the warning alert, so the alert is never cleared. This is fixed now: https://github.com/Tendrl/gluster-integration/pull/543, https://github.com/Tendrl/commons/pull/801. A rough sketch of the matching idea follows below.
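The linked PRs are the authoritative fix; the following is only a rough sketch, with invented field and function names, of the general idea: match a clearing alert to the open warning by a stable key (alert type, resource, cluster) rather than by message text or timestamp comparison.

```python
# Rough sketch of the matching idea -- invented names, not the code from
# the linked PRs. A clearing alert is matched to the open warning by a
# stable key, never by message text or by comparing timestamps.

def alert_key(alert: dict) -> tuple:
    # The key fields used here are assumptions for illustration.
    return (alert["alert_type"], alert["resource"], alert["cluster_id"])

def apply_clearing_alert(active_alerts: dict, clearing: dict) -> bool:
    """Remove the matching open warning; return True if one was cleared."""
    key = alert_key(clearing)
    if key in active_alerts:
        del active_alerts[key]
        return True
    return False

# Usage: the warning stays open until a clearing alert with the same key
# arrives, regardless of how close in time the two events are.
active = {("service_status", "glustershd", "c1"): {"severity": "WARNING"}}
cleared = apply_clearing_alert(
    active,
    {"alert_type": "service_status", "resource": "glustershd",
     "cluster_id": "c1", "severity": "INFO"},
)
print(cleared, active)  # True {}
```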
I was not able to reproduce the issue with the original build, nor was I able to reproduce it with the current version. It seems very difficult to get gluster into the `Another transaction is in progress` state with the new gluster version using the given reproducer.

I came up with a different scenario that leads to a similar error: repeatedly calling `gluster volume start <volume> force` from several nodes at once (see the sketch after this comment). However, when I then restarted glusterd I didn't see any alert related to glustershd. Turning glustershd off does lead to the described behaviour: BZ 1611601. I propose to close this BZ 1517233 and track progress in the new BZ 1611601.

Tested with:
glusterfs-3.12.2-15.el7rhgs.x86_64
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch
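For reference, a rough reproduction sketch (not an exact transcript of what was run; hostnames and the volume name are placeholders): fire `gluster volume start <volume> force` on several nodes at once over ssh so the volume operations race and glusterd reports `Another transaction is in progress`.

```python
# Rough reproduction sketch -- placeholders throughout, adjust for your setup.
# Launches `gluster volume start <volume> force` concurrently on several
# nodes to race the volume operations.

import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = ["gluster-node1", "gluster-node2", "gluster-node3"]  # placeholders
VOLUME = "testvol"                                           # placeholder
ROUNDS = 20

def start_force(node: str) -> str:
    result = subprocess.run(
        ["ssh", node, "gluster", "volume", "start", VOLUME, "force"],
        capture_output=True, text=True,
    )
    return f"{node}: rc={result.returncode} {result.stderr.strip()}"

with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
    for _ in range(ROUNDS):
        # Run the same command on all nodes at the same time.
        for line in pool.map(start_force, NODES):
            print(line)
```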
PM ack is already set on this BZ for it to be dropped.
I'm closing this BZ (see comment 4 for the details), as discussed at the program meeting on 2018-08-14. Both development (Nishant) and product management (Anand) agree.