1341640 – hosts events binds

Summary: hosts events binds

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	core
Sub Component:
Version:	2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2
Assignee:	Darshan
QA Contact:	Lubos Trilety
Docs Contact:
URL:
Whiteboard:
Depends On:	1349813
Blocks:

Reported:	2016-06-01 11:52 UTC by Lubos Trilety
Modified:	2016-08-23 19:53 UTC
CC List:	8 users
Fixed In Version:	rhscon-core-0.0.28-1.el7scon.x86_64.rpm
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-23 19:53:21 UTC
Embargoed:

Attachments
no alert (51.92 KB, image/png) 2016-06-01 11:52 UTC, Lubos Trilety	no flags
OSD host (46.64 KB, image/png) 2016-06-07 14:37 UTC, Lubos Trilety	no flags
OSD host event (52.25 KB, image/png) 2016-06-07 14:38 UTC, Lubos Trilety	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:1754	0	normal	SHIPPED_LIVE	New packages: Red Hat Storage Console 2.0	2017-04-18 19:09:06 UTC

Description Lubos Trilety 2016-06-01 11:52:58 UTC

Created attachment 1163602
no alert

Description of problem:
Events related to a host are not bound to that host properly. They are not displayed as alerts for the host in the Hosts list.

Version-Release number of selected component (if applicable):
rhscon-core-0.0.21-1.el7scon.x86_64
rhscon-ceph-0.0.20-1.el7scon.x86_64
rhscon-ui-0.0.34-1.el7scon.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a pool and fill it with data until a utilization threshold is crossed
2. Look at the Hosts page

Actual results:
All hosts have 0 alerts.

Expected results:
A host on which the threshold is crossed should show a related alert.

Additional info:

Comment 1 :Deb 2016-06-07 14:06:30 UTC

Don't think it's a UI bug.

As long as the API responds with alerts, the UI will show it. If there's a discrepancy in the API response, not a whole lot the UI can do about it.

Feel free to change the component, post verification & reproducibility.

Comment 2 Lubos Trilety 2016-06-07 14:30:36 UTC

(In reply to :Deb from comment #1)
> Don't think it's a UI bug.
> 
> As long as the API responds with alerts, the UI will show it. If there's a
> discrepancy in the API response, not a whole lot the UI can do about it.
> 
> Feel free to change the component, post verification & reproducibility.

I am sorry, I forgot to say that the events related to a host are present in the events list in the UI, but they are not counted in the number of alerts for the host.

Comment 3 Lubos Trilety 2016-06-07 14:35:49 UTC

The screenshot shows the host list filtered to hosts that have a critical or major alarm status. The host is listed correctly, but no alerts are displayed for it.

Comment 4 Lubos Trilety 2016-06-07 14:37:54 UTC

Created attachment 1165680
OSD host

Comment 5 Lubos Trilety 2016-06-07 14:38:20 UTC

Created attachment 1165681
OSD host event

Comment 6 Lubos Trilety 2016-06-07 14:40:29 UTC

Another reproduction scenario is removing a disk from the OSD machine.
'OSD host event' shows the event created for that situation.
'OSD host' shows that the number of alerts for the host did not change.

Comment 7 Ju Lim 2016-06-07 19:44:44 UTC

In reading the comments, it appears that Lubos expects every related event that is active to show on the host list page. This may come down to a different understanding of alert vs. event. If I'm not mistaken, an alert is an event that is noteworthy, i.e. one with a major or critical severity level. Informational events don't qualify as alerts.

Depending on how the event type itself is classified, this might be why certain events/alerts are not being bound to the host.
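
The alert vs. event distinction described in this comment can be sketched as a simple severity filter. This is only an illustration, not the console's actual code: the `Event` class, its field names, and the severity strings are assumptions made for the example.

```python
from dataclasses import dataclass

# Severities treated as alert-worthy, following the description in this comment.
# The exact strings are assumptions, not the console's real values.
ALERT_SEVERITIES = {"major", "critical"}


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    message: str
    severity: str


def is_alert(event: Event) -> bool:
    """Return True when the event's severity makes it an alert."""
    return event.severity.lower() in ALERT_SEVERITIES


events = [
    Event("OSD utilization crossed threshold", "critical"),
    Event("Pool created", "info"),
    Event("OSD state changed to down", "major"),
]

# Only the critical and major events qualify as alerts.
alerts = [e for e in events if is_alert(e)]
```

Under this reading, an informational event would appear in the events list without ever raising a host's alert count.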

Comment 8 Nishanth Thomas 2016-06-09 09:17:57 UTC

Pool-related events are not propagated to the hosts; you can view them at the pool level as well as the cluster level, so this is not a valid use case.
But the issue mentioned in comment 6 is valid and needs to be fixed.

Comment 9 Lubos Trilety 2016-06-09 11:35:34 UTC

(In reply to Nishanth Thomas from comment #8)
> Pool-related events are not propagated to the hosts; you can view them at
> the pool level as well as the cluster level, so this is not a valid use
> case.

I don't think that's correct behaviour. The event is related to a specific OSD and therefore to a specific host. Moreover, the event is counted for the Hosts object on the dashboard page, and clicking the alerts number filters the hosts so that only the one related to the event is displayed. Hence I think such an event should be counted as an alert for that host.

> But the issue mentioned in comment 6 is valid and needs to be fixed.

Comment 10 Nishanth Thomas 2016-06-21 06:56:38 UTC

I am not able to understand this correctly. Suppose a pool is created out of 6 OSDs residing on 6 different hosts: do you expect the pool-related alerts to be attached to all 6 hosts? As far as I know, a pool is treated as a cluster-level entity, and hence pool-related events are attached to the cluster.

Please let me know; if you agree, I will close this bug.

Comment 11 Nishanth Thomas 2016-06-21 07:03:00 UTC

I missed the issue mentioned in comment 6; that needs to be addressed.

Comment 12 Lubos Trilety 2016-06-21 14:13:32 UTC

(In reply to Nishanth Thomas from comment #10)
> I am not able to understand this correctly. Suppose a pool is created out
> of 6 OSDs residing on 6 different hosts: do you expect the pool-related
> alerts to be attached to all 6 hosts? As far as I know, a pool is treated
> as a cluster-level entity, and hence pool-related events are attached to
> the cluster.
> 
> Please let me know; if you agree, I will close this bug.

Even the scenario mentioned in the first comment shows a problem related not just to the pool but to specific OSD(s), so it is possible to relate it to specific host(s). In other words, if the event mentions a specific host (or OSD), then it should be shown for that host in the hosts list.
I agree that if there is an event/alert for the pool which doesn't specify any host (or OSD), then it is a pool alert that will not be counted for any host.
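
The rule proposed in this comment could be sketched roughly as follows: an event counts toward a host's alerts only when it names that host directly or names an OSD that lives on it, and an event with neither stays at the pool/cluster level. All names here (`Event`, `osd_to_host`, `resolve_host`, `count_host_alerts`) are hypothetical and not taken from the console's code base; the major/critical filter follows the alert definition given in comment 7.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    message: str
    severity: str
    host: Optional[str] = None  # host named by the event, if any
    osd: Optional[str] = None   # OSD named by the event, if any


def resolve_host(event: Event, osd_to_host: Dict[str, str]) -> Optional[str]:
    """Find the host an event belongs to, or None for pool/cluster-level events."""
    if event.host is not None:
        return event.host
    if event.osd is not None:
        return osd_to_host.get(event.osd)
    return None


def count_host_alerts(events: List[Event], osd_to_host: Dict[str, str]) -> Counter:
    """Count major/critical events per host; events without a host are skipped."""
    counts: Counter = Counter()
    for event in events:
        if event.severity not in {"major", "critical"}:
            continue
        host = resolve_host(event, osd_to_host)
        if host is not None:
            counts[host] += 1
    return counts


osd_to_host = {"osd.0": "node1", "osd.1": "node2"}
events = [
    Event("osd.0 utilization crossed threshold", "critical", osd="osd.0"),
    Event("pool utilization crossed threshold", "major"),  # names no host or OSD
    Event("osd.1 state changed", "info", osd="osd.1"),     # not alert severity
]
host_alerts = count_host_alerts(events, osd_to_host)
```

In this sketch the pool-only event is never attributed to a host, which matches the agreement that a pool alert without a host or OSD is not counted for any host.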

Comment 14 Darshan 2016-06-27 12:50:54 UTC

As discussed, we are binding all OSD-related events to the host. We have tested this for the following OSD-related events and it works as expected:
1. OSD utilization crossing the threshold.
2. OSD state changing.

However, removing an underlying disk from an OSD does not trigger an OSD state change in the ceph CLI, so calamari does not send an event for it. We have raised a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1349813) against ceph to fix that.

Marking this bug as dependent on BZ1349813, since the above scenario is mentioned in comment 6.

Comment 15 Darshan 2016-06-28 07:37:09 UTC

Got an update from the ceph team regarding BZ1349813. For ceph to report an OSD as down when its disk is removed, at least 2 up OSDs have to report it (2 is the default value of the config option mon_osd_min_down_reporters, which can be changed), and it is reported only after the first write.

Considering this to be the current behavior of ceph, moving this bug to ON_QA.
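
The reporter threshold described in this comment can be illustrated with a toy check. This is a simplified sketch of the rule as stated here, not Ceph's monitor logic: `should_mark_down` and its parameters are hypothetical, and only the option name mon_osd_min_down_reporters and its default of 2 come from the comment. The "only after first write" condition is not modeled.

```python
def should_mark_down(reporting_osds: set, up_osds: set,
                     min_down_reporters: int = 2) -> bool:
    """Return True when enough distinct up OSDs have reported a peer as failed.

    min_down_reporters mirrors the mon_osd_min_down_reporters option
    (default 2, per this comment). Reports from OSDs that are not up
    are ignored in this sketch.
    """
    valid_reporters = reporting_osds & up_osds
    return len(valid_reporters) >= min_down_reporters


up = {"osd.1", "osd.2", "osd.3"}

# A single reporter is below the default threshold of 2.
one_report = should_mark_down({"osd.1"}, up)

# Two distinct up reporters meet the threshold.
two_reports = should_mark_down({"osd.1", "osd.2"}, up)
```

On a small test cluster with fewer than two other up OSDs, a removed disk would therefore never produce the state change that the console relies on for its event.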

Comment 17 Lubos Trilety 2016-08-05 13:52:01 UTC

Tested on:
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-ui-0.0.51-1.el7scon.noarch

Comment 19 errata-xmlrpc 2016-08-23 19:53:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754
