Bug 1076211 - Alerts on HA reservation not updated properly
Summary: Alerts on HA reservation not updated properly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Doron Fediuck
QA Contact: Lukas Svaty
URL:
Whiteboard: sla
Depends On:
Blocks: 1128462
 
Reported: 2014-03-13 19:09 UTC by Lukas Svaty
Modified: 2016-02-10 20:18 UTC
CC List: 13 users

Fixed In Version: vt2.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned to: 1128462
Environment:
Last Closed: 2015-02-17 17:16:12 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:




Links
oVirt gerrit 25949 (master) MERGED: engine: Adding a down event for HA Reservation
oVirt gerrit 27471 (ovirt-engine-3.4) MERGED: engine: Adding a down event for HA Reservation

Description Lukas Svaty 2014-03-13 19:09:54 UTC
Description of problem:
After HA reservation is enabled and the cluster is NOT HA safe, adding additional hosts makes the cluster HA safe according to the log, but the following alert is still displayed in the portal:

Cluster Default failed the HA Reservation check, HA VMs on host(s): red will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.

Version-Release number of selected component (if applicable):
av2.1

How reproducible:
100%

Steps to Reproduce:
1. Enable HA reservation on the cluster.
2. Add 1 host with HA VMs -> the cluster is NOT HA safe (alert appears).
3. Add a second host with enough resources for migration (the cluster becomes HA safe).

Actual results:
The alert stays displayed after the cluster becomes HA safe.

Expected results:
Alerts in the webadmin should be updated at the same time as the message:

[org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-60) HA reservation status for cluster Default is OK

is displayed in engine.log

Additional info:
Hopefully there is no need for logs.
If necessary, I'll add them later.
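
For illustration, a minimal sketch of the expected behaviour (hypothetical class and method names, not the actual ovirt-engine code): the periodic check should clear or update the webadmin alert at the same moment it logs that the cluster is HA safe again.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; names are illustrative, not taken from ovirt-engine.
public class HaReservationCheckSketch {

    // Clusters that currently have a "failed HA Reservation check" alert shown.
    private final Map<String, Boolean> alertShown = new ConcurrentHashMap<>();

    // Called periodically (every VdsHaReservationIntervalInMinutes) per cluster.
    public void runCheck(String clusterName, boolean haSafe) {
        if (!haSafe) {
            System.out.println("ALERT: Cluster " + clusterName
                    + " failed the HA Reservation check");
            alertShown.put(clusterName, true);
        } else {
            // Expected result from this report: when the cluster becomes HA safe,
            // the engine logs it AND the webadmin alert is updated at the same time.
            System.out.println("LOG: HA reservation status for cluster "
                    + clusterName + " is OK");
            if (Boolean.TRUE.equals(alertShown.remove(clusterName))) {
                System.out.println("EVENT: clearing HA Reservation alert for cluster "
                        + clusterName);
            }
        }
    }

    public static void main(String[] args) {
        HaReservationCheckSketch check = new HaReservationCheckSketch();
        check.runCheck("Default", false); // one host, not HA safe -> alert appears
        check.runCheck("Default", true);  // second host added -> alert is cleared
    }
}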

Comment 2 Kobi 2014-05-26 13:02:56 UTC
This is the 3.4 patch:
http://gerrit.ovirt.org/#/c/27471/

It is in MERGED status.

Comment 8 Lukas Svaty 2014-09-10 09:34:30 UTC
Alerts are stacking every 10 minutes even though no configuration was changed in the cluster.
VdsHaReservationIntervalInMinutes was changed to 1 for testing purposes.
	
2014-Sep-10, 09:30 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
	
2014-Sep-10, 09:20 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
	
2014-Sep-10, 09:10 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.

Comment 9 Jiri Moskovcak 2014-09-23 14:31:10 UTC
Please provide the full engine log.

Comment 10 Jiri Moskovcak 2014-09-23 14:39:11 UTC
Can you please describe your environment in more detail? Are you sure that the message you're seeing is incorrect?

Comment 11 Lukas Svaty 2014-09-23 14:50:38 UTC
At the moment I am unable to provide the engine log, as I don't have an environment for this.

The messages are correct. The problem is that these messages appear every 10 minutes in the alerts. I talked to Kobi earlier: the alert should appear only once, when the HA reservation check does not pass, and no further alerts should be displayed (at the moment a new alert is created every 10 minutes) until the HA reservation check passes again. Only from that point on may a new alert be created once the HA reservation fails again.

Scenario (the desired behaviour; see the sketch at the end of this comment):
1. Make the HA reservation check fail.
2. An alert for this should appear in the engine.
3. No more alerts should appear (no matter for how long).
4. Make the cluster HA safe - the HA reservation check will succeed.
5. Still no more alerts.
6. Make the HA reservation check fail again.
7. Only now should a new (second) alert appear.

Current behaviour:
1. Make the HA reservation check fail.
2. An alert for this appears in the engine.
3. A new alert is created after some time (10 minutes for me) in the engine.
4. Every 10 minutes another new alert is created.

Steps 3 and 4 should not happen, as the admin is already informed about the failed HA check.

Additional info:
VdsHaReservationIntervalInMinutes was changed to 1 for testing purposes.
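
A minimal sketch of the behaviour requested above (hypothetical names, not the ovirt-engine implementation): alerts fire only on a state transition, so a cluster that keeps failing the check produces a single alert rather than one per interval.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of transition-based alerting; not the actual ovirt-engine logic.
public class TransitionAlertSketch {

    private enum State { HA_SAFE, NOT_HA_SAFE }

    // Last known HA-safety state per cluster.
    private final Map<String, State> lastState = new HashMap<>();

    // Called on every periodic check; alerts only when the state changes.
    public void onCheckResult(String cluster, boolean haSafe) {
        State newState = haSafe ? State.HA_SAFE : State.NOT_HA_SAFE;
        State oldState = lastState.put(cluster, newState);
        if (newState == oldState) {
            return; // same state as the previous check -> no new alert
        }
        if (newState == State.NOT_HA_SAFE) {
            System.out.println("ALERT: cluster " + cluster + " failed the HA Reservation check");
        } else {
            System.out.println("EVENT: cluster " + cluster + " is HA safe again");
        }
    }

    public static void main(String[] args) {
        TransitionAlertSketch sketch = new TransitionAlertSketch();
        sketch.onCheckResult("Default", false); // first failure -> one alert
        sketch.onCheckResult("Default", false); // still failing -> silent
        sketch.onCheckResult("Default", false); // still failing -> silent
        sketch.onCheckResult("Default", true);  // recovered -> one event
        sketch.onCheckResult("Default", false); // fails again -> second alert
    }
}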

Comment 12 Lukas Svaty 2014-09-23 14:51:42 UTC
If you still need engine.log, please re-add needinfo and I'll try to attach it once I get access to an environment for this.

Comment 13 Jiri Moskovcak 2014-09-24 07:19:03 UTC
What you're describing is the correct behaviour: we want to warn the user that the cluster is not HA safe as long as HA VMs are running on that cluster and the situation persists, because it can cause serious problems. The fix for this bugzilla only changes the logic responsible for showing the alert that the cluster IS HA safe, which we only want to show once. Moving back to ON_QA because I believe the fix is correct.

Comment 14 Doron Fediuck 2014-09-28 14:26:41 UTC
(In reply to Jiri Moskovcak from comment #13)
> What you're describing is the correct behaviour: we want to warn the user
> that the cluster is not HA safe as long as HA VMs are running on that
> cluster and the situation persists, because it can cause serious problems.
> The fix for this bugzilla only changes the logic responsible for showing
> the alert that the cluster IS HA safe, which we only want to show once.
> Moving back to ON_QA because I believe the fix is correct.

Agreed.
Being out of resources works against what the admin asked for in the policy.
If we only notify once, and it happens to be in the middle of the night, the
admin will miss the notification, leaving the system without sufficient
resources despite the cluster policy.
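
A minimal sketch of the behaviour agreed on here and in comment 13 (hypothetical names, not the shipped code): the failure alert is repeated on every check interval while the cluster stays unsafe, and the "HA safe" notification is emitted only once, on the transition back to a safe state.

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the agreed behaviour (comments 13/14); not the actual
// ovirt-engine implementation.
public class AgreedBehaviourSketch {

    // Clusters currently known to be failing the HA reservation check.
    private final Set<String> failingClusters = new HashSet<>();

    // Called every VdsHaReservationIntervalInMinutes for each cluster.
    public void onCheckResult(String cluster, boolean haSafe) {
        if (!haSafe) {
            // Repeat the warning on every interval so the admin cannot miss it.
            System.out.println("ALERT: cluster " + cluster + " failed the HA Reservation check");
            failingClusters.add(cluster);
        } else if (failingClusters.remove(cluster)) {
            // Announce recovery only once, on the transition back to HA safe.
            System.out.println("EVENT: HA reservation status for cluster " + cluster + " is OK");
        }
    }

    public static void main(String[] args) {
        AgreedBehaviourSketch sketch = new AgreedBehaviourSketch();
        sketch.onCheckResult("Default", false); // alert
        sketch.onCheckResult("Default", false); // alert repeats (intentionally)
        sketch.onCheckResult("Default", true);  // single "OK" event
        sketch.onCheckResult("Default", true);  // no further output
    }
}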

Comment 15 Lukas Svaty 2014-10-02 12:54:54 UTC
Verified in vt4.

Comment 17 Eyal Edri 2015-02-17 17:16:12 UTC
RHEV 3.5.0 was released. Closing.

