Description of problem:
HA reservation is enabled and the cluster is NOT HA safe. After additional hosts are added, the log reports the cluster as HA safe, but the portal still displays the alert:
Cluster Default failed the HA Reservation check, HA VMs on host(s): red will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
Version-Release number of selected component (if applicable):
av2.1
How reproducible:
100%
Steps to Reproduce:
1. Enable HA reservation on the cluster.
2. Add 1 host with HA VMs -> cluster is NOT HA safe (alert appears).
3. Add a second host with enough resources for migration (cluster is HA safe).
Actual results:
The alert stays displayed after the cluster becomes HA safe.
Expected results:
Alerts in webadmin should be updated at the same time as the following message is displayed in engine.log:
[org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-60) HA reservation status for cluster Default is OK
Additional info:
Hopefully no logs are needed. If necessary, I'll attach them later.
This is the 3.4 patch: http://gerrit.ovirt.org/#/c/27471/ Its status is merged.
Alerts are stacking every 10 minutes even though no configuration was changed in the cluster. VdsHaReservationIntervalInMinutes was changed to 1 for testing purposes.
2014-Sep-10, 09:30 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
2014-Sep-10, 09:20 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
2014-Sep-10, 09:10 Cluster Default failed the HA Reservation check, HA VMs on host(s): host-1 will fail to migrate in case of a failover, consider adding resources or shutting down unused VMs.
please provide the full engine log
Can you please describe your environment in more detail? Are you sure that the message you're seeing is not correct?
At the moment I am unable to provide the engine log, as I don't have an environment for this. The messages are correct. The problem is that these messages appear in the alerts every 10 minutes. I talked to Kobi earlier, and the alert should appear only once, when the HA reservation check fails. No other alerts should be displayed (at the moment a new alert is created every 10 minutes). Only after the HA reservation check has passed can a new alert be created, when the check fails again.
Scenario:
1. Make HA reservation fail.
2. An alert about this should appear in the engine.
3. No more alerts should appear (no matter for how long).
4. Make the cluster HA safe - the HA reservation check will succeed.
5. No more alerts.
6. Make HA reservation fail again.
7. Only now should a new (second) alert appear.
Current behaviour:
1. Make HA reservation fail.
2. An alert about this appears in the engine.
3. A new alert is created in the engine after some time (10 minutes for me).
4. A new alert is created every 10 minutes.
Steps 3 and 4 should not happen, as the admin has already been informed about the failed HA check.
Additional info: VdsHaReservationIntervalInMinutes was changed to 1 for testing purposes.
If you still need engine.log, please re-add needinfo and I'll try to attach it once I have access to an environment for this.
What you're describing is correct behaviour. We want to keep warning the user that the cluster is not HA safe, if he has HA VMs running on that cluster, for as long as the situation persists, because it can cause serious problems. The fix for this bugzilla only changes the logic responsible for showing the alert that the cluster IS HA safe, which we only want to show once. Moving back to on_qa because I believe the fix is correct.
(In reply to Jiri Moskovcak from comment #13)
> What you're describing is correct behaviour, we want to warn user that the
> cluster is not HA safe if he has HA VMs running on that cluster until the
> situation stands, because it can cause serious problems. The fix for this
> bugzilla only changes the logic which is responsible for showing the alert
> that the cluster IS HA safe, which we only want to show once. Moving back to
> on_qa because I believe the fix is correct.
Agreed. Being out of resources works against what the admin asked for in the policy. If we only notify once, and it happens to be in the middle of the night, the admin will miss the notification, leaving the system without sufficient resources despite the cluster policy.
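For reference, the alert policy agreed on in the two comments above can be sketched roughly as follows. This is only an illustration and not the engine's HaReservationHandling code: the class name HaReservationAlertPolicy, the method onCheckResult and the shortened alert texts are hypothetical, and the sketch assumes the cluster starts out HA safe. A failed check raises an alert on every run of the periodic check, while the "OK" alert is raised only once, on the transition from failing to passing.

```java
import java.util.Optional;

/** Hypothetical sketch of the discussed policy; not taken from the oVirt engine sources. */
final class HaReservationAlertPolicy {
    // Assumption: the cluster starts HA safe, so an initial passing check raises nothing.
    private boolean lastCheckPassed = true;

    /**
     * Called once per periodic HA reservation check.
     * Returns the alert text to raise, or an empty Optional if no alert is needed.
     */
    Optional<String> onCheckResult(String cluster, boolean passed) {
        boolean wasPassing = lastCheckPassed;
        lastCheckPassed = passed;

        if (!passed) {
            // Repeated on every failed check, so the admin keeps being warned
            // for as long as the cluster is not HA safe.
            return Optional.of("Cluster " + cluster + " failed the HA Reservation check");
        }
        if (!wasPassing) {
            // Raised only once, when the cluster goes from failing to passing.
            return Optional.of("HA reservation status for cluster " + cluster + " is OK");
        }
        // Still passing: nothing new to report.
        return Optional.empty();
    }
}
```

With this policy, the alerts stacking at each check interval in the earlier comment are the intended result while the check keeps failing; only the "is OK" notification is suppressed after its first appearance.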
verified in vt4
rhev 3.5.0 was released. closing.