Created attachment 1023948 [details]
engine and vdsm logs from both hosts

Description of problem:
A cluster with HA reservation enabled has two hosts. One host runs an HA VM; the second host runs a VM whose memory equals that host's free memory, and memory optimization is disabled (0%). The cluster is therefore not HA safe: if the host running the HA VM dies, the engine has no host with enough memory to restart the HA VM (the second host has insufficient memory). Nevertheless, engine.log reports that the cluster is still HA safe.

Version-Release number of selected component (if applicable):
rhevm-3.5.1.1-0.1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add two hosts to the engine (host_with_ha_vm, some_host)
2. Add two VMs:
   1) An HA VM (ha_vm) whose memory equals the free memory of host_with_ha_vm
   2) A VM (some_vm) pinned to some_host whose memory also equals the free memory of some_host
3. Start both VMs

Actual results:
2015-05-10 13:48:53,826 INFO  [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler_Worker-38) HA reservation status for cluster cl_35_amd_el7 is OK

Expected results:
The HA reservation status must be failed.

Additional info:
After starting both VMs:
Max free Memory for scheduling new VMs = 322 MB
but free memory is still 15142 MB.
So perhaps the HA reservation status check looks at free memory rather than at "Max free Memory for scheduling new VMs".
1) There is no some_vm in the log; did you use alloc_vm?

2) "Max free Memory for scheduling new VMs = 322 MB, but free memory is still 15142 MB" means the VM is not really using the memory, so the OK status is correct.

Quoting from the feature page http://www.ovirt.org/Features/HA_VM_reservation:

"oVirt will continuously monitor the clusters in the system, for each Cluster the system will analyze its hosts, determining if the HA VMs on that host can survive the failover of that host by migrating to another host."

This means we only need to make sure there is enough free memory on the destination host to receive an HA VM, which there is.

On the other hand, this is a bit wrong, because overcommit is disabled, so assigning more memory than is available (from the engine's point of view) is not allowed (although it would work from the libvirt and qemu perspectives).
See the previous comment; I suggest closing the BZ unless there's something we missed.
1) Yes, some_vm = alloc_vm.

2) The engine starts and migrates VMs only on hosts that have enough "Max free Memory for scheduling new VMs"; a host without enough does not pass the Memory filter. So in the scenario above, if I stop the network on the host running the HA VM and "Confirm 'Host has been rebooted'", the HA VM fails to start on the second host:

2015-05-17 13:38:08,940 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-19) [1c530bb8] Candidate host alma05.qa.lab.tlv.redhat.com (48f3eae6-6ca6-4465-b59c-334a157ca256) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory
2015-05-17 13:38:08,941 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler_Worker-19) [1c530bb8] CanDoAction of action RunVm failed for user null@N/A. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName alma05.qa.lab.tlv.redhat.com,$filterName Memory,$availableMem 1257,VAR__DETAIL__NOT_ENOUGH_MEMORY,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

This means the cluster is not HA safe. In my opinion this is a bug, so please check "Max free Memory for scheduling new VMs" when calculating whether the cluster is HA safe.
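To make the discrepancy concrete, here is a minimal sketch (not the actual oVirt engine code; all names are made up) of the two possible checks. It uses the memory figures from this report: the destination host reports 15142 MB free, but only 322 MB is left for scheduling because overcommit is disabled. The HA VM size of 4096 MB is a hypothetical value.

```python
def can_host_receive_vm(host, vm_mem_mb, use_scheduling_memory=True):
    """Return True if the host could accept a failed-over HA VM.

    Hypothetical illustration: the Memory filter at run/migrate time
    enforces the scheduler's "Max free Memory for scheduling new VMs",
    so an HA-safety check based on raw free memory can disagree with it.
    """
    if use_scheduling_memory:
        # What the Memory filter actually enforces when the VM is started.
        available = host["max_scheduling_mem_mb"]
    else:
        # Raw free memory as reported by the host; a VM that has not yet
        # touched its allocation leaves this number high.
        available = host["free_mem_mb"]
    return available >= vm_mem_mb


# Figures from the report; VM size is a hypothetical example.
dest = {"free_mem_mb": 15142, "max_scheduling_mem_mb": 322}
ha_vm_mem_mb = 4096

# Checking raw free memory says the failover would succeed ("HA safe"),
# while checking scheduling memory matches the failed RunVm in the log.
print(can_host_receive_vm(dest, ha_vm_mem_mb, use_scheduling_memory=False))
print(can_host_receive_vm(dest, ha_vm_mem_mb, use_scheduling_memory=True))
```

The two calls disagree exactly as the logs do: the raw-free-memory check passes while the scheduling-memory check fails, which is why the HA reservation status should be computed from "Max free Memory for scheduling new VMs".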
Ilanit, what is the question? I did not close this, because I think it is a bug as well.

> On the other hand, this is a bit wrong because the overcommit is disabled
> and so assigning more memory than available (engine's point of view)
> is not allowed

Which is exactly what happened according to Artyom's test. It is easy to fix, though (it will actually simplify the code).
Removing from 3.5.z. Keeping as 3.6.0 item.
Verified on rhevm-3.6.0.3-0.1.el6.noarch:
1) Have two hosts
2) Start an HA VM on host 1
3) Load the memory of host 2
4) The engine shows a message that the cluster is not HA safe