Hi, can you please provide the exact rhev version, and the relevant engine log files?
Created attachment 849684
engine.log
(In reply to Doron Fediuck from comment #2)
> Hi, can you please provide the exact rhev version, and the relevant engine
> log files?

- Version-Release number of selected component (if applicable): rhevm-3.2.2-0.41.el6ev.noarch
- Attached engine.log file
Description of problem:
After a power outage, two of the eight VMs marked as HA failed to start automatically. The customer had to start them manually after waiting a few hours for RHEV-M to handle this automatically.

Environment details:
- 2 hypervisors with 24GB RAM each
- 10 VMs

Example of the start failure for HA VM "mastro03srv" (two VMs had already been started successfully before this one). The customer later started it manually without any error.

~~~
2013-12-29 07:10:25,456 INFO [org.ovirt.engine.core.bll.VdsEventListener] (QuartzScheduler_Worker-13) [33217069] Failed to start Highly Available VM. Attempting to restart. VM Name: mastro03srv, VM Id:c52b7bdb-9c3a-4d76-9f06-42cbb7687a17
2013-12-29 07:10:25,468 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM , sharedLocks= ]
2013-12-29 07:10:25,476 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 583aa7de
2013-12-29 07:10:25,477 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 583aa7de
2013-12-29 07:10:25,490 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Running command: RunVmCommand internal: true. Entities affected : ID: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 Type: VM
2013-12-29 07:10:25,514 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM , sharedLocks= ]
2013-12-29 07:10:25,514 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Failed to run desktop mastro03srv, rerun
2013-12-29 07:10:25,519 INFO [org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, UpdateVdsDynamicDataVDSCommand(HostName = rhev-hv01.xxxxxx.com, HostId = 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf, vdsDynamic=org.ovirt.engine.core.common.businessentities.VdsDynamic@5bdabd1b), log id: 4e7d3ce3
2013-12-29 07:10:25,521 INFO [org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, UpdateVdsDynamicDataVDSCommand, log id: 4e7d3ce3
[...]
2013-12-29 07:10:25,549 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM , sharedLocks= ]
2013-12-29 07:10:25,566 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 474630fa
2013-12-29 07:10:25,566 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 474630fa
2013-12-29 07:10:25,576 INFO [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069] VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf have failed running this VM in the current selection cycle
VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:25,577 WARN [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CLUSTER
2013-12-29 07:10:25,577 INFO [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM , sharedLocks= ]
~~~

After the above failure, a second HA VM, "mx", failed with this error:

~~~
2013-12-29 07:10:36,839 INFO [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069] VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf has insufficient memory to run the VM
VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:36,839 WARN [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_MEMORY
~~~

The customer disputes the insufficient-memory result, as the total memory required for all the VMs is 15GB and they have 2 hosts with 24GB each.

Version-Release number of selected component (if applicable):
rhevm-3.2.2-0.41.el6ev.noarch

How reproducible:
No consistent way; it happened after a power outage on one of the customer's setups.

Steps to Reproduce:
1.
2.
3.

Actual results:
2 out of 8 VMs marked as HA failed to start automatically after the power outage.

Expected results:
All VMs marked HA should be started automatically by RHEV-M.

Additional info:
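For anyone triaging a similar engine.log, the sketch below is one way to pull out the CanDoAction failures and the reason codes shown in the excerpts above. It is only an illustration: the regular expression is based solely on the WARN lines quoted in this report, and the function names (`failure_reasons`, `summarize`) are hypothetical helpers, not part of RHEV-M or oVirt.

```python
import re
from collections import Counter

# Pattern derived from the WARN lines quoted in this report (an assumption,
# not an official log format specification), e.g.
# "... WARN [..RunVmCommand] ... CanDoAction of action RunVm failed. Reasons:...,ACTION_TYPE_FAILED_VDS_VM_MEMORY"
CAN_DO_ACTION_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN\s+"
    r"\[(?P<logger>[^\]]+)\].*?CanDoAction of action (?P<action>\w+) failed\.\s*"
    r"Reasons:(?P<reasons>\S+)"
)


def failure_reasons(lines):
    """Yield (timestamp, action, failure codes) for each CanDoAction failure line."""
    for line in lines:
        m = CAN_DO_ACTION_RE.search(line)
        if m is None:
            continue
        # Keep only the ACTION_TYPE_FAILED_* codes; the VAR__* entries are
        # message variables rather than failure reasons.
        codes = [r for r in m.group("reasons").split(",")
                 if r.startswith("ACTION_TYPE_FAILED")]
        yield m.group("ts"), m.group("action"), codes


def summarize(lines):
    """Count how often each failure code appears."""
    counts = Counter()
    for _ts, _action, codes in failure_reasons(lines):
        counts.update(codes)
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python reasons.py engine.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for code, n in summarize(fh).most_common():
            print(n, code)
```

Run against the attached engine.log, this would list each failure code with its count, which makes it easier to see whether the cluster and memory failures repeat over time.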
Verified on av3.
Added a host with 16 GB of RAM and ran four HA VMs on it (three with 4096 MB and one with 2048 MB). Powered off the host, waited 5 minutes, and powered it back on; all VMs started fine.
Also tested under the 'None' cluster policy.
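As a quick sanity check of the numbers in this verification, the sketch below sums the configured VM memory and compares it with the host's RAM. This is a naive comparison for illustration only; it is not the memory check the engine's scheduler performs, and `fits_on_host` is a hypothetical helper name.

```python
def fits_on_host(vm_memory_mb, host_memory_mb):
    """Return (total VM memory in MB, whether it is within the host's RAM).

    Naive sum-versus-capacity comparison; the engine's real check is not
    modeled here.
    """
    total = sum(vm_memory_mb)
    return total, total <= host_memory_mb


# Verification setup from this comment: three 4096 MB VMs and one 2048 MB VM
# on a host with 16 GB of RAM.
total, ok = fits_on_host([4096, 4096, 4096, 2048], 16 * 1024)
print(total, ok)  # 14336 True
```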
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0506.html