Bug 1052024 - After a power outage two VMs marked as HA failed to start automatically, they were required to be started manually.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Gilad Chaplik
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks: 1074478 rhev3.4beta 1142926
Reported: 2014-01-13 06:34 UTC by Aval
Modified: 2019-04-28 10:03 UTC (History)

Fixed In Version: av3
Doc Type: Bug Fix
Doc Text:
Previously, some virtual machines did not automatically restart after a power failure. As a result, they would have to be manually restarted. Now, the issue has been corrected and all virtual machines restart as expected.
Clone Of:
: 1074478 (view as bug list)
Environment:
Last Closed: 2014-06-09 15:08:41 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2014:0506 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 24651 0 None None None Never
oVirt gerrit 25461 0 None None None Never

Comment 2 Doron Fediuck 2014-01-13 15:25:58 UTC
Hi, can you please provide the exact rhev version, and the relevant engine log files?

Comment 3 Aval 2014-01-13 23:05:07 UTC
Created attachment 849684 [details]
engine.log

Comment 4 Aval 2014-01-13 23:22:05 UTC
(In reply to Doron Fediuck from comment #2)
> Hi, can you please provide the exact rhev version, and the relevant engine
> log files?

- Version-Release number of selected component (if applicable):

rhevm-3.2.2-0.41.el6ev.noarch

- Attached engine.log file

Comment 8 Itamar Heim 2014-03-07 12:13:10 UTC
Description of problem: After a power outage, two of the VMs (out of 8) marked as HA failed to start automatically. The customer had to start them manually after waiting a few hours for RHEV-M to handle this automatically.

Environment details : 
- 2 Hypervisors with 24GB RAM each
- 10 VMs 

- Example of a start failure for the HA VM "mastro03srv" (two VMs had already started successfully before this one). The customer later started it manually without any error.

~~~
2013-12-29 07:10:25,456 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (QuartzScheduler_Worker-13) [33217069] Failed to start Highly Available VM. Attempting to restart. VM Name: mastro03srv, VM Id:c52b7bdb-9c3a-4d76-9f06-42cbb7687a17
2013-12-29 07:10:25,468 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,476 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 583aa7de
2013-12-29 07:10:25,477 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 583aa7de
2013-12-29 07:10:25,490 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Running command: RunVmCommand internal: true. Entities affected :  ID: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 Type: VM
2013-12-29 07:10:25,514 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,514 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Failed to run desktop mastro03srv, rerun 
2013-12-29 07:10:25,519 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, UpdateVdsDynamicDataVDSCommand(HostName = rhev-hv01.xxxxxx.com, HostId = 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf, vdsDynamic=org.ovirt.engine.core.common.businessentities.VdsDynamic@5bdabd1b), log id: 4e7d3ce3
2013-12-29 07:10:25,521 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, UpdateVdsDynamicDataVDSCommand, log id: 4e7d3ce3

[...]

2013-12-29 07:10:25,549 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,566 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 474630fa
2013-12-29 07:10:25,566 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 474630fa
2013-12-29 07:10:25,576 INFO  [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069]  VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf have failed running this VM in the current selection cycle VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:25,577 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CLUSTER  

2013-12-29 07:10:25,577 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
~~~

After the above failure, a second HA VM, "mx", failed with the following error:
~~~
2013-12-29 07:10:36,839 INFO  [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069]  VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf has insufficient memory to run the VM VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:36,839 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_MEMORY
~~~
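
To find every such failure in a large engine.log, the WARN lines can be filtered with a short script. This is only a sketch: the regular expression is derived from the two excerpts above, not from any documented log format, and the function name `failed_run_reasons` is made up for illustration.

```python
import re

# Pattern inferred from the WARN lines quoted in this report (an assumption,
# not a documented format): timestamp, level, logger, then the reason list.
CANDO_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+WARN\s+"
    r"\[org\.ovirt\.engine\.core\.bll\.RunVmCommand\].*?"
    r"CanDoAction of action RunVm failed\. Reasons:(?P<reasons>\S+)"
)


def failed_run_reasons(lines):
    """Yield (timestamp, [failure codes]) for each failed RunVm validation."""
    for line in lines:
        match = CANDO_RE.match(line)
        if not match:
            continue
        # Keep only the ACTION_TYPE_FAILED_* codes; the VAR__* entries look
        # like message placeholders. dict.fromkeys removes duplicates while
        # preserving order.
        codes = [
            reason
            for reason in match.group("reasons").split(",")
            if reason.startswith("ACTION_TYPE_FAILED")
        ]
        yield match.group("ts"), list(dict.fromkeys(codes))


# The two WARN lines from the excerpts above, used as sample input.
sample = [
    "2013-12-29 07:10:25,577 WARN  [org.ovirt.engine.core.bll.RunVmCommand] "
    "(QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. "
    "Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,"
    "ACTION_TYPE_FAILED_VDS_VM_CLUSTER  ",
    "2013-12-29 07:10:36,839 WARN  [org.ovirt.engine.core.bll.RunVmCommand] "
    "(QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. "
    "Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_MEMORY",
]

results = list(failed_run_reasons(sample))
for timestamp, codes in results:
    print(timestamp, codes)
```

For a real log, pass an open file object instead of `sample`, for example `failed_run_reasons(open("engine.log"))`.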

The customer disputes this error, since the total memory required by all the VMs is 15GB and they have 2 hosts with 24GB each.
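
The customer's arithmetic can be written out as a naive capacity check. This is a rough sketch of the expectation only, not of how the engine actually accounts for memory: the function `fits` and its `reserved_mb` parameter are hypothetical, GB is treated as 1024 MB, and only rhev-hv01 is counted because the log shows rhev-hv02 was not up at the time.

```python
MB_PER_GB = 1024  # assumption: the report's "GB" figures are treated as GiB


def fits(host_mb, vm_mbs, reserved_mb=0):
    """Naive check: does the sum of VM memory fit on one host?

    reserved_mb is a hypothetical allowance for host and per-VM overhead;
    the report gives no figure for it, so it defaults to 0.
    Returns (fits, headroom_mb).
    """
    headroom_mb = host_mb - reserved_mb - sum(vm_mbs)
    return headroom_mb >= 0, headroom_mb


# Figures from the report: 15GB required in total, one 24GB host available.
ok, headroom_mb = fits(24 * MB_PER_GB, [15 * MB_PER_GB])
print(ok, headroom_mb)  # True 9216
```

Under these assumptions a single host has about 9GB to spare, which is why the ACTION_TYPE_FAILED_VDS_VM_MEMORY result looked wrong to the customer.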

Version-Release number of selected component (if applicable):
rhevm-3.2.2-0.41.el6ev.noarch

How reproducible:
Not consistently reproducible; it happened after a power outage on one of the customer's setups.

Steps to Reproduce:
1.
2.
3.

Actual results:
2 out of 8 VMs marked as HA failed to automatically start after power outage

Expected results:
All VMs marked HA should be started automatically by RHEV-M

Additional info:

Comment 10 Artyom 2014-03-17 17:07:20 UTC
Verified on av3
Added a host with 16G and ran four HA VMs on it, three with 4096MB and one with 2048MB. Then powered off the host, waited 5 minutes, and powered it back on; all VMs started fine.
Also tested under the 'None' cluster policy.

Comment 11 errata-xmlrpc 2014-06-09 15:08:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

