Bug 1491026 - If SPM host panics running guest vms doesn't failover to other hosts [NEEDINFO]
Summary: If SPM host panics running guest vms doesn't failover to other hosts
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: 2.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: bugs@ovirt.org
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-12 19:40 UTC by deepak
Modified: 2017-09-27 06:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-27 06:58:57 UTC
oVirt Team: Infra
mperina: needinfo? (deepak.jagtap)


Description deepak 2017-09-12 19:40:42 UTC
Description of problem:
Guest VMs do not fail over if the host that goes down is the SPM.
HA works as expected if the host is NOT the SPM. It seems that, as part of
handling the host going down, HA attempts to launch the VMs, but all attempts
fail because there is no active SPM in the cluster.

Version-Release number of selected component (if applicable):
rhevm 4.1.5

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node cluster with 1 storage domain and 1 ISO domain.
2. Note which host is the SPM and have a guest VM running on that host.
3. Panic the SPM host by running 'echo c > /proc/sysrq-trigger'.
4. Wait for HA failover to finish.

Actual results:
-HA attempts to launch the VMs but fails; it eventually exhausts all VM launch attempts and the VMs remain in the Down state.
-No other host becomes SPM.

Expected results:
-Another host should become SPM.
-The VMs should fail over and start on another host.

Additional info:

Here is the log snippet from engine.log

2017-09-12 15:07:16,650-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] SPM Init: could not find reported vds or not up - pool: 'Default' vds_spm_id: '2'
2017-09-12 15:07:16,667-04 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] SPM selection - vds seems as spm 'kvm153.int.maxta.com'
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] START, SpmStopVDSCommand(HostName = kvm153.int.maxta.com, SpmStopVDSCommandParameters:{runAsync='true', hostId='5e261fa6-b718-4e03-b675-5291e1b3b67a', storagePoolId='1eab8081-43dc-40ba-8bc4-e4b3ade2ee41'}), log id: 1c9837c7
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] SpmStopVDSCommand:: vds 'kvm153.int.maxta.com' is in 'Reboot' status - not performing spm stop, pool id '1eab8081-43dc-40ba-8bc4-e4b3ade2ee41'
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] FINISH, SpmStopVDSCommand, log id: 1c9837c7
2017-09-12 15:07:16,684-04 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] spm stop on spm failed, stopping spm selection!
2017-09-12 15:07:22,909-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [46211442] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO
2017-09-12 15:07:22,910-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [46211442] Lock freed to object 'EngineLock:{exclusiveLocks='[1660f092-4c67-4a00-b1b8-ff3e9cc854f9=VM]', sharedLocks=''}'
2017-09-12 15:07:22,920-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [46211442] EVENT_ID: HA_VM_RESTART_FAILED(9,603), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Restart of the Highly Available VM ub153 failed.
2017-09-12 15:07:22,946-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [2cfbc9b1] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO
2017-09-12 15:07:22,947-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [2cfbc9b1] Lock freed to object 'EngineLock:{exclusiveLocks='[42b272e9-bef1-4032-9649-24eaaa60da7e=VM]', sharedLocks=''}'
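
For anyone triaging a longer engine.log, here is a minimal Python sketch that tallies the 'RunVm' validation failure reasons seen in lines like the ones above. The regular expression is an assumption derived only from the lines in this snippet, not from any documented engine.log format, and the function name summarize_validation_failures is hypothetical. Entries starting with VAR__ are skipped on the assumption that they are message variables rather than failure reasons.

```python
import re
from collections import Counter

# Pattern inferred from the "Validation of action ... failed" lines quoted
# in this report; this is an assumption, not a documented log format.
VALIDATION_RE = re.compile(
    r"Validation of action '(?P<action>\w+)' failed for user (?P<user>\S+)\. "
    r"Reasons: (?P<reasons>\S+)"
)


def summarize_validation_failures(lines):
    """Count (action, reason) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = VALIDATION_RE.search(line)
        if match is None:
            continue  # not a validation-failure line
        for reason in match.group("reasons").split(","):
            # Assumed to be message variables, not reasons; skip them.
            if reason.startswith("VAR__"):
                continue
            counts[(match.group("action"), reason)] += 1
    return counts
```

Passing an open file handle for engine.log as `lines` and printing `counts.most_common()` should show which validation reason dominates the failed restart attempts.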

Comment 1 Martin Perina 2017-09-13 08:51:50 UTC
Could you please attach the full logs? Also, the bug is filed against hosted-engine HA, so is this a normal installation or a hosted engine?

