Bug 1491026

Summary: If the SPM host panics, running guest VMs don't fail over to other hosts
Product: [oVirt] ovirt-hosted-engine-ha
Reporter: deepak <deepak.jagtap>
Component: General
Assignee: bugs <bugs>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.2.0
CC: bugs, deepak.jagtap, mperina, oourfali
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-27 06:58:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description deepak 2017-09-12 19:40:42 UTC
Description of problem:
Guest VMs don't fail over if the host that goes down is the SPM.
HA works as expected if the host is NOT the SPM. It seems that, as part of
handling the host going down, HA attempts to launch the VMs, but all attempts
fail because there is no active SPM in the cluster.

Version-Release number of selected component (if applicable):
rhevm 4.1.5

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node cluster with 1 storage domain and 1 ISO domain.
2. Note which host is the SPM and have a guest VM running on that host.
3. Panic the SPM host by running 'echo c > /proc/sysrq-trigger' on it.
4. Wait for HA failover to finish.

Actual results:
- HA attempts to launch the VMs but fails; it eventually exhausts all VM launch attempts and the VMs remain in the Down state.
- No other host becomes SPM.

Expected results:
- Another host should become SPM.
- The VMs should fail over and start on another host.

Additional info:

Here is a log snippet from engine.log:

2017-09-12 15:07:16,650-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] SPM Init: could not find reported vds or not up - pool: 'Default' vds_spm_id: '2'
2017-09-12 15:07:16,667-04 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] SPM selection - vds seems as spm 'kvm153.int.maxta.com'
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] START, SpmStopVDSCommand(HostName = kvm153.int.maxta.com, SpmStopVDSCommandParameters:{runAsync='true', hostId='5e261fa6-b718-4e03-b675-5291e1b3b67a', storagePoolId='1eab8081-43dc-40ba-8bc4-e4b3ade2ee41'}), log id: 1c9837c7
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] SpmStopVDSCommand:: vds 'kvm153.int.maxta.com' is in 'Reboot' status - not performing spm stop, pool id '1eab8081-43dc-40ba-8bc4-e4b3ade2ee41'
2017-09-12 15:07:16,684-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler4) [1cc4342c] FINISH, SpmStopVDSCommand, log id: 1c9837c7
2017-09-12 15:07:16,684-04 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler4) [1cc4342c] spm stop on spm failed, stopping spm selection!
2017-09-12 15:07:22,909-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [46211442] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO
2017-09-12 15:07:22,910-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [46211442] Lock freed to object 'EngineLock:{exclusiveLocks='[1660f092-4c67-4a00-b1b8-ff3e9cc854f9=VM]', sharedLocks=''}'
2017-09-12 15:07:22,920-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [46211442] EVENT_ID: HA_VM_RESTART_FAILED(9,603), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Restart of the Highly Available VM ub153 failed.
2017-09-12 15:07:22,946-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [2cfbc9b1] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO
2017-09-12 15:07:22,947-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (DefaultQuartzScheduler2) [2cfbc9b1] Lock freed to object 'EngineLock:{exclusiveLocks='[42b272e9-bef1-4032-9649-24eaaa60da7e=VM]', sharedLocks=''}'
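
When collecting the full engine.log, it may help to tally which validation reasons block the HA restarts. The following is only a minimal sketch in Python: the function name count_validation_failures and the regular expression are illustrative assumptions inferred from the snippet above, not part of oVirt. It also assumes that entries starting with VAR__ are message variables rather than failure reasons.

```python
import re
from collections import Counter

# Pattern inferred from the log snippet above (an assumption, not an oVirt spec).
# It matches lines such as:
#   ... Validation of action 'RunVm' failed for user SYSTEM. Reasons: A,B,C
VALIDATION_RE = re.compile(
    r"Validation of action '(?P<action>\w+)' failed for user (?P<user>\S+)\. "
    r"Reasons: (?P<reasons>\S+)"
)


def count_validation_failures(lines):
    """Count (action, reason) pairs found in validation-failure log lines.

    Entries beginning with 'VAR__' are skipped on the assumption that they
    are message variables, not failure reasons.
    """
    counts = Counter()
    for line in lines:
        match = VALIDATION_RE.search(line)
        if match is None:
            continue
        for reason in match.group("reasons").split(","):
            if reason.startswith("VAR__"):
                continue
            counts[(match.group("action"), reason)] += 1
    return counts


# Sample lines copied from the snippet above.
SAMPLE = [
    "2017-09-12 15:07:22,909-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] "
    "(DefaultQuartzScheduler2) [46211442] Validation of action 'RunVm' failed "
    "for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,"
    "VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO",
    "2017-09-12 15:07:16,684-04 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(DefaultQuartzScheduler4) [1cc4342c] spm stop on spm failed, stopping spm selection!",
    "2017-09-12 15:07:22,946-04 WARN  [org.ovirt.engine.core.bll.RunVmCommand] "
    "(DefaultQuartzScheduler2) [2cfbc9b1] Validation of action 'RunVm' failed "
    "for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,"
    "VM_CANNOT_RUN_FROM_CD_WITHOUT_ACTIVE_STORAGE_DOMAIN_ISO",
]

if __name__ == "__main__":
    for (action, reason), n in count_validation_failures(SAMPLE).items():
        print(f"{action}: {reason} x{n}")
```

To run it against a real log, pass an open file object instead of SAMPLE, for example count_validation_failures(open(path)) where path points at the engine.log being examined.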

Comment 1 Martin Perina 2017-09-13 08:51:50 UTC
Could you please attach the full logs? Also, the bug is filed against hosted-engine HA, so is this a normal installation or a hosted engine?

Comment 2 Red Hat Bugzilla 2023-09-14 04:07:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days