Bug 1297845

Summary: SPM should never run on the same host as the self hosted engine VM
Product: [oVirt] ovirt-hosted-engine-ha
Reporter: Jonas Lindholm <jonas.lindholm>
Component: General
Assignee: Martin Sivák <msivak>
Status: CLOSED NOTABUG
QA Contact: Ilanit Stein <istein>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.0.0
CC: bugs, dfediuck, jonas.lindholm, sbonazzo, stirabos
Target Milestone: ---
Flags: rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---
Hardware: All
OS: All
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-14 18:34:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
agent.log from a failed reproduction attempt none

Description Jonas Lindholm 2016-01-12 15:37:14 UTC
Description of problem:

When the host running the engine VM is also the SPM host, a crash of that host prevents any other host in the same cluster from starting the engine VM, because there is no SPM host.
This is a catch-22 situation: a new SPM host can't be selected because the engine VM is not running, and the engine VM cannot be started because the storage domains are down and can't be brought up without the SPM.


Version-Release number of selected component (if applicable):


How reproducible:
Select the host the engine is running on as SPM and then power off that host.

Steps to Reproduce:
1. 
2.
3.

Actual results:
None of the other hosts can spin up the engine VM.

Expected results:


Additional info:

The fix would be to not allow the engine VM to run on the SPM host. The only exception would be when there is a single host left in the cluster.
As soon as a second host is available, the SPM role should be moved over to that host.
If the administrator migrates the engine VM to a host that is SPM, the SPM role should move. The administrator should confirm that the SPM role will move before the migration of the engine VM starts.
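
For illustration only, the placement rule proposed above could be sketched as follows. This is not existing oVirt or ovirt-hosted-engine-ha behavior; the function names, the host names, and the idea of passing in a plain list of healthy hosts are all hypothetical and chosen just to make the rule concrete.

```python
from typing import List, Optional


def pick_spm_host(engine_vm_host: str, healthy_hosts: List[str]) -> Optional[str]:
    """Return the host that should hold the SPM role under the proposed rule.

    Hypothetical helper: prefers any healthy host other than the one running
    the engine VM, and falls back to the engine VM host only when it is the
    single healthy host left in the cluster.
    """
    if not healthy_hosts:
        return None  # no healthy host at all, so nothing can hold the SPM role
    for host in healthy_hosts:
        if host != engine_vm_host:
            return host  # first healthy host that does not run the engine VM
    return engine_vm_host  # single-host case: SPM and engine VM must share


def needs_spm_move(engine_vm_host: str, spm_host: str, healthy_hosts: List[str]) -> bool:
    """True when the SPM shares a host with the engine VM and another host can take it."""
    target = pick_spm_host(engine_vm_host, healthy_hosts)
    return spm_host == engine_vm_host and target is not None and target != engine_vm_host
```

With two healthy hosts and both roles on the same host, `needs_spm_move` reports that the SPM role should move; with a single host left, it reports that no move is possible, matching the exception described above.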

Comment 1 Jonas Lindholm 2016-01-12 15:40:29 UTC
*** Bug 1297844 has been marked as a duplicate of this bug. ***

Comment 2 Doron Fediuck 2016-01-13 07:27:48 UTC
Thanks for the report.
We need to verify the behavior here, but in general, if this is a real issue, the main problem is that the engine needs an SPM in order to start, regardless of which host is trying to run the VM. The HA agent should be capable of starting the VM on any healthy hosted-engine node without additional dependencies.

Comment 3 Simone Tiraboschi 2016-01-13 08:39:26 UTC
Unable to reproduce here with ovirt-hosted-engine-ha 1.3.3.6-1 using iSCSI for the hosted-engine storage domain.

I have two hosts: one was the SPM and the engine VM was running there.
I abruptly powered it off, and after about 4 minutes the engine VM successfully restarted on the other host.

I'm attaching agent.log from my reproduction attempt.

Jonas, could you please provide agent logs from your case to check what happened there?

Comment 4 Simone Tiraboschi 2016-01-13 08:40:46 UTC
Created attachment 1114315
agent.log from a failed reproduction attempt

Comment 5 Doron Fediuck 2016-01-13 13:31:14 UTC
Please provide your logs and reproduction steps.

Comment 6 Simone Tiraboschi 2016-01-14 18:34:03 UTC
ovirt-ha-agent doesn't rely on the SPM to start the engine VM.