Bug 1297845 - SPM should never run on the same host as the self hosted engine VM
Status: CLOSED NOTABUG
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: 2.0.0
Hardware: All All
Priority: unspecified
Severity: medium
Assigned To: Martin Sivák
Ilanit Stein
sla
Duplicates: 1297844
Depends On:
Blocks:
Reported: 2016-01-12 10:37 EST by Jonas Lindholm
Modified: 2016-01-14 13:34 EST

Doc Type: Bug Fix
Last Closed: 2016-01-14 13:34:03 EST
Type: Bug
oVirt Team: SLA
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
agent.log from a failed reproduction attempt (125.06 KB, text/plain)
2016-01-13 03:40 EST, Simone Tiraboschi

Description Jonas Lindholm 2016-01-12 10:37:14 EST
Description of problem:

When the host running the engine VM is also the SPM host, a crash of that host prevents every other host in the same cluster from starting the engine VM, because there is no SPM host.
This is a catch-22: a new SPM host can't be selected because the engine VM is not running, and the engine VM can't be started because the storage domains are down and can't be brought up without the SPM.


Version-Release number of selected component (if applicable):


How reproducible:
Select the host the engine is running on as SPM and then power off that host.

Steps to Reproduce:
1. 
2.
3.

Actual results:
None of the other hosts can spin up the engine VM.

Expected results:


Additional info:

The fix would be to not allow the engine VM to run on the SPM host. The only exception would be when there is a single host left in the cluster.
As soon as a second host is available, the SPM role should be moved over to that host.
If the administrator migrates the engine VM to a host that is the SPM, the SPM role should move. The administrator should confirm that the SPM role will move before the migration of the engine VM starts.
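
For illustration only, the rule proposed above could be sketched as follows. This is not oVirt or ovirt-hosted-engine-ha code: the Host class and the functions pick_spm and migration_needs_confirmation are hypothetical names invented for this sketch, and the choice of the first eligible host is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical stand-in for a cluster host (not an oVirt type)."""
    name: str
    up: bool = True


def pick_spm(hosts: List[Host], engine_host: str,
             current_spm: Optional[str]) -> Optional[str]:
    """Return the host that should hold the SPM role under the proposed rule."""
    live = [h.name for h in hosts if h.up]
    if not live:
        return None
    # Live hosts that are not running the engine VM are eligible for SPM.
    eligible = [name for name in live if name != engine_host]
    if not eligible:
        # Exception from the proposal: only one host is left in the cluster,
        # so it has to carry both the engine VM and the SPM role.
        return live[0]
    # Keep the current SPM if it is still eligible, to avoid needless moves.
    if current_spm in eligible:
        return current_spm
    # Otherwise move the role; taking the first eligible host is an assumption.
    return eligible[0]


def migration_needs_confirmation(target_host: str,
                                 current_spm: Optional[str]) -> bool:
    """True if migrating the engine VM to target_host would force an SPM move."""
    return target_host == current_spm
```

With two live hosts "a" and "b", the engine VM on "a" and "a" as SPM, pick_spm returns "b"; with only "a" up, it returns "a".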
Comment 1 Jonas Lindholm 2016-01-12 10:40:29 EST
*** Bug 1297844 has been marked as a duplicate of this bug. ***
Comment 2 Doron Fediuck 2016-01-13 02:27:48 EST
Thanks for the report.
We need to verify the behavior here, but in general, if this is a real issue, the main problem is that an SPM is needed for the engine to start, regardless of which host is trying to run the VM. The HA agent should be capable of starting the VM on any healthy hosted-engine node without additional dependencies.
Comment 3 Simone Tiraboschi 2016-01-13 03:39:26 EST
Unable to reproduce here with ovirt-hosted-engine-ha 1.3.3.6-1 using iSCSI for the hosted-engine storage domain.

I have two hosts: one was the SPM and the engine VM was running there.
I brutally powered it off and after about 4 minutes the engine VM successfully restarted on the other host.

I'm attaching agent.log from my reproduction attempt.

Jonas, could you please provide agent logs from your case to check what happened there?
Comment 4 Simone Tiraboschi 2016-01-13 03:40 EST
Created attachment 1114315 [details]
agent.log from a failed reproduction attempt
Comment 5 Doron Fediuck 2016-01-13 08:31:14 EST
Please provide your logs and reproduction steps.
Comment 6 Simone Tiraboschi 2016-01-14 13:34:03 EST
ovirt-ha-agent doesn't rely on the SPM to start the engine VM.
