Red Hat Bugzilla – Bug 1297845
SPM should never run on the same host as the self hosted engine VM
Last modified: 2016-01-14 13:34:03 EST
Description of problem:
When the host the engine VM is running on is also the SPM host, a crash of that host prevents any other host in the same cluster from spinning up the engine VM, as there is no SPM host.
This is like a catch-22 situation: a new SPM host can't be selected because the engine VM is not running, and the engine VM cannot be started because the storage domains are down and can't be brought up without the SPM.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Select the host the engine is running on as SPM and then power off that host.
None of the other hosts can spin up the engine VM.
The fix would be to not allow the engine VM to run on the SPM host. The only exception would be when there is a single host left in the cluster.
As soon as a second host is available, the SPM role should be moved over to that host.
If the administrator migrates the engine VM to a host that is SPM, the SPM role should move to another host. The administrator should confirm that the SPM role will move before the engine VM migration starts.
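The proposed rule could be sketched roughly as follows, in Python. This is only an illustration of the request, not how the engine selects the SPM today; the `Host` type and the `pick_spm` and `migration_moves_spm` functions are hypothetical names and are not part of oVirt or VDSM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    # Hypothetical model of a hosted-engine host, for illustration only.
    name: str
    up: bool
    is_spm: bool = False
    runs_engine_vm: bool = False


def pick_spm(hosts: List[Host]) -> Optional[Host]:
    """Return the host that should hold the SPM role under the proposed rule."""
    candidates = [h for h in hosts if h.up]
    if not candidates:
        return None
    # Prefer any host that is not running the engine VM.
    others = [h for h in candidates if not h.runs_engine_vm]
    if others:
        # Keep the current SPM if it already satisfies the rule.
        for h in others:
            if h.is_spm:
                return h
        return others[0]
    # Exception: only the engine VM host is left, so it must also be SPM.
    return candidates[0]


def migration_moves_spm(target: Host) -> bool:
    """True if migrating the engine VM to `target` would force an SPM move,
    in which case the administrator should be asked to confirm first."""
    return target.is_spm
```

With two hosts up, where one is both SPM and the engine VM host, `pick_spm` returns the other host; with only one host up, it returns that host.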
*** Bug 1297844 has been marked as a duplicate of this bug. ***
Thanks for the report.
We need to verify the behavior here, but in general, if this is a real issue, the main problem is that the engine VM needs the SPM in order to start, regardless of which host is trying to run the VM. The HA agent should be capable of starting the VM on any healthy hosted-engine node without additional dependencies.
Unable to reproduce here with ovirt-hosted-engine-ha 126.96.36.199-1 using iSCSI for the hosted-engine storage domain.
I have two hosts: one was the SPM and the engine VM was running there.
I brutally powered it off and after about 4 minutes the engine VM successfully restarted on the other host.
I'm attaching agent.log from my reproduction attempt.
Jonas, could you please provide agent logs from your case to check what happened there?
Created attachment 1114315
agent.log from a failed reproduction attempt
Please provide your logs and reproduction steps.
ovirt-ha-agent doesn't rely on the SPM to start the engine VM.