Description of problem:
VDSM now uses a newer version of libvirt, which reports a different error message when it fails to acquire the sanlock. The engine-health submonitor has to be changed to react to this new error message.

Version-Release number of selected component (if applicable):
VDSM: v4.19.35

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted engine on 2 hosts
2. Set global maintenance mode and shut down the engine VM
3. Cancel global maintenance mode
4. Wait for engine VM to start

Actual results:
On the host that does not run the VM, the agent moves from EngineStarting state to EngineMaybeAway.

Expected results:
The agent moves from EngineStarting to EngineForceStop. There is an INFO line in agent log: "Another host already took over.."
On the host that is not running the SHE VM, I see a state transition from EngineStarting to EngineDown. I never see the state change to "EngineForceStop", though. Andrej, please provide your input.
I also see "MainThread::INFO::2017-10-29 17:42:31,827::states::771::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over.." in the logs on the host that reports "state=EngineDown".
Strange: the code has no direct transition from EngineStarting to EngineDown. If the agent log shows "Another host already took over..", then the right code path was chosen, so the patch works. After that line, there should be a debug line containing "Processing engine state EngineForceStop".
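The check described in the comment above can be scripted. This is only a minimal sketch, not part of the agent: it assumes the agent log is available as an iterable of text lines and that debug logging is enabled. The function name `force_stop_follows_takeover` is made up for illustration; the two marker strings are the ones quoted in this report.

```python
# Marker strings quoted in this bug report.
TOOK_OVER = "Another host already took over.."
FORCE_STOP = "Processing engine state EngineForceStop"


def force_stop_follows_takeover(log_lines):
    """Return True if a FORCE_STOP line appears after a TOOK_OVER line.

    log_lines is any iterable of strings, e.g. an open log file.
    The helper name is hypothetical and only used for this sketch.
    """
    seen_takeover = False
    for line in log_lines:
        if TOOK_OVER in line:
            # The agent noticed that another host started the engine VM.
            seen_takeover = True
        elif seen_takeover and FORCE_STOP in line:
            # The agent then processed the EngineForceStop state.
            return True
    return False
```

To use it, pass an open file object for the agent log (the log location depends on the installation). A True result means the agent did pass through EngineForceStop, even if the reported state had already moved on to EngineDown by the time it was checked.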
I followed the reproduction steps from the description with ovirt-hosted-engine-setup-2.2.0-0.0.master.20171016160008.git55723e8.el7.centos.noarch, deployed over Gluster storage 3.12 on a pair of hosts, and saw "EngineForceStop" for a split second on the host that was not starting the SHE VM. The VM had been running on host A; after global maintenance mode was cancelled, it started on host B instead, so host A showed "EngineForceStop" and then changed to "EngineDown". Moving to verified.
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.