+++ This bug is an upstream to downstream clone. The original bug is: +++
+++ bug 1504032 +++
======================================================================

Description of problem:
VDSM now uses a newer version of libvirt, which reports a different error message when it fails to acquire the sanlock. The engine-health submonitor has to be changed to react to this new error message.

Version-Release number of selected component (if applicable):
VDSM: v4.19.35

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted engine on 2 hosts
2. Set global maintenance mode and shut down the engine VM
3. Cancel global maintenance mode
4. Wait for engine VM to start

Actual results:
On the host that does not run the VM, the agent moves from EngineStarting state to EngineMaybeAway.

Expected results:
The agent moves from EngineStarting to EngineForceStop. There is an INFO line in agent log: "Another host already took over.."

(Originally by Andrej Krejcir)
Works for me on ovirt-hosted-engine-setup-2.1.4-1.el7ev.noarch, thus moving to verified.
Please see the details below:

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "Storage of VM is locked. Is another host already starting the VM?", "health": "bad", "vm": "already_locked", "detail": "down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6bfb3b60
local_conf_timestamp               : 7275
Host timestamp                     : 7273
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7273 (Mon Oct 30 17:03:25 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=7275 (Mon Oct 30 17:03:27 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineForceStop
    stopped=False

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4cc56637
local_conf_timestamp               : 7316
Host timestamp                     : 7314
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7314 (Mon Oct 30 17:04:06 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=7316 (Mon Oct 30 17:04:08 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False
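
For readers following the verification, the transition being checked can be sketched roughly as below. This is only an illustration, not the agent's actual code: the function name next_state_from_starting and the branch layout are hypothetical. The state names, the "already_locked" value and the log text are taken from this report. The fallback to EngineMaybeAway simply mirrors the behaviour reported under "Actual results"; the real state machine has more branches.

```python
import json

# State names as they appear in this report; the mapping is an assumption.
ENGINE_STARTING = "EngineStarting"
ENGINE_FORCE_STOP = "EngineForceStop"
ENGINE_MAYBE_AWAY = "EngineMaybeAway"


def next_state_from_starting(engine_status_json):
    """Hypothetical helper: pick the next state for an agent in EngineStarting.

    Takes the engine status as a JSON string (the form shown in the
    "Engine status" field) and returns (next_state, info_log_message).
    """
    status = json.loads(engine_status_json)
    vm = status.get("vm")

    if vm == "already_locked":
        # The submonitor recognised the lock failure: another host owns the VM.
        return ENGINE_FORCE_STOP, "Another host already took over.."
    if vm == "down":
        # Lock failure not recognised: the agent only sees a VM that is down.
        return ENGINE_MAYBE_AWAY, None
    # Anything else: keep waiting for the VM to come up (simplification).
    return ENGINE_STARTING, None


locked = ('{"reason": "Storage of VM is locked. Is another host already '
          'starting the VM?", "health": "bad", "vm": "already_locked", '
          '"detail": "down"}')
state, message = next_state_from_starting(locked)
print(state, message)
```

With the first engine status above as input, the sketch selects EngineForceStop, which matches the state=EngineForceStop entry in the extra metadata.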
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3136