Bug 1504150

Summary: [downstream clone - 4.1.7] Engine-health monitor should expect new sanlock error message
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-hosted-engine-ha
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: unspecified
Version: unspecified
CC: alukiano, bugs, lsurette, mavital, ykaul, ylavi
Target Milestone: ovirt-4.1.7
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1504032
Environment:
Last Closed: 2017-11-07 17:26:57 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1504032
Bug Blocks: 1464002, 1493547

Description rhev-integ 2017-10-19 14:48:52 UTC
+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1504032 +++
======================================================================

Description of problem:

VDSM now uses a newer version of libvirt, which reports a different error message when it fails to acquire the sanlock. The engine-health submonitor has to be changed to react to this new error message.
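
A minimal sketch of the kind of check involved, not the actual ovirt-hosted-engine-ha code: the submonitor compares the VM start error against a list of known lock-failure texts and, on a match, reports the "already_locked" status. The marker strings below are hypothetical placeholders, because this report does not quote the old or new libvirt message; the function and constant names are also assumptions made for illustration.

```python
# HYPOTHETICAL placeholders: this report does not quote the real
# libvirt/sanlock error texts, so substitute the actual messages here.
LOCK_FAILURE_MARKERS = (
    "<old sanlock failure text>",
    "<new sanlock failure text>",
)

# Status values copied from the "Engine status" field shown later in this bug.
ALREADY_LOCKED_STATUS = {
    "reason": "Storage of VM is locked. Is another host already starting the VM?",
    "health": "bad",
    "vm": "already_locked",
    "detail": "down",
}


def classify_start_error(message, markers=LOCK_FAILURE_MARKERS):
    """Return the already_locked status if the message contains any known
    lock-failure marker, otherwise None (not a lock failure)."""
    if any(marker in message for marker in markers):
        # Return a copy so callers cannot mutate the shared constant.
        return dict(ALREADY_LOCKED_STATUS)
    return None
```

Keeping the known messages in one tuple means that a future change in the libvirt wording only requires adding another entry.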

Version-Release number of selected component (if applicable):
VDSM: v4.19.35

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted engine on 2 hosts
2. Set global maintenance mode and shut down the engine VM
3. Cancel global maintenance mode
4. Wait for engine VM to start

Actual results:
On the host that does not run the VM, the agent moves from EngineStarting state to EngineMaybeAway.

Expected results:
The agent moves from EngineStarting to EngineForceStop.
There is an INFO line in the agent log: "Another host already took over.."

(Originally by Andrej Krejcir)

Comment 2 Nikolai Sednev 2017-10-30 15:06:01 UTC
Works for me on ovirt-hosted-engine-setup-2.1.4-1.el7ev.noarch, thus moving to verified.

Please see the details below:
--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "Storage of VM is locked. Is another host already starting the VM?", "health": "bad", "vm": "already_locked", "detail": "down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6bfb3b60
local_conf_timestamp               : 7275
Host timestamp                     : 7273
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7273 (Mon Oct 30 17:03:25 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=7275 (Mon Oct 30 17:03:27 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineForceStop
        stopped=False


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4cc56637
local_conf_timestamp               : 7316
Host timestamp                     : 7314
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7314 (Mon Oct 30 17:04:06 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=7316 (Mon Oct 30 17:04:08 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False
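
The status blocks above can also be checked mechanically. The following is a minimal sketch, assuming the "Engine status" value sits on a single unwrapped line and the agent state appears as a "state=" line, as in the output pasted here. The function name parse_host_status and the shortened SAMPLE text are illustrative only and are not part of any oVirt tool.

```python
import json


def parse_host_status(text):
    """Extract the engine status dict and the agent state from one host block."""
    engine_status = None
    state = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Engine status"):
            # Everything after the first colon is the JSON document.
            engine_status = json.loads(line.split(":", 1)[1])
        elif line.startswith("state="):
            state = line.split("=", 1)[1]
    return engine_status, state


# Shortened copy of the first status block from this comment.
SAMPLE = """\
Engine status                      : {"reason": "Storage of VM is locked. Is another host already starting the VM?", "health": "bad", "vm": "already_locked", "detail": "down"}
Score                              : 3400
        state=EngineForceStop
"""

status, state = parse_host_status(SAMPLE)
# Verification condition from this bug: locked storage leads to EngineForceStop.
verified = status["vm"] == "already_locked" and state == "EngineForceStop"
```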

Comment 4 errata-xmlrpc 2017-11-07 17:26:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3136