Bug 1504032 - Engine-health monitor should expect new sanlock error message
Summary: Engine-health monitor should expect new sanlock error message
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Broker
Version: ---
Hardware: Unspecified
OS: Unspecified
Target Milestone: ovirt-4.2.0
Assignee: Andrej Krejcir
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1504150
Reported: 2017-10-19 10:59 UTC by Andrej Krejcir
Modified: 2017-12-20 10:56 UTC (History)

Fixed In Version:
Clone Of:
: 1504150 (view as bug list)
Environment:
Last Closed: 2017-12-20 10:56:28 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: planning_ack+
msivak: devel_ack+
alukiano: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 82927 0 None None None 2017-10-19 11:02:18 UTC
oVirt gerrit 82940 0 v2.1.z MERGED Broker: Change expected error message when lock is held by another host 2017-10-19 14:52:52 UTC

Description Andrej Krejcir 2017-10-19 10:59:53 UTC
Description of problem:

VDSM now uses a newer version of libvirt, which reports a different error message when it fails to acquire the sanlock. The engine-health submonitor has to be changed to react to this new error message.

Version-Release number of selected component (if applicable):
VDSM: v4.19.35

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted engine on 2 hosts
2. Set global maintenance mode and shut down the engine VM
3. Cancel global maintenance mode
4. Wait for engine VM to start

Actual results:
On the host that does not run the VM, the agent moves from EngineStarting state to EngineMaybeAway.

Expected results:
The agent moves from EngineStarting to EngineForceStop.
There is an INFO line in agent log: "Another host already took over.."
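The expected behaviour above can be sketched roughly as follows. This is only an illustration, not the actual broker or agent code: the two message fragments are placeholders (the real libvirt/sanlock strings are not quoted in this report), and the function names are hypothetical.

```python
# Placeholder fragments: the real old and new error strings are NOT given in
# this bug report, so these values are assumptions for illustration only.
OLD_LOCK_ERROR = "old lock-held message"
NEW_LOCK_ERROR = "new lock-held message"

# Accept both the old and the new wording, so either libvirt version matches.
LOCK_HELD_MARKERS = (OLD_LOCK_ERROR, NEW_LOCK_ERROR)


def lock_held_by_another_host(exit_message):
    """Return True if the VM start failure looks like a sanlock conflict."""
    msg = (exit_message or "").lower()
    return any(marker.lower() in msg for marker in LOCK_HELD_MARKERS)


def next_state_after_failed_start(exit_message):
    """Hypothetical helper: pick the state that follows EngineStarting."""
    if lock_held_by_another_host(exit_message):
        # Expected result: another host already took over.
        return "EngineForceStop"
    # Actual (buggy) result when the new message is not recognised.
    return "EngineMaybeAway"
```

With only the old fragment in LOCK_HELD_MARKERS, a failure reported with the new wording would fall through to EngineMaybeAway, which matches the actual results described above.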

Comment 2 Nikolai Sednev 2017-10-29 15:52:04 UTC
On the host that is not running the SHE VM, I see a state transition from EngineStarting to EngineDown.
However, I never see the state change to "EngineForceStop".
Andrej, please provide your input.

Comment 3 Nikolai Sednev 2017-10-29 15:55:30 UTC
I also see "MainThread::INFO::2017-10-29 17:42:31,827::states::771::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over.." in the logs on the host that reports "state=EngineDown".

Comment 4 Andrej Krejcir 2017-10-30 11:26:16 UTC
Strange: in the code, there is no direct transition from the EngineStarting state to EngineDown.

If the agent log shows "Another host already took over..", then the right code path was chosen, so the patch works. After the line, there should be a debug line containing "Processing engine state EngineForceStop".
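A quick way to check this in an agent log could look like the sketch below. It assumes only the two substrings quoted in this comment; the function name and the debug line's exact format in the usage example are hypothetical.

```python
def took_over_then_force_stop(log_lines):
    """Return True if the 'took over' INFO line is later followed by the
    EngineForceStop debug line (debug logging must be enabled)."""
    seen_takeover = False
    for line in log_lines:
        if "Another host already took over" in line:
            seen_takeover = True
        elif seen_takeover and "Processing engine state EngineForceStop" in line:
            return True
    return False


# Usage with in-memory lines; the second line's prefix is made up.
sample = [
    "MainThread::INFO::...::(consume) Another host already took over..",
    "MainThread::DEBUG::...::Processing engine state EngineForceStop",
]
print(took_over_then_force_stop(sample))
```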

Comment 5 Nikolai Sednev 2017-10-30 12:23:42 UTC
I just followed the reproduction steps from the description with ovirt-hosted-engine-setup-2.2.0-0.0.master.20171016160008.git55723e8.el7.centos.noarch, deployed over Gluster storage 3.12 on a pair of hosts, and saw "EngineForceStop" for a brief moment on the host that was not running (starting) the SHE VM.

The VM had been running on host A. After global maintenance mode was cancelled, the VM started on host B instead of host A, so I saw "EngineForceStop" on host A, which then changed to "EngineDown".

Moving to verified.

Comment 6 Sandro Bonazzola 2017-12-20 10:56:28 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

