Bug 1571119

Summary: [HE] - Engine complaining that the 'VM HostedEngine is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.'
Product: [oVirt] ovirt-hosted-engine-ha
Reporter: Michael Burman <mburman>
Component: Agent
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: medium
Version: ---
CC: appraprv, bugs, dfediuck, mburman, stirabos, ylavi
Target Milestone: ovirt-4.2.4
Keywords: Triaged
Target Release: 2.2.12
Flags: rule-engine: ovirt-4.2+, ylavi: exception+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-hosted-engine-ha-2.2.12-1.el7ev
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-26 08:35:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1581784
Attachments:
HE logs (Flags: none)

Description Michael Burman 2018-04-24 06:58:19 UTC
Created attachment 1425826
HE logs

Description of problem:
[HE] When the HE VM is migrated, the engine reports 'VM HostedEngine is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.', even though the migration succeeds.

2018-04-24 09:24:25,418+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '82fe2ccb-ca42-4e19-8346-3ac7b80eb793'(HostedEngine) moved from 'WaitForLaunch' --> 'Down'
2018-04-24 09:24:25,499+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), VM HostedEngine is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.
2018-04-24 09:24:25,516+03 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-22244) [613ab8d5] Running command: ProcessDownVmCommand internal: true.
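A minimal sketch for spotting this event in an engine log, written in Python as an assumption (the report itself contains no code). It assumes each engine log line follows the layout visible in the excerpts above: timestamp, level, [logger], (thread), [correlation id], message. The function lease_errors and the Event tuple are hypothetical helpers, not part of oVirt. Lines that do not match that layout, such as wrapped continuation lines, are skipped rather than parsed.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed engine log layout, inferred from the excerpts in this report:
# <date> <time> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+(?P<msg>.*)$"
)


class Event(NamedTuple):  # hypothetical helper type, not an oVirt class
    timestamp: str
    level: str
    logger: str
    message: str


def lease_errors(lines: Iterable[str]) -> Iterator[Event]:
    """Yield VM_DOWN_ERROR events that report the lease-held-by-another-host error."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a complete log line (e.g. a wrapped continuation)
        msg = m.group("msg")
        if "VM_DOWN_ERROR" in msg and "Lease is held by another host" in msg:
            yield Event(m.group("ts"), m.group("level"), m.group("logger"), msg)


# Usage with two lines copied from the description above.
sample = [
    "2018-04-24 09:24:25,418+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(ForkJoinPool-1-worker-3) [] VM '82fe2ccb-ca42-4e19-8346-3ac7b80eb793'(HostedEngine) "
    "moved from 'WaitForLaunch' --> 'Down'",
    "2018-04-24 09:24:25,499+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), VM HostedEngine is down with error. "
    "Exit message: resource busy: Failed to acquire lock: Lease is held by another host.",
]

for event in lease_errors(sample):
    # prints: 2018-04-24 09:24:25,499+03 ERROR AuditLogDirector
    print(event.timestamp, event.level, event.logger.rsplit(".", 1)[-1])
```

Only the ERROR line matches: the preceding VmAnalyzer line records the 'WaitForLaunch' --> 'Down' transition but does not carry the VM_DOWN_ERROR event ID.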


Version-Release number of selected component (if applicable):
4.2.3.2-0.1.el7
vdsm-4.20.26-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch

How reproducible:
Almost every time the HE VM is migrated (close to 100%).

Steps to Reproduce:
1. On a hosted-engine (HE) setup, migrate the HE VM.

Actual results:
The engine reports that the HE VM is down, although the migration succeeded.

Comment 1 Andrej Krejcir 2018-05-17 14:52:19 UTC
I couldn't reproduce this on a clean HE deployment.

Do you have more specific reproduction steps?
Or some more information about the environment?

Could you also attach debug-level logs from vdsm, he-agent and he-broker?

Comment 2 Michael Burman 2018-05-21 10:34:14 UTC
(In reply to Andrej Krejcir from comment #1)
> I couldn't reproduce this on a clean HE deployment.
> 
> Do you have more specific reproduction steps?
> Or some more information about the environment?
> 
> Could you also attach debug level logs from vdsm, he-agent and he-broker?

It was an HE automation environment, which is no longer available.
I just reproduced it locally on my setup. Please contact me offline and I will give you access to the setup; that will be faster. Thanks

2018-05-21 13:30:37,436+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-9) [] FINISH, DestroyVDSCommand, log id: 7d5739f3
2018-05-21 13:30:37,436+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] VM 'ac715734-5df6-49b2-a9d4-8f86d3731aeb'(HostedEngine) moved from 'WaitForLaunch' --> 'Down'
2018-05-21 13:30:37,473+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM HostedEngine is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.

All I did was migrate the HE VM to a different host (it was running on the SPM host prior to the migration).

Comment 3 Nikolai Sednev 2018-06-05 15:06:51 UTC
I failed to reproduce this with the latest components:
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
rhvm-appliance-4.2-20180601.0.el7.noarch
Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Moving to verified.

Migration completed (VM: HostedEngine, Source: alma04.qa.lab.tlv.redhat.com, Destination: alma03.qa.lab.tlv.redhat.com, Duration: 42 seconds, Total: 53 seconds, Actual downtime: 304ms)
6/5/18 5:57:45 PM

Comment 4 Sandro Bonazzola 2018-06-26 08:35:14 UTC
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.