Bug 1104030 - Failed VM migrations do not release VM resource lock properly leading to failures in subsequent migration attempts
Summary: Failed VM migrations do not release VM resource lock properly leading to fail...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Arik
QA Contact: Nisim Simsolo
URL:
Whiteboard: virt
Depends On:
Blocks: 1114587 rhev3.5beta 1156165
Reported: 2014-06-03 06:40 UTC by Aval
Modified: 2019-04-28 09:37 UTC
CC List: 9 users

Fixed In Version: ovirt-engine-3.5.0_beta
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 1114587
Environment:
Last Closed: 2015-02-11 18:03:16 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
engine.log (870.63 KB, text/x-log)
2014-06-03 07:18 UTC, Aval


Links
Red Hat Product Errata RHSA-2015:0158 (normal, SHIPPED_LIVE): Important: Red Hat Enterprise Virtualization Manager 3.5.0. Last updated 2015-02-11 22:38:50 UTC.
oVirt gerrit 28403 (master, MERGED): core: fix reattempt to go to maintenance mechanism. Last updated 2020-03-02 23:14:00 UTC.

Description Aval 2014-06-03 06:40:27 UTC
Description of problem: After a VM migration thread acquires the resource lock and then fails for some reason, it does not release the lock properly. Later, when other threads try to perform operations such as migrating or running the VM, they fail to acquire the lock with the following error:

~~~
2014-06-01 18:15:53,807 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (DefaultQuartzScheduler_Worker-3) [36055ac9] Failed to Acquire Lock to object EngineLock [exclusiveLocks= key: 40abfe32-96de-44a7-abd9-e77ecd2bec7b value: VM
, sharedLocks= ]
2014-06-01 18:15:53,808 WARN  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (DefaultQuartzScheduler_Worker-3) [36055ac9] CanDoAction of action InternalMigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED,$VmName zabbix-190
~~~
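As a rough illustration of the generic failure mode described above (not of the actual ovirt-engine code), the following Python sketch models an exclusive per-VM lock table. `EngineLockManager`, `migrate_leaky` and `migrate_safe` are hypothetical names, and the status strings only mimic the messages in the log. The leaky variant releases the lock only on its success path, so a failed migration leaves the VM locked and every later attempt is rejected; the safe variant releases the lock in a `finally` block so every exit path frees it.

```python
import threading


class EngineLockManager:
    """Hypothetical in-memory exclusive lock table keyed by VM ID.

    A simplified model for illustration only; not the ovirt-engine API.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._exclusive = {}  # vm_id -> description of the current holder

    def try_acquire(self, vm_id, holder):
        with self._mutex:
            if vm_id in self._exclusive:
                return False  # already held, like "Failed to Acquire Lock"
            self._exclusive[vm_id] = holder
            return True

    def release(self, vm_id):
        with self._mutex:
            self._exclusive.pop(vm_id, None)

    def is_locked(self, vm_id):
        with self._mutex:
            return vm_id in self._exclusive


def migrate_leaky(locks, vm_id, do_migrate):
    """Buggy pattern: the lock is released only on the success path."""
    if not locks.try_acquire(vm_id, "migrate"):
        return "ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED"
    do_migrate()  # an exception here skips release() and leaks the lock
    locks.release(vm_id)
    return "OK"


def migrate_safe(locks, vm_id, do_migrate):
    """Fixed pattern: release in finally so every exit path frees the lock."""
    if not locks.try_acquire(vm_id, "migrate"):
        return "ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED"
    try:
        do_migrate()
        return "OK"
    except Exception:
        return "MIGRATION_FAILED"
    finally:
        locks.release(vm_id)


if __name__ == "__main__":
    vm = "40abfe32-96de-44a7-abd9-e77ecd2bec7b"

    def failing_migration():
        raise RuntimeError("simulated migration failure")

    leaky = EngineLockManager()
    try:
        migrate_leaky(leaky, vm, failing_migration)
    except RuntimeError:
        pass
    print(migrate_leaky(leaky, vm, lambda: None))  # ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED

    safe = EngineLockManager()
    print(migrate_safe(safe, vm, failing_migration))  # MIGRATION_FAILED
    print(migrate_safe(safe, vm, lambda: None))       # OK
```

In this model the lock table lives only in process memory, so restarting the process clears any leaked entry. That would be consistent with the restart workaround mentioned in this report, but whether the engine's real lock manager works this way is an assumption here.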


Version-Release number of selected component (if applicable):

rhevm-3.3.3-0.52.el6ev.noarch  

How reproducible:

A couple of customers reported that VM migrations kept failing while putting a host into maintenance, and afterwards they were also unable to start or stop VMs.

Steps to Reproduce:
1.
2.
3.

Actual results:
After the first failure, subsequent attempts to run/migrate the VM fail while acquiring locks.

Expected results:
After the first failure, subsequent attempts to run/migrate the VM should acquire the resource locks.

Additional info:

After restarting the ovirt-engine service, one of the customers was able to start/migrate VMs.

Comment 2 Aval 2014-06-03 07:18:38 UTC
Created attachment 901619 [details]
engine.log

Adding engine.log from one of the customers facing this problem.

Comment 3 Arik 2014-06-08 12:38:53 UTC
This bug exposed several issues:

1. The first migrations failed not because the VMs remained locked, but because of a bug that caused 'switch host to maintenance' to be re-triggered too soon; in the retry attempt the migrations failed because the VMs were already locked. This one is solved by http://gerrit.ovirt.org/#/c/28403

2. NPEs in the migrate operations. These exceptions, in migrations triggered by the 'switch to maintenance' operation, were already solved by: http://gerrit.ovirt.org/#/c/24639

3. The migrate transaction was aborted, so the migrate operation failed (on 2014-06-01 18:05:47,045). This won't happen anymore, as the migrate operation is no longer transactional.

4. Eventually the VMs remain locked. This happens after the maximum number of retries to migrate the VM is reached (and the migration fails). It was fixed in 3.4. Patch for 3.3: http://gerrit.ovirt.org/#/c/28460
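Items 1 and 4 can be pictured with another hypothetical Python model (again not ovirt-engine code; `VmLocks`, `MigrateWithReruns` and `max_attempts` are illustrative names). The migration command takes the VM lock once and keeps it while it reruns on other hosts, so a maintenance retry that fires while the first command is still running is rejected, as in item 1. The command must also free the lock on the branch where it gives up after its last attempt, which is where the leak in item 4 would sit.

```python
import threading


class VmLocks:
    """Hypothetical in-memory exclusive lock table keyed by VM ID."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = set()

    def try_acquire(self, vm_id):
        with self._mutex:
            if vm_id in self._held:
                return False
            self._held.add(vm_id)
            return True

    def release(self, vm_id):
        with self._mutex:
            self._held.discard(vm_id)

    def is_locked(self, vm_id):
        with self._mutex:
            return vm_id in self._held


class MigrateWithReruns:
    """Hypothetical callback-driven migration command.

    The VM lock is taken once in start() and kept across reruns; it is
    freed when the command ends, whether it succeeded or gave up after
    max_attempts failed attempts (the initial attempt counts as one).
    """

    def __init__(self, locks, vm_id, max_attempts):
        self.locks = locks
        self.vm_id = vm_id
        self.max_attempts = max_attempts
        self.attempts = 0
        self.state = "NEW"

    def start(self):
        if not self.locks.try_acquire(self.vm_id):
            # A retry triggered too soon lands here (item 1).
            self.state = "REJECTED_VM_IS_BEING_MIGRATED"
            return False
        self.attempts = 1
        self.state = "RUNNING"
        return True

    def on_attempt_succeeded(self):
        self._end("SUCCEEDED")

    def on_attempt_failed(self):
        if self.attempts < self.max_attempts:
            self.attempts += 1  # rerun on another host, keeping the lock
            return
        # Attempts exhausted: this path must not leak the lock (item 4).
        self._end("FAILED")

    def _end(self, state):
        self.state = state
        self.locks.release(self.vm_id)


if __name__ == "__main__":
    locks = VmLocks()
    cmd = MigrateWithReruns(locks, "vm-1", max_attempts=2)
    cmd.start()
    print(MigrateWithReruns(locks, "vm-1", 2).start())  # False: still locked
    cmd.on_attempt_failed()  # rerun, lock kept
    cmd.on_attempt_failed()  # attempts exhausted, lock released
    print(cmd.state, locks.is_locked("vm-1"))  # FAILED False
```

Keeping the lock across reruns prevents a second command from migrating the same VM concurrently; the cost is that every terminal branch of the command has to remember to release it.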

Comment 7 Nisim Simsolo 2014-10-26 12:27:26 UTC
Fixed. Verified using the following builds:
rhevm-3.5.0-0.17.beta.el6ev.noarch
libvirt-0.10.2-46.el6.x86_64
vdsm-4.16.7.1-1.el6ev.x86_64
sanlock-2.8-1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64

Comment 9 errata-xmlrpc 2015-02-11 18:03:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html

